Zeta reduces banking incident response time by 80% with Amazon OpenSearch Service observability

2025-08-21 Deepesh Dhapola

Post Syndicated from Deepesh Dhapola original https://aws.amazon.com/blogs/big-data/zeta-reduces-banking-incident-response-time-by-80-with-amazon-opensearch-service-observability/

This is a guest post co-written with Shashidhar Soppin, Manochandra Menni and Anchal Kansal from Zeta.

Zeta is a core banking technology provider that enables banks to rapidly launch extensible banking assets and liability products. Zeta’s primary products are Olympus and Tachyon. Olympus is a platform as a service (PaaS) that simplifies building and operating cloud-native, secure and distributed multi-tenant software as a service (SaaS) products. It blends infrastructure as code and GitOps methodologies for efficient and consistent deployment of SaaS products. Its architecture prioritizes strong tenant isolation, real-time event processing, and comprehensive observability, supporting robust API integrations and seamless deployment. Zeta’s Tachyon is a full-stack, cloud-native, API-first digital-banking SaaS service delivered via Olympus. The banking services of Tachyon include payment engines (for UPI, credit, debit, and prepaid cards), savings & checking account management, etc. Tachyon is a modern debit processing product with personal finance management and card controls. It is designed to increase usage, upsell credit, reduce fraud, and improve customer satisfaction. The Tachyon product offers comprehensive provisioning, payments, and account management APIs and SDKs, enabling seamless integration of financial products into third-party apps without compromising privacy and security. Zeta operates Tachyon as a multi-tenant SaaS product, serving customers who are configured as individual tenants within the system. Zeta’s technology stack is monitored by their Customer Service Navigator product (CSN), which is part of Olympus.

As a global SaaS provider, Zeta needed a solution capable of monitoring tenants, measuring SLAs, meeting local regulatory requirements, and scaling efficiently with both new tenant onboarding and seasonal usage spikes. Zeta sought a cost-effective, scalable system that would provide a unified “single pane of glass” to monitor the application services, cloud infrastructure, open-source components, and third-party products.

Zeta faced a formidable challenge in orchestrating a cohesive monitoring system across a rapidly expanding multi-tenant environment, diverse domains, and numerous tools. As more tenants joined their system, the complexity grew exponentially, making Zeta’s monitoring solution increasingly difficult to maintain. The primary challenge stemmed from fragmented monitoring tools that made it difficult to quickly identify root causes across interconnected systems, leading to prolonged troubleshooting times and potential service degradation. When users reported issues, such as credit card payment problems, the Site Reliability Engineering (SRE) team had to navigate several disparate monitoring tools and siloed data, and the lack of integrated observability resulted in time-consuming manual correlation efforts. This multi-tenant, multi-solution landscape significantly complicated the ability to maintain consistent monitoring standards and service levels. The challenge was further complicated by the complex regulatory landscape, where global expansion required adherence to diverse local regulations, necessitating a flexible architecture capable of accommodating varying data retention policies and access controls across different jurisdictions. Each new tenant addition multiplied the complexity of balancing the monitoring needs of internal SRE teams and customers, requiring sophisticated data segregation and access management. Additionally, Zeta needed comprehensive anomaly detection capabilities across systems, components, infrastructure, and operations, which called for a solution that could scale dynamically while establishing dynamic baselines and identifying subtle patterns that might indicate emerging issues. As the tenant base continued to grow, the need for a unified, scalable monitoring solution that could streamline these processes, enhance operational visibility, and maintain system integrity became critical.

Zeta’s goal was to streamline their processes and enhance operational visibility across the entire technology landscape. By addressing these challenges, Zeta aimed to create a unified observability solution that would significantly improve incident response times, enhance regulatory compliance posture, and ultimately deliver a more reliable and performant service to their global customer base.

In this post, we explain how Zeta built a more unified monitoring solution using Amazon OpenSearch Service that improved performance, reduced manual processes, and increased end-user satisfaction. Zeta has achieved over an 80% reduction in mean time to resolution (MTTR), with incident response times decreasing from 30+ minutes to under 5 minutes.

Solution overview

Zeta designed and built an observability system, CSN, to deliver comprehensive visibility across the service environment. CSN is part of the Olympus suite of products. CSN serves as the primary interface for the SRE team, offering real-time service health dashboards, infrastructure monitoring, SLA performance analytics, and an admin panel for user management. The system is equipped with single sign-on (SSO) integration and enforces role-based access control (RBAC) to enable secure, granular access. With CSN, SREs can efficiently monitor system health, receive actionable alerts and warnings, and manage operational workflows across critical services.

CSN is powered by OpenSearch Service to provide an integrated solution for DevOps and Site Reliability Engineers to help identify critical events and issues. Zeta chose OpenSearch Service because it offers a fully managed, open-source search and analytics engine that scales to handle the increasing number of tenants, associated data growth, and analytics needs. Its integration with AWS services, robust security features, and support for real-time data ingestion and querying make it well suited to powering the CSN dashboards and analytics workloads. The following diagram illustrates the CSN deployment architecture.

The OpenSearch Service domain uses the Multi-AZ with Standby deployment model, following AWS best practices for high availability and fault tolerance. Nodes—including dedicated cluster manager nodes, data nodes, and UltraWarm nodes—are distributed evenly across three Availability Zones in the same AWS Region. Availability Zones 1 and 2 handle active indexing and search traffic, and Availability Zone 3 contains standby nodes that remain passive during normal operations. If an Availability Zone failure occurs, OpenSearch Service automatically promotes standby nodes to active status, maintaining cluster operations with minimal disruption and no need for data redistribution.

The OpenSearch cluster consists of three dedicated cluster manager nodes and a multiple-of-three data node count to maintain quorum and balanced shard allocation. Each index uses at least two replicas, providing redundant copies of data across the Availability Zones. This Multi-AZ with Standby configuration delivers high resilience and rapid failover, supporting continuous service availability and robust disaster recovery for the observability workloads.
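The sizing rules described above can be expressed as a small pre-deployment check. The following Python sketch is illustrative only: the `DomainLayout` class and `layout_problems` function are hypothetical names, not part of any AWS SDK, and the rules encode only what this post states (three Availability Zones, three dedicated cluster manager nodes, a multiple-of-three data node count, and at least two replicas per index).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainLayout:
    """Hypothetical summary of a domain's node and replica settings."""
    availability_zones: int
    cluster_manager_nodes: int
    data_nodes: int
    replicas_per_index: int


def layout_problems(layout: DomainLayout) -> list:
    """Return human-readable problems; an empty list means the layout passes."""
    problems = []
    if layout.availability_zones != 3:
        problems.append("expected three Availability Zones")
    if layout.cluster_manager_nodes != 3:
        problems.append("expected three dedicated cluster manager nodes")
    if layout.data_nodes <= 0 or layout.data_nodes % 3 != 0:
        problems.append("data node count should be a positive multiple of three")
    if layout.replicas_per_index < 2:
        problems.append("each index should have at least two replicas")
    return problems
```

A check like this could run in a deployment pipeline before a domain configuration change is applied, so that a layout that breaks the even distribution across Availability Zones is caught early.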

Data collection and ingestion

The observability strategy centers on a data collection and ingestion pipeline designed to handle the complexity and scale. The architecture, as shown in the following diagram, addresses three critical data types: AWS resource logs, application logs, and distributed traces, with each data type using tailored collection and processing methods optimized for the workloads.

AWS resource logs collection

The infrastructure spans multiple AWS services including Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Relational Database Service (Amazon RDS), Amazon Redshift, Application Load Balancer, Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Elastic Compute Cloud (Amazon EC2), and more. Zeta uses Amazon CloudWatch Logs as the primary collection point for AWS service logs, which provides native integration with these services.

AWS services send their logs directly to CloudWatch Logs, which are then pulled by Fluentd running on the Amazon EKS cluster for centralized processing. This approach natively captures operational data from the AWS resources, including:

Database operational logs and audit trails from Amazon RDS instances
Data warehouse query execution logs from Amazon Redshift
Application Load Balancer access logs capturing traffic patterns and performance metrics
Kafka cluster operational logs from Amazon MSK
AWS API invocation audit trails from AWS CloudTrail
Container runtime and operating system logs from Amazon EC2

During log collection, personally identifiable information (PII) is filtered out. The solution adheres strictly to PCI-DSS guidelines throughout this process.
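The post does not describe how the PII filter is implemented. As a rough illustration of the idea, the following Python sketch masks card-like numbers and email addresses in a log message before it is shipped. The patterns, the placeholder strings, and the `redact` function are assumptions made for this example; a production filter would need a much broader rule set and would normally run inside the log collector itself.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def _luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to avoid masking timestamps and other long numbers."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def _mask_card(match: re.Match) -> str:
    digits = re.sub(r"\D", "", match.group(0))
    return "[REDACTED_CARD]" if _luhn_valid(digits) else match.group(0)


def redact(message: str) -> str:
    """Mask card-like numbers and email addresses in a single log message."""
    message = CARD_PATTERN.sub(_mask_card, message)
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", message)
```

The Luhn check is a design choice to reduce false positives: long digit runs such as epoch timestamps are left intact unless they also pass the card checksum.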

Zeta uses Amazon MSK as a scalable and reliable backbone for collecting and streaming logs from various sources across the AWS resources. Logs are ingested into Amazon MSK, providing a durable and fault-tolerant buffer that decouples log producers from consumers. This architecture enables real-time log streaming and supports advanced processing pipelines before the logs are routed to OpenSearch Service. Integrating Amazon MSK into the logging workflow improves scalability, resilience, and flexibility, so that high log volumes are managed efficiently without impacting downstream systems. This approach, combined with native AWS integrations, minimizes operational complexity and maintains comprehensive, centralized log visibility across the cloud environment.

Fluentd processes these logs and routes them directly to OpenSearch Service, maintaining the benefits of AWS integration while providing centralized accessibility. This centralized logging approach with built-in buffering capabilities reduces the direct load on OpenSearch Service by batching and optimizing log delivery, helping to prevent potential ingestion bottlenecks during high-volume periods. The approach alleviates the need for custom log shipping agents on AWS resources, reducing operational overhead while maintaining comprehensive coverage of the cloud infrastructure.
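To make the batching idea concrete, the following Python sketch groups log records into batches bounded by record count and byte size, which is the general technique a buffering log shipper applies before sending a bulk request. It is not Fluentd's implementation; the `batch_records` function and its default limits are assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, List


def batch_records(records: Iterable[str],
                  max_records: int = 500,
                  max_bytes: int = 1_000_000) -> Iterator[List[str]]:
    """Yield lists of records, flushing when either limit would be exceeded."""
    batch: List[str] = []
    size = 0
    for record in records:
        record_size = len(record.encode("utf-8"))
        # Flush the current batch before adding a record that would overflow it.
        if batch and (len(batch) >= max_records or size + record_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += record_size
    if batch:
        yield batch
```

Fewer, larger requests reduce per-request overhead on the cluster, which is the effect the paragraph above attributes to buffered delivery.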

Application logs processing

For application-level observability, a pipeline using Fluentd is deployed as a Kubernetes DaemonSet. Application microservices running on Amazon EKS generate logs that the Fluentd DaemonSet pods collect, parse, and enrich with metadata such as pod names, namespaces, and service identifiers. The processed logs then flow through Amazon MSK for reliable, high-throughput message streaming before final processing by Fluentd and indexing in OpenSearch Service.

This Kafka-based approach provides several advantages:

Decoupling – Producers and consumers operate independently, so that Zeta can scale ingestion and processing separately based on demand.
Backpressure handling – Kafka’s buffering absorbs sudden increases in log volume during peak banking hours and seasonal usage surges while keeping downstream systems stable.
Durability of logs – Message persistence keeps logs durable, so that no log data is lost during system maintenance or unexpected failures.

The logs then pass through a second Fluentd layer for final processing and routing to OpenSearch Service, where they’re indexed across service-specific indexes (app-index, falco-index, kong-index).
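The routing step at the end of this pipeline can be pictured as a lookup from a record's origin to one of the service-specific indexes named above. The following Python sketch is a simplified illustration: the `source` field and the `INDEX_BY_SOURCE` mapping are hypothetical, because the post does not say which record attribute the second Fluentd layer uses to choose an index.

```python
# Hypothetical mapping from a record's origin to the indexes named in this post.
INDEX_BY_SOURCE = {
    "application": "app-index",
    "falco": "falco-index",
    "kong": "kong-index",
}


def target_index(record: dict) -> str:
    """Pick the destination index; unknown sources fall back to app-index."""
    source = record.get("source", "application")
    return INDEX_BY_SOURCE.get(source, "app-index")


def enrich(record: dict, pod: str, namespace: str, service: str) -> dict:
    """Return a copy of the record with Kubernetes metadata attached."""
    enriched = dict(record)
    enriched["kubernetes"] = {"pod": pod, "namespace": namespace, "service": service}
    return enriched
```

Keeping one index family per log source lets retention and access rules differ between application, security (Falco), and API gateway (Kong) data.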

Distributed trace collection

To address the challenge of correlating issues across Zeta’s microservices architecture, the system implements distributed tracing with Jaeger, an open-source, end-to-end distributed tracing system. Jaeger enables monitoring and troubleshooting transactions in complex distributed systems by tracking requests as they flow through multiple services. The application services and Kong API Gateway are instrumented with Jaeger client libraries that generate trace data including spans, which represent individual operations within a trace. Each span contains metadata such as operation names, start and finish timestamps, tags, and logs that provide context about the operation being performed. The Jaeger Collector aggregates these spans from multiple services, performing validation, indexing, and transformation before forwarding the data.

The traces flow through Amazon MSK for the same reliability benefits as the logging pipeline – providing durability, decoupling, and backpressure handling during high-volume periods. Jaeger Ingester then consumes traces from Amazon MSK and processes them for storage in the jaeger-index within OpenSearch Service.
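As a mental model of the span data described in this section, the following Python sketch defines a minimal span record with the fields the text lists (operation name, start and finish timestamps, tags, and logs) and finds the slowest operation in a trace. The class and field names are simplified assumptions for illustration and do not reproduce Jaeger's actual data model or client API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """Simplified span; timestamps are microseconds since an arbitrary origin."""
    trace_id: str
    span_id: str
    operation_name: str
    start_us: int
    finish_us: int
    parent_span_id: Optional[str] = None
    tags: Dict[str, str] = field(default_factory=dict)
    logs: List[dict] = field(default_factory=list)

    @property
    def duration_us(self) -> int:
        return self.finish_us - self.start_us


def slowest_span(spans: List[Span]) -> Span:
    """Return the span with the longest duration in a trace."""
    return max(spans, key=lambda span: span.duration_us)


def children_of(spans: List[Span], parent_id: str) -> List[Span]:
    """Return the direct child spans of the given span ID."""
    return [span for span in spans if span.parent_span_id == parent_id]
```

Linking spans through a shared trace ID and parent span ID is what allows a request to be followed from the API gateway into each downstream service.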

This data collection and ingestion strategy provides complete end-to-end visibility and builds an observability system that enables SRE teams to monitor, troubleshoot, and optimize the services across the entire technology stack.

Storage tiering

To manage the log, metric, and trace data at scale—about 3TB generated daily—the solution implemented OpenSearch Service storage tiering to balance performance, retention, and cost. Zeta requires near real-time search and retrieval for at least a week, while retaining logs and traces for up to 10 years. Keeping this data in active clusters would impact search performance and significantly increase costs, so the solution uses the OpenSearch Service hot, UltraWarm, and cold storage tiers to optimize the data lifecycle. The following diagram illustrates storage tiering in OpenSearch Service.

Hot storage is used for the most recent and frequently accessed data, supporting real-time indexing and low-latency queries. This tier relies on high-performance storage attached to standard data nodes, making it ideal for powering live dashboards and analytics where speed is critical. The solution runs the OpenSearch Service domain on AWS Graviton2-powered m6g.4xlarge.search instance types, which provide up to 40% lower cost compared to x86-based instances. Each hot data node has an attached gp3 EBS volume to store indexes. Zeta maintains data in hot storage for 1 week.

UltraWarm storage serves as a cost-effective layer for older, read-only data that is queried less frequently but still needs to remain searchable. UltraWarm nodes use Amazon Simple Storage Service (Amazon S3) as the backing store with an integrated caching mechanism, to retain large volumes of data at a fraction of the cost of hot storage while still supporting interactive queries for historical analysis. Zeta uses ultrawarm1.large.search instance types in the UltraWarm storage tier and maintains data in UltraWarm storage for 15 days.

Cold storage is designed for long-term archival of infrequently accessed or compliance-driven data. Data in cold storage is detached from active compute resources and resides in Amazon S3, incurring minimal cost. When historical data needs to be queried, the indexes are attached to the UltraWarm nodes using OpenSearch API calls. This makes it possible to extract historical data for audits, periodic research, or forensic investigations without maintaining active compute for the entire retention period, thereby reducing cost.
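A back-of-the-envelope calculation shows why tiering matters at this volume. The following Python sketch estimates the steady-state footprint of each tier from the figures in this post (3 TB per day, 7 days hot, 15 days UltraWarm, up to 10 years total). It rests on loud assumptions: indexed size is treated as equal to raw size, hot indexes carry two replicas, UltraWarm and cold data carry no replicas because they are backed by Amazon S3, and retention is taken as a full 3,650 days. Real numbers will differ with compression and index overhead.

```python
def tier_footprint_tb(daily_tb: int, hot_days: int, warm_days: int,
                      retention_days: int, replicas: int) -> dict:
    """Estimate steady-state storage per tier in TB under the stated assumptions."""
    # Hot tier holds the primary copy plus every replica.
    hot = daily_tb * hot_days * (1 + replicas)
    # UltraWarm and cold are S3-backed, so only one copy is counted.
    warm = daily_tb * warm_days
    cold_days = max(retention_days - hot_days - warm_days, 0)
    cold = daily_tb * cold_days
    return {"hot": hot, "ultrawarm": warm, "cold": cold}


estimate = tier_footprint_tb(daily_tb=3, hot_days=7, warm_days=15,
                             retention_days=3650, replicas=2)
```

Under these assumptions the hot tier holds 63 TB, UltraWarm holds 45 TB, and cold storage grows toward roughly 10,884 TB over the full retention period, which is why almost all of the data has to live outside the active cluster.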

OpenSearch Service automates index transitions between hot, UltraWarm, and cold storage tiers using Index State Management (ISM) policies. ISM policies specify the conditions and actions for each state, such as transitioning based on index age, size, or document count. When an index qualifies for a transition, ISM jobs—running every 5 to 8 minutes—evaluate the policy and move the index to the next tier. When indexes reach the UltraWarm threshold, they are migrated to UltraWarm nodes backed by Amazon S3, which reduces storage costs while keeping data accessible for queries. After the UltraWarm retention period, ISM archives the indexes to cold storage, detaching them from compute resources but allowing reattachment for future queries or compliance needs. This automated lifecycle management reduces operational overhead, optimizes storage costs, and maintains performance for both recent and historical data.

For observability data, new indexes are created in the hot tier, where they remain for 7 days to support fast ingestion and low-latency queries. After this period, ISM transitions these indexes to UltraWarm storage, where they are retained for an additional 15 days as read-only data, balancing cost with searchability.
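The retention schedule above maps naturally onto an ISM policy. The following Python sketch builds a policy body with hot, warm, and cold states using the `warm_migration` and `cold_migration` actions that OpenSearch Service provides for tier transitions. It is a sketch rather than Zeta's actual policy: the index pattern, the `@timestamp` field name, and the policy description are assumptions, and the exact options should be checked against the ISM documentation before use. Because `min_index_age` is measured from index creation, the cold transition is set to 22 days (7 hot plus 15 UltraWarm).

```python
import json


def build_tiering_policy(index_patterns, hot_days=7, warm_days=15,
                         timestamp_field="@timestamp"):
    """Build an ISM policy body that moves indexes hot -> warm -> cold by age."""
    # Index age counts from creation, so add the two retention periods.
    cold_after_days = hot_days + warm_days
    return {
        "policy": {
            "description": "Hot to UltraWarm to cold storage by index age",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [{
                        "state_name": "warm",
                        "conditions": {"min_index_age": f"{hot_days}d"},
                    }],
                },
                {
                    "name": "warm",
                    "actions": [{"warm_migration": {}}],
                    "transitions": [{
                        "state_name": "cold",
                        "conditions": {"min_index_age": f"{cold_after_days}d"},
                    }],
                },
                {
                    "name": "cold",
                    "actions": [{"cold_migration": {"timestamp_field": timestamp_field}}],
                    "transitions": [],
                },
            ],
            # Attach the policy automatically to new indexes matching the pattern.
            "ism_template": {"index_patterns": list(index_patterns), "priority": 100},
        }
    }


policy_json = json.dumps(build_tiering_policy(["app-index*"]), indent=2)
```

The resulting JSON is the kind of body that would be sent to the ISM policy API (`PUT _plugins/_ism/policies/<policy-id>`), with the policy ID chosen by the operator.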

Security

Security is the most critical part of the architecture. Zeta’s observability system implements multiple layers of protection for data confidentiality, integrity, and compliance with banking regulations, and is built using a zero-trust approach following the AWS shared responsibility model for OpenSearch Service:

Infrastructure security: The OpenSearch Service domain is deployed within a virtual private cloud (VPC) with private subnets, isolating it from direct internet access. Security groups enforce restrictive ingress rules, allowing access only from authorized sources. The OpenSearch Service domain uses encryption at rest through AWS Key Management Service (KMS). Data in transit is secured using TLS 1.3 encryption, so that log data, traces, and search queries remain protected during transmission. Service-to-service communication uses AWS Identity and Access Management (IAM) roles and encrypted connections, alleviating the need for hardcoded credentials.
Access control and authentication: The solution uses Amazon OpenSearch Service fine-grained access control (FGAC) integrated with IAM, where IAM serves as the authentication provider and FGAC handles authorization by mapping IAM roles to OpenSearch backend roles. This approach helps Zeta control access permissions at the index and document level based on tenant requirements and user responsibilities. The data ingestion pipeline implements end-to-end security with Fluentd authenticating to Amazon MSK using IAM roles over encrypted connections. Amazon MSK clusters use encryption in transit and at rest, protecting log data throughout the streaming pipeline. Kubernetes RBAC policies restrict pod-to-pod communication and limit service account permissions.
Data privacy and tenant isolation: Each tenant’s data is kept logically separate in OpenSearch Service using a tenant ID. CSN implements tenant-aware authentication and authorization with FGAC, restricting users to their authorized tenants’ dashboards and data. Every API endpoint validates tenant context, so that users can only access data within their authorized scope. Importantly, no customer data is captured in the logs; only system metrics are used to build the monitoring system, adhering to banking security standards and best practices. User actions are audited and logged for compliance purposes, with audit trails maintained according to regulatory requirements.
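Document-level tenant isolation with FGAC can be sketched as a security role whose document-level security (DLS) query filters on the tenant identifier, plus a role mapping that binds an IAM role to it. The following Python example builds both request bodies. The field name `tenant_id`, the index patterns, and the IAM role ARN are hypothetical values chosen for illustration; the post does not show Zeta's actual role definitions.

```python
import json


def tenant_role(tenant_id, index_patterns, tenant_field="tenant_id"):
    """Build a read-only role limited to one tenant's documents via DLS."""
    dls_query = {"term": {tenant_field: tenant_id}}
    return {
        "cluster_permissions": [],
        "index_permissions": [{
            "index_patterns": list(index_patterns),
            # The security plugin expects the DLS query as a JSON string.
            "dls": json.dumps(dls_query),
            "allowed_actions": ["read"],
        }],
    }


def role_mapping(iam_role_arn):
    """Map an IAM role (seen as a backend role) to the security role."""
    return {"backend_roles": [iam_role_arn]}


role_body = tenant_role("tenant-a", ["app-index*", "kong-index*"])
mapping_body = role_mapping("arn:aws:iam::123456789012:role/csn-tenant-a-reader")
```

These bodies correspond to the security plugin's roles and role-mapping APIs (`_plugins/_security/api/roles/<role>` and `_plugins/_security/api/rolesmapping/<role>`). With a role like this, a query against a shared index returns only documents whose tenant field matches.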

This security framework enables the observability system to meet the security requirements of core banking operations while maintaining operational efficiency and regulatory compliance across global industries.

Customer Service Navigator

CSN gives SREs a diagnostics interface built for efficient monitoring, deep analysis, and rapid troubleshooting of system performance across distributed environments. The system ingests and processes telemetry data at sub-minute intervals, providing near-real-time metrics, traces, and logs from critical infrastructure components. Actionable, interactive visualizations, such as heatmaps, anomaly graphs, and dependency maps, help SREs quickly detect SLO breaches and drill down to granular root causes, often within a few minutes of an incident.

The following screenshot shows an example service health dashboard in CSN for an Olympus tenant.

The following screenshot shows an example of the API performance insights dashboard in CSN.

Business and technical benefits

The OpenSearch Service-based CSN system provides the following business and technical benefits:

Manual effort is reduced through automated Index State Management (ISM) and lifecycle policies, so that Zeta’s teams can focus on innovation
Automated lifecycle policies facilitate seamless retention and archiving of compliance data, reducing the risk of non-compliance
The system supports log retention for over 10 years to meet regulatory requirements for Zeta’s banking and financial services customers
Multiple layers of security, including encryption at rest and in transit, FGAC, and tenant isolation, protect customer data and support Zeta’s zero-trust architecture
By consolidating logs, traces, and metrics from disparate systems into OpenSearch, SRE teams can correlate events more effectively, thereby reducing troubleshooting efforts and achieving an 80% improvement in MTTR
Archived logs are stored in Amazon S3, which is designed for 99.999999999% (11 nines) data durability, providing long-term data integrity
Zstandard compression is being implemented to optimize long-term storage costs
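The MTTR figure in the list above can be checked with simple arithmetic. The following Python sketch computes the percentage reduction using 30 minutes and 5 minutes as boundary values; because the post reports "30+ minutes" before and "under 5 minutes" after, the true reduction is at least this large. The function name is a hypothetical helper.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage decrease from before to after."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


mttr_reduction = percent_reduction(before=30, after=5)
```

With these boundary values the reduction works out to about 83.3%, which is consistent with the "over 80%" claim.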

Conclusion

CSN’s advanced correlation engine automatically associates related events across microservices, databases, network layers, and infrastructure, significantly streamlining root cause analysis. Integrated alerting and automated runbooks further reduce response times. Since implementing CSN, Zeta has achieved over an 80% reduction in MTTR, with incident response times decreasing from 30+ minutes to under 5 minutes. The service supports seamless multi-tenant monitoring, processes 3TB of machine-generated data daily, and is architected for petabyte-scale growth. Additionally, CSN helps Zeta meet regulatory requirements for retaining historical logs over several years while keeping storage costs under control. This has substantially improved operational resilience, increased service availability, and empowered teams to proactively resolve issues before they affect end users.

Ready to take your organization’s observability capabilities to the next level? Dive into the technical details of OpenSearch Service in the Amazon OpenSearch Developer Guide. Visit our new migration hub page for more prescriptive guidance on moving your workloads to OpenSearch Service.

About the authors

Deepesh Dhapola is a Senior Solutions Architect at AWS India, where he architects high-performance, resilient cloud solutions for financial services and fintech organizations. He specializes in using advanced AI technologies—including generative AI, intelligent agents, and the Model Context Protocol (MCP)—to design secure, scalable, and context-aware applications. With deep expertise in machine learning and a keen focus on emerging trends, Deepesh drives digital transformation by integrating cutting-edge AI capabilities to enhance operational efficiency and foster innovation for AWS customers. Beyond his technical pursuits, he enjoys quality time with his family and explores creative culinary techniques.

Shashidhar (Shashi) Soppin is an accomplished Enterprise Architect and cloud transformation leader with more than 24 years of experience spanning regulated industries and high-growth technology environments. Currently steering strategic initiatives as Lead Architect at Zeta’s CTO office, Shashidhar has helped build and lead world-class engineering teams, driving innovation in cloud, security, and fintech domains. He has architected secure, scalable platforms, scaling user bases by 10x, enabling complex integrations for a leading bank’s migration to Zeta’s platforms, and pioneering Zero Trust frameworks that achieved outstanding regulatory compliance. A results-driven executive and former DMTS at Wipro, Shashidhar holds 25+ granted patents and has delivered multi-million dollar enterprise deals across domains including AI/ML. Renowned as a published author (“Essentials of Deep Learning”), frequent industry speaker, and hands-on innovator, he combines technical expertise with business acumen, propelling organizations toward robust, future-ready cloud ecosystems and operational excellence. Prior to Wipro, he also worked at IBM-ISL.

Anchal Kansal is a Lead Site Reliability Engineer at Zeta, where she has spent the past four years building and scaling reliable, high-performance systems. With deep expertise in OpenSearch, observability platforms, and large-scale infrastructure, she focuses on ensuring uptime, performance, and operational efficiency. Anchal is passionate about solving complex reliability challenges and sharing practical insights with the engineering community.

Manochandra (Mano) is the Site Reliability Engineering (SRE) expert at Zeta, specializing in data management-oriented systems. With a deep understanding of large-scale distributed architectures, he has extensive experience designing, deploying, and maintaining resilient, production-grade OpenSearch systems. Mano is known for his proactive approach in optimizing infrastructure reliability and performance, as well as his ability to troubleshoot complex operational challenges. His expertise spans implementing automation, monitoring, and incident management best practices, making him a go-to resource for ensuring service availability and scalability at Zeta.

Hitesh Subnani is an FSI Solutions Architect at AWS India, where he works with customers to design and build architectures that deliver business value. He specializes in comprehensive observability and analytics systems, enabling organizations to gain deep insights from operational data. With expertise in search and analytics technologies, Hitesh focuses on scalable monitoring systems, real-time dashboards, and compliance-driven architectures for AWS customers in the financial sector.

Tarun Chakraborty is a Sr. Technical Account Manager (TAM) at AWS India, where he partners with leading banks and fintech organizations to accelerate their cloud transformation journeys. With over 15 years of experience in technology and financial services, he serves as a trusted advisor helping customers leverage AWS’s comprehensive suite of services to drive innovation and achieve their business objectives.

Build enterprise-scale log ingestion pipelines with Amazon OpenSearch Service

2025-08-21 Akhil B

Post Syndicated from Akhil B original https://aws.amazon.com/blogs/big-data/build-enterprise-scale-log-ingestion-pipelines-with-amazon-opensearch-service/

Organizations of all sizes generate massive volumes of logs across their applications, infrastructure, and security systems to gain operational insights, troubleshoot issues, and maintain regulatory compliance. However, implementing log analytics solutions presents significant challenges, including complex data ingestion pipelines and the need to balance cost and performance while scaling to handle petabytes of data.

Amazon OpenSearch Service addresses these challenges by providing high-performance search and analytics capabilities, making it straightforward to deploy and manage OpenSearch clusters in the AWS Cloud without the infrastructure management overhead. A well-designed log analytics solution can help support proactive management in a variety of use cases, including debugging production issues, monitoring application performance, or meeting compliance requirements.

In this post, we share field-tested patterns for log ingestion that have helped organizations successfully implement logging at scale, while maintaining optimal performance and managing costs effectively.

Solution overview

Organizations can choose from several data ingestion architectures, such as:

Log shippers like Fluent Bit or Fluentd to send logs directly
Amazon Data Firehose for serverless data delivery
Custom solutions using AWS Lambda
Amazon OpenSearch Ingestion for managed data ingestion

Irrespective of the chosen pattern, a scalable log ingestion architecture should comprise the following logical layers:

Collect layer – This is the initial stage where logs are gathered from various sources, including application logs, system logs, and more.
Buffer layer – This layer acts as a temporary storage layer to handle spikes in log volume and prevents data loss during downstream processing issues. This layer also maintains system stability during high load.
Process layer – This layer transforms the unstructured logs into structured formats while adding relevant metadata and contextual information needed for effective analysis.
Store layer – This layer is the final destination for processed logs (OpenSearch in this case), which supports various access patterns, including querying, historical analysis, and data visualization.
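The four layers above can be sketched as a tiny in-process pipeline. The following Python example is a toy model for illustration only: the bounded `Buffer` class stands in for a durable queue such as Amazon SQS or Kafka, the `sink` list stands in for an OpenSearch index, and the `LEVEL message` log format is an assumption.

```python
from collections import deque


def collect(raw_lines):
    """Collect layer: yield non-empty log lines from any iterable source."""
    for line in raw_lines:
        line = line.strip()
        if line:
            yield line


class Buffer:
    """Buffer layer: bounded in-memory queue standing in for a durable queue."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = deque()

    def offer(self, item):
        """Accept the item unless the buffer is full; the caller handles retries."""
        if len(self._items) >= self.capacity:
            return False
        self._items.append(item)
        return True

    def drain(self, max_items):
        """Remove and return up to max_items items in arrival order."""
        count = min(max_items, len(self._items))
        return [self._items.popleft() for _ in range(count)]


def process(line):
    """Process layer: turn 'LEVEL message' into a structured document."""
    level, _, message = line.partition(" ")
    return {"level": level, "message": message}


def store(documents, sink):
    """Store layer: append to a list that stands in for an OpenSearch index."""
    sink.extend(documents)
    return len(documents)
```

When `offer` returns `False`, the collector has to retry or slow down; that signal is what keeps a burst of logs from overwhelming the process and store layers.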

OpenSearch Ingestion offers a purpose-built, fully managed experience that simplifies the data ingestion process. In this post, we focus on using OpenSearch Ingestion to load logs from Amazon Simple Storage Service (Amazon S3) into an OpenSearch Service domain, a common and efficient pattern for log analytics.

OpenSearch Ingestion is a fully managed, serverless data ingestion service that streamlines the process of loading data into OpenSearch Service domains or Amazon OpenSearch Serverless collections. It’s powered by Data Prepper, an open source data collector that filters, enriches, transforms, normalizes, and aggregates data for downstream analysis and visualization.

OpenSearch Ingestion is organized around pipelines, each of which consists of the following major components:

Source – The input component of a pipeline. It defines the mechanism through which a pipeline consumes records.
Buffer – A persistent, disk-based buffer that stores data across multiple Availability Zones to enhance durability. OpenSearch Ingestion dynamically allocates OpenSearch Compute Units (OCUs) for buffering, which can increase cost because you may need additional OCUs to maintain ingestion throughput.
Processors – The intermediate processing units that can filter, transform, and enrich records into a desired format before publishing them to the sink. The processor is an optional component of a pipeline.
Sink – The output component of a pipeline. It defines one or more destinations to which a pipeline publishes records. A sink can also be another pipeline, so you can chain multiple pipelines together.
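To show how the source, processor, and sink components fit together, the following Python sketch assembles a pipeline definition as a dictionary and serializes it to JSON (which is also valid YAML, the format pipeline configurations are written in). The option names reflect our understanding of the Data Prepper `s3` source, `grok` processor, and `opensearch` sink and should be verified against the current documentation. The pipeline name, queue URL, endpoint, role ARN, and index name are placeholders, and this is not the configuration shipped in the sample repository.

```python
import json


def build_s3_log_pipeline(queue_url, domain_endpoint, role_arn, region,
                          index="apache_logs"):
    """Assemble a pipeline definition: S3 (via SQS) -> grok -> OpenSearch."""
    aws = {"region": region, "sts_role_arn": role_arn}
    return {
        "version": "2",
        "apache-log-pipeline": {
            # Source: read objects announced through SQS notifications.
            "source": {
                "s3": {
                    "notification_type": "sqs",
                    "codec": {"newline": None},
                    "sqs": {"queue_url": queue_url},
                    "compression": "none",
                    "aws": aws,
                }
            },
            # Processor: parse each line as an Apache common log entry.
            "processor": [
                {"grok": {"match": {"message": ["%{COMMONAPACHELOG}"]}}}
            ],
            # Sink: write parsed documents to the OpenSearch Service domain.
            "sink": [
                {"opensearch": {"hosts": [domain_endpoint], "index": index, "aws": aws}}
            ],
        },
    }


pipeline = build_s3_log_pipeline(
    queue_url="https://sqs.us-east-1.amazonaws.com/123456789012/example-log-queue",
    domain_endpoint="https://search-example.us-east-1.es.amazonaws.com",
    role_arn="arn:aws:iam::123456789012:role/example-pipeline-role",
    region="us-east-1",
)
pipeline_text = json.dumps(pipeline, indent=2)
```

Generating the definition from code makes it easier to keep the queue URL, role, and endpoint consistent with the rest of an infrastructure-as-code deployment.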

Because of its serverless nature, OpenSearch Ingestion automatically scales to accommodate varying workload demands, alleviating the need for manual infrastructure management while providing built-in monitoring capabilities. Users can focus on their data processing logic rather than spending time on operational complexities, making it an efficient solution for managing data pipelines in OpenSearch environments.

The following diagram illustrates the architecture of the log ingestion pipeline.

Let’s walk through how this solution processes Apache logs from ingestion to visualization:

The source application generates Apache logs that need to be analyzed and stores them in an S3 bucket, which acts as the central storage location for incoming log data. When a new log file is uploaded to the S3 bucket (an ObjectCreated event), Amazon S3 automatically triggers an event notification that is configured to send messages to a designated Amazon Simple Queue Service (Amazon SQS) queue.
The SQS queue reliably manages and tracks the notifications of new files uploaded to Amazon S3, making sure the file event is delivered to the OpenSearch Ingestion pipeline. A dead-letter queue (DLQ) is configured to capture failed event processing.
The OpenSearch Ingestion pipeline monitors the SQS queue, retrieving messages that contain information about newly uploaded log files. When a message is received, the pipeline reads the corresponding log file from Amazon S3 for processing.
After the log file is retrieved, the OpenSearch Ingestion pipeline parses the content, and uses the OpenSearch Bulk API to efficiently ingest the processed log data into the OpenSearch Service domain, where it becomes available for search and analysis.
The ingested data can be visualized and analyzed through OpenSearch Dashboards, which provides a user-friendly interface for creating custom visualizations, dashboards, and performing real-time analysis of the log data with features like search, filtering, and aggregations.
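The pipeline performs these steps for you without code, but a small sketch can make the data flow concrete. The following Python is illustrative only and is not the pipeline's implementation: it assumes the standard S3 event notification JSON layout in the SQS message body, a simplified Apache access log pattern, and a hypothetical index name (apache-logs). It extracts the object location from a notification, parses one log line, and builds a newline-delimited Bulk API payload.

```python
import json
import re
from typing import Optional
from urllib.parse import unquote_plus

# Simplified Apache access log prefix (an assumption, not the pipeline's grok pattern):
# host ident user [time] "verb request protocol" status bytes
APACHE_LINE = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<protocol>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def objects_from_sqs_body(body: str) -> list:
    """Return (bucket, key) pairs for ObjectCreated records in an S3 notification."""
    pairs = []
    for record in json.loads(body).get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated"):
            s3 = record["s3"]
            # Object keys arrive URL-encoded in S3 notifications.
            pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def parse_apache_line(line: str) -> Optional[dict]:
    """Turn one access log line into a document, or None if it does not match."""
    match = APACHE_LINE.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["response"] = int(doc["response"])
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc


def to_bulk_body(index: str, docs: list) -> str:
    """Build the newline-delimited JSON body that the Bulk API expects."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Example with made-up values.
body = json.dumps({"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "example-bucket"},
           "object": {"key": "logs/apache+access.gz"}},
}]})
line = '203.0.113.7 - - [21/Aug/2025:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5123'
print(objects_from_sqs_body(body))
print(to_bulk_body("apache-logs", [parse_apache_line(line)]))
```

In the deployed solution, the same work is declared in the pipeline configuration rather than written by hand.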

In the following sections, we guide you through the steps to ingest application log files from Amazon S3 into OpenSearch Service using OpenSearch Ingestion. Additionally, we demonstrate how to visualize the ingested data using OpenSearch Dashboards.

Prerequisites

This post assumes you have the following:

An AWS account
The AWS Command Line Interface (AWS CLI) installed
The AWS CDK Toolkit installed
Python 3 installed

Deploy the solution

The solution uses a Python AWS Cloud Development Kit (AWS CDK) project to deploy an OpenSearch Service domain and associated components. This project demonstrates event-based data ingestion into the OpenSearch Service domain using a no-code approach with OpenSearch Ingestion pipelines.

The deployment is automated using the AWS CDK and comprises the following steps:

Clone the GitHub repo.

git clone git@github.com:aws-samples/sample-log-ingestion-pipeline-for-amazon-opensearch-service.git

Create a virtual environment and install the Python dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Update the following environment variables in cdk.json:
1. domain_name: The OpenSearch domain to be created in your AWS account.
2. user_name: The user name for the internal primary user to be created within the OpenSearch domain.
3. user_password: The password for the internal primary user.
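Before deploying, it can help to sanity-check these three values. The following sketch is an optional pre-flight check, not part of the sample project. It assumes the values live under the context key of cdk.json, and the naming and password rules it encodes reflect our reading of the OpenSearch Service requirements, so treat them as assumptions and confirm them against the service documentation.

```python
import json
import re

# Assumed domain name rule: 3-28 characters, lowercase letters, digits,
# and hyphens, starting with a lowercase letter.
DOMAIN_NAME = re.compile(r"^[a-z][a-z0-9-]{2,27}$")


def check_settings(cdk_json_text: str) -> list:
    """Return a list of problems found in the assumed cdk.json context block."""
    context = json.loads(cdk_json_text).get("context", {})
    problems = []
    for key in ("domain_name", "user_name", "user_password"):
        if not context.get(key):
            problems.append(key + " is missing or empty")
    name = context.get("domain_name", "")
    if name and not DOMAIN_NAME.match(name):
        problems.append("domain_name does not match the assumed naming rule")
    password = context.get("user_password", "")
    # Assumed password rule: 8+ characters with lowercase, uppercase,
    # digit, and special character.
    strong = (
        len(password) >= 8
        and re.search(r"[a-z]", password)
        and re.search(r"[A-Z]", password)
        and re.search(r"\d", password)
        and re.search(r"[^A-Za-z0-9]", password)
    )
    if password and not strong:
        problems.append("user_password does not match the assumed complexity rule")
    return problems


# Example with made-up values.
sample = json.dumps({"context": {
    "domain_name": "Apache_Logs",
    "user_name": "admin",
    "user_password": "weak",
}})
print(check_settings(sample))
```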

This deployment creates a public-facing OpenSearch domain that is secured through fine-grained access control (FGAC). For production workloads, consider deploying within a virtual private cloud (VPC) with additional security measures. For more information, see Security in Amazon OpenSearch Service.

Bootstrap the AWS CDK stack and initiate the deployment. Provide your AWS account number and the AWS Region where you want to deploy the solution:

cdk bootstrap <Account ID>/<region>
cdk deploy --all

The process takes about 30–45 minutes to complete.

Verify the solution resources

When the previous steps are complete, you can check for the created resources.

You can confirm the existence of the stacks on the AWS CloudFormation console. As shown in the following screenshot, the CloudFormation stacks have been created and deployed by cdk bootstrap and cdk deploy.

On the OpenSearch Service console, under Managed clusters in the navigation pane, choose Domains. You can confirm that the domain was created.

On the OpenSearch Service console, under Ingestion in the navigation pane, choose Pipelines. You can see the pipeline apache-log-pipeline created.

Configure security options

To configure your security roles, complete the following steps:

On the AWS CloudFormation console, open the stack CdkIngestionStack, and on the Outputs tab, copy the Amazon Resource Name (ARN) of osi-pipeline-role.

Open the OpenSearch Service console in the deployed Region within your AWS account and choose the domain you created.
Choose the link for OpenSearch Dashboards URL.
In the login prompt, enter the user credentials that were provided in cdk.json.

After a successful login, the OpenSearch Dashboards console will be displayed.

If you’re prompted to select a tenant, select the Global tenant.
In the Security options, navigate to the Roles section and choose the all_access role.
On the all_access role page, navigate to mapped_users and choose Manage.
Choose Add another backend role under Backend roles and enter the IAM role ARN you copied.
Confirm by choosing Map.
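The same mapping can also be expressed as a request to the Security plugin's role mapping API (PUT _plugins/_security/api/rolesmapping/all_access), which you could submit from Dev Tools. Because a PUT replaces the whole mapping, the following sketch first merges the new backend role into the existing mapping. The ARN and the existing mapping shown here are made-up placeholders, and the helper name is hypothetical.

```python
import json


def add_backend_role(mapping: dict, role_arn: str) -> dict:
    """Return a copy of a role mapping with role_arn added to backend_roles."""
    updated = {
        "backend_roles": list(mapping.get("backend_roles", [])),
        "hosts": list(mapping.get("hosts", [])),
        "users": list(mapping.get("users", [])),
    }
    if role_arn not in updated["backend_roles"]:
        updated["backend_roles"].append(role_arn)
    return updated


# Placeholder values: substitute the mapping you read from the domain
# and the ARN you copied from the stack outputs.
existing = {"backend_roles": [], "hosts": [], "users": ["admin"]}
pipeline_role = "arn:aws:iam::123456789012:role/osi-pipeline-role"

print("PUT _plugins/_security/api/rolesmapping/all_access")
print(json.dumps(add_backend_role(existing, pipeline_role), indent=2))
```

Keeping the existing users and backend roles in the request body avoids accidentally removing the primary user's access.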

Create an index template

The next step is to create an index template. Complete the following steps:

On the Dev Tools console, copy the contents of the file index_template.txt within the opensearch_object directory.
Enter the code in the Dev Tools console.

This index template defines the mapping and settings for our OpenSearch index.

Choose the play icon to submit the request and create a template.

In the Dashboard Management section, choose Saved Objects and choose Import.
Choose Import and choose the apache_access_log_dashboard.ndjson file within the opensearch_object directory.
Choose Check for existing objects.
Choose Automatically overwrite conflicts and choose Import.

Ingest data

Now you can proceed with the data ingestion.

On the Amazon S3 console, open the S3 bucket opensearch-logging-blog-<Account ID>.
Upload the data file apache_access_log.gz (within the apache_log_data directory). The file can be uploaded in any prefix.

For this solution, we use Apache access logs as our example data source. Although this pipeline is configured for Apache log format, it can be modified to support other log types by adjusting the pipeline configuration. See Overview of Amazon OpenSearch Ingestion for details about configuring different log formats.

After a few minutes, navigate to the Discover tab in OpenSearch Dashboards, where you can find that the data is ingested.
Confirm that the apache* index pattern is selected.

On the Dashboards tab, choose Apache Log Dashboard.

The dashboard will be populated by the data and visuals should be displayed.

Operational best practices

When designing your log analytics platform on OpenSearch Service, make sure you follow the recommended operational best practices for cluster configuration, data management, performance, monitoring, and cost optimization. For detailed guidance, refer to Operational best practices for Amazon OpenSearch Service.

Clean up

To avoid ongoing charges for the resources that you created, delete them by completing the following steps:

On the Amazon S3 console, open the bucket opensearch-logging-blog-<Account ID> and choose Empty.
Follow the prompts to delete the contents of the bucket.
Delete the AWS CDK stacks using the following command:

cdk destroy --all --force

Conclusion

As organizations continue to generate increasing volumes of log data, having a well-architected logging solution becomes crucial for maintaining operational visibility and meeting compliance requirements.

Implementing a robust logging infrastructure requires careful planning. In this post, we explored a field-tested approach to building a scalable, efficient, and cost-effective logging solution using OpenSearch Ingestion.

This solution serves as a starting point that can be customized based on specific organizational needs while maintaining the core principles of scalability, reliability, and cost-effectiveness.

Remember that logging infrastructure is not a “set-and-forget” system. Regular monitoring, periodic reviews of storage patterns, and adjustments to index management policies will help make sure your logging solution continues to serve your organization’s evolving needs effectively.

To dive deeper into OpenSearch Ingestion implementation, explore our comprehensive Amazon OpenSearch Service Workshops, which include hands-on labs and reference architectures. For additional insights, see Build a serverless log analytics pipeline using Amazon OpenSearch Ingestion with managed Amazon OpenSearch Service. You can also visit our Migration Hub if you’re ready to migrate legacy or self-managed workloads to OpenSearch Service.

About the authors

Akhil B is a Data Analytics Consultant at AWS Professional Services, specializing in cloud-based data solutions. He partners with customers to design and implement scalable data analytics platforms, helping organizations transform their traditional data infrastructure into modern, cloud-based solutions on AWS. His expertise helps organizations optimize their data ecosystems and maximize business value through modern analytics capabilities.

Ramya Bhat is a Data Analytics Consultant at AWS, specializing in the design and implementation of cloud-based data platforms. She builds enterprise-grade solutions across search, data warehousing, and ETL that enable organizations to modernize data ecosystems and derive insights through scalable analytics. She has delivered customer engagements across healthcare, insurance, fintech, and media sectors.

Chanpreet Singh is a Senior Consultant at AWS, specializing in the Data and AI/ML space. He has over 18 years of industry experience and is passionate about helping customers design, prototype, and scale Big Data and Generative AI applications using AWS native and open-source tech stacks. In his spare time, Chanpreet loves to explore nature, read, and spend time with his family.

Cluster manager communication simplified with Remote Publication

2025-08-14 Himshikha Gupta

Post Syndicated from Himshikha Gupta original https://aws.amazon.com/blogs/big-data/cluster-manager-communication-simplified-with-remote-publication/

Amazon OpenSearch Service has taken a significant leap forward in scalability and performance with the introduction of support for 1,000-node OpenSearch Service domains capable of handling 500,000 shards with OpenSearch Service version 2.17. This breakthrough is made possible by multiple features, including Remote Publication, which introduces an innovative cluster state publication mechanism that enhances scalability, availability, and durability. It uses the remote cluster state feature as the base. This feature provides durability and makes sure metadata is not lost even when the majority of the cluster manager nodes fail permanently. By using a remote store for cluster state publication, OpenSearch Service can now support clusters with a higher number of nodes and shards.

The cluster state is an internal data structure that contains cluster information. The elected cluster manager node manages this state. It’s distributed to follower nodes through the transport layer and stored locally on each node. A follower node can be a data node, a coordinator node or a non-elected cluster manager node. However, as the cluster grows, publishing the cluster state over the transport layer becomes challenging. The increasing size of the cluster state consumes more network bandwidth and blocks transport threads during publication. This can impact scalability and availability. This post explains cluster state publication, Remote Publication, and their benefits in improving durability, scalability, and availability.

How did cluster state publication work before Remote Publication?

The elected cluster manager node is responsible for maintaining and distributing the latest OpenSearch cluster state to all the follower nodes. The cluster state updates when you create indexes and update mappings, or when internal actions like shard relocations occur. Distribution of the updates follows a two-phase process: publish and commit. In the publish phase, the cluster manager sends the updated state to the follower nodes and saves a copy locally. After a majority (more than half) of the eligible cluster manager nodes acknowledge this update, the commit phase begins, where the follower nodes are instructed to apply the new state.

To optimize performance, the elected cluster manager sends only the changes since the last update, referred to as the diff state, reducing data transfer. However, if a follower node is out of sync or new to the cluster, it might reject the diff state. In such cases, the cluster manager sends the full cluster state to those follower nodes.

The following diagram depicts the cluster state publication flow.

Sequence of steps between the cluster manager node and a follower node demonstrating the cluster state publication over transport layer

The workflow consists of the following steps:

The user invokes an admin API such as create index.
The elected cluster manager node computes the cluster state for the admin API request.
The elected cluster manager node sends the cluster state publish request to follower nodes.
The follower nodes respond with an acknowledgement to the publish request.
The elected cluster manager node persists the cluster state to the disk.
The elected cluster manager node sends the commit request to follower nodes.
The follower nodes respond with an acknowledgement to the commit request.
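The publish and commit phases, the majority rule, and the fallback from diff state to full state can be modeled in a few lines. The following Python is a toy simulation written for this explanation, not OpenSearch code: the class and function names are hypothetical, the cluster state is a plain dictionary, and a diff contains only upserts.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    name: str
    eligible: bool          # cluster manager-eligible nodes count toward quorum
    reachable: bool = True
    version: int = 0
    state: dict = field(default_factory=dict)
    pending: tuple = ()

    def accept_diff(self, base_version: int, new_version: int, diff: dict) -> bool:
        if self.version != base_version:
            return False    # out of sync or new to the cluster: reject the diff
        self.pending = (new_version, {**self.state, **diff})
        return True

    def accept_full(self, new_version: int, full_state: dict) -> None:
        self.pending = (new_version, dict(full_state))

    def commit(self) -> None:
        self.version, self.state = self.pending


def publish(followers, base_version, new_version, diff, full_state):
    """Run the publish phase, then commit if a majority acknowledged."""
    sent, acked = {}, []
    for node in followers:
        if not node.reachable:
            continue
        if node.accept_diff(base_version, new_version, diff):
            sent[node.name] = "diff"
        else:
            node.accept_full(new_version, full_state)
            sent[node.name] = "full"
        acked.append(node)
    # The elected cluster manager is eligible too and counts its own vote.
    voters = 1 + sum(node.eligible for node in followers)
    votes = 1 + sum(node.eligible for node in acked)
    committed = votes > voters / 2
    if committed:
        for node in acked:
            node.commit()
    return committed, sent


nodes = [
    Follower("cm-2", eligible=True, version=10, state={"index1": "v1"}),
    Follower("cm-3", eligible=True, reachable=False, version=10),
    Follower("data-1", eligible=False, version=7),  # lagging data node
]
full = {"index1": "v1", "index2": "v1"}
print(publish(nodes, 10, 11, {"index2": "v1"}, full))
```

In this run, two of the three eligible nodes (the elected cluster manager and cm-2) hold the new state, so the update commits even though cm-3 is unreachable, and the lagging data node receives the full state.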

We’ve observed stable cluster operations with this publication flow up to 200 nodes or 75,000 shards. However, as the cluster state grows in size with more indexes, shards, and nodes, it starts consuming high network bandwidth and blocking transport threads for a longer duration during publication. Additionally, it becomes CPU and memory intensive for the elected cluster manager to transmit to the follower nodes, often impacting publication latency. The increased latency can lead to a high pending task count on the elected cluster manager. This can cause request timeouts, or in severe cases, cluster manager failure, creating a cluster outage.

Using a remote store for cluster state publication improved availability and scalability

With Remote Publication, cluster state updates are transmitted through an Amazon Simple Storage Service (Amazon S3) bucket as the remote store, rather than transmitting the state over the transport layer. When the elected cluster manager updates the cluster state, it uploads the new state to Amazon S3 in addition to persisting it on disk. The cluster manager uploads a manifest file, which keeps track of the entities and which entities changed from their previous state. Similarly, follower nodes download the manifest from Amazon S3 and decide whether they need the full state or only the changed entities. This has two benefits: reduced cluster manager resource usage and faster transport thread availability.

Creating new domains or upgrading from existing OpenSearch Service versions to 2.17 or above, or applying the service patch to an existing 2.17 or above domain, enables Remote Publication by default. This provides seamless migration with the remote state. This is enabled by default for SLA clusters, with or without remote-backed storage. Let’s dive into some details of this design and understand how it works internally.

How is the remote store modeled for scalability?

Having scalable and efficient Amazon S3 storage is essential for Remote Publication to work seamlessly. The cluster state has multiple entities, which get updated at different frequencies. For example, cluster node data only changes if a new node joins the cluster or an old node leaves the cluster, which usually happens during blue/green deployments or node replacements. However, shard allocation can change multiple times a day based on index creations, rollovers, or internal service triggered relocations. The storage schema needs to be able to handle these entities in a way that a change in one entity doesn’t impact another entity. A manifest file keeps track of the entities. Each cluster state entity has its own separate file, like one for templates, one for cluster settings, one for cluster nodes, and so on. For entities that scale with the number of indexes, like index metadata and index shard allocation, per-index files are created to make sure changes in an index can be uploaded and downloaded independently. The manifest file keeps track of paths to these individual entity files. The following code shows a sample manifest file. It contains the details of the granular cluster state entities’ files uploaded to Amazon S3 along with some basic metadata.

{
    "term": 5,
    "version": 10,
    "cluster_uuid": "dsgYj10Nkso7",
    "state_uuid": "dlu34Dh2Hiq",
    "node_id": "7rsyg5FbSeSt",
    "node_version": "3000099",
    "committed": true,
    "indices": [{
        "index_name": "index1",
        "uploaded_filename": "index1-s3-key"
    }, {
        "index_name": "index2",
        "uploaded_filename": "index2-s3-key"
    }],
    "indices_routing": [{
        "index_name": "index1",
        "uploaded_filename": "index1-routing-s3-key"
    }, {
        "index_name": "index2",
        "uploaded_filename": "index2-routing-s3-key"
    }],
    "uploaded_settings_metadata": {
        "uploaded_filename": "settings-s3-key"
    },
    "diff_manifest": {
        "from_state_uuid": "aRiq3oEip",
        "to_state_uuid": "dlu34Dh2Hiq",
        "metadata_diff": {
            "settings_metadata_diff": true,
            "indices_diff": {
                "upserts": ["index1"],
                "deletes": ["index2"]
            }
        },
        "routing_table_diff": {
            "upserts": ["index1"],
            "deletes": ["index2"],
            "diff": "indices-routing-diff-s3-key"
        }
    }
}

In addition to keeping track of cluster state components, the manifest file also keeps track of what entities changed compared to the last state, which is the diff manifest. In the preceding code, diff manifest has a section for metadata diff and routing table diff. This signifies that between these two versions of the cluster state, these entities have changed.
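To illustrate how a follower node might use the diff manifest, the following sketch decides which files to fetch. It is our interpretation of the sample manifest rather than the actual OpenSearch logic: the function name is hypothetical, and we assume that a follower can use the diff only when its current state UUID equals from_state_uuid, and that the routing diff file stands in for the per-index routing files.

```python
def files_to_download(manifest: dict, local_state_uuid: str) -> list:
    """List the remote files a follower needs to reach the manifest's state."""
    index_files = {e["index_name"]: e["uploaded_filename"] for e in manifest["indices"]}
    settings_file = manifest["uploaded_settings_metadata"]["uploaded_filename"]
    diff = manifest.get("diff_manifest")

    if diff is None or diff["from_state_uuid"] != local_state_uuid:
        # Out of sync (or no diff available): fetch every entity file.
        files = list(index_files.values())
        files += [e["uploaded_filename"] for e in manifest["indices_routing"]]
        files.append(settings_file)
        return files

    # In sync with the previous state: fetch only what changed.
    files = []
    metadata_diff = diff["metadata_diff"]
    if metadata_diff.get("settings_metadata_diff"):
        files.append(settings_file)
    for name in metadata_diff["indices_diff"]["upserts"]:
        files.append(index_files[name])
    routing_diff = diff["routing_table_diff"].get("diff")
    if routing_diff:
        files.append(routing_diff)
    return files


# A trimmed copy of the sample manifest shown earlier.
manifest = {
    "state_uuid": "dlu34Dh2Hiq",
    "indices": [
        {"index_name": "index1", "uploaded_filename": "index1-s3-key"},
        {"index_name": "index2", "uploaded_filename": "index2-s3-key"},
    ],
    "indices_routing": [
        {"index_name": "index1", "uploaded_filename": "index1-routing-s3-key"},
        {"index_name": "index2", "uploaded_filename": "index2-routing-s3-key"},
    ],
    "uploaded_settings_metadata": {"uploaded_filename": "settings-s3-key"},
    "diff_manifest": {
        "from_state_uuid": "aRiq3oEip",
        "to_state_uuid": "dlu34Dh2Hiq",
        "metadata_diff": {
            "settings_metadata_diff": True,
            "indices_diff": {"upserts": ["index1"], "deletes": ["index2"]},
        },
        "routing_table_diff": {
            "upserts": ["index1"],
            "deletes": ["index2"],
            "diff": "indices-routing-diff-s3-key",
        },
    },
}

print(files_to_download(manifest, "aRiq3oEip"))   # follower in sync: 3 files
print(files_to_download(manifest, "someOlderId")) # follower out of sync: 5 files
```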

We also keep a separate shard diff file specifically for shard allocation. Because multiple shards for different indexes can be relocated in a single cluster state update, having this shard diff file further reduces the number of files to download.

This configuration provides the following benefits:

Separate files help prevent bloating a single document
Per-index files reduce the number of updates and effectively reduce the network bandwidth usage, because most updates affect only a few indexes
Having a diff tracker makes downloads on nodes efficient because only limited data needs to be downloaded

To support the scale and high request rate to Amazon S3, we use Amazon S3 pre-partitioning, so we can scale proportionally with the number of clusters and indexes. For managing storage size, an asynchronous scheduler is added, which cleans up stale files and keeps only the 10 most recently updated documents. After a cluster is deleted, a domain sweeper job removes the files for that cluster after a few days.

Remote Publication overview

Now that you understand how cluster state is persisted in Amazon S3, let’s see how it is used during the publication workflow. When a cluster state update occurs, the elected cluster manager uploads changed entities to Amazon S3 in parallel, with the number of concurrent uploads determined by a fixed thread pool. It then updates and uploads a manifest file with diff details and file paths.

During the publish phase, the elected cluster manager sends the manifest path, term, and version to follower nodes using a new remote transport action. When the elected cluster manager changes, the newly elected cluster manager increments the term, which signifies the number of times a new cluster manager election has occurred. The elected cluster manager increments the cluster state version when the cluster state is updated. You can use these two components to identify cluster state progression and make sure nodes operate with the same understanding of the cluster’s configuration. The follower nodes download the manifest, determine if they need a full state or just the diff, and then download the required files from Amazon S3 in parallel. After the new cluster state is computed, follower nodes acknowledge the elected cluster manager.
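As a rough illustration of how term and version identify cluster state progression, the following sketch shows a check a follower could apply to an incoming publish request. It is a simplification under our own assumptions, not the coordination logic in OpenSearch: a request from an older term is treated as coming from a stale cluster manager, and within an acceptable term the version must strictly increase. The names are hypothetical.

```python
from typing import NamedTuple


class StateId(NamedTuple):
    term: int     # incremented on each cluster manager election
    version: int  # incremented on each cluster state update


def should_accept(current: StateId, incoming: StateId) -> bool:
    """Accept only states that move the cluster forward."""
    if incoming.term < current.term:
        return False  # published by a cluster manager that has been superseded
    return incoming.version > current.version


current = StateId(term=5, version=10)
print(should_accept(current, StateId(5, 11)))  # newer version, same term
print(should_accept(current, StateId(4, 12)))  # stale term
print(should_accept(current, StateId(5, 10)))  # duplicate of the current state
```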

In the commit phase, the elected cluster manager updates the manifest, marking it as committed, and instructs follower nodes to commit the new cluster state. This process provides efficient distribution of cluster state updates, especially in large clusters, by minimizing direct data transfer between nodes and using Amazon S3 for storage and retrieval. The following diagram depicts the Remote Publication flow when an index creation triggers a cluster state update.

Sequence of steps between the cluster manager node, the follower nodes, and a remote store such as Amazon S3 depicting the remote cluster state publication

The workflow consists of the following steps:

The user invokes an admin API such as create index.
The elected cluster manager node uploads the index metadata and routing table files in parallel to the configured remote store.
The elected cluster manager node uploads the manifest file containing the details of the metadata files to the remote store.
The elected cluster manager sends the remote manifest file path to the follower nodes.
The follower node downloads the manifest file from the remote store.
The follower nodes download the index metadata and routing table files from the remote store in parallel.
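The upload and download halves of this workflow can be sketched with a fixed-size thread pool. The following Python uses an in-memory dictionary in place of Amazon S3 and a minimal manifest of our own design, so the class, function, and key names are all hypothetical. It shows the ordering that matters: entity files are uploaded in parallel first, the manifest is written last, and the follower reads the manifest before fetching files in parallel.

```python
import json
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Stand-in for the remote store; a real system would call Amazon S3."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def upload_state(store, pool, entities: dict, version: int) -> str:
    """Upload entity files in parallel, then the manifest that points to them."""
    list(pool.map(store.put, entities.keys(), entities.values()))
    manifest_key = f"manifest-v{version}"
    manifest = {"version": version, "files": sorted(entities)}
    store.put(manifest_key, json.dumps(manifest).encode())
    return manifest_key


def download_state(store, pool, manifest_key: str) -> dict:
    """Read the manifest, then fetch the files it lists in parallel."""
    files = json.loads(store.get(manifest_key))["files"]
    return dict(zip(files, pool.map(store.get, files)))


remote = InMemoryStore()
changed = {"index1-metadata": b"{...}", "index1-routing": b"{...}"}
with ThreadPoolExecutor(max_workers=4) as workers:  # fixed-size pool
    path = upload_state(remote, workers, changed, version=11)
    # Only this short path travels over the transport layer.
    print(path, download_state(remote, workers, path))
```

Writing the manifest last means a follower never sees a manifest that references files that are not yet uploaded.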

Failure detection in publication

Remote Publication brings in a significant change to how publication works and how the cluster state is managed. Issues in file creation, publication, or downloading and creating cluster state on follower nodes can have a potential impact on the cluster. To make sure the new flow works as expected, a checksum validation is added to the publication flow. On the elected cluster manager, after creating a new cluster state, a checksum is created for individual entities and the overall cluster state and added to the manifest. On follower nodes, after the cluster state is rebuilt from the downloaded files, a checksum is created again and matched against the checksum from the manifest. A mismatch in checksums means the cluster state on the follower node is different from that on the elected cluster manager. In the default mode, the service only logs which entity is failing the checksum match and lets the cluster state persist. For further debugging, checksum match supports different modes, where it can download the complete state and find the diff between two states in trace mode, or fail the publication request in failure mode.
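The following sketch shows the idea of per-entity and overall checksums with a default mode and a failure mode. It is illustrative only: we assume CRC32 over a canonical JSON serialization, which may differ from what OpenSearch computes, and the function names, mode strings, and the "_cluster_state" key are hypothetical. Trace mode is omitted.

```python
import json
import zlib


def entity_checksums(state: dict) -> dict:
    """Checksum each top-level entity and the cluster state as a whole."""
    def crc(value) -> int:
        return zlib.crc32(json.dumps(value, sort_keys=True).encode())

    sums = {name: crc(value) for name, value in state.items()}
    sums["_cluster_state"] = crc(state)
    return sums


def verify(manifest_sums: dict, follower_state: dict, mode: str = "default") -> list:
    """Return the entities whose checksums differ; raise in failure mode."""
    local = entity_checksums(follower_state)
    mismatched = sorted(k for k, v in manifest_sums.items() if local.get(k) != v)
    if mismatched and mode == "failure":
        raise ValueError("checksum mismatch: " + ", ".join(mismatched))
    return mismatched  # default mode: the caller logs these and continues


manager_state = {"settings": {"replicas": 1}, "nodes": ["n1", "n2"]}
sums = entity_checksums(manager_state)  # added to the manifest on upload

follower_state = {"settings": {"replicas": 2}, "nodes": ["n1", "n2"]}
print(verify(sums, manager_state))   # matching state: no mismatches
print(verify(sums, follower_state))  # reports which entity differs
```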

Recovery from failures

With remote state, a cluster recovers from quorum loss by using the cluster state from the remote store. Without remote state, the cluster manager might lose metadata, leading to data loss for your cluster. With remote state, the cluster manager can use the last persisted state to help prevent metadata loss in the cluster. The following diagram illustrates the states of a cluster before a quorum loss, during a quorum loss, and after the quorum loss recovery happens using a remote store.

The states of a cluster before a quorum loss, during a quorum loss, and after the quorum loss recovery happens using remote store

Benefits

In this section, we discuss some of the solution benefits.

Scalability and availability

Remote Publication significantly reduces the CPU, memory, and network overhead for the elected cluster manager when transmitting the state to the follower nodes. Additionally, transport threads responsible for sending publish requests to follower nodes are made available more quickly, because the remote publish request size is smaller. The publication request size remains consistent irrespective of the cluster state size, giving consistent publication performance. This enhancement enables OpenSearch Service to support larger clusters of up to 1,000 nodes and a higher number of shards per node, without overwhelming the elected cluster manager. With reduced load on the cluster manager, its availability improves, so it can more efficiently serve admin API requests.

Durability

With the cluster state being persisted to Amazon S3, we get Amazon S3 durability. Clusters suffering quorum loss due to replacement of cluster manager nodes can hydrate with the remote cluster state and recover from quorum loss. Because Amazon S3 has the last committed cluster state, there is no data loss on recovery.

Cluster state publication performance

We tested the elected cluster manager performance in a 1,000-node domain containing 500,000 shards. We compared two versions: the new Remote Publication system vs. the older cluster state publication system. Both clusters were operated with the same workload for a few hours. The following are some key observations:

Cluster state publication time reduced from an average of 13 seconds to 4 seconds, which is a three-fold improvement
Network out reduced from an average of 4 GB to 3 GB
Elected cluster manager resource utilization showed significant improvement, with JVM dropping from an average of 40% to 20% and CPU dropping from 50% to 40%

We tested on a 100-node cluster as well and saw performance improvements with the increase in the size of the cluster state. With 50,000 shards, the uncompressed cluster state size increased to 600 MB. The following observations were made during cluster state update when compared to a cluster without Remote Publication:

Max network out traffic reduced from 11.3 GB to 5.7 GB (approximately 50%)
Average elected cluster manager JVM usage reduced from 54% to 35%
Average elected cluster manager CPU reduced from 33% to 20%
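To compare the two tests on the same footing, the following snippet converts the before and after figures quoted above into percentage reductions. It uses only numbers from this post; the helper name is our own.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction from before to after, rounded to one decimal."""
    return round((before - after) / before * 100, 1)


# (label, before, after) taken from the observations in this section.
observations = [
    ("1,000 nodes: publication time (s)", 13, 4),
    ("1,000 nodes: network out (GB)", 4, 3),
    ("1,000 nodes: JVM (%)", 40, 20),
    ("1,000 nodes: CPU (%)", 50, 40),
    ("100 nodes: max network out (GB)", 11.3, 5.7),
    ("100 nodes: JVM (%)", 54, 35),
    ("100 nodes: CPU (%)", 33, 20),
]

for label, before, after in observations:
    print(f"{label}: {reduction_pct(before, after)}% lower")

# Speedup factor for publication time in the 1,000-node test.
print(13 / 4)
```

The publication time speedup works out to 3.25 times, which matches the three-fold improvement stated above.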

Contributing to open source

OpenSearch is an open source, community-driven software. You can find code for the Remote Publication feature in the project’s GitHub repository. Some of the notable GitHub pull requests have been added inline to the preceding text. You can find the RFCs for remote state and remote state publication in the project’s GitHub repository. A more comprehensive list of pull requests is attached in the meta issues for remote state, remote publication, and remote routing table.

Looking ahead

The new Remote Publication architecture enables teams to build additional features and optimizations using the remote store:

Faster recovery after failures – With the new architecture, we have the last successful cluster state in Amazon S3, which can be downloaded on the new cluster manager. At the time of writing, only cluster metadata gets restored on recovery and then the elected cluster manager tries to build shard allocation by contacting the data nodes. This takes up a lot of CPU and memory for both the cluster manager and data nodes, in addition to the time taken to collate the data to build the allocation table. With the last successful shard allocation available in Amazon S3, the elected cluster manager can download the data, build the allocation table locally, and then update the cluster state to the follower nodes, making recovery faster and less resource-intensive.
Lazy loading – The cluster state entities can be loaded as needed instead of all at once. This approach reduces the average memory usage on a follower node and is expected to speed up cluster state publication.
Node-specific metadata – At present, every follower node downloads and loads the entire cluster state. However, we can optimize this by modifying the logic so that a data node only downloads the index metadata and routing table for the indexes it contains.
Optimize cluster state downloads – There is an opportunity to optimize the downloading of cluster state entities. We are exploring compression and serialization techniques to minimize the amount of data transmitted.
Restoring to an older state – The service keeps the cluster state for the last 10 updates. This can be used to restore the cluster to a previous state in case the state gets corrupted.

Conclusion

Remote Publication makes cluster state publication faster and more robust, significantly improving cluster scalability, reliability, and recovery capabilities, potentially reducing customer incidents and operational overhead. This change in architecture enables further improvements in elected cluster manager performance and makes domains more durable, especially for larger domains where cluster manager operations become heavy as the number of indexes and nodes increases. We encourage you to upgrade to the latest version to take advantage of these improvements and share your experience with our community.

About the authors

Himshikha Gupta is a Senior Engineer with Amazon OpenSearch Service. She is excited about scaling challenges with distributed systems. She is an active contributor to OpenSearch, focused on shard management and cluster scalability.

Sooraj Sinha is a software engineer at Amazon, specializing in Amazon OpenSearch Service since 2021. He has worked on multiple core components of OpenSearch, including indexing, cluster management, and cross-cluster replication. His contributions have focused on improving the availability, performance, and durability of OpenSearch.

Boosting search relevance: Automatic semantic enrichment in Amazon OpenSearch Serverless

2025-08-06 Jon Handler

Post Syndicated from Jon Handler original https://aws.amazon.com/blogs/big-data/boosting-search-relevance-automatic-semantic-enrichment-in-amazon-opensearch-serverless/

Traditional search engines rely on word-to-word matching (referred to as lexical search) to find results for queries. Although this works well for specific queries such as television model numbers, it struggles with more abstract searches. For example, when searching for “shoes for the beach,” a lexical search merely matches individual words “shoes,” “beach,” “for,” and “the” in catalog items, potentially missing relevant products like “water-resistant sandals” or “surf footwear” that don’t contain the exact search terms.

Large language models (LLMs) create dense vector embeddings for text that expand retrieval beyond individual word boundaries to include the context in which words are used. Dense vector embeddings capture the relationship between shoes and beaches by learning how often they occur together, enabling better retrieval for more abstract queries through what is called semantic search.

Sparse vectors combine the benefits of lexical and semantic search. The process starts with a WordPiece tokenizer to create a limited set of tokens from text. A transformer model then assigns weights to these tokens. During search, the system calculates the dot-product of the weights on the tokens (from the reduced set) from the query with tokens from the target document. You get a blended score from the terms (tokens) whose weights are high for both the query and the target. Sparse vectors encode semantic information, like dense vectors, and supply word-to-word matching through the dot-product, giving you a hybrid lexical-semantic match. For a detailed understanding of sparse and dense vector embeddings, visit Improving document retrieval with sparse semantic encoders in the OpenSearch blog.
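The dot-product scoring described here can be shown with plain dictionaries. In the following sketch, the tokens and weights are invented for illustration and do not come from any real model. The point is that a document about sandals can score well for a query about beach shoes because the model assigned weight to related tokens the document text never contained.

```python
def sparse_score(query: dict, doc: dict) -> float:
    """Dot product over the tokens that the query and document share."""
    return sum(weight * doc[token] for token, weight in query.items() if token in doc)


# Invented token weights, as a sparse encoder might assign them.
query = {"shoes": 1.0, "beach": 1.0}
docs = {
    "water-resistant sandals": {"sandals": 1.5, "shoes": 0.75, "beach": 0.5, "water": 1.25},
    "insulated snow boots": {"boots": 1.5, "shoes": 1.0, "snow": 1.25},
}

ranked = sorted(docs, key=lambda title: sparse_score(query, docs[title]), reverse=True)
for title in ranked:
    print(title, sparse_score(query, docs[title]))
```

Only tokens with nonzero weight on both sides contribute, which is why the computation stays cheap even though the vocabulary is large.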

Automatic semantic enrichment for Amazon OpenSearch Serverless makes implementing semantic search with sparse vectors effortless. You can now experiment with search relevance improvements and deploy to production with only a few clicks, requiring no long-term commitment or upfront investment. In this post, we show how automatic semantic enrichment removes friction and makes the implementation of semantic search for text data seamless, with step-by-step instructions to enhance your search functionality.

Automatic semantic enrichment

You could already enhance search relevance scoring beyond OpenSearch’s default lexical scoring with the Okapi BM25 algorithm, integrating dense vector and sparse vector models for semantic search using OpenSearch’s connector framework. However, implementing semantic search in OpenSearch Serverless has been complex and costly, requiring model selection, hosting, and integration with an OpenSearch Serverless collection.

Automatic semantic enrichment lets you automatically encode your text fields in your OpenSearch Serverless collections as sparse vectors by just setting the field type. During ingestion, OpenSearch Serverless automatically processes the data through a service-managed machine learning (ML) model, converting text to sparse vectors in native Lucene format.

Automatic semantic enrichment supports both English-only and multilingual options. The multilingual variant supports the following languages: Arabic, Bengali, Chinese, English, Finnish, French, Hindi, Indonesian, Japanese, Korean, Persian, Russian, Spanish, Swahili, and Telugu.

Model details and performance

Automatic semantic enrichment uses a service-managed, pre-trained sparse model that works effectively without requiring custom fine-tuning. The model analyzes the fields you specify, expanding them into sparse vectors based on learned associations from diverse training data. The expanded terms and their significance weights are stored in native Lucene index format for efficient retrieval. We’ve optimized this process using document-only mode, where encoding happens only during data ingestion. Search queries are merely tokenized rather than processed through the sparse model, making the solution both cost-effective and performant.

Our performance validation during feature development used the MS MARCO passage retrieval dataset, featuring passages averaging 334 characters. For relevance scoring, we measured average normalized discounted cumulative gain (NDCG) for the first 10 search results (ndcg@10) on the BEIR benchmark for English content and average ndcg@10 on MIRACL for multilingual content. We assessed latency through client-side 90th-percentile (p90) measurements and the p90 of the took values reported in search responses. These benchmarks provide baseline performance indicators for both search relevance and response times.
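For readers who want to reproduce a relevance measurement on their own data, the following Python sketch computes ndcg@10 for a single query. It assumes the linear-gain variant of DCG and graded relevance labels that you supply; benchmark suites such as BEIR ship their own evaluation tooling, so treat this as an illustration of the metric rather than the exact procedure used for the numbers in this post.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(
    ranked: Sequence[float],
    judged: Optional[Sequence[float]] = None,
    k: int = 10,
) -> float:
    """NDCG@k for one query.

    ranked: relevance labels of the returned results, in ranked order.
    judged: labels of all judged documents for the query; if omitted, the
            ideal ranking is built from the returned results only.
    """
    pool = ranked if judged is None else judged
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


print(ndcg_at_k([3, 2, 1]))  # already ideally ordered
print(ndcg_at_k([0, 1]))     # the relevant result is ranked second
```

Averaging this value across all queries in a test set gives the kind of aggregate figure that a relevance comparison is based on.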

The following table shows the automatic semantic enrichment benchmark.

Language	Relevance improvement	P90 search latency
English	20.0% over lexical search	7.7% lower latency than lexical search (BM25 is 26 ms, and automatic semantic enrichment is 24 ms)
Multilingual	105.1% over lexical search	38.4% higher latency than lexical search (BM25 is 26 ms, and automatic semantic enrichment is 36 ms)

Given the unique nature of each workload, we encourage you to evaluate this feature in your development environment using your own benchmarking criteria before making implementation decisions.

Pricing

OpenSearch Serverless bills automatic semantic enrichment based on OpenSearch Compute Units (OCUs) consumed during sparse vector generation at indexing time. You’re charged only for actual usage during indexing. You can monitor this consumption using the Amazon CloudWatch metric SemanticSearchOCU. For specific details about model token limits and volume throughput per OCU, visit Amazon OpenSearch Service Pricing.

Prerequisites

Before you create an automatic semantic enrichment index, verify that you’ve been granted the necessary permissions for the task. Contact an account administrator for assistance if required. To work with automatic semantic enrichment in OpenSearch Serverless, you need the account-level AWS Identity and Access Management (IAM) permissions shown in the following policy. The permissions serve the following purposes:

The aoss:*Index IAM permissions are used to create and manage indices.
The aoss:APIAccessAll IAM permission is used to perform OpenSearch API operations.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
              "aoss:CreateIndex",
              "aoss:GetIndex",
              "aoss:APIAccessAll"
            ],
            "Resource": "<ARN of your Serverless Collection>"
        }
    ]
}

You also need an OpenSearch Serverless data access policy to create and manage Indices and associated resources in the collection. For more information, visit Data access control for Amazon OpenSearch Serverless in the OpenSearch Serverless Developer Guide. Use the following policy:

[
    {
        "Description": "Create index permission",
        "Rules": [
            {
                "ResourceType": "index",
                "Resource": ["index/<collection_name>/*"],
                "Permission": [
                  "aoss:CreateIndex",
                  "aoss:DescribeIndex",
                  "aoss:ReadDocument",
                  "aoss:WriteDocument"
                ]
            }
        ],
        "Principal": [
            "arn:aws:iam::<account_id>:role/<role_name>"
        ]
    },
    {
        "Description": "Create pipeline permission",
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": ["collection/<collection_name>"],
                "Permission": [
                  "aoss:CreateCollectionItems",
                  "aoss:"
                ]
            }
        ],
        "Principal": [
            "arn:aws:iam::<account_id>:role/<role_name>"
        ]
    },
    {
        "Description": "Create model permission",
        "Rules": [
            {
                "ResourceType": "model",
                "Resource": ["model/<collection_name>/*"],
                "Permission": ["aoss:CreateMLResources"]
            }
        ],
        "Principal": [
            "arn:aws:iam::<account_id>:role/<role_name>"
        ]
    }
]

To access private collections, set up the following network policy:

[
   {
      "Description":"Enable automatic semantic enrichment in private collection",
      "Rules":[
         {
            "ResourceType":"collection",
            "Resource":[
               "collection/<collection_name>"
            ]
         }
      ],
      "AllowFromPublic":false,
      "SourceServices":[
         "aoss.amazonaws.com"
      ]
   }
]

Set up an automatic semantic enrichment index

To set up an automatic semantic enrichment index, follow these steps:

To create an automatic semantic enrichment index using the AWS Command Line Interface (AWS CLI), use the create-index command:

aws opensearchserverless create-index \
    --id <collection_id> \
    --index-name <index_name> \
    --index-schema <index_body>

To describe the created index, use the following command:

aws opensearchserverless get-index \
    --id <collection_id> \
    --index-name <index_name>

You can also use AWS CloudFormation templates (Type: AWS::OpenSearchServerless::CollectionIndex) or the AWS Management Console to create semantic search during collection provisioning as well as after the collection is created.

Example: Index setup for product catalog search

This section shows how to set up a product catalog search index. You’ll implement semantic search on the title_semantic field (using an English model). For the product_id field, you’ll maintain default lexical search functionality.

In the following index-schema, the title_semantic field has a field type set to text and has parameter semantic_enrichment set to status ENABLED. Setting the semantic_enrichment parameter enables automatic semantic enrichment on the title_semantic field. You can use the language_options field to specify either english or multi-lingual. For this post, we generate a nonsemantic title field named title_non_semantic. Use the following code:

aws opensearchserverless create-index \
    --id XXXXXXXXX \
    --index-name 'product-catalog' \
    --index-schema '{
    "mappings": {
        "properties": {
            "product_id": {
                "type": "keyword"
            },
            "title_semantic": {
                "type": "text",
                "semantic_enrichment": {
                    "status": "ENABLED",
                    "language_options": "english"
                }
            },
            "title_non_semantic": {
                "type": "text"
            }
        }
    }
}'
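If you script index creation, it can help to generate the --index-schema payload instead of hand-editing JSON. The following Python sketch builds the same schema as the command above. The helper name build_index_schema and its parameters are hypothetical; only the field types and the semantic_enrichment settings come from this post.

```python
import json
from typing import Dict, Iterable

# Language options named in this post.
ALLOWED_LANGUAGES = {"english", "multi-lingual"}


def build_index_schema(
    semantic_fields: Dict[str, str],
    text_fields: Iterable[str] = (),
    keyword_fields: Iterable[str] = (),
) -> str:
    """Return a JSON string suitable for the --index-schema argument.

    semantic_fields maps a field name to its language option.
    """
    properties = {}
    for name in keyword_fields:
        properties[name] = {"type": "keyword"}
    for name, language in semantic_fields.items():
        if language not in ALLOWED_LANGUAGES:
            raise ValueError(f"unsupported language option: {language!r}")
        properties[name] = {
            "type": "text",
            "semantic_enrichment": {"status": "ENABLED", "language_options": language},
        }
    for name in text_fields:
        properties[name] = {"type": "text"}
    return json.dumps({"mappings": {"properties": properties}}, indent=4)


print(build_index_schema(
    semantic_fields={"title_semantic": "english"},
    text_fields=["title_non_semantic"],
    keyword_fields=["product_id"],
))
```

Serializing with json.dumps also guards against syntax slips such as trailing commas, which the CLI would reject.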

Data ingestion

After the index is created, you can ingest data through standard OpenSearch mechanisms, including client libraries, REST APIs, or directly through OpenSearch Dashboards. Here’s an example of how to add multiple documents using the bulk API in OpenSearch Dashboards Dev Tools:

POST _bulk
{"index": {"_index": "product-catalog"}}
{"title_semantic": "Red shoes", "title_non_semantic": "Red shoes", "product_id": "12345" }
{"index": {"_index": "product-catalog"}}
{"title_semantic": "Black shirt", "title_non_semantic": "Black shirt", "product_id": "6789" }
{"index": {"_index": "product-catalog"}}
{"title_semantic": "Blue hat", "title_non_semantic": "Blue hat", "product_id": "0000" }
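When you ingest from a client rather than Dev Tools, the bulk request body is newline-delimited JSON: one action line followed by one document line, with a trailing newline at the end. The following Python sketch assembles that body for the documents above. The helper name build_bulk_body is hypothetical, and sending the request (including request signing) is left to whichever client you use.

```python
import json
from typing import Dict, Iterable


def build_bulk_body(index_name: str, documents: Iterable[Dict]) -> str:
    """Build a _bulk request body: an action line, then the document, per item."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


products = [
    {"title_semantic": "Red shoes", "title_non_semantic": "Red shoes", "product_id": "12345"},
    {"title_semantic": "Black shirt", "title_non_semantic": "Black shirt", "product_id": "6789"},
    {"title_semantic": "Blue hat", "title_non_semantic": "Blue hat", "product_id": "0000"},
]
print(build_bulk_body("product-catalog", products))
```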

Search against automatic semantic enrichment index

After the data is ingested, you can query the index:

POST product-catalog/_search?size=1
{
  "query": {
    "match":{
      "title_semantic":{
        "query": "crimson footwear"
      }
    }
  }
}

The following is the response:

{
    "took": 240,
    "timed_out": false,
    "_shards": {
        "total": 0,
        "successful": 0,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 7.6092715,
        "hits": [
            {
                "_index": "product-catalog",
                "_id": "Q61b35YBAkHYIP5jIOWH",
                "_score": 7.6092715,
                "_source": {
                    "title_semantic": "Red shoes",
                    "title_non_semantic": "Red shoes",
                    "title_semantic_embedding": {
                        "feet": 0.85673976,
                        "dress": 0.48490667,
                        "##wear": 0.26745942,
                        "pants": 0.3588211,
                        "hats": 0.30846077,
                        ...
                    },
                    "product_id": "12345"
                }
            }
        ]
    }
}

The search successfully matched the document with Red shoes despite the query using crimson footwear, demonstrating the power of semantic search. The system automatically generated semantic embeddings for the document (truncated here for brevity) which enable these intelligent matches based on meaning rather than exact keywords.
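To see which expansion terms carry the most weight for a document, you can sort the generated embedding by weight. The following Python sketch does this for a single hit. The hit dictionary reuses only the values visible in the truncated response above, and the helper name top_terms is hypothetical.

```python
from typing import Dict, List, Tuple


def top_terms(hit: Dict, embedding_field: str, n: int = 3) -> List[Tuple[str, float]]:
    """Return the n highest-weighted terms from a hit's sparse embedding field."""
    embedding = hit["_source"].get(embedding_field, {})
    return sorted(embedding.items(), key=lambda item: item[1], reverse=True)[:n]


# Subset of the response shown above (the full embedding is truncated there).
hit = {
    "_source": {
        "title_semantic": "Red shoes",
        "title_semantic_embedding": {
            "feet": 0.85673976,
            "dress": 0.48490667,
            "##wear": 0.26745942,
            "pants": 0.3588211,
            "hats": 0.30846077,
        },
    }
}

for term, weight in top_terms(hit, "title_semantic_embedding"):
    print(f"{term}\t{weight:.3f}")
```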

Comparing search results

By running the same query against the nonsemantic field title_non_semantic, you can confirm that nonsemantic fields can’t match based on meaning:

GET product-catalog/_search?size=1
{
  "query": {
    "match":{
      "title_non_semantic":{
        "query": "crimson footwear"
      }
    }
  }
}

The following is the search response:

{
    "took": 398,
    "timed_out": false,
    "_shards": {
        "total": 0,
        "successful": 0,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

Limitations of automatic semantic enrichment

Automatic semantic enrichment is most effective when applied to small-to-medium sized fields containing natural language content, such as movie titles, product descriptions, reviews, and summaries. Although semantic search enhances relevance for most use cases, it might not be optimal for certain scenarios:

Very long documents – The current sparse model processes only the first 8,192 tokens of each document for English. For multilingual documents, it’s 512 tokens. For lengthy articles, consider implementing document chunking to ensure complete content processing.
Log analysis workloads – Semantic enrichment significantly increases index size, which might be unnecessary for log analysis where exact matching typically suffices. The additional semantic context rarely improves log search effectiveness enough to justify the increased storage requirements.

Consider these limitations when deciding whether to implement automatic semantic enrichment for your specific use case.
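As a rough illustration of the document chunking suggested for very long documents, the following Python sketch splits text into overlapping windows of words so that each chunk can be indexed as its own document. Whitespace-separated words are only an approximation of model tokens, so the max_words budget and the overlap are assumptions you would tune conservatively against the token limits above.

```python
from typing import List


def chunk_words(text: str, max_words: int = 300, overlap: int = 30) -> List[str]:
    """Split text into overlapping chunks of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, max_words=4, overlap=1))
```

The overlap keeps sentences that straddle a boundary searchable from both neighboring chunks, at the cost of some duplicated index content.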

Conclusion

Automatic semantic enrichment marks a significant advancement in making sophisticated search capabilities accessible to all OpenSearch Serverless users. By eliminating the traditional complexities of implementing semantic search, search developers can now enhance their search functionality with minimal effort and cost. Our feature supports multiple languages and collection types, with a pay-as-you-use pricing model that makes it economically viable for various use cases. Benchmark results are promising, particularly for English language searches, showing both improved relevance and reduced latency. However, although semantic search enhances most scenarios, certain use cases such as processing extremely long articles or log analysis might benefit from alternative approaches.

We encourage you to experiment with this feature and discover how it can optimize your search implementation so you can deliver better search experiences without the overhead of managing ML infrastructure. Check out the video and tech documentation for additional details.

About the Authors

Jon Handler is Director of Solutions Architecture for Search Services at Amazon Web Services, based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have generative AI, search, and log analytics workloads for OpenSearch. Prior to joining AWS, Jon’s career as a software developer included four years of coding a large-scale, eCommerce search engine. Jon holds a Bachelor of the Arts from the University of Pennsylvania, and a Master of Science and a Ph.D. in Computer Science and Artificial Intelligence from Northwestern University.

Arjun Kumar Giri is a Principal Engineer at AWS working on the OpenSearch Project. He primarily works on OpenSearch’s artificial intelligence and machine learning (AI/ML) and semantic search features. He is passionate about AI, ML, and building scalable systems.

Siddhant Gupta is a Senior Product Manager (Technical) at AWS, spearheading AI innovation within the OpenSearch Project from Hyderabad, India. With a deep understanding of artificial intelligence and machine learning, Siddhant architects features that democratize advanced AI capabilities, enabling customers to harness the full potential of AI without requiring extensive technical expertise. His work seamlessly integrates cutting-edge AI technologies into scalable systems, bridging the gap between complex AI models and practical, user-friendly applications.

Create an OpenSearch dashboard with Amazon OpenSearch Service

2025-08-05 Smita Singh

Post Syndicated from Smita Singh original https://aws.amazon.com/blogs/big-data/create-an-opensearch-dashboard-with-amazon-opensearch-service/

Effective log analysis is essential for maintaining the health and performance of modern applications. Amazon OpenSearch Service stands out as a powerful, fully managed solution for log analytics and observability. With its advanced indexing, full-text search, and real-time analytics capabilities, OpenSearch Service makes it possible for organizations to seamlessly ingest, process, and search log data across diverse sources—including AWS services like Amazon CloudWatch, VPC Flow Logs, and more.

With OpenSearch Dashboards, you can turn indexed log data into actionable visualizations that reveal insights and help detect anomalies. By querying data stored in OpenSearch Service, you can extract relevant information and display it using a variety of visualization types—such as line charts, bar graphs, pie charts, heatmaps, and more. These tools make it effortless to monitor system behavior, spot trends, and quickly identify issues in your environment.

This post demonstrates how to harness OpenSearch Dashboards to analyze logs visually and interactively. With this solution, IT administrators, developers, and DevOps engineers can create custom dashboards to monitor system behavior, detect anomalies early, and troubleshoot issues faster through interactive charts and graphs.

Solution overview

In this post, we show how to create an index pattern in OpenSearch Dashboards, create two types of visualizations, and display these visualizations on a custom dashboard. We also demonstrate how to export and import visualizations.

Prerequisites

Before diving into log analysis with OpenSearch Dashboards, you must have the following:

A properly configured OpenSearch Service domain
A working log collection and ingestion pipeline

Amazon OpenSearch Service 101: Create your first search application with OpenSearch guides you through setting up your OpenSearch Service domain and configuring the log ingestion pipeline.

For this post, we work with the following log sources, which have already been ingested into an OpenSearch Service cluster as part of the prerequisite steps:

Access OpenSearch Dashboards

Complete the following steps to access OpenSearch Dashboards:

On the OpenSearch Service console, choose Domains in the navigation pane.
Check if your domain status shows as Active.
Choose your domain to open the domain details page.
Choose the OpenSearch Dashboards URL to open it in a new browser window.

Authenticate into OpenSearch Dashboards using one of the supported methods.

Create an index pattern

After you’re logged in to OpenSearch Dashboards, you must create an index pattern. An index pattern allows OpenSearch Dashboards to locate indexes to search. Complete the following steps:

In OpenSearch Dashboards, expand the navigation pane and choose Dashboard Management under Management.
Choose Index patterns in the navigation pane.

Choose Create index pattern.
For Index pattern name, enter a name (for example, log-aws-cloudtrail-*).
Choose Next step.

For Time field, choose @timestamp.
Choose Create index pattern.

Create visualizations

Now that the index pattern is created, let’s create some visualizations. For this post, we create a pie chart and an area graph.

Create a pie chart

Complete the following steps to create a pie chart:

In OpenSearch Dashboards, choose Visualize in the navigation pane.

Choose Create visualization.

Choose Pie as the visualization type.
For Source, choose log-aws-cloudtrail-*.

Under Buckets, choose Add and Split slices.

For Aggregation, choose Terms.

For Field, choose eventName.
For Size, enter 10.

Leave all other parameters as default and choose Update.
Choose Save to save the visualization.
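The pie chart configured in these steps corresponds to a terms aggregation over eventName. The following Python sketch builds an equivalent query body and converts the returned buckets into slice percentages. The function names, the now-30m window, and the assumption that eventName is mapped as a keyword-type field are illustrative; the request that Dashboards actually sends may differ in detail.

```python
from typing import Dict, List


def top_event_names_query(field: str = "eventName", size: int = 10, since: str = "now-30m") -> Dict:
    """Query body: count documents per event name within a recent time window."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "query": {"range": {"@timestamp": {"gte": since}}},
        "aggs": {"event_names": {"terms": {"field": field, "size": size}}},
    }


def slice_percentages(buckets: List[Dict]) -> Dict[str, float]:
    """Turn terms-aggregation buckets into the percentages a pie chart displays."""
    total = sum(bucket["doc_count"] for bucket in buckets)
    if total == 0:
        return {}
    return {bucket["key"]: 100.0 * bucket["doc_count"] / total for bucket in buckets}


# Hypothetical buckets, shaped like aggregations.event_names.buckets in a response.
print(slice_percentages([
    {"key": "AssumeRole", "doc_count": 3},
    {"key": "GetObject", "doc_count": 1},
]))
```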

Sample ndjson file for the pie chart – EventNamePie.ndjson

Refer to the Export and import visualizations section for how to import the samples.

The following screenshot shows our pie chart, which displays different types of events and their occurrence percentage in the last 30 minutes.

Create an area graph

Complete the following steps to create an area graph:

In OpenSearch Dashboards, choose Visualize in the navigation pane.
Choose Create visualization.
Choose Area as the visualization type.

For Source, choose log-aws-cloudtrail-*.

Under Buckets, choose Add and X-axis.

For Aggregation, choose Date Histogram.
For Field, choose @timestamp.
Leave all other parameters as default and choose Update.

Under Advanced, choose Add and Split series.

For Aggregation, choose Terms.
For Field, choose eventName.
For Size, enter 10.
Leave all other parameters as default and choose Update.

Choose Save.
Update the time range to Last 60 minutes.
Choose Refresh and Save.
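The area graph combines a date histogram on @timestamp (the X-axis) with a terms aggregation on eventName (the split series). The following Python sketch builds an equivalent query body. The function name, the one-minute fixed_interval, and the now-60m window are assumptions for illustration; Dashboards normally chooses the interval automatically.

```python
from typing import Dict


def events_over_time_query(
    interval: str = "1m",
    since: str = "now-60m",
    field: str = "eventName",
    size: int = 10,
) -> Dict:
    """Query body: per-interval document counts, split by the top event names."""
    return {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": since}}},
        "aggs": {
            "over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval},
                # Nested aggregation: one series per event name in each time bucket.
                "aggs": {"by_event": {"terms": {"field": field, "size": size}}},
            }
        },
    }


print(events_over_time_query())
```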

The following screenshot shows an area graph with different types of events and their occurrence count in the last 60 minutes.

Sample ndjson file for Area chart – EventNameArea.ndjson

Refer to the Export and import visualizations section for how to import the samples.

Create a dashboard

Now we will combine the visualizations we just created into a dashboard. A dashboard serves as a customizable interface that consolidates multiple visualizations, saved searches, and various content into a comprehensive view of data. Users can combine diverse visual elements—including charts, graphs, metrics, and tables—into a single cohesive display that can be arranged and resized on a flexible grid layout. You can simultaneously apply filters and time ranges across multiple visualizations, creating a coordinated analytical experience. Complete the following steps to create a dashboard:

In OpenSearch Dashboards, choose Dashboards in the navigation pane.
Choose Create new dashboard.

Choose Add on the menu bar.

Search for and choose the visualizations you created.

You can resize panels by dragging their corners to adjust dimensions. To modify the layout arrangement, you can drag the top portion of panels, which allows you to organize them horizontally in a row formation. When working with tabular visualizations, the system provides a convenient option to export your results in CSV format for further analysis or reporting purposes.

Choose Save.
Change the time range to Last 60 minutes.
Choose Refresh and Save.

Sample ndjson file for dashboard – CloudTrailSummary.ndjson

Refer to the Export and import visualizations section for how to import the samples.

The following screenshot shows the CloudTrail dashboard displaying both visualizations.

Export and import visualizations

In OpenSearch, an NDJSON file is used to import and export saved objects, such as dashboards, visualizations, maps, and index template. The NDJSON file provides a streamlined approach for handling large datasets by representing each JSON object on a separate line. This format enables efficient import/export operations, simplified data migration between environments, and seamless sharing of complex dashboard configurations. Organizations can back up and restore critical visualizations, saved searches, and dashboard settings while maintaining their integrity. The format’s structure reduces memory overhead during large transfers and improves processing speed for bulk operations. NDJSON’s human-readable nature also facilitates troubleshooting and manual editing when necessary, making it an invaluable tool for maintaining OpenSearch Dashboards deployments across development, testing, and production environments.
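Because each line of an NDJSON file is a standalone JSON object, an export can be inspected with a few lines of code before you import it elsewhere. The following Python sketch counts saved objects by type. The sample text is hypothetical, and the sketch assumes that saved-object lines carry a type key while the trailing export summary line does not.

```python
import json
from collections import Counter
from typing import Dict


def summarize_saved_objects(ndjson_text: str) -> Dict[str, int]:
    """Count saved objects by type, parsing one JSON object per line."""
    counts: Counter = Counter()
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        if "type" in obj:  # skip summary lines that are not saved objects
            counts[obj["type"]] += 1
    return dict(counts)


# Hypothetical export content for illustration.
sample_export = "\n".join([
    '{"type": "index-pattern", "id": "a1", "attributes": {"title": "log-aws-cloudtrail-*"}}',
    '{"type": "visualization", "id": "b2", "attributes": {"title": "EventNamePie"}}',
    '{"type": "visualization", "id": "c3", "attributes": {"title": "EventNameArea"}}',
    '{"exportedCount": 3, "missingRefCount": 0, "missingReferences": []}',
])
print(summarize_saved_objects(sample_export))
```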

Export a visualization

Complete the following steps to export a visualization:

In OpenSearch Dashboards, choose Saved objects in the navigation pane.
Search for and select your object (in this case, a visualization), then choose Export.

The NDJSON file is downloaded to your local machine.

Import a visualization

Complete the following steps to import a visualization:

In OpenSearch Dashboards, choose Saved objects in the navigation pane.
Choose Import.
Choose the first NDJSON file to be imported from your local host.
Select Create new objects with random IDs.
Choose Import.

Choose Done.

Choose Import.

You can now open the imported object.

The following screenshot shows our updated dashboard.

Clean up

To clean up your resources, delete the OpenSearch Service domain, along with the data stored and the visualizations created on it. You will not be able to recover the data after you delete it.

On the OpenSearch Service console, choose Domains in the navigation pane.
Select the domain you created and choose Delete.

Conclusion

OpenSearch Dashboards is a powerful tool for transforming raw log data into actionable visualizations that drive insights and decision-making. In this post, we’ve shown how to create visualizations like pie charts and area graphs, build comprehensive dashboards, and efficiently export and import your work using NDJSON files. By using the fully managed OpenSearch Service features, organizations can focus on extracting valuable insights rather than managing infrastructure, ultimately enhancing their observability posture and operational efficiency.

To further enhance your OpenSearch proficiency, consider exploring advanced visualization options such as heat maps, gauge charts, and geographic maps that can represent your data in more specialized ways. Implementing automated alerting based on predefined thresholds will help you proactively identify anomalies before they become critical issues. You can also use OpenSearch’s powerful machine learning capabilities for sophisticated anomaly detection and predictive analytics to gain deeper insights from your log data. As your implementation grows, customizing security settings with fine-grained access controls will provide appropriate data visibility across different teams in your organization.

For comprehensive learning resources, refer to the Amazon OpenSearch Service Developer Guide, watch Create your first OpenSearch Dashboard on YouTube, explore best practices in Amazon OpenSearch blog posts, and gain hands-on experience through workshops available in AWS Workshops.

About the Authors

Smita Singh is a Senior Solutions Architect at AWS. She focuses on defining technical strategic vision and works on architecture, design, and implementation of modern, scalable platforms for large-scale global enterprises and SaaS providers. She is a data, analytics, and generative AI enthusiast and is passionate about building innovative, highly scalable, resilient, fault-tolerant, self-healing, multi-tenant platform solutions and accelerators.

Dipayan Sarkar is a Specialist Solutions Architect for Analytics at AWS, where he helps customers modernize their data platform using AWS analytics services. He works with customers to design and build analytics solutions, enabling businesses to make data-driven decisions.

Build a multi-tenant healthcare system with Amazon OpenSearch Service

2025-08-05 Ezat Karimi

Post Syndicated from Ezat Karimi original https://aws.amazon.com/blogs/big-data/build-a-multi-tenant-healthcare-system-with-amazon-opensearch-service/

Healthcare systems face significant challenges managing vast amounts of data while maintaining regulatory compliance, security, and performance. This post explores strategies for implementing a multi-tenant healthcare system using Amazon OpenSearch Service.

In this context, tenants are distinct healthcare entities, sharing a common platform while maintaining isolated data environments. Hospital departments (like emergency, radiology, or patient care), clinics, insurance providers, laboratories, and research institutions are examples of these tenants.

In this post, we address common multi-tenancy challenges and provide actionable solutions for security, tenant isolation, workload management, and cost optimization across diverse healthcare tenants.

Understanding multi-tenant healthcare systems

Tenants in healthcare systems are diverse and have distinct requirements. For example, emergency departments need round-the-clock high availability with subsecond response times for patient care, along with strict access controls for sensitive trauma data. Research departments run complex, resource-intensive queries that are less time-sensitive but require robust anonymization protocols to maintain HIPAA compliance when working with patient data. Outpatient clinics operate during business hours with predictable usage patterns and moderate performance requirements. Administrative systems focus on financial data with scheduled batch processing and require access to billing information and insurance details only. Specialty departments like radiology and cardiology have unique requirements specific to the tasks they perform. For example, radiology requires high storage capacity and bandwidth for large medical imaging files, along with specialized indexing for metadata searches.

Understanding tenant requirements is essential for designing an effective multi-tenant architecture that balances resource sharing with appropriate isolation while maintaining regulatory compliance.

Isolation models

OpenSearch’s hierarchical structure consists of four main levels. At the top level is the domain, which contains one or more nodes that store and search data. Within the domain, indexes contain documents and define how they are stored and searched. Documents are individual records or data entries stored within an index, and each document consists of fields, which are individual data elements with specific data types and values.

Indexes include mappings and settings. Mappings define the schema of documents within an index, specifying field names and their data types. Settings configure various operational aspects of an index, such as the number of primary shards and replica shards.

The isolation model in a multi-tenant OpenSearch system can be at domain, index, or document level. The model you select for your multi-tenant healthcare system impacts security, performance, and cost. For healthcare organizations, as depicted in the following diagram, a hybrid approach typically works best, matching isolation levels to tenant requirements.

Multi-Tenancy Isolation Models

For emergency units, consider domain-based isolation, providing maximum separation by deploying separate OpenSearch domains for each tenant. Although it’s more expensive, it reduces resource contention and provides consistent performance for critical systems. This isolation simplifies compliance by physically separating sensitive patient data.

Similarly, for clinical research tenants, consider domain-based isolation despite its higher cost. Given the resource-intensive nature of research workloads—particularly genomics and population health analytics that process terabytes of data with complex algorithms—separate domains prevent these demanding operations from impacting other tenants.

For specialty departments like cardiology or radiology, where workload patterns are similar but data access patterns are distinct, index-based isolation is a good fit. These departments share a domain but maintain separate indexes. This approach provides strong logical separation while allowing more efficient resource utilization.

For administrative departments where data is less sensitive, a document-based isolation is sufficient, and multiple tenants can share the same indexes.

Data modeling

Effective data modeling is crucial for maintaining performance and manageability in a multi-tenant healthcare system. Implement a consistent index naming convention that incorporates tenant identifiers, data categories, and time periods like {tenant-id}-{data-type}-{time-period}. Tenant-id identifies the entity, for example, cardiology. Examples of the indexes are cardiology-ecg-202505 or radiology-mri-202505. This structured approach simplifies data management, access control, and lifecycle policies.
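A small helper can enforce the naming convention so that every writer produces consistent index names. The following Python sketch is one way to do it. The function name and the validation rule (lowercase letters and digits in each part, so that hyphens remain unambiguous separators) are assumptions layered on top of the {tenant-id}-{data-type}-{time-period} pattern.

```python
import re
from datetime import date

# Assumed rule: each part is lowercase alphanumeric so "-" only separates parts.
_PART = re.compile(r"^[a-z0-9]+$")


def build_index_name(tenant_id: str, data_type: str, period: date) -> str:
    """Return {tenant-id}-{data-type}-{yyyymm} for a monthly index."""
    for part in (tenant_id, data_type):
        if not _PART.match(part):
            raise ValueError(f"invalid index name part: {part!r}")
    return f"{tenant_id}-{data_type}-{period:%Y%m}"


print(build_index_name("cardiology", "ecg", date(2025, 5, 14)))
print(build_index_name("radiology", "mri", date(2025, 5, 1)))
```

With names built this way, a pattern such as cardiology-* can scope access control or lifecycle policies to a single tenant.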

Consider data access patterns when designing your index strategy. For example, for time-series data like vital signs or telemetry readings, time-based indexes with appropriate rotation policies will improve performance and simplify data lifecycle management.

For shared indexes using document-based isolation, make sure tenant identifiers are consistently applied and indexed for efficient tenant-based filtering.
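For shared indexes, tenant-based filtering usually means wrapping every user query in a filter on the tenant identifier. The following Python sketch shows the shape of such a query. The field name tenant_id and the helper name are hypothetical, and in production this filter would be enforced server-side (for example, through document-level security) rather than trusted to the client.

```python
from typing import Dict


def scope_to_tenant(user_query: Dict, tenant_id: str, tenant_field: str = "tenant_id") -> Dict:
    """Wrap a query so that only documents of one tenant can match."""
    return {
        "query": {
            "bool": {
                "must": [user_query],
                # Filter clauses do not affect scoring and can be cached.
                "filter": [{"term": {tenant_field: tenant_id}}],
            }
        }
    }


print(scope_to_tenant({"match": {"description": "overdue invoice"}}, "billing"))
```

Mapping the tenant field as a keyword type keeps the term filter exact and efficient.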

Tenant management

Effective tenant management prevents resource contention and provides consistent performance across your healthcare system. Implement a hybrid isolation model using a tenant tiering framework based on criticality. The following table outlines the tiering framework.

Tier	Tenant Type	SLA	Resources	Operational Limits	Behavior
Tier-1 Critical	Emergency departments; ICU/Critical care; Operating rooms	24/7 SLA 99.99%; Sub-second response; RPO: Near zero; RTO: Less than 15 minutes	Guaranteed 50% CPU, 50% memory; Dedicated hot nodes; 2 replicas minimum	100 concurrent requests; 20 MB request size; 30-second timeout; No throttling	Priority query routing; Preemptive scaling; Automatic failover
Tier-2 Urgent	Inpatient units; Specialty departments; Radiology/imaging	24/7 SLA with 99.9% availability; Less than 2-second response time; RPO: Less than 15 minutes; RTO: Less than 1 hour	Guaranteed 30% CPU, 30% memory; Shared hot nodes; 1–2 replicas	50 concurrent requests; 15 MB request size; 60-second timeout; Limited throttling during peak	High-priority query routing; Automatic scaling; Automated recovery
Tier-3 Standard	Outpatient clinics; Primary care; Pharmacy; Laboratory	Business hours SLA (8 AM – 8 PM); 99.5% availability; Less than 5-second response time; RPO: Less than 1 hour; RTO: Less than 4 hours	Guaranteed 15% CPU, 15% memory; Shared nodes; 1 replica	25 concurrent requests; 10 MB request size; 120-second timeout; Moderate throttling	Standard query routing; Fair thread allocation; Manual scaling; Business hours optimization
Tier-4 Research	Clinical research; Genomics; Population health	Best-effort SLA, up to 99% availability; Less than 30-second response time; RPO: Less than 24 hours; RTO: Less than 24 hours	Guaranteed 5% CPU, 10% memory; Burst capacity during off-hours; 0–1 replicas	10 concurrent requests; 50 MB request size; 300-second timeout; Aggressive throttling during peak	Compute optimized instances; Large heap size; Research-specific plugins
Tier-5 Admin	Billing/finance; HR systems; Inventory management	Business hours SLA (9 AM – 5 PM); 99% availability; Less than 10-second response time; RPO: Less than 24 hours; RTO: Less than 48 hours	No guaranteed resources; Burstable capacity; UltraWarm for historical; 1 replica	5 concurrent requests; 5 MB request size; 180-second timeout; Aggressive throttling	Lowest priority query routing; Batch processing preferred; Off-hours scheduling; Cost-optimized storage

Workload management

When you use OpenSearch Service for multi-tenancy, you must balance your tenants’ workloads to make sure you deliver the resources needed for each to ingest, store, and query their data effectively. A multi-layered workload management framework with a rule-based proxy and OpenSearch Service workload management can effectively address these challenges. For details, see this blog post: Workload management in OpenSearch-based multi-tenant centralized logging platforms.
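As a rough illustration of the rule-based proxy idea, the following sketch checks an incoming request against per-tier limits before forwarding it. The limits are copied from the Tier-2 through Tier-5 entries in the list above; the `TierLimits` class, the `admit` function, and the tier keys are hypothetical names used only for this example, not part of OpenSearch Service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    max_concurrent: int   # concurrent requests allowed for the tier
    max_request_mb: int   # largest accepted request body, in MB
    timeout_seconds: int  # per-request timeout the proxy would enforce


# Limits taken from the tier list above (Tier-2 through Tier-5).
TIER_LIMITS = {
    "tier-2": TierLimits(50, 15, 60),
    "tier-3": TierLimits(25, 10, 120),
    "tier-4": TierLimits(10, 50, 300),
    "tier-5": TierLimits(5, 5, 180),
}


def admit(tier: str, in_flight: int, request_mb: float) -> tuple[bool, str]:
    """Hypothetical proxy rule: decide whether to forward a request."""
    limits = TIER_LIMITS[tier]
    if request_mb > limits.max_request_mb:
        return False, "request too large"
    if in_flight >= limits.max_concurrent:
        return False, "too many concurrent requests"
    return True, "ok"


print(admit("tier-3", in_flight=10, request_mb=4))
print(admit("tier-5", in_flight=5, request_mb=1))
```

A real proxy would also track in-flight counts per tenant and apply the timeout when forwarding; this sketch only shows the admission decision.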

Security framework

Healthcare data requires protection due to its sensitive nature and regulatory requirements. The OpenSearch Service security framework can be adapted to healthcare’s strict security requirements. This framework combines multiple layers of access control, captured in the following diagram.

Multi-tenancy fine-grained access control in Amazon OpenSearch Service

An important step in this framework is role mapping, where AWS Identity and Access Management (IAM) roles are mapped to OpenSearch roles for role-based access control (RBAC). For example, emergency departments can implement the ED-Physician role with access to patient history across departments, and the ED-Staff role with access to vital signs and medication data. You then map the emergency department’s IAM roles to these OpenSearch roles.

With document-level security (DLS), you can limit emergency department staff to active emergency patients only while restricting access to discharged patient data only to the providers who treat them. With field-level security (FLS), you can allow access to medical fields while masking billing and insurance data. You can also provide attribute-based access control (ABAC) policies to allow access based on patient status.
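To make the DLS and FLS combination concrete, here is a minimal sketch of a role body for the OpenSearch security plugin (`PUT _plugins/_security/api/roles/ED-Staff`). The index pattern and the field names (`patient_status`, `billing*`, `insurance*`) are assumptions for illustration; substitute the names from your own mappings. The sketch uses FLS exclusions to hide billing and insurance fields; field masking is a separate option if you need to obscure values instead of hiding them.

```python
import json

# Hypothetical field name and value; adjust to your own mappings.
dls_query = {"term": {"patient_status": "active_emergency"}}

ed_staff_role = {
    "cluster_permissions": [],
    "index_permissions": [
        {
            # Hypothetical index pattern for emergency department data.
            "index_patterns": ["emergency-*"],
            # DLS is passed as a string containing a query DSL document.
            "dls": json.dumps(dls_query),
            # A leading "~" excludes a field from search results.
            "fls": ["~billing*", "~insurance*"],
            "allowed_actions": ["read"],
        }
    ],
}

# Request body for: PUT _plugins/_security/api/roles/ED-Staff
print(json.dumps(ed_staff_role, indent=2))
```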

For research departments, you can create Clinical-Researcher roles with read-only access to datasets. Map academic roles to research roles to make sure researchers only access data for studies they’re authorized to conduct. For DLS, implement filters to make sure researchers only access approved documents. Use FLS to anonymize HIPAA identifiers. ABAC policies for research departments should evaluate the study phase and the researcher’s location.

For outpatient care, you can define Medical-Provider roles with full access to assigned patients’ records and Medical-Assistant roles limited to documenting vitals and preliminary information. For DLS, limit access to each patient’s own physicians. For FLS, restrict access to medical data only, while limiting nurses to demographic, vital sign, and medication fields. Implement time-aware ABAC policies that restrict access to patient records outside of business hours unless the provider is on-call.

For administrative departments, you can implement Financial roles with access to charge codes and insurance information but no clinical data. For DLS, make sure financial staff only access billing documents. FLS provides access to billing codes, dates of service, and insurance fields while masking clinical content.

For specialty departments, you can create technician roles like Radiologist and apply DLS filters that restrict data access to these roles and the referring physician. FLS allows technicians to see clinical history and previous findings specific to their specialty.

Enable comprehensive audit logging to track access to protected health information. Configure these logs to capture user identity, accessed data, timestamp, and access context. These audit trails are essential for regulatory compliance and security investigations.

Managing data lifecycle for compliance

Index State Management (ISM) capabilities combined with OpenSearch Service storage tiering enable an elaborate approach to data lifecycle management that can be tailored to diverse tenant needs. ISM provides a robust way to automate the lifecycle of indexes by defining policies that dictate transitions between Hot, UltraWarm, and Cold storage tiers based on criteria like index age or size. This automation can extend to the archive tier by creating snapshots, which are stored in Amazon Simple Storage Service (Amazon S3) and can be further transitioned to Amazon S3 Glacier or Glacier Deep Archive for long-term, cost-effective archiving of data that is rarely accessed.

Frame your ISM policy along the following guidelines:

Keep critical patient data in hot storage for 180 days to support immediate access. Transition to warm storage for the next 12 months, then move to cold storage for years 2–7. After 7 years, archive records.

Research data benefits from project-based lifecycle policies rather than strictly time-based transitions. Maintain research datasets in hot storage during active project phases, regardless of data age. When projects conclude, transition data to warm storage for 12 months. Move to cold storage for the following 5–10 years based on research significance. Afterward, archive records.

For outpatient clinic data, keep recent patient records in hot storage for 90 days, aligning index rollover with typical follow-up windows. Transition to warm storage for months 4–18, coinciding with common annual visit patterns. Move to cold storage for years 2–7. Archive after 7 years.

For administrative data, maintain current fiscal year data in hot storage with automated transitions at year-end boundaries. Move previous fiscal year data to warm storage for 18 months to support auditing and reporting. Transition to cold storage for years 3–7. Archive financial records after 7 years.

For specialty department data, keep recent metadata in hot storage for 90 days while moving large files, like images, to warm storage after 30 days. Transition complete records to cold storage after 18 months. Archive after 7 years.
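The guidelines above can be sketched as an ISM policy body (for `PUT _plugins/_ism/policies/<policy-id>`). This example builds a hot, warm, cold policy for the critical patient data case. The function name `lifecycle_policy` and the `@timestamp` field are assumptions, the 12-month warm period is approximated as 365 days, and the final archive step is left out because it depends on how you manage snapshots.

```python
import json


def lifecycle_policy(hot_days: int, warm_days: int) -> dict:
    """Build an ISM policy body: hot -> warm -> cold, keyed on index age."""
    return {
        "policy": {
            "description": "hot -> warm -> cold lifecycle (sketch)",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [{
                        "state_name": "warm",
                        "conditions": {"min_index_age": f"{hot_days}d"},
                    }],
                },
                {
                    "name": "warm",
                    "actions": [{"warm_migration": {}}],
                    "transitions": [{
                        "state_name": "cold",
                        # Index age counts from creation, so add the hot period.
                        "conditions": {"min_index_age": f"{hot_days + warm_days}d"},
                    }],
                },
                {
                    "name": "cold",
                    # "@timestamp" is an assumed date field name.
                    "actions": [{"cold_migration": {"timestamp_field": "@timestamp"}}],
                    "transitions": [],
                },
            ],
        }
    }


# Critical patient data: 180 days hot, then roughly 12 months warm.
critical = lifecycle_policy(hot_days=180, warm_days=365)
print(json.dumps(critical, indent=2))
```

The same helper could be called with 90 hot days for outpatient clinic data; project-based research policies would need a trigger other than index age.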

Cost management and optimization

Healthcare organizations must balance performance requirements with budget constraints. Effective cost management strategies are essential for sustainable operations.

Implement comprehensive tagging strategies that mirror your index naming conventions to create a unified approach to resource management and cost tracking. Like the index naming convention, design your tags to identify the tenant, application, and data type (for example, “tenant=cardiology” or “application=ecg”). These tags, combined with AWS Cost Explorer, provide visibility into expenses across organizational boundaries.

Develop cost allocation mechanisms that fairly distribute expenses across different tenants. Consider implementing tiered pricing structures based on data volume, query complexity, and service-level guarantees. This approach aligns costs with value and encourages efficient resource utilization.

Optimize your infrastructure based on tenant-specific metrics and usage patterns. Monitor document counts, indexing rates, and query patterns to right-size your clusters and node types. Use different instance types for different workloads—for example, use compute-optimized instances for query-intensive applications.

Use OpenSearch Service storage tiering to optimize costs. UltraWarm provides significant cost savings for infrequently accessed data while maintaining reasonable query performance. Cold storage offers even greater savings for data that’s rarely accessed but must be retained for compliance purposes.

Conclusion

Building a multi-tenant healthcare system on OpenSearch Service requires careful planning and implementation. By addressing tenant isolation, security, data lifecycle management, workload control, and cost optimization, you can create a platform that delivers improved operational efficiency while maintaining strict compliance with healthcare regulations.

About the Authors

Ezat Karimi is a Senior Solutions Architect at AWS, based in Austin, TX. Ezat specializes in designing and delivering modernization solutions and strategies for database applications. Working closely with multiple AWS teams, Ezat helps customers migrate their database workloads to the AWS Cloud.

Jon Handler is a Senior Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have vector, search, and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon’s career as a software developer included 4 years of coding a large-scale, ecommerce search engine. Jon holds a Bachelor’s of the Arts from the University of Pennsylvania, and a Master’s of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.

Trusted identity propagation using IAM Identity Center for Amazon OpenSearch Service

2025-07-25 Sohaib Katariwala

Post Syndicated from Sohaib Katariwala original https://aws.amazon.com/blogs/big-data/trusted-identity-propagation-using-iam-identity-center-for-amazon-opensearch-service/

Enterprise customers of Amazon OpenSearch Service require comprehensive security controls with seamless authentication and authorization mechanisms when accessing data in provisioned domains and Amazon OpenSearch Serverless collections. Security teams within these organizations must not only maintain compliance with enterprise policies but also need to make sure that their users can access data securely, with robust identity management. AWS IAM Identity Center is a popular mechanism for identity management that provides single sign-on (SSO) capabilities for these enterprise customers. IAM Identity Center can use Security Assertion Markup Language (SAML) with both OpenSearch Service provisioned domains and OpenSearch Serverless. Now, by using trusted identity propagation, IAM Identity Center provides a new, direct method for accessing data in OpenSearch Service.

In this post, we outline how you can take advantage of this new access method to simplify data access using the OpenSearch UI and still maintain robust role-based access control for your OpenSearch data.

Trusted identity propagation overview

Trusted identity propagation in IAM Identity Center adds the identity context of a user to a role when accessing OpenSearch Service, which in turn uses this context to authorize and scope OpenSearch data access. This simplifies the authentication and authorization flow for customers because the applications access the data on their behalf. Users or user agents need not be present between the application and the backend services for this authorization to happen, unlike methods like SAML where a user agent needs to be present between these entities as a go-between for exchanging assertions. This flexibility helps simplify accessing a wide variety of data sources such as data residing within the Amazon Virtual Private Cloud (Amazon VPC) of an OpenSearch Service domain, or an OpenSearch Serverless collection. By using the OpenSearch UI, you can additionally simplify the backend connections, resulting in seamless access to the data. The following figure shows how the identity propagation works with OpenSearch Service.

Prerequisites

Before starting to use IAM Identity Center with OpenSearch Service, there are a few options that you must enable. To start, set up an organization or account instance of IAM Identity Center following the instructions in this guide. For OpenSearch Service provisioned domains, you must enable the IAM Identity Center (IDC) Authentication – new option. You can do this through AWS CloudFormation, OpenSearch REST API, AWS SDK, or the AWS Management Console.

To enable Identity Center using the console

To add the capability for an existing provisioned domain, go to the OpenSearch Service console and navigate to the Security configuration tab and choose Edit.

After this step, or if you are creating a new domain, select the check box for IAM Identity Center (IDC) Authentication – new.
You have various options to choose for Subject Key and Roles Key depending upon how you want to establish your role-based access control discussed later in this post. For now, select UserName for Subject Key and GroupName for Roles Key.

For OpenSearch Serverless, choose Serverless in the navigation pane, then Security and Authentication. Choose Edit in the IAM Identity Center (IdC) authentication – new section.

Select the checkbox for Authenticate with IAM Identity Center, and then choose Save.

When creating an OpenSearch UI application, select the checkbox for Authentication with IAM Identity Center under Single sign-on authentication. For step-by-step instructions on how to create an OpenSearch UI application, see Creating an OpenSearch UI application.

After these steps, you’re ready to configure IAM Identity Center by creating new users and groups, or by using existing user identities.

Propagating IAM Identity Center identities

Currently, adding single sign-on authentication with IAM Identity Center can be done while setting up a new OpenSearch UI application. Use the following steps to create a new OpenSearch UI application. Note that single sign-on currently cannot be turned on after an application is created. After single sign-on is enabled, you should see an AWS managed application under Applications in the Identity Center console.

Assigning users and groups

After the application is created and the status shows as Active, you need to assign users and groups to the application. These assignments are important because they determine the scope and permissions for data access within OpenSearch Service. To do this, select the application you created in the previous steps in the OpenSearch Service console. Here, you will see an option for IAM Identity Center user and groups under Single sign-on authentication. Choose Assign users and groups and select the appropriate Identity Center users and groups.

For OpenSearch Serverless, you must create a new data access policy or add a rule to an existing one to grant IAM Identity Center principals appropriate permissions to access the collections. For example, the following figure shows a data access policy that grants specific permissions to one user with Rule 1 and provides more restrictive permissions to a group with Rule 2.
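In text form, a data access policy along these lines might look like the following sketch. Every identifier is a placeholder, and the `iamidentitycenter/<instance-id>/user|group/<id>` principal format is an assumption that you should confirm against the OpenSearch Serverless documentation before use. Rule 1 gives one user read and write access to documents; Rule 2 gives a group read-only access.

```python
import json

# Placeholders only; the principal string format is an assumption to verify.
INSTANCE_ID = "<identity-center-instance-id>"
user_principal = f"iamidentitycenter/{INSTANCE_ID}/user/<user-id>"
group_principal = f"iamidentitycenter/{INSTANCE_ID}/group/<group-id>"

data_access_policy = [
    {
        # Rule 1: a single user may read and write documents.
        "Description": "Read/write access for one user",
        "Rules": [{
            "ResourceType": "index",
            "Resource": ["index/<collection-name>/*"],
            "Permission": ["aoss:ReadDocument", "aoss:WriteDocument"],
        }],
        "Principal": [user_principal],
    },
    {
        # Rule 2: a group is limited to reading documents.
        "Description": "Read-only access for a group",
        "Rules": [{
            "ResourceType": "index",
            "Resource": ["index/<collection-name>/*"],
            "Permission": ["aoss:ReadDocument"],
        }],
        "Principal": [group_principal],
    },
]

print(json.dumps(data_access_policy, indent=2))
```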

At this point your OpenSearch Service domains, OpenSearch Serverless collections and OpenSearch UI are set up for identity propagation.

Fine-grained access control for IAM Identity Center identities

Fine-grained access control is a role-based access control mechanism for OpenSearch Service that provides security at index, document, and field levels for provisioned domains. You can choose what aspects of identity context you propagate to OpenSearch Service. You can choose between UserId, UserName, and Email for your Subject key, and GroupId and GroupName for your Roles key. This configuration is important because the values of the properties in the identity context must match exactly with the user and backend role mapping within OpenSearch Service provisioned domains. Note that if IAM Identity Center sign-on isn’t enabled, OpenSearch Service can only evaluate the request signature with AWS Signature Version 4. This means that the role your OpenSearch UI will use won’t contain identity context for authorization. To complete authorization, add the values of the identity context fields to the OpenSearch role mapping. See Mapping roles to users under Managing permissions. Role mapping can be done using OpenSearch REST API, AWS SDK, or using OpenSearch Dashboards.

To map roles using OpenSearch Dashboards

From the menu icon on the top left corner of your screen, select Management, Security, Roles, <Your role>.
Choose the Mapped Users tab and select Manage mapping.

When mapping the role, make sure that you enter the values corresponding to the Subject key. This value must be the same as in your identity context. Additionally, use the Roles key values to assign access based on IAM Identity Center groups.
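If you prefer the REST API over Dashboards, the same mapping can be expressed as a request body for `PUT _plugins/_security/api/rolesmapping/<role-name>`. The following is a minimal sketch that assumes UserName was chosen as the Subject key and GroupName as the Roles key; the helper name `role_mapping` and the example user and group values are hypothetical.

```python
import json


def role_mapping(user_names: list[str], group_names: list[str]) -> dict:
    """Build a role mapping body from identity context values.

    Subject key values go in "users"; Roles key values go in "backend_roles".
    Values must match the propagated identity context exactly.
    """
    return {
        "users": sorted(set(user_names)),
        "backend_roles": sorted(set(group_names)),
    }


# Hypothetical Identity Center user name and group name.
body = role_mapping(["jdoe"], ["ed-physicians"])
print(json.dumps(body, indent=2))
```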

With OpenSearch Serverless, the granularity of access control is at the index level so you will need to add additional rules in the Data Access Policy to control principals who can access collections or indices within a collection.

Verifying identity propagation

The final step is to verify identity propagation.

Open the OpenSearch UI application and select IAM Identity Center from the Login drop down.

After you complete the login process with IAM Identity Center, the OpenSearch UI will open. Choose the user icon in the lower left corner of the screen to verify that it’s your correct principal from Identity Center. It should match the Identity Center property you chose earlier.

To verify correct identity propagation, choose the Dev tools icon just above the user profile icon in the bottom left corner of the screen.
Select the correct OpenSearch domain or OpenSearch Serverless collection data source in the top right corner of the screen and run a _search query. You should see results from the data source confirming that the identity is correctly propagated to OpenSearch Service.

Conclusion

In this post, we showed you how to use Trusted Identity Propagation using IAM Identity Center for Amazon OpenSearch Service, providing a streamlined approach to secure data access while maintaining robust access controls. This solution offers several key benefits:

Simplified authentication: By eliminating the need for user agents between applications and backend services, the solution streamlines the authentication process compared to traditional SAML-based approaches.
Enhanced security: The integration maintains comprehensive security controls while providing seamless authentication and authorization mechanisms for both OpenSearch Service provisioned domains and Amazon OpenSearch Serverless collections.
Flexible identity management: Organizations can use existing IAM Identity Center implementations to manage user access, making it easier to maintain compliance with enterprise security policies.
Fine-grained access control: The solution supports detailed access control at the index, document, and field level for provisioned domains, allowing organizations to implement precise security measures.

Get started implementing this solution in your environment today!

For more information about identity management and security best practices with OpenSearch Service, we recommend:

About the authors

Muthu Pitchaimani is a Search Specialist with Amazon OpenSearch Service. He builds large-scale search applications and solutions. Muthu is interested in the topics of networking and security, and is based out of Austin, Texas.

Sohaib Katariwala is a Senior Specialist Solutions Architect at AWS focused on Amazon OpenSearch Service based out of Chicago, IL. His interests are in all things data and analytics. More specifically he loves to help customers use AI in their data strategy to solve modern day challenges.

Amazon OpenSearch Service 101: How many shards do I need

2025-07-25 Tom Burns

Post Syndicated from Tom Burns original https://aws.amazon.com/blogs/big-data/amazon-opensearch-service-101-how-many-shards-do-i-need/

Customers new to Amazon OpenSearch Service often ask how many shards their indexes need. An index is a collection of shards, and an index’s shard count can affect both indexing and search request efficiency. OpenSearch Service can take in large amounts of data, split it into smaller units called shards, and distribute those shards across a dynamically changing set of instances.

In this post, we provide some practical guidance for determining the ideal shard count for your use case.

Shards overview

A search engine has two jobs: create an index from a set of documents, and search that index to compute the best-matching documents. If your index is small enough, a single partition on a single machine can store that index. For larger document sets, in cases where a single machine isn’t large enough to hold the index, or in cases where a single machine can’t compute your search results effectively, the index can be split into partitions. These partitions are called shards in OpenSearch Service. Each document is routed to a shard that is calculated, by default, by using a hash of that document’s ID.
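The routing idea can be illustrated with a few lines of Python. This is only a sketch of hash-based routing: `zlib.crc32` stands in for the hash function that OpenSearch actually uses, and `route_to_shard` is a hypothetical helper, not an OpenSearch API.

```python
import zlib
from collections import Counter


def route_to_shard(doc_id: str, primary_shards: int) -> int:
    """Pick a shard number from a hash of the document ID."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # crc32 is a stand-in hash for illustration only.
    return zlib.crc32(doc_id.encode("utf-8")) % primary_shards


# Route 1,000 synthetic document IDs across 3 primary shards.
counts = Counter(route_to_shard(f"doc-{i}", 3) for i in range(1000))
print(sorted(counts.items()))
```

Because the shard number depends on the primary shard count, the same document ID can land on a different shard if that count changes, which is one way to see why the count is fixed when the index is created.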

A shard is both a unit of storage and a unit of computation. OpenSearch Service distributes shards across nodes in your cluster to parallelize index storage and processing. If you add more nodes to an OpenSearch Service domain, it automatically rebalances the shards by moving them between the nodes. The following figure illustrates this process.

Diagram showing how source documents are indexed and partitioned into shards.

As storage, primary shards are distinct from one another. The document set in one shard doesn’t overlap the document set in other shards. This approach makes shards independent for storage.

As computational units, shards are also distinct from one another. Each shard is an instance of an Apache Lucene index that computes results on the documents it holds. Because all the shards comprise the index, they must function together to process each query and update request for that index. To process a query, OpenSearch Service routes the query to a data node for a primary or replica shard. Each node computes its response locally and the shard responses get aggregated for a final response. To process a write request (a document ingestion or an update to an existing document), OpenSearch Service routes the request to the appropriate shards—primary then replica. Because most writes are bulk requests, all shards of an index are typically used.

The two different types of shards

There are two kinds of shards in OpenSearch Service—primary and replica shards. In an OpenSearch index configuration, the primary shard count serves to partition data and the replica count is the number of full copies of the primary shards. For example, if you configure your index with 5 primary shards and 1 replica, you will have a total of 10 shards: 5 primary shards and 5 replica shards.
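The arithmetic in this example generalizes to any index configuration. A small sketch (the function name is ours):

```python
def total_shards(primary_shards: int, replicas: int) -> int:
    """Total shard count: each replica is a full copy of every primary shard."""
    return primary_shards * (1 + replicas)


print(total_shards(5, 1))  # 5 primaries + 5 replicas = 10
```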

The primary shard receives writes first. The primary shard passes documents to the replica shards for indexing by default. OpenSearch Service’s O-series instances use segment replication. By default, OpenSearch Service waits for acknowledgment from replica shards before confirming a successful write operation to the client. Primary and replica shards provide redundant data storage, enhancing cluster resilience against node failures. In the following example, the OpenSearch Service domain has three data nodes. There are two indexes, green (darker) and blue (lighter), each of which has three shards. The primary for each shard is outlined in red. Each shard also has a single replica, shown with no outline.

Diagram showing how shards and replica shards are distributed between 3 OpenSearch instances.

OpenSearch Service maps shards to nodes based on a number of rules. The most basic rule is that primary and replica shards are never put onto the same node. If a data node fails, OpenSearch Service automatically creates another data node and re-replicates shards from surviving nodes and redistributes them across the cluster. If primary shards fail, replica shards are promoted to primary to prevent data loss and provide continuous indexing and search operations.

So how many shards? Focus on storage first

There are three types of workloads that OpenSearch users typically maintain: application search, log analytics, and vector databases. Search workloads are read-heavy and latency sensitive. They are typically tied to an application to enhance search capability and performance. A common pattern is to index the data in relational databases to give users more filtering capabilities and provide efficient full text search.

Log workloads are write-heavy and receive data continuously from applications and network devices. Typically, that data is put into a changing set of indexes, based on an indexing time period like daily or monthly depending on the use case. Instead of indexing based on time period, you can use rollover policies based on index size or document count to make sure shard sizing best practices are followed.
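A size-based rollover can be expressed as an Index State Management (ISM) policy. The following sketch builds a policy body with a single state that rolls over on index size or document count. The 50 GB threshold mirrors the logs guidance in this post, while the document count, the `logs*` pattern, and the priority are illustrative assumptions.

```python
import json

rollover_policy = {
    "policy": {
        "description": "Roll over log indexes by size or document count (sketch)",
        "default_state": "rollover",
        "states": [
            {
                "name": "rollover",
                "actions": [
                    # Thresholds are illustrative; tune them to your workload.
                    {"rollover": {"min_size": "50gb", "min_doc_count": 100000000}}
                ],
                "transitions": [],
            }
        ],
        # Attach the policy automatically to new indexes matching the pattern.
        "ism_template": {"index_patterns": ["logs*"], "priority": 100},
    }
}

# Request body for: PUT _plugins/_ism/policies/<policy-id>
print(json.dumps(rollover_policy, indent=2))
```

Rollover also requires a write alias on the managed indexes, which is not shown here.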

Vector database workloads use the OpenSearch Service k-Nearest Neighbor (k-NN) plugin to index vectors from an embedding pipeline. This enables semantic search, which measures relevance using the meaning of words rather than exactly matching the words. The embedding model from the pipeline maps multimodal data into a vector with potentially thousands of dimensions. OpenSearch Service searches across vectors to provide search results.

To determine the optimal number of shards for your workload, start with your index storage requirements. Although storage requirements can vary widely, a general guideline is to estimate index storage at 1.25 times the source data size (a 1:1.25 ratio). Compression algorithms default to favoring performance, but can be adjusted to reduce size. When it comes to shard sizes, consider the following based on the workload:

Search – Divide your total storage requirement by 30 GB.
- If search latency is high, use a smaller shard size (as low as 10 GB), increasing the shard count and parallelism for query processing.
- Increasing the shard count reduces the amount of work at each shard (they have fewer documents to process), but also increases the amount of networking for distributing the query and gathering the response. To balance these competing concerns, examine your average hit count. If your hit count is high, use smaller shards. If your hit count is low, use larger shards.
Logs – Divide the storage requirement for your desired time period by 50 GB.
- If using an ISM policy with rollover, consider setting the min_size parameter to 50 GB.
- Increasing the shard count for logs workloads similarly improves parallelism. However, most queries for logs workloads have a small hit count, so query processing is light. Logs workloads work well with larger shard sizes, but shard smaller if your query workload is heavier.
Vector – Divide your total storage requirement by 50 GB.
- Reducing shard size (as low as 10 GB) can improve search latency when your vector queries are hybrid with a heavy lexical component. Conversely, increasing shard size (as high as 75 GB) can improve latency when your queries are pure vector queries.
- OpenSearch provides other optimization methods for vector databases, including vector quantization and disk-based search.
- K-NN queries behave like highly filtered search queries, with low hit counts. Therefore, larger shards tend to work well. Be prepared to shard smaller when your queries are heavier.
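The sizing guidance above reduces to a short calculation. The following sketch applies the 1:1.25 storage guideline and the per-workload target shard sizes from this list; the function and dictionary names are ours, and the result is a starting point to adjust using the latency and hit-count considerations described above.

```python
import math

# Target shard sizes in GB from the guidance above.
TARGET_SHARD_GB = {"search": 30, "logs": 50, "vector": 50}


def primary_shard_count(source_gb: float, workload: str, overhead: float = 1.25) -> int:
    """Estimate the primary shard count for a given source data size."""
    index_gb = source_gb * overhead  # 1:1.25 source-to-index guideline
    return max(1, math.ceil(index_gb / TARGET_SHARD_GB[workload]))


print(primary_shard_count(300, "search"))  # 375 GB / 30 GB = 12.5 -> 13
print(primary_shard_count(200, "logs"))    # 250 GB / 50 GB = 5
print(primary_shard_count(10, "vector"))   # below one shard's worth -> 1
```

For log workloads, pass the source data size for one rollover period (for example, one day) rather than the total retained volume.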

Don’t be afraid of using a single shard

If your index contains less than the advised shard size (30 GB for search and 50 GB otherwise), we recommend that you use a single primary shard. Although it’s tempting to add more shards thinking it will improve performance, this approach can actually be counterproductive for smaller datasets because of the added networking. Each shard you add to an index distributes the processing of requests for that index across an additional node. Performance can decrease because there is overhead for distributed operations to split and combine results across nodes when a single node can do it sufficiently.

Set the shard count

When you create an OpenSearch index, you set the primary and replica counts for that index. Because you can’t dynamically change the primary shard count of an existing index, you have to make this important configuration decision before indexing your first document.

You set the shard count using the OpenSearch create index API. For example (provide your OpenSearch Service domain endpoint URL and index name):

curl -XPUT https://<opensearch-domain-endpoint>/<index-name> -H 'Content-Type: application/json' -d \
 '{
    "settings": {
        "index" : {
            "number_of_shards": 3,
            "number_of_replicas": 1
        }
    }
 }'

If you have a single index workload, you only have to do this one time, when you create your index for the first time. If you have a rolling index workload, you create a new index regularly. Use the index template API to automate applying settings to all new indexes whose name matches the template. The following example sets the shard count for any index whose name has the prefix logs (provide your OpenSearch Service domain endpoint URL and index template name):

curl -XPUT https://<opensearch-domain-endpoint>/_index_template/<template-name> -H 'Content-Type: application/json' -d \
 '{
   "index_patterns": ["logs*"],
   "template": {
        "settings": {
            "index" : {
                "number_of_shards": 3,
                "number_of_replicas": 1
            }
       }
  }
}'

Conclusion

This post outlined basic shard sizing best practices, but additional factors might influence the ideal index configuration you choose to implement in your OpenSearch Service domain.

For more information about sharding, refer to Optimize OpenSearch index shard sizes or Shard strategy. Both resources can help you better fine-tune your OpenSearch Service domain to optimize its available compute resources.

About the authors

Tom Burns is a Senior Cloud Support Engineer at AWS and is based in the NYC area. He is a subject matter expert in Amazon OpenSearch Service and engages with customers for critical event troubleshooting and improving the supportability of the service. Outside of work, he enjoys playing with his cats, playing board games with friends, and playing competitive games online.

Ron Miller is a Solutions Architect based out of NYC, supporting transportation and logistics customers. Ron works closely with AWS’s Data & Analytics specialist organization to promote and support OpenSearch. On the weekend, Ron is a shade tree mechanic and trains to complete triathlons.

Workload management in OpenSearch-based multi-tenant centralized logging platforms

2025-07-22 Ezat Karimi

Post Syndicated from Ezat Karimi original https://aws.amazon.com/blogs/big-data/workload-management-in-opensearch-based-multi-tenant-centralized-logging-platforms/

Modern architectures use many different technologies to achieve their goals. Service-oriented architectures, cloud services, distributed tracing, and more create streams of telemetry and other signal data. Each of these data streams becomes a tenant in your logging backend. If your company runs more than one application, the IT team will frequently centralize the storage and processing of log data, making each application a tenant in the overall observability system.

When you use Amazon OpenSearch Service to store and analyze log data, whether as a developer or an IT admin, you must balance these tenants to make sure you deliver the resources to each tenant so they can ingest, store, and query their data. In this post, we present a multi-layered workload management framework with a rules-based proxy and OpenSearch workload management that can effectively address these challenges.

Example use case

In this post, we discuss GlobalLog, a fictional company supporting healthcare, finance, retail, security, and internal tenants, that built a centralized logging system with OpenSearch Service. Each tenant has unique logging patterns based on their business requirements. Financial tenants generate complex, high-volume queries, healthcare tenants focus on compliance with moderate volume logs and queries, and retail tenants experience seasonal spikes with heavy dashboard usage. Internal operation has steady, low-volume logs and infrequent, simple queries. Security monitoring has a constant, high-volume presence throughout the system.

As GlobalLog’s tenants scaled, operational challenges emerged: high-priority tenant performance suffered during peak hours, resource-intensive queries caused node crashes, and unpredictable traffic created instability. Limited visibility into tenant resource usage complicated troubleshooting and cross-domain security investigations. The platform required robust handling of varied workload patterns and peak usage times, strong performance isolation to prevent tenant interference, and scalability to manage 30% annual data growth.

Solution overview

GlobalLog implemented a comprehensive workload management strategy to handle the diverse demands of its tenants. The solution combines three elements: tiered tenant placement; a rule-based proxy layer that shapes incoming traffic based on the tenant profile and the status of the OpenSearch cluster; and the OpenSearch workload management plugin, which provides granular resource governance by allocating resources such as CPU and memory in proportion to each tenant’s tier. A monitoring component supplies the data the solution needs to make reactive and proactive scaling and performance decisions, adjusting the traffic governance rules and policies in a timely manner.

The following diagram illustrates the architecture.

GlobalLog multi tier workload management

Tenant tiering and placement

GlobalLog categorized tenants into four tiers based on their logging requirements (volume, retention, query frequency) and allocated resources accordingly. The tiering system, enforced through the integrated proxy layer and OpenSearch workload management, prevents resource over-allocation while making sure service levels match business priorities. The specification for each tier is detailed in the following table.


Tier	SLA	Resources	Limits	Behavior
Tier 1 (Enterprise Critical): high-volume, complex queries (over 100 concurrent)	24/7 SLA with 99.99% availability	50% CPU, 50% memory	100 concurrent requests, 20 MB request size, 180-second timeout	Priority query routing and dedicated search threads
Tier 2 (Business Critical): moderate-volume, compliance-oriented queries	Business hours SLA with 99.9% availability	30% CPU, 25% memory	50 concurrent requests, 10 MB request size, 120-second timeout	Compliance-optimized search pipelines
Tier 3 (Business Standard): variable-volume, dashboard-heavy usage	Standard business hours support, no SLA	10% CPU, 20% memory	25 concurrent requests, 5 MB request size, 60-second timeout	Burst capacity for seasonal peaks
Tier 4 (Basic): internal IT operations, development environments	Best-effort support, no SLA	10% CPU, 5% memory	10 concurrent requests, 2 MB request size, 30-second timeout	Automated query optimization for efficiency
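The tier table can also be read as a small lookup structure that a proxy consults before admitting a request. The following Python sketch is illustrative only: the `TierLimits` class and the `admit` function are hypothetical names, not part of OpenSearch or the OpenSearch Traffic Gateway, and the values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    cpu_share: float       # fraction of cluster CPU reserved for the tier
    memory_share: float    # fraction of cluster memory reserved for the tier
    max_concurrent: int    # concurrent requests allowed
    max_request_mb: int    # largest accepted request body, in MB
    timeout_seconds: int   # per-request timeout


# Values taken from the tier table; the dictionary layout is an assumption.
TIERS = {
    1: TierLimits(0.50, 0.50, 100, 20, 180),
    2: TierLimits(0.30, 0.25, 50, 10, 120),
    3: TierLimits(0.10, 0.20, 25, 5, 60),
    4: TierLimits(0.10, 0.05, 10, 2, 30),
}


def admit(tier: int, in_flight: int, request_mb: float) -> bool:
    """Return True if a new request fits within the tier's limits."""
    limits = TIERS[tier]
    return in_flight < limits.max_concurrent and request_mb <= limits.max_request_mb
```

For example, `admit(4, in_flight=10, request_mb=1.0)` rejects the request because a Tier 4 tenant already has 10 requests in flight. The CPU shares and the memory shares each add up to 100% across the four tiers.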

GlobalLog’s integrated architecture streamlines its cost allocation and resource distribution model. Financial industry tenants pay premium rates for their guaranteed high-performance resources, effectively subsidizing the infrastructure that supports more variable workloads. These tenants are categorized into Tier 1. Healthcare tenants benefit from isolation that enforces compliance without bearing the full cost of dedicated infrastructure. These tenants are categorized into Tier 2. Retail tenants are categorized into Tier 3 because they appreciate the elastic capacity during peak seasons without maintaining excess capacity year-round. Tier 4 includes the administrative tenants with access to enterprise-grade logging at affordable rates through efficient resource sharing.

This balanced ecosystem helps GlobalLog maintain profitability while delivering appropriate service levels to every tenant regardless of their industry-specific workload characteristics.

In the next sections, we discuss GlobalLog’s workload management system.

Proxy layer

GlobalLog’s continuous feedback loop architecture creates a dynamic ecosystem that optimizes resource allocation across diverse tenant workloads in OpenSearch Service. Rather than depending on static configurations, the architecture monitors performance metrics and tenant usage patterns to drive scaling and remediation decisions. This makes sure the system evolves as workloads change over time.

The core component of the proxy layer is the OpenSearch Traffic Gateway, which functions as an intermediary between clients and OpenSearch clusters. It provides the following key capabilities:

Rule-based traffic shaping through pattern matching for request paths and parameters
Metrics for resource cost allocation
Traffic replay

GlobalLog expanded the capabilities of their OpenSearch Traffic Gateway through a comprehensive set of enhancements focused on centralization, dynamism, and adaptability. At the core of this evolution, they used Amazon DynamoDB as the centralized repository for critical gateway data. This central database houses the complete ecosystem of rules, policies, and tenant profiles, alongside crucial operational data including metrics, usage patterns, SLA requirements, tier configurations, and real-time cluster status information.

Beyond this centralization effort, GlobalLog transformed the gateway with a dynamic mechanism capable of real-time adjustments and responsive decision-making. This architectural shift allows the gateway to react intelligently to changing conditions rather than following predetermined pathways.

Additionally, GlobalLog implemented an adaptive rule system with sophisticated contextual awareness. The system now activates specific rules based on current cluster states and tenant usage patterns, enabling precise resource allocation and protection mechanisms that respond to actual conditions rather than hypothetical scenarios. The system implements time-based rule scheduling, providing flexibility by allowing different limits and policies to automatically engage during specific periods such as maintenance windows. This provides optimal performance while accommodating necessary system operations.

The solution implements a continuous feedback loop between the monitoring system, the OpenSearch cluster, and the proxy layer. Performance metrics and tenant usage patterns drive automated, rule-based scaling and optimization decisions, helping the system evolve as workloads change over time. In this architecture, Amazon EventBridge triggers an AWS Lambda function when predefined criteria are met (for example, an anomaly is detected in OpenSearch Service). The Lambda function remediates the issue by adjusting the traffic shaping rules and uploading them to the OpenSearch Traffic Gateway. To stabilize the feedback loop, GlobalLog took the following steps:

Added dampening mechanisms to prevent rapid rule changes
Implemented gradual adjustment patterns instead of binary switches
Created circuit breakers for automatic fallback to baseline rules
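The three stabilizers listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GlobalLog’s implementation: the `RuleAdjuster` class, the 80% CPU threshold, the 20% step size, and the 300-second cooldown are all hypothetical choices.

```python
class RuleAdjuster:
    """Adjusts one concurrency limit with dampening, gradual steps, and a circuit breaker."""

    def __init__(self, baseline, floor, step_percent=20,
                 cooldown_seconds=300.0, max_failures=3):
        self.baseline = baseline                # limit under normal conditions
        self.floor = floor                      # lowest limit ever applied
        self.step_percent = step_percent        # gradual adjustment size
        self.cooldown_seconds = cooldown_seconds
        self.max_failures = max_failures
        self.current = baseline
        self._last_change = float("-inf")
        self._failures = 0

    def adjust(self, cpu_utilization, now):
        """Move the limit one step toward its target; `now` is in seconds."""
        # Dampening: ignore signals that arrive too soon after the last change.
        if now - self._last_change < self.cooldown_seconds:
            return self.current
        target = self.floor if cpu_utilization > 0.8 else self.baseline
        if target == self.current:
            return self.current
        # Gradual adjustment: close part of the gap instead of jumping to the target.
        step = max(1, abs(target - self.current) * self.step_percent // 100)
        if target < self.current:
            self.current = max(target, self.current - step)
        else:
            self.current = min(target, self.current + step)
        self._last_change = now
        return self.current

    def record_push_result(self, ok):
        """Circuit breaker: after repeated failed rule pushes, fall back to baseline."""
        self._failures = 0 if ok else self._failures + 1
        if self._failures >= self.max_failures:
            self.current = self.baseline
            self._failures = 0
```

In a setup like the one described, the Lambda function would call `adjust` with the latest cluster metric and then push the resulting limit to the gateway, reporting success or failure through `record_push_result`.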

OpenSearch workload management layer

GlobalLog implemented tenant-level admission control and reactive query management through OpenSearch workload management. The system uses workload management to define resource limits, based on tenant criticality, providing efficient resource allocation and preventing bottlenecks.

A key component of OpenSearch’s workload management is its workload groups. A workload group refers to a logical grouping of queries, typically used for managing resources and prioritizing workloads. GlobalLog uses workload groups to manage resource allocation based on the previously defined tenant tiers. Enterprise-critical workloads receive substantial CPU and memory guarantees, providing consistent performance for financial operations. Business Critical tenants operate with moderate resource guarantees, and Standard and Basic tiers function with more constrained resources, reflecting their lower priority status. The following example shows the workload group setup for Enterprise Critical and Business Critical tiers:

PUT _wlm/workload_group
{
  "name": "Enterprise Critical",
  "resiliency_mode": "enforced",
  "resource_limits": {
    "cpu": 0.5,
    "memory": 0.5
  }
}

PUT _wlm/workload_group
{
  "name": "Business Critical",
  "resiliency_mode": "enforced",
  "resource_limits": {
    "cpu": 0.3,
    "memory": 0.25
  }
}

OpenSearch responds with the set resource limits and the ID for the workload group for Enterprise Critical tier tenants:

{
  "_id": "preXpc67RbKKeCyka72_Gw",
  "name": "Enterprise Critical",
  "resiliency_mode": "enforced",
  "resource_limits": {
    "cpu": 0.5,
    "memory": 0.5
  },
  "updated_at": 1726270204642
}

To use a workload group, use the following code:

GET finindex/_search
Host: localhost:9200
Content-Type: application/json
workloadGroupId: preXpc67RbKKeCyka72_Gw
{
  "query": {
    "match": {
      "field_name": "value"
    }
  }
}
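A proxy that tags requests on behalf of tenants needs to assemble these two calls programmatically. The following Python sketch only builds the request parts (method, path, headers, body) shown in the preceding examples and does not send anything; the function names are hypothetical, and you would pass the result to whatever HTTP client and authentication your environment uses.

```python
import json


def create_workload_group_request(name, cpu, memory, mode="enforced"):
    """Build the PUT request that creates a workload group."""
    body = {
        "name": name,
        "resiliency_mode": mode,
        "resource_limits": {"cpu": cpu, "memory": memory},
    }
    headers = {"Content-Type": "application/json"}
    return "PUT", "/_wlm/workload_group", headers, json.dumps(body)


def search_request(index, workload_group_id, query):
    """Build a search request that is attributed to a workload group."""
    headers = {
        "Content-Type": "application/json",
        # The group ID returned by the create call travels as a request header.
        "workloadGroupId": workload_group_id,
    }
    return "GET", f"/{index}/_search", headers, json.dumps({"query": query})


method, path, headers, body = search_request(
    "finindex", "preXpc67RbKKeCyka72_Gw", {"match": {"field_name": "value"}}
)
```

Keeping the tier-to-group-ID lookup in the proxy means tenants never need to know their workload group ID.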

Real-world use cases

In this section, we discuss two scenarios where GlobalLog’s workload management system helped the company overcome various challenges.

Scenario 1: Security incident response

During a critical security incident, GlobalLog faced a complex challenge of managing simultaneous log access requests from multiple business units, each with different priority levels. At the highest tier were security and financial operations (Tier 1), followed by healthcare operations (Tier 2), retail operations (Tier 3), and internal operations (Tier 4).

At the proxy layer, GlobalLog gave precedence to security and financial tenant queries while implementing specific limitations for other units. Healthcare operations were capped at 15 concurrent queries, retail operations were restricted to 5 queries per minute, and internal operations had their date ranges narrowed.
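The two kinds of limits in this scenario, a cap on concurrent queries and a cap on queries per minute, can be sketched as follows. This is an illustrative Python sketch, not the gateway’s actual rule engine; the class names are hypothetical, and a production proxy would also need thread safety and state shared across proxy instances.

```python
from collections import deque


class ConcurrencyCap:
    """Allows at most `max_in_flight` queries at once (healthcare: 15)."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_acquire(self):
        if self.in_flight >= self.max_in_flight:
            return False
        self.in_flight += 1
        return True

    def release(self):
        self.in_flight = max(0, self.in_flight - 1)


class SlidingWindowLimiter:
    """Allows at most `max_requests` per window (retail: 5 per 60 seconds)."""

    def __init__(self, max_requests, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def allow(self, now):
        # Drop timestamps that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False


healthcare = ConcurrencyCap(15)
retail = SlidingWindowLimiter(5)
```

A sliding window is used here rather than a fixed one-minute bucket so that a tenant cannot send two full bursts back to back across a bucket boundary.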

OpenSearch workload management and the proxy layer played a crucial role by maintaining the security team’s query priority while managing resource pressure, including the cancellation of complex retail queries during high CPU usage.

Scenario 2: End-of-month reporting

During month-end reporting periods, GlobalLog successfully handled intensive analytical workloads from multiple tenants. Time-based rules proved particularly effective: they prioritized Tier 4 tenants for batch reporting during off-peak hours at the end of each month. The following code shows an example of GlobalLog’s rules in this context. The first rule allows Tier 4 tenants to run reports during off-peak hours, and the second rule denies Tier 4 tenants’ requests during business hours:

"monthlyReportAllowRule",
"ruleConfig": {
  "tenantTier": "^tier4$",
  "timeWindow": {
    "dayOfMonth": "25-30",
    "hours": "18:00-8:00"
  }
}

"monthlyReportDenyRule",
"ruleConfig": {
  "tenantTier": "^tier4$",
  "timeWindow": {
    "dayOfMonth": "25-30",
    "hours": "9:00-18:00"
  }
}

The system dynamically adjusted resource allocation for Tier 4 tenants for the off-peak hours (6:00 PM – 8:00 AM) using the OpenSearch workload management API.
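Evaluating a time window that crosses midnight, such as 18:00-8:00, is easy to get wrong, so a short sketch may help. The following Python code assumes the `ruleConfig` shape shown in the rules above; the function names are hypothetical and the matching logic is an assumption about how such a rule could be evaluated, not the gateway’s documented behavior.

```python
import re
from datetime import datetime, time


def _parse_hhmm(text):
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def in_time_window(window, moment):
    """Return True if `moment` falls inside the rule's day and hour ranges."""
    first_day, last_day = (int(part) for part in window["dayOfMonth"].split("-"))
    start_text, end_text = window["hours"].split("-")
    start, end = _parse_hhmm(start_text), _parse_hhmm(end_text)
    if not first_day <= moment.day <= last_day:
        return False
    now = moment.time()
    if start <= end:
        return start <= now < end
    # The window wraps past midnight (for example, 18:00-8:00).
    return now >= start or now < end


def rule_applies(rule_config, tenant_tier, moment):
    """Match the tenant tier pattern, then the time window."""
    tier_matches = re.search(rule_config["tenantTier"], tenant_tier) is not None
    return tier_matches and in_time_window(rule_config["timeWindow"], moment)


allow_rule = {
    "tenantTier": "^tier4$",
    "timeWindow": {"dayOfMonth": "25-30", "hours": "18:00-8:00"},
}
print(rule_applies(allow_rule, "tier4", datetime(2025, 6, 26, 2, 0)))
```

One simplification to be aware of: the day-of-month check uses the calendar day of the request, so the early-morning hours of the 25th match even though that overnight window started on the 24th.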

This comprehensive approach proved highly successful in managing peak reporting periods, facilitating both system stability and optimal performance across all tenant tiers.

Conclusion

The integration of proxy-layer traffic shaping with the OpenSearch workload management plugin in a continuous feedback loop architecture achieved resiliency, stable performance, and fair resource allocation while supporting diverse business priorities. The implementation discussed in this post demonstrates that large-scale, multi-tenant logging environments can effectively serve diverse business needs on shared infrastructure while maintaining performance and cost-efficiency.

Try out these workload management techniques for your own use case and share your feedback and questions in the comments.

About the Authors

Jon Handler is a Senior Principal Solutions Architect at AWS based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have vector, search, and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon’s career as a software developer included 4 years of coding a large-scale ecommerce search engine. Jon holds a Bachelor of Arts from the University of Pennsylvania, and a Master of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.

Optimizing vector search using Amazon S3 Vectors and Amazon OpenSearch Service

2025-07-21 Sohaib Katariwala

Post Syndicated from Sohaib Katariwala original https://aws.amazon.com/blogs/big-data/optimizing-vector-search-using-amazon-s3-vectors-and-amazon-opensearch-service/

NOTE: As of July 15, the Amazon S3 Vectors Integration with Amazon OpenSearch Service is in preview release and is subject to change.

The way we store and search through data is evolving rapidly with the advancement of vector embeddings and similarity search capabilities. Vector search has become essential for modern applications such as generative AI and agentic AI, but managing vector data at scale presents significant challenges. Organizations often struggle with the trade-offs between latency, cost, and accuracy when storing and searching through millions or billions of vector embeddings. Traditional solutions either require substantial infrastructure management or come with prohibitive costs as data volumes grow.

We now have a public preview of two integrations between Amazon Simple Storage Service (Amazon S3) Vectors and Amazon OpenSearch Service that give you more flexibility in how you store and search vector embeddings:

Cost-optimized vector storage: OpenSearch Service managed clusters using service-managed S3 Vectors for cost-optimized vector storage. This integration will support OpenSearch workloads that are willing to trade off higher latency for ultra-low cost and still want to use advanced OpenSearch capabilities (such as hybrid search, advanced filtering, geo filtering, and so on).
One-click export from S3 Vectors: One-click export from an S3 vector index to OpenSearch Serverless collections for high-performance vector search. Customers who build natively on S3 Vectors will benefit from being able to use OpenSearch for faster query performance.

By using these integrations, you can optimize cost, latency, and accuracy by distributing your vector workloads intelligently: keep infrequently queried vectors in S3 Vectors, and use OpenSearch for your most time-sensitive operations that require advanced search capabilities such as hybrid search and aggregations. Further, OpenSearch performance tuning capabilities (that is, quantization, k-nearest neighbor (knn) algorithms, and method-specific parameters) help improve performance with little compromise on cost or accuracy.

In this post, we walk through this seamless integration, providing you with flexible options for vector search implementation. You’ll learn how to use the new S3 Vectors engine type in OpenSearch Service managed clusters for cost-optimized vector storage and how to use one-click export from S3 Vectors to OpenSearch Serverless collections for high-performance scenarios requiring sustained queries with latency as low as 10ms. By the end of this post, you’ll understand how to choose and implement the right integration pattern based on your specific requirements for performance, cost, and scale.

Service overview

Amazon S3 Vectors is the first cloud object store with native support to store and query vectors with sub-second search capabilities, requiring no infrastructure management. It combines the simplicity, durability, availability, and cost-effectiveness of Amazon S3 with native vector search functionality, so you can store and query vector embeddings directly in S3. Amazon OpenSearch Service provides two complementary deployment options for vector workloads: Managed Clusters and Serverless Collections. Both harness Amazon OpenSearch’s powerful vector search and retrieval capabilities, though each excels in different scenarios. For OpenSearch users, the integration between S3 Vectors and Amazon OpenSearch Service offers unprecedented flexibility in optimizing your vector search architecture. Whether you need ultra-fast query performance for real-time applications or cost-effective storage for large-scale vector datasets, this integration lets you choose the approach that best fits your specific use case.

Understanding Vector Storage Options

OpenSearch Service provides multiple options for storing and searching vector embeddings, each optimized for different use cases. The Lucene engine, which is OpenSearch’s native search library, implements the Hierarchical Navigable Small World (HNSW) method, offering efficient filtering capabilities and strong integration with OpenSearch’s core functionality. For workloads requiring additional optimization options, the Faiss engine (Facebook AI Similarity Search) provides implementations of both HNSW and IVF (Inverted File Index) methods, along with vector compression capabilities. HNSW creates a hierarchical graph structure of connections between vectors, enabling efficient navigation during search, while IVF organizes vectors into clusters and searches only relevant subsets during query time. With the introduction of the S3 engine type, you now have a cost-effective option that uses Amazon S3’s durability and scalability while maintaining sub-second query performance. With this variety of options, you can choose the most suitable approach based on your specific requirements for performance, cost, and accuracy. For instance, if your application requires sub-50 ms query responses with efficient filtering, Faiss’s HNSW implementation is the best choice. Alternatively, if you need to optimize storage costs while maintaining reasonable performance, the new S3 engine type would be more appropriate.

Solution overview

In this post, we explore two primary integration patterns:

OpenSearch Service managed clusters using service-managed S3 Vectors for cost-optimized vector storage.

For customers already using OpenSearch Service domains who want to optimize costs while maintaining sub-second query performance, the new Amazon S3 engine type offers a compelling solution. OpenSearch Service automatically manages vector storage in Amazon S3, data retrieval, and cache optimization, eliminating operational overhead.

One-click export from an S3 vector index to OpenSearch Serverless collections for high-performance vector search.

For use cases requiring faster query performance, you can migrate your vector data from an S3 vector index to an OpenSearch Serverless collection. This approach is ideal for applications that require real-time response times and gives you the benefits that come with Amazon OpenSearch Serverless, including advanced query capabilities and filters, automatic scaling and high availability, and no administration. The export process automatically handles schema mapping, vector data transfer, index optimization, and connection configuration.

The following illustration shows the two integration patterns between Amazon OpenSearch Service and S3 Vectors.

Prerequisites

Before you begin, make sure you have:

An AWS account
Access to Amazon S3 and Amazon OpenSearch Service
An OpenSearch Service domain (for the first integration pattern)
Vector data stored in S3 Vectors (for the second integration pattern)

Integration pattern 1: OpenSearch Service managed cluster using S3 Vectors

To implement this pattern:

Create an OpenSearch Service domain using OR1 instances on OpenSearch version 2.19. While creating the domain, choose the Enable S3 Vectors as an engine option in the Advanced features section.
Sign in to OpenSearch Dashboards and open Dev tools. Then create your knn index and specify s3vector as the engine:

PUT my-first-s3vector-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
        "my_vector1": {
          "type": "knn_vector",
          "dimension": 2,
          "space_type": "l2",
          "method": {
            "engine": "s3vector"
          }
        },
        "price": {
          "type": "float"
        }
    }
  }
}

Index your vectors using the Bulk API:

POST _bulk
{ "index": { "_index": "my-first-s3vector-index", "_id": "1" } }
{ "my_vector1": [2.5, 3.5], "price": 7.1 }
{ "index": { "_index": "my-first-s3vector-index", "_id": "3" } }
{ "my_vector1": [3.5, 4.5], "price": 12.9 }
{ "index": { "_index": "my-first-s3vector-index", "_id": "4" } }
{ "my_vector1": [5.5, 6.5], "price": 1.2 }
{ "index": { "_index": "my-first-s3vector-index", "_id": "5" } }
{ "my_vector1": [4.5, 5.5], "price": 3.7 }
{ "index": { "_index": "my-first-s3vector-index", "_id": "6" } }
{ "my_vector1": [1.5, 2.5], "price": 12.2 }
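If your vectors live in application code rather than in Dev tools, you can generate the same newline-delimited bulk body programmatically. The following Python sketch builds only the request body; `build_bulk_body` is a hypothetical helper, and sending the body to the `_bulk` endpoint is left to your HTTP client.

```python
import json


def build_bulk_body(index, docs):
    """Return an NDJSON bulk body: one action line, then one document line."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = {
    "1": {"my_vector1": [2.5, 3.5], "price": 7.1},
    "3": {"my_vector1": [3.5, 4.5], "price": 12.9},
}
bulk_body = build_bulk_body("my-first-s3vector-index", docs)
```

Each document contributes two lines, so the two documents above produce a four-line body.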

Run a knn query as usual:

GET my-first-s3vector-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [2.5, 3.5],
        "k": 2
      }
    }
  }
}

The following animation demonstrates steps 2-4 above.
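To sanity-check what the knn query should return for the sample data, you can compute exact nearest neighbors by hand. The following Python sketch ranks the five sample vectors by Euclidean (l2) distance to the query vector. It is a brute-force reference, not how the s3vector engine searches, and an approximate engine is not guaranteed to return the same ordering.

```python
import math

# The five documents indexed in the bulk request above.
VECTORS = {
    "1": [2.5, 3.5],
    "3": [3.5, 4.5],
    "4": [5.5, 6.5],
    "5": [4.5, 5.5],
    "6": [1.5, 2.5],
}


def exact_knn(query, vectors, k):
    """Return the k (distance, doc_id) pairs closest to `query`."""
    scored = [(math.dist(query, vector), doc_id) for doc_id, vector in vectors.items()]
    scored.sort()
    return scored[:k]


nearest = exact_knn([2.5, 3.5], VECTORS, k=2)
```

Document 1 matches the query vector exactly, so it ranks first. Documents 3 and 6 are equally distant from the query, so either can legitimately appear in second place.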

Integration pattern 2: Export S3 vector indexes to OpenSearch Serverless

To implement this pattern:

Navigate to the AWS Management Console for Amazon S3 and select your S3 vector bucket.

Select a vector index that you want to export. Under Advanced search export, select Export to OpenSearch.

Alternatively, you can:

Navigate to the OpenSearch Service console.
Select Integrations from the navigation pane.
Here you will see a new Integration Template to Import S3 vectors to OpenSearch vector engine – preview. Select Import S3 vector index.

You will now be brought to the Amazon OpenSearch Service integration console with the Export S3 vector index to OpenSearch vector engine template pre-selected and pre-populated with your S3 vector index Amazon Resource Name (ARN). Select an existing role that has the necessary permissions or create a new service role.

Scroll down and choose Export to start the steps to create a new OpenSearch Serverless collection and copy data from your S3 vector index into an OpenSearch knn index.

You will now be taken to the Import history page in the OpenSearch Service console. Here you will see the new job that was created to migrate your S3 vector index into the OpenSearch serverless knn index. After the status changes from In Progress to Complete, you can connect to the new OpenSearch serverless collection and query your new OpenSearch knn index.

The following animation demonstrates how to connect to the new OpenSearch serverless collection and query your new OpenSearch knn index using Dev tools.

Cleanup

To avoid ongoing charges:

For Pattern 1:
- Delete the OpenSearch index using S3 vectors.
- Delete the OpenSearch Service managed cluster if no longer needed.

For Pattern 2:
- Delete the import task from the Import history section of the OpenSearch Service console. Deleting this task will remove both the OpenSearch vector collection and the OpenSearch Ingestion pipeline that was automatically created by the import task.

Conclusion

The integration between Amazon S3 Vectors and Amazon OpenSearch Service is a significant step for vector search, offering flexibility and cost-effectiveness for enterprises. It combines the durability and cost efficiency of Amazon S3 with the advanced AI search capabilities of OpenSearch. Organizations can scale their vector search solutions to billions of vectors while maintaining control over latency, cost, and accuracy. Whether your priority is query latency as low as 10 ms and advanced search capabilities through OpenSearch Service, or cost-optimized storage with sub-second performance using S3 Vectors, this integration lets you choose the approach that fits your needs. We encourage you to get started by trying the S3 Vectors engine in your OpenSearch managed clusters and testing the one-click export from S3 vector indexes to OpenSearch Serverless.

For more information, visit:

About the Authors

Mark Twomey is a Senior Solutions Architect at AWS focused on storage and data management. He enjoys working with customers to put their data in the right place, at the right time, for the right cost. Living in Ireland, Mark enjoys walking in the countryside, watching movies, and reading books.

Sorabh Hamirwasia is a senior software engineer at AWS working on the OpenSearch Project. His primary interests include building cost-optimized and performant distributed systems.

Pallavi Priyadarshini is a Senior Engineering Manager at Amazon OpenSearch Service leading the development of high-performing and scalable technologies for search, security, releases, and dashboards.

Bobby Mohammed is a Principal Product Manager at AWS leading the Search, GenAI, and Agentic AI product initiatives. Previously, he worked on products across the full lifecycle of machine learning, including data, analytics, and ML features on the SageMaker platform, and deep learning training and inference products at Intel.

Integrating Amazon OpenSearch Ingestion with Amazon RDS and Amazon Aurora

2025-07-18 Michael Torio

Post Syndicated from Michael Torio original https://aws.amazon.com/blogs/big-data/integrating-amazon-opensearch-ingestion-with-amazon-rds-and-amazon-aurora/

Unlocking powerful search capabilities for millions of items should be fast, accurate, and effortless while maintaining high relevance. Relational databases are a popular storage method for structured data, and organizations use them extensively to store their core business information. Although relational databases excel at storing and retrieving structured data, they often struggle with searching through large blocks of unstructured text and, for performance reasons, typically don’t index all columns.

In contrast, search engines such as OpenSearch index all fields, enabling rich search capabilities, including semantic search, and powerful aggregations for summarizing and analyzing numeric data. Traditionally, organizations have managed complex, inefficient, and expensive data synchronization processes, including extract, transform, and load (ETL) pipelines, to keep their search indices up to date with their databases. Those looking to enhance their applications with advanced search features need a simpler solution that can maintain search index synchronization with their databases without the overhead of managing custom data sync processes.

We are happy to announce the general availability of the integration of Amazon OpenSearch Service with Amazon Relational Database Service (Amazon RDS) and Amazon Aurora. This new integration eliminates complex data pipelines and enables near real-time data synchronization between Amazon Aurora (including Amazon Aurora MySQL-Compatible Edition and Amazon Aurora PostgreSQL-Compatible Edition) and Amazon RDS databases (including Amazon RDS for MySQL and Amazon RDS for PostgreSQL), and Amazon OpenSearch Service, unlocking advanced search capabilities such as hybrid search, ranked results, and faceted search on transactional databases. You can now deliver low-latency, high-throughput search results, live inventory updates, and personalized recommendations while focusing on creating exceptional customer experiences instead of managing data synchronization. This integration reduces the operational burden of maintaining complex ETL pipelines, reducing costs while providing instant data availability for search operations.

Amazon OpenSearch Ingestion provides near real-time data synchronization between Amazon Aurora or Amazon RDS and OpenSearch Service. Select your Aurora or RDS database, and OpenSearch Ingestion handles the rest, supporting both Aurora MySQL or RDS for MySQL (8.0 and above) and Aurora PostgreSQL or RDS for PostgreSQL (16 and above).

Solution overview

Here’s how these services work together:

Data ingestion – OpenSearch Ingestion first loads your database snapshot from Amazon Simple Storage Service (Amazon S3), where Aurora or Amazon RDS has exported the initial data. It then uses Aurora or Amazon RDS change data capture (CDC) streams to replicate further changes in near real time and indexes them into OpenSearch Service. This automated process keeps your data consistently up to date in OpenSearch, making it readily available for search and analysis without manual intervention.
Real-time querying – OpenSearch Service offers powerful query capabilities that enable you to perform complex searches and aggregations on your data. Whether you need to analyze trends, detect anomalies, or perform search queries to return relevant results for your application, OpenSearch Service provides the tools you need.

The following diagram illustrates the solution architecture for Amazon Aurora as a source:


Getting Started

Configuring Your Database Source

Before setting up synchronization, you need to configure your source database’s logging settings. For Aurora MySQL, configure your cluster parameter group with enhanced binary log settings. For Amazon RDS, enable basic binary logging or logical replication through your instance parameter group settings. These logging configurations enable OpenSearch Ingestion to capture and replicate data changes from your database.

The sample HR database with Aurora MySQL is a good example to show how this integration works.

Before creating the view, let’s look at how OpenSearch represents this data. OpenSearch mappings define how documents and their fields are stored and indexed, similar to how a database schema defines tables and columns. The OpenSearch Ingestion pipeline uses dynamic mappings by default, automatically converting Aurora or Amazon RDS data types to appropriate OpenSearch field types. For example, database DATE fields become OpenSearch date types, and numeric fields are mapped to corresponding OpenSearch numeric types. Although you can customize these mappings using index templates, the default mappings typically handle common data types correctly, including dates, numbers, and text fields. You can inspect the mapping of an index with the following request:

GET employees/_mapping

To demonstrate the integration’s ability to handle complex data relationships, we now examine how OpenSearch Ingestion handles joined data. We create a view in the sample HR database that combines information from multiple related tables into a single, searchable document in OpenSearch. This approach shows how you can transform normalized database structures into denormalized documents that are optimized for search operations.

This employee_details view combines data from multiple tables, creating a rich, denormalized representation of employee information. When replicated to OpenSearch, this view becomes a single, comprehensive document for each employee. This structure is ideal for search operations, allowing for fast and complex queries across what were originally separate tables. For example, you could easily search for employees in a specific department and country or analyze salary distributions across regions—queries that would be more complex and potentially slower in the original normalized database structure.
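The denormalization that such a view performs can be illustrated outside the database. The following Python sketch joins one employee row with its department, location, and region into a single search document. The column names and sample values are hypothetical stand-ins for the sample HR schema, and the code only mirrors the idea of the view rather than reproducing its definition.

```python
def denormalize(employee, departments, locations, regions):
    """Flatten one employee and its related rows into a single document."""
    department = departments[employee["department_id"]]
    location = locations[department["location_id"]]
    region = regions[location["region_id"]]
    return {
        "employee_id": employee["employee_id"],
        "last_name": employee["last_name"],
        "salary": employee["salary"],
        "department_name": department["department_name"],
        "city": location["city"],
        "country": location["country"],
        "region_name": region["region_name"],
    }


# Hypothetical sample rows, keyed by primary key.
departments = {90: {"department_name": "Executive", "location_id": 1700}}
locations = {1700: {"city": "Seattle", "country": "US", "region_id": 2}}
regions = {2: {"region_name": "Americas"}}
employee = {"employee_id": 100, "last_name": "King", "salary": 24000, "department_id": 90}

document = denormalize(employee, departments, locations, regions)
```

With the related fields copied into each document, a search for employees in a given department and country becomes a filter on two fields of one document instead of a multi-table join.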

In the pipeline configuration shown in the following screenshot, you can check how OpenSearch Ingestion connects to the HR database. The configuration identifies the source database and the specific tables we want to replicate. While we created a view to understand the data relationships, the pipeline tracks changes from the underlying base tables (employees, departments, locations, and regions). OpenSearch Ingestion automatically maintains these relationships, which means that changes to these tables are properly reflected in your OpenSearch index, keeping your search data consistent with your source database.

The following animation shows how to set up this integration using the visual editor of OpenSearch Ingestion.

You can also specify index mapping templates to map your Aurora or Amazon RDS fields to the correct fields in your OpenSearch Service indexes.
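As a rough illustration of what such a template can contain, the following Python sketch builds a request body in the shape accepted by the OpenSearch `_index_template` API. The field list is an assumption: `EMPLOYEE_ID` and `SALARY` appear in this post, while `HIRE_DATE` and `LAST_NAME` and all of the chosen field types are hypothetical and should be adapted to your own schema.

```python
import json


def employees_index_template():
    """Return an index template body with explicit mappings for employee fields."""
    return {
        "index_patterns": ["employees*"],
        "template": {
            "mappings": {
                "properties": {
                    "EMPLOYEE_ID": {"type": "integer"},
                    "SALARY": {"type": "double"},
                    "HIRE_DATE": {"type": "date"},
                    # Full-text search on the name, plus a keyword
                    # subfield for exact matches and aggregations.
                    "LAST_NAME": {
                        "type": "text",
                        "fields": {"keyword": {"type": "keyword"}},
                    },
                }
            }
        },
    }


template_json = json.dumps(employees_index_template(), indent=2)
```

Explicit mappings are most useful when the default dynamic mapping would pick a type that doesn’t fit how you plan to query the field.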

For a comprehensive overview of configuration settings for the pipeline, refer to the OpenSearch Data Prepper documentation. You must set up AWS Identity and Access Management (IAM) roles for the pipeline. For instructions, refer to Configure the pipeline role.

After you configure the integration in OpenSearch Ingestion, the pipeline automatically creates indexes that you can view in OpenSearch Dashboards. OpenSearch Ingestion first triggers an automatic export of your Aurora or Amazon RDS database to Amazon S3, then loads this snapshot data from S3 into your OpenSearch cluster to create the initial indices. After this initial load, OpenSearch Ingestion continually captures changes using binary logs (binlog) for MySQL-based databases or write-ahead logs (WAL) for PostgreSQL-based databases. This way, your OpenSearch indices stay synchronized with your source database in near real time. You can view your indices in OpenSearch Dashboards by invoking:

GET _cat/indices

Example response:

Demonstrating near real time data synchronization

Consider the first five entries in the employee table:

When you make changes to your database, OpenSearch Ingestion updates Amazon OpenSearch Service with the change data. For example, the following code updates an employee’s salary:

UPDATE hr.employees SET SALARY = 26000 WHERE EMPLOYEE_ID = 100;

Amazon Aurora sends out a change notice, your OpenSearch Ingestion pipeline picks it up, and OpenSearch Ingestion sends the changed record to OpenSearch in near real time. You can verify this with an OpenSearch query:

GET employees/_search
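The two phases described above, an initial snapshot load followed by a stream of change events, can be sketched with an in-memory dictionary standing in for the index. The event shape (`op`, `primary_key`, `after`) and the starting salary are assumptions made for illustration; they are not the format OpenSearch Ingestion uses.

```python
# In-memory stand-in for an OpenSearch index, keyed by primary key.
index = {}


def load_snapshot(rows):
    """Initial load: index every row from the exported snapshot."""
    for row in rows:
        index[row["employee_id"]] = dict(row)


def apply_change(event):
    """Apply one change event; the event shape here is hypothetical."""
    key = event["primary_key"]
    if event["op"] == "delete":
        index.pop(key, None)
    else:
        # Insert or update: upsert the new column values by primary key.
        index.setdefault(key, {}).update(event["after"])


# Hypothetical starting row, then the salary update from the SQL statement.
load_snapshot([{"employee_id": 100, "salary": 24000}])
apply_change({"op": "update", "primary_key": 100, "after": {"salary": 26000}})
print(index[100])
```

Keying every document by the primary key is what lets an update overwrite the right document, which is one way to read the primary key requirement listed among the limitations.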

Important details about this feature:

Monitoring – Track pipeline performance and data synchronization through CloudWatch metrics and the OpenSearch Ingestion dashboard
Limitations – Requires same-Region and same-account deployment, primary keys for optimal synchronization, and currently has no data definition language (DDL) statement support

Conclusion

Amazon Aurora or Amazon RDS integration with Amazon OpenSearch Service is now generally available in all AWS Regions where OpenSearch Ingestion is available.

To learn more, refer to the AWS documentation for Aurora or Amazon RDS integration with Amazon OpenSearch Service:

About the authors

Michael Torio is an Associate Specialist Solutions Architect at AWS focused on Amazon OpenSearch Service based out of Mountain View, CA. Michael enjoys helping customers leverage cloud technologies to solve their business challenges.

Arjun Nambiar is a Product Manager with Amazon OpenSearch Service. He focuses on ingestion technologies that enable ingesting data from a wide variety of sources into Amazon OpenSearch Service at scale. Arjun is interested in large-scale distributed systems and cloud-centered technologies, and is based out of Seattle, Washington.

Introducing Amazon S3 Vectors: First cloud storage with native vector support at scale (preview)

2025-07-16 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/introducing-amazon-s3-vectors-first-cloud-storage-with-native-vector-support-at-scale/

Today, we’re announcing the preview of Amazon S3 Vectors, a purpose-built durable vector storage solution that can reduce the total cost of uploading, storing, and querying vectors by up to 90 percent. Amazon S3 Vectors is the first cloud object store with native support to store large vector datasets and provide subsecond query performance that makes it affordable for businesses to store AI-ready data at massive scale.

Vector search is an emerging technique used in generative AI applications to find data points similar to a given item by comparing their vector representations using distance or similarity metrics. Vectors are numerical representations of unstructured data created by embedding models. You generate vectors using embedding models for fields inside your document and store them in S3 Vectors to search semantically.

S3 Vectors introduces vector buckets, a new bucket type with a dedicated set of APIs to store, access, and query vector data without provisioning any infrastructure. When you create an S3 vector bucket, you organize your vector data within vector indexes, making it simple for running similarity search queries against your dataset. Each vector bucket can have up to 10,000 vector indexes, and each vector index can hold tens of millions of vectors.

After creating a vector index, when adding vector data to the index, you can also attach metadata as key-value pairs to each vector to filter future queries based on a set of conditions, for example, dates, categories, or user preferences. As you write, update, and delete vectors over time, S3 Vectors automatically optimizes the vector data to achieve the best possible price-performance for vector storage, even as the datasets scale and evolve.
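To make the idea of metadata-filtered similarity search concrete, here is a brute-force sketch in plain Python: it first keeps only vectors whose metadata matches the filter, then ranks the survivors by distance. The `query` helper and the two-dimensional sample vectors are hypothetical; S3 Vectors performs this at scale with its own indexing, which this sketch does not attempt to reproduce.

```python
import math


def query(vectors, query_vector, top_k, metadata_filter=None):
    """Keep vectors matching every filter pair, then rank by Euclidean distance."""
    metadata_filter = metadata_filter or {}
    candidates = [
        v for v in vectors
        if all(v["metadata"].get(k) == val for k, val in metadata_filter.items())
    ]
    ranked = sorted(candidates, key=lambda v: math.dist(v["data"], query_vector))
    return [v["key"] for v in ranked[:top_k]]


# Hypothetical toy data with two-dimensional vectors.
vectors = [
    {"key": "v1", "data": [1.0, 0.0], "metadata": {"genre": "scifi"}},
    {"key": "v2", "data": [0.8, 0.6], "metadata": {"genre": "scifi"}},
    {"key": "v3", "data": [0.9, 0.1], "metadata": {"genre": "family"}},
]

print(query(vectors, [1.0, 0.0], top_k=2, metadata_filter={"genre": "scifi"}))
print(query(vectors, [1.0, 0.0], top_k=2))
```

With the filter, the closer `family` vector is excluded even though it would otherwise rank second, which is the behavior a metadata condition is meant to give you.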

S3 Vectors is also natively integrated with Amazon Bedrock Knowledge Bases, including within Amazon SageMaker Unified Studio, for building cost-effective Retrieval-Augmented Generation (RAG) applications. Through its integration with Amazon OpenSearch Service, you can lower storage costs by keeping infrequently queried vectors in S3 Vectors and then quickly move them to OpenSearch as demands increase or to support real-time, low-latency search operations.

With S3 Vectors, you can now economically store the vector embeddings that represent massive amounts of unstructured data such as images, videos, documents, and audio files, enabling scalable generative AI applications including semantic and similarity search, RAG, and agent memory. You can also build applications to support a wide range of industry use cases including personalized recommendations, automated content analysis, and intelligent document processing without the complexity and cost of managing vector databases.

S3 Vectors in action
To create a vector bucket, choose Vector buckets in the left navigation pane in the Amazon S3 console and then choose Create vector bucket.

Enter a vector bucket name and choose the encryption type. If you don’t specify an encryption type, Amazon S3 applies server-side encryption with Amazon S3 managed keys (SSE-S3) as the base level of encryption for new vectors. You can also choose server-side encryption with AWS Key Management Service (AWS KMS) keys (SSE-KMS). To learn more about managing your vector bucket, visit S3 Vector buckets in the Amazon S3 User Guide.

Now, you can create a vector index to store and query your vector data within your created vector bucket.

Enter a vector index name and the dimensionality of the vectors to be inserted in the index. All vectors added to this index must have exactly the same number of values.

For Distance metric, you can choose either Cosine or Euclidean. When creating vector embeddings, select your embedding model’s recommended distance metric for more accurate results.
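The difference between the two metrics is easiest to see on vectors that are not normalized. The sketch below uses the common textbook definitions (cosine distance as one minus cosine similarity, Euclidean as straight-line distance); the exact formulas S3 Vectors applies are not stated in this post, so treat these as assumptions.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: compares direction only, ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def euclidean_distance(a, b):
    """Straight-line distance: sensitive to both direction and magnitude."""
    return math.dist(a, b)


query = [1.0, 1.0]
same_direction = [10.0, 10.0]  # same direction as the query, much longer
nearby = [1.0, 0.5]            # close in space, different direction

print(cosine_distance(query, same_direction), cosine_distance(query, nearby))
print(euclidean_distance(query, same_direction), euclidean_distance(query, nearby))
```

Cosine ranks `same_direction` as the better match while Euclidean prefers `nearby`, which is why it matters to use the metric your embedding model was trained for.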

Choose Create vector index and then you can insert, list, and query vectors.

To insert your vector embeddings to a vector index, you can use the AWS Command Line Interface (AWS CLI), AWS SDKs, or Amazon S3 REST API. To generate vector embeddings for your unstructured data, you can use embedding models offered by Amazon Bedrock.

If you’re using the latest AWS SDK for Python (Boto3), you can generate vector embeddings for your text with Amazon Bedrock using the following code example:

# Generate and print an embedding with Amazon Titan Text Embeddings V2.
import boto3 
import json 

# Create a Bedrock Runtime client in the AWS Region of your choice. 
bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

# The text strings to convert to embeddings.
texts = [
"Star Wars: A farm boy joins rebels to fight an evil empire in space", 
"Jurassic Park: Scientists create dinosaurs in a theme park that goes wrong",
"Finding Nemo: A father fish searches the ocean to find his lost son"]

embeddings=[]
#Generate vector embeddings for the input texts
for text in texts:
    body = json.dumps({"inputText": text})
    # Call Bedrock's embedding API
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",  # Titan embedding model
        body=body,
    )
    # Parse response
    response_body = json.loads(response["body"].read())
    embedding = response_body["embedding"]
    embeddings.append(embedding)

Now, you can insert vector embeddings into the vector index and query vectors in your vector index using the query embedding:

# Create S3Vectors client
s3vectors = boto3.client('s3vectors', region_name='us-west-2')

# Insert vector embedding
s3vectors.put_vectors( vectorBucketName="channy-vector-bucket",
  indexName="channy-vector-index", 
  vectors=[
{"key": "v1", "data": {"float32": embeddings[0]}, "metadata": {"id": "key1", "source_text": texts[0], "genre":"scifi"}},
{"key": "v2", "data": {"float32": embeddings[1]}, "metadata": {"id": "key2", "source_text": texts[1], "genre":"scifi"}},
{"key": "v3", "data": {"float32": embeddings[2]}, "metadata": {"id": "key3", "source_text":  texts[2], "genre":"family"}}
],
)

#Create an embedding for your query input text
# The text to convert to an embedding.
input_text = "List the movies about adventures in space"

# Create the JSON request for the model.
request = json.dumps({"inputText": input_text})

# Invoke the model with the request and the model ID, e.g., Titan Text Embeddings V2. 
response = bedrock.invoke_model(modelId="amazon.titan-embed-text-v2:0", body=request)

# Decode the model's native response body.
model_response = json.loads(response["body"].read())

# Extract the generated embedding.
embedding = model_response["embedding"]

# Perform a similarity query. You can also optionally use a filter in your query
query = s3vectors.query_vectors( vectorBucketName="channy-vector-bucket",
  indexName="channy-vector-index",
  queryVector={"float32":embedding},
  topK=3, 
  filter={"genre":"scifi"},
  returnDistance=True,
  returnMetadata=True
  )
results = query["vectors"]
print(results)

To learn more about inserting vectors into a vector index, or listing, querying, and deleting vectors, visit S3 vector buckets and S3 vector indexes in the Amazon S3 User Guide. Additionally, with the S3 Vectors embed command line interface (CLI), you can create vector embeddings for your data using Amazon Bedrock and store and query them in an S3 vector index using single commands. For more information, see the S3 Vectors Embed CLI GitHub repository.

Integrate S3 Vectors with other AWS services
S3 Vectors integrates with other AWS services such as Amazon Bedrock, Amazon SageMaker, and Amazon OpenSearch Service to enhance your vector processing capabilities and provide comprehensive solutions for AI workloads.

Create Amazon Bedrock Knowledge Bases with S3 Vectors
You can use S3 Vectors in Amazon Bedrock Knowledge Bases to simplify and reduce the cost of vector storage for RAG applications. When creating a knowledge base in the Amazon Bedrock console, you can choose the S3 vector bucket as your vector store option.

In Step 3, for Vector store creation method, you can either create a new S3 vector bucket and vector index or choose an existing S3 vector bucket and vector index that you previously created.

For detailed step-by-step instructions, visit Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases in the Amazon Bedrock User Guide.

Using Amazon SageMaker Unified Studio
You can create and manage knowledge bases with S3 Vectors in Amazon SageMaker Unified Studio when you build your generative AI applications through Amazon Bedrock. SageMaker Unified Studio is available in the next generation of Amazon SageMaker and provides a unified development environment for data and AI, including building and testing generative AI applications that use Amazon Bedrock knowledge bases.

When you build generative AI applications, you can choose knowledge bases that use S3 Vectors and were created through Amazon Bedrock. To learn more, visit Add a data source to your Amazon Bedrock app in the Amazon SageMaker Unified Studio User Guide.

Export S3 vector data to Amazon OpenSearch Service
You can balance cost and performance by adopting a tiered strategy that stores long-term vector data cost-effectively in Amazon S3 while exporting high priority vectors to OpenSearch for real-time query performance.

This flexibility means your organization can access OpenSearch’s high performance (high QPS, low latency) for critical, real-time applications, such as product recommendations or fraud detection, while keeping less time-sensitive data in S3 Vectors.

To export your vector index, choose Advanced search export, then choose Export to OpenSearch in the Amazon S3 console.

Then, you will be brought to the Amazon OpenSearch Service Integration console with a template for S3 vector index export to OpenSearch vector engine. Choose Export with pre-selected S3 vector source and a service access role.

It will start the steps to create a new OpenSearch Serverless collection and migrate data from your S3 vector index into an OpenSearch knn index.

Choose the Import history in the left navigation pane. You can see the new import job that was created to make a copy of vector data from your S3 vector index into the OpenSearch Serverless collection.

Once the status changes to Complete, you can connect to the new OpenSearch Serverless collection and query your new OpenSearch knn index.

To learn more, visit Creating and managing Amazon OpenSearch Serverless collections in the Amazon OpenSearch Service Developer Guide.

Now available
Amazon S3 Vectors, and its integrations with Amazon Bedrock, Amazon OpenSearch Service, and Amazon SageMaker are now in preview in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), and Asia Pacific (Sydney) Regions.

Give S3 Vectors a try in the Amazon S3 console today and send feedback to AWS re:Post for Amazon S3 or through your usual AWS Support contacts.

— Channy

Build conversational AI search with Amazon OpenSearch Service

2025-07-03 Bharav Patel

Post Syndicated from Bharav Patel original https://aws.amazon.com/blogs/big-data/build-conversational-ai-search-with-amazon-opensearch-service/

Retrieval Augmented Generation (RAG) is a well-known approach to creating generative AI applications. RAG combines large language models (LLMs) with external world knowledge retrieval and is increasingly popular for adding accuracy and personalization to AI. It retrieves relevant information from external sources, augments the input with this data, and generates responses based on both. This approach reduces hallucinations, improves fact accuracy, and allows for up-to-date, efficient, and explainable AI systems. RAG’s ability to break through classical language model limitations has made it applicable to broad AI use cases.
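The retrieve, augment, and generate steps can be sketched in a few lines of plain Python. Everything here is a toy stand-in chosen for illustration: the retriever counts shared words instead of using vector search, and `generate` is a placeholder where a real application would call an LLM.

```python
def retrieve(question, documents, k=1):
    """Rank documents by how many question words they share (toy retriever)."""
    words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]


def augment(question, context_docs):
    """Put the retrieved context in front of the question."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def generate(prompt):
    """Stand-in for an LLM call; a real application would invoke a model here."""
    return f"[model response to a {len(prompt)}-character prompt]"


documents = [
    "OpenSearch supports vector search through the k-NN plugin.",
    "Amazon S3 stores objects in buckets.",
]
question = "Which plugin supports vector search?"
prompt = augment(question, retrieve(question, documents))
print(prompt)
print(generate(prompt))
```

The model only ever sees the prompt built by `augment`, so the quality of the answer depends directly on what the retrieval step returns.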

Amazon OpenSearch Service is a versatile search and analytics tool. It is capable of performing security analytics, searching data, analyzing logs, and many other tasks. It can also work with vector data with a k-nearest neighbors (k-NN) plugin, which makes it helpful for more complex search strategies. Because of this feature, OpenSearch Service can serve as a knowledge base for generative AI applications that integrate language generation with search results.

Conversational search enhances RAG by preserving context over several exchanges, refining responses, and providing a more seamless user experience. It helps with complex information needs, resolves ambiguities, and manages multi-turn reasoning. Although standard RAG performs well for single queries, conversational search provides a more natural and personalized interaction and yields more accurate and pertinent results.

In this post, we explore conversational search, its architecture, and various ways to implement it.

Solution overview

Let’s walk through the solution to build conversational search. The following diagram illustrates the solution architecture.

The new OpenSearch feature known as agents and tools is used to create conversational search. To develop sophisticated AI applications, agents coordinate a variety of machine learning (ML) tasks. Every agent has a number of tools, each intended for a particular function. To use agents and tools, you need OpenSearch version 2.13 or later.

Prerequisites

To implement this solution, you need an AWS account. If you don’t have one, you can create an account. You also need an OpenSearch Service domain with OpenSearch version 2.13 or later. You can use an existing domain or create a new domain.

To use the Amazon Titan Text Embedding and Anthropic Claude V1 models in Amazon Bedrock, you need to enable access to these foundation models (FMs). For instructions, refer to Add or remove access to Amazon Bedrock foundation models.

Configure IAM permissions

Complete the following steps to set up an AWS Identity and Access Management (IAM) role and user with appropriate permissions:

Create an IAM role with the following policy that will allow the OpenSearch Service domain to invoke the Amazon Bedrock API:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeAgent",
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:${Region}::foundation-model/amazon.titan-embed-text-v1",
                "arn:aws:bedrock:${Region}::foundation-model/anthropic.claude-instant-v1"
            ]
        }
    ]
}

Depending on the AWS Region and model you use, specify those in the Resource section.

Add opensearchservice.amazonaws.com as a trusted entity.
Make a note of the IAM role Amazon Resource Name (ARN).
Assign the preceding policy to the IAM user that will create a connector.

Create a passRole policy and assign it to the IAM user that will create the connector using Python:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::${AccountId}:role/OpenSearchBedrock"
        }
    ]
}

Map the IAM role you created to the OpenSearch Service domain role using the following steps:
- Log in to OpenSearch Dashboards and open the Security page from the navigation menu.
- Choose Roles and select ml_all_access.
- Choose Mapped Users and Manage Mapping.
- Under Users, add the ARN of the IAM user you created.
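Because the Resource ARNs in the invoke policy and the role ARN in the passRole policy depend on your Region, models, and account, it can help to generate both documents rather than edit them by hand. The following is a minimal sketch; the helper names and the account ID `123456789012` are hypothetical placeholders.

```python
import json


def build_invoke_policy(region, model_ids):
    """Return an IAM policy document scoped to the given Region and model IDs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement1",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeAgent", "bedrock:InvokeModel"],
                "Resource": [
                    f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
                    for model_id in model_ids
                ],
            }
        ],
    }


def build_pass_role_policy(account_id, role_name):
    """Return the iam:PassRole policy for the role the connector will use."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": f"arn:aws:iam::{account_id}:role/{role_name}",
            }
        ],
    }


invoke_policy = build_invoke_policy(
    "us-east-1",
    ["amazon.titan-embed-text-v1", "anthropic.claude-instant-v1"],
)
# 123456789012 is a placeholder account ID, not a real account.
pass_role_policy = build_pass_role_policy("123456789012", "OpenSearchBedrock")
print(json.dumps(invoke_policy, indent=4))
print(json.dumps(pass_role_policy, indent=4))
```

Building the ARNs from variables also avoids stray whitespace inside an ARN, which would make the Resource entry match nothing.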

Establish a connection to the Amazon Bedrock model using the ML Commons plugin

In order to identify patterns and relationships, an embedding model transforms input data—such as words or images—into numerical vectors in a continuous space. Similar objects are grouped together to make it easier for AI systems to comprehend and respond to intricate user enquiries.

Semantic search concentrates on the purpose and meaning of a query. OpenSearch stores data in a vector index for retrieval and transforms it into dense vectors (lists of numbers) using text embedding models. We are using amazon.titan-embed-text-v1 hosted on Amazon Bedrock, but you will need to evaluate and choose the right model for your use case. The amazon.titan-embed-text-v1 model maps sentences and paragraphs to a 1,536-dimensional dense vector space and is optimized for the task of semantic search.

Complete the following steps to establish a connection to the Amazon Bedrock model using the ML Commons plugin:

Establish a connection by using the Python client with the connection blueprint.
Modify the values of the host and region parameters in the provided code block. For this example, we’re running the program in Visual Studio Code with Python version 3.9.6, but newer versions should also work.

For the role ARN, use the ARN you created earlier, and run the following script using the credentials of the IAM user you created:

import boto3
import requests 
from requests_aws4auth import AWS4Auth

host = 'https://search-test.us-east-1.es.amazonaws.com/'
region = 'us-east-1'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

path = '_plugins/_ml/connectors/_create'
url = host + path

payload = {
  "name": "Amazon Bedrock Connector: embedding",
  "description": "The connector to bedrock Titan embedding model",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock",
    "model": "amazon.titan-embed-text-v1"
  },
  "credential": {
    "roleArn": "arn:aws:iam::<accountID>:role/opensearch_bedrock_external"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
      "headers": {
        "content-type": "application/json",
        "x-amz-content-sha256": "required"
      },
      "request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
      "pre_process_function": "connector.pre_process.bedrock.embedding",
      "post_process_function": "connector.post_process.bedrock.embedding"
    }
  ]
}

headers = {"Content-Type": "application/json"}

r = requests.post(url, auth=awsauth, json=payload, headers=headers, timeout=15)
print(r.status_code)
print(r.text)

Run the Python program. This will return connector_id.

python3 connect_bedrocktitanembedding.py
200
{"connector_id":"nbBe65EByVCe3QrFhrQ2"}

Create a model group against which this model will be registered in the OpenSearch Service domain:

POST /_plugins/_ml/model_groups/_register
{
  "name": "embedding_model_group",
  "description": "A model group for bedrock embedding models"
}

You get the following output:

{
  "model_group_id": "1rBv65EByVCe3QrFXL6O",
  "status": "CREATED"
}

POST /_plugins/_ml/models/_register
{
    "name": "titan_text_embedding_bedrock",
    "function_name": "remote",
    "model_group_id": "1rBv65EByVCe3QrFXL6O",
    "description": "test model",
    "connector_id": "nbBe65EByVCe3QrFhrQ2",
   "interface": {}
}

You get the following output:

{
  "task_id": "2LB265EByVCe3QrFAb6R",
  "status": "CREATED",
  "model_id": "2bB265EByVCe3QrFAb60"
}

Deploy a model using the model ID:

POST /_plugins/_ml/models/2bB265EByVCe3QrFAb60/_deploy

You get the following output:

{
  "task_id": "bLB665EByVCe3QrF-slA",
  "task_type": "DEPLOY_MODEL",
  "status": "COMPLETED"
}

Now the model is deployed, and you will see that in OpenSearch Dashboards on the OpenSearch Plugins page.
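The three calls above form a chain: the model group ID and connector ID feed the register call, and the returned model ID feeds the deploy call. A minimal sketch of that chain as plain request builders follows; the function names are hypothetical, and each returned tuple would still need to be sent with a signed HTTP request such as the one used to create the connector.

```python
def register_model_group_request(name, description):
    """Build the request that creates a model group."""
    return ("POST", "/_plugins/_ml/model_groups/_register",
            {"name": name, "description": description})


def register_model_request(name, model_group_id, connector_id, description=""):
    """Build the request that registers a remote model behind a connector."""
    return ("POST", "/_plugins/_ml/models/_register", {
        "name": name,
        "function_name": "remote",
        "model_group_id": model_group_id,
        "description": description,
        "connector_id": connector_id,
        "interface": {},
    })


def deploy_model_request(model_id):
    """Build the request that deploys a registered model (no body needed)."""
    return ("POST", f"/_plugins/_ml/models/{model_id}/_deploy", None)


# IDs below are the sample values shown in this post's example outputs.
steps = [
    register_model_group_request(
        "embedding_model_group", "A model group for bedrock embedding models"),
    register_model_request(
        "titan_text_embedding_bedrock", "1rBv65EByVCe3QrFXL6O",
        "nbBe65EByVCe3QrFhrQ2", "test model"),
    deploy_model_request("2bB265EByVCe3QrFAb60"),
]
for method, path, body in steps:
    print(method, path, body)
```

In a real run you would parse each response and pass the returned ID into the next builder instead of hardcoding the values.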

Create an ingestion pipeline for data indexing

Use the following code to create an ingestion pipeline for data indexing. The pipeline will establish a connection to the embedding model, retrieve the embedding, and then store it in the index.

PUT /_ingest/pipeline/cricket_data_pipeline
{
    "description": "batting score summary embedding pipeline",
    "processors": [
        {
            "text_embedding": {
                "model_id": "2bB265EByVCe3QrFAb60",
                "field_map": {
                    "cricket_score": "cricket_score_embedding"
                }
            }
        }
    ]
}

Create an index for storing data

Create an index for storing data (for this example, the cricket achievements of batsmen). This index stores raw text and embeddings of the summary text with 1,536 dimensions and uses the ingest pipeline we created in the previous step.

PUT cricket_data
{
    "mappings": {
        "properties": {
            "cricket_score": {
                "type": "text"
            },
            "cricket_score_embedding": {
                "type": "knn_vector",
                "dimension": 1536,
                "space_type": "l2",
                "method": {
                    "name": "hnsw",
                    "engine": "faiss"
                }
            }
        }
    },
    "settings": {
        "index": {
            "knn": "true"
        }
    }
}

Ingest sample data

Use the following code to ingest the sample data for four batsmen:

POST _bulk?pipeline=cricket_data_pipeline
{"index": {"_index": "cricket_data"}}
{"cricket_score": "Sachin Tendulkar, often hailed as the 'God of Cricket,' amassed an extraordinary batting record throughout his 24-year international career. In Test cricket, he played 200 matches, scoring a staggering 15,921 runs at an average of 53.78, including 51 centuries and 68 half-centuries, with a highest score of 248 not out. His One Day International (ODI) career was equally impressive, spanning 463 matches where he scored 18,426 runs at an average of 44.83, notching up 49 centuries and 96 half-centuries, with a top score of 200 not out – the first double century in ODI history. Although he played just one T20 International, scoring 10 runs, his overall batting statistics across formats solidified his status as one of cricket's all-time greats, setting numerous records that stand to this day."}
{"index": {"_index": "cricket_data"}}
{"cricket_score": "Virat Kohli, widely regarded as one of the finest batsmen of his generation, has amassed impressive statistics across all formats of international cricket. As of April 2024, in Test cricket, he has scored over 8,000 runs with an average exceeding 50, including numerous centuries. His One Day International (ODI) record is particularly stellar, with more than 12,000 runs at an average well above 50, featuring over 40 centuries. In T20 Internationals, Kohli has maintained a high average and scored over 3,000 runs. Known for his exceptional ability to chase down targets in limited-overs cricket, Kohli has consistently ranked among the top batsmen in ICC rankings and has broken several batting records throughout his career, cementing his status as a modern cricket legend."}
{"index": {"_index": "cricket_data"}}
{"cricket_score": "Adam Gilchrist, the legendary Australian wicketkeeper-batsman, had an exceptional batting record across formats during his international career from 1996 to 2008. In Test cricket, Gilchrist scored 5,570 runs in 96 matches at an impressive average of 47.60, including 17 centuries and 26 half-centuries, with a highest score of 204 not out. His One Day International (ODI) record was equally remarkable, amassing 9,619 runs in 287 matches at an average of 35.89, with 16 centuries and 55 half-centuries, and a top score of 172. Gilchrist's aggressive batting style and ability to change the course of a game quickly made him one of the most feared batsmen of his era. Although his T20 International career was brief, his overall batting statistics, combined with his wicketkeeping skills, established him as one of cricket's greatest wicketkeeper-batsmen."}
{"index": {"_index": "cricket_data"}}
{"cricket_score": "Brian Lara, the legendary West Indian batsman, had an extraordinary batting record in international cricket during his career from 1990 to 2007. In Test cricket, Lara amassed 11,953 runs in 131 matches at an impressive average of 52.88, including 34 centuries and 48 half-centuries. He holds the record for the highest individual score in a Test innings with 400 not out, as well as the highest first-class score of 501 not out. In One Day Internationals (ODIs), Lara scored 10,405 runs in 299 matches at an average of 40.48, with 19 centuries and 63 half-centuries. His highest ODI score was 169. Known for his elegant batting style and ability to play long innings, Lara's exceptional performances, particularly in Test cricket, cemented his status as one of the greatest batsmen in the history of the game."}
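If your source text lives in a Python list rather than a hand-written request, the same bulk body can be generated. This is a minimal sketch: the `build_bulk_body` helper and the two short sample strings are hypothetical, while the index and field names match the ones used above.

```python
import json


def build_bulk_body(index_name, field, texts):
    """Return an NDJSON string: one action line plus one document line per text."""
    lines = []
    for text in texts:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps({field: text}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "cricket_data",
    "cricket_score",
    ["First player summary.", "Second player summary."],
)
print(body)
```

Serializing each line with `json.dumps` also takes care of escaping quotes and apostrophes inside the summaries.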

Deploy the LLM for response generation

Use the following code to deploy the LLM for response generation. Modify the values of host, region, and roleArn in the provided code block.

Create a connector by running the following Python program. Run the script using the credentials of the IAM user created earlier.

import boto3
import requests 
from requests_aws4auth import AWS4Auth

host = 'https://search-test.us-east-1.es.amazonaws.com/'
region = 'us-east-1'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

path = '_plugins/_ml/connectors/_create'
url = host + path

payload = {
  "name": "BedRock Claude instant-v1 Connector ",
  "description": "The connector to BedRock service for claude model",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock",
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens_to_sample": 8000,
    "temperature": 0.0001,
    "response_filter": "$.completion"
  },
   "credential": {
        "roleArn": "arn:aws:iam::accountId:role/opensearch_bedrock_external"
    },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/anthropic.claude-instant-v1/invoke",
      "headers": {
        "content-type": "application/json",
        "x-amz-content-sha256": "required"
      },
      "request_body": "{\"prompt\":\"${parameters.prompt}\", \"max_tokens_to_sample\":${parameters.max_tokens_to_sample}, \"temperature\":${parameters.temperature},  \"anthropic_version\":\"${parameters.anthropic_version}\" }"
    }
  ]
 }
    

headers = {"Content-Type": "application/json"}

r = requests.post(url, auth=awsauth, json=payload, headers=headers, timeout=15)
print(r.status_code)
print(r.text)

If the script runs successfully, it returns a 200 response code and a connector_id:

200
{"connector_id":"LhLSZ5MBLD0avmh1El6Q"}

Create a model group for this model:

POST /_plugins/_ml/model_groups/_register
{
    "name": "claude_model_group",
    "description": "This is an example description"
}

This will return model_group_id; make a note of it:

{
  "model_group_id": "LxLTZ5MBLD0avmh1wV4L",
  "status": "CREATED"
}

Register the model using connector_id and model_group_id:

POST /_plugins/_ml/models/_register
{
    "name": "anthropic.claude-v1",
    "function_name": "remote",
    "model_group_id": "LxLTZ5MBLD0avmh1wV4L",
    "description": "LLM model",
    "connector_id": "LhLSZ5MBLD0avmh1El6Q",
    "interface": {}
}

It will return model_id and task_id:

{
  "task_id": "YvbVZ5MBtVAPFbeA7ou7",
  "status": "CREATED",
  "model_id": "Y_bVZ5MBtVAPFbeA7ovb"
}

Finally, deploy the model using an API:

POST /_plugins/_ml/models/Y_bVZ5MBtVAPFbeA7ovb/_deploy

The status will show as COMPLETED. That means the model is successfully deployed.

{
  "task_id": "efbvZ5MBtVAPFbeA7otB",
  "task_type": "DEPLOY_MODEL",
  "status": "COMPLETED"
}

Create an agent in OpenSearch Service

An agent orchestrates and runs ML models and tools. A tool performs a set of specific tasks. For this post, we use the following tools:

VectorDBTool – The agent uses this tool to retrieve OpenSearch documents relevant to the user question
MLModelTool – This tool generates user responses based on prompts and OpenSearch documents

Use the embedding model_id in VectorDBTool and LLM model_id in MLModelTool:

POST /_plugins/_ml/agents/_register
{
    "name": "cricket score data analysis agent",
    "type": "conversational_flow",
    "description": "This is a demo agent for cricket data analysis",
    "app_type": "rag",
    "memory": {
        "type": "conversation_index"
    },
    "tools": [
        {
            "type": "VectorDBTool",
            "name": "cricket_knowledge_base",
            "parameters": {
                "model_id": "2bB265EByVCe3QrFAb60",
                "index": "cricket_data",
                "embedding_field": "cricket_score_embedding",
                "source_field": [
                    "cricket_score"
                ],
                "input": "${parameters.question}"
            }
        },
        {
            "type": "MLModelTool",
            "name": "bedrock_claude_model",
            "description": "A general tool to answer any question",
            "parameters": {
                "model_id": "Y_bVZ5MBtVAPFbeA7ovb",
                "prompt": "\n\nHuman:You are a professional data analyst. You will always answer question based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say don't know. \n\nContext:\n${parameters.cricket_knowledge_base.output:-}\n\n${parameters.chat_history:-}\n\nHuman:${parameters.question}\n\nAssistant:"
            }
        }
    ]
}

This returns an agent ID; take note of the agent ID, which will be used in subsequent APIs.

Query the index

We have batting scores of four batsmen in the index. For the first query, let’s specify the player name:

POST /_plugins/_ml/agents/<agent ID>/_execute
{
    "parameters": {
        "question": "What is batting score of Sachin Tendulkar ?"
    }
}

Based on context and available information, it returns the batting score of Sachin Tendulkar. Note the memory_id from the response; you will need it for subsequent questions in the next steps.

We can ask a follow-up question. This time, we don’t specify the player name and expect it to answer based on the earlier question:

POST /_plugins/_ml/agents/<agent ID>/_execute
{
    "parameters": {
        "question": "How many T20 international matches did he play?",
        "next_action": "then compare with Virat Kohli's score",
        "memory_id": "so-vAJMByVCe3QrFYO7j",
        "message_history_limit": 5,
        "prompt": "\n\nHuman:You are a professional data analyst. You will always answer question based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say don't know. \n\nContext:\n${parameters.cricket_knowledge_base.output:-}\n\n${parameters.chat_history:-}\n\nHuman:always learn useful information from chat history\nHuman:${parameters.question}, ${parameters.next_action}\n\nAssistant:"
    }
}

In the preceding API, we use the following parameters:

question and next_action – The question is the follow-up query. The next_action adds a further instruction, in this case to compare Sachin’s score with Virat’s score.
memory_id – This is the ID of the memory assigned to this conversation. Use the same memory_id for subsequent questions.
prompt – This is the prompt you give to the LLM. It includes the user’s question and the next action. It instructs the LLM to answer from the data indexed in OpenSearch and to say it doesn’t know rather than invent information, which reduces the risk of hallucination.

Refer to ML Model tool for more details about setting up these parameters and the GitHub repo for blueprints for remote inferences.

The agent stores the conversation history of the questions and answers in an OpenSearch index and uses it to answer follow-up questions with more refined responses.

In real-world scenarios, you can map memory_id against the user’s profile to preserve the context and isolate the user’s conversation history.
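As a minimal sketch of that mapping, the following Python keeps one memory_id per user and reuses it when building the next _execute request body. The class name, the in-process dictionary, and the user IDs are hypothetical. The response shape read by record_response (an inference_results list whose output items carry a name and a result) is an assumption about the agent response and should be checked against your domain's actual output.

```python
from typing import Optional


class ConversationMemory:
    """Remembers which memory_id belongs to which user (in-process only)."""

    def __init__(self) -> None:
        self._memory_by_user = {}

    def build_request(self, user_id: str, question: str) -> dict:
        """Build an _execute request body, reusing the user's memory_id if known."""
        parameters = {"question": question}
        memory_id = self._memory_by_user.get(user_id)
        if memory_id is not None:
            parameters["memory_id"] = memory_id
        return {"parameters": parameters}

    def record_response(self, user_id: str, response: dict) -> Optional[str]:
        """Store the memory_id found in an agent response (assumed shape)."""
        for result in response.get("inference_results", []):
            for item in result.get("output", []):
                if item.get("name") == "memory_id":
                    self._memory_by_user[user_id] = item["result"]
                    return item["result"]
        return None
```

In a real application the dictionary would be replaced by a persistent store keyed on the user's profile, so that each user's follow-up questions only ever see their own history.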

We have demonstrated how to create a conversational search application using the built-in features of OpenSearch Service.

Clean up

To avoid incurring future charges, delete the resources created while building this solution.

Conclusion

In this post, we demonstrated how to use OpenSearch agents and tools to create a RAG pipeline with conversational search. By integrating with ML models, vectorizing questions, and interacting with LLMs to improve prompts, this configuration oversees the entire process. This method allows you to quickly develop AI assistants that are ready for production without having to start from scratch.

If you’re building a RAG pipeline with conversational history to let users ask follow-up questions for more refined answers, give it a try and share your feedback or questions in the comments!

About the author

Bharav Patel is a Specialist Solution Architect, Analytics at Amazon Web Services. He primarily works on Amazon OpenSearch Service and helps customers with key concepts and design principles of running OpenSearch workloads on the cloud. Bharav likes to explore new places and try out different cuisines.

Enhance stability with dedicated cluster manager nodes using Amazon OpenSearch Service

2025-07-03 Chinmayi Narasimhadevara

Post Syndicated from Chinmayi Narasimhadevara original https://aws.amazon.com/blogs/big-data/enhance-stability-with-dedicated-cluster-manager-nodes-using-amazon-opensearch-service/

Amazon OpenSearch Service is a managed service that you can use to secure, deploy, and operate OpenSearch clusters at scale in the AWS Cloud. With OpenSearch Service, you can configure clusters with different types of node options such as data nodes, dedicated cluster manager nodes, dedicated coordinator nodes, and UltraWarm nodes. When configuring your OpenSearch Service domain, you can exercise different node options to manage your cluster’s overall stability, performance, and resiliency.

In this post, we show how to enhance the stability of your OpenSearch Service domain with dedicated cluster manager nodes and how using these in deployment enhances your cluster’s stability and reliability.

The benefit of dedicated cluster manager nodes

A dedicated cluster manager node handles the behind-the-scenes work of running an OpenSearch Service cluster, but it doesn’t store data or process search requests. It is responsible for several key tasks: keeping track of all the data nodes in the cluster, knowing how many indexes and shards there are and where they’re located, and routing data to the correct places. It also updates and shares the cluster state whenever something changes, such as when an index is created or nodes are added or removed.

In the absence of dedicated cluster manager nodes, OpenSearch Service uses data nodes for cluster management. Combining these responsibilities on the data nodes can impact performance and stability because data operations (like indexing and searching) compete with critical cluster management tasks for computing resources. When traffic gets heavy, the node acting as cluster manager can become overloaded and unresponsive. If this happens, your cluster will not respond to write requests until it elects a new cluster manager, at which point the cycle might repeat itself. Deploying dedicated cluster manager nodes alleviates this issue: separating the duties of the manager node from those of the data nodes results in a much more stable cluster.

Calculating the number of dedicated cluster manager nodes

In OpenSearch Service, a single node is elected as the cluster manager from all eligible nodes through a quorum-based voting process. The elected node then coordinates cluster-wide operations and maintains the cluster’s state. Quorum is the minimum number of nodes that need to agree before the cluster makes important decisions. It helps keep your data consistent and your cluster running smoothly. When you use dedicated cluster manager nodes, only those nodes are eligible for election, and OpenSearch Service sets the quorum to half of the nodes, rounded down to the nearest whole number, plus one.

You can choose three or five dedicated cluster manager nodes. OpenSearch Service explicitly prohibits a single dedicated cluster manager node because you would have no backup in the event of a failure. We recommend three dedicated cluster manager nodes for production use cases: three nodes give you two backup nodes if the active cluster manager fails, and the remaining two nodes still form the necessary quorum (two) to elect a new manager. Multi-AZ with standby is an OpenSearch Service feature designed to deliver four 9s of availability using a third AWS Availability Zone as a standby. When you use Multi-AZ with standby, the service requires three dedicated cluster manager nodes. If you deploy with Multi-AZ without standby or Single-AZ, we still recommend three dedicated cluster manager nodes.

Having five dedicated cluster manager nodes works as well as three, and you can lose two nodes while maintaining a quorum. But because only one dedicated cluster manager node is active at any given time, this configuration means you pay for four idle nodes.
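The quorum rule described above (half of the nodes, rounded down, plus one) can be worked through in a few lines of Python. This is only an arithmetic illustration; the function names are made up for this sketch and are not part of any OpenSearch API.

```python
def quorum(manager_nodes: int) -> int:
    """Half of the dedicated cluster manager nodes, rounded down, plus one."""
    return manager_nodes // 2 + 1


def tolerated_failures(manager_nodes: int) -> int:
    """How many manager nodes can fail while a quorum can still be reached."""
    return manager_nodes - quorum(manager_nodes)


for count in (3, 5):
    print(count, quorum(count), tolerated_failures(count))
```

With three nodes the quorum is two and one failure is tolerated. With five nodes the quorum is three and two failures are tolerated, which matches the trade-off described above.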

Cluster manager node configurations for different domain creation methods

This section explains the resources each domain creation method and template deploy when you set up an OpenSearch Service domain.

With the Easy create option, you can quickly create a domain that uses Multi-AZ with standby for high availability, with three dedicated cluster manager nodes distributed across three Availability Zones. The following table summarizes the configuration.

Domain Creation Method	Output
Easy create	Dedicated cluster manager node: Yes; Number of cluster manager nodes: 3; Availability Zones: 3; Standby: Yes

The Standard create option provides templates for ‘Production’ and ‘Dev/test’ workloads. Both templates offer a choice between the Domain with standby and Domain without standby deployment options. The following table summarizes these configuration options.

Domain Creation Method	Template	Deployment Option	Output
Standard create	Production	Domain with standby	Requires dedicated cluster manager node; Number of cluster manager nodes: 3; Availability Zones: 3; Standby: Yes; Instance type choice: Yes
Standard create	Production	Domain without standby	Requires dedicated cluster manager node; Number of cluster manager nodes: 3 or 5; Availability Zones: 3; Standby: No; Instance type choice: Yes
Standard create	Dev/test	Domain with standby	Requires dedicated cluster manager node; Number of cluster manager nodes: 3; Availability Zones: 3; Standby: Yes; Instance type choice: Yes
Standard create	Dev/test	Domain without standby	Does not require dedicated cluster manager node

Choosing a dedicated cluster manager instance type

Dedicated cluster manager instances handle critical cluster operations such as shard distribution and index management, and they track cluster state changes. Because they don’t store data or serve search requests, a comparatively smaller instance type than your data nodes is usually sufficient. Refer to Choosing instance types for dedicated master nodes for more information on instance types for dedicated cluster manager nodes.

You should expect to occasionally adjust cluster manager instance size and type as your workload evolves over time. As with all scale questions, you need to monitor performance and make sure you have enough CPU and Java virtual machine (JVM) heap for your dedicated cluster managers. We recommend using Amazon CloudWatch alarms to monitor the following CloudWatch metrics, and adjust according to the alarm state:

MasterCPUUtilization – Maximum is greater than or equal to 50% for 15 minutes, three consecutive times
MasterJVMMemoryPressure – Maximum is greater than or equal to 95% for 1 minute, three consecutive times
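The two alarm conditions can be written down as CloudWatch alarm definitions. The following sketch only builds the parameter dictionaries and does not call AWS. It assumes the metrics are published in the AWS/ES namespace with DomainName and ClientId dimensions, and under the names MasterCPUUtilization and MasterJVMMemoryPressure, which still use the older "master" term. The alarm names, domain name, and account ID are placeholders.

```python
def manager_alarm_definitions(domain_name: str, account_id: str) -> list:
    """Build parameter sets for the two recommended cluster manager alarms."""
    common = {
        "Namespace": "AWS/ES",
        "Statistic": "Maximum",
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "EvaluationPeriods": 3,  # three consecutive times
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
    }
    return [
        {
            **common,
            "AlarmName": f"{domain_name}-manager-cpu",  # placeholder name
            "MetricName": "MasterCPUUtilization",
            "Threshold": 50.0,
            "Period": 900,  # 15 minutes, in seconds
        },
        {
            **common,
            "AlarmName": f"{domain_name}-manager-jvm",  # placeholder name
            "MetricName": "MasterJVMMemoryPressure",
            "Threshold": 95.0,
            "Period": 60,  # 1 minute, in seconds
        },
    ]


# With boto3 installed and credentials configured, each definition could be
# passed to CloudWatch like this:
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   for definition in manager_alarm_definitions("my-domain", "123456789012"):
#       cloudwatch.put_metric_alarm(**definition)
```

Keeping the definitions as plain data makes it easy to review thresholds in code review before any alarm is created.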

Conclusion

Dedicated cluster manager nodes provide added stability and protection against split-brain situations, can be of a different instance type than data nodes, and are an obvious benefit when OpenSearch Service is backing mission-critical applications for production workloads. They are typically not required for development workloads like proof of concept because the cost of running a dedicated cluster manager node exceeds the tangible benefits of keeping the cluster up and running. To learn more about OpenSearch best practices, see link.

About the authors

Imtiaz (Taz) Sayed is the WW Tech Leader for Analytics at AWS. He enjoys engaging with the community on all things data and analytics. He can be reached through LinkedIn.

Chinmayi Narasimhadevara is a Senior Solutions Architect focused on Data Analytics and AI at AWS. She helps customers build advanced, highly scalable, and performant solutions.

Kaltura reduces observability operational costs by 60% with Amazon OpenSearch Service

2025-07-03 Ido Ziv

Post Syndicated from Ido Ziv original https://aws.amazon.com/blogs/big-data/kaltura-reduces-observability-operational-costs-by-60-with-amazon-opensearch-service/

This post is co-written with Ido Ziv from Kaltura.

As organizations grow, managing observability across multiple teams and applications becomes increasingly complex. Logs, metrics, and traces generate vast amounts of data, making it challenging to maintain performance, reliability, and cost-efficiency.

At Kaltura, an AI-infused video-first company serving millions of users across hundreds of applications, observability is mission-critical. Understanding system behavior at scale isn’t just about troubleshooting—it’s about providing seamless experiences for customers and employees alike. But achieving effective observability at this scale comes with challenges: managing spans; correlating logs, traces, and events across distributed systems; and maintaining visibility without overwhelming teams with noise. Balancing granularity, cost, and actionable insights requires constant tuning and thoughtful architecture.

In this post, we share how Kaltura transformed its observability strategy and technological stack by migrating from a software as a service (SaaS) logging solution to Amazon OpenSearch Service—achieving higher log retention, a 60% reduction in cost, and a centralized platform that empowers multiple teams with real-time insights.

Observability challenges at scale

Kaltura ingests over 8 TB of logs and traces daily, processing more than 20 billion events across 6 production AWS Regions and over 200 applications, with log spikes reaching up to 6 GB per second. This immense data volume, combined with a highly distributed architecture, created significant challenges in observability. Historically, Kaltura relied on a SaaS-based observability solution that met initial requirements but became increasingly difficult to scale. As the platform evolved, teams generated disparate log formats, applied retention policies that no longer reflected data value, and operated more than 10 organically grown observability sources. The lack of standardization and visibility required extensive manual effort to correlate data, maintain pipelines, and troubleshoot issues – leading to rising operational complexity and fixed costs that didn’t scale efficiently with usage.

Kaltura’s DevOps team recognized the need to reassess their observability solution and began exploring a variety of options, from self-managed platforms to fully managed SaaS offerings. After a comprehensive evaluation, they made the strategic decision to migrate to OpenSearch Service, using its advanced features such as Amazon OpenSearch Ingestion, the Observability plugin, UltraWarm storage, and Index State Management.

Solution overview

Kaltura created a new AWS account that would be a dedicated observability account, where OpenSearch Service was deployed. Logs and traces were collected from different accounts and producers such as microservices on Amazon Elastic Kubernetes Service (Amazon EKS) and services running on Amazon Elastic Compute Cloud (Amazon EC2).

By using AWS services such as AWS Identity and Access Management (IAM), AWS Key Management Service (AWS KMS), and Amazon CloudWatch, Kaltura was able to meet the standards to create a production-grade system while keeping security and reliability in mind. The following figure shows a high-level design of the environment setup.

Ingestion

As seen in the following diagram, logs are shipped using log shippers, also known as collectors. In Kaltura’s case, they used Fluent Bit. A log shipper is a tool designed to collect, process, and transport log data from various sources to a centralized location, such as log analytics platforms, management systems, or an aggregator system. Fluent Bit was used in all sources and also provided light processing abilities. Fluent Bit was deployed as a daemonset in Kubernetes. The application development teams didn’t change their code, because the Fluent Bit pods were reading the stdout of the application pods.

The following code is an example of a Fluent Bit configuration for Amazon EKS:

[INPUT]
   Name                tail
   Path                /var/log/containers/*.log
   Tag                 kube.*
   Skip_Long_Lines     On
   multiline.parser    docker, cri
[FILTER]
   alias               k8s
   # kubernetes filter to parse all logs
   Name                kubernetes
   Match               kube.*
   Kube_Tag_Prefix     kube.var.log.containers.
   Annotations         On
   Labels              Off
   Merge_Log           On
   Keep_Log            Off
   Kube_URL            https://kubernetes.default.svc.cluster.local:443 
[FILTER]
   alias               apps
   Name                rewrite_tag
   Match               kube.*
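   # A rewrite_tag rule takes four fields: KEY REGEX NEW_TAG KEEP.
   # The new tag and keep flag are not shown in this excerpt.
   Rule                $kubernetes['annotations']['kaltura.com/observability'] ^apps$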
[OUTPUT]
   Name                http
   Match               apps.*
   Alias               apps
   Host                xxxxx.us-east-1.osis.amazonaws.com
   Port                443
   URI                 /log/apps
   Format              json
   aws_auth            true
   aws_region          us-east-1
   aws_service         osis
   aws_role_arn        arn:aws:iam::xxxxx:role/osis-ingestion-role
   Log_Level           trace
   tls                 On

Spans and traces were collected directly from the application layer using a seamless integration approach. To facilitate this, Kaltura deployed an OpenTelemetry Collector (OTEL) using the OpenTelemetry Operator for Kubernetes. Additionally, the team developed a custom OTEL code library, which was incorporated into the application code to efficiently capture and log traces and spans, providing comprehensive observability across their system.

Data from Fluent Bit and OpenTelemetry Collector was sent to OpenSearch Ingestion, a fully managed, serverless data collector that delivers real-time log, metric, and trace data to OpenSearch Service domains and Amazon OpenSearch Serverless collections. Each producer sent data to a specific pipeline, one for logs and one for traces, where data was transformed, aggregated, enriched, and normalized before being sent to OpenSearch Service. The trace pipeline used the otel_trace and service_map processors, while using the OpenSearch Ingestion OpenTelemetry trace analytics blueprint.

The following code is an example of the OpenSearch Ingestion pipeline for logs:

version: "2"
entry-pipeline:
 source:
   http:
     path: "/log/apps"

 processor:
   - add_entries:
       entries:
       - key: "log_type"
         value: "default"
       - key: "log_type"
         value: "api"
         add_when: 'contains(/filename, "api.log")'
         overwrite_if_key_exists: true
       - key: "log_type"
         value: "stats"
         add_when: 'contains(/filename, "stats.log")'
         overwrite_if_key_exists: true
       - key: "log_type"
         value: "event"
         add_when: 'contains(/filename, "event.log")'
         overwrite_if_key_exists: true
       - key: "log_type"
         value: "login"
         add_when: 'contains(/filename, "login.log")'
         overwrite_if_key_exists: true

   - grok:
       grok_when: '/log_type == "api"'
       match:
         log: ['^\[%%{DATA:timestamp}] \[%%{DATA:logIp}\] \[%%{DATA:host}\] \[%%{WORD:id}\] %%{WORD:priorityName}\(%%{NUMBER:priority}\): \[memory: %%{DATA:memory} MB, real: %%{DATA:real}MB\] %%{GREEDYDATA:message}']

   - date:
       match:
         - key: timestamp
           patterns: ["dd-MMM-yyyy HH:mm:ss", "dd/MMM/yyyy:HH:mm:ss Z", "EEE MMM dd HH:mm:ss.SSSSSS yyyy"]

       destination: "@timestamp"
       output_format: "yyyy-MM-dd'T'HH:mm:ss"

   - rename_keys:
       entries:
       - from_key: "timestamp"
         to_key: "@timestamp"
         overwrite_if_to_key_exists: false
       - from_key: "date"
         to_key: "@timestamp"
         overwrite_if_to_key_exists: false

   - drop_events:
       drop_when: 'contains(/filename, "simplesamlphp.log")'


 sink:
   - opensearch:
       hosts: ["${opensearch_host}"]
       index: '$${/env}-api-$${/log_type}-app-logs'
       index_type: custom
       action: create
       bulk_size: 20
       aws:
         sts_role_arn: ${sts_role_arn}
         region:  ${region}
       dlq:
         s3:
           bucket: "${bucket}"
           key_path_prefix: 'my-app-dlq-files'
           region: "${region}"
           sts_role_arn: "${sts_role_arn}"

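The preceding example shows the use of processors such as grok, date, add_entries, rename_keys, and drop_events. The ${...} placeholders (such as ${opensearch_host}) and the doubled %%{ and $${ sequences suggest the file is rendered from a template, for example with Terraform’s templatefile function, where %%{ and $${ are escapes that produce a literal %{ and ${ in the final pipeline. The processors do the following: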

add_entries:
- Adds a new field log_type based on filename
- Default: “default”
- If the filename contains specific substrings (such as api.log or stats.log), it assigns a more specific type
grok:
- Applies Grok parsing to logs of type “api”
- Extracts fields like timestamp, logIp, host, priorityName, priority, memory, real, and message using a custom pattern
date:
- Parses timestamp strings into a standard datetime format
- Stores it in a field called @timestamp based on ISO8601 format
- Handles multiple timestamp patterns
rename_keys:
- Renames timestamp or date to @timestamp
- Does not overwrite if @timestamp already exists
drop_events:
- Drops logs where filename contains simplesamlphp.log
- This is a filtering rule to ignore noisy or irrelevant logs
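To make the processor chain concrete, the following Python approximates the add_entries, grok, date, and drop_events steps on a single event. It is a sketch, not how OpenSearch Ingestion runs the pipeline: the regular expression is a hand translation of the grok pattern, only the first date pattern is handled, the original log and filename fields are kept, and the process function and event dictionaries are hypothetical.

```python
import re
from datetime import datetime

# Filename markers, checked in the same order as the add_entries rules.
LOG_TYPE_MARKERS = [
    ("api.log", "api"),
    ("stats.log", "stats"),
    ("event.log", "event"),
    ("login.log", "login"),
]

# Rough regex equivalent of the grok pattern (DATA -> .*?, WORD -> \w+,
# NUMBER simplified to digits, GREEDYDATA -> .*).
API_LOG = re.compile(
    r"^\[(?P<timestamp>.*?)\] \[(?P<logIp>.*?)\] \[(?P<host>.*?)\] "
    r"\[(?P<id>\w+)\] (?P<priorityName>\w+)\((?P<priority>\d+)\): "
    r"\[memory: (?P<memory>.*?) MB, real: (?P<real>.*?)MB\] (?P<message>.*)$"
)


def process(event: dict):
    """Return the transformed event, or None if the event is dropped."""
    filename = event.get("filename", "")
    if "simplesamlphp.log" in filename:  # drop_events
        return None

    out = dict(event)
    out["log_type"] = "default"  # add_entries: default, then overrides
    for marker, log_type in LOG_TYPE_MARKERS:
        if marker in filename:
            out["log_type"] = log_type

    if out["log_type"] == "api":  # grok is applied to api logs only
        match = API_LOG.match(out.get("log", ""))
        if match:
            out.update(match.groupdict())

    if "timestamp" in out:
        # date: first pattern only; %b assumes English month abbreviations.
        parsed = datetime.strptime(out.pop("timestamp"), "%d-%b-%Y %H:%M:%S")
        out["@timestamp"] = parsed.strftime("%Y-%m-%dT%H:%M:%S")
    return out
```

Running a few representative lines through a local approximation like this can be a quick way to check field names and timestamp formats before changing the pipeline itself.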

The following is an example of the input of a log line:

   "log": "[25-Mar-2025 18:23:18] [127.0.0.1] [the-most-awesome-server-in-kaltura] [67e2f496cc321] INFO(6): [memory: 4.51 MB, real: 6MB] [request: 1] [time: 0.0263s / total: 0.0263s]",

After processing, we get the following output:

    "log_type": "api",
    "priorityName": "INFO",
    "memory": "4.51",
    "host": "the-most-awesome-server-in-kaltura",
    "real": "6",
    "priority": "6",
    "message": "[request: 1] [time: 0.0263s / total: 0.0263s]",
    "logIp": "127.0.0.1",
    "id": "67e2f496cc321",
    "@timestamp": "2025-03-25T18:23:18"

Kaltura followed some OpenSearch Ingestion best practices, such as:

Including a dead-letter queue (DLQ) in pipeline configuration. This can significantly help troubleshoot pipeline issues.
Starting and stopping pipelines to optimize cost-efficiency, when possible.
During the proof of concept stage:
- Installing Data Prepper locally for faster development iterations.
- Disabling persistent buffering to expedite blue-green deployments.

Achieving operational excellence with efficient log and trace management

Logs and traces play a vital role in identifying operational issues, but they come with unique challenges. First, they represent time series data, which inherently evolves over time. Second, their value typically diminishes as time passes, making efficient management crucial. Third, they are append-only in nature. With OpenSearch, Kaltura faced distinct trade-offs between cost, data retention, and latency. The goal was to make sure valuable data remained accessible to engineering teams with minimal latency, but the solution also needed to be cost-effective. Balancing these factors required thoughtful planning and optimization.

Data was ingested to OpenSearch data streams, which simplifies the process of ingesting append-only time series data. Several Index State Management (ISM) policies were applied to different data streams, which were dependent on log retention requirements. ISM policies handled moving indexes from hot storage to UltraWarm, and eventually deleting the indexes. This allowed a customizable and cost-effective solution, with low latency for querying new data and reasonable latency for querying historical data.

The following example ISM policy rolls over an index when a primary shard reaches 30 GB, moves indexes to UltraWarm after 2 days and to cold storage after 14 days, and deletes them after 60 days. If an action fails, it is retried with an exponential backoff strategy. In case of failures, notifications are sent to relevant teams to keep them informed; the notification settings are not shown in this example.

{
    "id": "retention",
    "policy": {
        "description": "production ISM",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {
                        "retry": {
                            "count": 5,
                            "backoff": "exponential",
                            "delay": "1h"
                        },
                        "rollover": {
                            "min_primary_shard_size": "30gb",
                            "copy_alias": false
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "warm",
                        "conditions": {
                            "min_index_age": "2d"
                        }
                    }
                ]
            },
            {
                "name": "warm",
                "actions": [
                    {
                        "retry": {
                            "count": 5,
                            "backoff": "exponential",
                            "delay": "1h"
                        },
                        "warm_migration": {}
                    }
                ],
                "transitions": [
                    {
                        "state_name": "cold",
                        "conditions": {
                            "min_index_age": "14d"
                        }
                    }
                ]
            },
            {
                "name": "cold",
                "actions": [
                    {
                        "retry": {
                            "count": 5,
                            "backoff": "exponential",
                            "delay": "1h"
                        },
                        "cold_migration": {
                            "start_time": null,
                            "end_time": null,
                            "timestamp_field": "@timestamp",
                            "ignore": "none"
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "60d"
                        }
                    }
                ]
            },
            {
                "name": "delete",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "cold_delete": {}
                    }
                ],
                "transitions": []
            }
        ],
        "ism_template": [
            {
                "index_patterns": [
                    "*-logs"
                ],
                "priority": 50
            }
        ]
    }
}
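The age-based transitions in the policy can be summarized with a small Python function. This is a simplified model for reasoning about retention, not ISM itself: it looks only at the min_index_age conditions, ignores rollover and retries, and assumes transitions happen as soon as the age threshold is reached, whereas ISM evaluates policies periodically. The function and constant names are hypothetical.

```python
# (current state, min_index_age in days, next state), taken from the policy.
TRANSITIONS = [
    ("hot", 2, "warm"),
    ("warm", 14, "cold"),
    ("cold", 60, "delete"),
]


def expected_state(index_age_days: float) -> str:
    """Return the state an index of the given age is expected to have reached."""
    state = "hot"  # default_state
    for current, min_age_days, next_state in TRANSITIONS:
        if state == current and index_age_days >= min_age_days:
            state = next_state
    return state


for age in (1, 2, 14, 59, 60):
    print(age, expected_state(age))
```

Under this model an index spends about 2 days on hot storage, 12 days in UltraWarm, and 46 days in cold storage before it is deleted.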

To create a data stream in OpenSearch, an index template is required, which configures how the data stream and its backing indexes will behave. In the following example, the index template specifies key index settings such as the number of shards, the number of replicas, and the refresh interval, which control how data is distributed, replicated, and refreshed across the cluster. It also defines the mappings, which describe the structure of the data: what fields exist, their types, and how they should be indexed. These mappings make sure the data stream knows how to interpret and store incoming log data efficiently. Finally, the template sets the @timestamp field as the time-based field required for a data stream.

{
  "index_patterns": [
    "*my-app-logs"
  ],
  "template": {
    "settings": {
      "index.number_of_shards": "32",
      "index.number_of_replicas": "0",
      "index.refresh_interval": "60s"
    },
    "mappings": {
      "properties": {
        "priorityName": {
          "type": "keyword"
        },
        "log_type": {
          "type": "keyword"
        },
        "@timestamp": {
          "type": "date"
        },
        "memory": {
          "type": "float"
        },
        "host": {
          "type": "keyword"
        },
        "pid": {
          "type": "keyword"
        },
        "real": {
          "type": "float"
        },
        "env": {
          "type": "keyword"
        },
        "message": {
          "type": "text"
        },
        "priority": {
          "type": "integer"
        },
        "logIp": {
          "type": "ip"
        }
      }
    }
  },
  "composed_of": [],
  "priority": "100",
  "_meta": {
    "flow": "simple"
  },
  "data_stream": {
    "timestamp_field": {
      "name": "@timestamp"
    }
  },
  "name": "my-app-logs"
}

Implementing role-based access control and user access

The new observability platform is accessed by many types of users; internal users log in to OpenSearch Dashboards using SAML-based federation with Okta. The following diagram illustrates the user flow.

Each user accesses the dashboards to view observability items relevant to their role. Fine-grained access control (FGAC) is enforced in OpenSearch using built-in IAM role and SAML group mappings to implement role-based access control (RBAC). When users log in to the OpenSearch domain, they are automatically routed to the appropriate tenant based on their assigned role. This setup makes sure developers can create dashboards tailored to debugging within development environments, and support teams can build dashboards focused on identifying and troubleshooting production issues. The SAML integration removes the need to manage internal OpenSearch users.

For each role in Kaltura, a corresponding OpenSearch role was created with only the necessary permissions. For instance, support engineers are granted access to the monitoring plugin to create alerts based on logs, whereas QA engineers, who don’t require this functionality, are not granted that access.

The following screenshot shows the role of the DevOps engineers defined with cluster permissions.

These users are routed to their own dedicated DevOps tenant, to which only they have write access. This makes it possible for users in different roles at Kaltura to create the dashboard items that focus on their priorities and needs. OpenSearch supports backend role mapping; Kaltura mapped each Okta group to the corresponding role, so when a user logs in through Okta, they are automatically assigned the role that matches their group.

This also works with IAM roles to facilitate automations in the cluster using external services, such as OpenSearch Ingestion pipelines, as can be seen in the following screenshot.
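Backend role mapping can be pictured as a lookup from the backend roles a principal presents (Okta groups for people, IAM role ARNs for services) to OpenSearch roles. The following Python is a conceptual sketch only. The role names, group names, and ARN are hypothetical, and in a real domain the mapping lives in the security plugin's role mappings rather than in application code.

```python
# OpenSearch role -> backend roles mapped to it (all names are made up).
ROLE_MAPPINGS = {
    "devops": {"okta-devops"},
    "support": {"okta-support"},
    "qa": {"okta-qa"},
    "ingestion_writer": {"arn:aws:iam::123456789012:role/osis-ingestion-role"},
}


def roles_for(backend_roles) -> list:
    """Return the OpenSearch roles granted to a principal's backend roles."""
    presented = set(backend_roles)
    return sorted(
        role for role, mapped in ROLE_MAPPINGS.items() if mapped & presented
    )


print(roles_for(["okta-devops"]))
print(roles_for(["arn:aws:iam::123456789012:role/osis-ingestion-role"]))
```

Because the mapping is keyed on groups and role ARNs rather than on individual users, adding a person to an Okta group is enough to grant the matching access.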

Using observability features and service mapping for enhanced trace and log correlation

After a user is logged in, they can use the Observability plugins, view surrounding events in logs, correlate logs and traces, and use the Trace Analytics plugin. Users can inspect traces and spans, and group traces with latency information using built-in dashboards. Users can also drill down to a specific trace or span and correlate it back to log events. The service_map processor used in OpenSearch Ingestion sends OpenTelemetry data to create a distributed service map for visualization in OpenSearch Dashboards.

Using the combined signals of traces and spans, OpenSearch discovers the application connectivity and maps them to a service map.

After OpenSearch ingests the traces and spans from the OpenTelemetry Collector, they are aggregated into groups according to paths and trends. Durations are also calculated and presented to the user over time.

With a trace ID, it’s possible to filter for all the relevant spans by service and see how long each took, which helps identify issues with external services such as MongoDB and Redis.

From the spans, users can discover the relevant logs.

Post-migration enhancements

After the migration, a strong developer community emerged within Kaltura that embraced the new observability solution. As adoption grew, so did requests for new features and enhancements aimed at improving the overall developer experience.

One key improvement was extending log retention. Kaltura achieved this by re-ingesting historical logs from Amazon Simple Storage Service (Amazon S3) using a dedicated OpenSearch Ingestion pipeline with Amazon S3 read permissions. With this enhancement, teams can access and analyze logs from up to a year ago using the same familiar dashboards and filters.

In addition to monitoring EKS clusters and EC2 instances, Kaltura expanded its observability stack by integrating more AWS services. Amazon API Gateway and AWS Lambda were introduced to support log ingestion from external vendors, allowing for seamless correlation with existing data and broader visibility across systems.

Finally, to empower teams and promote autonomy, data stream templates and ISM policies are managed directly by developers within their own repositories. By using infrastructure as code tools like Terraform, developers can define index mappings, alerts, and dashboards as code—versioned in Git and deployed consistently across environments.

Conclusion

Kaltura successfully implemented a smart log retention strategy, extending real-time retention from 5 days for all log types to 30 days for critical logs, while maintaining cost-efficiency through the use of UltraWarm nodes. This approach led to a 60% reduction in costs compared to their previous solution. Additionally, Kaltura consolidated their observability platform, streamlining operations by merging 10 separate systems into a unified, all-in-one solution. This consolidation not only improved operational efficiency but also sparked increased engagement from developer teams, driving feature requests, fostering internal design collaborations, and attracting early adopters for new enhancements. If Kaltura’s journey has inspired you and you’re thinking about implementing a similar solution in your organization, consider these steps:

Start by understanding the requirements and setting expectations with the engineering teams in your organization
Start with a quick proof of concept to get hands-on experience
Refer to the following resources to help you get started:

About the authors

Ido Ziv is a DevOps team leader in Kaltura with over 6 years of experience. His hobbies include sailing and Kubernetes (but not at the same time).

Roi Gamliel is a Senior Solutions Architect helping startups build on AWS. He is passionate about the OpenSearch Project, helping customers fine-tune their workloads and maximize results.

Yonatan Dolan is a Principal Analytics Specialist at Amazon Web Services. He is located in Israel and helps customers harness AWS analytical services to use data, gain insights, and derive value.

Amazon OpenSearch Service 101: Create your first search application with OpenSearch

2025-06-25 Sriharsha Subramanya Begolli

Post Syndicated from Sriharsha Subramanya Begolli original https://aws.amazon.com/blogs/big-data/amazon-opensearch-service-101-create-your-first-search-application-with-opensearch/

Organizations today face the challenge of managing and deriving insights from an ever-expanding universe of data in real time. Industrial Internet of Things (IoT) sensors stream millions of temperature, pressure, and performance metrics from field equipment every second. Ecommerce platforms need to surface relevant products from vast catalogs instantly. Security teams must analyze system logs in real time to detect threats. As data volumes grow, organizations increasingly struggle with fragmented monitoring tools that create critical visibility gaps and slow incident response times. The cost of commercial observability solutions becomes prohibitive, forcing teams to manage multiple separate tools and increasing both operational overhead and troubleshooting complexity. Across these diverse scenarios, the ability to efficiently search, analyze, and visualize data in real time has become crucial for business success.

Amazon OpenSearch Service addresses these challenges by providing a fully managed search and analytics service. This managed service configures, manages, and scales OpenSearch clusters so you can focus on your search workloads and end customers. Amazon OpenSearch Serverless further makes it straightforward to run search and log analytics workloads by automatically scaling compute and storage resources up and down to match your application’s demands—with no infrastructure to manage. Whether you’re processing continuous streams of IoT telemetry, enabling product discovery, or performing security analytics, OpenSearch Service scales to meet your needs.

In this post, we walk you through a search application building process using Amazon OpenSearch Service. Whether you’re a developer new to search or looking to understand OpenSearch fundamentals, this hands-on post shows you how to build a search application from scratch—starting with the initial setup; diving into core components such as indexing, querying, result presentation; and culminating in the execution of your first search query.

Components of OpenSearch Service

Before building your first search application, it’s important to understand some key architectural components in OpenSearch. The fundamental unit of information in OpenSearch is a document stored in JSON format. These documents are organized into indices—collections of related documents that function similar to database tables. When you search for information, OpenSearch queries these indices to find matching documents.

OpenSearch operates on a distributed architecture where multiple servers, called nodes, work together in a cluster or domain. Each cluster can utilize dedicated master nodes that focus solely on cluster management tasks, such as maintaining cluster state, managing indices, and orchestrating shard allocation. These specialized nodes enhance cluster stability by offloading cluster management duties from data nodes. Data nodes, on the other hand, handle the storage, indexing, and querying of data, essentially performing the heavy lifting of data operations. Together, they provide scalability, availability, and efficient data processing in the cluster. You can also configure dedicated coordinator nodes that specialize in routing and distributing search and indexing requests across the cluster. These nodes reduce the load on data nodes, which allows them to focus on data storage, indexing, and search operations.

Coordinator nodes in OpenSearch are most beneficial in the following scenarios:

Large cluster deployments – When managing substantial data volumes across many nodes.
Query-intensive workloads – Environments that handle frequent search queries or aggregations, especially those with complex date histograms or multiple aggregations, benefit from faster query processing.
Heavy dashboard utilization – OpenSearch Dashboards can be resource-intensive. Offloading this responsibility to dedicated coordinator nodes reduces the strain on data nodes.

To manage large datasets efficiently, OpenSearch splits indices into smaller pieces called shards. Each shard is distributed across the cluster, with a recommended size of 10–50 GB for optimal performance. For reliability and high availability, OpenSearch maintains replica copies of these shards on different nodes, which means that your data remains accessible even if some nodes fail.
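The shard-size guidance above can be turned into a quick back-of-the-envelope calculation. The following is a minimal sketch, not an official sizing formula: the function names and the 30 GB default target are assumptions chosen for illustration, picked from within the 10–50 GB range mentioned above.

```python
import math


def primary_shard_count(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate how many primary shards keep each shard near target_shard_gb.

    target_shard_gb is an illustrative default inside the 10-50 GB range.
    """
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    if not 10 <= target_shard_gb <= 50:
        raise ValueError("target_shard_gb should stay within 10-50 GB")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def total_shard_copies(primaries: int, replicas: int = 1) -> int:
    """Each primary shard has `replicas` additional copies on other nodes."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    primaries = primary_shard_count(400)  # a hypothetical 400 GB index
    print(primaries, total_shard_copies(primaries, replicas=1))
```

Treat the result as a starting point only; real sizing also depends on growth, query patterns, and node count.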

Search operations in OpenSearch are powered by inverted indices, a data structure that maps terms to the documents containing them. The BM25 ranking algorithm helps make sure that search results are relevant to users’ queries. Although searches happen in near real time, with configurable refresh intervals, individual document retrievals are immediate.
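To make the inverted index and BM25 ideas concrete, here is a toy sketch in plain Python. It is a simplified illustration of the general technique, not how OpenSearch implements it internally: the whitespace tokenizer, the sample documents, and the parameter values `K1 = 1.2` and `B = 0.75` (commonly used BM25 defaults) are all assumptions.

```python
import math
from collections import Counter, defaultdict

K1, B = 1.2, 0.75  # commonly used BM25 parameters (assumed for this sketch)


def tokenize(text):
    """Naive analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    postings = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            postings[term][doc_id] = tf
    return postings, lengths


def bm25_search(query, postings, lengths):
    """Score documents containing the query terms with classic BM25."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        matches = postings.get(term, {})
        if not matches:
            continue
        # Rare terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - len(matches) + 0.5) / (len(matches) + 0.5))
        for doc_id, tf in matches.items():
            # Longer-than-average documents are penalized through B.
            norm = tf + K1 * (1 - B + B * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (K1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    1: "coral running shoes",
    2: "ruby red running jacket for running",
    3: "blue denim jeans",
}
postings, lengths = build_index(docs)
results = bm25_search("running", postings, lengths)

if __name__ == "__main__":
    for doc_id, score in results:
        print(doc_id, round(score, 3))
```

Only documents listed in the postings for a query term are ever scored, which is why lookups stay fast as the corpus grows.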

This architecture provides the foundation for handling high-volume IoT data streams, complex full-text search operations, and real-time analytics, all while maintaining fault tolerance. Understanding these components will help you make informed decisions as you build your search application.

OpenSearch Dashboards is a visualization and analytics tool for exploring, analyzing, and visualizing data in real time. It provides an intuitive interface for querying, monitoring, and reporting on OpenSearch data using visualizations such as charts, graphs, and maps. Key features include interactive dashboards, alerting, anomaly detection, security monitoring, and trace analytics.

Sample Amazon OpenSearch Service tutorial application overview

The following architecture diagram demonstrates how to build and deploy a scalable, fully managed search application on Amazon Web Services (AWS). The architecture uses Amazon OpenSearch Service for indexing and searching data. The UI application is deployed on AWS App Runner and interacts with Amazon OpenSearch Service through secure serverless Amazon API Gateway and AWS Lambda.

Here is the end-to-end workflow for our application detailing how user requests are handled from initial access through to data retrieval or indexing:

Users access the application through AWS App Runner, which hosts the frontend interface.
Amazon Cognito handles user authentication and authorization for secure access to the application.
When users interact with the application, their requests are sent to API Gateway. API Gateway communicates with Amazon Cognito to verify user authentication status. It serves as the primary entry point for all API operations and routes the requests appropriately. It forwards requests to Lambda functions within the virtual private cloud (VPC).
Lambda functions process the requests, performing either:
Data indexing operations into OpenSearch Service
Search queries against the OpenSearch Service cluster
The OpenSearch Service cluster resides within a private subnet in a VPC for enhanced security.

Prerequisites

Before you deploy the solution, review the prerequisites.

Install the sample app

The entire infrastructure is deployed using AWS Cloud Development Kit (AWS CDK), with cluster configurations customizable through the cdk.json file on GitHub. This deployment approach provides consistent and repeatable infrastructure creation while maintaining security best practices. The steps to deploy this infrastructure are available in this README file. After deployment, you’ll access a comprehensive search application built with Cloudscape React components that includes:

Interactive search functionality – Test various OpenSearch query methods including prefix match keyword searches, phrase matching, fuzzy searches, and field-specific queries against the sample product dataset
Document management tools – Bulk index the product catalog with a single click or delete and recreate the index as needed for testing purposes
Educational resources – Access embedded guides explaining OpenSearch concepts, query syntax, and best practices

Index the documents

After you’ve deployed this search application, the first step is to index some documents into OpenSearch Service. Sign in to the search application UI and follow these steps:

To trigger a bulk index process, under Index Documents in the navigation pane, choose Bulk Index Product Catalog.
Choose Index Product catalog, as shown in the following screenshot.

The Lambda function indexes a comprehensive ecommerce product catalog into your newly created OpenSearch Service cluster. This sample dataset includes detailed fashion and lifestyle products spanning multiple categories. Each product record contains rich metadata, including title, detailed description, category, color, and price.
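Bulk indexing of this kind typically goes through the OpenSearch `_bulk` API, which expects newline-delimited JSON: an action line followed by the document itself. The sketch below shows how such a payload could be assembled. It is not the sample application's actual Lambda code; the index name `products`, the `id` key, and the sample records are assumptions for illustration.

```python
import json


def build_bulk_payload(index_name, products):
    """Build an NDJSON body for the _bulk API: one action line per document."""
    lines = []
    for product in products:
        # Action line: index this document under the given _id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": product["id"]}}))
        # Source line: the document fields without the id.
        source = {key: value for key, value in product.items() if key != "id"}
        lines.append(json.dumps(source))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


sample_products = [
    {"id": "1", "title": "Running Shoes", "description": "Lightweight trainers",
     "category": "Footwear", "color": "Coral", "price": 79.99},
    {"id": "2", "title": "Ruby Scarf", "description": "Soft knit scarf",
     "category": "Accessories", "color": "Red", "price": 24.50},
]
payload = build_bulk_payload("products", sample_products)

if __name__ == "__main__":
    print(payload)
```

The payload would be sent as the body of a `POST _bulk` request with the `Content-Type: application/x-ndjson` header.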

Keyword searches

OpenSearch Service offers multiple search features. For an exhaustive list, refer to Search features. We focus on a few keyword search types to help you get started with OpenSearch.

With the product catalog in OpenSearch, you can perform prefix searches through the search application’s intuitive interface. To better understand the search functionality, expand the Guide section at the top of the interface. This interactive guide explains how various kinds of searches work, complete with a practical example in the context of the product catalog dataset. The guide includes best practices and a link to the detailed documentation to help you make the most of OpenSearch’s powerful query capabilities.

You can do a prefix search on any of the three key search fields: Title, Description, or Color.

A typical prefix match query looks like this:

{
  "query": {
    "match_phrase_prefix": {
      "attribute_name": {
        "query": "attribute_value",
        "max_expansions": 10,
        "slop": 1
      }
    }
  }
}

You can use this query pattern to find documents in which a field contains a word that begins with your search text (for a multi-word query, the final word is treated as the prefix), offering an intuitive “starts with” search experience.
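If you generate this query from application code, a small helper keeps the structure in one place. This is a minimal sketch: the function name is hypothetical, and the defaults simply mirror the `max_expansions` and `slop` values shown in the query above.

```python
import json


def prefix_match_query(field, text, max_expansions=10, slop=1):
    """Build a match_phrase_prefix query body for one field."""
    if not text.strip():
        raise ValueError("search text must not be empty")
    return {
        "query": {
            "match_phrase_prefix": {
                field: {
                    "query": text,
                    # Upper bound on how many terms the prefix may expand to.
                    "max_expansions": max_expansions,
                    # How many positions apart the phrase terms may be.
                    "slop": slop,
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(prefix_match_query("title", "Ru"), indent=2))
```

The resulting body would be sent to the `_search` endpoint of the index that holds the catalog.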

The following image illustrates a practical example of the Prefix Match search. Entering “Ru” in the title field matches products with titles such as “Running”, “Runners” and “Ruby.” Prefix Match search is particularly useful when users only remember the beginning of a product name or are searching across multiple variations or simply exploring product categories.

Multi Match search enables searching across multiple fields simultaneously. For example, you can search for “Coral” across the product title, description, and color fields in a single query. The search query can be customized using field boosting, in which matches in certain fields carry more weight than others.

A typical multi match query looks like this:

{
  "query": {
    "multi_match": {
      "query": "Coral",
      "fields": [
        "title^3",
        "description",
        "color"
      ],
      "type": "best_fields"
    }
  }
}
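To see how the `^3` boost and the `best_fields` type interact, consider this simplified model of the scoring. It is only an illustration under stated assumptions: the per-field scores are made-up numbers, and the real engine computes them with BM25. The combination rule shown (best field wins, and other matching fields contribute only through an optional `tie_breaker`) reflects how `best_fields` is generally described.

```python
def parse_boost(field_spec):
    """Split 'title^3' into ('title', 3.0); a bare field name gets boost 1.0."""
    name, _, boost = field_spec.partition("^")
    return name, float(boost) if boost else 1.0


def best_fields_score(field_scores, field_specs, tie_breaker=0.0):
    """Combine per-field scores: the highest boosted score dominates."""
    boosted = []
    for spec in field_specs:
        name, boost = parse_boost(spec)
        if name in field_scores:
            boosted.append(field_scores[name] * boost)
    if not boosted:
        return 0.0
    best = max(boosted)
    # Other matching fields only count when tie_breaker is greater than 0.
    return best + tie_breaker * (sum(boosted) - best)


fields = ["title^3", "description", "color"]
# Hypothetical raw scores for a document that matches in title and color.
raw_scores = {"title": 1.0, "color": 2.0}

if __name__ == "__main__":
    print(best_fields_score(raw_scores, fields))
    print(best_fields_score(raw_scores, fields, tie_breaker=0.3))
```

In this model the title match wins despite its lower raw score, because the boost triples it.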

You can explore Wildcard Match, Range Filter, and other search features through the search application. For developers and administrators managing this search infrastructure, OpenSearch Dashboards is a native, developer-friendly interface for indexing, searching, and managing your data. It serves as a comprehensive control center where you can interact directly with your indices, test queries, and monitor performance in real time. The following screenshot shows OpenSearch Dashboards, which provides an interactive UI to explore, analyze, and visualize search and log data.

While our example demonstrates lexical search functionality on a sample product catalog, OpenSearch Service is equally powerful for observability use cases. When handling time-series data from logs, metrics, or traces, OpenSearch excels at real-time analytics and visualization. For instance, DevOps teams can index application logs and system telemetry data, then use date histograms and statistical aggregations to identify performance bottlenecks or security anomalies as they occur. This real-time search allows IT teams to detect and respond to incidents with minimal delay. Using OpenSearch Dashboards, teams can create live operational dashboards that update automatically as new data streams in. For IoT applications monitoring thousands of sensors, this means temperature anomalies or equipment failures can trigger immediate alerts through OpenSearch’s alerting capabilities. These observability workloads benefit from the same distributed architecture that powers our product search example, with the added advantage of time-series optimized indices and retention policies for managing high-volume streaming data efficiently.
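As an illustration of the date histogram and statistical aggregations mentioned above, the following sketch builds a query body that counts error logs per time bucket and summarizes latency within each bucket. The field names `@timestamp`, `level`, and `latency_ms` are assumptions about the log schema and are not part of the sample application.

```python
import json


def error_histogram_query(interval="1m", lookback="now-1h"):
    """Build a query that buckets recent ERROR logs over time."""
    return {
        "size": 0,  # return only aggregation results, not individual hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"@timestamp": {"gte": lookback}}},
                    {"term": {"level": "ERROR"}},
                ]
            }
        },
        "aggs": {
            "errors_over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval},
                # Nested stats aggregation: min, max, avg, sum, and count per bucket.
                "aggs": {"latency_stats": {"stats": {"field": "latency_ms"}}},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(error_histogram_query("5m"), indent=2))
```

A sudden jump in the bucket counts or in the average latency is the kind of signal a dashboard panel or alert monitor would watch for.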

Beyond search management, you can configure alerts for specific conditions, set up notification channels for operational events, and enable data discovery features. If you want to experiment with the same search queries we implemented in our application, you can launch OpenSearch Dashboards and use relevant index and search APIs from the Dev Tools section, which is an ideal environment for developing and testing before implementing in your production application. Because our OpenSearch Service cluster resides within a private subnet, you need to create a Secure Shell (SSH) tunnel to access the dashboard. For more information and steps to do this, refer to How do I use an SSH tunnel to access OpenSearch Dashboards with Amazon Cognito authentication from outside a VPC? in the Knowledge Center. So far, we’ve explored OpenSearch’s query domain-specific language (DSL). However, for those coming in from a traditional database background, OpenSearch also offers SQL and Piped Processing Language (PPL) functionality, making the transition smoother. You can explore more on this at SQL and PPL in the OpenSearch documentation.
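For readers coming from a database background, the sketch below shows roughly what equivalent SQL and PPL requests could look like. It assumes the catalog lives in an index named `products` with `title`, `price`, and `color` fields (an assumption about the sample data), and it builds statements by string formatting purely for illustration; untrusted input should not be interpolated this way.

```python
import json


def sql_request(index, color, limit=5):
    """Return the SQL plugin endpoint and request body for a simple filter."""
    statement = f"SELECT title, price FROM {index} WHERE color = '{color}' LIMIT {limit}"
    return "_plugins/_sql", {"query": statement}


def ppl_request(index, color, limit=5):
    """Return the PPL plugin endpoint and request body for the same filter."""
    statement = f"source = {index} | where color = '{color}' | fields title, price | head {limit}"
    return "_plugins/_ppl", {"query": statement}


if __name__ == "__main__":
    for path, body in (sql_request("products", "Coral"), ppl_request("products", "Coral")):
        print("POST", path)
        print(json.dumps(body, indent=2))
```

Both requests can be pasted into Dev Tools as `POST` calls to compare their results with the query DSL examples.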

In this post, we introduced you to different types of keyword searches. You can also store documents as vector embeddings in OpenSearch and use them for semantic search, hybrid search, multimodal search, or to implement the Retrieval Augmented Generation (RAG) pattern.

Conclusion

You can now build sample search applications by following the steps outlined in this post and the implementation details available at sample-for-amazon-opensearch-service-tutorials-101 on GitHub. By using the distributed architecture of Amazon OpenSearch Service, an AWS managed service, you get fast, scalable search capabilities that grow with your business, built-in security and compliance controls, and automated cluster management—all with pay-only-for-what-you-use pricing flexibility.

Ready to learn more? Check out the Amazon OpenSearch Service Developer Guide. For more insights, best practices and architectures, and industry trends, refer to Amazon OpenSearch Service blog posts and hands-on workshops at AWS Workshops. Please also visit the OpenSearch Service Migration Hub if you are ready to migrate legacy or self-managed workloads to OpenSearch Service.

We hope this detailed guide and accompanying code will help you get started. Try it out, let us know your thoughts in the comments section, and feel free to reach out to us for questions!

About the authors

Sriharsha Subramanya Begolli works as a Senior Solutions Architect with Amazon Web Services (AWS), based in Bengaluru, India. His primary focus is assisting large enterprise customers in modernizing their applications and developing cloud-based systems to meet their business objectives. His expertise lies in the domains of data and analytics.

Fraser Sequeira is a Startups Solutions Architect with Amazon Web Services (AWS) based in Melbourne, Australia. In his role at AWS, Fraser works closely with startups to design and build cloud-native solutions on AWS, with a focus on analytics and streaming workloads. With over 10 years of experience in cloud computing, Fraser has deep expertise in big data, real-time analytics, and building event-driven architecture on AWS. He enjoys staying on top of the latest technology innovations from AWS and sharing his learnings with customers. He spends his free time tinkering with new open source technologies.

Implement secure hybrid and multicloud log ingestion with Amazon OpenSearch Ingestion

2025-06-25 Xiaoxue Xu

Post Syndicated from Xiaoxue Xu original https://aws.amazon.com/blogs/big-data/implement-secure-hybrid-and-multicloud-log-ingestion-with-amazon-opensearch-ingestion/

Running applications across hybrid or multicloud environments creates a common challenge: fragmented logs scattered across different platforms. This fragmentation complicates monitoring, slows troubleshooting, and reduces operational visibility. To address this, many organizations seek to implement secure log ingestion from all environments into a centralized platform.

Amazon OpenSearch Service provides a unified solution for real-time search, analytics, and log management across your entire infrastructure. Amazon OpenSearch Ingestion, a fully managed data collector, simplifies data processing with built-in capabilities to filter, transform, and enrich your logs before analysis.

However, securely sending logs from non-AWS environments presents a challenge. Every request to OpenSearch Ingestion requires AWS Signature Version 4 (AWS SigV4) authentication, traditionally requiring long-term credentials that introduce security risks. AWS Identity and Access Management Roles Anywhere solves this problem by providing temporary credentials for workloads running outside AWS.

In this post, we demonstrate how to configure Fluent Bit, a fast and flexible log processor and router supported by various operating systems, to securely send logs from any environment to OpenSearch Ingestion using IAM Roles Anywhere. This approach alleviates the need for long-term credentials while providing a comprehensive view of your application logs across all environments—improving security, simplifying operations, and enhancing your ability to quickly resolve issues.

Solution overview

The solution in this post uses Fluent Bit to collect logs, retrieve temporary credentials from IAM Roles Anywhere, and sign HTTP log ingestion requests with AWS SigV4 before sending them to the OpenSearch Ingestion pipeline. The following diagram shows the architecture.

Architecture for log ingestion with AWS IAM Roles Anywhere
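Fluent Bit performs the SigV4 signing for you, but it can help to see why credentials are needed at all. The sketch below shows the standard SigV4 signing-key derivation, a chain of HMAC-SHA256 operations scoped to a date, Region, and service. It covers only the key derivation and the final signature step, not the construction of the canonical request. The secret key is the well-known AWS documentation example value, and the string to sign is a placeholder.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="osis"):
    """Derive a SigV4 signing key scoped to one day, Region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Hex signature that goes into the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Example secret from AWS documentation; never hard-code real credentials.
EXAMPLE_SECRET = "wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY"

if __name__ == "__main__":
    key = derive_signing_key(EXAMPLE_SECRET, "20250625", "aa-example-1")
    print(sign(key, "placeholder-string-to-sign"))
```

Because the key is derived from the secret access key, short-lived credentials from IAM Roles Anywhere limit how long any derived signing key stays useful.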

This solution provisions the following key components:

Certificate authority – For this post, we use AWS Private Certificate Authority (AWS Private CA) as the certificate authority (CA) source. Alternatively, you can integrate with an external CA; for more details, see IAM Roles Anywhere with an external certificate authority. Certificates issued from public CAs can’t be used as trust anchors for IAM Roles Anywhere.
X.509 Certificate – We use a sample private certificate stored in AWS Certificate Manager (ACM) and issued by AWS Private CA.
IAM Roles Anywhere configuration – This includes the following:
- Trust anchor – Establishes trust between IAM Roles Anywhere and the specified CA.
- IAM role – Grants permissions for log ingestion and trusts the IAM Roles Anywhere service principal. At minimum, this role must be granted permission for the osis:Ingest action.
- Profile – Defines which roles IAM Roles Anywhere can assume and the maximum permissions granted with the temporary credentials.
OpenSearch Service domain – For this post, we use an OpenSearch Service domain, which is an AWS provisioned equivalent of an open source OpenSearch cluster. We create the domain within a virtual private cloud (VPC); see VPC versus public domains for more information. Alternatively, you can use an Amazon OpenSearch Serverless collection, which is an OpenSearch cluster that scales compute capacity based on your application’s needs.
OpenSearch Ingestion – This is configured to receive logs over HTTP as the pipeline source and forward them to the OpenSearch Service domain as the pipeline sink.
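For reference, the two policies attached to the IAM role described above could look roughly like the following. This is a sketch of the general shape, not the exact policies created by the CloudFormation stack in this post: the pipeline ARN is a placeholder built from the example values used elsewhere in this post, and a production trust policy would usually add conditions that restrict which certificates can assume the role.

```python
import json


def roles_anywhere_trust_policy():
    """Trust policy that lets IAM Roles Anywhere assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "rolesanywhere.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:TagSession", "sts:SetSourceIdentity"],
            }
        ],
    }


def ingest_permissions_policy(pipeline_arn):
    """Minimum permissions policy: allow ingestion into one pipeline."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "osis:Ingest", "Resource": pipeline_arn}
        ],
    }


# Placeholder ARN using the example Region and account ID.
EXAMPLE_PIPELINE_ARN = "arn:aws:osis:aa-example-1:111122223333:pipeline/test-pipeline"

if __name__ == "__main__":
    print(json.dumps(roles_anywhere_trust_policy(), indent=2))
    print(json.dumps(ingest_permissions_policy(EXAMPLE_PIPELINE_ARN), indent=2))
```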

Connectivity between AWS and your hybrid or multicloud environments

You can access your OpenSearch Ingestion pipelines using an interface VPC endpoint with push-based HTTP source, which provides private IP address connectivity. For production environments, we recommend using these private connections through interface endpoints for enhanced security.

Setting up this connectivity requires additional configuration, such as creating an AWS Site-to-Site VPN connection with your hybrid and multicloud network. Although this post focuses on the log ingestion solution, you can find detailed guidance on network connectivity in the following resources:

Hybrid connectivity – Learn about different methods to connect your on-premises networks to AWS
Configuring VPC access for Amazon OpenSearch Ingestion pipelines – Set up secure private access to your ingestion pipelines
Access Amazon OpenSearch Service using an OpenSearch Service-managed VPC endpoint (AWS PrivateLink) – Configure private endpoints for your OpenSearch Service domain

How Fluent Bit retrieves temporary credentials using IAM Roles Anywhere

Using the HTTP output plugin, Fluent Bit can send logs to the OpenSearch Ingestion pipeline. The following diagram is a simplified view of how Fluent Bit retrieves AWS credentials.

How Fluent Bit retrieves AWS credentials

On Linux systems, Fluent Bit can use an AWS Command Line Interface (AWS CLI) profile that uses the credential_process parameter to trigger an external process. This external process is invoked to generate or retrieve credentials not directly supported by the AWS CLI.

The following are two common mechanisms for the external process:

IAM Roles Anywhere – Uses X.509 certificates to authenticate and returns temporary IAM credentials through IAM Roles Anywhere
OpenID Connect (OIDC) federation – Exchanges an OIDC authentication token for temporary AWS credentials

Although both options are viable, this post focuses on IAM Roles Anywhere. In this setup, the AWS IAM Roles Anywhere Credential Helper is executed to handle the signing process for the CreateSession API. This returns credentials in a JSON format that Fluent Bit can consume.
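The JSON returned by a `credential_process` command follows a small, documented schema: `Version`, `AccessKeyId`, `SecretAccessKey`, and, for temporary credentials, `SessionToken` and `Expiration`. The sketch below parses such output and checks whether the credentials are close to expiring. The sample values are placeholders, and the five-minute refresh margin is an assumption for illustration, not Fluent Bit's actual refresh logic.

```python
import json
from datetime import datetime, timedelta, timezone

REQUIRED_KEYS = {"Version", "AccessKeyId", "SecretAccessKey"}


def parse_credential_output(raw):
    """Validate and return the JSON emitted by a credential_process command."""
    creds = json.loads(raw)
    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if creds["Version"] != 1:
        raise ValueError("unsupported credential_process version")
    return creds


def needs_refresh(creds, now, margin=timedelta(minutes=5)):
    """True when temporary credentials expire within the margin."""
    expiration = creds.get("Expiration")
    if expiration is None:
        return False  # no expiration was reported
    expires_at = datetime.fromisoformat(expiration.replace("Z", "+00:00"))
    return now >= expires_at - margin


SAMPLE_OUTPUT = json.dumps({
    "Version": 1,
    "AccessKeyId": "ASIAIOSFODNN7EXAMPLE",
    "SecretAccessKey": "wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY",
    "SessionToken": "EXAMPLE-SESSION-TOKEN",
    "Expiration": "2025-06-25T12:00:00Z",
})

if __name__ == "__main__":
    creds = parse_credential_output(SAMPLE_OUTPUT)
    print(needs_refresh(creds, datetime.now(timezone.utc)))
```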

As of this writing, the Fluent Bit aws_profile configuration is supported only on Linux. It is untested on other Unix-based systems (such as macOS) and is not implemented for Windows.

Prerequisites

Before you begin this walkthrough, make sure you have the following:

AWS account requirements – This includes:
- An AWS account with permissions to deploy AWS CloudFormation templates.
- Access to AWS CloudShell for exporting a sample private certificate we will create using AWS CloudFormation in a later step.
Remote (hybrid or multicloud) environment – You must have a remote machine with a Linux-based operating system. This solution was tested on Ubuntu 24.04 with the following additional tooling installed:

Deploy AWS resources with AWS CloudFormation

Follow these steps to deploy AWS resources required for this solution:

Choose Launch Stack:
Enter a unique name for Stack name. The default value is osis-with-iamra.
Configure the stack parameters. Default values are provided in the following table.

Parameter	Default value	Description
`CACommonName`	`example.com`	Common Name for the CA
`CACountry`	`US`	Country for the CA
`CAOrganization`	`Example Org`	Organization for the CA
`CAValidityInDays`	`1826`	Validity period in days for the CA certificate
`VPCCIDR`	`10.0.0.0/16`	IPv4 CIDR range for the VPC used for OpenSearch Service domain
`PublicSubnetCIDR`	`10.0.0.0/24`	IPv4 CIDR range for public subnet
`PrivateSubnet1CIDR`	`10.0.1.0/24`	IPv4 CIDR range for private subnet
`PrivateSubnet2CIDR`	`10.0.2.0/24`	IPv4 CIDR range for private subnet
`DomainName`	`test-domain`	Name of the OpenSearch Service domain
`PipelineName`	`test-pipeline`	Name of the OpenSearch Ingestion pipeline
`PipelineIngestionPath`	`/test-ingestion-path`	Ingestion path for the OpenSearch Ingestion pipeline

Select the acknowledgement check box and choose Create Stack.
Stack deployment takes about 30 minutes to complete.
When stack creation is complete, navigate to the Outputs tab on the AWS CloudFormation console and note down the values for the resources created.
The following table summarizes the output values.

Output	Description	Example value
`ACMCertificateArn`	Amazon Resource Name (ARN) of the ACM certificate. You will use this for exporting certificate and private key files using the AWS CLI in a later step.	`arn:aws:acm:aa-example-1:111122223333:certificate/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111`
`CertificateAuthorityArn`	ARN of the Private CA.	`arn:aws:acm-pca:aa-example-1:111122223333:certificate-authority/a1b2c3d4-5678-90ab-cdef-EXAMPLE22222`
`TrustAnchorArn`	ARN of the IAM Roles Anywhere trust anchor. You will use this value for configuring `credential_process` for IAM Roles Anywhere in a later step.	`arn:aws:rolesanywhere:aa-example-1:111122223333:trust-anchor/a1b2c3d4-5678-90ab-cdef-EXAMPLE33333`
`IngestionRoleArn`	ARN of the OpenSearch Ingestion role. You will use this value for configuring `credential_process` for IAM Roles Anywhere in a later step.	`arn:aws:iam::111122223333:role/role-name-with-path`
`ProfileArn`	ARN of the IAM Roles Anywhere profile. You will use this value for configuring `credential_process` for IAM Roles Anywhere in a later step.	`arn:aws:rolesanywhere:aa-example-1:111122223333:profile/a1b2c3d4-5678-90ab-cdef-EXAMPLE44444`
`OpenSearchDomainEndpoint`	Endpoint of the VPC OpenSearch domain. You will use this endpoint for querying your index after ingestion.	`vpc-my-domain-123456789012.aa-example-1.es.amazonaws.com`
`PipelineEndpoint`	Endpoint of the OpenSearch Ingestion pipeline. You will use this public endpoint in the Fluent Bit configuration.	`my-pipeline-123456789012.aa-example-1.osis.amazonaws.com`
`PipelineIngestionPath`	Ingestion path for the OpenSearch Ingestion pipeline.	`/test-ingestion-path`

Export a sample private certificate using CloudShell

Follow these steps to export the sample private certificate created by the CloudFormation stack:

Open CloudShell. For more details, see Navigating the AWS CloudShell interface.
Export the certificate ARN from the CloudFormation outputs. If you changed the stack name in the previous step, use that value for <stack-name>, otherwise use the default value osis-with-iamra.

export CERT_ARN=$(aws cloudformation describe-stacks \
    --stack-name <stack-name> \
    --query 'Stacks[0].Outputs[?OutputKey==`ACMCertificateArn`].OutputValue' \
    --output text)

Extract the certificate and private key files:

# Generate and save the passphrase
export PASSPHRASE=$(openssl rand -base64 32)

# Export certificate using environment variables
aws acm export-certificate \
    --certificate-arn $CERT_ARN \
    --passphrase $(echo -n "$PASSPHRASE" | base64) \
    > cert_export.json

# Extract components to separate files
jq -r '.Certificate' cert_export.json > certificate.pem
jq -r '.PrivateKey' cert_export.json > encrypted_private_key.pem

# Decrypt the private key
openssl rsa -in encrypted_private_key.pem -out private_key.pem -passin pass:"$PASSPHRASE"

# Clear environment variables
unset PASSPHRASE CERT_ARN

Download the extracted certificate and private key files from CloudShell:
1. /home/cloudshell-user/certificate.pem
2. /home/cloudshell-user/private_key.pem

Configure an AWS CLI profile

Follow these steps to configure an AWS CLI profile for your log ingestion environment:

Store the downloaded certificate and private key to your environment. For an automated approach to generate and rotate certificates, see Set up AWS Private Certificate Authority to issue certificates for use with IAM Roles Anywhere.
Create a new profile named osis-pipeline-credentials that invokes the credential process. Replace the placeholders with your specific values. Find the values for trusted-anchor-arn, profile-arn, and ingestion-role-arn in your CloudFormation stack outputs.

aws configure set profile.osis-pipeline-credentials.credential_process "</path/to/aws_signing_helper> credential-process \
      --certificate </path/to/certificate.pem> \
      --private-key </path/to/private_key.pem> \
      --trust-anchor-arn <trusted-anchor-arn> \
      --profile-arn <profile-arn> \
      --role-arn <ingestion-role-arn>"

Verify your configuration. Open the ~/.aws/config file and confirm it contains a profile named osis-pipeline-credentials similar to the following:

[profile osis-pipeline-credentials]
credential_process = </path/to/aws_signing_helper> credential-process       --certificate </path/to/certificate.pem>       --private-key </path/to/private_key.pem>       --trust-anchor-arn <trusted-anchor-arn>       --profile-arn <profile-arn>       --role-arn <ingestion-role-arn>
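If you prefer to verify the profile programmatically rather than by eye, the AWS config file can be read with Python's standard `configparser` module. The following is a minimal sketch: the sample config text, including the helper path `/opt/aws_signing_helper`, the certificate paths, and the role name, uses hypothetical placeholder values, and the check only confirms that the expected flags are present.

```python
import configparser

REQUIRED_FLAGS = ("--certificate", "--private-key", "--trust-anchor-arn",
                  "--profile-arn", "--role-arn")


def credential_process_for(config_text, profile):
    """Return the credential_process command configured for a profile."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = f"profile {profile}"
    if not parser.has_section(section):
        raise KeyError(f"profile not found: {profile}")
    return parser.get(section, "credential_process")


def missing_flags(command, required=REQUIRED_FLAGS):
    """List the required flags that do not appear in the command."""
    tokens = command.split()
    return [flag for flag in required if flag not in tokens]


# Hypothetical config contents; in practice, read ~/.aws/config instead.
SAMPLE_CONFIG = """
[profile osis-pipeline-credentials]
credential_process = /opt/aws_signing_helper credential-process --certificate /etc/pki/certificate.pem --private-key /etc/pki/private_key.pem --trust-anchor-arn arn:aws:rolesanywhere:aa-example-1:111122223333:trust-anchor/a1b2c3d4-5678-90ab-cdef-EXAMPLE33333 --profile-arn arn:aws:rolesanywhere:aa-example-1:111122223333:profile/a1b2c3d4-5678-90ab-cdef-EXAMPLE44444 --role-arn arn:aws:iam::111122223333:role/example-ingestion-role
"""

if __name__ == "__main__":
    command = credential_process_for(SAMPLE_CONFIG, "osis-pipeline-credentials")
    print(missing_flags(command))
```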

Configure Fluent Bit

Run the following command to create a Fluent Bit configuration. Replace the placeholders with your specific values. Find the osis-pipeline-endpoint and pipeline-ingestion-path values in your CloudFormation stack outputs.

cat << 'EOF' > ~/fluent-bit.conf
[INPUT]
  name                  tail
  path                  /var/log/syslog
  read_from_head        true
  refresh_interval      5
[OUTPUT]
  name                  http
  match                 *
  aws_service           osis
  host                  <osis-pipeline-endpoint>
  port                  443
  uri                   <pipeline-ingestion-path>
  format                json
  aws_auth              true
  aws_region            <aa-example-1>
  aws_profile           osis-pipeline-credentials
  tls                   On
EOF

This example configuration includes the following:

Uses the tail input plugin to monitor the /var/log/syslog file
Uses the http output plugin to flush log records to the OpenSearch Ingestion pipeline endpoint
Uses the osis-pipeline-credentials profile to obtain temporary AWS credentials for SigV4 authentication (aws_auth set to true)

Test the solution

Follow these steps to test the setup:

Start the Fluent Bit client with the configuration file fluent-bit.conf that you created in the previous step. Replace the placeholder with the value applicable to your environment. For Ubuntu 24.04, the default path of the Fluent Bit client is /opt/fluent-bit/bin/fluent-bit. Adjust the path if using other distributions.

sudo AWS_CONFIG_FILE=~/.aws/config <path-to-fluent-bit> -c ~/fluent-bit.conf

Because the solution in this post launched the OpenSearch Service domain within a VPC, you will need an environment that has connectivity to the VPC. For this post, we create a CloudShell VPC environment to run the commands in the next step. Find the VPC, subnet, and security group to use from your CloudFormation stack outputs.
The solution that you deployed through AWS CloudFormation dynamically creates indexes based on ingestion timestamps, using the format logs-%{yyyy.MM.dd}. You can specify your preferred naming using OpenSearch Ingestion index management. To see the logs ingested from Fluent Bit, query your OpenSearch index with your preferred tool. We use awscurl in a CloudShell environment, as shown in the following example. Replace the placeholders with your specific values. Find the opensearch-domain-endpoint value in your CloudFormation stack outputs.

pip install awscurl

export OPENSEARCH_DOMAIN_ENDPOINT=https://<opensearch-domain-endpoint>

# List indices, keep only the index name column, filter names matching the
# logs-%{yyyy.MM.dd} format, and pick the most recent one to query.
# Extracting the name before sorting avoids ordering by the health and status columns.
export INDEX=$(awscurl --service es "$OPENSEARCH_DOMAIN_ENDPOINT/_cat/indices?v" | awk '{print $3}' | grep -E "^logs-[0-9]{4}\.[0-9]{2}\.[0-9]{2}$" | sort -r | head -1)

awscurl --service es "$OPENSEARCH_DOMAIN_ENDPOINT/$INDEX/_search" \
        -X GET -H "Content-Type: application/json" \
        -d '{
            "size": 10,
            "sort": [
              {"@timestamp": {"order": "desc"}}
            ],
            "query": { "match_all": {} }
          }' | jq '.hits.hits[]._source'

The following is an example of the expected output:

{
  "date": 1732039662.399506,
  "log": "2024-11-19T18:07:42.399375+00:00 test-server fluent-bit[9986]: 200 OK",
  "@timestamp": "2024-11-19T18:07:42.812Z"
}
{
  "date": 1732039662.399501,
  "log": "2024-11-19T18:07:42.399224+00:00 test-server fluent-bit[9986]: [2024/11/19 18:07:42] [ info] [output:http:http.0] test-pipeline-123456789012.us-east-2.osis.amazonaws.com:443, HTTP status=200",
  "@timestamp": "2024-11-19T18:07:42.812Z"
}
...
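The date-based index naming used above can also be reproduced in code, for example to compute which index a given day's logs land in. This is a minimal sketch: the function names are illustrative, and it assumes the date portion of the index name is always zero-padded, so that the lexicographically largest name is also the most recent.

```python
import re
from datetime import date

# Matches names produced by the logs-%{yyyy.MM.dd} format, for example logs-2024.11.19.
INDEX_PATTERN = re.compile(r"^logs-\d{4}\.\d{2}\.\d{2}$")


def index_for(day: date) -> str:
    """Return the index name for a given day."""
    return f"logs-{day:%Y.%m.%d}"


def latest_index(names):
    """Return the most recent dated index from a list of names, or None."""
    dated = [name for name in names if INDEX_PATTERN.match(name)]
    # Zero-padded dates sort correctly as plain strings.
    return max(dated) if dated else None


print(index_for(date(2024, 11, 19)))
print(latest_index(["logs-2024.11.18", "logs-2024.11.19", ".kibana_1"]))
```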

Clean up

To avoid future charges, remove the deployed resources:

Delete the CloudFormation stack.
Remove generated files from CloudShell:

rm cert_export.json encrypted_private_key.pem certificate.pem private_key.pem

Conclusion

In this post, we demonstrated how to obtain temporary credentials from IAM Roles Anywhere and securely ingest logs from hybrid or multicloud environments into OpenSearch Service using OpenSearch Ingestion. This approach minimizes the risk of credential exposure while enabling centralized log collection from distributed workloads. This solution is particularly valuable for organizations managing complex infrastructures across multiple environments and looking to consolidate observability data in OpenSearch Service.

If you have questions or feedback about this post, please leave them in the comments section.

About the Authors

Xiaoxue Xu is a Solutions Architect for AWS based in Toronto. She primarily works with financial services customers to help secure their workload and design scalable solutions on the AWS Cloud.

Simran Singh is a Senior Solutions Architect at AWS. In this role, he assists our large enterprise customers in meeting their key business objectives using AWS. His areas of expertise include artificial intelligence and machine learning, security, and improving the experience of developers building on AWS.

Enhance security and performance with TLS 1.3 and Perfect Forward Secrecy on Amazon OpenSearch Service

2025-06-12 Shubham Kumar

Post Syndicated from Shubham Kumar original https://aws.amazon.com/blogs/big-data/enhance-security-and-performance-with-tls-1-3-and-perfect-forward-secrecy-on-amazon-opensearch-service/

Amazon OpenSearch Service recently introduced a new Transport Layer Security (TLS) policy Policy-Min-TLS-1-2-PFS-2023-10, which supports the latest TLS 1.3 protocol and TLS 1.2 with Perfect Forward Secrecy (PFS) cipher suites. This new policy improves security and enhances OpenSearch performance.

OpenSearch Service previously offered predefined TLS policies for domain endpoint security, making it possible to encrypt your traffic end-to-end by enforcing HTTPS. However, these policies were limited to older versions of TLS, such as TLS 1.0 and TLS 1.2, without any PFS offerings.

In this post, we discuss the benefits of this new policy and how to enable it using the AWS Command Line Interface (AWS CLI).

Solution overview

The new TLS security policy provides an upgraded security posture for OpenSearch Service domains by implementing TLS 1.3 and PFS. This makes it possible to enhance the confidentiality and integrity of traffic between clients and your OpenSearch Service domains, providing a more secure and efficient communication channel for your sensitive data. TLS 1.3 is the latest version of the Transport Layer Security protocol, designed to prevent certain attacks targeting legacy TLS ciphers and provide improvements like 0-RTT resumption for faster connection times. TLS 1.3 can establish secure connections faster than TLS 1.2, resulting in reduced latency for your applications. PFS is an important security enhancement that makes sure past communications remain secure, even if the server’s long-term secret key is compromised in the future. By using a unique, randomly generated session key for each connection, PFS adds an extra layer of protection against potential eavesdropping or decryption of encrypted data. Compared to the older TLS 1.2 policy Policy-Min-TLS-1-2-2019-07, TLS 1.2 with PFS offers stronger security by protecting against potential key compromises, while still maintaining compatibility with older clients that don’t support TLS 1.3.

Prerequisites

To start using this new policy, you need the following prerequisites:

An active AWS account
Appropriate AWS Identity and Access Management (IAM) permissions to create and modify OpenSearch Service domains

Enable the new TLS policy on OpenSearch Service

To create new domains with the new TLS policy enabled, add --domain-endpoint-options '{"TLSSecurityPolicy": "Policy-Min-TLS-1-2-PFS-2023-10"}' to the create-domain AWS CLI command:

aws opensearch create-domain \
--domain-name my-domain \
--domain-endpoint-options '{"TLSSecurityPolicy": "Policy-Min-TLS-1-2-PFS-2023-10"}' <other config options>

For existing domains, you can update the domain configuration to use the new TLS policy by running the update-domain-config AWS CLI command:

aws opensearch update-domain-config \
--domain-name my-domain \
--domain-endpoint-options '{"TLSSecurityPolicy": "Policy-Min-TLS-1-2-PFS-2023-10"}'

Client-side considerations

Most modern clients and libraries should support TLS 1.3 and TLS 1.2 with PFS out of the box. However, if you encounter issues or compatibility concerns, you might need to update your client libraries or configurations to enable support for the new TLS policy.
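As one way to check client compatibility, the following sketch uses Python's standard ssl module to build a client context that refuses anything older than TLS 1.2 and to report which protocol version a connection actually negotiates. The function names are illustrative, and the connection helper assumes the machine running it has network access to the domain endpoint (for a VPC domain, that means running inside or connected to the VPC).

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a verifying client context that requires TLS 1.2 or later."""
    ctx = ssl.create_default_context()  # certificate and hostname checks stay enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the negotiated protocol, such as 'TLSv1.3'."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


# Example call (requires connectivity; replace the placeholder with your endpoint):
# print(negotiated_tls_version("<opensearch-domain-endpoint>"))
```

If the call reports TLSv1.2 rather than TLSv1.3, the client's TLS library most likely predates TLS 1.3 support and falls back to the TLS 1.2 cipher suites.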

Conclusion

The new Policy-Min-TLS-1-2-PFS-2023-10 security policy for OpenSearch Service offers significant improvements in security and performance. By supporting TLS 1.3 and TLS 1.2 with PFS, this policy helps protect your data in transit and provides faster connection times. We recommend that you start using this new TLS security policy for improved security posture and performance when connecting to your OpenSearch Service domains. To get started, follow the steps outlined in this post to enable the new policy on your existing or new domains.

For more information on the available TLS options and how to configure them, refer to Infrastructure security in Amazon OpenSearch Service.

At Amazon, security is our top priority, and we are continuously working to enhance the security and performance of our services. Stay tuned for more exciting updates!

About the authors

Shubham Kumar is a Software Development Engineer at Amazon OpenSearch Service, specializing in the security domain. He is passionate about developing robust security features to enhance the protection of customer data and infrastructure.

Sachet Alva is a Software Development Manager at Amazon OpenSearch Service, overseeing the infrastructure security and custom package initiatives. His team’s innovations contribute to the enhanced security and flexibility of Amazon OpenSearch Service deployments.

Naveen Negi is a Senior Tech Product Manager for Amazon OpenSearch Service. He works closely with engineering teams and customers to shape the future of OpenSearch Service, making sure it meets evolving security and performance needs.

Designing centralized and distributed network connectivity patterns for Amazon OpenSearch Serverless

2025-06-10 Ankush Goyal

Post Syndicated from Ankush Goyal original https://aws.amazon.com/blogs/big-data/designing-centralized-and-distributed-network-connectivity-patterns-for-amazon-opensearch-serverless/

Amazon OpenSearch Serverless is a fully managed search and analytics service that automatically provisions and scales infrastructure to help you run search and analytics workloads without cluster management. With OpenSearch Serverless, you can quickly build search and analytics capabilities into your applications.

As organizations scale their use of OpenSearch Serverless, understanding network architecture and DNS management becomes increasingly important. Building upon the connectivity patterns discussed in our previous post Network connectivity patterns for Amazon OpenSearch Serverless, this post covers advanced deployment scenarios focused on centralized and distributed access patterns—specifically, how enterprises can simplify network connectivity across multiple AWS accounts and extend access to on-premises environments for their OpenSearch Serverless deployments.

We outline two key deployment patterns:

Pattern 1 – A centralized endpoint model where interface virtual private cloud (VPC) endpoints for OpenSearch Serverless are deployed in a shared services VPC, allowing spoke VPCs from other AWS accounts and on premises to access OpenSearch Serverless collections through these consolidated endpoints.
Pattern 2 – A distributed endpoint model where interface VPC endpoints are created in individual spoke VPCs, with multiple consumers (central account, on-premises networks, and other spoke accounts) accessing these endpoints through centralized DNS management. This approach provides direct connectivity within each spoke VPC while maintaining centralized DNS control and management across the organization.

Before diving into advanced deployment patterns, let’s review the DNS behavior of OpenSearch Serverless when accessed through an interface VPC endpoint (AWS PrivateLink). Understanding this foundational aspect can help clarify the connectivity patterns we explore in this post.

OpenSearch Serverless interface VPC endpoint DNS resolution

When creating an OpenSearch Serverless interface VPC endpoint, the service automatically provisions three private hosted zones: one visible private hosted zone us-east-1.aoss.amazonaws.com that handles domain resolution for the OpenSearch Serverless collection and dashboard, another visible private hosted zone us-east-1.opensearch.amazonaws.com that manages resolution for the OpenSearch UI (OpenSearch Dashboards), and one hidden internal private hosted zone that manages the final DNS resolution to private IP addresses.

Our objective in this post is to explore how the two private hosted zones for OpenSearch Serverless work together: the visible private hosted zone us-east-1.aoss.amazonaws.com for collections and dashboards, and the hidden private hosted zone for final DNS resolution to private IP addresses. We examine how these private hosted zones enable scalable DNS resolution in both centralized and distributed architectures. The following workflow diagram shows the DNS resolution flow for the us-east-1 AWS Region. The same pattern applies to other Regions, with the Region identifiers in the DNS records changing accordingly.

The workflow consists of the following steps:

A user requests access to a collection URL (for example, abc.us-east-1.aoss.amazonaws.com).
The DNS request is sent to the Amazon Route 53 Resolver, which checks the visible private hosted zone us-east-1.aoss.amazonaws.com and finds a CNAME record pointing to the endpoint-specific domain.
The Route 53 Resolver uses the hidden internal private hosted zone to resolve this endpoint-specific domain to the VPC endpoint’s private IP address.
Traffic is allowed only if it originates from the interface VPC endpoint approved by OpenSearch Serverless network policies.
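The two-stage lookup in the preceding steps can be pictured with a small simulation. This is a toy model, not how the Route 53 Resolver is implemented: the zones are plain dictionaries, and the endpoint-specific name and IP addresses are hypothetical.

```python
# Visible private hosted zone: a wildcard CNAME to the endpoint-specific domain.
VISIBLE_ZONE = {
    "*.us-east-1.aoss.amazonaws.com": "privatelink.endpoint.example.internal",  # hypothetical target
}

# Hidden private hosted zone: the endpoint-specific domain resolves to private IPs.
HIDDEN_ZONE = {
    "privatelink.endpoint.example.internal": ["10.0.1.25", "10.0.2.25"],  # hypothetical IPs
}


def resolve(name, visible_zone, hidden_zone):
    """Resolve a collection name to private IPs using both zones, or return []."""
    _, _, parent = name.partition(".")
    # Step 1: exact record first, then the wildcard record for the parent domain.
    cname = visible_zone.get(name) or visible_zone.get("*." + parent)
    if cname is None:
        return []
    # Step 2: the hidden zone maps the endpoint-specific domain to IP addresses.
    return list(hidden_zone.get(cname, []))


# A VPC associated with both zones resolves the collection URL.
print(resolve("abc.us-east-1.aoss.amazonaws.com", VISIBLE_ZONE, HIDDEN_ZONE))

# A VPC that sees only the visible zone gets the CNAME but no IP addresses.
print(resolve("abc.us-east-1.aoss.amazonaws.com", VISIBLE_ZONE, {}))
```

The second call returns an empty list, which mirrors what happens when a VPC can see the visible zone but not the hidden one.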

Although this DNS resolution process provides flexible and secure private access, it becomes complex when you need connectivity from multiple VPCs, different AWS accounts, or on-premises networks. The following patterns address these challenges and outline strategies to simplify network access and DNS management for OpenSearch Serverless in such environments.

Pattern 1: Centralized interface VPC endpoint for OpenSearch Serverless

This pattern uses a centralized approach where a shared services AWS account with a shared services VPC hosts the OpenSearch Serverless interface VPC endpoint and OpenSearch Serverless collection. From there, other AWS accounts with Amazon VPCs (spoke VPCs) need to be able to access OpenSearch Serverless collections through this central endpoint. Organizations commonly implement this setup in hub-and-spoke network designs that connect their VPCs using either AWS Transit Gateway or AWS Cloud WAN. The following diagram illustrates this architecture.

Challenge

When accessing from on-premises networks, both network access and DNS resolution for the OpenSearch Serverless interface VPC endpoint work successfully. However, although the endpoint is network-accessible from spoke VPCs (for example, through Transit Gateway or AWS Cloud WAN), DNS resolution from these VPCs fails.

This happens because OpenSearch Serverless creates and uses a private hosted zone us-east-1.aoss.amazonaws.com that is only associated with the VPC containing the endpoint, in this case, the Shared Services VPC. Simply sharing this private hosted zone with the spoke VPCs doesn’t solve the problem, because the wildcard CNAME record references a DNS name privatelink.c0X.sgw.iad.prod.aoss.searchservices.aws.dev. This DNS name can’t be resolved from other VPCs without additional configuration, because it belongs to a private hosted zone privatelink.c0X.sgw.iad.prod.aoss.searchservices.aws.dev that is only associated with the shared services VPC. This private hosted zone isn’t visible in your account and is controlled by AWS.

Solution: Use Amazon Route 53 Profiles for cross-VPC DNS resolution

To enable centralized DNS resolution, you can use Amazon Route 53 Profiles. With Route 53 Profiles, you can manage and apply DNS-related Amazon Route 53 configurations across multiple VPCs and AWS accounts. The following diagram illustrates the solution architecture.

The solution consists of the following steps:

Create an OpenSearch Serverless interface VPC endpoint in the shared services VPC. This automatically creates two default private hosted zones and one hidden private hosted zone, and associates them with this VPC.
Create a Route 53 Profile in the shared services account.
Associate the interface VPC endpoint for OpenSearch Serverless with the Route 53 Profile. This automatically associates the hidden private hosted zone with the profile.
Associate the private hosted zone us-east-1.aoss.amazonaws.com that was automatically created by OpenSearch Serverless with the Route 53 Profile.
Share the Route 53 Profile with your other AWS accounts in your organization using AWS Resource Access Manager (AWS RAM).
Associate the spoke VPCs (located in different accounts) with the Route 53 Profile.

If you have an existing Route 53 Profile in your shared services account that is already associated with spoke VPCs, you can simply associate the OpenSearch Serverless interface VPC endpoint and the private hosted zone us-east-1.aoss.amazonaws.com with this profile.

After completing these steps, the DNS resolution for the OpenSearch Serverless collection and dashboard endpoints works seamlessly from spoke VPCs associated with the Route 53 Profile. Clients in spoke VPCs can resolve and access OpenSearch Serverless collections and dashboards through the centralized VPC endpoint.

Pattern 2: Distributed interface VPC endpoint for OpenSearch Serverless

Each spoke VPC, residing in its respective AWS account, hosts its own OpenSearch Serverless collection and interface VPC endpoint. We now want to achieve the following:

Centralize DNS management in a shared services VPC to provide consistent resolution for OpenSearch Serverless collections deployed across multiple spoke accounts
Provide on-premises resources with DNS resolution capability for all OpenSearch Serverless collections across the organization through a Route 53 Resolver inbound endpoint in the shared services VPC

The following diagram illustrates this architecture.

Challenge

Managing DNS resolution for OpenSearch Serverless collections and dashboards becomes complex in this distributed model because each interface VPC endpoint creates its own set of private hosted zones that are only associated with their respective VPCs. This creates a fragmented DNS landscape where the shared services VPC and on-premises networks need a consolidated way to resolve domains of OpenSearch Serverless collections and dashboards across multiple spoke accounts.

Solution: Use a self-managed private hosted zone in the shared services VPC for on-premises DNS resolution

To enable centralized DNS resolution for distributed endpoints, create a self-managed private hosted zone in the shared services account and associate it with the shared services VPC. Within this private hosted zone, you can create CNAME records that map each OpenSearch Serverless collection endpoint to the DNS name of its respective interface VPC endpoint in the spoke accounts. The following diagram illustrates this architecture.

Implementation consists of the following steps:

Create a self-managed private hosted zone in the shared services account with the domain name us-east-1.aoss.amazonaws.com and associate it with the shared services VPC.
For each OpenSearch Serverless collection, create a CNAME record that points to the Regional DNS name of its corresponding interface VPC endpoint.

This configuration enables both on-premises resources and resources in the shared services VPC to resolve OpenSearch Serverless endpoints that are in the spoke accounts.
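The CNAME records described in the preceding steps can be generated rather than typed by hand. The following sketch builds a Route 53 change batch (the JSON document accepted by the ChangeResourceRecordSets API) from a mapping of collection IDs to interface VPC endpoint DNS names. The function name, the TTL, the collection IDs, and the endpoint DNS names are assumptions for illustration; use the Regional DNS names shown for your own endpoints.

```python
import json


def cname_change_batch(collection_to_endpoint, zone="us-east-1.aoss.amazonaws.com", ttl=300):
    """Build a change batch with one CNAME per collection in the self-managed zone."""
    changes = []
    for collection_id, endpoint_dns in sorted(collection_to_endpoint.items()):
        changes.append({
            "Action": "UPSERT",  # create the record, or overwrite it if it already exists
            "ResourceRecordSet": {
                "Name": f"{collection_id}.{zone}",
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": endpoint_dns}],
            },
        })
    return {"Comment": "CNAMEs for OpenSearch Serverless collections", "Changes": changes}


# Hypothetical collection IDs and endpoint DNS names, one per spoke account.
batch = cname_change_batch({
    "abc": "vpce-0aaa-example.spoke-a.example.internal",
    "xyz": "vpce-0bbb-example.spoke-b.example.internal",
})
print(json.dumps(batch, indent=2))
```

You could save the printed JSON to a file and pass it to the aws route53 change-resource-record-sets command with the --change-batch option, along with the ID of the self-managed private hosted zone.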

After you complete these steps, each OpenSearch Serverless interface VPC endpoint remains within its original AWS account, maintaining security boundaries and account-level autonomy. On-premises systems can access OpenSearch Serverless collections and dashboards using original collection DNS names (for example, {collection-name}.us-east-1.aoss.amazonaws.com) through DNS resolution provided by the private hosted zone in the shared services VPC.

Conclusion

As organizations scale their adoption of OpenSearch Serverless, establishing secure and centralized network access becomes increasingly important. In this post, we explored two architectural patterns specifically around DNS management:

Centralized endpoint model – This pattern is ideal when a shared services account manages the OpenSearch Serverless interface VPC endpoints, allowing multiple spoke accounts to access OpenSearch Serverless collections and dashboards through a centralized set of network resources.
Distributed endpoint model with centralized DNS – This pattern is suitable for organizations that require account-level autonomy, where each AWS account manages its own OpenSearch Serverless interface VPC endpoints, while DNS resolution is centralized through a shared self-managed private hosted zone in a shared services account.

By understanding the DNS architecture of OpenSearch Serverless and using services like Route 53 Profiles and AWS RAM, organizations can build secure and robust access patterns that align with their organizational structure and needs.

About the Authors

Ankush Goyal is an Enterprise Support Lead in AWS Enterprise Support who helps customers streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience.

Anvesh Koganti is a Solutions Architect at AWS specializing in Networking. He focuses on helping customers build networking architectures for highly scalable and resilient AWS environments. Outside of work, Anvesh is passionate about consumer technology and enjoys listening to podcasts on tech and business. When disconnecting from the digital world, Anvesh spends time outdoors hiking and biking.

Salman Ahmed is a Senior Technical Account Manager in AWS Enterprise Support. He specializes in guiding customers through the design, implementation, and support of AWS solutions. Combining his networking expertise with a drive to explore new technologies, he helps organizations successfully navigate their cloud journey. Outside of work, he enjoys photography, traveling, and watching his favorite sports teams.

PackScan: Building real-time sort center analytics with AWS Services

2025-05-30 Sairam Vangapally

Post Syndicated from Sairam Vangapally original https://aws.amazon.com/blogs/big-data/packscan-building-real-time-sort-center-analytics-with-aws-services/

Amazon manages a complex logistics network with multiple touch points, from fulfillment centers to sort centers to final customer delivery. Among these, sort centers play a crucial role in the middle mile, providing faster and more efficient package movement. Within Amazon’s Middle Mile operations, high-volume sort centers process millions of packages daily, making immediate access to operational data essential for optimizing efficiency and decision-making. Real-time visibility into key metrics—such as package movements, container statuses, and associate productivity—is critical for smooth logistics operations. To address the need for real-time operational planning, the Amazon Middle Mile team developed PackScan, a cloud-based platform designed to provide instant insights across the network. By significantly reducing data latency, PackScan enables proactive decision-making, so teams can monitor inbound package flows, optimize outbound shipments based on live data, track associate productivity, identify bottlenecks, and enhance overall operational efficiency—all in real time.

In this post, we explore how PackScan uses Amazon cloud-based services to drive real-time visibility, improve logistics efficiency, and support the seamless movement of packages across Amazon’s Middle Mile network.

Prerequisites

This post assumes a foundational understanding of the following services and concepts:

Core services such as Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), AWS Lambda, Amazon Data Firehose, and Amazon OpenSearch Service
Infrastructure components like Amazon Elastic Compute Cloud (Amazon EC2), Elastic Load Balancing, and security configurations including network rules and security groups
Visualization tools such as Grafana hosted on Amazon EC2

Although hands-on experience is not required, a conceptual understanding of these services will help in understanding the architecture, design patterns, and components discussed throughout the article.

Business challenges

Amazon’s sort centers handle over 15 million packages daily across more than 120 facilities in North America. Given this scale, even minor delays in operational insights can lead to inefficiencies, increased costs, and escalations. Traditionally, data latencies of up to an hour have restricted the ability to make proactive decisions, directly affecting productivity, resource allocation, and responsiveness—especially during peak periods like holiday seasons and big deal days.

Without immediate visibility into package movements, container statuses, and associate performance, operational teams face challenges in identifying and resolving bottlenecks in real time. The lack of timely insights can disrupt the flow of packages, leading to shipment delays, reduced throughput, and suboptimal facility performance. Addressing these inefficiencies required a solution capable of delivering real-time, high-fidelity data to support rapid decision-making.

To bridge this gap, Amazon’s Middle Mile organization needed a scalable platform that could enhance visibility, minimize latency, and provide up-to-the-minute insights into logistics operations. PackScan was designed to meet these demands, giving teams access to the real-time data necessary to optimize workflows, mitigate bottlenecks, and improve overall efficiency.

Data flow

In 2024, PackScan was deployed across 80 sort centers in the USA, enabling real-time package analytics. The solution powers Grafana dashboards, which refresh every 10 seconds by fetching live package data from OpenSearch Service. With this near real-time visibility, operations teams can monitor package movement and sorting efficiency across sort centers. The following diagram outlines how package scan data is ingested, processed, and made actionable.

Each sort center is equipped with hardware at inbound stations where packages arrive from trailers. Integrated barcode scanners automatically scan each package as it enters the sorting process. Every scan generates an SNS event, capturing key attributes such as the package ID, dimensions, the associate who performed the scan, and the timestamp and location of the scan.

After they’re generated, these SNS events are ingested into Data Firehose through a Lambda function, where the data undergoes real-time enrichment. During this process, additional attributes are appended, including the business logic rules. The enriched data is then streamed into OpenSearch Service, where events are indexed to enable fast and efficient querying. With the indexed package scan events available in OpenSearch Service, real-time analytics and monitoring become possible. The Grafana dashboards query this data every 10 seconds, providing operational insights into package inflow metrics and associate performance.
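The enrichment step can be sketched as a pure transformation from an SNS-triggered Lambda event to the record list that Data Firehose accepts. This is an illustration only, not PackScan's code: it assumes the Lambda function is subscribed directly to the SNS topic, and the field names and the oversize rule are hypothetical stand-ins for the real scan attributes and business logic.

```python
import json


def enrich(scan: dict) -> dict:
    """Append derived attributes to one scan event (hypothetical business rule)."""
    enriched = dict(scan)
    enriched["is_oversize"] = scan.get("length_cm", 0) > 100  # illustrative threshold
    return enriched


def to_firehose_records(event: dict) -> list:
    """Convert an SNS-triggered Lambda event into Firehose PutRecordBatch records."""
    records = []
    for record in event.get("Records", []):
        scan = json.loads(record["Sns"]["Message"])  # SNS delivers the payload as a string
        records.append({"Data": json.dumps(enrich(scan)).encode("utf-8")})
    return records


# In a real handler, the records would be sent with the Firehose PutRecordBatch API,
# which accepts up to 500 records per call.
sample_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({"package_id": "PKG-1", "length_cm": 120})}},
        {"Sns": {"Message": json.dumps({"package_id": "PKG-2", "length_cm": 30})}},
    ]
}
for item in to_firehose_records(sample_event):
    print(item["Data"].decode("utf-8"))
```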

Solution overview

PackScan was implemented using a structured and scalable approach, using AWS cloud-based services to enable high-frequency data ingestion, real-time processing, and actionable insights. The architecture is designed to minimize latency while providing reliability, scalability, and operational efficiency. The solution is built around a serverless, event-driven architecture that dynamically scales based on data ingestion volumes. The architecture—illustrated in the following figure—enabled us to build a real-time data solution, utilizing the advantages of various AWS services to provide low-latency analytics, high scalability, and real-time operational insights across Amazon’s sort centers.

The following are the key components and features of the solution:

Real-time data processing – Lambda functions serve as the processing backbone of the system, handling 500,000 scan events per second. Each incoming event is processed by applying data transformations, enrichment, and validation before passing it downstream.
High-frequency data ingestion and streaming – Data Firehose is the primary ingestion pipeline, handling millions of scan events daily from thousands of barcode scanners across multiple sort centers. The Firehose streams handle incoming data of 12,000 PUT requests per second, maintaining smooth ingestion and low-latency streaming. Buffering is configured to forward enriched events every 60 seconds or upon reaching a 5 MB batch size, whichever comes first, optimizing storage and processing efficiency.
Optimized querying and operational insights – OpenSearch Service is used to index and store the processed scan events, providing real-time querying and anomaly detection. The OpenSearch cluster consists of 12 data nodes (r5.4xlarge.search) and 3 primary nodes (r5.large.search), processing up to 10 GB of data per day with a rolling index strategy, where indexes are rotated every 24 hours to maintain query performance. The system supports concurrent queries per second, enabling logistics teams to perform rapid lookups and gain instant visibility into package movements.
Live visualization and dashboarding – Grafana, hosted on an m5.12xlarge EC2 instance, provides real-time visualization of key logistics metrics. The dashboards refresh every 10 seconds, querying OpenSearch and displaying up-to-the-minute package analytics. The setup includes multiple preconfigured dashboards, monitoring package flow at different inbound stations, and workforce efficiency. These dashboards support concurrent users, enabling supervisors and associates to track and optimize operations proactively. The following screenshot shows one of the real-time dashboards, with details of package flow by different routes within sort centers.
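The buffering behavior described for Data Firehose (forward every 60 seconds or at 5 MB, whichever comes first) can be illustrated with a small size-or-age buffer. This is a conceptual sketch with hypothetical names, not the Firehose implementation. In particular, it only checks the thresholds when a record arrives, whereas a managed service also flushes on a timer.

```python
class ScanBuffer:
    """Collect records and release them as a batch on a size or age threshold."""

    def __init__(self, max_bytes=5 * 1024 * 1024, max_age_seconds=60.0):
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self._items = []
        self._size = 0
        self._opened_at = None  # arrival time of the first record in the current batch

    def add(self, payload: bytes, now: float):
        """Buffer one record; return the flushed batch if a threshold is hit, else None."""
        if self._opened_at is None:
            self._opened_at = now
        self._items.append(payload)
        self._size += len(payload)
        too_big = self._size >= self.max_bytes
        too_old = now - self._opened_at >= self.max_age_seconds
        return self.flush() if too_big or too_old else None

    def flush(self):
        """Return the buffered records and reset the buffer."""
        batch, self._items = self._items, []
        self._size = 0
        self._opened_at = None
        return batch


# Tiny thresholds so the behavior is visible: 10 bytes or 60 seconds.
buffer = ScanBuffer(max_bytes=10, max_age_seconds=60.0)
print(buffer.add(b"1234", now=0.0))    # below both thresholds
print(buffer.add(b"123456", now=1.0))  # reaches 10 bytes, so the batch is flushed
```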

The entire PackScan architecture is designed for automatic scaling, adjusting dynamically based on data ingestion volume to maintain efficiency during peak and off-peak operations. This approach provides cost-effective resource utilization while maintaining high availability and performance.

Business outcomes

The implementation of PackScan has led to measurable improvements in operational efficiency, workforce productivity, and real-time decision-making across Amazon’s sort centers. By reducing data latency and enabling real-time insights, PackScan has transformed logistics operations in meaningful ways:

Widespread deployment – PackScan was deployed across 80 sort centers, supporting approximately 1,000 display monitors that provide real-time operational insights.
Significant reduction in data latency – Data latency dropped from approximately 1 hour to less than 1 minute, allowing for real-time operational responsiveness and minimizing workflow disruptions.
Proactive operational management – With dynamic workload balancing and instant bottleneck identification, supervisors can now address issues as they arise, leading to smoother operations and fewer escalations.
Boost in workforce productivity – The real-time performance feedback has enhanced associate engagement, resulting in a 25% increase in throughput per hour and 12% reduction in labor hours.

Overall, PackScan has redefined real-time logistics visibility within Amazon’s Middle Mile operations, empowering operational teams with actionable insights, enhanced workforce efficiency, and a data-driven approach to package movement and sort center performance.

Lessons learned and best practices

The deployment and scaling of PackScan provided valuable insights into optimizing real-time logistics visibility. Several key lessons and best practices emerged from this implementation:

Cloud architecture drives efficiency – Adopting Amazon technologies provides seamless scalability, reduced operational overhead, and lower infrastructure costs, while maintaining high reliability. The following table shows an approximate breakdown of monthly service costs observed in production. This is an estimation based on current pricing; we recommend checking the respective AWS service pricing pages to generate the most up-to-date quote. This architecture demonstrates that with a combination of provisioned and serverless design, production-ready solutions can be built and scaled at a fraction of the cost of traditional infrastructure.

AWS Service	Description	Estimated Monthly Cost
Amazon EC2	Three EC2 instances of type m5.12xlarge hosting Grafana	$1,700
AWS Lambda	Streams SNS events to Data Firehose	$4,000
Amazon Data Firehose	Real-time data delivery with 12,000 records streaming to OpenSearch Service	$1,500
Amazon OpenSearch Service	Indexing and querying package scan events	$28,000

Real-time visibility is a game changer – Immediate access to operational data enhances agility, enabling teams to make timely, data-driven decisions that prevent bottlenecks and improve throughput.
Continuous monitoring enhances decision-making – Operational dashboards should evolve with business needs. Regular monitoring and updates keep them accurate, usable, and relevant for informed decision-making.
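
To put the cost table above in perspective, the following minimal Python sketch totals the estimated monthly figures and computes each service's share of the overall spend. The dollar amounts are copied from the table; the variable and function names are illustrative only and are not part of the PackScan implementation.

```python
# Estimated monthly costs (USD), copied from the cost table in this post.
monthly_costs_usd = {
    "Amazon EC2": 1_700,
    "AWS Lambda": 4_000,
    "Amazon Data Firehose": 1_500,
    "Amazon OpenSearch Service": 28_000,
}


def cost_shares(costs):
    """Return the total cost and each service's share of it, in percent."""
    total = sum(costs.values())
    shares = {
        name: round(100 * amount / total, 1)
        for name, amount in costs.items()
    }
    return total, shares


total, shares = cost_shares(monthly_costs_usd)

print(f"Estimated total: ${total:,} per month")
# List the services from largest to smallest share of the total.
for name, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct}%")
```

Under these estimates, the total comes to about $35,200 per month, with OpenSearch Service accounting for roughly 80% of it. In a setup with a similar cost profile, domain sizing and index retention would therefore likely be the first places to look when optimizing cost.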

By applying these best practices, PackScan has set a foundation for scalable, real-time logistics management, making sure that Amazon’s Middle Mile operations remain proactive, efficient, and highly responsive to changing business demands.

Conclusion

PackScan has successfully transformed real-time operational visibility within Amazon’s sort centers, addressing critical challenges in data latency, workforce productivity, and logistics efficiency. By using AWS services, particularly Data Firehose for real-time data delivery and OpenSearch Service for analytics, PackScan has enabled proactive decision-making, streamlined operations, and enhanced throughput in high-volume sort environments. Looking ahead, future enhancements will focus on further elevating operational intelligence and scalability, including:

Integrating predictive analytics to anticipate workflow bottlenecks and optimize resource allocation
Scaling the solution across additional operational scenarios, providing greater resilience and adaptability to dynamic logistics environments

With these advancements, PackScan will continue to drive operational excellence, cost-efficiency, and real-time decision-making capabilities, reinforcing Amazon’s commitment to innovation in logistics and supply chain management.

For those interested in implementing similar solutions, we recommend exploring AWS Serverless Architecture Patterns and the AWS Architecture Blog for additional insights and best practices in building scalable, real-time analytics solutions.

About the authors

Sairam Vangapally is a Data Engineer at Amazon with extensive experience architecting real-time, large-scale data platforms that power critical logistics operations across North America. He has led the design and deployment of end-to-end data pipelines, enabling high-throughput ingestion, transformation, and analytics at scale. He is passionate about building resilient data infrastructure and driving cross-functional collaboration to deliver solutions that accelerate operational insights and business impact.

Nitin Goyal serves as a Data Engineering Manager in Amazon’s Sort Center organization, where he leads initiatives to optimize operational efficiency across North American facilities. With over nine years of tenure at Amazon spanning multiple teams, he specializes in architecting high-performance data systems, with particular emphasis on real-time streaming pipelines, artificial intelligence, and low-latency solutions. His expertise drives the development of sophisticated operational workflows that enhance sort center productivity and effectiveness.