All posts by Arun Lakshmanan

Simplify your query management with search templates in Amazon OpenSearch Service

2024-04-03 Arun Lakshmanan

Post Syndicated from Arun Lakshmanan original https://aws.amazon.com/blogs/big-data/simplify-your-query-management-with-search-templates-in-amazon-opensearch-service/

Amazon OpenSearch Service is an Apache-2.0-licensed distributed search and analytics suite offered by AWS. This fully managed service allows organizations to secure data, perform keyword and semantic search, analyze logs, alert on anomalies, explore interactive log analytics, implement real-time application monitoring, and gain a more profound understanding of their information landscape. OpenSearch Service provides the tools and resources needed to unlock the full potential of your data. With its scalability, reliability, and ease of use, it’s a valuable solution for businesses seeking to optimize their data-driven decision-making processes and improve overall operational efficiency.

This post delves into the transformative world of search templates. We unravel the power of search templates in revolutionizing the way you handle queries, providing a comprehensive guide to help you navigate through the intricacies of this innovative solution. From optimizing search processes to saving time and reducing complexities, discover how incorporating search templates can elevate your query management game.

Search templates

Search templates empower developers to articulate intricate queries within OpenSearch, enabling their reuse across various application scenarios, eliminating the complexity of query generation in the code. This flexibility also grants you the ability to modify your queries without requiring application recompilation. Search templates in OpenSearch use the mustache template, which is a logic-free templating language. Search templates can be reused by their name. A search template that is based on mustache has a query structure and placeholders for the variable values. You use the _search API to query, specifying the actual values that OpenSearch should use. You can create placeholders for variables that will be changed to their true values at runtime. Double curly braces ({{}}) serve as placeholders in templates.

Mustache enables you to generate dynamic filters or queries based on the values passed in the search request, making your search requests more flexible and powerful.

In the following example, the search template runs the query in the “source” block by passing in the values for the field and value parameters from the “params” block:

GET /myindex/_search/template
 { 
      "source": {   
         "query": { 
             "bool": {
               "must": [
                 {
                   "match": {
                    "{{field}}": "{{value}}"
                 }
             }
        ]
     }
    }
  },
 "params": {
    "field": "place",
    "value": "sweethome"
  }
}

You can store templates in the cluster with a name and refer to them in a search instead of attaching the template in each request. You use the PUT _scripts API to publish a template to the cluster. Let’s say you have an index of books, and you want to search for books with publication date, ratings, and price. You could create and publish a search template as follows:

PUT /_scripts/find_book
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "publish_date": {
                  "gte": "{{gte_date}}"
                }
              }
            },
            {
              "range": {
                "rating": {
                  "gte": "{{gte_rating}}"
                }
              }
            },
            {
              "range": {
                "price": {
                  "lte": "{{lte_price}}"
                }
              }
            }
          ]
        }
      }
    }
  }
}

In this example, you define a search template called find_book that uses the mustache template language with defined placeholders for the gte_date, gte_rating, and lte_price parameters.

To use the search template stored in the cluster, you can send a request to OpenSearch with the appropriate parameters. For example, you can search for products that have been published in the last year with ratings greater than 4.0, and priced less than $20:

POST /books/_search/template
{
  "id": "find_book",
  "params": {
    "gte_date": "now-1y",
    "gte_rating": 4.0,
    "lte_price": 20
  }
}

This query will return all books that have been published in the last year, with a rating of at least 4.0, and a price less than $20 from the books index.

Default values in search templates

Default values are values that are used for search parameters when the query that engages the template doesn’t specify values for them. In the context of the find_book example, you can set default values for the from, size, and gte_date parameters in case they are not provided in the search request. To set default values, you can use the following mustache template:

PUT /_scripts/find_book
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "publish_date": {
                  "gte": "{{gte_date}}{{^gte_date}}now-1y{{/gte_date}}"
                }
              }
            },
            {
              "range": {
                "rating": {
                  "gte": "{{gte_rating}}"
                }
              }
            },
            {
              "range": {
                "price": {
                  "lte": "{{lte_price}}"
                }
              }
            }
          ]
        },
        "from": "{{from}}{{^from}}0{{/from}}",
        "size": "{{size}}{{^size}}2{{/size}}"
      }
    }
  }
}

In this template, the {{from}}, {{size}}, and {{gte_date}} parameters are placeholders that can be filled in with specific values when the template is used in a search. If no value is specified for {{from}}, {{size}}, and {{gte_date}}, OpenSearch uses the default values of 0, 2, and now-1y, respectively. This means that if a user searches for products without specifying from, size, and gte_date, the search will return just two products matching the search criteria for 1 year.

You can also use the render API as follows if you have a stored template and want to validate it:

POST _render/template
{
  "id": "find_book",
  "params": {
    "gte_date": "now-1y",
    "gte_rating": 4.0,
    "lte_price": 20
  }
}

Conditions in search templates

The conditional statement that allows you to control the flow of your search template based on certain conditions. It’s often used to include or exclude certain parts of the search request based on certain parameters. The syntax as follows:

{{#Any condition}}
  ... code to execute if the condition is true ...
{{/Any}}

The following example searches for books based on the gte_date, gte_rating, and lte_price parameters and an optional stock parameter. The if condition is used to include the condition_block/term query only if the stock parameter is present in the search request. If the is_available parameter is not present, the condition_block/term query will be skipped.

GET /books/_search/template
{
  "source": """{
    "query": {
      "bool": {
        "must": [
        {{#is_available}}
        {
          "term": {
            "in_stock": "{{is_available}}"
          }
        },
        {{/is_available}}
          {
            "range": {
              "publish_date": {
                "gte": "{{gte_date}}"
              }
            }
          },
          {
            "range": {
              "rating": {
                "gte": "{{gte_rating}}"
              }
            }
          },
          {
            "range": {
              "price": {
                "lte": "{{lte_price}}"
              }
            }
          }
        ]
      }
    }
  }""",
  "params": {
    "gte_date": "now-3y",
    "gte_rating": 4.0,
    "lte_price": 20,
    "is_available": true
  }
}

By using a conditional statement in this way, you can make your search requests more flexible and efficient by only including the necessary filters when they are needed.

To make the query valid inside the JSON, it needs to be escaped with triple quotes (""") in the payload.

Loops in search templates

A loop is a feature of mustache templates that allows you to iterate over an array of values and run the same code block for each item in the array. It’s often used to generate a dynamic list of filters or queries based on the values passed in the search request. The syntax is as follows:

{{#list item in array}}
  ... code to execute for each item ...
{{/list}}

The following example searches for books based on a query string ({{query}}) and an array of categories to filter the search results. The mustache loop is used to generate a match filter for each item in the categories array.

GET books/_search/template
{
  "source": """{
    "query": {
      "bool": {
        "must": [
        {{#list}}
        {
          "match": {
            "category": "{{list}}"
          }
        }
        {{/list}}
          {
          "match": {
            "title": "{{name}}"
          }
        }
        ]
      }
    }
  }""",
  "params": {
    "name": "killer",
    "list": ["Classics", "comics", "Horror"]
  }
}

The search request is rendered as follows:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "killer"
          }
        },
        {
          "match": {
            "category": "Classics"
          }
        },
        {
          "match": {
            "category": "comics"
          }
        },
        {
          "match": {
            "category": "Horror"
          }
        }
      ]
    }
  }
}

The loop has generated a match filter for each item in the categories array, resulting in a more flexible and efficient search request that filters by multiple categories. By using the loops, you can generate dynamic filters or queries based on the values passed in the search request, making your search requests more flexible and powerful.

Advantages of using search templates

The following are key advantages of using search templates:

Maintainability – By separating the query definition from the application code, search templates make it straightforward to manage changes to the query or tune search relevancy. You don’t have to compile and redeploy your application.
Consistency – You can construct search templates that allow you to design standardized query patterns and reuse them throughout your application, which can help maintain consistency across your queries.
Readability – Because templates can be constructed using a more terse and expressive syntax, complicated queries are straightforward to test and debug.
Testing – Search templates can be tested and debugged independently of the application code, facilitating simpler problem-solving and relevancy tuning without having to re-deploy the application. You can easily create A/B testing with different templates for the same search.
Flexibility – Search templates can be quickly updated or adjusted to account for modifications to the data or search specifications.

Best practices

Consider the following best practices when using search templates:

Before deploying your template to production, make sure it is fully tested. You can test the effectiveness and correctness of your template with example data. It is highly recommended to run the application tests that use these templates before publishing.
Search templates allow for the addition of input parameters, which you can use to modify the query to suit the needs of a particular use case. Reusing the same template with varied inputs is made simpler by parameterizing the inputs.
Manage the templates in an external source control system.
Avoid hard-coding values inside the query—instead, use defaults.

Conclusion

In this post, you learned the basics of search templates, a powerful feature of OpenSearch, and how templates help streamline search queries and improve performance. With search templates, you can build more robust search applications in less time.

If you have feedback about this post, submit it in the comments section. If you have questions about this post, start a new thread on the Amazon OpenSearch Service forum or contact AWS Support.

Stay tuned for more exciting updates and new features in OpenSearch Service.

About the authors

Arun Lakshmanan is a Search Specialist with Amazon OpenSearch Service based out of Chicago, IL. He has over 20 years of experience working with enterprise customers and startups. He loves to travel and spend quality time with his family.

Madhan Kumar Baskaran works as a Search Engineer at AWS, specializing in Amazon OpenSearch Service. His primary focus involves assisting customers in constructing scalable search applications and analytics solutions. Based in Bengaluru, India, Madhan has a keen interest in data engineering and DevOps.

Working with percolators in Amazon OpenSearch Service

2023-04-27 Arun Lakshmanan

Post Syndicated from Arun Lakshmanan original https://aws.amazon.com/blogs/big-data/working-with-percolators-in-amazon-opensearch-service/

Amazon OpenSearch Service is a managed service that makes it easy to secure, deploy, and operate OpenSearch and legacy Elasticsearch clusters at scale in the AWS Cloud. Amazon OpenSearch Service provisions all the resources for your cluster, launches it, and automatically detects and replaces failed nodes, reducing the overhead of self-managed infrastructures. The service makes it easy for you to perform interactive log analytics, real-time application monitoring, website searches, and more by offering the latest versions of OpenSearch, support for 19 versions of Elasticsearch (1.5 to 7.10 versions), and visualization capabilities powered by OpenSearch Dashboards and Kibana (1.5 to 7.10 versions). Amazon OpenSearch Service now offers a serverless deployment option (public preview) that makes it even easier to use OpenSearch in the AWS cloud.

A typical workflow for OpenSearch is to store documents (as JSON data) in an index, and execute searches (also JSON) to find those documents. Percolation reverses that. You store searches and query with documents. Let’s say I’m searching for a house in Chicago that costs < 500K. I could go to the website every day and run my query. A clever website would be able to store my requirements (a query) and notify me when something new (a document) comes up that matches my requirements. Percolation is an OpenSearch feature that enables the website to store these queries and run documents against them to find new matches.

In this post, We will explore how to use percolators to find matching homes from new listings.

Before getting into the details of percolators, let’s explore how search works. When you insert a document, OpenSearch maintains an internal data structure called the “inverted index” which speeds up the search.

Indexing and Searching:

Let’s take the above example of a real estate application having the simple schema of type of the house, city, and the price.

First, let’s create an index with mappings as below

PUT realestate
{
     "mappings": {
        "properties": {
           "house_type": { "type": "keyword"},
           "city": { "type": "keyword" },
           "price": { "type": "long" }
         }
    }
}

Let’s insert some documents into the index.

ID	House_type	City	Price
1	townhouse	Chicago	650000
2	house	Washington	420000
3	condo	Chicago	580000

POST realestate/_bulk 
{ "index" : { "_id": "1" } } 
{ "house_type" : "townhouse", "city" : "Chicago", "price": 650000 }
{ "index" : { "_id": "2" } }
{ "house_type" : "house", "city" : "Washington", "price": 420000 }
{ "index" : { "_id": "3"} }
{ "house_type" : "condo", "city" : "Chicago", "price": 580000 }

As we don’t have any townhouses listed in Chicago for less than 500K, the below query returns no results.

GET realestate/_search
{
  "query": {
    "bool": {
      "filter": [ 
        { "term": { "city": "Chicago" } },
        { "term": { "house_type": "townhouse" } },
        { "range": { "price": { "lte": 500000 } } }
      ]
    }
  }
}

If you’re curious to know how search works under the hood at high level, you can refer to this article.

Percolation:

If one of your customers wants to get notified when a townhouse in Chicago is available, and listed at less than $500,000, you can store their requirements as a query in the percolator index. When a new listing becomes available, you can run that listing against the percolator index with a _percolate query. The query will return all matches (each match is a single set of requirements from one user) for that new listing. You can then notify each user that a new listing is available that fits their requirements. This process is called percolation in OpenSearch.

OpenSearch has a dedicated data type called “percolator” that allows you to store queries.

Let’s create a percolator index with the same mapping, with additional fields for query and optional metadata. Make sure you include all the necessary fields that are part of a stored query. In our case, along with the actual fields and query, we capture the customer_id and priority to send notifications.

PUT realestate-percolator-queries
{
  "mappings": {
    "properties": {
      "user": {
         "properties": {
            "query": { "type": "percolator" },
            "id": { "type": "keyword" },
            "priority":{ "type": "keyword" }
         }
      },
      "house_type": {"type": "keyword"},
      "city": {"type": "keyword"},
      "price": {"type": "long"}
    }
  }
}

After creating the index, insert a query as below

POST realestate-percolator-queries/_doc/chicago-house-alert-500k
{
  "user" : {
     "id": "CUST101",
     "priority": "high",
     "query": {
        "bool": {
           "filter": [ 
                { "term": { "city": "Chicago" } },
                { "term": { "house_type": "townhouse" } },
                { "range": { "price": { "lte": 500000 } } }
            ]
        }
      }
   }
}

The percolation begins when a new document gets run against the stored queries.

{"city": "Chicago", "house_type": "townhouse", "price": 350000}
{"city": "Dallas", "house_type": "house", "price": 500000}

Run the percolation query with document(s), and it matches the stored query

GET realestate-percolator-queries/_search
{
  "query": {
     "percolate": {
        "field": "user.query",
        "documents": [ 
           {"city": "Chicago", "house_type": "townhouse", "price": 350000 },
           {"city": "Dallas", "house_type": "house", "price": 500000}
        ]
      }
   }
}

The above query returns the queries along with the metadata we stored (customer_id in our case) that matches the documents

{
    "took" : 11,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
     },
     "hits" : {
        "total" : {
           "value" : 1,
           "relation" : "eq"
         },
         "max_score" : 0.0,
         "hits" : [ 
         {
              "_index" : "realestate-percolator-queries",
              "_id" : "chicago-house-alert-500k",
              "_score" : 0.0,
              "_source" : {
                   "user" : {
                       "id" : "CUST101",
                       "priority" : "high",
                       "query" : {
                            "bool" : {
                                 "filter" : [ 
                                      { "term" : { "city" : "Chicago" } },
                                      { "term" : { "house_type" : "townhouse" } },
                                      { "range" : { "price" : { "lte" : 500000 } } }
                                 ]
                              }
                        }
                  }
            },
            "fields" : {
                "_percolator_document_slot" : [0]
            }
        }
     ]
   }
}

Percolation at scale

When you have a high volume of queries stored in the percolator index, searching queries across the index might be inefficient. You can consider segmenting your queries and use them as filters to handle the high-volume queries effectively. As we already capture priority, you can now run percolation with filters on priority that reduces the scope of matching queries.

GET realestate-percolator-queries/_search
{
    "query": {
        "bool": {
            "must": [ 
             {
                  "percolate": {
                      "field": "user.query",
                      "documents": [ 
                          { "city": "Chicago", "house_type": "townhouse", "price": 35000 },
                          { "city": "Dallas", "house_type": "house", "price": 500000 }
                       ]
                  }
              }
          ],
          "filter": [ 
                  { "term": { "user.priority": "high" } }
            ]
       }
    }
}

Best practices

Prefer the percolation index separate from the document index. Different index configurations, like number of shards on percolation index, can be tuned independently for performance.
Prefer using query filters to reduce matching queries to percolate from percolation index.
Consider using a buffer in your ingestion pipeline for reasons below,
1. You can batch the ingestion and percolation independently to suit your workload and SLA
2. You can prioritize the ingest and search traffic by running the percolation at off hours. Make sure that you have enough storage in the buffering layer.

Consider using an independent cluster for percolation for the below reasons,
1. The percolation process relies on memory and compute, your primary search will not be impacted.
2. You have the flexibility of scaling the clusters independently.

Conclusion

In this post, we walked through how percolation in OpenSearch works, and how to use effectively, at scale. Percolation works in both managed and serverless versions of OpenSearch. You can follow the best practices to analyze and arrange data in an index, as it is important for a snappy search performance.

If you have feedback about this post, submit your comments in the comments section.

Noise

All posts by Arun Lakshmanan

Simplify your query management with search templates in Amazon OpenSearch Service

Search templates

Default values in search templates

Conditions in search templates

Loops in search templates

Advantages of using search templates

Best practices

Conclusion

About the authors

Working with percolators in Amazon OpenSearch Service

Indexing and Searching:

Percolation:

Percolation at scale

Best practices

Conclusion

About the author

The collective thoughts of the interwebz