All posts by Viraj Phanse

Using pipes to explore, discover and find data in Amazon ES with Piped Processing Language

Post Syndicated from Viraj Phanse original https://aws.amazon.com/blogs/big-data/using-pipes-to-explore-discover-and-find-data-in-amazon-es-with-piped-processing-language/

System developers, DevOps engineers, support engineers, site reliability engineers (SREs), and IT managers make sure that the underlying infrastructure powering the applications and systems within an organization is available, reliable, secure, and scalable. To achieve these goals, you need to perform a fast and deep analysis on the underlying logs, monitoring, and observability data. Amazon Elasticsearch Service (Amazon ES) is a popular choice to store and analyze such data. However, extracting insights from Elasticsearch isn’t easy. Although Query DSL (the language used to query data stored in Elasticsearch) is powerful, it has a steep learning curve, and wasn’t designed as a human interface to easily create one-time queries and explore user data.

In this post, we discuss the newly supported Piped Processing Language (PPL) feature, powered by Open Distro for Elasticsearch, which enables you to form complex queries and quickly explore and discover data with the help of pipes.

What is Piped Processing Language?

Piped Processing Language is powered by Open Distro for Elasticsearch, an Apache 2.0-licensed distribution of Elasticsearch. PPL enables you to explore, discover, and find data stored in Elasticsearch, using a set of commands delimited by pipes ( | ).

Pipes allow you to combine two or more commands as a chain, such that the output of one command acts as an input for the next command, very similar to Unix pipes. With PPL, you can now search for keywords and feed the results from the command on the left of the pipe to the command on the right of the pipe, effectively creating a command pipeline.
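For example, the following minimal pipeline is a sketch that assumes an index named accounts with age, firstname, and lastname fields (similar to the Elasticsearch sample account data):

source=accounts | where age > 30 | fields firstname, lastname, age | sort -age | head 5

Reading left to right: retrieve documents from the accounts index, keep only those where age is greater than 30, keep three fields, sort by age in descending order, and return the first five results.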

Use case

As an illustration, consider a use case where you want to find the hosts that are responding with HTTP 404 (Not Found) and HTTP 503 (Service Unavailable) errors, aggregate the error responses per host, and sort the hosts in order of impact.

Using Query DSL

When you use Query DSL, the query looks similar to the following code:

GET kibana_sample_data_logs/_search
{"from":0,"size":0,"timeout":"1m","query":{"bool":{"should":[{"term":{"response.keyword":{"value":"404","boost":1}}},{"term":{"response.keyword":{"value":"503","boost":1}}}],"adjust_pure_negative":true,"boost":1}},"sort":[{"_doc":{"order":"asc"}}],"aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"host":{"terms":{"field":"host.keyword","missing_bucket":true,"order":"asc"}}},{"response":{"terms":{"field":"response.keyword","missing_bucket":true,"order":"asc"}}}]},"aggregations":{"request_count":{"value_count":{"field":"request.keyword"}},"sales_bucket_sort":{"bucket_sort":{"sort":[{"request_count":{"order":"desc"}}],"size":10}}}}}}

The following screenshot shows the query results.

 

Using PPL

You can replace the entire DSL query with a single PPL command:

source = kibana_sample_data_logs | where response='404' or response='503' | stats count(request) as request_count by host, response | sort -request_count

The following screenshot shows the query results.

Commands and functions supported by PPL

PPL supports a comprehensive set of commands, including search, where, fields, rename, dedup, sort, stats, eval, head, top, and rare. These commands are read-only requests that process data and return results. The following list summarizes each command, with an example and the result it produces.

  • search (source) – Retrieves documents from the index; the search keyword itself can be omitted. Example: source=accounts; retrieves all documents from the accounts index.
  • fields – Keeps or removes fields from the search result. Example: source=accounts | fields account_number, firstname, lastname; returns only the account_number, firstname, and lastname fields.
  • dedup – Removes documents that are duplicates, as defined by a field, from the search result. Example: source=accounts | dedup gender | fields account_number, gender; removes documents with a duplicate gender value.
  • stats – Aggregates the search result using functions such as sum, count, min, max, and avg. Example: source=accounts | stats avg(age); calculates the average age across all accounts.
  • eval – Evaluates an expression and appends its result to the search result. Example: search source=accounts | eval doubleAge = age * 2 | fields age, doubleAge; creates a new doubleAge field, equal to age * 2, for each document.
  • head – Returns the first N results in the specified search order. Example: search source=accounts | fields firstname, age | head; fetches the first 10 results.
  • top – Finds the most common values of all fields in the field list. Example: search source=accounts | top gender; finds the most common value of gender.
  • rare – Finds the least common values of all fields in the field list. Example: search source=accounts | rare gender; finds the least common value of gender.
  • where – Filters the search result. Example: search source=accounts | where account_number=1 or gender="F" | fields account_number, gender; returns the account_number and gender fields of documents whose account_number is 1 or whose gender is "F".
  • rename – Renames one or more fields in the search result. Example: search source=accounts | rename account_number as an | fields an; renames the account_number field to an.
  • sort – Sorts the results by a specified field. Example: search source=accounts | sort age | fields account_number, age; sorts all documents by the age field in ascending order.

PPL also supports date-time, mathematical, string, aggregate, and trigonometric functions, as well as operators and expressions.
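As a sketch of combining commands with a function, the following pipeline assumes the same sample accounts index (with a numeric balance field and a gender field) and assumes the abs() math function is available in your PPL version:

source=accounts | eval balance_offset = abs(balance - 25000) | stats avg(balance_offset) as avg_offset by gender | sort -avg_offset

Here eval derives a new balance_offset field for each document, stats averages it per gender, and sort orders the groups by that average in descending order.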

Summary

Piped Processing Language, powered by Open Distro for Elasticsearch, provides a comprehensive set of commands and functions that enable you to quickly begin extracting insights from your data in Elasticsearch. It’s supported on all Amazon ES domains running Elasticsearch 7.9 or later. PPL is also available in the Kibana Query Workbench, alongside SQL. For more information, see Piped Processing Language.


About the Author

Viraj Phanse is a product management leader at Amazon Web Services for Search Services/Analytics. An avid foodie, he loves trying cuisines from around the globe. In his free time, he loves to play his keyboard and travel.

Power data analytics, monitoring, and search use cases with the Open Distro for Elasticsearch SQL Engine on Amazon ES

Post Syndicated from Viraj Phanse original https://aws.amazon.com/blogs/big-data/power-data-analytics-monitoring-and-search-use-cases-with-the-open-distro-for-elasticsearch-sql-engine-on-amazon-es/

Amazon Elasticsearch Service (Amazon ES) is a popular choice for log analytics, search, real-time application monitoring, clickstream analysis, and more. One commonality among these use cases is the need to write and run queries to obtain search results at lightning speed. However, doing so requires expertise in the JSON-based Elasticsearch query domain-specific language (Query DSL). Although Query DSL is powerful, it has a steep learning curve, and wasn’t designed as a human interface to easily create one-time queries and explore user data.

To solve this problem, we provided the Open Distro for Elasticsearch SQL Engine on Amazon ES, which we have been expanding since the initial release. The Structured Query Language (SQL) engine is powered by Open Distro for Elasticsearch, an Apache 2.0 licensed distribution of Elasticsearch. For more information about the Open Distro project, see Open Distro for Elasticsearch. For more information about the SQL engine capabilities, see SQL.

As part of this continued investment, we’re happy to announce new capabilities, including a Kibana-based SQL Workbench and a new SQL CLI, that make it even easier for Amazon ES users to use the Open Distro for Elasticsearch SQL Engine to work with their data.

SQL is the de facto standard for data and analytics and one of the most popular languages among data engineers and data analysts. Introducing SQL in Amazon ES allows you to present search results in a tabular format, with documents represented as rows, fields as columns, and indexes as table names referenced in the FROM clause. This provides a straightforward, declarative way to express complex DSL queries in a readable format. The newly added tools offer a powerful yet simplified way to extract and analyze data, and can support complex analytics use cases.
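To make this concrete, a common log analytics question, such as which hosts return the most HTTP 404 and 503 errors, could be expressed along the following lines (a sketch that assumes the kibana_sample_data_logs sample index; field handling and alias ordering may vary slightly by engine version):

SELECT host, response, COUNT(request) AS request_count
FROM kibana_sample_data_logs
WHERE response = '404' OR response = '503'
GROUP BY host, response
ORDER BY request_count DESC
LIMIT 10;

The FROM clause names the index, the WHERE clause filters on the error codes, and the GROUP BY and ORDER BY clauses aggregate the errors per host and rank the hosts by impact.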

Features overview

The following is a brief overview of the features of Open Distro for Elasticsearch SQL Engine on Amazon ES:

  • Query tools
    • SQL Workbench – A comprehensive and integrated visual tool to run on-demand SQL queries, translate SQL into its REST equivalent, and view and save results as text, JSON, JDBC, or CSV. The following screenshot shows a query on the SQL Workbench page.

    • SQL CLI – An interactive, standalone command line tool to run on-demand SQL queries, translate SQL into its REST equivalent, and view and save results as text, JSON, JDBC, or CSV. The following screenshot shows a query on the CLI.

  • Connectors and drivers
    • ODBC driver – The Open Database Connectivity (ODBC) driver enables connecting with business intelligence (BI) applications such as Tableau and exporting data to CSV and JSON.
    • JDBC driver – The Java Database Connectivity (JDBC) driver also allows you to connect with BI applications such as Tableau and export data to CSV and JSON.
  • Query support
    • Basic queries – You can use the SELECT clause, along with FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT to search and aggregate data.
    • Complex queries – You can perform complex queries such as subquery, join, and union on more than one Elasticsearch index.
    • Metadata queries – You can query basic metadata about Elasticsearch indexes using the SHOW and DESCRIBE commands.
  • Delete support
    • Delete – You can delete all the documents or documents that satisfy predicates in the WHERE clause from search results. However, it doesn’t delete documents from the actual Elasticsearch index.
  • JSON and full-text search support
    • JSON – Support for JSON follows the PartiQL specification, a SQL-compatible query language that lets you query semi-structured and nested data.
    • Full-text search support – You can run full-text search across millions of documents by specifying the full range of search options with SQL functions such as match and score.
  • Functions and operator support
    • Functions and operators – Support for string functions and operators, numeric functions and operators, and date-time functions is possible by enabling fielddata in the document mapping.
  • Settings
    • Settings – You can view, configure, and modify settings to control the behavior of SQL without needing to restart or bounce the Elasticsearch cluster.
  • Interfaces
    • Endpoints – The explain endpoint translates SQL into its Query DSL equivalent, and the cursor returns paginated responses for SQL query results (see the sketch after this list).
  • Monitoring
    • Monitoring – You can obtain node-level statistics by using the stats endpoint.
  • Request and response protocols
    • Protocols – Requests are sent over a REST interface, and responses can be returned in the formats noted above, including text, JSON, JDBC-formatted output, and CSV.
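As a rough sketch of how these interfaces fit together, the following console-style requests use the Open Distro SQL plugin’s REST endpoints (the accounts index is an assumption, and the paths reflect the Open Distro documentation; adjust them for your domain and plugin version):

POST _opendistro/_sql
{
  "query": "SELECT firstname, lastname, age FROM accounts WHERE age > 30 ORDER BY age DESC LIMIT 5"
}

POST _opendistro/_sql/_explain
{
  "query": "SELECT firstname, lastname, age FROM accounts WHERE age > 30 ORDER BY age DESC LIMIT 5"
}

GET _opendistro/_sql/stats

The first request runs the query and returns results, the second returns the equivalent Query DSL without running the query, and the third returns node-level statistics from the monitoring endpoint described above.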

Conclusion

Open Distro for Elasticsearch SQL Engine on Amazon ES provides a comprehensive, flexible, and user-friendly set of features to obtain search results from Amazon ES in a declarative manner using SQL. For more information about querying with SQL, see SQL Support for Amazon Elasticsearch Service.

 


About the Author

Viraj Phanse (@vrphanse) is a product management leader at Amazon Web Services for Search Services/Analytics. Prior to AWS, he was in product management/strategy and go-to-market leadership roles at Oracle, Aerospike, INSZoom and Persistent Systems. He is a Fellow and Selection Committee member at Berkeley Angel Network, and a Big Data Advisory Board Member at San Francisco State University. He has completed his M.S. in Computer Science from UCLA and MBA from UC Berkeley’s Haas School of Business.