Tag Archives: Amazon OpenSearch Service

Custom packages and hot reload of dictionary files with Amazon OpenSearch Service

Post Syndicated from Sonam Chaudhary original https://aws.amazon.com/blogs/big-data/custom-packages-and-hot-reload-of-dictionary-files-with-amazon-opensearch-service/

Amazon OpenSearch Service is a fully managed service that you can use to deploy and operate OpenSearch clusters cost-effectively at scale in the AWS Cloud. The service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more by offering the latest versions of OpenSearch, support for 19 versions of Elasticsearch (1.5 to 7.10 versions), and visualization capabilities powered by OpenSearch Dashboards and Kibana (1.5 to 7.10 versions).

There are various use cases such as website search, ecommerce search, and enterprise search where the user wants to get relevant content for specific terms. Search engines match the terms (words) sent through the query API. When there are many different ways of specifying the same concept, you use synonyms to give the search engine more match targets than what the user entered.

Similarly, there are certain use cases where input data has a lot of common or frequently occurring words that don’t add much relevance when used in a search query. These include words like “the,” “this,” and “that.” These can be classified as stopwords.

OpenSearch Service allows you to upload custom dictionary files, which can include synonyms and stopwords to be customized to your use case. This is especially useful for use cases where you want to do the following:

  • Specify words that can be treated as equivalent. For example, you can specify that words such as “bread,” “danish,” and “croissant” be treated as synonymous. This leads to better search results because, instead of returning no results when an exact match isn’t found, the search returns approximately relevant or equivalent results.
  • Ignore certain high-frequency terms that are common and contribute little to the search’s relevance score. These could include “a,” “the,” “of,” “an,” and so on.

Specifying stems, synonyms, and stopwords can greatly improve query accuracy and lets you customize and enhance query relevance. Custom dictionary files can also help with stemming (such as in the Japanese (kuromoji) Analysis Plugin). Stemming is reducing a word to its root form. For example, “cooking” and “cooked” can both be stemmed to the root word “cook.” This way, any variant of a word can be reduced to one root word to improve query results.
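To make the idea of stemming concrete, here is a deliberately naive suffix-stripping stemmer in Python. It is a toy for illustration only: the suffix list and the three-character minimum are assumptions of this sketch, and it is not the algorithm that kuromoji or any OpenSearch stemmer uses.

```python
# A deliberately naive suffix-stripping stemmer, for illustration only.
# Real stemmers use far more careful, language-specific rules.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix if at least three characters remain."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


if __name__ == "__main__":
    for w in ["cooking", "cooked", "cooks", "cook"]:
        print(w, "->", naive_stem(w))  # every variant maps to "cook"
```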

In this post, we show how to add custom packages for synonyms and stopwords to an OpenSearch Service domain. We start by creating custom packages for synonyms and stopwords, then create a custom analyzer for a sample index that uses the standard tokenizer with stop and synonym token filters, and finally demonstrate hot reload of dictionary files.

Tokenizers and token filters

Tokenizers break streams of characters into tokens (typically words) based on some set of rules. The simplest example is the whitespace tokenizer, which starts a new token each time it encounters a whitespace character. A more complex example is the standard tokenizer, which uses a set of grammar-based rules to work across many languages.
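The difference between the two tokenizers can be sketched in Python. Both functions are hypothetical helpers written for this illustration; the regex version only roughly approximates the standard tokenizer, which actually implements the Unicode text segmentation rules (UAX #29).

```python
import re

# Hypothetical helpers that mimic the two tokenizers described above.


def whitespace_tokenize(text: str) -> list:
    """Split on runs of whitespace, keeping punctuation attached to words."""
    return text.split()


def approx_standard_tokenize(text: str) -> list:
    """Extract runs of word characters, dropping punctuation (rough approximation)."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    sample = "Ice cream, gelato & frozen custard!"
    print(whitespace_tokenize(sample))
    # ['Ice', 'cream,', 'gelato', '&', 'frozen', 'custard!']
    print(approx_standard_tokenize(sample))
    # ['Ice', 'cream', 'gelato', 'frozen', 'custard']
```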

Token filters add, modify, or delete tokens. For example, a synonym token filter adds tokens when it finds a word in the synonyms list. The stop token filter removes tokens when it finds a word in the stopwords list.
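The following toy Python model shows a stop filter followed by a synonym filter acting on a token stream. The word lists are taken from the files used later in this post, and the function names are hypothetical; unlike the real OpenSearch filters, this sketch handles only single-word synonyms, assumes lowercase tokens, and ignores token positions.

```python
# Toy token-filter chain: stop filter, then a single-word synonym filter.
STOPWORDS = {"the", "a", "an", "of"}
SYNONYM_GROUPS = [
    {"pasta", "penne", "ravioli"},
    {"danish", "croissant", "pastry", "bread"},
]


def stop_filter(tokens):
    """Delete tokens that appear in the stopwords list."""
    return [t for t in tokens if t not in STOPWORDS]


def synonym_filter(tokens):
    """Keep each token and add the other members of its synonym group after it."""
    out = []
    for token in tokens:
        out.append(token)
        for group in SYNONYM_GROUPS:
            if token in group:
                out.extend(sorted(group - {token}))
    return out


if __name__ == "__main__":
    print(synonym_filter(stop_filter(["the", "ravioli"])))
    # ['ravioli', 'pasta', 'penne']
```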

Prerequisites

For this demo, you must have an OpenSearch Service domain running OpenSearch version 1.2. Hot reload, described later in this post, is available on any OpenSearch Service domain running OpenSearch or Elasticsearch 7.8 or later.

Users without administrator access require certain AWS Identity and Access Management (IAM) actions in order to manage packages: es:CreatePackage, es:DeletePackage, es:AssociatePackage, and es:DissociatePackage. The user also needs permissions on the Amazon Simple Storage Service (Amazon S3) bucket path or object where the custom package resides. Grant all permissions within IAM, not in the domain access policy. Keeping these permissions in IAM makes them easier to manage: you can change them independently of the domain, and the user can perform the same actions across multiple OpenSearch Service domains if needed.
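As a sketch of such an identity-based policy, the following Python builds a policy document with the four package actions listed above. The bucket name is hypothetical, and granting s3:GetObject on the bucket’s objects is an assumption about the S3 access needed; the wildcard resource for the es: actions is also permissive, so scope it to your own domain ARNs where you can.

```python
import json

# The four package actions named in the post.
PACKAGE_ACTIONS = [
    "es:CreatePackage",
    "es:DeletePackage",
    "es:AssociatePackage",
    "es:DissociatePackage",
]


def build_package_policy(bucket: str) -> dict:
    """Return an IAM policy document for a non-admin package manager (sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": PACKAGE_ACTIONS,
                "Resource": "*",  # permissive placeholder; narrow to domain ARNs
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],  # assumed: read access to package files
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-custom-packages-bucket" is a hypothetical bucket name.
    print(json.dumps(build_package_policy("my-custom-packages-bucket"), indent=2))
```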

Set up the custom packages

To set up the solution, complete the following steps:

  1. On the Amazon S3 console, create a bucket to hold the custom packages.
  2. Upload the files with the stopwords and synonyms to this bucket. For this post, the file contents are as follows:
    1. synonyms.txt:
      pasta, penne, ravioli 
      ice cream, gelato, frozen custard
      danish, croissant, pastry, bread

    2. stopwords.txt:
      the
      a
      an
      of

      The following screenshot shows the uploaded files:

Now we import our packages and associate them with a domain.

  1. On the OpenSearch Service console, choose Packages in the navigation pane.
  2. Choose Import package.
  3. Enter a name for your package (for the synonym package, we use my-custom-synonym-package) and an optional description.
  4. For Package source, enter the S3 location where synonyms.txt is stored.
  5. Choose Submit.
  6. Repeat these steps to create a package with stopwords.txt.
  7. Choose your synonym package when its status shows as Available.
  8. Choose Associate to a domain.
  9. Select your OpenSearch Service domain, then choose Associate.
  10. Repeat these steps to associate your OpenSearch Service domain with the stopwords package.
  11. When the packages are available, note their IDs.

In your requests to OpenSearch, you reference each package with the file path analyzers/id, where id is the package ID you noted in the previous step.
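If you prefer to script the console steps, the following Python sketch uses the boto3 opensearch client calls create_package, describe_packages, and associate_package. The client is passed in so the logic can be tried without AWS credentials; the function names are hypothetical, and the 10-second polling interval and 10-minute timeout are arbitrary choices for this sketch.

```python
import time


def wait_until_available(client, package_id, poll_seconds=10.0, timeout=600.0):
    """Poll DescribePackages until the package status is AVAILABLE."""
    deadline = time.monotonic() + timeout
    while True:
        response = client.describe_packages(
            Filters=[{"Name": "PackageID", "Value": [package_id]}]
        )
        status = response["PackageDetailsList"][0]["PackageStatus"]
        if status == "AVAILABLE":
            return
        if status.endswith("_FAILED"):  # e.g. COPY_FAILED, VALIDATION_FAILED
            raise RuntimeError(f"Package {package_id} status is {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Package {package_id} still {status} after {timeout}s")
        time.sleep(poll_seconds)


def import_and_associate(client, package_name, bucket, key, domain_name):
    """Import a TXT-DICTIONARY package from S3 and associate it with a domain."""
    created = client.create_package(
        PackageName=package_name,
        PackageType="TXT-DICTIONARY",
        PackageSource={"S3BucketName": bucket, "S3Key": key},
    )
    package_id = created["PackageDetails"]["PackageID"]
    # Mirror the console: associate only once the package shows as Available.
    wait_until_available(client, package_id)
    # Association also completes asynchronously (the console shows it as Active).
    client.associate_package(PackageID=package_id, DomainName=domain_name)
    return package_id


def analyzer_path(package_id):
    """File path to use in synonyms_path or stopwords_path."""
    return f"analyzers/{package_id}"
```

With real credentials, you would create the client with boto3.client("opensearch") and call, for example, import_and_associate(client, "my-custom-synonym-package", "my-custom-packages-bucket", "synonyms.txt", "my-domain"), where the bucket and domain names are hypothetical.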

Use the custom packages with your data

After you associate a file with a domain, you can use it in parameters such as synonyms_path and stopwords_path when you create tokenizers and token filters. For more information, see the Amazon OpenSearch Service documentation on custom packages.

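You can create a new index (my-index-test) in the OpenSearch Service domain using the following snippet, specifying the analyzers/id values for the synonyms and stopwords packages. Replace each Fxxxxxxxxx placeholder with the ID of the matching package: the stopwords package ID in my_stop_filter and the synonyms package ID in my_synonym_filter.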

  1. Open OpenSearch Dashboards.
  2. On the Home menu, choose Dev Tools.
  3. Enter the following code in the left pane:
    PUT my-index-test
    {
      "settings": {
        "index": {
          "analysis": {
            "analyzer": {
              "my_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["my_stop_filter" , "my_synonym_filter"]
              }
            },
            "filter": {
              "my_stop_filter": {
                "type": "stop",
                "stopwords_path": "analyzers/Fxxxxxxxxx",
                "updateable": true
              },
              "my_synonym_filter": {
                "type": "synonym",
                "synonyms_path": "analyzers/Fxxxxxxxxx",
                "updateable": true
              }
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "description": {
            "type": "text",
            "analyzer": "standard",
            "search_analyzer": "my_analyzer"
          }
        }
      }
    }

  4. Choose the play icon to send the request to create the index with our custom synonyms and stopwords.

The following screenshot shows our results.

This request creates a custom analyzer for the index that uses the standard tokenizer followed by the stop and synonym token filters. It also adds a text field (description) to the mapping and tells OpenSearch to use the new analyzer as its search analyzer. The field still uses the standard analyzer as its index analyzer.

Note the "updateable": true setting in each token filter. Updateable filters can be used only in search analyzers, not index analyzers, and this setting is critical if you later want to update the search analyzer automatically.
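Because OpenSearch is expected to reject an index whose index-time analyzer uses an updateable filter, it can help to check a request body before sending it. The following Python helper is a hypothetical sketch of that check; it assumes a field without an explicit "analyzer" falls back to the standard analyzer and ignores index-level default analyzers.

```python
def updateable_filters_at_index_time(body):
    """List mapped fields whose index-time analyzer uses an updateable filter."""
    analysis = body["settings"]["index"]["analysis"]
    updateable = {
        name for name, cfg in analysis.get("filter", {}).items() if cfg.get("updateable")
    }
    custom_analyzers = analysis.get("analyzer", {})
    problems = []
    for field, mapping in body.get("mappings", {}).get("properties", {}).items():
        # Assumption: no explicit analyzer means the built-in standard analyzer.
        index_analyzer = mapping.get("analyzer", "standard")
        filters = custom_analyzers.get(index_analyzer, {}).get("filter", [])
        bad = sorted(set(filters) & updateable)
        if bad:
            problems.append(f"{field}: index analyzer '{index_analyzer}' uses {bad}")
    return problems
```

For the my-index-test body in this post, the helper returns an empty list, because my_analyzer is used only as the search_analyzer.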

Let’s start by adding some sample data to the index my-index-test:

POST _bulk
{ "index": { "_index": "my-index-test", "_id": "1" } }
{ "description": "pasta" }
{ "index": { "_index": "my-index-test", "_id": "2" } }
{ "description": "the bread" }
{ "index": { "_index": "my-index-test", "_id": "3" } }
{ "description": "ice cream" }
{ "index": { "_index": "my-index-test", "_id": "4" } }
{ "description": "croissant" }
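The _bulk body is newline-delimited JSON: an action line followed by a document line, with a trailing newline after the last line. The following Python sketch (build_bulk_body is a hypothetical helper) generates the request above from a dictionary of IDs and descriptions, which is handy when loading more sample documents.

```python
import json


def build_bulk_body(index, descriptions):
    """Return an NDJSON _bulk body that indexes one document per (id, text) pair."""
    lines = []
    for doc_id, text in descriptions.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"description": text}))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample_docs = {"1": "pasta", "2": "the bread", "3": "ice cream", "4": "croissant"}
    print(build_bulk_body("my-index-test", sample_docs), end="")
```

If you send the body over HTTP rather than through Dev Tools, use the Content-Type header application/x-ndjson.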

Now if you search for the words you specified in the synonyms.txt file, you get the expected results. Note that my test index contains only “pasta,” not “ravioli,” but because I specified “ravioli” as a synonym for “pasta” in my associated package, searching for “ravioli” returns all documents that contain the word “pasta.”

GET my-index-test/_search
{
  "query": {
    "match": {
      "description": "ravioli"
    }
  }
}
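To see why the “ravioli” query matches the “pasta” document, the following toy Python model mimics the asymmetry in the mapping: documents are tokenized with a rough stand-in for the standard analyzer, while the query also goes through the stop and synonym steps of my_analyzer. It is a simplification: synonyms are looked up one token at a time, and a document matches if it shares any term with the expanded query, loosely mirroring the default OR behavior of a match query. The helper names are hypothetical.

```python
import re

# File contents and sample documents from this post.
SYNONYMS_TXT = """\
pasta, penne, ravioli
ice cream, gelato, frozen custard
danish, croissant, pastry, bread
"""
STOPWORDS_TXT = "the\na\nan\nof\n"
DOCS = {"1": "pasta", "2": "the bread", "3": "ice cream", "4": "croissant"}


def tokenize(text):
    """Rough stand-in for the standard analyzer: word runs, lowercased."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def parse_synonyms(text):
    """Each non-empty line is a comma-separated group of equivalent terms."""
    return [
        [term.strip().lower() for term in line.split(",") if term.strip()]
        for line in text.splitlines()
        if line.strip()
    ]


def search_terms(query, groups, stopwords):
    """Approximate my_analyzer: tokenize, drop stopwords, expand synonyms."""
    terms = set()
    for token in tokenize(query):
        if token in stopwords:
            continue
        terms.add(token)
        for group in groups:
            if token in group:
                for entry in group:
                    terms.update(entry.split())
    return terms


def search(query):
    groups = parse_synonyms(SYNONYMS_TXT)
    stopwords = set(STOPWORDS_TXT.split())
    terms = search_terms(query, groups, stopwords)
    # Any shared term counts as a hit (OR semantics).
    return sorted(doc_id for doc_id, text in DOCS.items() if terms & set(tokenize(text)))


if __name__ == "__main__":
    print(search("ravioli"))        # ['1']
    print(search("the croissant"))  # ['2', '4']
    print(search("the"))            # [] because the query is only a stopword
```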


Similarly, you can use the stopwords package to specify common words, such as “the,” that are filtered out of search queries because they contribute little to the relevance of the results.

Hot reload

Now let’s say you want to add another synonym (“spaghetti”) for “pasta.”

  1. The first step is to update the synonyms.txt file as follows and upload this updated file to your S3 bucket:
    pasta, penne, ravioli, spaghetti
    ice cream, gelato, frozen custard
    danish, croissant, pastry, bread


    Uploading a new version of a package to Amazon S3 doesn’t automatically update the package on OpenSearch Service. OpenSearch Service stores its own copy of the file, so if you upload a new version to Amazon S3, you must manually update it in OpenSearch Service.

    If you try to run the search query against the index for the term “spaghetti” at this point, you don’t get any results:

    GET my-index-test/_search
    {
      "query": {
        "match": {
          "description": "spaghetti"
        }
      }
    }

    After the file is modified in Amazon S3, update the package in OpenSearch Service, then apply the update. To do this, perform the following steps:

  2. On the OpenSearch Service console, choose Packages.
  3. Choose the package you created for custom synonyms and choose Update.
  4. Provide the S3 path to the updated file.
  5. Enter a description and choose Update package.

    You return to the Packages page.
  6. When the package status shows as Available, choose it and wait for the associated domain to show as updated.
  7. Select the domain and choose Apply update.
  8. Choose Apply update again to confirm.

Wait for the association status to change to Active to confirm that the package version is also updated.
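The update step can also be scripted. This Python sketch assumes the boto3 opensearch client’s update_package call with an optional CommitMessage; update_dictionary_package is a hypothetical name, and the client is injected so the function can be tried without credentials. It covers only the Update package step; apply the update to the domain as described in the console steps.

```python
def update_dictionary_package(client, package_id, bucket, key, commit_message=""):
    """Import the updated S3 file as a new version of an existing package."""
    kwargs = {
        "PackageID": package_id,
        "PackageSource": {"S3BucketName": bucket, "S3Key": key},
    }
    if commit_message:
        # Optional message recorded with the new package version.
        kwargs["CommitMessage"] = commit_message
    return client.update_package(**kwargs)["PackageDetails"]
```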

If your domain runs Elasticsearch 7.7 or earlier, uses index analyzers, or doesn’t use the updateable field, you must reindex your data with the new dictionary file whenever you want to add synonyms later. Previously, on Amazon Elasticsearch Service, these analyzers could only process data as it was indexed.

If your domain runs OpenSearch, or Elasticsearch 7.8 or later, and uses only search analyzers with the updateable field set to true, you don’t need to take any further action. OpenSearch Service automatically refreshes the search analyzers on your indexes using the _plugins/_refresh_search_analyzers API, so they pick up the new dictionary in real time without you needing to close and reopen the index.
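OpenSearch Service makes the refresh call for you, so you normally don’t need to send it. For illustration only, the following Python sketch builds (but doesn’t send) the request, assuming the POST form of the endpoint documented for the OpenSearch Index Management plugin. The endpoint value is a placeholder, and the request is unsigned; a domain that uses IAM authentication would require SigV4 signing, which the sketch omits.

```python
import urllib.request


def build_refresh_request(domain_endpoint, index):
    """Build (but do not send) a refresh-search-analyzers request for an index."""
    url = f"https://{domain_endpoint}/_plugins/_refresh_search_analyzers/{index}"
    return urllib.request.Request(url, method="POST")


if __name__ == "__main__":
    req = build_refresh_request("my-domain.example.com", "my-index-test")
    print(req.get_method(), req.full_url)
```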

This feature, called hot reload, lets you reload dictionary files without reindexing your data. Because the analyzers run at search time, your updated dictionary files augment the query. Hot reload also lets you version your dictionary files in OpenSearch Service and update them on your domains without having to reindex your data.
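A toy Python model illustrates why hot reload needs no reindexing: the indexed terms are fixed when documents are written, while an updateable search analyzer can swap its dictionary at any time. SearchAnalyzer and search are hypothetical names, and the model handles single-word synonyms only.

```python
# Toy model of hot reload for an updateable search analyzer.


class SearchAnalyzer:
    """Hypothetical search analyzer holding a reloadable synonym dictionary."""

    def __init__(self, synonym_lines):
        self.groups = []
        self.reload(synonym_lines)

    def reload(self, synonym_lines):
        # Replace the dictionary in place; the indexed documents are untouched.
        self.groups = [
            {term.strip() for term in line.split(",") if term.strip()}
            for line in synonym_lines
        ]

    def expand(self, term):
        """Return the term plus every synonym from groups that contain it."""
        terms = {term}
        for group in self.groups:
            if term in group:
                terms |= group
        return terms


# Terms produced by the standard analyzer when the documents were indexed.
INDEXED_TERMS = {"1": {"pasta"}, "2": {"the", "bread"}}


def search(analyzer, term):
    expanded = analyzer.expand(term)
    return sorted(d for d, doc_terms in INDEXED_TERMS.items() if expanded & doc_terms)


if __name__ == "__main__":
    analyzer = SearchAnalyzer(["pasta, penne, ravioli"])
    print(search(analyzer, "spaghetti"))  # []
    analyzer.reload(["pasta, penne, ravioli, spaghetti"])
    print(search(analyzer, "spaghetti"))  # ['1']
```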

Because the domain used in this demonstration runs OpenSearch 1.2, you can use hot reload without reindexing any data. Simply run a search query for the newly added synonym (“spaghetti”) to get all documents that contain its synonyms:

GET my-index-test/_search
{
  "query": {
    "match": {
      "description": "spaghetti"
    }
  }
}

Conclusion

In this post, we showed how easy it is to set up synonyms in OpenSearch Service so you can find the relevant documents that match a synonym for a word, even when that specific word isn’t used as the search term. We also demonstrated how to add and update synonym dictionaries and reload those files to reflect the changes.

If you have feedback about this post, submit your comments in the comments section. You can also start a new thread on the OpenSearch Service forum or contact AWS Support with questions.


About the Authors

Sonam Chaudhary is a Solutions Architect and Big Data and Analytics Specialist at AWS. She works with customers to build scalable, highly available, cost-effective, and secure solutions in the AWS Cloud. In her free time, she likes traveling with her husband, shopping, and watching movies.

Prashant Agrawal is a Search Specialist Solutions Architect with OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.

Building SAML federation for Amazon OpenSearch Dashboards with Ping Identity

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/building-saml-federation-for-amazon-opensearch-dashboards-with-ping-identity/

Amazon OpenSearch is an open search and log analytics service, powered by the Apache Lucene search library.

In this blog post, we provide step-by-step guidance for SP-initiated SSO by showing how to set up a trial Ping Identity account. We’ll show how to build users and groups within your organization’s directory and enable SSO in OpenSearch Dashboards.

To use this feature, you must enable fine-grained access control. Rather than authenticating through Amazon Cognito or the internal user database, SAML authentication for OpenSearch Dashboards lets you use third-party identity providers to log in.

Ping Identity is an AWS Competency Partner and the provider of the PingOne Cloud Platform, a multi-tenant Identity-as-a-Service (IDaaS) platform. Ping Identity supports both service provider (SP)-initiated and identity provider (IdP)-initiated SSO.

Overview of Ping Identity SAML authenticated solution

Figure 1 shows a sample architecture of a generic integrated solution between Ping Identity and OpenSearch Dashboards over SAML authentication.

Figure 1. SAML transactions between Amazon OpenSearch and Ping Identity

The sign-in flow is as follows:

  1. User opens browser window and navigates to Amazon OpenSearch Dashboards
  2. Amazon OpenSearch generates SAML authentication request
  3. Amazon OpenSearch redirects request back to browser
  4. Browser redirects to Ping Identity URL
  5. Ping Identity parses SAML request, authenticates user, and generates SAML response
  6. Ping Identity returns encoded SAML response to browser
  7. Browser sends SAML response back to Amazon OpenSearch Assertion Consumer Service (ACS) URL
  8. ACS verifies SAML response
  9. User logs into Amazon OpenSearch domain
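Steps 2 through 4 use the SAML 2.0 HTTP-Redirect binding, in which the SP deflates the AuthnRequest, base64-encodes it, and URL-encodes it into a SAMLRequest query parameter. The following Python sketch shows that encoding for a minimal, unsigned request. The example.com URLs are placeholders, the function names are hypothetical, and the requests that OpenSearch actually generates may carry additional attributes.

```python
import base64
import datetime
import uuid
import zlib
from urllib.parse import parse_qs, urlencode, urlparse


def build_authn_request(sp_entity_id, acs_url):
    """Return a minimal, unsigned SAML 2.0 AuthnRequest as an XML string."""
    now = datetime.datetime.now(datetime.timezone.utc)
    issue_instant = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        '<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
        'xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" '
        f'ID="_{uuid.uuid4().hex}" Version="2.0" IssueInstant="{issue_instant}" '
        f'AssertionConsumerServiceURL="{acs_url}" '
        'ProtocolBinding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST">'
        f"<saml:Issuer>{sp_entity_id}</saml:Issuer>"
        "</samlp:AuthnRequest>"
    )


def redirect_url(idp_sso_url, authn_request_xml, relay_state=""):
    """Encode the request for the HTTP-Redirect binding: DEFLATE, base64, URL-encode."""
    # wbits=-15 produces raw DEFLATE data with no zlib header, as the binding requires.
    compressor = zlib.compressobj(wbits=-15)
    deflated = compressor.compress(authn_request_xml.encode("utf-8")) + compressor.flush()
    params = {"SAMLRequest": base64.b64encode(deflated).decode("ascii")}
    if relay_state:
        params["RelayState"] = relay_state
    return f"{idp_sso_url}?{urlencode(params)}"


def decode_redirect_url(url):
    """Reverse the encoding, which is useful when inspecting captured redirects."""
    encoded = parse_qs(urlparse(url).query)["SAMLRequest"][0]
    return zlib.decompress(base64.b64decode(encoded), -15).decode("utf-8")


if __name__ == "__main__":
    request_xml = build_authn_request(
        "https://sp.example.com", "https://sp.example.com/saml/acs"
    )
    url = redirect_url("https://idp.example.com/sso", request_xml)
    print(decode_redirect_url(url) == request_xml)  # True
```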

Prerequisites

For this walkthrough, you should have the following prerequisites:

  1. An AWS account
  2. A virtual private cloud (VPC)-based Amazon OpenSearch domain with fine-grained access control enabled
  3. A Ping Identity account with a user and a group
  4. A browser with network connectivity to Ping Identity, Amazon OpenSearch domain, and Amazon OpenSearch Dashboards.

The steps in this post are structured into the following sections:

  1. Identity provider (Ping Identity) setup
  2. Prepare Amazon OpenSearch for SAML configuration
  3. Identity provider (Ping Identity) SAML configuration
  4. Finish Amazon OpenSearch for SAML configuration
  5. Validation
  6. Cleanup

Identity provider (Ping Identity) setup

Step 1: Sign up for a Ping Identity account

  • Sign up for a Ping Identity account, then click on the Sign up button to complete your account setup.
  • If you already have an account with Ping Identity, log in to your Ping Identity account.

Step 2: Create Population in Ping Identity

  • Choose Identities in the left menu and click Populations to proceed.
  • Click on the blue + button next to Populations, enter the name as IT, then click on the Save button (see Figure 2).

Figure 2. Creating population in Ping Identity

Step 3: Create a group in Ping Identity

  • Choose Groups from the left menu and click on the blue + button next to Groups. For this example, we will create a group called opensearch for Kibana access. Click on the Save button to complete the group creation.

Step 4: Create users in Ping Identity

  • Choose Users in left menu, then click the + Add User button.
  • Provide GIVEN NAME, FAMILY NAME, and EMAIL ADDRESS, and for Population choose the IT population created in Step 2. Choose your own USERNAME. Click on the SAVE button to create your user.
  • Add more users as needed.

Step 5: Assign role and group to users

  • Choose Identities in the left menu, then click Users. Click the edit button for a particular user, as shown in Figure 3.

Figure 3. Assigning roles and groups to users in Ping Identity

  • Click the Edit button, click the + Add Role button, and then click the edit button to assign a role to the user.
  • For this example, choose Environment Admin, as shown in Figure 4. You can choose different roles depending on your use case.

Figure 4. Assigning roles to users in Ping Identity

  • For this example, assign administrator responsibilities to your users. Click Show Environments, and drag Administrators into the ADDED RESPONSIBILITIES section. Then click the Add Role button.
  • Add Group to users. Go to the Groups tab, search for the opensearch group created in Step 3. Click on the + button next to opensearch to add into group memberships.

Prepare Amazon OpenSearch for SAML configuration

Once the Amazon OpenSearch domain is up and running, we can proceed with configuration.

  • Under Actions, choose Edit security configuration, as shown in Figure 5.

Figure 5. Enabling Amazon OpenSearch security configuration for SAML

  • Under SAML authentication for OpenSearch Dashboards/Kibana, select the Enable SAML authentication check box (Figure 6). When you enable SAML, OpenSearch Service generates the URLs required for configuring SAML with your identity provider.

Figure 6. Amazon OpenSearch URLs for SAML configuration

We will be using the Service Provider entity ID and SP-initiated SSO URL as highlighted in Figure 6 for Ping Identity SAML configuration. We will complete the rest of the Amazon OpenSearch SAML configuration after the Ping Identity SAML configuration.

Ping Identity SAML configuration

Go back to PingIdentity.com, and navigate to Connections on the left menu. Then select Applications, and click on Application +.

  • For this example, we are creating an application called “Kibana”
  • Select WEB APP as the APPLICATION TYPE and SAML as the CHOOSE CONNECTION TYPE option, then click the Configure button to proceed, as shown in Figure 7.

Figure 7. Configuring a new application in Ping Identity

  • On the “Create App Profile” page, click the Next button, and choose the “Manually Enter” option for PROVIDE APP METADATA. Enter the following under the Configure SAML Connection section:
    • ACS URL: https://vpc-XXXXX-XXXXX.us-west-2.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs (SP-initiated SSO URL)
    • Choose Sign Assertion & Response under SIGNING KEY
    • ENTITY ID: https://vpc-XXXXX-XXXXX.us-west-2.es.amazonaws.com (Service provider entity ID)
    • ASSERTION VALIDITY DURATION (IN SECONDS) as 3600
    • Choose default options, then click on the Save and Continue button as shown in Figure 8

Figure 8. Configuring SAML connection in Ping Identity

  • Enter the following under Configure Attribute Mapping, then click on Save and Close.
    • Set User ID to default
    • Click on +ADD ATTRIBUTE button to add following SAML attributes
      • OUTGOING VALUE: Group Names, SAML ATTRIBUTE: saml_group
      • OUTGOING VALUE: Username, SAML ATTRIBUTE: saml_username
  • Select the Policies tab and click the edit icon on the right.
  • Add the Single_Factor policy to the application, then click on Save.
  • Select the Access tab, add the opensearch group to the application, then click on Save to complete SAML configuration.
  • Finally, go to the Configuration tab and click the Download Metadata button to download the Ping Identity metadata for the Amazon OpenSearch SAML configuration. Then enable the SAML application (Figure 9).

Figure 9. Downloading metadata in Ping Identity
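The entity ID and ACS URL entered in the SAML connection settings follow a pattern visible in this post: the entity ID is the domain endpoint, and the ACS URL appends the security plugin’s SAML path. The following Python helper (hypothetical) derives both from an endpoint, assuming that pattern holds for your domain; always prefer the values the console displays if they differ.

```python
def saml_sp_urls(domain_endpoint):
    """Derive the SP entity ID and ACS URL from a domain endpoint (assumed pattern)."""
    host = domain_endpoint.strip().rstrip("/")
    if host.startswith("https://"):
        host = host[len("https://"):]
    base = f"https://{host}"
    return {
        "sp_entity_id": base,
        "acs_url": f"{base}/_dashboards/_opendistro/_security/saml/acs",
    }


if __name__ == "__main__":
    # Hypothetical VPC endpoint.
    urls = saml_sp_urls("vpc-my-domain-abc123.us-west-2.es.amazonaws.com")
    print(urls["sp_entity_id"])
    print(urls["acs_url"])
```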

Amazon OpenSearch SAML configuration

  • Switch back to Amazon OpenSearch domain:
    • Navigate to the Amazon OpenSearch console.
    • Click on Actions, then click on Modify Security configuration.
    • Select the Enable SAML authentication check box.
  • Under Import IdP metadata section:
    • Metadata from IdP: Import the Ping Identity identity provider metadata from the downloaded XML file, shown in Figure 10.
    • SAML master backend role: opensearch (Ping Identity group). Provide SAML backend role/group SAML assertion key for group SSO into Kibana.

Figure 10. Configuring Amazon OpenSearch SAML parameters

  • Under Optional SAML settings:
    • Leave the Subject Key as saml_subject from Ping Identity SAML application attribute name.
    • Role key should be saml_group. You can view a sample assertion during the configuration process with tools like SAML-tracer. This can help you examine and troubleshoot the contents of real assertions.
    • Session time to live (mins): 60
  • Click on the Submit button to complete Amazon OpenSearch SAML configuration for Kibana. We have successfully completed SAML configuration and are now ready for testing.
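To check what OpenSearch will see, you can pull the subject and roles out of an assertion captured with a tool like SAML-tracer. This Python sketch mirrors the subject key and role key settings: if no subject key is set, it falls back to the NameID. The sample assertion and its values are hypothetical, and the code does not validate signatures or conditions, which the ACS must do, so use it only for inspection.

```python
import xml.etree.ElementTree as ET

SAML_ASSERTION_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
NS = {"saml": SAML_ASSERTION_NS}

# Hypothetical assertion shaped like the Ping Identity attribute mapping above.
SAMPLE_ASSERTION = f"""<saml:Assertion xmlns:saml="{SAML_ASSERTION_NS}">
  <saml:Subject><saml:NameID>jdoe@example.com</saml:NameID></saml:Subject>
  <saml:AttributeStatement>
    <saml:Attribute Name="saml_group">
      <saml:AttributeValue>opensearch</saml:AttributeValue>
    </saml:Attribute>
    <saml:Attribute Name="saml_username">
      <saml:AttributeValue>jdoe</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>"""


def user_and_roles(assertion_xml, roles_key, subject_key=None):
    """Return (username, backend roles) as the subject/role key settings would."""
    root = ET.fromstring(assertion_xml)
    attributes = {}
    for attr in root.iter(f"{{{SAML_ASSERTION_NS}}}Attribute"):
        values = [(v.text or "").strip() for v in attr.findall("saml:AttributeValue", NS)]
        attributes[attr.get("Name")] = values
    if subject_key:
        username = attributes[subject_key][0]
    else:
        # With no subject key configured, the NameID is used as the username.
        username = root.find("saml:Subject/saml:NameID", NS).text
    return username, attributes.get(roles_key, [])


if __name__ == "__main__":
    print(user_and_roles(SAMPLE_ASSERTION, "saml_group", "saml_username"))
    # ('jdoe', ['opensearch'])
    print(user_and_roles(SAMPLE_ASSERTION, "saml_group"))
    # ('jdoe@example.com', ['opensearch'])
```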

Validating Access with Ping Identity Users

  • The OpenSearch Dashboards URL can be found in the Overview tab within “General Information” in the Amazon OpenSearch console (Figure 11). The first access to the OpenSearch Dashboards URL redirects you to the Ping Identity login screen.

Figure 11. Validating Ping Identity users access with Amazon OpenSearch

  • If your OpenSearch domain is hosted within a private VPC, you will not be able to access OpenSearch Dashboards over the public internet. However, you can still use SAML as long as your browser can communicate with both your OpenSearch cluster and your identity provider.
  • You can create a Mac or Windows EC2 instance within the same VPC and access Amazon OpenSearch Dashboards from an EC2 instance’s web browser to validate your SAML configuration. Or you can access your Amazon OpenSearch Dashboards through Site-to-Site VPN if you are trying to access it from your on-premises environment.
  • Now copy and paste the OpenSearch Dashboards URL in your browser, and enter user credentials.
  • After successful login, you will be redirected into the OpenSearch Dashboards home page. Explore our sample data and visualizations in OpenSearch Dashboards, as shown in Figure 12.

Figure 12. SAML authenticated Amazon OpenSearch Dashboards

  • You have successfully federated Amazon OpenSearch Dashboards with Ping Identity as an identity provider. You can now sign in to OpenSearch Dashboards with your Ping Identity credentials.

Cleaning up

After you test out this solution, remember to delete all the resources you created to avoid incurring future charges.

Conclusion

In this blog post, we have demonstrated how to set up Ping Identity as an identity provider over SAML authentication for Amazon OpenSearch Dashboards access. With this solution, you now have an OpenSearch Dashboard that uses Ping Identity as the custom identity provider for your users. This reduces the customer login process to one set of credentials and improves employee productivity.

Get started by checking the Amazon OpenSearch Developer Guide, which provides guidance on how to build applications using Amazon OpenSearch for your operational analytics.

Building SAML federation for Amazon OpenSearch Service with Okta

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/building-saml-federation-for-amazon-opensearch-dashboards-with-okta/

Amazon OpenSearch Service is a fully managed open search and analytics service powered by the Apache Lucene search library. Security Assertion Markup Language (SAML)-based federation for OpenSearch Dashboards lets you use your existing identity provider (IdP) like Okta to provide single sign-on (SSO) for OpenSearch Dashboards on OpenSearch Service domains.

This post provides step-by-step guidance for enabling SP-initiated SSO into OpenSearch Dashboards using Okta.

To use this feature, you must enable fine-grained access control. Rather than authenticating through Amazon Cognito or the internal user database, SAML authentication for OpenSearch Dashboards lets you use third-party identity providers to log in to OpenSearch Dashboards. SAML authentication for OpenSearch Dashboards is only for accessing OpenSearch Dashboards through a web browser.

Overview of Okta SAML authenticated solution

Figure 1 depicts a sample architecture of a generic, integrated solution between Okta and OpenSearch Dashboards over SAML authentication.

Figure 1. SAML transactions between Amazon OpenSearch Service and Okta

The initial sign-in flow is as follows:

  1. User opens browser window and navigates to OpenSearch Dashboards
  2. OpenSearch Service generates SAML authentication request
  3. OpenSearch Service redirects request back to browser
  4. Browser redirects to Okta URL
  5. Okta parses SAML request, authenticates user, and generates SAML response
  6. Okta returns encoded SAML response to browser
  7. Browser sends SAML response back to OpenSearch Service Assertion Consumer Service (ACS) URL
  8. ACS verifies SAML response
  9. User logs into OpenSearch Service domain

Prerequisites

For this walkthrough, you should have the following prerequisites:

  1. An AWS account
  2. A virtual private cloud (VPC)-based OpenSearch Service domain with fine-grained access control enabled
  3. An Okta account with a user and a group
  4. A browser with network connectivity to Okta, OpenSearch Service domain, and OpenSearch Dashboards.

The steps in this post are structured into the following sections:

  1. Identity provider (Okta) setup
  2. Prepare OpenSearch Service for SAML configuration
  3. Identity provider (Okta) SAML configuration
  4. Finish OpenSearch Service for SAML configuration
  5. Validation
  6. Cleanup

Identity provider (Okta) setup

Step 1: Sign up for an Okta account

  • Sign up for an Okta account, then click on the Sign up button to complete your account setup.
  • If you already have an account with Okta, log in to your Okta account.

Step 2: Create Groups in Okta

  • Choose Directory in the left menu and click Groups to proceed.
  • Click Add Group and enter opensearch as the name. Then click the Save button (see Figure 2).

Figure 2. Creating a group in Okta

Step 3: Create users in Okta

  • Choose People in left menu under Directory section and click the +Add Person button.
  • Provide First name, Last name, username (email ID), and primary email. Then select set by admin from the Password dropdown, and choose first time password. Click on the Save button to create your user.
  • Add more users as needed.

Step 4: Assign Groups to users 

  • Choose Groups from the left menu, then click on the opensearch group created in Step 2. Click on the Assign People button to add users to the opensearch group. Next, either click on individual user under Person & Username, or use the Add All button to add all existing users to the opensearch group. Click on the Save button to complete adding users to your group.

Prepare OpenSearch Service for SAML configuration

Once the OpenSearch Service domain is up and running, we can proceed with configuration.

  • Navigate to the OpenSearch Service console
  • Under Actions, choose Edit security configuration as shown in Figure 3

Figure 3. Enabling Amazon OpenSearch Service security configuration for SAML

  • Under SAML authentication for OpenSearch Dashboards/Kibana, select the Enable SAML authentication check box (see Figure 4). When you enable SAML, OpenSearch Service generates the URLs required for configuring SAML with your identity provider.

Figure 4. Amazon OpenSearch Service URLs for SAML configuration

We will be using the Service Provider entity ID and SP-initiated SSO URL (highlighted in Figure 4) for Okta SAML configuration. The OpenSearch Dashboards login flow can take one of two forms:

  • Service provider (SP) initiated: You navigate to your OpenSearch Dashboard (for example, https://my-domain.us-east-1.es.amazonaws.com/_dashboards), which redirects you to the login screen. After you log in, the identity provider redirects you to OpenSearch Dashboards.
  • Identity provider (IdP) initiated: You navigate to your identity provider, log in, and choose OpenSearch Dashboards from an application directory.

We will complete the rest of the OpenSearch Service SAML configuration after the Okta SAML configuration.

Okta SAML configuration

  • Go back to Okta.com, and choose Applications from the left menu. Click on Applications, then click on Create App Integration and choose SAML 2.0. Click on the Next button to proceed, as shown in Figure 5.
  • For this example, we are creating an application called “OpenSearch Dashboard”.
  • Select Platform as Web, and select Sign on method as SAML 2.0. Click on the Create button to proceed.

Figure 5. Creating a SAML app integration in Okta

  • Enter the App name as OpenSearch, use default options, and click on the Next button to proceed.
  • Enter the following under the SAML Settings section, as shown in Figure 6. Click on the Next button to proceed.
    • Single Sign on URL = https://vpc-XXXXX-XXXXX.us-west-2.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs (SP-initiated SSO URL)
    • Audience URI(SP Entity ID) = https://vpc-XXXXX-XXXXX.us-west-2.es.amazonaws.com (Service Provider entity ID)
    • Default RelayState = leave it blank
    • Name ID format = Select EmailAddress from drop down
    • Application username = Select Okta username from dropdown
    • Update application username on = leave it set to default
  • Enter the following under Attribute Statements (optional) section.
    • Name = http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress
    • Name format = Select URI Reference from dropdown
    • Value = user.email
  • Enter the following under the Group Attribute Statements (optional) section.
    • Name = http://schemas.xmlsoap.org/claims/Group
    • Name format = Select URI Reference from dropdown
    • Filter = Select Matches regex from dropdown and enter value as .*open.* to match the group created in previous steps for OpenSearch Dashboards access.

Figure 6. SAML configuration in Okta
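The Matches regex filter decides which of a user's Okta groups are sent in the Group attribute. The following Python sketch shows the effect of .*open.* on some hypothetical group names. It assumes that the filter is case-sensitive and is applied to the whole group name; those are assumptions about Okta's behavior, not something stated in this post.

```python
import re

# Filter entered in the Okta "Matches regex" field for the Group attribute statement.
GROUP_FILTER = re.compile(r".*open.*")


def groups_in_assertion(user_groups):
    """Return the group names that pass the filter.

    Assumes a case-sensitive match against the whole group name. Because the
    pattern starts and ends with .*, a full match and a substring search give
    the same result for single-line group names.
    """
    return [name for name in user_groups if GROUP_FILTER.fullmatch(name)]


# Hypothetical group memberships for one Okta user.
memberships = ["opensearch", "Everyone", "OpenSearch-Admins", "reopen-tickets"]
print(groups_in_assertion(memberships))  # ['opensearch', 'reopen-tickets']
```

Under these assumptions, the broad pattern also lets through unrelated groups such as reopen-tickets and misses OpenSearch-Admins because of the capital O, so a narrower pattern may be preferable in a real directory.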

  • Select I’m a software vendor. I’d like to integrate my app with Okta under the Help Okta Support understand how you configured this application section.
  • Click on the Finish button to complete the Okta SAML application configuration.
  • Choose the Sign On tab. Right-click the Identity Provider metadata hyperlink and save the Okta identity provider metadata as okta.xml. You will use this file for the SAML configuration in OpenSearch Service (Figure 7).

Figure 7. Downloading Okta identity provider metadata for SAML configuration

  • Choose the Assignments tab, then click Assign > Assign to Groups.
  • Select the opensearch group, click on Assign, and click on the Done button to complete the Group assignment, as shown in Figure 8.

Figure 8. Assigning groups to the app in Okta

  • Switch back to the OpenSearch Service domain
  • Under the Import IdP metadata section:
    • Metadata from IdP: Import the Okta identity provider metadata from the downloaded XML file
    • SAML master backend role: opensearch (the Okta group). Members of this group, as identified by the group values in the SAML assertion, get master-user access when they sign in to OpenSearch Dashboards through SSO.
  • Under Optional SAML settings:
    • Leave Subject Key blank
    • Role key should be http://schemas.xmlsoap.org/claims/Group. You can view a sample assertion during the configuration process with tools like SAML-tracer. This can help you examine and troubleshoot the contents of real assertions.
    • Session time to live (mins): 60
  • Click on the Save changes button (Figure 9) to complete OpenSearch Service SAML configuration for OpenSearch Dashboards. We have successfully completed SAML configuration, and now we are ready for testing.

Figure 9. Configuring Amazon OpenSearch Service SAML parameters
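With Subject Key left blank and Role key set to http://schemas.xmlsoap.org/claims/Group, the username comes from the assertion's NameID element and the backend roles come from the values of the Group attribute. The following Python sketch illustrates that lookup on a hypothetical, trimmed assertion. It is not the OpenSearch security plugin's code: it skips the signature, audience, and time-window checks a real service provider performs, and ElementTree is not hardened against malicious XML, so use it only on assertions you captured yourself (for example, with SAML-tracer).

```python
import xml.etree.ElementTree as ET

SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
NS = {"saml": SAML}
ROLE_KEY = "http://schemas.xmlsoap.org/claims/Group"  # Role key entered above

# Hypothetical, unsigned, trimmed assertion for illustration only.
SAMPLE_ASSERTION = """<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:Subject>
    <saml:NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress">jane@example.com</saml:NameID>
  </saml:Subject>
  <saml:AttributeStatement>
    <saml:Attribute Name="http://schemas.xmlsoap.org/claims/Group">
      <saml:AttributeValue>opensearch</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>"""


def user_and_roles(assertion_xml, role_key=ROLE_KEY, subject_key=None):
    """Return (username, backend_roles) from an assertion that is already trusted.

    A blank subject key falls back to the NameID element; otherwise the first
    value of the named attribute is used as the username.
    """
    root = ET.fromstring(assertion_xml)
    attributes = {}
    for attr in root.iter(f"{{{SAML}}}Attribute"):
        values = [v.text or "" for v in attr.findall("saml:AttributeValue", NS)]
        attributes.setdefault(attr.get("Name"), []).extend(values)
    if subject_key:
        username = (attributes.get(subject_key) or [None])[0]
    else:
        name_id = root.find(".//saml:Subject/saml:NameID", NS)
        username = name_id.text if name_id is not None else None
    return username, attributes.get(role_key, [])


username, roles = user_and_roles(SAMPLE_ASSERTION)
print(username, roles)  # jane@example.com ['opensearch']
print("opensearch" in roles)  # True: matches the SAML master backend role
```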

Validating access with Okta users

  • Access the OpenSearch Dashboards endpoint from the previously created OpenSearch Service cluster. The OpenSearch Dashboards URL can be found in General information within “My Domains” of the OpenSearch Service console, as shown in Figure 10. The first access to OpenSearch Dashboards URL redirects you to the Okta login screen.

Figure 10. Validating Okta user access with Amazon OpenSearch Service

  • Now copy and paste the OpenSearch Dashboards URL in your browser, and enter the user credentials.
  • If your OpenSearch Service domain is hosted within a private VPC, you will not be able to access OpenSearch Dashboards over the public internet. However, you can still use SAML as long as your browser can communicate with both your OpenSearch Service cluster and your identity provider.
  • You can create a Mac or Windows EC2 instance within the same VPC and use the instance's web browser to access OpenSearch Dashboards and validate your SAML configuration. Alternatively, you can access OpenSearch Dashboards from your on-premises environment through an AWS Site-to-Site VPN connection.
  • After successful login, you will be redirected into the OpenSearch Dashboards home page. Here, you can explore our sample data and visualizations in OpenSearch Dashboards (Figure 11).

Figure 11. SAML authenticated OpenSearch Dashboards

  • You have now successfully federated OpenSearch Dashboards with Okta as an identity provider, and you can sign in to OpenSearch Dashboards with your Okta credentials.

Cleaning up

After you test out this solution, remember to delete all the resources you created to avoid incurring future charges.

Conclusion

In this blog post, we have demonstrated how to set up Okta as an identity provider over SAML authentication for OpenSearch Dashboards access. Get started by checking the Amazon OpenSearch Service Developer Guide, which provides guidance on how to build applications using OpenSearch Service.

Building SAML federation for Amazon OpenSearch Dashboards with Auth0

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/building-saml-federation-for-amazon-opensearch-dashboards-with-auth0/

Amazon OpenSearch is a fully managed, distributed, open search and analytics service that is powered by the Apache Lucene search library. OpenSearch is derived from Elasticsearch 7.10.2 and is used for real-time application monitoring, log analytics, and website search. It's ideal for use cases that require fast access and response for large volumes of data. OpenSearch Dashboards is derived from Kibana 7.10.2 and is used for visual data exploration. With Security Assertion Markup Language (SAML)-based federation for OpenSearch Dashboards, you can use your existing identity provider (IdP), such as Auth0, to provide single sign-on (SSO) for OpenSearch Dashboards on Amazon OpenSearch domains. It also gives you fine-grained access control and the ability to search your data and build visualizations. Amazon OpenSearch supports providers that use the SAML 2.0 standard, such as Auth0, Okta, Keycloak, Active Directory Federation Services (AD FS), and Ping Identity (PingID).

In this post, we provide step-by-step guidance to show you how to set up a trial Auth0 account. We'll demonstrate how to create users and groups within your organization's directory and enable SP-initiated single sign-on (SSO) into OpenSearch Dashboards.

To use this feature, you must enable fine-grained access control. Rather than authenticating through Amazon Cognito or an internal user database, SAML authentication for OpenSearch Dashboards lets you use third-party identity providers to log in to OpenSearch Dashboards. SAML authentication for OpenSearch Dashboards is only for accessing OpenSearch Dashboards through a web browser. Your SAML credentials do not let you make direct HTTP requests to OpenSearch or OpenSearch Dashboards APIs.

Auth0 is an AWS Competency Partner and a popular Identity-as-a-Service (IDaaS) solution. It supports both service provider (SP)-initiated and identity provider (IdP)-initiated SSO. For SP-initiated SSO, when you open the OpenSearch Dashboards login page, OpenSearch sends an authentication request to Auth0. Once Auth0 authenticates your identity, you are redirected to OpenSearch Dashboards. For IdP-initiated SSO, you log in to the Auth0 SSO page and choose OpenSearch Dashboards to open the application.

Overview of the Auth0 SAML-authenticated solution

Figure 1 depicts a sample architecture of a generic, integrated solution between Auth0 and OpenSearch Dashboards over SAML authentication.


Figure 1. A high-level view of a SAML transaction between Amazon OpenSearch and Auth0

The sign-in flow is as follows:

  1. User opens browser window and navigates to Amazon OpenSearch Dashboards
  2. Amazon OpenSearch generates SAML authentication request
  3. Amazon OpenSearch redirects request back to browser
  4. Browser redirects to Auth0 URL
  5. Auth0 parses SAML request, authenticates user, and generates SAML response
  6. Auth0 returns encoded SAML response to browser
  7. Browser sends SAML response back to Amazon OpenSearch Assertion Consumer Service (ACS) URL
  8. ACS verifies SAML response
  9. User logs into Amazon OpenSearch domain
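Steps 2 through 5 rely on the SAML 2.0 HTTP-Redirect binding: the service provider compresses the AuthnRequest with raw DEFLATE, base64-encodes it, and passes it to the IdP in a SAMLRequest query parameter, which the IdP then reverses. The following Python sketch shows that encoding and decoding. The domain URLs reuse this post's placeholders, and the Auth0 SSO URL is a hypothetical placeholder. The request is minimal and unsigned, so treat it as an illustration of the binding rather than what Amazon OpenSearch actually sends.

```python
import base64
import uuid
import zlib
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder values; substitute the URLs from your own domain and Auth0 application.
SP_ENTITY_ID = "https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com"
ACS_URL = SP_ENTITY_ID + "/_dashboards/_opendistro/_security/saml/acs"
IDP_SSO_URL = "https://YOUR_TENANT.auth0.com/samlp/YOUR_CLIENT_ID"  # hypothetical


def build_authn_request(sp_entity_id: str, acs_url: str) -> str:
    """Step 2: a minimal, unsigned AuthnRequest (inputs are not XML-escaped)."""
    issued = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        '<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
        'xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" '
        f'ID="_{uuid.uuid4().hex}" Version="2.0" IssueInstant="{issued}" '
        f'AssertionConsumerServiceURL="{acs_url}" '
        'ProtocolBinding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST">'
        f"<saml:Issuer>{sp_entity_id}</saml:Issuer>"
        "</samlp:AuthnRequest>"
    )


def redirect_url(idp_sso_url: str, authn_request: str) -> str:
    """Steps 3-4: raw DEFLATE (wbits=-15, no zlib header), base64, then URL-encode."""
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    deflated = compressor.compress(authn_request.encode("utf-8")) + compressor.flush()
    query = urlencode({"SAMLRequest": base64.b64encode(deflated).decode("ascii")})
    return f"{idp_sso_url}?{query}"


def decode_saml_request(url: str) -> str:
    """Step 5 (IdP side): reverse the encoding to recover the AuthnRequest XML."""
    encoded = parse_qs(urlparse(url).query)["SAMLRequest"][0]
    return zlib.decompress(base64.b64decode(encoded), -15).decode("utf-8")


request_xml = build_authn_request(SP_ENTITY_ID, ACS_URL)
url = redirect_url(IDP_SSO_URL, request_xml)
print(url[:90] + "...")
print(decode_saml_request(url) == request_xml)  # True
```

In steps 6 and 7 the response travels the other way through the HTTP-POST binding, where the SAML response is base64-encoded in a form field but not deflated.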

Prerequisites

For this walkthrough, you should have the following prerequisites:

  1. An AWS account
  2. A virtual private cloud (VPC) based Amazon OpenSearch domain with fine-grained access control enabled
  3. An Auth0 account with a user and a group
  4. A browser with network connectivity to Auth0, Amazon OpenSearch domain, and Amazon OpenSearch Dashboards.

The steps in this post are structured into the following sections:

  1. Identity provider (Auth0) setup
  2. Prepare Amazon OpenSearch for SAML configuration
  3. Identity provider (Auth0) SAML configuration
  4. Finish Amazon OpenSearch for SAML configuration
  5. Validation
  6. Cleanup

Identity provider (Auth0) setup

Step 1: Sign up for an Auth0 account

  • Sign up for an Auth0 account, then click on the Sign up button to complete your account setup.
  • If you already have an account with Auth0, log in to your Auth0 account.

Step 2: Create users in Auth0

  • Choose User Management in the left menu and click Users, then click on the +Create User button.
  • Provide an email address, a password, and a connection for the user. Click on the Create button to create the user.
  • Add more users to your Auth0 account.

Step 3: Install Auth0 Extension to create a group and assign users to the group

  • Click on Extensions in the left menu and search for “Auth0 Authorization”. Click on Auth0 Authorization to install the extension, shown in Figure 2.

Figure 2. Installing Auth0 Authorization extension

  • Use all default options and click on the Install button to install the extension.
  • Click on the Auth0 Authorization extension and choose the Accept button to provide access to your Auth0 account.
  • The Auth0 Authorization extension must be configured. Click on Go to Configuration (Figure 3).

Figure 3. Configuring the Auth0 Authorization extension

  • Rotate your API keys and check Groups, Roles, and Permissions to provide authorization to the extension and then click on PUBLISH RULE to complete the configuration, see Figure 4.

Figure 4. Providing the permissions to Auth0 Authorization extension

Step 4: Create a group in Auth0

  • Choose Groups from the left menu and click on the Create your first Group button. For this example, we will create a group called opensearch for OpenSearch Dashboards access.
  • Add your users to opensearch by clicking the ADD MEMBERS button, then click the CONFIRM button to complete your group assignment (Figure 5).

Figure 5. Adding users to Auth0 Group

Step 5: Create an Auth0 Application

  • Choose Applications from the left menu. Click on the +Create Application button.
  • For this example, we are creating an application called “opensearch”.
  • Select Single Page Web Applications, then click on the CREATE button to proceed.
  • Click on the Addons tab on the application opensearch (Figure 6).

Figure 6. Creating an Auth0 SAML application

  • Click on the SAML2 WEB APP, then select settings to provide SAML URLs from Amazon OpenSearch. We will configure these details after preparing the Amazon OpenSearch cluster for SAML.

Prepare Amazon OpenSearch for SAML configuration

Once the Amazon OpenSearch domain is up and running, we can proceed with configuration.

  • Under Actions, choose Edit security configuration (Figure 7).

Figure 7. Enabling Amazon OpenSearch security configuration for SAML

  • Under SAML authentication for OpenSearch Dashboards/Kibana, select the Enable SAML authentication check box (Figure 8). When we enable SAML, it will create different URLs required for configuring SAML with your identity provider.

Figure 8. Amazon OpenSearch URLs for SAML configuration

We will be using the Service Provider entity ID and SP-initiated SSO URL (highlighted in Figure 8) for Auth0 SAML configuration. We will complete the rest of the Amazon OpenSearch SAML configuration after the Auth0 SAML configuration.

Auth0 SAML configuration

Go back to Auth0.com, and navigate to Applications from the left menu. Then select the opensearch application that you created as a part of the Auth0 setup.

  • Click on the Addons tab on the application opensearch.
  • Click on the SAML2 WEB APP, then select Settings to provide SAML URLs from Amazon OpenSearch, as shown in Figure 9:
    • Application Callback URL = https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs (SP-initiated SSO URL)
    • audience = https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com (Service provider entity ID)
    • destination = https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_plugin/kibana/_opendistro/_security/saml/acs (SP-initiated SSO URL)
    • Mappings and other settings, as shown in Figure 9

{
  "audience": "https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com",
  "destination": "https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_plugin/kibana/_opendistro/_security/saml/acs",
  "mappings": {
    "email": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress",
    "name": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name",
    "groups": "http://schemas.xmlsoap.org/claims/Group"
  },
  "createUpnClaim": false,
  "passthroughClaimsWithNoMapping": false,
  "mapUnknownClaimsAsIs": false,
  "mapIdentities": false,
  "nameIdentifierFormat": "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress",
  "nameIdentifierProbes": [
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"
  ]
}


Figure 9. Configuring Auth0 SAML parameters
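The mappings object in the JSON settings tells the Auth0 SAML2 Web App add-on which SAML attribute name to emit for each user-profile field, and setting passthroughClaimsWithNoMapping to false keeps unmapped fields out of the assertion. The following Python sketch approximates that renaming for a hypothetical user profile. It models only these two settings, not the add-on's full behavior, and it assumes the groups field is populated by the Auth0 Authorization extension rule published earlier.

```python
import json

# The "mappings" object from the Auth0 SAML2 Web App settings above.
MAPPINGS = json.loads("""
{
  "email": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress",
  "name": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name",
  "groups": "http://schemas.xmlsoap.org/claims/Group"
}
""")


def mapped_attributes(profile, mappings):
    """Rename mapped profile fields to their SAML attribute names.

    Models passthroughClaimsWithNoMapping=false: unmapped fields are dropped.
    """
    return {mappings[field]: value for field, value in profile.items() if field in mappings}


# Hypothetical Auth0 user profile; "groups" is assumed to come from the
# Authorization extension rule.
profile = {
    "email": "jane@example.com",
    "name": "Jane Doe",
    "nickname": "jane",
    "groups": ["opensearch"],
}

for attribute, value in mapped_attributes(profile, MAPPINGS).items():
    print(attribute, "=", value)
```

The groups entry ends up under http://schemas.xmlsoap.org/claims/Group, which is the same name you enter as the Role key in Amazon OpenSearch, so the two settings must agree.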

  • Click on Enable to save the SAML configurations.
  • Go to the Usage tab, and click on the Download button to download Identity Provider Metadata, see Figure 10.

Figure 10. Downloading Auth0 identity provider metadata for SAML configuration
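Before importing the downloaded metadata into Amazon OpenSearch, you might want to sanity-check it. The following Python sketch reads a SAML 2.0 metadata document and reports the IdP entity ID, the single sign-on endpoints by binding, and how many X.509 certificates the IdP descriptor lists. The sample document is a trimmed, made-up stand-in (hypothetical tenant, client ID, and certificate placeholder); to check the real file, pass its contents to summarize_idp_metadata.

```python
import xml.etree.ElementTree as ET

MD = "urn:oasis:names:tc:SAML:2.0:metadata"
DS = "http://www.w3.org/2000/09/xmldsig#"

# Trimmed, made-up metadata; the real file contains a full signing certificate.
SAMPLE_METADATA = """<EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata" entityID="urn:example-tenant.us.auth0.com">
  <IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <KeyDescriptor use="signing">
      <KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
        <X509Data><X509Certificate>PLACEHOLDER</X509Certificate></X509Data>
      </KeyInfo>
    </KeyDescriptor>
    <SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://example-tenant.us.auth0.com/samlp/CLIENT_ID"/>
    <SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" Location="https://example-tenant.us.auth0.com/samlp/CLIENT_ID"/>
  </IDPSSODescriptor>
</EntityDescriptor>"""


def summarize_idp_metadata(xml_text):
    """Return the entity ID, SSO endpoints keyed by binding, and certificate count."""
    root = ET.fromstring(xml_text)
    idp = root.find(f"{{{MD}}}IDPSSODescriptor")
    if idp is None:
        raise ValueError("No IDPSSODescriptor found; this does not look like IdP metadata")
    sso = {
        svc.get("Binding", "").rsplit(":", 1)[-1]: svc.get("Location")
        for svc in idp.findall(f"{{{MD}}}SingleSignOnService")
    }
    certs = idp.findall(
        f"{{{MD}}}KeyDescriptor/{{{DS}}}KeyInfo/{{{DS}}}X509Data/{{{DS}}}X509Certificate"
    )
    return {"entity_id": root.get("entityID"), "sso": sso, "certificates": len(certs)}


print(summarize_idp_metadata(SAMPLE_METADATA))
```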

Amazon OpenSearch SAML configuration

  • Switch back to Amazon OpenSearch domain:
    • Navigate to Amazon OpenSearch console
    • Click on Actions, then click on Edit security configuration
    • Select Enable SAML authentication check box
  • Under Import IdP metadata section (Figure 11):
    • Metadata from IdP: Import the Auth0 identity provider metadata from downloaded XML file
    • SAML master backend role: opensearch (the Auth0 group). Members of this group, as identified by the group values in the SAML assertion, get master-user access when they sign in to OpenSearch Dashboards through SSO.

Figure 11. Configuring Amazon OpenSearch SAML parameters

  • Under Optional SAML settings (Figure 12):
    • Leave Subject key blank, because Auth0 provides a NameIdentifier (NameID), which is then used as the username
    • Role key should be http://schemas.xmlsoap.org/claims/Group. Auth0 lets you view a sample assertion during the configuration process by clicking on the DEBUG button on the SAML2 Web App. Tools like SAML-tracer can help you examine and troubleshoot the contents of real assertions.
    • Session time to live (mins): 60

Figure 12. Configuring Amazon OpenSearch optional SAML parameters

Click on the Save changes button to complete the Amazon OpenSearch SAML configuration for OpenSearch Dashboards. We have successfully completed SAML configuration and are now ready for testing.

Validating access with Auth0 users

  • Access OpenSearch Dashboards from the previously created OpenSearch cluster. The OpenSearch Dashboards URL can be found as shown in Figure 13. The first access to the OpenSearch Dashboards URL redirects you to the Auth0 login screen.

Figure 13. Validating Auth0 users access with Amazon OpenSearch

  • Now copy and paste the OpenSearch Dashboards URL in your browser, and enter the user credentials.
  • If your OpenSearch domain is hosted within a private VPC, you will not be able to access OpenSearch Dashboards over the public internet. However, you can still use SAML as long as your browser can communicate with both your OpenSearch cluster and your identity provider.
  • You can create a Mac or Windows EC2 instance within the same VPC and use the instance's web browser to access Amazon OpenSearch Dashboards and validate your SAML configuration. You can also access Amazon OpenSearch Dashboards from an on-premises environment through an AWS Site-to-Site VPN connection.
  • After successful login, you will be redirected into the OpenSearch Dashboards home page. Explore our sample data and visualizations in OpenSearch Dashboards, as shown in Figure 14.

Figure 14. SAML authenticated Amazon OpenSearch Dashboards

  • You have now successfully federated Amazon OpenSearch Dashboards with Auth0 as an identity provider, and you can sign in to OpenSearch Dashboards with your Auth0 credentials.

Cleaning up

After you test out this solution, remember to delete all the resources you created to avoid incurring future charges.

Conclusion

In this blog post, we have demonstrated how to set up Auth0 as an identity provider over SAML authentication for Amazon OpenSearch Dashboards access. With this solution, you now have OpenSearch Dashboards using Auth0 as the custom identity provider for your users. Users sign in with a single set of credentials, which simplifies the login process and improves employee productivity.

Get started by checking the Amazon OpenSearch Developer Guide, which provides guidance on how to build applications using Amazon OpenSearch for your operational analytics.

Understanding the JVMMemoryPressure metric changes in Amazon OpenSearch Service

Post Syndicated from Liz Snyder original https://aws.amazon.com/blogs/big-data/understanding-the-jvmmemorypressure-metric-changes-in-amazon-opensearch-service/

Amazon OpenSearch Service is a managed service that makes it easy to secure, deploy, and operate OpenSearch and legacy Elasticsearch clusters at scale.

In the latest service software release of Amazon OpenSearch Service, we've changed the behavior of the JVMMemoryPressure metric. This metric now reports the overall heap usage, including young and old pools, for all domains that use the G1GC garbage collector. If you're using Graviton-based data nodes (C6g, R6g, and M6g instances), or if you enabled Auto-Tune and it has switched your garbage collection algorithm to G1GC, this change will improve your ability to detect and respond to problems with OpenSearch's Java heap.

Basics of Java garbage collection

Objects in Java are allocated in heap memory. On Amazon OpenSearch Service, the heap occupies half of the instance's RAM, up to approximately 32 GB. As your application runs, it creates objects in the heap and later stops using them, leaving the heap fragmented and making it harder to allocate new objects. Java's garbage collection algorithm periodically goes through the heap and reclaims the memory of any unused objects. It also compacts the heap when necessary to provide more contiguous free space.
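As a quick worked example of the sizing rule above, the following Python sketch estimates heap size from instance RAM. The 32 GB cap is approximate, and the sketch treats GB and GiB loosely; it is an illustration of the rule, not the service's exact configuration logic.

```python
def approx_heap_gib(instance_ram_gib, cap_gib=32):
    """Half of the instance's RAM, capped at roughly 32 GB (the cap is approximate)."""
    return min(instance_ram_gib / 2, cap_gib)


for ram in (16, 32, 64, 128):
    print(f"{ram:>3} GiB RAM -> about {approx_heap_gib(ram):g} GiB heap")
```

Past 64 GiB of RAM the heap stops growing, which is why adding instances, rather than moving to larger ones, is the way to get more total heap beyond that point.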

The heap is divided into smaller memory pools:

Young generation – The young generation memory pool is where new objects are allocated. The young generation is further divided into an Eden space, where all new objects start, and two survivor spaces (S0 and S1), where objects are moved from Eden after surviving one garbage collection cycle. When the young generation fills up, Java performs a minor garbage collection to clean up unmarked objects. Objects that remain in the young generation age until they eventually move to the old generation.

Old generation – The old generation memory pool stores long-lived objects. When objects reach a certain age after multiple garbage collection iterations in the young generation, they are then moved to the old generation.

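Permanent generation – The permanent generation contains metadata required by the JVM to describe the classes and methods used in the application at runtime. Objects from the old generation are never promoted into it, no matter how old they get. In Java 8 and later, the permanent generation was replaced by Metaspace, which is allocated in native memory outside the heap.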

Java processes can use different garbage collection algorithms, selected by a command-line option (for example, -XX:+UseG1GC selects G1GC).

  • Concurrent Mark Sweep (CMS) – The different pools are segregated in memory. Stop-the-world pauses and heap compaction are regular occurrences. The young generation pool is small. Non-Graviton data nodes use CMS by default.
  • G1 Garbage Collection (G1GC) – All heap memory is a single block, with different areas of memory (regions) allocated to the different pools. The pools are interleaved in physical memory. Stop-the-world pauses and heap compaction are infrequent. The young generation pool is larger. All Graviton data nodes use G1GC. Amazon OpenSearch Service’s Auto-Tune feature can choose G1GC for non-Graviton data nodes.
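To see which collector your nodes run, you can look at the gc_collectors list that the nodes info API (GET _nodes/jvm) reports for each node. The following Python sketch classifies a trimmed, made-up response. It assumes the API is available on your domain and that collector names follow the HotSpot JVM's names, such as "G1 Young Generation" for G1GC and "ConcurrentMarkSweep" for CMS.

```python
# Trimmed, made-up GET _nodes/jvm response with two nodes.
SAMPLE_NODES_INFO = {
    "nodes": {
        "a1": {"name": "data-node-1", "jvm": {"gc_collectors": ["G1 Young Generation", "G1 Old Generation"]}},
        "b2": {"name": "data-node-2", "jvm": {"gc_collectors": ["ParNew", "ConcurrentMarkSweep"]}},
    }
}


def collector_by_node(nodes_info):
    """Map each node name to "G1GC", "CMS", or "other" based on its collector names."""
    result = {}
    for node in nodes_info["nodes"].values():
        collectors = node["jvm"].get("gc_collectors", [])
        if any(name.startswith("G1") for name in collectors):
            result[node["name"]] = "G1GC"
        elif "ConcurrentMarkSweep" in collectors:
            result[node["name"]] = "CMS"
        else:
            result[node["name"]] = "other"
    return result


print(collector_by_node(SAMPLE_NODES_INFO))  # {'data-node-1': 'G1GC', 'data-node-2': 'CMS'}
```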

Amazon OpenSearch Service publishes data points about your domain to Amazon CloudWatch, and you can use the CloudWatch console to retrieve statistics about those data points as an ordered set of time-series data, known as metrics. Amazon OpenSearch Service currently publishes three metrics related to JVM memory pressure to CloudWatch:

  • JVMMemoryPressure – The maximum percentage of the Java heap used for all data nodes in the cluster.
  • MasterJVMMemoryPressure – The maximum percentage of the Java heap used for all dedicated master nodes in the cluster.
  • WarmJVMMemoryPressure – The maximum percentage of the Java heap used for UltraWarm nodes in the cluster.

In the latest service software update, Amazon OpenSearch Service improved the logic that it uses to compute these metrics in order to more accurately reflect actual memory utilization.

The problem

Previously, all data nodes used CMS, where the young pool was a small portion of memory. The JVM memory pressure metrics that Amazon OpenSearch Service published to CloudWatch only considered the old pool of the Java heap. You could detect problems in the heap usage by looking only at old generation usage.

When the domain uses G1GC, the young pool is larger, representing a larger percentage of the total heap. Since objects are created first in the young pool, and then moved to the old pool, a significant portion of the usage could be in the young pool. However, the prior metric reported only on the old pool. This leaves domains vulnerable to invisibly running out of memory in the young pool.

What’s changing?

In the latest service software update, Amazon OpenSearch Service changed the logic for the three JVM memory pressure metrics that it sends to CloudWatch to account for the total Java heap in use (old generation and young generation). The goal of this update is to provide a more accurate representation of total memory utilization across your Amazon OpenSearch Service domains, especially for Graviton instance types, whose garbage collection logic makes it important to consider all memory pools to calculate actual utilization.

What you can expect

After you update your Amazon OpenSearch Service domains to the latest service software release, the following metrics that Amazon OpenSearch Service sends to CloudWatch will begin to report JVM memory usage for the old and young generation memory pools, rather than just old: JVMMemoryPressure, MasterJVMMemoryPressure, and WarmJVMMemoryPressure.

You might see an increase in the values of these metrics, predominantly in G1GC configured domains. In some cases, you might notice a different memory usage pattern altogether, because the young generation memory pool has more frequent garbage collection. Any CloudWatch alarms that you have created around these metrics might be triggered. If this keeps happening, consider scaling your instances vertically up to 64 GiB of RAM, at which point you can scale horizontally by adding instances.

As a standard practice, for domains that have low available memory, Amazon OpenSearch Service blocks further write operations to prevent the domain from reaching red status. You should monitor your memory utilization after the update to get a sense of the actual utilization on your domain. The _nodes/stats/jvm API offers a useful summary of JVM statistics, memory pool usage, and garbage collection information.
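The _nodes/stats/jvm response reports usage per memory pool (young, survivor, and old) alongside heap_max_in_bytes, so you can compare an old-pool-only view with a view that counts every pool. The following Python sketch does this for a trimmed, made-up response. The old-pool-only percentage is an illustration of the previous behavior, not the service's exact internal formula, and the sketch does not filter out dedicated master or UltraWarm nodes. If you only need the overall figure, the API also returns heap_used_percent directly.

```python
GIB = 1024 ** 3

# Trimmed, made-up GET _nodes/stats/jvm response (values in bytes).
SAMPLE_STATS = {
    "nodes": {
        "a1": {"name": "data-node-1", "jvm": {"mem": {
            "heap_max_in_bytes": 8 * GIB,
            "pools": {
                "young": {"used_in_bytes": 3 * GIB + GIB // 2},
                "survivor": {"used_in_bytes": GIB // 2},
                "old": {"used_in_bytes": 2 * GIB},
            }}}},
        "b2": {"name": "data-node-2", "jvm": {"mem": {
            "heap_max_in_bytes": 8 * GIB,
            "pools": {
                "young": {"used_in_bytes": 1 * GIB},
                "survivor": {"used_in_bytes": 0},
                "old": {"used_in_bytes": 3 * GIB},
            }}}},
    }
}


def heap_usage(node_stats):
    """Per node: (old pool only, young + survivor + old), each as % of max heap."""
    report = {}
    for node in node_stats["nodes"].values():
        mem = node["jvm"]["mem"]
        pools = mem["pools"]
        heap_max = mem["heap_max_in_bytes"]
        old_only = 100 * pools["old"]["used_in_bytes"] / heap_max
        all_pools = 100 * sum(pools[p]["used_in_bytes"] for p in ("young", "survivor", "old")) / heap_max
        report[node["name"]] = (round(old_only, 1), round(all_pools, 1))
    return report


report = heap_usage(SAMPLE_STATS)
print(report)  # {'data-node-1': (25.0, 75.0), 'data-node-2': (37.5, 50.0)}
# Like JVMMemoryPressure, take the maximum across nodes.
print(max(total for _, total in report.values()))  # 75.0
```

In this made-up example, the old-pool-only view would flag data-node-2 at 37.5 percent, while counting every pool shows data-node-1 at 75 percent, which is the kind of jump that can trip existing alarms after the update.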

Conclusion

Amazon OpenSearch Service recently improved the logic that it uses to calculate JVM memory usage to more accurately reflect actual utilization. The JVMMemoryPressure, MasterJVMMemoryPressure, and WarmJVMMemoryPressure CloudWatch metrics now account for both old and young generation memory pools when calculating memory usage, rather than just old generation. For more information about these metrics, see Monitoring OpenSearch cluster metrics with Amazon CloudWatch.

With the updated metrics, your domains will start to more accurately reflect memory utilization numbers, and might breach CloudWatch alarms that you previously configured. Make sure to monitor your alarms for these metrics and scale your clusters accordingly to maintain optimal memory utilization.

Stay tuned for more exciting updates and new features in Amazon OpenSearch Service.


About the Authors

Liz Snyder is a San Francisco-based technical writer for Amazon OpenSearch Service, OpenSearch OSS, and Amazon CloudSearch.

Jon Handler is a Senior Principal Solutions Architect, specializing in AWS search technologies – Amazon CloudSearch, and Amazon OpenSearch Service. Based in Palo Alto, he helps a broad range of customers get their search and log analytics workloads deployed right and functioning well.