Tag Archives: Amazon OpenSearch Service

Stream Amazon EMR on EKS logs to third-party providers like Splunk, Amazon OpenSearch Service, or other log aggregators

2022-07-20 Matthew Tan

Post Syndicated from Matthew Tan original https://aws.amazon.com/blogs/big-data/stream-amazon-emr-on-eks-logs-to-third-party-providers-like-splunk-amazon-opensearch-service-or-other-log-aggregators/

Spark jobs running on Amazon EMR on EKS generate logs that are very useful in identifying issues with Spark processes and also as a way to see Spark outputs. You can access these logs from a variety of sources. On the Amazon EMR virtual cluster console, you can access logs from the Spark History UI. You also have flexibility to push logs into an Amazon Simple Storage Service (Amazon S3) bucket or Amazon CloudWatch Logs. In each method, these logs are linked to the specific job in question. The common practice of log management in DevOps culture is to centralize logging through the forwarding of logs to an enterprise log aggregation system like Splunk or Amazon OpenSearch Service (successor to Amazon Elasticsearch Service). This enables you to see all the applicable log data in one place. You can identify key trends, anomalies, and correlated events, and troubleshoot problems faster and notify the appropriate people in a timely fashion.

EMR on EKS Spark logs are generated by Spark and can be accessed via the Kubernetes API and kubectl CLI. Therefore, although it’s possible to install log forwarding agents in the Amazon Elastic Kubernetes Service (Amazon EKS) cluster to forward all Kubernetes logs, which include Spark logs, this can become quite expensive at scale because you get information that may not be important for Spark users about Kubernetes. In addition, from a security point of view, the EKS cluster logs and access to kubectl may not be available to the Spark user.

To solve this problem, this post proposes using pod templates to create a sidecar container alongside the Spark job pods. The sidecar containers are able to access the logs contained in the Spark pods and forward these logs to the log aggregator. This approach allows the logs to be managed separately from the EKS cluster and uses a small amount of resources because the sidecar container is only launched during the lifetime of the Spark job.

Implementing Fluent Bit as a sidecar container

Fluent Bit is a lightweight, highly scalable, and high-speed logging and metrics processor and log forwarder. It collects event data from any source, enriches that data, and sends it to any destination. Its lightweight and efficient design coupled with its many features makes it very attractive to those working in the cloud and in containerized environments. It has been deployed extensively and trusted by many, even in large and complex environments. Fluent Bit has zero dependencies and requires only 650 KB in memory to operate, as compared to FluentD, which needs about 40 MB in memory. Therefore, it’s an ideal option as a log forwarder to forward logs generated from Spark jobs.

When you submit a job to EMR on EKS, there are at least two Spark containers: the Spark driver and the Spark executor. The number of Spark executor pods depends on your job submission configuration. If you indicate more than one spark.executor.instances, you get the corresponding number of Spark executor pods. What we want to do here is run Fluent Bit as sidecar containers with the Spark driver and executor pods. Diagrammatically, it looks like the following figure. The Fluent Bit sidecar container reads the indicated logs in the Spark driver and executor pods, and forwards these logs to the target log aggregator directly.

Architecture of Fluent Bit sidecar

Pod templates in EMR on EKS

A Kubernetes pod is a group of one or more containers with shared storage, network resources, and a specification for how to run the containers. Pod templates are specifications for creating pods. It’s part of the desired state of the workload resources used to run the application. Pod template files can define the driver or executor pod configurations that aren’t supported in standard Spark configuration. That being said, Spark is opinionated about certain pod configurations and some values in the pod template are always overwritten by Spark. Using a pod template only allows Spark to start with a template pod and not an empty pod during the pod building process. Pod templates are enabled in EMR on EKS when you configure the Spark properties spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile. Spark downloads these pod templates to construct the driver and executor pods.

Forward logs generated by Spark jobs in EMR on EKS

A log aggregating system like Amazon OpenSearch Service or Splunk should always be available that can accept the logs forwarded by the Fluent Bit sidecar containers. If not, we provide the following scripts in this post to help you launch a log aggregating system like Amazon OpenSearch Service or Splunk installed on an Amazon Elastic Compute Cloud (Amazon EC2) instance.

We use several services to create and configure EMR on EKS. We use an AWS Cloud9 workspace to run all the scripts and to configure the EKS cluster. To prepare to run a job script that requires certain Python libraries absent from the generic EMR images, we use Amazon Elastic Container Registry (Amazon ECR) to store the customized EMR container image.

Create an AWS Cloud9 workspace

The first step is to launch and configure the AWS Cloud9 workspace by following the instructions in Create a Workspace in the EKS Workshop. After you create the workspace, we create AWS Identity and Access Management (IAM) resources. Create an IAM role for the workspace, attach the role to the workspace, and update the workspace IAM settings.

Prepare the AWS Cloud9 workspace

Clone the following GitHub repository and run the following script to prepare the AWS Cloud9 workspace to be ready to install and configure Amazon EKS and EMR on EKS. The shell script prepare_cloud9.sh installs all the necessary components for the AWS Cloud9 workspace to build and manage the EKS cluster. These include the kubectl command line tool, eksctl CLI tool, jq, and to update the AWS Command Line Interface (AWS CLI).

$ sudo yum -y install git
$ cd ~ 
$ git clone https://github.com/aws-samples/aws-emr-eks-log-forwarding.git
$ cd aws-emr-eks-log-forwarding
$ cd emreks
$ bash prepare_cloud9.sh

All the necessary scripts and configuration to run this solution are found in the cloned GitHub repository.

Create a key pair

As part of this particular deployment, you need an EC2 key pair to create an EKS cluster. If you already have an existing EC2 key pair, you may use that key pair. Otherwise, you can create a key pair.

Install Amazon EKS and EMR on EKS

After you configure the AWS Cloud9 workspace, in the same folder (emreks), run the following deployment script:

$ bash deploy_eks_cluster_bash.sh 
Deployment Script -- EMR on EKS
-----------------------------------------------

Please provide the following information before deployment:
1. Region (If your Cloud9 desktop is in the same region as your deployment, you can leave this blank)
2. Account ID (If your Cloud9 desktop is running in the same Account ID as where your deployment will be, you can leave this blank)
3. Name of the S3 bucket to be created for the EMR S3 storage location
Region: [xx-xxxx-x]: < Press enter for default or enter region > 
Account ID [xxxxxxxxxxxx]: < Press enter for default or enter account # > 
EC2 Public Key name: < Provide your key pair name here >
Default S3 bucket name for EMR on EKS (do not add s3://): < bucket name >
Bucket created: XXXXXXXXXXX ...
Deploying CloudFormation stack with the following parameters...
Region: xx-xxxx-x | Account ID: xxxxxxxxxxxx | S3 Bucket: XXXXXXXXXXX

...

EKS Cluster and Virtual EMR Cluster have been installed.

The last line indicates that installation was successful.

Log aggregation options

There are several log aggregation and management tools on the market. This post suggests two of the more popular ones in the industry: Splunk and Amazon OpenSearch Service.

Option 1: Install Splunk Enterprise

To manually install Splunk on an EC2 instance, complete the following steps:

Launch an EC2 instance.
Install Splunk.
Configure the EC2 instance security group to permit access to ports 22, 8000, and 8088.

This post, however, provides an automated way to install Spunk on an EC2 instance:

Download the RPM install file and upload it to an accessible Amazon S3 location.
Upload the following YAML script into AWS CloudFormation.
Provide the necessary parameters, as shown in the screenshots below.
Choose Next and complete the steps to create your stack.

Splunk CloudFormation screen - 1

Splunk CloudFormation screen - 2

Splunk CloudFormation screen - 3

Alternatively, run an AWS CLI script like the following:

aws cloudformation create-stack \
--stack-name "splunk" \
--template-body file://splunk_cf.yaml \
--parameters ParameterKey=KeyName,ParameterValue="< Name of EC2 Key Pair >" \
  ParameterKey=InstanceType,ParameterValue="t3.medium" \
  ParameterKey=LatestAmiId,ParameterValue="/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2" \
  ParameterKey=VPCID,ParameterValue="vpc-XXXXXXXXXXX" \
  ParameterKey=PublicSubnet0,ParameterValue="subnet-XXXXXXXXX" \
  ParameterKey=SSHLocation,ParameterValue="< CIDR Range for SSH access >" \
  ParameterKey=VpcCidrRange,ParameterValue="172.20.0.0/16" \
  ParameterKey=RootVolumeSize,ParameterValue="100" \
  ParameterKey=S3BucketName,ParameterValue="< S3 Bucket Name >" \
  ParameterKey=S3Prefix,ParameterValue="splunk/splunk-8.2.5-77015bc7a462-linux-2.6-x86_64.rpm" \
  ParameterKey=S3DownloadLocation,ParameterValue="/tmp" \
--region < region > \
--capabilities CAPABILITY_IAM

After you build the stack, navigate to the stack’s Outputs tab on the AWS CloudFormation console and note the internal and external DNS for the Splunk instance.

You use these later to configure the Splunk instance and log forwarding.

Splunk CloudFormation output screen

To configure Splunk, go to the Resources tab for the CloudFormation stack and locate the physical ID of EC2Instance.
Choose that link to go to the specific EC2 instance.
Select the instance and choose Connect.

Connect to Splunk Instance

On the Session Manager tab, choose Connect.

Connect to Instance

You’re redirected to the instance’s shell.

Install and configure Splunk as follows:

$ sudo /opt/splunk/bin/splunk start --accept-license
…
Please enter an administrator username: admin
Password must contain at least:
   * 8 total printable ASCII character(s).
Please enter a new password: 
Please confirm new password:
…
Done
                                                           [  OK  ]

Waiting for web server at http://127.0.0.1:8000 to be available......... Done
The Splunk web interface is at http://ip-xx-xxx-xxx-x.us-east-2.compute.internal:8000

Enter the Splunk site using the SplunkPublicDns value from the stack outputs (for example, http://ec2-xx-xxx-xxx-x.us-east-2.compute.amazonaws.com:8000). Note the port number of 8000.
Log in with the user name and password you provided.

Splunk Login

Configure HTTP Event Collector

To configure Splunk to be able to receive logs from Fluent Bit, configure the HTTP Event Collector data input:

Go to Settings and choose Data input.
Choose HTTP Event Collector.
Choose Global Settings.
Select Enabled, keep port number 8088, then choose Save.
Choose New Token.
For Name, enter a name (for example, emreksdemo).
Choose Next.
For Available item(s) for Indexes, add at least the main index.
Choose Review and then Submit.
In the list of HTTP Event Collect tokens, copy the token value for emreksdemo.

You use it when configuring the Fluent Bit output.

splunk-http-collector-list

Option 2: Set up Amazon OpenSearch Service

Your other log aggregation option is to use Amazon OpenSearch Service.

Provision an OpenSearch Service domain

Provisioning an OpenSearch Service domain is very straightforward. In this post, we provide a simple script and configuration to provision a basic domain. To do it yourself, refer to Creating and managing Amazon OpenSearch Service domains.

Before you start, get the ARN of the IAM role that you use to run the Spark jobs. If you created the EKS cluster with the provided script, go to the CloudFormation stack emr-eks-iam-stack. On the Outputs tab, locate the IAMRoleArn output and copy this ARN. We also modify the IAM role later on, after we create the OpenSearch Service domain.

iam_role_emr_eks_job

If you’re using the provided opensearch.sh installer, before you run it, modify the file.

From the root folder of the GitHub repository, cd to opensearch and modify opensearch.sh (you can also use your preferred editor):

[../aws-emr-eks-log-forwarding] $ cd opensearch
[../aws-emr-eks-log-forwarding/opensearch] $ vi opensearch.sh

Configure opensearch.sh to fit your environment, for example:

# name of our Amazon OpenSearch cluster
export ES_DOMAIN_NAME="emreksdemo"

# Elasticsearch version
export ES_VERSION="OpenSearch_1.0"

# Instance Type
export INSTANCE_TYPE="t3.small.search"

# OpenSearch Dashboards admin user
export ES_DOMAIN_USER="emreks"

# OpenSearch Dashboards admin password
export ES_DOMAIN_PASSWORD='< ADD YOUR PASSWORD >'

# Region
export REGION='us-east-1'

Run the script:

[../aws-emr-eks-log-forwarding/opensearch] $ bash opensearch.sh

Configure your OpenSearch Service domain

After you set up your OpenSearch service domain and it’s active, make the following configuration changes to allow logs to be ingested into Amazon OpenSearch Service:

On the Amazon OpenSearch Service console, on the Domains page, choose your domain.

Opensearch Domain Console

On the Security configuration tab, choose Edit.

Opensearch Security Configuration

For Access Policy, select Only use fine-grained access control.
Choose Save changes.

The access policy should look like the following code:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:xx-xxxx-x:xxxxxxxxxxxx:domain/emreksdemo/*"
    }
  ]
}

When the domain is active again, copy the domain ARN.

We use it to configure the Amazon EMR job IAM role we mentioned earlier.

Choose the link for OpenSearch Dashboards URL to enter Amazon OpenSearch Service Dashboards.

Opensearch Main Console

In Amazon OpenSearch Service Dashboards, use the user name and password that you configured earlier in the opensearch.sh file.
Choose the options icon and choose Security under OpenSearch Plugins.

opensearch menu

Choose Roles.
Choose Create role.

opensearch-create-role-button

Enter the new role’s name, cluster permissions, and index permissions. For this post, name the role fluentbit_role and give cluster permissions to the following:
1. indices:admin/create
2. indices:admin/template/get
3. indices:admin/template/put
4. cluster:admin/ingest/pipeline/get
5. cluster:admin/ingest/pipeline/put
6. indices:data/write/bulk
7. indices:data/write/bulk*
8. create_index

opensearch-create-role-button

In the Index permissions section, give write permission to the index fluent-*.
On the Mapped users tab, choose Manage mapping.
For Backend roles, enter the Amazon EMR job execution IAM role ARN to be mapped to the fluentbit_role role.
Choose Map.

opensearch-map-backend

To complete the security configuration, go to the IAM console and add the following inline policy to the EMR on EKS IAM role entered in the backend role. Replace the resource ARN with the ARN of your OpenSearch Service domain.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "es:ESHttp*"
            ],
            "Resource": "arn:aws:es:us-east-2:XXXXXXXXXXXX:domain/emreksdemo"
        }
    ]
}

The configuration of Amazon OpenSearch Service is complete and ready for ingestion of logs from the Fluent Bit sidecar container.

Configure the Fluent Bit sidecar container

We need to write two configuration files to configure a Fluent Bit sidecar container. The first is the Fluent Bit configuration itself, and the second is the Fluent Bit sidecar subprocess configuration that makes sure that the sidecar operation ends when the main Spark job ends. The suggested configuration provided in this post is for Splunk and Amazon OpenSearch Service. However, you can configure Fluent Bit with other third-party log aggregators. For more information about configuring outputs, refer to Outputs.

Fluent Bit ConfigMap

The following sample ConfigMap is from the GitHub repo:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-sidecar-config
  namespace: sparkns
  labels:
    app.kubernetes.io/name: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-application.conf
    @INCLUDE input-event-logs.conf
    @INCLUDE output-splunk.conf
    @INCLUDE output-opensearch.conf

  input-application.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/spark/user/*/*
        Path_Key          filename
        Buffer_Chunk_Size 1M
        Buffer_Max_Size   5M
        Skip_Long_Lines   On
        Skip_Empty_Lines  On

  input-event-logs.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/spark/apps/*
        Path_Key          filename
        Buffer_Chunk_Size 1M
        Buffer_Max_Size   5M
        Skip_Long_Lines   On
        Skip_Empty_Lines  On

  output-splunk.conf: |
    [OUTPUT]
        Name            splunk
        Match           *
        Host            < INTERNAL DNS of Splunk EC2 Instance >
        Port            8088
        TLS             On
        TLS.Verify      Off
        Splunk_Token    < Token as provided by the HTTP Event Collector in Splunk >

  output-opensearch.conf: |
[OUTPUT]
        Name            es
        Match           *
        Host            < HOST NAME of the OpenSearch Domain | No HTTP protocol >
        Port            443
        TLS             On
        AWS_Auth        On
        AWS_Region      < Region >
        Retry_Limit     6

In your AWS Cloud9 workspace, modify the ConfigMap accordingly. Provide the values for the placeholder text by running the following commands to enter the VI editor mode. If preferred, you can use PICO or a different editor:

[../aws-emr-eks-log-forwarding] $  cd kube/configmaps
[../aws-emr-eks-log-forwarding/kube/configmaps] $ vi emr_configmap.yaml

# Modify the emr_configmap.yaml as above
# Save the file once it is completed

Complete either the Splunk output configuration or the Amazon OpenSearch Service output configuration.

Next, run the following commands to add the two Fluent Bit sidecar and subprocess ConfigMaps:

[../aws-emr-eks-log-forwarding/kube/configmaps] $ kubectl apply -f emr_configmap.yaml
[../aws-emr-eks-log-forwarding/kube/configmaps] $ kubectl apply -f emr_entrypoint_configmap.yaml

You don’t need to modify the second ConfigMap because it’s the subprocess script that runs inside the Fluent Bit sidecar container. To verify that the ConfigMaps have been installed, run the following command:

$ kubectl get cm -n sparkns
NAME                         DATA   AGE
fluent-bit-sidecar-config    6      15s
fluent-bit-sidecar-wrapper   2      15s

Set up a customized EMR container image

To run the sample PySpark script, the script requires the Boto3 package that’s not available in the standard EMR container images. If you want to run your own script and it doesn’t require a customized EMR container image, you may skip this step.

Run the following script:

[../aws-emr-eks-log-forwarding] $ cd ecr
[../aws-emr-eks-log-forwarding/ecr] $ bash create_custom_image.sh <region> <EMR container image account number>

The EMR container image account number can be obtained from How to select a base image URI. This documentation also provides the appropriate ECR registry account number. For example, the registry account number for us-east-1 is 755674844232.

To verify the repository and image, run the following commands:

$ aws ecr describe-repositories --region < region > | grep emr-6.5.0-custom
            "repositoryArn": "arn:aws:ecr:xx-xxxx-x:xxxxxxxxxxxx:repository/emr-6.5.0-custom",
            "repositoryName": "emr-6.5.0-custom",
            "repositoryUri": " xxxxxxxxxxxx.dkr.ecr.xx-xxxx-x.amazonaws.com/emr-6.5.0-custom",

$ aws ecr describe-images --region < region > --repository-name emr-6.5.0-custom | jq .imageDetails[0].imageTags
[
  "latest"
]

Prepare pod templates for Spark jobs

Upload the two Spark driver and Spark executor pod templates to an S3 bucket and prefix. The two pod templates can be found in the GitHub repository:

emr_driver_template.yaml – Spark driver pod template
emr_executor_template.yaml – Spark executor pod template

The pod templates provided here should not be modified.

Submitting a Spark job with a Fluent Bit sidecar container

This Spark job example uses the bostonproperty.py script. To use this script, upload it to an accessible S3 bucket and prefix and complete the preceding steps to use an EMR customized container image. You also need to upload the CSV file from the GitHub repo, which you need to download and unzip. Upload the unzipped file to the following location: s3://<your chosen bucket>/<first level folder>/data/boston-property-assessment-2021.csv.

The following commands assume that you launched your EKS cluster and virtual EMR cluster with the parameters indicated in the GitHub repo.

Variable	Where to Find the Information or the Value Required
EMR_EKS_CLUSTER_ID	Amazon EMR console virtual cluster page
EMR_EKS_EXECUTION_ARN	IAM role ARN
EMR_RELEASE	emr-6.5.0-latest
S3_BUCKET	The bucket you create in Amazon S3
S3_FOLDER	The preferred prefix you want to use in Amazon S3
CONTAINER_IMAGE	The URI in Amazon ECR where your container image is
SCRIPT_NAME	emreksdemo-script or a name you prefer

Alternatively, use the provided script to run the job. Change the directory to the scripts folder in emreks and run the script as follows:

[../aws-emr-eks-log-forwarding] cd emreks/scripts
[../aws-emr-eks-log-forwarding/emreks/scripts] bash run_emr_script.sh < S3 bucket name > < ECR container image > < script path>

Example: bash run_emr_script.sh emreksdemo-123456 12345678990.dkr.ecr.us-east-2.amazonaws.com/emr-6.5.0-custom s3://emreksdemo-123456/scripts/scriptname.py

After you submit the Spark job successfully, you get a return JSON response like the following:

{
    "id": "0000000305e814v0bpt",
    "name": "emreksdemo-job",
    "arn": "arn:aws:emr-containers:xx-xxxx-x:XXXXXXXXXXX:/virtualclusters/upobc00wgff5XXXXXXXXXXX/jobruns/0000000305e814v0bpt",
    "virtualClusterId": "upobc00wgff5XXXXXXXXXXX"
}

What happens when you submit a Spark job with a sidecar container

After you submit a Spark job, you can see what is happening by viewing the pods that are generated and the corresponding logs. First, using kubectl, get a list of the pods generated in the namespace where the EMR virtual cluster runs. In this case, it’s sparkns. The first pod in the following code is the job controller for this particular Spark job. The second pod is the Spark executor; there can be more than one pod depending on how many executor instances are asked for in the Spark job setting—we asked for one here. The third pod is the Spark driver pod.

$ kubectl get pods -n sparkns
NAME                                        READY   STATUS    RESTARTS   AGE
0000000305e814v0bpt-hvwjs                   3/3     Running   0          25s
emreksdemo-script-1247bf80ae40b089-exec-1   0/3     Pending   0          0s
spark-0000000305e814v0bpt-driver            3/3     Running   0          11s

To view what happens in the sidecar container, follow the logs in the Spark driver pod and refer to the sidecar. The sidecar container launches with the Spark pods and persists until the file /var/log/fluentd/main-container-terminated is no longer available. For more information about how Amazon EMR controls the pod lifecycle, refer to Using pod templates. The subprocess script ties the sidecar container to this same lifecycle and deletes itself upon the EMR controlled pod lifecycle process.

$ kubectl logs spark-0000000305e814v0bpt-driver -n sparkns  -c custom-side-car-container --follow=true

Waiting for file /var/log/fluentd/main-container-terminated to appear...
AWS for Fluent Bit Container Image Version 2.24.0Start wait: 1652190909
Elapsed Wait: 0
Not found count: 0
Waiting...
Fluent Bit v1.9.3
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/05/10 13:55:09] [ info] [fluent bit] version=1.9.3, commit=9eb4996b7d, pid=11
[2022/05/10 13:55:09] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/05/10 13:55:09] [ info] [cmetrics] version=0.3.1
[2022/05/10 13:55:09] [ info] [output:splunk:splunk.0] worker #0 started
[2022/05/10 13:55:09] [ info] [output:splunk:splunk.0] worker #1 started
[2022/05/10 13:55:09] [ info] [output:es:es.1] worker #0 started
[2022/05/10 13:55:09] [ info] [output:es:es.1] worker #1 started
[2022/05/10 13:55:09] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/05/10 13:55:09] [ info] [sp] stream processor started
Waiting for file /var/log/fluentd/main-container-terminated to appear...
Last heartbeat: 1652190914
Elapsed Time since after heartbeat: 0
Found count: 0
list files:
-rw-r--r-- 1 saslauth 65534 0 May 10 13:55 /var/log/fluentd/main-container-terminated
Last heartbeat: 1652190918

…

[2022/05/10 13:56:09] [ info] [input:tail:tail.0] inotify_fs_add(): inode=58834691 watch_fd=6 name=/var/log/spark/user/spark-0000000305e814v0bpt-driver/stdout-s3-container-log-in-tail.pos
[2022/05/10 13:56:09] [ info] [input:tail:tail.1] inotify_fs_add(): inode=54644346 watch_fd=1 name=/var/log/spark/apps/spark-0000000305e814v0bpt
Outside of loop, main-container-terminated file no longer exists
ls: cannot access /var/log/fluentd/main-container-terminated: No such file or directory
The file /var/log/fluentd/main-container-terminated doesn't exist anymore;
TERMINATED PROCESS
Fluent-Bit pid: 11
Killing process after sleeping for 15 seconds
root        11     8  0 13:55 ?        00:00:00 /fluent-bit/bin/fluent-bit -e /fluent-bit/firehose.so -e /fluent-bit/cloudwatch.so -e /fluent-bit/kinesis.so -c /fluent-bit/etc/fluent-bit.conf
root       114     7  0 13:56 ?        00:00:00 grep fluent
Killing process 11
[2022/05/10 13:56:24] [engine] caught signal (SIGTERM)
[2022/05/10 13:56:24] [ info] [input] pausing tail.0
[2022/05/10 13:56:24] [ info] [input] pausing tail.1
[2022/05/10 13:56:24] [ warn] [engine] service will shutdown in max 5 seconds
[2022/05/10 13:56:25] [ info] [engine] service has stopped (0 pending tasks)
[2022/05/10 13:56:25] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=54644346 watch_fd=1
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=60917120 watch_fd=1
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=60917121 watch_fd=2
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=58834690 watch_fd=3
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=58834692 watch_fd=4
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=58834689 watch_fd=5
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=58834691 watch_fd=6
[2022/05/10 13:56:25] [ info] [output:splunk:splunk.0] thread worker #0 stopping...
[2022/05/10 13:56:25] [ info] [output:splunk:splunk.0] thread worker #0 stopped
[2022/05/10 13:56:25] [ info] [output:splunk:splunk.0] thread worker #1 stopping...
[2022/05/10 13:56:25] [ info] [output:splunk:splunk.0] thread worker #1 stopped
[2022/05/10 13:56:25] [ info] [output:es:es.1] thread worker #0 stopping...
[2022/05/10 13:56:25] [ info] [output:es:es.1] thread worker #0 stopped
[2022/05/10 13:56:25] [ info] [output:es:es.1] thread worker #1 stopping...
[2022/05/10 13:56:25] [ info] [output:es:es.1] thread worker #1 stopped

View the forwarded logs in Splunk or Amazon OpenSearch Service

To view the forwarded logs, do a search in Splunk or on the Amazon OpenSearch Service console. If you’re using a shared log aggregator, you may have to filter the results. In this configuration, the logs tailed by Fluent Bit are in the /var/log/spark/*. The following screenshots show the logs generated specifically by the Kubernetes Spark driver stdout that were forwarded to the log aggregators. You can compare the results with the logs provided using kubectl:

kubectl logs < Spark Driver Pod > -n < namespace > -c spark-kubernetes-driver --follow=true

…
root
 |-- PID: string (nullable = true)
 |-- CM_ID: string (nullable = true)
 |-- GIS_ID: string (nullable = true)
 |-- ST_NUM: string (nullable = true)
 |-- ST_NAME: string (nullable = true)
 |-- UNIT_NUM: string (nullable = true)
 |-- CITY: string (nullable = true)
 |-- ZIPCODE: string (nullable = true)
 |-- BLDG_SEQ: string (nullable = true)
 |-- NUM_BLDGS: string (nullable = true)
 |-- LUC: string (nullable = true)
…

|02108|RETAIL CONDO           |361450.0            |63800.0        |5977500.0      |
|02108|RETAIL STORE DETACH    |2295050.0           |988200.0       |3601900.0      |
|02108|SCHOOL                 |1.20858E7           |1.20858E7      |1.20858E7      |
|02108|SINGLE FAM DWELLING    |5267156.561085973   |1153400.0      |1.57334E7      |
+-----+-----------------------+--------------------+---------------+---------------+
only showing top 50 rows

The following screenshot shows the Splunk logs.

splunk-result-driver-stdout

The following screenshots show the Amazon OpenSearch Service logs.

opensearch-result-driver-stdout

Optional: Include a buffer between Fluent Bit and the log aggregators

If you expect to generate a lot of logs because of high concurrent Spark jobs creating multiple individual connects that may overwhelm your Amazon OpenSearch Service or Splunk log aggregation clusters, consider employing a buffer between the Fluent Bit sidecars and your log aggregator. One option is to use Amazon Kinesis Data Firehose as the buffering service.

Kinesis Data Firehose has built-in delivery to both Amazon OpenSearch Service and Splunk. If using Amazon OpenSearch Service, refer to Loading streaming data from Amazon Kinesis Data Firehose. If using Splunk, refer to Configure Amazon Kinesis Firehose to send data to the Splunk platform and Choose Splunk for Your Destination.

To configure Fluent Bit to Kinesis Data Firehose, add the following to your ConfigMap output. Refer to the GitHub ConfigMap example and add the @INCLUDE under the [SERVICE] section:

     @INCLUDE output-kinesisfirehose.conf
…

  output-kinesisfirehose.conf: |
    [OUTPUT]
        Name            kinesis_firehose
        Match           *
        region          < region >
        delivery_stream < Kinesis Firehose Stream Name >

Optional: Use data streams for Amazon OpenSearch Service

If you’re in a scenario where the number of documents grows rapidly and you don’t need to update older documents, you need to manage the OpenSearch Service cluster. This involves steps like creating a rollover index alias, defining a write index, and defining common mappings and settings for the backing indexes. Consider using data streams to simplify this process and enforce a setup that best suits your time series data. For instructions on implementing data streams, refer to Data streams.

Clean up

To avoid incurring future charges, delete the resources by deleting the CloudFormation stacks that were created with this script. This removes the EKS cluster. However, before you do that, remove the EMR virtual cluster first by running the delete-virtual-cluster command. Then delete all the CloudFormation stacks generated by the deployment script.

If you launched an OpenSearch Service domain, you can delete the domain from the OpenSearch Service domain. If you used the script to launch a Splunk instance, you can go to the CloudFormation stack that launched the Splunk instance and delete the CloudFormation stack. This removes remove the Splunk instance and associated resources.

You can also use the following scripts to clean up resources:

To remove an OpenSearch Service domain, run the delete_opensearch_domain.sh script
To remove the Splunk resource, run the remove_splunk_deployment.sh script
To remove the EKS cluster, including the EMR virtual cluster, VPC, and other IAM roles, run the remove_emr_eks_deployment.sh script

Conclusion

EMR on EKS facilitates running Spark jobs on Kubernetes to achieve very fast and cost-efficient Spark operations. This is made possible through scheduling transient pods that are launched and then deleted the jobs are complete. To log all these operations in the same lifecycle of the Spark jobs, this post provides a solution using pod templates and Fluent Bit that is lightweight and powerful. This approach offers a decoupled way of log forwarding based at the Spark application level and not at the Kubernetes cluster level. It also avoids routing through intermediaries like CloudWatch, reducing cost and complexity. In this way, you can address security concerns and DevOps and system administration ease of management while providing Spark users with insights into their Spark jobs in a cost-efficient and functional way.

If you have questions or suggestions, please leave a comment.

About the Author

Matthew Tan is a Senior Analytics Solutions Architect at Amazon Web Services and provides guidance to customers developing solutions with AWS Analytics services on their analytics workloads.

How Plugsurfing doubled performance and reduced cost by 70% with purpose-built databases and AWS Graviton

2022-07-18 Anand Shah

Post Syndicated from Anand Shah original https://aws.amazon.com/blogs/big-data/how-plugsurfing-doubled-performance-and-reduced-cost-by-70-with-purpose-built-databases-and-aws-graviton/

Plugsurfing aligns the entire car charging ecosystem—drivers, charging point operators, and carmakers—within a single platform. The over 1 million drivers connected to the Plugsurfing Power Platform benefit from a network of over 300,000 charging points across Europe. Plugsurfing serves charging point operators with a backend cloud software for managing everything from country-specific regulations to providing diverse payment options for customers. Carmakers benefit from white label solutions as well as deeper integrations with their in-house technology. The platform-based ecosystem has already processed more than 18 million charging sessions. Plugsurfing was acquired fully by Fortum Oyj in 2018.

Plugsurfing uses Amazon OpenSearch Service as a central data store to store 300,000 charging stations’ information and to power search and filter requests coming from mobile, web, and connected car dashboard clients. With the increasing usage, Plugsurfing created multiple read replicas of an OpenSearch Service cluster to meet demand and scale. Over time and with the increase in demand, this solution started to become cost exhaustive and limited in terms of cost performance benefit.

AWS EMEA Prototyping Labs collaborated with the Plugsurfing team for 4 weeks on a hands-on prototyping engagement to solve this problem, which resulted in 70% cost savings and doubled the performance benefit over the current solution. This post shows the overall approach and ideas we tested with Plugsurfing to achieve the results.

The challenge: Scaling higher transactions per second while keeping costs under control

One of the key issues of the legacy solution was keeping up with higher transactions per second (TPS) from APIs while keeping costs low. The majority of the cost was coming from the OpenSearch Service cluster, because the mobile, web, and EV car dashboards use different APIs for different use cases, but all query the same cluster. The solution to achieve higher TPS with the legacy solution was to scale the OpenSearch Service cluster.

The following figure illustrates the legacy architecture.

Legacy Architecture

Plugsurfing APIs are responsible for serving data for four different use cases:

Radius search – Find all the EV charging stations (latitude/longitude) with in x km radius from the point of interest (or current location on GPS).
Square search – Find all the EV charging stations within a box of length x width, where the point of interest (or current location on GPS) is at the center.
Geo clustering search – Find all the EV charging stations clustered (grouped) by their concentration within a given area. For example, searching all EV chargers in all of Germany results in something like 50 in Munich and 100 in Berlin.
Radius search with filtering – Filter the results by EV charger that are available or in use by plug type, power rating, or other filters.

The OpenSearch Service domain configuration was as follows:

m4.10xlarge.search x 4 nodes
Elasticsearch 7.10 version
A single index to store 300,000 EV charger locations with five shards and one replica
A nested document structure

The following code shows the example document

{
   "locationId":"location:1",
   "location":{
      "latitude":32.1123,
      "longitude":-9.2523
   },
   "adress":"parking lot 1",
   "chargers":[
      {
         "chargerId":"location:1:charger:1",
         "connectors":[
            {
               "connectorId":"location:1:charger:1:connector:1",
               "status":"AVAILABLE",
               "plug_type":"Type A"
            }
         ]
      }
   ]
}

Solution overview

AWS EMEA Prototyping Labs proposed an experimentation approach to try three high-level ideas for performance optimization and to lower overall solution costs.

We launched an Amazon Elastic Compute Cloud (EC2) instance in a prototyping AWS account to host a benchmarking tool based on k6 (an open-source tool that makes load testing simple for developers and QA engineers) Later, we used scripts to dump and restore production data to various databases, transforming it to fit with different data models. Then we ran k6 scripts to run and record performance metrics for each use case, database, and data model combination. We also used the AWS Pricing Calculator to estimate the cost of each experiment.

Experiment 1: Use AWS Graviton and optimize OpenSearch Service domain configuration

We benchmarked a replica of the legacy OpenSearch Service domain setup in a prototyping environment to baseline performance and costs. Next, we analyzed the current cluster setup and recommended testing the following changes:

Use AWS Graviton based memory optimized EC2 instances (r6g) x 2 nodes in the cluster
Reduce the number of shards from five to one, given the volume of data (all documents) is less than 1 GB
Increase the refresh interval configuration from the default 1 second to 5 seconds
Denormalize the full document; if not possible, then denormalize all the fields that are part of the search query
Upgrade to Amazon OpenSearch Service 1.0 from Elasticsearch 7.10

Plugsurfing created multiple new OpenSearch Service domains with the same data and benchmarked them against the legacy baseline to obtain the following results. The row in yellow represents the baseline from the legacy setup; the rows with green represent the best outcome out of all experiments performed for the given use cases.

DB Engine	Version	Node Type	Nodes in Cluster	Configurations	Data Modeling	Radius req/sec	Filtering req/sec	Performance Gain %
Elasticsearch	7.1	m4.10xlarge	4	5 shards, 1 replica	Nested	2841	580	0
Amazon OpenSearch Service	1.0	r6g.xlarge	2	1 shards, 1 replica	Nested	850	271	32.77
Amazon OpenSearch Service	1.0	r6g.xlarge	2	1 shards, 1 replica	Denormalized	872	670	45.07
Amazon OpenSearch Service	1.0	r6g.2xlarge	2	1 shards, 1 replica	Nested	1667	474	62.58
Amazon OpenSearch Service	1.0	r6g.2xlarge	2	1 shards, 1 replica	Denormalized	1993	1268	95.32

Plugsurfing was able to gain 95% (doubled) better performance across the radius and filtering use cases with this experiment.

Experiment 2: Use purpose-built databases on AWS for different use cases

We tested Amazon OpenSearch Service, Amazon Aurora PostgreSQL-Compatible Edition, and Amazon DynamoDB extensively with many data models for different use cases.

We tested the square search use case with an Aurora PostgreSQL cluster with a db.r6g.2xlarge single node as the reader and a db.r6g.large single node as the writer. The square search used a single PostgreSQL table configured via the following steps:

Create the geo search table with geography as the data type to store latitude/longitude:

CREATE TYPE status AS ENUM ('available', 'inuse', 'out-of-order');

CREATE TABLE IF NOT EXISTS square_search
(
id     serial PRIMARY KEY,
geog   geography(POINT),
status status,
data   text -- Can be used as json data type, or add extra fields as flat json
);

Create an index on the geog field:

CREATE INDEX global_points_gix ON square_search USING GIST (geog);

Query the data for the square search use case:

SELECT id, ST_AsText(geog), status, datafrom square_search
where geog && ST_MakeEnvelope(32.5,9,32.8,11,4326) limit 100;

We achieved an eight-times greater improvement in TPS for the square search use case, as shown in the following table.

DB Engine	Version	Node Type	Nodes in Cluster	Configurations	Data modeling	Square req/sec	Performance Gain %
Elasticsearch	7.1	m4.10xlarge	4	5 shards, 1 replica	Nested	412	0
Aurora PostgreSQL	13.4	r6g.large	2	PostGIS, Denormalized	Single table	881	213.83
Aurora PostgreSQL	13.4	r6g.xlarge	2	PostGIS, Denormalized	Single table	1770	429.61
Aurora PostgreSQL	13.4	r6g.2xlarge	2	PostGIS, Denormalized	Single table	3553	862.38

We tested the geo clustering search use case with a DynamoDB model. The partition key (PK) is made up of three components: <zoom-level>:<geo-hash>:<api-key>, and the sort key is the EV charger current status. We examined the following:

The zoom level of the map set by the user
The geo hash computed based on the map tile in the user’s view port area (at every zoom level, the map of Earth is divided into multiple tiles, where each tile can be represented as a geohash)
The API key to identify the API user

Partition Key: String	Sort Key: String	total_pins: Number	filter1_pins: Number	filter2_pins: Number	filter3_pins: Number
5:gbsuv:api_user_1	Available	100	50	67	12
5:gbsuv:api_user_1	in-use	25	12	6	1
6:gbsuvt:api_user_1	Available	35	22	8	0
6:gbsuvt:api_user_1	in-use	88	4	0	35

The writer updates the counters (increment or decrement) against each filter condition and charger status whenever the EV charger status is updated at all zoom levels. With this model, the reader can query pre-clustered data with a single direct partition hit for all the map tiles viewable by the user at the given zoom level.

The DynamoDB model helped us gain a 45-times greater read performance for our geo clustering use case. However, it also added extra work on the writer side to pre-compute numbers and update multiple rows when the status of a single EV charger is updated. The following table summarizes our results.

DB Engine	Version	Node Type	Nodes in Cluster	Configurations	Data modeling	Clustering req/sec	Performance Gain %
Elasticsearch	7.1	m4.10xlarge	4	5 shards, 1 replica	Nested	22	0
DynamoDB	NA	Serverless	0	100 WCU, 500 RCU	Single table	1000	4545.45

Experiment 3: Use AWS Lambda@Edge and AWS Wavelength for better network performance

We recommended that Plugsurfing use Lambda@Edge and AWS Wavelength to optimize network performance by shifting some of the APIs at the edge to closer to the user. The EV car dashboard can use the same 5G network connectivity to invoke Plugsurfing APIs with AWS Wavelength.

Post-prototype architecture

The post-prototype architecture used purpose-built databases on AWS to achieve better performance across all four use cases. We looked at the results and split the workload based on which database performs best for each use case. This approach optimized performance and cost, but added complexity on readers and writers. The final experiment summary represents the database fits for the given use cases that provide the best performance (highlighted in orange).

Plugsurfing has already implemented a short-term plan (light green) as an immediate action post-prototype and plans to implement mid-term and long-term actions (dark green) in the future.

DB Engine	Node Type	Configurations	Radius req/sec	Radius Filtering req/sec	Clustering req/sec	Square req/sec	Monthly Costs $	Cost Benefit %	Performance Gain %
Elasticsearch 7.1	m4.10xlarge x4	5 shards	2841	580	22	412	9584,64	0	0
Amazon OpenSearch Service 1.0	r6g.2xlarge x2	1 shard Nested	1667	474	34	142	1078,56	88,75	-39,9
Amazon OpenSearch Service 1.0	r6g.2xlarge x2	1 shard	1993	1268	125	685	1078,56	88,75	5,6
Aurora PostgreSQL 13.4	r6g.2xlarge x2	PostGIS	0	0	275	3553	1031,04	89,24	782,03
DynamoDB	Serverless	100 WCU 500 RCU	0	0	1000	0	106,06	98,89	4445,45
Summary	.	.	2052	1268	1000	3553	2215,66	76,88	104,23

The following diagram illustrates the updated architecture.

Post Prototype Architecture

Conclusion

Plugsurfing was able to achieve a 70% cost reduction over their legacy setup with two-times better performance by using purpose-built databases like DynamoDB, Aurora PostgreSQL, and AWS Graviton based instances for Amazon OpenSearch Service. They achieved the following results:

The radius search and radius search with filtering use cases achieved better performance using Amazon OpenSearch Service on AWS Graviton with a denormalized document structure
The square search use case performed better using Aurora PostgreSQL, where we used the PostGIS extension for geo square queries
The geo clustering search use case performed better using DynamoDB

Learn more about AWS Graviton instances and purpose-built databases on AWS, and let us know how we can help optimize your workload on AWS.

About the Author

Anand Shah is a Big Data Prototyping Solution Architect at AWS. He works with AWS customers and their engineering teams to build prototypes using AWS Analytics services and purpose-built databases. Anand helps customers solve the most challenging problems using art-of-the-possible technology. He enjoys beaches in his leisure time.

Custom packages and hot reload of dictionary files with Amazon OpenSearch Service

2022-07-14 Sonam Chaudhary

Post Syndicated from Sonam Chaudhary original https://aws.amazon.com/blogs/big-data/custom-packages-and-hot-reload-of-dictionary-files-with-amazon-opensearch-service/

Amazon OpenSearch Service is a fully managed service that you can use to deploy and operate OpenSearch clusters cost-effectively at scale in the AWS Cloud. The service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more by offering the latest versions of OpenSearch, support for 19 versions of Elasticsearch (1.5 to 7.10 versions), and visualization capabilities powered by OpenSearch Dashboards and Kibana (1.5 to 7.10 versions).

There are various use cases such as website search, ecommerce search, and enterprise search where the user wants to get relevant content for specific terms. Search engines match the terms (words) sent through the query API. When there are many different ways of specifying the same concept, you use synonyms to give the search engine more match targets than what the user entered.

Similarly, there are certain use cases where input data has a lot of common or frequently occurring words that don’t add much relevance when used in a search query. These include words like “the,” “this,” and “that.” These can be classified as stopwords.

OpenSearch Service allows you to upload custom dictionary files, which can include synonyms and stopwords to be customized to your use case. This is especially useful for use cases where you want to do the following:

Specify words that can be treated as equivalent. For example, you can specify that words such as “bread,” “danish,” and “croissant” be treated as synonymous. This leads to better search results because instead of returning a null result if an exact match isn’t found, an approximately relevant or equivalent result is returned.
Ignore certain high frequency terms that are common and lack useful information in terms of contributing to the search’s relevance store. These could include “a,” “the,” “of,” “an,” and so on.

Specifying stems, synonyms, and stopwords can greatly help with query accuracy, and allows you to customize and enhance query relevance. They can also help with stemming (such as in the Japanese (kuromoji) Analysis Plugin). Stemming is reducing a word to its root form. For Example, “cooking” and “cooked” can be stemmed to the same root word “cook.” This way, any variants of a word can be stemmed to one root word to enhance the query results.

In this post, we show how we can add custom packages for synonyms and stopwords to an OpenSearch Service domain. We start by creating custom packages for synonyms and stopwords and creating a custom analyzer for a sample index that uses the standard tokenizer and a synonym token filter, followed by a demonstration of hot reload of dictionary files.

Tokenizers and token filters

Tokenizers break streams of characters into tokens (typically words) based on some set of rules. The simplest example is the whitespace tokenizer, which breaks the preceding characters into a token each time it encounters a whitespace character. A more complex example is the standard tokenizer, which uses a set of grammar-based rules to work across many languages.

Token filters add, modify, or delete tokens. For example, a synonym token filter adds tokens when it finds a word in the synonyms list. The stop token filter removes tokens when finds a word in the stopwords list.

Prerequisites

For this demo, you must have an OpenSearch Service cluster (version 1.2) running. You can use this feature on any version of OpenSearch Service running 7.8+.

Users without administrator access require certain AWS Identity and Access Management (IAM) actions in order to manage packages: es:CreatePackage, es:DeletePackage, es:AssociatePackage, and es:DissociatePackage. The user also needs permissions on the Amazon Simple Storage Service (Amazon S3) bucket path or object where the custom package resides. Grant all permission within IAM, not in the domain access policy. This allows for better management of permissions because any change in permissions can be separate from the domain and allows the user to perform the same action across multiple OpenSearch Service domains (if needed).

Set up the custom packages

To set up the solution, complete the following steps:

On the Amazon S3 console, create a bucket to hold the custom packages.
Upload the files with the stopwords and synonyms to this bucket. For this post, the file contents are as follows:
1. synonyms.txt:
```
pasta, penne, ravioli 
ice cream, gelato, frozen custard
danish, croissant, pastry, bread
```
2. stopwords.txt:
```
the
a
an
of
```
  The following screenshot shows the uploaded files:

Now we import our packages and associate them with a domain.

On the OpenSearch Service console, choose Packages in the navigation pane.
Choose Import package.
Enter a name for your package (for the synonym package, we use my-custom-synonym-package) and optional description.
For Package source, enter the S3 location where synonyms.txt is stored.
Choose Submit.
Repeat these steps to create a package with stopwords.txt.
Choose your synonym package when its status shows as Available.
Choose Associate to a domain.
Select your OpenSearch Service domain, then choose Associate.
Repeat these steps to associate your OpenSearch Service domain to the stopwords package.
When the packages are available, note their IDs.

You use analyzers/id as the file path in your requests to OpenSearch.

Use the custom packages with your data

After you associate a file with a domain, you can use it in parameters such as synonyms_path and stopwords_path when you create tokenizers and token filters. For more information, see OpenSearch Service.

You can create a new index (my-index-test) using the following snippet in the OpenSearch Service domain and specify the Analyzers/id values for the synonyms and stopwords packages.

Open OpenSearch Dashboards.
On the Home menu, choose Dev Tools.

Enter the following code in the left pane:

PUT my-index-test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["my_stop_filter" , "my_synonym_filter"]
          }
        },
        "filter": {
          "my_stop_filter": {
            "type": "stop",
            "stopwords_path": "analyzers/Fxxxxxxxxx",
            "updateable": true
          },
          "my_synonym_filter": {
            "type": "synonym",
            "synonyms_path": "analyzers/Fxxxxxxxxx",
            "updateable": true
          }
            
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_analyzer"
      }
    }
  }
}

Choose the play sign to send the request to create the index with our custom synonyms and stopwords.

The following screenshot shows our results.

This request creates a custom analyzer for my index that uses the standard tokenizer and a synonym and stop token filter. This request also adds a text field (description) to the mapping and tells OpenSearch to use the new analyzer as its search analyzer. It still uses the standard analyzer as its index analyzer.

Note the line "updateable": true in the token filter. This field only applies to search analyzers, not index analyzers, and is critical if you later want to update the search analyzer automatically.

Let’s start by adding some sample data to the index my-index-test:

POST _bulk
{ "index": { "_index": "my-index-test", "_id": "1" } }
{ "description": "pasta" }
{ "index": { "_index": "my-index-test", "_id": "2" } }
{ "description": "the bread" }
{ "index": { "_index": "my-index-test", "_id": "3" } }
{ "description": "ice cream" }
{ "index": { "_index": "my-index-test", "_id": "4" } }
{ "description": "croissant" }

Now If you search for the words you specified in the synonyms.txt file, you get the required results. Note that my test index only has pasta in the indexed data, but because I specified “ravioli” as a synonym for “pasta” in my associated package, I get the results for all documents that have the word “pasta” when I search for “ravioli.”

GET my-index-test/_search
{
  "query": {
    "match": {
      "description": "ravioli"
    }
  }
}

Similarly, you can use the stopwords feature to specify common words that can be filtered out while showing search results and don’t impact the relevance much while returning search query results.

Hot reload

Now let’s say you want to add another synonym (“spaghetti”) for “pasta.”

The first step is to update the synonyms.txt file as follows and upload this updated file to your S3 bucket:
```
pasta , penne , ravioli, spaghetti
ice cream, gelato, frozen custard
danish, croissant, pastry , bread
```
Uploading a new version of a package to Amazon S3 doesn’t automatically update the package on OpenSearch Service. OpenSearch Service stores its own copy of the file, so if you upload a new version to Amazon S3, you must manually update it in OpenSearch Service.

If you try to run the search query against the index for the term “spaghetti” at this point, you don’t get any results:
```
GET my-index-test/_search
{
  "query": {
    "match": {
      "description": "spaghetti"
    }
  }
}
```
After the file is modified in Amazon S3, update the package in OpenSearch Service, then apply the update. To do this, perform the following steps:
On the OpenSearch Service console, choose Packages.
Choose the package you created for custom synonyms and choose Update.
Provide the S3 path to the file, then choose Update package.
Enter a description and choose Update package.

You return to the Packages page.
When the package status shows as Available, choose it and wait for the associated domain to show as updated.
Select the domain and choose Apply update.
Choose Apply update again to confirm.

Wait for the association status to change to Active to confirm that the package version is also updated.

If your domain runs Elasticsearch 7.7 or earlier, uses index analyzers, or doesn’t use the updateable field, and if you want to add some additional synonyms at a later time, you have to reindex your data with the new dictionary file. Previously, on Amazon Elasticsearch Service, these analyzers could only process data as it was indexed.

If your domains runs OpenSearch Service or Amazon Elasticsearch Service 7.8 or later and only uses search analyzers with the updateable field set to true, you don’t need to take any further action. OpenSearch Service automatically updates your indexes using the _plugins/_refresh_search_analyzers API. This allows for refresh of search analyzers in real time without you needing to close and reopen the index.

This feature called hot reload provides the ability to reload dictionary files without reindexing your data. With the new hot reload capability, you can call analyzers at search time, and your dictionary files augment the query. This feature also lets you version your dictionary files in OpenSearch Service and update them on your domains, without having to reindex your data.

Because the domain used in this demonstration runs OpenSearch Service 1.2, you can utilize this hot reload feature and without re-indexing of any data. Simply run a search query for the newly added synonym (“spaghetti”) and get all resultant documents that are synonymous to it:

GET my-index-test/_search
{
  "query": {
    "match": {
      "description": "spaghetti"
    }
  }
}

Conclusion

In this post, we showed how easy it is to set up synonyms in OpenSearch Service so you can find the relevant documents that match a synonym for a word, even when the specific word isn’t used as search term. We also demonstrated how to add and update existing synonym dictionaries and load those files to reflect the changes.

If you have feedback about this post, submit your comments in the comments section. You can also start a new thread on the OpenSearch Service forum or contact AWS Support with questions.

About the Authors

Sonam Chaudhary is a Solutions Architect and Big Data and Analytics Specialist at AWS. She works with customers to build scalable, highly available, cost-effective, and secure solutions in the AWS Cloud. In her free time, she likes traveling with her husband, shopping, and watching movies.

Prashant Agrawal is a Search Specialist Solutions Architect with OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.

Building SAML federation for Amazon OpenSearch Dashboards with Ping Identity

2022-04-19 Raghavarao Sodabathina

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/building-saml-federation-for-amazon-opensearch-dashboards-with-ping-identity/

Amazon OpenSearch is an open search and log analytics service, powered by the Apache Lucene search library.

In this blog post, we provide step-by-step guidance for SP-initiated SSO by showing how to set up a trial Ping Identity account. We’ll show how to build users and groups within your organization’s directory and enable SSO in OpenSearch Dashboards.

Ping Identity is an AWS Competency Partner, and the provider of the PingOne Cloud Platform is a multi-tenant Identity-as-a-Service (IDaaS) platform. Ping Identity supports both service provider (SP)-initiated and identity provider (IdP)-initiated SSO.

Overview of Ping Identity SAML authenticated solution

Figure 1 shows a sample architecture of a generic integrated solution between Ping Identity and OpenSearch Dashboards over SAML authentication.

Figure 1. SAML transactions between Amazon OpenSearch and Ping Identity

The sign-in flow is as follows:

User opens browser window and navigates to Amazon OpenSearch Dashboards
Amazon OpenSearch generates SAML authentication request
Amazon OpenSearch redirects request back to browser
Browser redirects to Ping Identity URL
Ping Identity parses SAML request, authenticates user, and generates SAML response
Ping Identity returns encoded SAML response to browser
Browser sends SAML response back to Amazon OpenSearch Assertion Consumer Service (ACS) URL
ACS verifies SAML response
User logs into Amazon OpenSearch domain

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
A virtual private cloud (VPC)-based Amazon OpenSearch domain with fine-grained access control enabled
Ping Identity account with user and a group
A browser with network connectivity to Ping Identity, Amazon OpenSearch domain, and Amazon OpenSearch Dashboards.

The steps in this post are structured into the following sections:

Identity provider (Ping Identity) setup
Prepare Amazon OpenSearch for SAML configuration
Identity provider (Ping Identity) SAML configuration
Finish Amazon OpenSearch for SAML configuration
Validation
Cleanup

Identity provider (Ping Identity) setup

Step 1: Sign up for a Ping Identity account

Sign up for a Ping Identity account, then click on the Sign up button to complete your account setup.
If you already have an account with Ping Identity, login to your Ping Identity account.

Step 2: Create Population in Ping Identity

Choose Identities in the left menu and click Populations to proceed.
Click on the blue + button next to Populations, enter the name as IT, then click on the Save button (see Figure 2).

Figure 2. Creating population in Ping Identity

Step 3: Create a group in Ping Identity

Choose Groups from the left menu and click on the blue + button next to Groups. For this example, we will create a group called opensearch for Kibana access. Click on the Save button to complete the group creation.

Step 4: Create users in Ping Identity

Choose Users in left menu, then click the + Add User button.
Provide GIVEN NAME, FAMILY NAME, EMAIL ADDRESS, and choose Population as users, as created in Step 1. Choose your own USERNAME. Click on the SAVE button to create your user.
Add more users as needed.

Step 5: Assign role and group to users

Click on Identities/users in the left menu, and click on Users. Then click on the edit button for a particular user, as shown in Figure 3.

Figure 3. Assigning roles and groups to users in Ping Identity

Click on the Edit button, click on + Add Role button, and click on the edit button to assign a role to the user.
For this example, choose Environment Admin, as shown in Figure 4. You can choose different roles depending on your use case.

Figure 4. Assigning roles to users in Ping Identity

For this example, assign administrator responsibilities for our users. Click on Show Environments, and drag Administrators into the ADDED RESPONSIBILITES section. Then click on the Add Role button.
Add Group to users. Go to the Groups tab, search for the opensearch group created in Step 3. Click on the + button next to opensearch to add into group memberships.

Prepare Amazon OpenSearch for SAML configuration

Once the Amazon OpenSearch domain is up and running, we can proceed with configuration.

Under Actions, choose Edit security configuration, as shown in Figure 5.

Figure 5. Enabling Amazon OpenSearch security configuration for SAML

Under SAML authentication for OpenSearch Dashboards/Kibana, select Enable SAML authentication check box (Figure 6). When we enable SAML, it will create different URLs required for configuring SAML with your identity provider.

Figure 6. Amazon OpenSearch URLs for SAML configuration

We will be using the Service Provider entity ID and SP-initiated SSO URL as highlighted in Figure 6 for Ping Identity SAML configuration. We will complete the rest of the Amazon OpenSearch SAML configuration after the Ping Identity SAML configuration.

Ping Identity SAML configuration

Go back to PingIdentity.com, and navigate to Connections on the left menu. Then select Applications, and click on Application +.

For this example, we are creating an application called “Kibana”
Select WEB APP as APPLICATION TYPE and CHOOSE CONNECTION TYPE as SAML, and click on Configure button to proceed as shown in Figure 7.

Figure 7. Configuring a new application in Ping Identity

On the “Create App Profile” page, click on the Next button, and choose the “Manually Enter” option for PROVIDE APP METADATA. Enter the following under Configure SAML Connection section
- ACS URL https://vpc-XXXXX-XXXXX-west-2.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs (SP-initiated SSO URL)
- Choose Sign Assertion & Response under SIGNING KEY
- ENTITY ID: https://vpc-XXXXX-XXXXX.us-west-2.es.amazonaws.com (Service provider entity ID)
- ASSERTION VALIDITY DURATION (IN SECONDS) as 3600
- Choose default options, then click on the Save and Continue button as shown in Figure 8

Figure 8. Configuring SAML connection in Ping Identity

Enter the following under Configure Attribute Mapping, then click on Save and Close.
- Set User ID to default
- Click on +ADD ATTRIBUTE button to add following SAML attributes
  - OUTGOING VALUE: Group Names, SAML ATTRIBUTE: saml_group
  - OUTGOING VALUE: Username, SAML ATTRIBUTE: saml_username
Select the Policies tab and click on edit icon on the right.
Add the Single_Factor policy to the application, then click on Save.
Select the Access tab, add the opensearch group to the application, then click on Save to complete SAML configuration.
Finally, go to the Configuration tab, click on the Download Metadata button to download the Ping Identity metadata for the Amazon OpenSearch SAML configuration. Enable opensearch SAML application (Figure 9).

Figure 9. Downloading metadata in Ping Identity

Amazon OpenSearch SAML configuration

Switch back to Amazon OpenSearch domain:
- Navigate to the Amazon OpenSearch console.
- Click on Actions, then click on Modify Security configuration.
- Select the Enable SAML authentication check box.

Under Import IdP metadata section:
- Metadata from IdP: Import the Ping Identity identity provider metadata from the downloaded XML file, shown in Figure 10.
- SAML master backend role: opensearch (Ping Identity group). Provide SAML backend role/group SAML assertion key for group SSO into Kibana.

Figure 10. Configuring Amazon OpenSearch SAML parameters

Under Optional SAML settings:
- Leave the Subject Key as saml_subject from Ping Identity SAML application attribute name.
- Role key should be saml_group. You can view a sample assertion during the configuration process by tools like SAML-tracer. This can help you examine and troubleshoot the contents of real assertions.
- Session time to live (mins): 60
Click on the Submit button to complete Amazon OpenSearch SAML configuration for Kibana. We have successfully completed SAML configuration and are now ready for testing.

Validating Access with Ping Identity Users

The OpenSearch Dashboards URL can be found in the Overview tab within “General Information” in the Amazon OpenSearch console (Figure 11). The first access to the OpenSearch Dashboards URL redirects you to the Ping Identity login screen.

Figure 11. Validating Ping Identity users access with Amazon OpenSearch

If your OpenSearch domain is hosted within a private VPC, you will not be able to access OpenSearch Dashboards over public internet. But you can still use SAML as long as your browser can communicate with both your OpenSearch cluster and your identity provider.
You can create a Mac or Windows EC2 instance within the same VPC and access Amazon OpenSearch Dashboards from an EC2 instance’s web browser to validate your SAML configuration. Or you can access your Amazon OpenSearch Dashboards through Site-to-Site VPN if you are trying to access it from your on-premises environment.
Now copy and paste the OpenSearch Dashboards URL in your browser, and enter user credentials.
After successful login, you will be redirected into the OpenSearch Dashboards home page. Explore our sample data and visualizations in OpenSearch Dashboards, as shown in Figure 12.

Figure 12. SAML authenticated Amazon OpenSearch Dashboards

You have successfully federated Amazon OpenSearch Dashboards with Ping Identity as an identity provider. You can connect OpenSearch Dashboards by using your Ping Identity credentials.

Cleaning up

After you test out this solution, remember to delete all the resources you created to avoid incurring future charges. Refer to these links:

Deleting your Amazon OpenSearch domain
Reach out to Ping Identity to delete your account (If needed)

Conclusion

In this blog post, we have demonstrated how to set up Ping Identity as an identity provider over SAML authentication for Amazon OpenSearch Dashboards access. With this solution, you now have an OpenSearch Dashboard that uses Ping Identity as the custom identity provider for your users. This reduces the customer login process to one set of credentials and improves employee productivity.

Get started by checking the Amazon OpenSearch Developer Guide, which provides guidance on how to build applications using Amazon OpenSearch for your operational analytics.

Building SAML federation for Amazon OpenSearch Service with Okta

2022-04-15 Raghavarao Sodabathina

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/building-saml-federation-for-amazon-opensearch-dashboards-with-okta/

Amazon OpenSearch Service is a fully managed open search and analytics service powered by the Apache Lucene search library. Security Assertion Markup Language (SAML)-based federation for OpenSearch Dashboards lets you use your existing identity provider (IdP) like Okta to provide single sign-on (SSO) for OpenSearch Dashboards on OpenSearch Service domains.

This post shows step-by-step guidance to enable SP-initiated single sign-on (SSO) into OpenSearch Dashboards using Okta.

To use this feature, you must enable fine-grained access control. Rather than authenticating through Amazon Cognito or the internal user database, SAML authentication for OpenSearch Dashboards lets you use third-party identity providers to log in to OpenSearch Dashboards. SAML authentication for OpenSearch Dashboards is only for accessing OpenSearch Dashboards through a web browser.

Overview of Okta SAML authenticated solution

Figure 1 depicts a sample architecture of a generic, integrated solution between Okta and OpenSearch Dashboards over SAML authentication.

Figure 1. SAML transactions between Amazon OpenSearch Service and Okta

The initial sign-in flow is as follows:

User opens browser window and navigates to OpenSearch Dashboards
OpenSearch Service generates SAML authentication request
OpenSearch Service redirects request back to browser
Browser redirects to Okta URL
Okta parses SAML request, authenticates user, and generates SAML response
Okta returns encoded SAML response to browser
Browser sends SAML response back to OpenSearch Service Assertion Consumer Services (ACS) URL
ACS verifies SAML response
User logs into OpenSearch Service domain

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
A virtual private cloud (VPC)-based OpenSearch Service domain with fine-grained access control enabled
Okta account with user and a group
A browser with network connectivity to Okta, OpenSearch Service domain, and OpenSearch Dashboards.

The steps in this post are structured into the following sections:

Identity provider (Okta) setup
Prepare OpenSearch Service for SAML configuration
Identity provider (Okta) SAML configuration
Finish OpenSearch Service for SAML configuration
Validation
Cleanup

Identity provider (Okta) setup

Step 1: Sign up for an Okta account

Sign up for an Okta account, then click on the Sign up button to complete your account setup.
If you already have an account with Okta, login to your Okta account.

Step 2: Create Groups in Okta

Choose Directory in the left menu and click Groups to proceed.
Click on Add Group and enter name as opensearch. Then click on the Save button, see Figure 2.

Figure 2. Creating a group in Okta

Step 3: Create users in Okta

Choose People in left menu under Directory section and click the +Add Person button.
Provide First name, Last name, username (email ID), and primary email. Then select set by admin from the Password dropdown, and choose first time password. Click on the Save button to create your user.
Add more users as needed.

Step 4: Assign Groups to users

Choose Groups from the left menu, then click on the opensearch group created in Step 2. Click on the Assign People button to add users to the opensearch group. Next, either click on individual user under Person & Username, or use the Add All button to add all existing users to the opensearch group. Click on the Save button to complete adding users to your group.

Prepare OpenSearch Service for SAML configuration

Once OpenSearch Service domain is up and running, we can proceed with configuration.

Navigate to the OpenSearch Service console
Under Actions, choose Edit security configuration as shown in Figure 3

Figure 3. Enabling Amazon OpenSearch Service security configuration for SAML

Under SAML authentication for OpenSearch Dashboards/Kibana, select the Enable SAML authentication check box, see Figure 4. When we enable SAML, it will create different URLs required for configuring SAML with your identity provider.

Figure 4. Amazon OpenSearch Service URLs for SAML configuration

We will be using the Service Provider entity ID and SP-initiated SSO URL (highlighted in Figure 4) for Okta SAML configuration. The OpenSearch Dashboards login flow can take one of two forms:

Service provider (SP) initiated: You navigate to your OpenSearch Dashboard (for example, https://my-domain.us-east-1.es.amazonaws.com/_dashboards), which redirects you to the login screen. After you log in, the identity provider redirects you to OpenSearch Dashboards.
Identity provider (IdP) initiated: You navigate to your identity provider, log in, and choose OpenSearch Dashboards from an application directory.

We will complete the rest of the OpenSearch Service SAML configuration after the Okta SAML configuration.

Okta SAML configuration

Go back to Okta.com, and choose Applications from the left menu. Click on Applications, then click on Create App Integration and choose SAML 2.0. Click on the Next button to proceed, as shown in Figure 5.
For this example, we are creating an application called “OpenSearch Dashboard”.
Select Platform as Web, and select Sign on method as SAML 2.0. Click on the Create button to proceed.

Figure 5. Creating a SAML app integration in Okta

Enter the App name as OpenSearch, use default options, and click on the Next button to proceed.
Enter the following under the SAML Settings section, as shown in Figure 6. Click on the Next button to proceed.
- Single Sign on URL = https://vpc-XXXXX-XXXXX.us-west-2.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs (SP-initiated SSO URL)
- Audience URI(SP Entity ID) = https://vpc-XXXXX-XXXXX.us-west-2.es.amazonaws.com (Service Provider entity ID)
- Default RelayState = leave it blank
- Name ID format = Select EmailAddress from drop down
- Application username = Select Okta username from dropdown
- Update application username on = leave it set to default
Enter the following under Attribute Statements (optional) section.
- Name = http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress
- Name format = Select URI Reference from dropdown
- Value = user.email
Enter the following under the Group Attribute Statements (optional) section.
- Name = http://schemas.xmlsoap.org/claims/Group
- Name format = Select URI Reference from dropdown
- Filter = Select Matches regex from dropdown and enter value as .*open.* to match the group created in previous steps for OpenSearch Dashboards access.

Figure 6. SAML configuration in Okta

Select I’m a software vendor. I’d like to integrate my app with Okta under the Help Okta Support understand how you configured this application section.
Click on the Finish button to complete the Okta SAML application configuration.
Choose Sign on menu. Right click on the Identity Provider metadata hyperlink to download the Okta identity provider metadata as okta.xml. You will use this for the SAML configuration in OpenSearch Service, see Figure 7.

Figure 7. Downloading Okta identity provider metadata for SAML configuration

Choose the Assignments menu and click on Assign-> Assign to Groups
Select the opensearch group, click on Assign, and click on the Done button to complete the Group assignment, as shown in Figure 8.

Figure 8. Assigning groups to the app in Okta

Switch back to the OpenSearch Service domain
Under the Import IdP metadata section:
- Metadata from IdP: Import the Okta identity provider metadata from the downloaded XML file
- SAML master backend role: opensearch (Okta group). Provide the SAML backend role/group SAML assertion key for group SSO into OpenSearch Dashboard.
Under Optional SAML settings:
- Leave Subject Key blank
- Role key should be http://schemas.xmlsoap.org/claims/Group. You can view a sample assertion during the configuration process with tools like SAML-tracer. This can help you examine and troubleshoot the contents of real assertions.
- Session time to live (mins): 60
Click on the Save changes button (Figure 9) to complete OpenSearch Service SAML configuration for OpenSearch Dashboards. We have successfully completed SAML configuration, and now we are ready for testing.

Figure 9. Configuring Amazon OpenSearch Service SAML parameters

Validating access with Okta users

Access the OpenSearch Dashboards endpoint from the previously created OpenSearch Service cluster. The OpenSearch Dashboards URL can be found in General information within “My Domains” of the OpenSearch Service console, as shown in Figure 10. The first access to OpenSearch Dashboards URL redirects you to the Okta login screen.

Figure 10. Validating Okta user access with Amazon OpenSearch Service

Now copy and paste the OpenSearch Dashboards URL in your browser, and enter the user credentials.
If your OpenSearch Service domain is hosted within a private VPC, you will not be able to access your OpenSearch Dashboard over public internet. But you can still use SAML as long as your browser can communicate with both your OpenSearch Service cluster and your identity provider.
You can create a Mac or Windows EC2 instance within the same VPC so that you can access Amazon OpenSearch Dashboard from EC2 instance’s web browser to validate your SAML configuration. Or you can access your OpenSearch Dashboard through Site-to-Site VPN from your on-premises environment.
After successful login, you will be redirected into the OpenSearch Dashboards home page. Here, you can explore our sample data and visualizations in OpenSearch Dashboards (Figure 11).

Figure 11. SAML authenticated OpenSearch Dashboards

Now, you have successfully federated OpenSearch Dashboards with Okta as an identity provider. You can connect OpenSearch Dashboards by using your Okta credentials.

Cleaning up

After you test out this solution, remember to delete all the resources you created, to avoid incurring future charges. Refer to these links:

Deleting your OpenSearch Service domain
Deleting your Okta account (if needed)

Conclusion

In this blog post, we have demonstrated how to set up Okta as an identity provider over SAML authentication for OpenSearch Dashboards access. Get started by checking the Amazon OpenSearch Service Developer Guide, which provides guidance on how to build applications using OpenSearch Service.

Building SAML federation for Amazon OpenSearch Dashboards with Auth0

2022-04-11 Raghavarao Sodabathina

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/building-saml-federation-for-amazon-opensearch-dashboards-with-auth0/

Amazon OpenSearch is a fully managed, distributed, open search, and analytics service that is powered by the Apache Lucene search library. OpenSearch is derived from Elasticsearch 7.10.2, and is used for real-time application monitoring, log analytics, and website search. It’s ideal for use cases that require fast access and response for large volumes of data. OpenSearch Dashboards is derived from Kibana 7.10.2, and used for visual data exploration. With Security Assertion Markup Language (SAML)-based federation for OpenSearch, Dashboards lets you use your existing identity provider (IdP) like Auth0. You can use Auth0 to provide single sign-on (SSO) for OpenSearch Dashboards on Amazon OpenSearch search domains. It also gives you fine-grained access control, and the ability to search your data and build visualizations. Amazon OpenSearch supports providers that use the SAML 2.0 standard, such as Auth0, Okta, Keycloak, Active Directory Federation Services (AD FS), and Ping Identity (PingID).

In this post, we provide step-by-step guidance to show you how to set up a trial Auth0 account. We’ll demonstrate how to build users and groups within your organization’s directory, and enable SP-initiated single sign-on (SSO) into OpenSearch Dashboards.

To use this feature, you must enable fine-grained access control. Rather than authenticating through Amazon Cognito or an internal user database, SAML authentication for OpenSearch Dashboards lets you use third-party identity providers to log in to the OpenSearch Dashboards. SAML authentication for OpenSearch Dashboards is only for accessing the OpenSearch Dashboards through a web browser. Your SAML credentials do not let you make direct HTTP requests to OpenSearch or OpenSearch Dashboards APIs.

Auth0 is an AWS Competency Partner and popular Identity-as-a-Service (IDaaS) solution. It supports both service provider (SP)-initiated and identity provider (IdP)-initiated SSO. For SP-initiated SSO, when you sign into the OpenSearch Dashboards login page it sends an authorization request to Auth0. Once it authenticates your identity, you are redirected to OpenSearch Dashboards. In IdP-initiated SSO, you log in to the Auth0 SSO page, and choose OpenSearch Dashboards to open the application.

Overview of AuthO SAML authenticated solution

Figure 1 depicts a sample architecture of a generic, integrated solution between Auth0 and OpenSearch Dashboards over SAML authentication.

High level flow of SAML transactions between Amazon OpenSearch and Auth0

Figure 1. A high-level view of a SAML transaction between Amazon OpenSearch and Auth0

The sign-in flow is as follows:

User opens browser window and navigates to Amazon OpenSearch Dashboards
Amazon OpenSearch generates SAML authentication request
Amazon OpenSearch redirects request back to browser
Browser redirects to Auth0 URL
Auth0 parses SAML request, authenticates user, and generates SAML response
Auth0 returns encoded SAML response to browser
Browser sends SAML response back to Amazon OpenSearch Assertion Consumer Service (ACS) URL
ACS verifies SAML response
User logs into Amazon OpenSearch domain

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
A virtual private cloud (VPC) based Amazon OpenSearch domain with fine-grained access control enabled
An Auth0 account with user and a group
A browser with network connectivity to Auth0, Amazon OpenSearch domain, and Amazon OpenSearch Dashboards.

The steps in this post are structured into the following sections:

Identity provider (Auth0) setup
Prepare Amazon OpenSearch for SAML configuration
Identity provider (Auth0) SAML configuration
Finish Amazon OpenSearch for SAML configuration
Validation
Cleanup

Identity provider (Auth0) setup

Step 1: Sign up for an Auth0 account

Sign up for an Auth0 account, then click on the Sign up button to complete your account setup.
If you already have an account with Auth0, log in to your Auth0 account.

Step 2: Create Groups in Auth0

Choose User Management in the left menu and click Users, then click on the +Create User button.
Provide an email, password, and connection to your users. Click on the Create button to create your user.
Add more users to your Auth0 account.

Step 3: Install Auth0 Extension to create a group and assign users to the group

Click on Extensions in the left menu and search for “Auth0 Authorization”. Click on Auth0 Authorization to install the extension, shown in Figure 2.

The diagram depicts the Installing of Auth0 Authorization extension

Figure 2. Installing Auth0 Authorization extension

Use all default options and click on the Install button to install the extension.
Click on the Auth0 Authorization extension and choose the Accept button to provide access to your Auth0 account.
The Auth0 Authorization extension must be configured. Click on Go to Configuration (Figure 3).

The diagram depicts the configuration of Auth0 Authorization extension

Figure 3. Configuring the Auth0 Authorization extension

Rotate your API keys and check Groups, Roles, and Permissions to provide authorization to the extension and then click on PUBLISH RULE to complete the configuration, see Figure 4.

The diagram depicts the providing permissions to Auth0 Authorization extension

Figure 4. Providing the permissions to Auth0 Authorization extension

Step 4: Create a group in Auth0

Choose Groups from the left menu and click on the Create your first Group button. For this example, we will create a group called opensearch for OpenSearch Dashboards access.
Add your users to opensearch by clicking on ADD MEMBERS BUTTON, then click on the CONFIRM button to complete your group assignment (Figure 5).

The diagram depicts the adding users to Auth0 Group

Figure 5. Adding users to Auth0 Group

Step 5: Create an Auth0 Application

Choose Applications from the left menu. Click on the +Create Application button.
For this example, we are creating an application called “opensearch”.
Select Single Page Web Applications, then click on the CREATE button to proceed.
Click on the Addons tab on the application Kibana (Figure 6).

The diagram depicts the creation of Auth0 SAML application

Figure 6. Creating an Auth0 SAML application

Click on the SAML2 WEB APP, then select settings to provide SAML URLs from Amazon OpenSearch. We will configure these details after preparing the Amazon OpenSearch cluster for SAML.

Prepare Amazon OpenSearch for SAML configuration

Once the Amazon OpenSearch domain is up and running, we can proceed with configuration.

Under Actions, choose Edit security configuration (Figure 7).

The diagram depicts the enablement of OpenSearch security configuration for SAML

Figure 7. Enabling Amazon OpenSearch security configuration for SAML

Under SAML authentication for OpenSearch Dashboards/Kibana, select the Enable SAML authentication check box (Figure 8). When we enable SAML, it will create different URLs required for configuring SAML with your identity provider.

The diagram depicts the Amazon OpenSearch URLs for SAML configuration

Figure 8. Amazon OpenSearch URLs for SAML configuration

We will be using the Service Provider entity ID and SP-initiated SSO URL (highlighted in Figure 8) for Auth0 SAML configuration. We will complete the rest of the Amazon OpenSearch SAML configuration after the Auth0 SAML configuration.

Auth0 SAML configuration

Go back to Auth0.com, and navigate to Applications from the left menu. Then select the opensearch application that you created as a part of the Auth0 setup.

Click on the Addons tab on the application opensearch.
Click on the SAML2 WEB APP, then select Settings to provide SAML URLs from Amazon OpenSearch, as shown in Figure 9:
- Application Callback URL = https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs (SP-initiated SSO URL)
- audience”: “https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com” (Service provider entity ID)
- destination”: “ https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_plugin/kibana/_opendistro/_security/saml/acs” (SP-initiated SSO URL)
- Mappings and other configurations shown in Figure 9

{
"audience": "https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com",
"destination": "https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_plugin/kibana/_opendistro/_security/saml/acs",
"mappings":
{
    "email":
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress",
    "name": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name",
    "groups": "http://schemas.xmlsoap.org/claims/Group"
},
"createUpnClaim": false,
"passthroughClaimsWithNoMapping": false,
"mapUnknownClaimsAsIs": false,
"mapIdentities": false,
"nameIdentifierFormat":
"urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress", "nameIdentifierProbes": [
"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress" ]
}

The diagram depicts the configuration of Auth0 SAML parameters

Figure 9. Configuring Auth0 SAML parameters

Click on Enable to save the SAML configurations.
Go to the Usage tab, and click on the Download button to download Identity Provider Metadata, see Figure 10.

The diagram depicts the downloading of Auth0 identity provider meta data for SAML configuration

Figure 10. Downloading Auth0 identity provider metadata for SAML configuration

Amazon OpenSearch SAML configuration

Switch back to Amazon OpenSearch domain:
- Navigate to Amazon OpenSearch console
- Click on Actions, then click on Modify Security configuration
- Select Enable SAML authentication check box
Under Import IdP metadata section (Figure 11):
- Metadata from IdP: Import the Auth0 identity provider metadata from downloaded XML file
- SAML master backend role: opensearch (Auth0 group). Provide a SAML backend role/group SAML assertion key for group SSO into Kibana

The diagram depicts the configuration of Amazon OpenSearch SAML parameters

Figure 11. Configuring Amazon OpenSearch SAML parameters

Under Optional SAML settings (Figure 12):
- Leave Subject Key as blank, as Auth0 provides NameIdentifier
- Role key should be http://schemas.xmlsoap.org/claims/Group. Auth0 lets you view a sample assertion during the configuration process by clicking on the DEBUG button on SAML2 WebApp. Tools like SAML-tracer can help you examine and troubleshoot the contents of real assertions.
- Session time to live (mins): 60

The diagram depicts the configuration of Amazon OpenSearch optional SAML parameters

Figure 12. Configuring Amazon OpenSearch optional SAML parameters

Click on the Save changes button to complete Amazon OpenSearch SAML configuration for Kibana. We have successfully completed SAML configuration and are now ready for testing.

Validating access with Auth0 users

Access OpenSearch Dashboards from the previously created OpenSearch cluster. The OpenSearch Dashboards URL can be found as shown in Figure 13. The first access to the OpenSearch Dashboards URL redirects you to the Auth0 login screen.

The diagram depicts the validation of Auth0 users access with Amazon OpenSearch

Figure 13. Validating Auth0 users access with Amazon OpenSearch

Now copy and paste the OpenSearch Dashboards URL in your browser, and enter the user credentials.
If your OpenSearch domain is hosted within a private VPC, you will not be able to access your OpenSearch Dashboard over the public internet. But you can still use SAML as long as your browser can communicate with both your OpenSearch cluster and your identity provider.
You can create a Mac or Windows EC2 instance within the same VPC. This way you can access Amazon OpenSearch Dashboards from your EC2 instance’s web browser to validate your SAML configuration. You can also access Amazon OpenSearch Dashboards through Site-to-Site VPN from an on-premises environment.
After successful login, you will be redirected into the OpenSearch Dashboards home page. Explore our sample data and visualizations in OpenSearch Dashboards, as shown in Figure 14.

Figure 14. SAML authenticated Amazon OpenSearch Dashboards

You now have successfully federated Amazon OpenSearch Dashboards with Auth0 as an identity provider. You can connect OpenSearch Dashboards by using your Auth0 credentials.

Cleaning up

After you test out this solution, remember to delete all the resources you created to avoid incurring future charges. Refer to these links:

Deleting your Amazon OpenSearch domain
Deleting your Auth0 account (if needed)

Conclusion

In this blog post, we have demonstrated how to set up Auth0 as an identity provider over SAML authentication for Amazon OpenSearch Dashboards access. With this solution, you now have an OpenSearch Dashboard that uses Auth0 as the custom identity provider for your users. This reduces the customer login process to one set of credentials and improves employee productivity.

Get started by checking the Amazon OpenSearch Developer Guide, which provides guidance on how to build applications using Amazon OpenSearch for your operational analytics.

Understanding the JVMMemoryPressure metric changes in Amazon OpenSearch Service

2022-04-04 Liz Snyder

Post Syndicated from Liz Snyder original https://aws.amazon.com/blogs/big-data/understanding-the-jvmmemorypressure-metric-changes-in-amazon-opensearch-service/

Amazon OpenSearch Service is a managed service that makes it easy to secure, deploy, and operate OpenSearch and legacy Elasticsearch clusters at scale.

In the latest service software release of Amazon OpenSearch Service, we’ve changed the behavior of the JVMMemoryPressure metric. This metric now reports the overall heap usage, including young and old pools, for all domains that use the G1GC garbage collector. If you’re using Graviton-based data nodes (C6, R6, and M6 instances), or if you enabled Auto-Tune and it has switched your garbage collection algorithm to G1GC, this change will improve your ability to detect and respond to problems with OpenSearch’s Java heap.

Basics of Java garbage collection

Objects in Java are allocated in a heap memory, occupying half of the instance’s RAM up to approximately 32 GB. As your application runs, it creates and destroys objects in the heap, leaving the heap fragmented and making it harder to allocate new objects. Java’s garbage collection algorithm periodically goes through the heap and reclaims the memory of any unused objects. It also compacts the heap when necessary to provide more contiguous free space.

The heap is allocated into smaller memory pools:

Young generation – The young generation memory pool is where new objects are allocated. The young generation is further divided into an Eden space, where all new objects start, and two survivor spaces (S0 and S1), where objects are moved from Eden after surviving one garbage collection cycle. When the young generation fills up, Java performs a minor garbage collection to clean up unmarked objects. Objects that remain in the young generation age until they eventually move to the old generation.

Old generation – The old generation memory pool stores long-lived objects. When objects reach a certain age after multiple garbage collection iterations in the young generation, they are then moved to the old generation.

Permanent generation – The permanent generation contains metadata required by the JVM to describe the classes and methods used in the application at runtime. It is not populated when the old generation’s objects reach a certain age.

Java processes can employ different garbage collection algorithms, selected by command-line option.

Concurrent Mark Sweep (CMS) – The different pools are segregated in memory. Stop-the-world pauses, and heap compaction are regular occurrences. The young generation pool is small. All non-Graviton data nodes use CMS.
G1 Garbage Collection (G1GC) – All heap memory is a single block, with different areas of memory (regions) allocated to the different pools. The pools are interleaved in physical memory. Stop-the-world pauses and heap compaction are infrequent. The young generation pool is larger. All Graviton data nodes use G1GC. Amazon OpenSearch Service’s Auto-Tune feature can choose G1GC for non-Graviton data nodes.

You can use the CloudWatch console to retrieve statistics about those data points as an ordered set of time-series data, known as metrics. Amazon OpenSearch Service currently publishes three metrics related to JVM memory pressure to CloudWatch:

JVMMemoryPressure – The maximum percentage of the Java heap used for all data nodes in the cluster.
MasterJVMMemoryPressure – The maximum percentage of the Java heap used for all dedicated master nodes in the cluster.
WarmJVMMemoryPressure – The maximum percentage of the Java heap used for UltraWarm nodes in the cluster.

In the latest service software update, Amazon OpenSearch Service improved the logic that it uses to compute these metrics in order to more accurately reflect actual memory utilization.

The problem

Previously, all data nodes used CMS, where the young pool was a small portion of memory. The JVM memory pressure metrics that Amazon OpenSearch Service published to CloudWatch only considered the old pool of the Java heap. You could detect problems in the heap usage by looking only at old generation usage.

When the domain uses G1GC, the young pool is larger, representing a larger percentage of the total heap. Since objects are created first in the young pool, and then moved to the old pool, a significant portion of the usage could be in the young pool. However, the prior metric reported only on the old pool. This leaves domains vulnerable to invisibly running out of memory in the young pool.

What’s changing?

In the latest service software update, Amazon OpenSearch Service changed the logic for the three JVM memory pressure metrics that it sends to CloudWatch to account for the total Java heap in use (old generation and young generation). The goal of this update is to provide a more accurate representation of total memory utilization across your Amazon Opensearch Service domains, especially for Graviton instance types, whose garbage collection logic makes it important to consider all memory pools to calculate actual utilization.

What you can expect

After you update your Amazon OpenSearch Service domains to the latest service software release, the following metrics that Amazon OpenSearch Service sends to CloudWatch will begin to report JVM memory usage for the old and young generation memory pools, rather than just old: JVMemoryPressure, MasterJVMMemoryPressure, and WarmJVMMemoryPressure.

You might see an increase in the values of these metrics, predominantly in G1GC configured domains. In some cases, you might notice a different memory usage pattern altogether, because the young generation memory pool has more frequent garbage collection. Any CloudWatch alarms that you have created around these metrics might be triggered. If this keeps happening, consider scaling your instances vertically up to 64 GiB of RAM, at which point you can scale horizontally by adding instances.

As a standard practice, for domains that have low available memory, Amazon OpenSearch Service blocks further write operations to prevent the domain from reaching red status. You should monitor your memory utilization after the update to get a sense of the actual utilization on your domain. The _nodes/stats/jvm API offers a useful summary of JVM statistics, memory pool usage, and garbage collection information.

Conclusion

Amazon OpenSearch Service recently improved the logic that it uses to calculate JVM memory usage to more accurately reflect actual utilization. The JVMMemoryPressure, MasterJVMMemoryPressure, and WarmJVMMemoryPressure CloudWatch metrics now account for both old and young generation memory pools when calculating memory usage, rather than just old generation. For more information about these metrics, see Monitoring OpenSearch cluster metrics with Amazon CloudWatch.

With the updated metrics, your domains will start to more accurately reflect memory utilization numbers, and might breach CloudWatch alarms that you previously configured. Make sure to monitor your alarms for these metrics and scale your clusters accordingly to maintain optimal memory utilization.

Stay tuned for more exciting updates and new features in Amazon OpenSearch Service.

About the Authors

Liz Snyder is a San Francisco-based technical writer for Amazon OpenSearch Service, OpenSearch OSS, and Amazon CloudSearch.

Jon Handler is a Senior Principal Solutions Architect, specializing in AWS search technologies – Amazon CloudSearch, and Amazon OpenSearch Service. Based in Palo Alto, he helps a broad range of customers get their search and log analytics workloads deployed right and functioning well.