All posts by Siddhant Gupta

Track Amazon OpenSearch Service configuration changes more easily with new visibility improvements

Post Syndicated from Siddhant Gupta original https://aws.amazon.com/blogs/big-data/track-amazon-opensearch-service-configuration-with-improved-visibility/

Amazon OpenSearch Service offers multiple domain configuration settings to meet your workload-specific requirements. As part of standard service operations, you may be required to update these configuration settings on a regular basis. Recently, Amazon OpenSearch Service launched visibility improvements that allow you to track configuration changes more effectively. We’ve introduced granular and more descriptive configuration statuses that enable you to set up alarms and use them in automation to minimize manual monitoring.

We recommend that you take advantage of these visibility improvements in your applications. The changes are backward compatible: if your automations rely on the legacy processing parameter to determine configuration change status, they continue to work without disruption. To simplify tracking of multiple in-flight configuration change requests, Amazon OpenSearch Service now accepts a configuration change request only when the Domain Processing Status is Active. For details, see the section ‘Single configuration change at a time’.

Solution overview

Previously, configuration change status was visible through the processing parameter in the OpenSearch Service APIs and through the Domain Status field in the OpenSearch Service console. We have now introduced the following changes to improve the configuration update experience:

  • Introduced two new parameters, DomainProcessingStatus and ConfigChangeStatus, in the API responses, along with the corresponding Domain Processing Status and Configuration Change Status fields in the console. These provide better visibility through multiple, descriptive statuses. Previously, statuses were limited to two values: Active and Processing.
  • Added the ability to easily compare active and in-flight configurations. Previously, this required multiple steps.
  • Amazon OpenSearch Service now allows a single configuration change request at a time. There is no limit on the number of domain configuration changes you can bundle in a single request, but you can submit the next request only after the previous one is complete and the domain processing status returns to Active. This streamlines configuration updates and addresses the earlier difficulty of tracking multiple in-flight configuration change requests.
  • Added the ability to cancel a change request after a validation failure. Previously, when instances were unavailable, domains remained in the processing state. Now, when a change fails validation, you can cancel the request and retry later.
  • The domain processing status changes to Active only after all background activities, including shard movement, are complete. This means you can confidently use the new statuses in your automation scripts without having to infer whether internal processes, such as data movement, have finished.

How do you get granular details to track the configuration update status?

As part of these improvements, Amazon OpenSearch Service introduced the DomainProcessingStatus and ConfigChangeStatus parameters in the APIs, along with the corresponding Domain Processing Status and Configuration Change Status fields in the console. These statuses give you accurate and consistent information across configuration change scenarios: whether or not the change involves a blue/green deployment, and whether it is triggered by the operator or by OpenSearch Service. Let’s explore these visibility improvements.

  1. Domain processing status visibility: You can track the status of domain-level configuration changes through the Domain Processing Status field in the console and the DomainProcessingStatus parameter in API responses. The values and brief descriptions are as follows:
    1. Active: No configuration change is in progress. You can submit a new configuration change request.
    2. Creating: New domain creation is in progress.
    3. Modifying: One or more configuration changes, such as adding data nodes, provisioning Amazon Elastic Block Store (Amazon EBS) gp3 storage, or setting up KMS keys, are in progress. In other words, changes made through the UpdateDomainConfig API set the status to Modifying. The Modifying status also covers situations where the domain requires shard movement to complete a configuration change. Note: For backward compatibility, the behavior of the processing parameter in API responses is unchanged: it is set to false as soon as the core configuration changes are complete, without waiting for shard movement to finish.
    4. Upgrading Engine Version: Engine version upgrades are in progress, such as from Elasticsearch version 7.9 to OpenSearch version 1.0.
    5. Updating Service Software: This status refers to configuration changes related to service software updates.
    6. Deleting: Domain deletion is in progress.
    7. Isolated: This represents domains that are suspended due to a variety of reasons, such as account-related billing issues or domains that are not compliant with critical security patch updates.
  2. Configuration change status visibility: Configuration changes can be initiated by the user (for example, adding data nodes or changing the instance type) or by the service (for example, Auto-Tune and mandatory service software updates). You can find the latest status in the Configuration Change Status field in the console and in the ConfigChangeStatus parameter in API responses. The values and brief descriptions are as follows:
    1. Pending: Indicates that a configuration change request has been submitted.
    2. Initializing: Service is initializing a configuration change request.
    3. Validating: Service is validating the requested changes and resources required.
    4. Validation Failed: The requested changes failed validation. At this point, no configuration changes have been applied. Possible causes include red indices in the domain, unavailability of the chosen instance type, and low disk space. For a list of potential validation failures, see the OpenSearch Service documentation. After a validation failure, you can cancel, retry, or edit the configuration changes.
    5. Awaiting user inputs: The user may be able to fix the validation error, for example an invalid KMS key. In this status, the user can edit the configuration changes.
    6. Applying changes: Service is applying requested configuration changes.
    7. Cancelled: When a change is in the Validation Failed status, you can choose the Cancel button in the console or call the CancelDomainConfigChange API. All the changes that were part of the change request are rolled back.
    8. Completed: Requested configuration changes have been successfully completed.
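An automation script can combine the two statuses to decide what to do next. The following is a minimal sketch, assuming the DescribeDomain response carries them under DomainStatus.DomainProcessingStatus and DomainStatus.ChangeProgressDetails.ConfigChangeStatus, and that the API spells the values without spaces (for example, ValidationFailed and PendingUserInput); check the API reference for the exact strings. The function `next_action` and its return labels are hypothetical.

```python
from typing import Any, Mapping

# Statuses that need an operator decision (cancel, retry, or edit).
# The spellings are assumptions; verify them against the API reference.
NEEDS_ATTENTION = {"ValidationFailed", "PendingUserInput"}


def next_action(describe_domain_response: Mapping[str, Any]) -> str:
    """Map a DescribeDomain-shaped response to a hypothetical automation action."""
    domain = describe_domain_response["DomainStatus"]
    processing = domain.get("DomainProcessingStatus")
    change = domain.get("ChangeProgressDetails", {}).get("ConfigChangeStatus")

    if processing == "Active":
        # Nothing in flight, shard movement included: safe to submit a change.
        return "submit"
    if processing == "Isolated":
        # Suspended domain (billing, security patch compliance, ...).
        return "investigate"
    if change in NEEDS_ATTENTION:
        # No changes applied yet; someone must cancel, retry, or edit.
        return "intervene"
    # Creating, Modifying, UpgradingEngineVersion, UpdatingServiceSoftware, ...
    return "wait"
```

Because DomainProcessingStatus returns to Active only after shard movement completes, checking it first avoids the early signal that the legacy processing flag gives.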

Console enhancements

The Amazon OpenSearch Service console offers enhanced visibility to track configuration change progress. Below are a few screenshots to give you an idea of these improvements.

  • The Amazon OpenSearch Service console provides Domain Processing Status, Configuration Change Status, and Change ID fields. Note: To see the details of the change associated with a Change ID, use the DescribeDomainChangeProgress API.

  • Configuration change summary. To see a side-by-side comparison of your active configuration and requested changes, open the domain detail page, choose the cluster configuration tab, and scroll down to the configuration change summary section. The Pending Changes field shows the properties that are still pending at that time; it does not include changes that have already been applied. You can also get similar details from the DescribeDomain and DescribeDomainConfig APIs through the ModifyingProperties parameter.
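The same side-by-side view can be rebuilt from the API. This sketch assumes each ModifyingProperties entry is a mapping with Name, ActiveValue, and PendingValue keys; the helper `change_summary` and the example property name are hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def change_summary(modifying_properties: Iterable[Mapping[str, Any]]) -> List[str]:
    """Render 'name: active -> pending', one line per property still pending."""
    lines = []
    for prop in modifying_properties:
        name = prop.get("Name", "?")
        active = prop.get("ActiveValue", "")
        pending = prop.get("PendingValue", "")
        lines.append(f"{name}: {active} -> {pending}")
    return lines


# Hypothetical entry, as it might appear in a DescribeDomain response
props = [{"Name": "ClusterConfig.InstanceType",
          "ActiveValue": "r6g.large.search",
          "PendingValue": "r6g.xlarge.search"}]
for line in change_summary(props):
    print(line)
```

Like the console's Pending Changes field, this lists only what is still pending, so an empty result means every requested property has been applied.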

Cancelling during validation failure. In the following screenshots, you can see a new option to cancel a change request when it fails validation. For example, when you encounter a SubnetNotFound error, you can use the Cancel request button to roll back to the previous active configuration, fix the issue, and then retry the configuration update.

Single configuration change at a time

Previously, when several change requests were made, it was not straightforward to track the success or failure of each one. To simplify this, OpenSearch Service now limits you to a single change request at a time, although you can bundle multiple changes into that one request. Once a configuration change request is submitted, it must complete before you can request the next configuration change through the console or through the UpdateDomainConfig API. This makes it easier to keep track of the changes you have requested and their most recent status. If your automation calls configuration update APIs multiple times, update it either to group multiple configuration changes into a single update call or to wait for each update to complete before submitting the next one. You can update the domain configuration only when the domain processing status is Active. For a list of changes that might require a blue/green deployment, see the Amazon OpenSearch Service documentation.
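Automation that previously fired several update calls back to back can instead wait for Active between bundled requests. The following is a minimal polling sketch, assuming `describe` is any callable that returns a DescribeDomain-shaped response; the function name, polling interval, and timeout are hypothetical defaults.

```python
import time
from typing import Any, Callable, Mapping


def wait_until_active(
    describe: Callable[[str], Mapping[str, Any]],
    domain_name: str,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until DomainProcessingStatus is Active, or raise TimeoutError."""
    waited = 0.0
    while True:
        status = describe(domain_name)["DomainStatus"].get("DomainProcessingStatus")
        if status == "Active":
            return
        if waited >= timeout_seconds:
            raise TimeoutError(f"{domain_name} still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With boto3, `describe` could be `lambda n: client.describe_domain(DomainName=n)` where `client = boto3.client("opensearch")`, followed by a single `update_domain_config` call that carries all the changes. Passing `sleep` in as a parameter keeps the function testable without real waiting.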

The following screenshot shows an example alert on the ‘Edit domain’ page informing the user that another change or update is in progress. OpenSearch Service does not allow you to submit a new configuration update request, and the ‘Apply change’ button stays disabled until the change in progress completes.

API changes

You can use the DescribeDomain, DescribeDomainChangeProgress, and DescribeDomainConfig APIs to get detailed configuration update statuses. In addition, you can use the CancelDomainConfigChange API to cancel a change request after a validation failure. For details, see the Amazon OpenSearch Service API reference.
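Putting these APIs together, a script can cancel a change only when it has failed validation. This is a sketch, assuming `client` exposes `describe_domain` and `cancel_domain_config_change` as boto3's `opensearch` client does, and that CancelDomainConfigChange accepts a DryRun flag that reports what would be cancelled without cancelling it; confirm both against the API reference before use. The function name is hypothetical.

```python
from typing import Any


def cancel_if_validation_failed(client: Any, domain_name: str, dry_run: bool = True) -> bool:
    """Send a cancel request only when the in-flight change failed validation.

    Returns True if a cancel request was sent. Defaults to a dry run.
    """
    status = client.describe_domain(DomainName=domain_name)["DomainStatus"]
    change = status.get("ChangeProgressDetails", {})
    if change.get("ConfigChangeStatus") != "ValidationFailed":
        # Leave healthy or still-running changes alone.
        return False
    client.cancel_domain_config_change(DomainName=domain_name, DryRun=dry_run)
    return True
```

Defaulting to a dry run lets you inspect what would be rolled back before calling again with `dry_run=False`.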

Conclusion

In this post, we showed you how to get granular information about a configuration update request. These new changes give you better visibility into the progress of configuration change requests and make it easy to distinguish between applied and pending changes. Make sure that DomainProcessingStatus is Active before you submit a configuration change request. The ability to cancel changes after a validation failure gives you more control, letting you get your domain out of the processing state in a self-service manner. Visit the product documentation to learn more.


About the Authors

Siddhant Gupta is a Sr. Technical Product Manager at Amazon Web Services based in Hyderabad, India. Siddhant has been with Amazon for over six years and is currently working with the OpenSearch Service team, helping with new region launches, pricing strategy, and bringing EC2 and EBS innovations to OpenSearch Service customers. He is passionate about analytics and machine learning. In his free time, he loves traveling, fitness activities, spending time with his family and reading non-fiction books.

Deniz Ercelebi is a Sr. UX Designer at Amazon OpenSearch Service. In her role she contributes to the creation, implementation, and successful delivery of design solutions for complex problems. Her personal drive is fueled by a passion for user experience, a dedication to customer-centric solutions, and a firm belief in collaborative innovation.

Shashank Gupta is a Sr. Software Developer at Amazon OpenSearch Service, specializing in the enhancement of the Managed service aspect of the platform. His primary focus is on optimizing the managed experience, spanning from the console to APIs and resource provisioning in an efficient manner. With a dedicated commitment to innovation, Shashank aims to elevate the overall customer experience by introducing inventive solutions within the service.

Lower your Amazon OpenSearch Service storage cost with gp3 Amazon EBS volumes

Post Syndicated from Siddhant Gupta original https://aws.amazon.com/blogs/big-data/lower-your-amazon-opensearch-service-storage-cost-with-gp3-amazon-ebs-volumes/

Amazon OpenSearch Service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open-source, distributed search and analytics suite comprising OpenSearch, a distributed search and analytics engine, and OpenSearch Dashboards, a UI and visualization tool. When you use Amazon OpenSearch Service, you configure a set of data nodes to store indexes and serve queries. The service supports instance types for data nodes with different storage options. Some supported Amazon Elastic Compute Cloud (Amazon EC2) instance types, like the R6GD or I3, have local NVMe disks. Others use Amazon Elastic Block Store (Amazon EBS) storage.

In July 2022, OpenSearch Service launched support for next-generation general purpose SSD (gp3) EBS volumes. OpenSearch Service data nodes require low-latency, high-throughput storage to provide fast indexing and queries. With gp3 EBS volumes, you get higher baseline performance (IOPS and throughput) at a 9.6% lower cost than with the previously offered gp2 EBS volume type. With gp3, you can provision additional IOPS and throughput independently of volume size. gp3 volumes also deliver more consistent performance because they don’t use burst credits. OpenSearch Service support for gp3 volumes also doubles the maximum volume size per data node. With these larger volumes, you can store more data per node and so reduce the cost of storing passive data.

We recommend that you consider gp3 as the best Amazon EBS option for price/performance and flexibility. In this post, I discuss the basics of gp3 and various cost-saving use cases. Migrating from previous generation storage (gp2, PIOPS, and magnetic) volumes to the latest generation gp3 volumes allows you to reduce monthly storage costs and optimize instance utilization.

Comparing gp2 and gp3

gp3 is the successor to the general purpose SSD gp2 volume. The key benefits of gp3 include higher baseline performance, 9.6% lower cost, and the ability to provision higher performance regardless of volume size. The following table summarizes the key differences between gp2 and gp3.

| Volume type | gp3 | gp2 |
|---|---|---|
| Volume size | Depends on instance type. OpenSearch Service supports a maximum of 24 TiB for R6g.12xlarge. For the latest instance limits, see Amazon OpenSearch Service quotas. | Depends on instance type. OpenSearch Service supports a maximum of 12 TiB for R6g.12xlarge. |
| Baseline IOPS | 3,000 IOPS for volumes up to 1,024 GiB. For volumes above 1,024 GiB, 3 IOPS/GiB, without burst credit complexity. | 3 IOPS/GiB (minimum 100 IOPS) up to a maximum of 16,000 IOPS. Volumes smaller than 1 TiB can also burst up to 3,000 IOPS. |
| Max IOPS/volume | 16,000 | 16,000 |
| Baseline throughput | 125 MiB/s free for volumes up to 170 GiB, or 250 MiB/s free for volumes above 170 GiB. | Between 125 MiB/s and 250 MiB/s, depending on volume size. |
| Max throughput/volume | 1,000 MiB/s | 250 MiB/s |
| Price (us-east-1 Region) | Storage: $0.122/GB-month. IOPS: 3,000 IOPS free for volumes up to 1,024 GiB, or 3 IOPS/GiB free for volumes above 1,024 GiB; $0.008/provisioned IOPS-month over the free limits. Throughput: 125 MiB/s free for volumes up to 170 GiB, or +250 MiB/s free for every 3 TiB for volumes above 170 GiB; $0.064/provisioned MiB/s-month over the free limits. | Storage: $0.135/GB-month. IOPS and throughput provisioning not allowed. |
| Supported instances | T3, C5, M5, R5, C6g, M6g, and R6g | T2, C4, M4, R4, T3, C5, M5, R5, C6g, M6g, and R6g |

Lower your monthly bills with gp3

The ability to provision IOPS and throughput independent of volume size and support for denser (twice as large) volume sizes are two significant advantages of gp3 adoption. Together, these benefits enable multiple use cases to lower your monthly bills. In this section, we present a few examples of pricing comparisons for OpenSearch domains.

gp2 vs. gp3

This is the most common scenario, in which existing gp2 customers switch to gp3 and immediately begin saving 9.6% due to the lower monthly price per GB for gp3 storage. You can also benefit from the fact that gp3 supports volume sizes two times larger for the R5, R6g, M5, and M6g instance families. This means that you don’t need to spin up new instances for denser storage requirements and can achieve higher storage on the same instance. OpenSearch Service currently supports a maximum of 24 TiB of gp3 storage on R6g.12Xlarge instances.

PIOPS (io1) vs. gp3

OpenSearch Service supports the PIOPS SSD (io1) EBS volume type. You can switch to gp3 and provision additional IOPS and throughput to meet your specific performance requirements. The following table compares the monthly cost of PIOPS (io1) storage on r5.large.search instances with gp3 storage on r6g.large.search instances, for a requirement of 6 TiB of storage and 16,000 IOPS. In this example, you would save 65% by adopting gp3.

| Cost component | PIOPS (io1) | gp3 |
|---|---|---|
| Instance cost | 6 instances * $0.186/hr = $830/month (r5.large.search supports up to 1 TiB of io1 storage, so 6 TiB requires six instances.) | 3 instances * $0.167/hr = $372/month (r6g.large.search supports up to 2 TiB of gp3 storage, so 6 TiB requires three instances.) |
| Storage cost (6 TiB = 6,597 GB) | 6,597 GB * $0.169/GB-month = $1,115/month | 6,597 GB * $0.122/GB-month = $805/month |
| PIOPS cost (16,000 IOPS) | 16,000 IOPS * $0.088/IOPS-month = $1,408/month | $0 (18,000 IOPS, at 3 IOPS/GiB, is included in the gp3 storage price for a 6 TiB volume) |
| Total monthly bill | $3,353/month | $1,177/month |
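The totals in this table (and in the I3 and UltraWarm tables later in this post) are simple products and sums. The following sketch reproduces them, assuming a 744-hour (31-day) month, which is what the published instance figures imply, and 1 TiB = 1,099.5 decimal GB. The helper name and constants are illustrative, not part of any AWS tool.

```python
HOURS_PER_MONTH = 744        # assumption: the tables' figures match a 31-day month
GB_PER_TIB = 1024**4 / 1e9   # 1 TiB is roughly 1,099.5 decimal GB


def monthly_cost(
    instances: int,
    hourly_rate: float,
    storage_gb: float = 0.0,
    storage_rate: float = 0.0,
    provisioned_iops: int = 0,
    iops_rate: float = 0.0,
) -> float:
    """Monthly USD cost: instance hours + storage + provisioned IOPS."""
    compute = instances * hourly_rate * HOURS_PER_MONTH
    storage = storage_gb * storage_rate
    piops = provisioned_iops * iops_rate
    return compute + storage + piops


# io1 on r5.large.search vs. gp3 on r6g.large.search, 6 TiB and 16,000 IOPS
io1 = monthly_cost(6, 0.186, 6 * GB_PER_TIB, 0.169, 16_000, 0.088)
gp3 = monthly_cost(3, 0.167, 6 * GB_PER_TIB, 0.122)
print(f"io1: ${io1:,.0f}/month, gp3: ${gp3:,.0f}/month, savings: {1 - gp3 / io1:.0%}")
```

The results land within about a dollar of the table; the small gaps come from rounding in the published figures. The same helper applies to the I3 comparison (for example, `monthly_cost(4, 1.99)` for I3 nodes whose storage is included in the instance price) and to the UltraWarm comparison.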

I3 vs. gp3

I3 instances include Non-Volatile Memory Express (NVMe) SSD-based instance storage optimized for low latency, very high random I/O performance, and high sequential read throughput, and they deliver high IOPS. However, I3 uses older third-generation CPUs, and the largest supported storage size is 15 TiB, on the i3.16xlarge.search instance. You should consider using the latest-generation instances, such as R6g with gp3 storage, to get lower cost and better performance than I3 instances.

To comprehend the cost advantage, let’s compare I3 and gp3 for 12 TiB of data storage needs. By switching to gp3 along with the current generation of instances, you can reduce your monthly bills by 56%, according to the calculations in the following table.

| Cost component | I3.4xlarge | gp3 with R6g.xlarge |
|---|---|---|
| On-demand instance cost (us-east-1 Region) | 4 instances * $1.99/hr = $5,922/month (I3.4xlarge.search supports up to 3.8 TiB, so 12 TiB requires four instances.) | 4 instances * $0.335/hr = $996/month (R6g.xlarge.search supports up to 3 TiB with gp3, so 12 TiB requires four instances.) |
| Storage cost (12 TiB = 13,194 GB) | N/A (included in instance price) | 13,194 GB * $0.122/GB-month = $1,610/month |
| Total monthly bill | $5,922/month | $2,606/month |

UltraWarm vs. gp3

UltraWarm is designed to provide inexpensive access to infrequently accessed data, such as logs older than 30 days. Warm storage is useful for indexes that aren’t actively being written to, are queried less frequently, and don’t require high performance. If you run large, query-intensive workloads on UltraWarm to optimize costs but your query volume exceeds what UltraWarm can handle, consider moving some of the data to hot nodes with gp3 storage. UltraWarm remains the least expensive option for warm (less frequently accessed) data, but you shouldn’t use it for hot data. For hot data, a combination of low-cost gp3 storage and denser instances can give you higher performance at an optimized cost.

The following table shows the monthly costs associated with running a 30 TiB UltraWarm workload, along with a comparison to the potential monthly costs of gp2 and gp3. With gp3, you can save up to 36% compared to gp2. Please note that UltraWarm setup does require hot data nodes; however, we excluded them in the UltraWarm column to focus on UltraWarm replacement costs with hot data nodes using gp2 and gp3.

| Cost component | UltraWarm | All hot (gp2 with R6g.8xlarge) | All hot (gp3 with R6g.8xlarge) |
|---|---|---|---|
| On-demand instance cost | 2 ultrawarm1.large.search instances * $2.68/hr = $3,987/month (ultrawarm1.large.search supports a maximum of 20 TiB, so two instances are needed.) | 4 instances * $2.677/hr = $7,966/month (r6g.8xlarge.search supports a maximum of 8 TiB with gp2, so four instances are needed.) | 2 instances * $2.677/hr = $3,984/month (r6g.8xlarge.search supports a maximum of 16 TiB with gp3, so only two instances are needed.) |
| Storage cost (30 TiB = 32,985 GB) | 32,985 GB * $0.024/GB-month = $792/month | 32,985 GB * $0.135/GB-month = $4,453/month | 32,985 GB * $0.122/GB-month = $4,024/month |
| Total monthly bill | $4,779/month | $12,419/month | $8,008/month |

All the preceding use cases are from a cost perspective. Before making any changes to the production environment, we recommend validating performance in a test environment for your unique workload and ensuring that configuration changes don’t result in performance degradation.

Optimize instance cost with gp3’s denser storage

OpenSearch Service doubled (a 100% increase) the maximum volume size supported per instance for gp3 compared to gp2 for the R5, R6g, M5, and M6g instance families, thanks to gp3’s improved baseline performance. You can reduce the number of instances you need by taking advantage of this increased storage per instance. For example, R6g.large supports up to 2 TiB with gp3 but only 1 TiB with gp2. If you require 6 TiB of data storage, you can reconfigure your domain from six R6g.large data nodes to three to reduce your instance costs. For instance-specific EBS volume limits, refer to EBS volume size quotas.
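The node-count arithmetic behind this example is a ceiling division of the data size by the per-node volume limit. This minimal sketch deliberately ignores replicas and free-space headroom, which a real sizing exercise must add on top; the function name is illustrative.

```python
import math


def data_nodes_needed(total_tib: float, max_tib_per_node: float) -> int:
    """Smallest node count whose combined EBS limit covers total_tib.

    Ignores replicas and free-space headroom, which a real sizing must add.
    """
    return math.ceil(total_tib / max_tib_per_node)


# R6g.large limits quoted above: 1 TiB with gp2, 2 TiB with gp3
for size in (6, 12):
    gp2_nodes = data_nodes_needed(size, 1)
    gp3_nodes = data_nodes_needed(size, 2)
    print(f"{size} TiB: {gp2_nodes} nodes on gp2, {gp3_nodes} nodes on gp3")
```

Doubling the per-node limit halves the node count whenever the data size divides evenly, which is where the instance savings come from.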

Upgrade from gp2 to gp3

To use the EBS gp3 volume type, your domain must run on instance types that support gp3; if it doesn’t, first upgrade to a supported instance type. For a list of OpenSearch Service supported instances, see EBS volume size quotas. The transition from gp2 to gp3 is seamless. You can upgrade domain configurations from existing EBS volume types such as gp2, Magnetic, and PIOPS (io1) to gp3 through the OpenSearch Service console or the UpdateDomainConfig API. The configuration change initiates a blue/green deployment, which runs in the background without interrupting your online traffic or causing data loss and, depending on the data size, completes in a few hours.
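The following sketch builds the keyword arguments for such an UpdateDomainConfig call, assuming the EBSOptions structure accepts EBSEnabled, VolumeType, VolumeSize (per data node, in GiB), and optional Iops and Throughput fields, as boto3's `update_domain_config` documents; the helper name is hypothetical. Remember that submitting this change triggers a blue/green deployment.

```python
from typing import Any, Dict, Optional


def gp3_migration_request(
    domain_name: str,
    volume_size_gib: int,
    iops: Optional[int] = None,
    throughput_mibps: Optional[int] = None,
) -> Dict[str, Any]:
    """Build UpdateDomainConfig keyword arguments that switch EBS storage to gp3."""
    ebs: Dict[str, Any] = {
        "EBSEnabled": True,
        "VolumeType": "gp3",
        "VolumeSize": volume_size_gib,  # per data node
    }
    # Send performance settings only when you intend to provision beyond baseline.
    if iops is not None:
        ebs["Iops"] = iops
    if throughput_mibps is not None:
        ebs["Throughput"] = throughput_mibps
    return {"DomainName": domain_name, "EBSOptions": ebs}
```

For example, `boto3.client("opensearch").update_domain_config(**gp3_migration_request("my-domain", 1024))` would request a 1,024 GiB gp3 volume per node, where `my-domain` is a placeholder. Keeping the same size as the existing gp2 volume is the simplest first step.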

gp3 baseline performance and additional provisioning limits

One of gp3’s key features is the ability to scale IOPS and throughput independently of volume size. When your application requires more performance, you can scale up to 16,000 IOPS and 1,000 MiB/s throughput for an additional fee. OpenSearch Service EBS gp3 delivers a baseline performance of 3,000 IOPS and 125 MiB/s throughput at any volume size. In addition, OpenSearch Service provisions additional IOPS and throughput for larger volumes to ensure optimal performance. For volumes above 1,024 GiB, you receive 3 IOPS/GiB, and for volumes above 170 GiB, you get 250 MiB/s for every 3 TiB of storage or part thereof.

The following table outlines OpenSearch Service baseline IOPS and throughput, as well as the maximum amount you can provision. Note that your instance type may have additional limitations regarding how much and for how long it can support these performance baselines in a 24-hour period. For more information about instances and their limits, refer to Amazon EBS-optimized instances.

| Volume storage (GiB) | Baseline IOPS (included in storage price) | Baseline throughput, MiB/s (included in storage price) | Additional IOPS you can provision | Additional throughput you can provision (MiB/s) |
|---|---|---|---|---|
| 170 | 3,000 | 125 | 13,000 | 875 |
| 172 | 3,000 | 250 | 13,000 | 750 |
| 1,024 | 3,000 | 250 | 13,000 | 750 |
| 1,025 | 3,075 | 250 | 12,925 | 750 |
| 3,000 | 9,000 | 250 | 7,000 | 750 |
| 3,001 | 9,003 | 500 | 6,997 | 500 |
| 6,000 | 18,000 | 500 | N/A | 500 |
| 6,001 | 18,003 | 750 | N/A | 250 |
| 9,001 | 27,003 | 1,000 | N/A | N/A |
| 24,000 | 72,000 | 2,000 | N/A | N/A |
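Every row of this table follows from the rules in the previous paragraph. The following sketch reproduces the table as a reading aid, assuming the "3 TiB" throughput step is applied at 3,000 GiB, as the 3,000 and 3,001 GiB rows suggest; it is not the service's actual provisioning logic.

```python
import math
from typing import Optional, Tuple

MAX_PROVISIONED_IOPS = 16_000       # provisioning ceiling stated in this post
MAX_PROVISIONED_THROUGHPUT = 1_000  # MiB/s


def baseline_iops(size_gib: int) -> int:
    """3,000 IOPS up to 1,024 GiB, then 3 IOPS per GiB."""
    return 3_000 if size_gib <= 1_024 else 3 * size_gib


def baseline_throughput(size_gib: int) -> int:
    """125 MiB/s up to 170 GiB, then 250 MiB/s per started 3,000 GiB.

    The 3,000 GiB step is inferred from the table rows (3,000 -> 250, 3,001 -> 500).
    """
    if size_gib <= 170:
        return 125
    return 250 * math.ceil(size_gib / 3_000)


def additional_headroom(size_gib: int) -> Tuple[Optional[int], Optional[int]]:
    """Extra (IOPS, MiB/s) you could still provision; None where the table shows N/A."""
    extra_iops = MAX_PROVISIONED_IOPS - baseline_iops(size_gib)
    extra_mibps = MAX_PROVISIONED_THROUGHPUT - baseline_throughput(size_gib)
    return (
        extra_iops if extra_iops > 0 else None,
        extra_mibps if extra_mibps > 0 else None,
    )
```

Once the baseline reaches the 16,000 IOPS or 1,000 MiB/s ceiling, there is nothing left to buy, which is why the larger volumes show N/A.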

Do you need additional performance?

In most use cases, you don’t need to provision additional IOPS and throughput; gp3 baseline performance should suffice. Use Amazon CloudWatch metrics to find your usage patterns, and if you observe that the current IOPS or throughput limits are bottlenecking your indexing and query performance, provision additional performance. For more information, refer to EBS volume metrics.
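A simple way to act on those metrics is to compare observed peaks against the baseline for your volume size. This sketch assumes you have already pulled peak values per data node, for example ReadIOPS plus WriteIOPS, and ReadThroughput plus WriteThroughput (reported in bytes per second, so convert to MiB/s). The 80% headroom threshold and the function names are hypothetical choices.

```python
from typing import List


def bytes_to_mibps(bytes_per_second: float) -> float:
    """Convert a bytes/second metric value to MiB/s."""
    return bytes_per_second / (1024 * 1024)


def bottlenecks(
    peak_iops: float,
    peak_throughput_mibps: float,
    baseline_iops: int,
    baseline_throughput_mibps: int,
    headroom: float = 0.8,
) -> List[str]:
    """Name the dimensions whose observed peak exceeds `headroom` of baseline."""
    hits = []
    if peak_iops > headroom * baseline_iops:
        hits.append("iops")
    if peak_throughput_mibps > headroom * baseline_throughput_mibps:
        hits.append("throughput")
    return hits
```

An empty list suggests the baseline is enough; a named dimension is a candidate for provisioning more, ideally confirmed in a test environment first.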

Conclusion

This post explains how OpenSearch Service general purpose SSD gp3 volumes can significantly reduce monthly storage and instance costs, making them more cost-effective than gp2 volumes. Migrating to gp3 volumes with the same size and performance configuration as your gp2 volumes is the quickest and simplest way to reduce costs. You should also consider reducing instance costs by taking advantage of gp3’s support for denser storage per data node.

For more details, check out Amazon OpenSearch Service pricing and Configuration API reference for Amazon OpenSearch Service.


About the author

Siddhant Gupta is a Sr. Technical Product Manager at Amazon Web Services based in Hyderabad, India. Siddhant has been with Amazon for over five years and is currently working with the OpenSearch Service team, helping with new region launches, pricing strategy, and bringing EC2 and EBS innovations to OpenSearch Service customers. He is passionate about analytics and machine learning. In his free time, he loves traveling, fitness activities, spending time with his family, and reading non-fiction books.