All posts by Raghavarao Sodabathina

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/big-data/architectural-patterns-for-real-time-analytics-using-amazon-kinesis-data-streams-part-1/

We’re living in the age of real-time data and insights, driven by low-latency data streaming applications. Today, everyone expects a personalized experience in any application, and organizations are constantly innovating to increase their speed of business operation and decision making. The volume of time-sensitive data produced is increasing rapidly, with different formats of data being introduced across new businesses and customer use cases. Therefore, it is critical for organizations to embrace a low-latency, scalable, and reliable data streaming infrastructure to deliver real-time business applications and better customer experiences.

This is the first post in a blog series that presents common architectural patterns for building real-time data streaming infrastructures with Kinesis Data Streams across a wide range of use cases. It aims to provide a framework for creating low-latency streaming applications on the AWS Cloud using Amazon Kinesis Data Streams and AWS purpose-built data analytics services.

In this post, we will review the common architectural patterns of two use cases: Time Series Data Analysis and Event-Driven Microservices. In the subsequent post in our series, we will explore the architectural patterns for building streaming pipelines for real-time BI dashboards, contact center agents, ledger data, personalized real-time recommendations, log analytics, IoT data, change data capture, and real-time marketing data. All of these architectural patterns are integrated with Amazon Kinesis Data Streams.

Real-time streaming with Kinesis Data Streams

Amazon Kinesis Data Streams is a cloud-native, serverless streaming data service that makes it easy to capture, process, and store real-time data at any scale. With Kinesis Data Streams, you can collect and process hundreds of gigabytes of data per second from hundreds of thousands of sources, allowing you to easily write applications that process information in real time. The collected data is available in milliseconds to support real-time analytics use cases such as real-time dashboards, real-time anomaly detection, and dynamic pricing. By default, data within a Kinesis data stream is stored for 24 hours, with an option to increase the retention to 365 days. If customers want to process the same data in real time with multiple applications, they can use the enhanced fan-out (EFO) feature. Prior to this feature, every application consuming data from the stream shared the 2 MB/second/shard output. By configuring stream consumers to use enhanced fan-out, each consumer receives a dedicated 2 MB/second pipe of read throughput per shard, further reducing the latency of data retrieval.
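
As a concrete illustration, the following minimal Python (boto3) sketch writes a record to a stream and registers an enhanced fan-out consumer. The stream name, consumer name, and payload are hypothetical placeholders; treat this as a sketch rather than a production producer, which would batch records and handle retries.

    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    # Ingest a single record; the partition key determines the target shard.
    kinesis.put_record(
        StreamName="clickstream",                      # hypothetical stream name
        Data=json.dumps({"user": "u-123", "action": "page_view"}).encode("utf-8"),
        PartitionKey="u-123",
    )

    # Register an enhanced fan-out consumer so this application gets its own
    # dedicated 2 MB/second of read throughput per shard.
    stream_arn = kinesis.describe_stream_summary(StreamName="clickstream")[
        "StreamDescriptionSummary"]["StreamARN"]
    kinesis.register_stream_consumer(
        StreamARN=stream_arn,
        ConsumerName="dashboard-consumer",             # hypothetical consumer name
    )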

For high availability and durability, Kinesis Data Streams synchronously replicates the streamed data across three Availability Zones in an AWS Region and gives you the option to retain data for up to 365 days. For security, Kinesis Data Streams provides server-side encryption so you can meet strict data management requirements by encrypting your data at rest, and Amazon Virtual Private Cloud (VPC) interface endpoints to keep traffic between your Amazon VPC and Kinesis Data Streams private.

Kinesis Data Streams has native integrations with other AWS services such as AWS Glue and Amazon EventBridge to build real-time streaming applications on AWS. Refer to Amazon Kinesis Data Streams integrations for additional details.

Modern data streaming architecture with Kinesis Data Streams

A modern streaming data architecture with Kinesis Data Streams can be designed as a stack of five logical layers; each layer is composed of multiple purpose-built components that address specific requirements, as illustrated in the following diagram:

The architecture consists of the following key components:

  • Streaming sources – Your streaming data sources include clickstream data, sensors, social media, Internet of Things (IoT) devices, log files generated by your web and mobile applications, and mobile devices, all of which generate semi-structured and unstructured data as continuous streams at high velocity.
  • Stream ingestion – The stream ingestion layer is responsible for ingesting data into the stream storage layer. It provides the ability to collect data from tens of thousands of data sources and ingest it in real time. You can use the Kinesis SDK for ingesting streaming data through APIs, the Kinesis Producer Library for building high-performance and long-running streaming producers, or a Kinesis agent for collecting a set of files and ingesting them into Kinesis Data Streams. In addition, you can use many pre-built integrations such as AWS Database Migration Service (AWS DMS), Amazon DynamoDB, and AWS IoT Core to ingest data in a no-code fashion. You can also ingest data from third-party platforms such as Apache Spark and Apache Kafka Connect.
  • Stream storage – Kinesis Data Streams offers two capacity modes: on-demand and provisioned. On-demand mode, now the default choice, can elastically scale to absorb variable throughput, so customers do not need to worry about capacity management and pay only for the data throughput they use. On-demand mode automatically scales up to twice the stream's historical maximum data ingestion to provide sufficient capacity for unexpected spikes. Alternatively, customers who want granular control over stream resources can use provisioned mode and proactively scale the number of shards up and down to meet their throughput requirements. Additionally, Kinesis Data Streams stores streaming data for 24 hours by default, but retention can be extended to 7 days or up to 365 days depending on the use case. Multiple applications can consume the same stream.
  • Stream processing – The stream processing layer is responsible for transforming data into a consumable state through data validation, cleanup, normalization, transformation, and enrichment. The streaming records are read in the order they are produced, allowing for real-time analytics, building event-driven applications, or streaming ETL (extract, transform, and load). You can use Amazon Managed Service for Apache Flink for complex stream data processing, AWS Lambda for stateless stream data processing (a minimal Lambda consumer sketch follows this list), and AWS Glue and Amazon EMR for near-real-time compute. You can also build customized consumer applications with the Kinesis Client Library (KCL), which takes care of many complex tasks associated with distributed computing.
  • Destination – The destination layer consists of purpose-built services, depending on your use case. You can stream data directly to Amazon Redshift for data warehousing or to Amazon EventBridge for building event-driven applications. You can also use Amazon Kinesis Data Firehose for streaming integration, where you can perform light stream processing with AWS Lambda and then deliver the processed stream to destinations such as an Amazon S3 data lake, Amazon OpenSearch Service for operational analytics, an Amazon Redshift data warehouse, NoSQL databases like Amazon DynamoDB, and relational databases like Amazon RDS to consume real-time streams in business applications. The destination can also be an event-driven application for real-time dashboards, automatic decisions based on processed streaming data, real-time alerting, and more.
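
For the stream processing layer, a minimal sketch of a stateless AWS Lambda consumer is shown below. It assumes the function is configured with a Kinesis Data Streams event source mapping; the field names inside the payload are hypothetical.

    import base64
    import json

    def handler(event, context):
        """Stateless processing of a batch of Kinesis records delivered to Lambda."""
        for record in event["Records"]:
            # Kinesis record payloads arrive base64-encoded.
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

            # Example validation/normalization step (hypothetical fields).
            if "event_time" not in payload:
                continue
            payload["source_shard"] = record["eventID"].split(":")[0]

            # Hand the cleaned record to the next stage (e.g., Firehose, DynamoDB).
            print(json.dumps(payload))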

Real-time analytics architecture for time series

Time series data is a sequence of data points recorded over a time interval for measuring events that change over time. Examples are stock prices over time, webpage clickstreams, and device logs over time. Customers can use time series data to monitor changes over time, so that they can detect anomalies, identify patterns, and analyze how certain variables are influenced over time. Time series data is typically generated from multiple sources in high volumes, and it needs to be cost-effectively collected in near real time.

Typically, there are three primary goals that customers want to achieve in processing time-series data:

  • Gain real-time insights into system performance and detect anomalies
  • Understand end-user behavior to track trends and query/build visualizations from these insights
  • Have a durable storage solution to ingest and store both archival and frequently accessed data.

With Kinesis Data Streams, customers can continuously capture terabytes of time series data from thousands of sources for cleaning, enrichment, storage, analysis, and visualization.

The following architecture pattern illustrates how real time analytics can be achieved for Time Series data with Kinesis Data Streams:

Build a serverless streaming data pipeline for time series data

The workflow steps are as follows:

  1. Data Ingestion & Storage – Kinesis Data Streams can continuously capture and store terabytes of data from thousands of sources.
  2. Stream Processing – An application created with Amazon Managed Service for Apache Flink can read the records from the data stream to detect and clean any errors in the time series data and enrich the data with specific metadata to optimize operational analytics. Using a data stream in the middle provides the advantage of using the time series data in other processes and solutions at the same time. A Lambda function is then invoked with these events, and can perform time series calculations in memory.
  3. Destinations – After cleaning and enrichment, the processed time series data can be streamed to an Amazon Timestream database for real-time dashboarding and analysis (a minimal write sketch follows these steps), or stored in databases such as DynamoDB for end-user queries. The raw data can be streamed to Amazon S3 for archiving.
  4. Visualization and insights – Customers can query, visualize, and create alerts using Amazon Managed Service for Grafana. Grafana supports data sources that are storage backends for time series data. To access your data from Timestream, you need to install the Timestream plugin for Grafana. End users can query data from the DynamoDB table with Amazon API Gateway acting as a proxy.
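
As a minimal illustration of step 3, the following Python (boto3) sketch writes one enriched data point to Timestream. The database, table, dimension, and measure names are hypothetical; a real pipeline would batch up to 100 records per call and handle rejected records.

    import time
    import boto3

    timestream = boto3.client("timestream-write")

    timestream.write_records(
        DatabaseName="iot_telemetry",          # hypothetical database
        TableName="device_metrics",            # hypothetical table
        Records=[{
            "Dimensions": [{"Name": "device_id", "Value": "sensor-42"}],
            "MeasureName": "temperature",
            "MeasureValue": "21.7",
            "MeasureValueType": "DOUBLE",
            # Timestream expects the timestamp as a string; the default unit is milliseconds.
            "Time": str(int(time.time() * 1000)),
        }],
    )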

Refer to Near Real-Time Processing with Amazon Kinesis, Amazon Timestream, and Grafana, which showcases a serverless streaming pipeline to process and store device telemetry IoT data in a time series optimized data store such as Amazon Timestream.

Enriching & replaying data in real time for event-sourcing microservices

Microservices are an architectural and organizational approach to software development in which software is composed of small, independent services that communicate over well-defined APIs. When building event-driven microservices, customers want to achieve two things: 1) high scalability to handle the volume of incoming events, and 2) reliable event processing that maintains system functionality in the face of failures.

Customers use microservice architecture patterns to accelerate innovation and time to market for new features, because microservices make applications easier to scale and faster to develop. However, it is challenging to enrich and replay data within a network call to another microservice, because doing so can impact the reliability of the application and make it difficult to debug and trace errors. To solve this problem, event sourcing is an effective design pattern that centralizes the historical record of all state changes for enrichment and replay, and decouples read workloads from write workloads. Customers can use Kinesis Data Streams as the centralized event store for event-sourcing microservices, because Kinesis Data Streams can (1) handle gigabytes of data throughput per second per stream and deliver data in milliseconds, meeting the requirements for high scalability and near real-time latency; (2) integrate with Flink and Amazon S3 for data enrichment and archiving while remaining completely decoupled from the microservices; and (3) allow retries and asynchronous reads at a later time, because Kinesis Data Streams retains data records for a default of 24 hours, and optionally up to 365 days (a minimal replay sketch follows the workflow steps below).

The following architectural pattern is a generic illustration of how Kinesis Data Streams can be used for Event-Sourcing Microservices:

The steps in the workflow are as follows:

  1. Data Ingestion and Storage – You can aggregate the input from your microservices to your Kinesis Data Streams for storage.
  2. Stream processing – Apache Flink Stateful Functions simplifies building distributed, stateful, event-driven applications. It can receive events from an input Kinesis data stream and route the resulting stream to an output data stream. You can create a Stateful Functions cluster with Apache Flink based on your application's business logic.
  3. State snapshot in Amazon S3 – You can store the state snapshot in Amazon S3 for tracking.
  4. Output streams – The output streams can be consumed by Lambda remote functions over HTTP/gRPC through API Gateway.
  5. Lambda remote functions – Lambda functions can act as microservices for various application and business logic to serve business applications and mobile apps.
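
Because Kinesis Data Streams retains records, a downstream microservice can replay events from a point in time, for example to rebuild its state after a failure. The following Python (boto3) sketch, with a hypothetical stream name and start time, shows the basic mechanics using the AT_TIMESTAMP shard iterator; a production consumer would typically rely on the Kinesis Client Library instead.

    import datetime
    import boto3

    kinesis = boto3.client("kinesis")
    STREAM = "order-events"                                  # hypothetical stream
    REPLAY_FROM = datetime.datetime(2024, 1, 1, 0, 0, 0)     # hypothetical start time

    for shard in kinesis.list_shards(StreamName=STREAM)["Shards"]:
        iterator = kinesis.get_shard_iterator(
            StreamName=STREAM,
            ShardId=shard["ShardId"],
            ShardIteratorType="AT_TIMESTAMP",
            Timestamp=REPLAY_FROM,
        )["ShardIterator"]

        while iterator:
            resp = kinesis.get_records(ShardIterator=iterator, Limit=1000)
            for record in resp["Records"]:
                # Re-process the event (record["Data"] is the raw payload in bytes).
                print(record["SequenceNumber"], record["Data"][:80])
            iterator = resp.get("NextShardIterator")
            if resp["MillisBehindLatest"] == 0:              # caught up to the tip
                break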

To learn how other customers built their event-based microservices with Kinesis Data Streams, refer to the following:

Key considerations and best practices

The following are considerations and best practices to keep in mind:

  • Data discovery should be your first step in building modern data streaming applications. You must define the business value and then identify your streaming data sources and user personas to achieve the desired business outcomes.
  • Choose your streaming data ingestion tool based on your streaming data source. For example, you can use the Kinesis SDK for ingesting streaming data through APIs, the Kinesis Producer Library for building high-performance and long-running streaming producers, a Kinesis agent for collecting a set of files and ingesting them into Kinesis Data Streams, AWS DMS for CDC streaming use cases, and AWS IoT Core for ingesting IoT device data into Kinesis Data Streams. You can ingest streaming data directly into Amazon Redshift to build low-latency streaming applications. You can also use third-party libraries like Apache Spark and Apache Kafka to ingest streaming data into Kinesis Data Streams.
  • Choose your streaming data processing service based on your specific use case and business requirements. For example, you can use Amazon Managed Service for Apache Flink for advanced streaming use cases with multiple streaming destinations and complex stateful stream processing, or if you want to monitor business metrics in real time (such as every hour). Lambda is good for event-based and stateless processing. You can use Amazon EMR for streaming data processing with your favorite open source big data frameworks. AWS Glue is good for near-real-time streaming data processing for use cases such as streaming ETL.
  • Kinesis Data Streams on-demand mode charges by usage and automatically scales up resource capacity, so it’s good for spiky streaming workloads and hands-free maintenance. Provisioned mode charges by capacity and requires proactive capacity management, so it’s good for predictable streaming workloads.
  • You can use the Kinesis Shard Calculator to calculate the number of shards needed for provisioned mode. You don’t need to be concerned about shards with on-demand mode.
  • When granting permissions, you decide who is getting what permissions to which Kinesis Data Streams resources. You enable specific actions that you want to allow on those resources. Therefore, you should grant only the permissions that are required to perform a task. You can also encrypt the data at rest by using a KMS customer managed key (CMK).
  • You can update the retention period via the Kinesis Data Streams console or by using the IncreaseStreamRetentionPeriod and the DecreaseStreamRetentionPeriod operations based on your specific use cases.
  • Kinesis Data Streams supports resharding. The recommended API for this operation is UpdateShardCount, which allows you to modify the number of shards in your stream to adapt to changes in the rate of data flow through the stream. The lower-level resharding APIs (SplitShard and MergeShards) are typically used to handle hot shards. A minimal sketch of these operations follows this list.
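
To make the last two points concrete, the following Python (boto3) sketch extends a stream's retention period and resizes a provisioned-mode stream with UpdateShardCount. The stream name and target values are hypothetical.

    import boto3

    kinesis = boto3.client("kinesis")

    # Extend retention from the 24-hour default to 7 days (168 hours).
    kinesis.increase_stream_retention_period(
        StreamName="clickstream",            # hypothetical stream name
        RetentionPeriodHours=168,
    )

    # Resize a provisioned-mode stream; uniform scaling splits or merges
    # shards evenly across the key space.
    kinesis.update_shard_count(
        StreamName="clickstream",
        TargetShardCount=4,
        ScalingType="UNIFORM_SCALING",
    )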

Conclusion

This post demonstrated various architectural patterns for building low-latency streaming applications with Kinesis Data Streams. You can build your own low-latency streaming applications with Kinesis Data Streams using the information in this post.

For detailed architectural patterns, refer to the following resources:

If you want to build a data vision and strategy, check out the AWS Data-Driven Everything (D2E) program.


About the Authors

Raghavarao Sodabathina is a Principal Solutions Architect at AWS, focusing on Data Analytics, AI/ML, and cloud security. He engages with customers to create innovative solutions that address customer business problems and to accelerate the adoption of AWS services. In his spare time, Raghavarao enjoys spending time with his family, reading books, and watching movies.

Hang Zuo is a Senior Product Manager on the Amazon Kinesis Data Streams team at Amazon Web Services. He is passionate about developing intuitive product experiences that solve complex customer problems and enable customers to achieve their business goals.

Shwetha Radhakrishnan is a Solutions Architect for AWS with a focus in Data Analytics. She has been building solutions that drive cloud adoption and help organizations make data-driven decisions within the public sector. Outside of work, she loves dancing, spending time with friends and family, and traveling.

Brittany Ly is a Solutions Architect at AWS. She is focused on helping enterprise customers with their cloud adoption and modernization journey and has an interest in the security and analytics field. Outside of work, she loves to spend time with her dog and play pickleball.

Building event-driven architectures with IoT sensor data

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/building-event-driven-architectures-with-iot-sensor-data/

The Internet of Things (IoT) brings sensors, cloud computing, analytics, and people together to improve productivity and efficiency. It empowers customers with the intelligence they need to build new services and business models, improve products and services over time, understand their customers’ needs to provide better services, and improve customer experiences. Business operations become more efficient as organizations make intelligent decisions more quickly and, over time, develop a data-driven discipline that leads to revenue growth and greater operational efficiency.

In this post, we showcase how to build an event-driven architecture by using AWS IoT services and AWS purpose-built data services. We also discuss key considerations and best practices while building event-driven application architectures with IoT sensor data.

Deriving insights from IoT sensor data

Organizations create value by making decisions from their IoT sensor data in near real time. Some common use cases and solutions that fit under event-driven architecture using IoT sensor data include:

  • Medical device data collection for personalized patient health monitoring, and adverse event prediction and avoidance.
  • Industrial IoT use cases to monitor equipment quality and determine actions like adjusting machine settings, using different sources of raw materials, or performing additional worker training to improve the quality of the factory output.
  • Connected vehicle use cases, such as voice interaction, navigation, location-based services, remote vehicle diagnostics, predictive maintenance, media streaming, and vehicle safety, that are based on in-vehicle computing and near real-time predictive analytics in the cloud.
  • Sustainability and waste reduction solutions, which provide access to dashboards, monitoring systems, data collection, and summarization tools that use machine learning (ML) algorithms to meet sustainability goals. Meeting sustainability goals is paramount for customers in the travel and hospitality industries.

Event-driven reference architecture with IoT sensor data

Figure 1 illustrates how to architect an event-driven architecture with IoT sensor data for near real-time predictive analytics and recommendations.

Figure 1. Building event-driven architecture with IoT sensor data

Architecture flow:

  1. Data originates in IoT devices such as medical devices, car sensors, and industrial IoT sensors. This telemetry data is collected using AWS IoT Greengrass, an open-source IoT edge runtime and cloud service that helps your devices collect and analyze data closer to where the data is generated. When an event arrives, AWS IoT Greengrass reacts autonomously to local events, filters and aggregates device data, and then communicates securely with the cloud and other local devices in your network to send the data.
  2. Event data is ingested into the cloud using edge-to-cloud interface services such as AWS IoT Core, a managed cloud platform that connects, manages, and scales devices easily and securely. AWS IoT Core interacts with cloud applications and other devices. You can also use AWS IoT SiteWise, a managed service that helps you collect, model, analyze, and visualize data from industrial equipment at scale.
  3. AWS IoT Core can directly stream ingested data into Amazon Kinesis Data Streams. The ingested data gets transformed and analyzed in near real time using Amazon Kinesis Data Analytics with the Apache Flink and Apache Beam frameworks. Stream data can further be enriched using lookup data hosted in a data warehouse such as Amazon Redshift. Amazon Kinesis Data Analytics can persist SQL results to Amazon Redshift after customer-defined stream aggregation (for example, over one-minute or five-minute windows). The results in Amazon Redshift can be used for further downstream business intelligence (BI) reporting services, such as Amazon QuickSight.
  4. Amazon Kinesis Data Analytics can also write to an AWS Lambda function, which can invoke Amazon SageMaker models. Amazon SageMaker is a complete, end-to-end service for machine learning.
  5. Once the ML model is trained and deployed in SageMaker, inferences are invoked in a micro batch using AWS Lambda (a minimal inference sketch follows this flow). Inferenced data is sent to Amazon OpenSearch Service to create personalized monitoring dashboards using Amazon OpenSearch Service dashboards. The transformed IoT sensor data can be stored in Amazon DynamoDB. Customers can use AWS AppSync to provide near real-time data queries to API services for downstream applications. These enterprise applications can be mobile apps or business applications that track and monitor the IoT sensor data in near real time. Amazon Kinesis Data Analytics can write to an Amazon Kinesis Data Firehose stream, which is a fully managed service for delivering near real-time streaming data to destinations like Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon OpenSearch Service, Splunk, and any custom HTTP endpoints or endpoints owned by supported third-party service providers, including Datadog, Dynatrace, LogicMonitor, MongoDB, New Relic, and Sumo Logic.

    In this example, data from Amazon Kinesis Data Analytics is written to Amazon Kinesis Data Firehose, which delivers the data in micro-batches to an Amazon S3 data lake. The Amazon S3 data lake stores telemetry data for future batch analytics.
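
To illustrate steps 4 and 5, here is a minimal sketch of a Lambda function that micro-batches incoming records and calls a SageMaker endpoint. It assumes the function is triggered directly by a Kinesis data stream (a Kinesis Data Analytics Lambda output uses a slightly different event shape), and the endpoint name and feature fields are hypothetical.

    import base64
    import json
    import boto3

    sagemaker_runtime = boto3.client("sagemaker-runtime")

    def handler(event, context):
        # Build a CSV micro-batch from the incoming sensor records.
        rows = []
        for record in event["Records"]:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            rows.append(f'{payload["temperature"]},{payload["vibration"]}')  # hypothetical features

        response = sagemaker_runtime.invoke_endpoint(
            EndpointName="anomaly-detector",     # hypothetical SageMaker endpoint
            ContentType="text/csv",
            Body="\n".join(rows),
        )
        predictions = response["Body"].read().decode("utf-8")

        # Downstream, the predictions could be indexed into Amazon OpenSearch Service
        # or written to DynamoDB, as described in the architecture above.
        print(predictions)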

Key considerations and best practices

Keep the following best practices in mind:

  • Define the business value from IoT sensor data through interactive discovery sessions with various stakeholders within your organization.
  • Identify the type of IoT sensor data you want to collect and analyze for predictive analytics.
  • Choose the right tools for the job, depending upon your business use case and your data consumers. Please refer to step 5 earlier in this post, where different purpose-built data services were used based on user personas.
  • Consider the event-driven architecture as three key components: event producers, event routers, and event consumers. A producer publishes an event to the router, which filters and pushes the events to consumers. Producer and consumer services are decoupled, which allows them to be scaled, updated, and deployed independently.
  • In this architecture, IoT sensors are the event producers. AWS IoT Greengrass, AWS IoT Core, Amazon Kinesis Data Streams, and Amazon Kinesis Data Analytics work together as the router, from which multiple consumers can consume IoT sensor-generated data. These consumers include the Amazon S3 data lake for telemetry data analysis, Amazon OpenSearch Service for personalized dashboards, and Amazon DynamoDB or AWS AppSync for the downstream enterprise applications’ consumption.

Conclusion

In this post, we demonstrated how to build an event-driven architecture with IoT sensor data using AWS IoT services and AWS purpose-built data services. You can now build your own event-driven applications using this post with your IoT sensor data and integrate with your business applications as needed.

Further reading

Architecting near real-time personalized recommendations with Amazon Personalize

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/architecting-near-real-time-personalized-recommendations-with-amazon-personalize/

Delivering personalized customer experiences enables organizations to improve business outcomes such as acquiring and retaining customers, increasing engagement, driving efficiencies, and improving discoverability. Developing an in-house personalization solution can take a lot of time, which increases the time it takes for your business to launch new features and user experiences.

In this post, we show you how to architect near real-time personalized recommendations using Amazon Personalize and AWS purpose-built data services.  We also discuss key considerations and best practices while building near real-time personalized recommendations.

Building personalized recommendations with Amazon Personalize

Amazon Personalize makes it easy for developers to build applications capable of delivering a wide array of personalization experiences, including specific product recommendations, personalized product re-ranking, and customized direct marketing.

Amazon Personalize provisions the necessary infrastructure and manages the entire machine learning (ML) pipeline, including processing the data, identifying features, using the most appropriate algorithms, and training, optimizing, and hosting the models. You receive results through an Application Programming Interface (API) and pay only for what you use, with no minimum fees or upfront commitments.

Figure 1 illustrates the comparison of Amazon Personalize with the ML lifecycle.

Figure 1. Machine learning lifecycle vs. Amazon Personalize

First, provide the user and items data to Amazon Personalize. In general, there are three steps for building near real-time recommendations with Amazon Personalize:

  1. Data preparation: Preparing data is one of the prerequisites for building accurate ML models and analytics, and it is the most time-consuming part of an ML project. There are three types of data you use for modeling on Amazon Personalize:
    • An Interactions data set captures the activity of your users, also known as events. Examples include items your users click on, purchase, or watch. The events you choose to send are dependent on your business domain. This data set has the strongest signal for personalization, and is the only mandatory data set.
    • An Items data set includes details about your items, such as price point, category information, and other essential information from your catalog. This data set is optional, but very useful for scenarios such as recommending new items.
    • A Users data set includes details about the users, such as their location, age, and other details.
  2. Train the model with Amazon Personalize: Amazon Personalize provides recipes, based on common use cases for training models. A recipe is an Amazon Personalize algorithm prepared for a given use case. Refer to Amazon Personalize recipes for more details. The four types of recipes are:
    • USER_PERSONALIZATION: Recommends items for a user from a catalog. This is often included on a landing page.
    • RELATED_ITEMS: Suggests items similar to a selected item on a detail page.
    • PERSONALIZED_RANKING: Re-ranks a list of items for a user within a category or within search results.
    • USER_SEGMENTATION: Generates segments of users based on item input data. You can use this to create a targeted marketing campaign for particular products by brand.
  3. Get near real-time recommendations: Once your model is trained, a private personalization model is hosted for you. You can then provide recommendations for your users through a private API.
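
As a minimal sketch of step 3, the following Python (boto3) call requests recommendations from a deployed campaign. The campaign ARN and user ID are hypothetical placeholders.

    import boto3

    personalize_runtime = boto3.client("personalize-runtime")

    response = personalize_runtime.get_recommendations(
        campaignArn="arn:aws:personalize:us-east-1:111122223333:campaign/my-campaign",  # hypothetical
        userId="user-123",
        numResults=10,
    )

    for item in response["itemList"]:
        print(item["itemId"], item.get("score"))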

Figure 2 illustrates a high-level overview of Amazon Personalize:

Figure 2. Building recommendations with Amazon Personalize

Near real-time personalized recommendations reference architecture

Figure 3 illustrates how to architect near real-time personalized recommendations using Amazon Personalize and AWS purpose-built data services.

Figure 3. Near real-time recommendations reference architecture

Architecture flow:

  1. Data preparation: Start by creating a dataset group, schemas, and datasets representing your items, interactions, and user data.
  2. Train the model: After importing your data, select the recipe matching your use case, and then create a solution to train a model by creating a solution version.
    Once your solution version is ready, you can create a campaign for your solution version. You can create a campaign for every solution version that you want to use for near real-time recommendations.
    In this example architecture, we’re just showing a single solution version and campaign. If you were building out multiple personalization use cases with different recipes, you could create multiple solution versions and campaigns from the same datasets.
  3. Get near real-time recommendations: Once you have a campaign, you can integrate calls to the campaign in your application. This is where calls to the GetRecommendations or GetPersonalizedRanking APIs are made to request near real-time recommendations from Amazon Personalize.
    • The approach you take to integrate recommendations into your application varies based on your architecture, but it typically involves encapsulating recommendations in a microservice or AWS Lambda function that is called by your website or mobile application through a RESTful or GraphQL API interface.
    • Near real-time recommendations support the ability to adapt to each user’s evolving interests. This is done by creating an event tracker in Amazon Personalize.
    • An event tracker provides an endpoint that allows you to stream interactions that occur in your application back to Amazon Personalize in near real time. You do this by using the PutEvents API (a minimal example follows these steps).
    • Again, the architectural details of how you integrate PutEvents into your application vary, but it typically involves collecting events using a JavaScript library in your website or a native library in your mobile apps, and making API calls to stream them to your backend. AWS provides the AWS Amplify framework, which can be integrated into your web and mobile apps to handle this for you.
    • In this example architecture, you can build an event collection pipeline using Amazon API Gateway, Amazon Kinesis Data Streams, and Lambda to receive and forward interactions to Amazon Personalize.
    • The event tracker performs two primary functions. First, it persists all streamed interactions so they will be incorporated into future retraining of your model. Second, it lets Amazon Personalize adjust recommendations based on a user’s most recent activity; this is also how Amazon Personalize cold starts new users. When a new user visits your site, Amazon Personalize will recommend popular items. After you stream in an event or two, Amazon Personalize immediately starts adjusting its recommendations.
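
The PutEvents call mentioned above can be made from the Lambda function in the event collection pipeline. A minimal Python (boto3) sketch is shown below; the tracking ID, user, session, and item values are hypothetical.

    import datetime
    import json
    import boto3

    personalize_events = boto3.client("personalize-events")

    personalize_events.put_events(
        trackingId="tracking-id-from-event-tracker",   # hypothetical tracking ID
        userId="user-123",
        sessionId="session-456",
        eventList=[{
            "eventType": "click",
            "sentAt": datetime.datetime.now(datetime.timezone.utc),
            # Event properties are passed as a JSON string.
            "properties": json.dumps({"itemId": "item-789"}),
        }],
    )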

Key considerations and best practices

  1. For all use cases, your interactions dataset must contain a minimum of 1,000 interaction records from users interacting with items in your catalog. These interactions can come from bulk imports, streamed events, or both, and must include a minimum of 25 unique user IDs with at least two interactions each.
  2. Metadata fields (user or item) can be used for training, filters, or both.
  3. Amazon Personalize supports the encryption of your imported data. You can specify a role allowing Amazon Personalize to use an AWS Key Management Service (AWS KMS) key to decrypt your data, or use the Amazon Simple Storage Service (Amazon S3) AES-256 server-side default encryption.
  4. You can re-train Amazon Personalize deployments based on how much interaction data you generate on a daily basis. A good rule is to re-train your models once every week or two as needed.
  5. You can apply business rules for personalized recommendations using filters. Refer to Filtering recommendations and user segments for more details.

Conclusion

In this post, we showed you how to build near real-time personalized recommendations using Amazon Personalize and AWS purpose-built data services. With the information in this post, you can now build your own personalized recommendations for your applications.

Read more and get started on building personalized recommendations on AWS:

Building SAML federation for Amazon OpenSearch Dashboards with Ping Identity

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/building-saml-federation-for-amazon-opensearch-dashboards-with-ping-identity/

Amazon OpenSearch is an open search and log analytics service, powered by the Apache Lucene search library.

In this blog post, we provide step-by-step guidance for SP-initiated SSO by showing how to set up a trial Ping Identity account. We’ll show how to build users and groups within your organization’s directory and enable SSO in OpenSearch Dashboards.

To use this feature, you must enable fine-grained access control. Rather than authenticating through Amazon Cognito or the internal user database, SAML authentication for OpenSearch Dashboards lets you use third-party identity providers to log in.

Ping Identity is an AWS Competency Partner, and its PingOne Cloud Platform is a multi-tenant Identity-as-a-Service (IDaaS) platform. Ping Identity supports both service provider (SP)-initiated and identity provider (IdP)-initiated SSO.

Overview of Ping Identity SAML authenticated solution

Figure 1 shows a sample architecture of a generic integrated solution between Ping Identity and OpenSearch Dashboards over SAML authentication.

Figure 1. SAML transactions between Amazon OpenSearch and Ping Identity

The sign-in flow is as follows:

  1. User opens browser window and navigates to Amazon OpenSearch Dashboards
  2. Amazon OpenSearch generates SAML authentication request
  3. Amazon OpenSearch redirects request back to browser
  4. Browser redirects to Ping Identity URL
  5. Ping Identity parses SAML request, authenticates user, and generates SAML response
  6. Ping Identity returns encoded SAML response to browser
  7. Browser sends SAML response back to Amazon OpenSearch Assertion Consumer Service (ACS) URL
  8. ACS verifies SAML response
  9. User logs into Amazon OpenSearch domain

Prerequisites

For this walkthrough, you should have the following prerequisites:

  1. An AWS account
  2. A virtual private cloud (VPC)-based Amazon OpenSearch domain with fine-grained access control enabled
  3. Ping Identity account with user and a group
  4. A browser with network connectivity to Ping Identity, Amazon OpenSearch domain, and Amazon OpenSearch Dashboards.

The steps in this post are structured into the following sections:

  1. Identity provider (Ping Identity) setup
  2. Prepare Amazon OpenSearch for SAML configuration
  3. Identity provider (Ping Identity) SAML configuration
  4. Finish Amazon OpenSearch for SAML configuration
  5. Validation
  6. Cleanup

Identity provider (Ping Identity) setup

Step 1: Sign up for a Ping Identity account

  • Sign up for a Ping Identity account, then click on the Sign up button to complete your account setup.
  • If you already have an account with Ping Identity, login to your Ping Identity account.

Step 2: Create Population in Ping Identity

  • Choose Identities in the left menu and click Populations to proceed.
  • Click on the blue + button next to Populations, enter the name as IT, then click on the Save button (see Figure 2).

Figure 2. Creating population in Ping Identity

Step 3: Create a group in Ping Identity

  • Choose Groups from the left menu and click on the blue + button next to Groups. For this example, we will create a group called opensearch for Kibana access. Click on the Save button to complete the group creation.

Step 4: Create users in Ping Identity

  • Choose Users in left menu, then click the + Add User button.
  • Provide GIVEN NAME, FAMILY NAME, and EMAIL ADDRESS, and choose the Population created in Step 2 (IT). Choose your own USERNAME. Click on the SAVE button to create your user.
  • Add more users as needed.

Step 5: Assign role and group to users

  • Choose Identities in the left menu and click on Users. Then click on the edit button for a particular user, as shown in Figure 3.

Figure 3. Assigning roles and groups to users in Ping Identity

  • Click on the Edit button, click on + Add Role button, and click on the edit button to assign a role to the user.
  • For this example, choose Environment Admin, as shown in Figure 4. You can choose different roles depending on your use case.

Figure 4. Assigning roles to users in Ping Identity

  • For this example, assign administrator responsibilities to your users. Click on Show Environments, and drag Administrators into the ADDED RESPONSIBILITIES section. Then click on the Add Role button.
  • Add the group to users. Go to the Groups tab and search for the opensearch group created in Step 3. Click on the + button next to opensearch to add it to the user’s group memberships.

Prepare Amazon OpenSearch for SAML configuration

Once the Amazon OpenSearch domain is up and running, we can proceed with configuration.

  • Under Actions, choose Edit security configuration, as shown in Figure 5.

Figure 5. Enabling Amazon OpenSearch security configuration for SAML

  • Under SAML authentication for OpenSearch Dashboards/Kibana, select the Enable SAML authentication check box (Figure 6). When we enable SAML, it creates the different URLs required for configuring SAML with your identity provider.

Figure 6. Amazon OpenSearch URLs for SAML configuration

We will be using the Service Provider entity ID and SP-initiated SSO URL as highlighted in Figure 6 for Ping Identity SAML configuration. We will complete the rest of the Amazon OpenSearch SAML configuration after the Ping Identity SAML configuration.

Ping Identity SAML configuration

Go back to PingIdentity.com, and navigate to Connections on the left menu. Then select Applications, and click on Application +.

  • For this example, we are creating an application called “Kibana”.
  • Select WEB APP as the APPLICATION TYPE, choose SAML as the CONNECTION TYPE, and click on the Configure button to proceed, as shown in Figure 7.

Figure 7. Configuring a new application in Ping Identity

  • On the “Create App Profile” page, click on the Next button, and choose the “Manually Enter” option for PROVIDE APP METADATA. Enter the following under Configure SAML Connection section
    • ACS URL: https://vpc-XXXXX-XXXXX.us-west-2.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs (SP-initiated SSO URL)
    • Choose Sign Assertion & Response under SIGNING KEY
    • ENTITY ID: https://vpc-XXXXX-XXXXX.us-west-2.es.amazonaws.com (Service provider entity ID)
    • ASSERTION VALIDITY DURATION (IN SECONDS) as 3600
    • Choose default options, then click on the Save and Continue button as shown in Figure 8

Figure 8. Configuring SAML connection in Ping Identity

  • Enter the following under Configure Attribute Mapping, then click on Save and Close.
    • Set User ID to default
    • Click on +ADD ATTRIBUTE button to add following SAML attributes
      • OUTGOING VALUE: Group Names, SAML ATTRIBUTE: saml_group
      • OUTGOING VALUE: Username, SAML ATTRIBUTE: saml_username
  • Select the Policies tab and click on edit icon on the right.
  • Add the Single_Factor policy to the application, then click on Save.
  • Select the Access tab, add the opensearch group to the application, then click on Save to complete SAML configuration.
  • Finally, go to the Configuration tab, click on the Download Metadata button to download the Ping Identity metadata for the Amazon OpenSearch SAML configuration. Enable opensearch SAML application (Figure 9).

Figure 9. Downloading metadata in Ping Identity

Amazon OpenSearch SAML configuration

  • Switch back to Amazon OpenSearch domain:
    • Navigate to the Amazon OpenSearch console.
    • Click on Actions, then click on Modify Security configuration.
    • Select the Enable SAML authentication check box.
  • Under Import IdP metadata section:
    • Metadata from IdP: Import the Ping Identity identity provider metadata from the downloaded XML file, shown in Figure 10.
    • SAML master backend role: opensearch (the Ping Identity group). Provide the SAML backend role/group from the SAML assertion for group SSO into Kibana.

Figure 10. Configuring Amazon OpenSearch SAML parameters

  • Under Optional SAML settings:
    • Leave the Subject Key as saml_subject from Ping Identity SAML application attribute name.
    • Role key should be saml_group. You can view a sample assertion during the configuration process by using tools like SAML-tracer. This can help you examine and troubleshoot the contents of real assertions.
    • Session time to live (mins): 60
  • Click on the Submit button to complete Amazon OpenSearch SAML configuration for Kibana. We have successfully completed SAML configuration and are now ready for testing.
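
If you prefer to script this step instead of using the console, the same SAML settings can also be applied with the AWS SDK. The following Python (boto3) sketch is a minimal example under that assumption; the domain name, metadata file path, and IdP entity ID are hypothetical placeholders, and other security settings on the domain are left unchanged.

    import boto3

    opensearch = boto3.client("opensearch")

    # Metadata XML previously downloaded from the Ping Identity application.
    with open("ping-metadata.xml") as f:
        idp_metadata = f.read()

    opensearch.update_domain_config(
        DomainName="my-opensearch-domain",                 # hypothetical domain name
        AdvancedSecurityOptions={
            "SAMLOptions": {
                "Enabled": True,
                "Idp": {
                    "MetadataContent": idp_metadata,
                    "EntityId": "https://auth.pingone.com/XXXXX",  # IdP entity ID from the metadata
                },
                "MasterBackendRole": "opensearch",         # the Ping Identity group
                "RolesKey": "saml_group",
                "SessionTimeoutMinutes": 60,
            }
        },
    )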

Validating Access with Ping Identity Users

  • The OpenSearch Dashboards URL can be found in the Overview tab within “General Information” in the Amazon OpenSearch console (Figure 11). The first access to the OpenSearch Dashboards URL redirects you to the Ping Identity login screen.

Figure 11. Validating Ping Identity users access with Amazon OpenSearch

  • If your OpenSearch domain is hosted within a private VPC, you will not be able to access OpenSearch Dashboards over public internet. But you can still use SAML as long as your browser can communicate with both your OpenSearch cluster and your identity provider.
  • You can create a Mac or Windows EC2 instance within the same VPC and access Amazon OpenSearch Dashboards from an EC2 instance’s web browser to validate your SAML configuration. Or you can access your Amazon OpenSearch Dashboards through Site-to-Site VPN if you are trying to access it from your on-premises environment.
  • Now copy and paste the OpenSearch Dashboards URL in your browser, and enter user credentials.
  • After successful login, you will be redirected to the OpenSearch Dashboards home page. You can explore the sample data and visualizations in OpenSearch Dashboards, as shown in Figure 12.

Figure 12. SAML authenticated Amazon OpenSearch Dashboards

  • You have successfully federated Amazon OpenSearch Dashboards with Ping Identity as an identity provider. You can connect to OpenSearch Dashboards by using your Ping Identity credentials.

Cleaning up

After you test out this solution, remember to delete all the resources you created to avoid incurring future charges. Refer to these links:

Conclusion

In this blog post, we have demonstrated how to set up Ping Identity as an identity provider over SAML authentication for Amazon OpenSearch Dashboards access. With this solution, you now have an OpenSearch Dashboard that uses Ping Identity as the custom identity provider for your users. This reduces the customer login process to one set of credentials and improves employee productivity.

Get started by checking the Amazon OpenSearch Developer Guide, which provides guidance on how to build applications using Amazon OpenSearch for your operational analytics.

Building SAML federation for Amazon OpenSearch Service with Okta

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/building-saml-federation-for-amazon-opensearch-dashboards-with-okta/

Amazon OpenSearch Service is a fully managed open search and analytics service powered by the Apache Lucene search library. Security Assertion Markup Language (SAML)-based federation for OpenSearch Dashboards lets you use your existing identity provider (IdP) like Okta to provide single sign-on (SSO) for OpenSearch Dashboards on OpenSearch Service domains.

This post shows step-by-step guidance to enable SP-initiated single sign-on (SSO) into OpenSearch Dashboards using Okta.

To use this feature, you must enable fine-grained access control. Rather than authenticating through Amazon Cognito or the internal user database, SAML authentication for OpenSearch Dashboards lets you use third-party identity providers to log in to OpenSearch Dashboards. SAML authentication for OpenSearch Dashboards is only for accessing OpenSearch Dashboards through a web browser.

Overview of Okta SAML authenticated solution

Figure 1 depicts a sample architecture of a generic, integrated solution between Okta and OpenSearch Dashboards over SAML authentication.

Figure 1. SAML transactions between Amazon OpenSearch Service and Okta

The initial sign-in flow is as follows:

  1. User opens browser window and navigates to OpenSearch Dashboards
  2. OpenSearch Service generates SAML authentication request
  3. OpenSearch Service redirects request back to browser
  4. Browser redirects to Okta URL
  5. Okta parses SAML request, authenticates user, and generates SAML response
  6. Okta returns encoded SAML response to browser
  7. Browser sends SAML response back to OpenSearch Service Assertion Consumer Services (ACS) URL
  8. ACS verifies SAML response
  9. User logs into OpenSearch Service domain

Prerequisites

For this walkthrough, you should have the following prerequisites:

  1. An AWS account
  2. A virtual private cloud (VPC)-based OpenSearch Service domain with fine-grained access control enabled
  3. Okta account with user and a group
  4. A browser with network connectivity to Okta, OpenSearch Service domain, and OpenSearch Dashboards.

The steps in this post are structured into the following sections:

  1. Identity provider (Okta) setup
  2. Prepare OpenSearch Service for SAML configuration
  3. Identity provider (Okta) SAML configuration
  4. Finish OpenSearch Service for SAML configuration
  5. Validation
  6. Cleanup

Identity provider (Okta) setup

Step 1: Sign up for an Okta account

  • Sign up for an Okta account, then click on the Sign up button to complete your account setup.
  • If you already have an account with Okta, login to your Okta account.

Step 2: Create Groups in Okta

  • Choose Directory in the left menu and click Groups to proceed.
  • Click on Add Group and enter name as opensearch. Then click on the Save button, see Figure 2.

Figure 2. Creating a group in Okta

Step 3: Create users in Okta

  • Choose People in left menu under Directory section and click the +Add Person button.
  • Provide First name, Last name, username (email ID), and primary email. Then select set by admin from the Password dropdown, and choose first time password. Click on the Save button to create your user.
  • Add more users as needed.

Step 4: Assign Groups to users 

  • Choose Groups from the left menu, then click on the opensearch group created in Step 2. Click on the Assign People button to add users to the opensearch group. Next, either click on individual user under Person & Username, or use the Add All button to add all existing users to the opensearch group. Click on the Save button to complete adding users to your group.

Prepare OpenSearch Service for SAML configuration

Once the OpenSearch Service domain is up and running, we can proceed with configuration.

  • Navigate to the OpenSearch Service console
  • Under Actions, choose Edit security configuration as shown in Figure 3

Figure 3. Enabling Amazon OpenSearch Service security configuration for SAML

  • Under SAML authentication for OpenSearch Dashboards/Kibana, select the Enable SAML authentication check box, see Figure 4. When we enable SAML, it will create different URLs required for configuring SAML with your identity provider.

Figure 4. Amazon OpenSearch Service URLs for SAML configuration

We will be using the Service Provider entity ID and SP-initiated SSO URL (highlighted in Figure 4) for Okta SAML configuration. The OpenSearch Dashboards login flow can take one of two forms:

  • Service provider (SP) initiated: You navigate to your OpenSearch Dashboard (for example, https://my-domain.us-east-1.es.amazonaws.com/_dashboards), which redirects you to the login screen. After you log in, the identity provider redirects you to OpenSearch Dashboards.
  • Identity provider (IdP) initiated: You navigate to your identity provider, log in, and choose OpenSearch Dashboards from an application directory.

We will complete the rest of the OpenSearch Service SAML configuration after the Okta SAML configuration.

Okta SAML configuration

  • Go back to Okta.com, and choose Applications from the left menu. Click on Applications, then click on Create App Integration and choose SAML 2.0. Click on the Next button to proceed, as shown in Figure 5.
  • For this example, we are creating an application called “OpenSearch Dashboard”.
  • Select Platform as Web, and select Sign on method as SAML 2.0. Click on the Create button to proceed.

Figure 5. Creating a SAML app integration in Okta

  • Enter the App name as OpenSearch, use default options, and click on the Next button to proceed.
  • Enter the following under the SAML Settings section, as shown in Figure 6. Click on the Next button to proceed.
    • Single Sign on URL = https://vpc-XXXXX-XXXXX.us-west-2.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs (SP-initiated SSO URL)
    • Audience URI(SP Entity ID) = https://vpc-XXXXX-XXXXX.us-west-2.es.amazonaws.com (Service Provider entity ID)
    • Default RelayState = leave it blank
    • Name ID format = Select EmailAddress from drop down
    • Application username = Select Okta username from dropdown
    • Update application username on = leave it set to default
  • Enter the following under Attribute Statements (optional) section.
    • Name = http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress
    • Name format = Select URI Reference from dropdown
    • Value = user.email
  • Enter the following under the Group Attribute Statements (optional) section.
    • Name = http://schemas.xmlsoap.org/claims/Group
    • Name format = Select URI Reference from dropdown
    • Filter = Select Matches regex from dropdown and enter value as .*open.* to match the group created in previous steps for OpenSearch Dashboards access.

Figure 6. SAML configuration in Okta

  • Select I’m a software vendor. I’d like to integrate my app with Okta under the Help Okta Support understand how you configured this application section.
  • Click on the Finish button to complete the Okta SAML application configuration.
  • Choose the Sign On menu. Right-click on the Identity Provider metadata hyperlink to download the Okta identity provider metadata as okta.xml. You will use this for the SAML configuration in OpenSearch Service (see Figure 7).

Figure 7. Downloading Okta identity provider metadata for SAML configuration

  • Choose the Assignments menu and click on Assign-> Assign to Groups
  • Select the opensearch group, click on Assign, and click on the Done button to complete the Group assignment, as shown in Figure 8.

Figure 8. Assigning groups to the app in Okta

  • Switch back to the OpenSearch Service domain
  • Under the Import IdP metadata section:
    • Metadata from IdP: Import the Okta identity provider metadata from the downloaded XML file
    • SAML master backend role: opensearch (the Okta group). Provide the SAML backend role/group from the SAML assertion for group SSO into OpenSearch Dashboards.
  • Under Optional SAML settings:
    • Leave Subject Key blank
    • Role key should be http://schemas.xmlsoap.org/claims/Group. You can view a sample assertion during the configuration process with tools like SAML-tracer. This can help you examine and troubleshoot the contents of real assertions.
    • Session time to live (mins): 60
  • Click on the Save changes button (Figure 9) to complete OpenSearch Service SAML configuration for OpenSearch Dashboards. We have successfully completed SAML configuration, and now we are ready for testing.

Figure 9. Configuring Amazon OpenSearch Service SAML parameters

Validating access with Okta users

  • Access the OpenSearch Dashboards endpoint from the previously created OpenSearch Service cluster. The OpenSearch Dashboards URL can be found in General information within “My Domains” of the OpenSearch Service console, as shown in Figure 10. The first access to OpenSearch Dashboards URL redirects you to the Okta login screen.

Figure 10. Validating Okta user access with Amazon OpenSearch Service

  • Now copy and paste the OpenSearch Dashboards URL in your browser, and enter the user credentials.
  • If your OpenSearch Service domain is hosted within a private VPC, you will not be able to access your OpenSearch Dashboard over public internet. But you can still use SAML as long as your browser can communicate with both your OpenSearch Service cluster and your identity provider.
  • You can create a Mac or Windows EC2 instance within the same VPC so that you can access Amazon OpenSearch Dashboard from EC2 instance’s web browser to validate your SAML configuration. Or you can access your OpenSearch Dashboard through Site-to-Site VPN from your on-premises environment.
  • After successful login, you will be redirected to the OpenSearch Dashboards home page. Here, you can explore the sample data and visualizations in OpenSearch Dashboards (Figure 11).

Figure 11. SAML authenticated OpenSearch Dashboards

  • Now, you have successfully federated OpenSearch Dashboards with Okta as an identity provider. You can connect to OpenSearch Dashboards by using your Okta credentials.

Cleaning up

After you test out this solution, remember to delete all the resources you created, to avoid incurring future charges. Refer to these links:

Conclusion

In this blog post, we have demonstrated how to set up Okta as an identity provider over SAML authentication for OpenSearch Dashboards access. Get started by checking the Amazon OpenSearch Service Developer Guide, which provides guidance on how to build applications using OpenSearch Service.

Building SAML federation for Amazon OpenSearch Dashboards with Auth0

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/building-saml-federation-for-amazon-opensearch-dashboards-with-auth0/

Amazon OpenSearch is a fully managed, distributed, open search and analytics service powered by the Apache Lucene search library. OpenSearch is derived from Elasticsearch 7.10.2 and is used for real-time application monitoring, log analytics, and website search. It’s ideal for use cases that require fast access and response for large volumes of data. OpenSearch Dashboards is derived from Kibana 7.10.2 and is used for visual data exploration. With Security Assertion Markup Language (SAML)-based federation for OpenSearch Dashboards, you can use your existing identity provider (IdP), like Auth0, to provide single sign-on (SSO) for OpenSearch Dashboards on Amazon OpenSearch domains. It also gives you fine-grained access control and the ability to search your data and build visualizations. Amazon OpenSearch supports providers that use the SAML 2.0 standard, such as Auth0, Okta, Keycloak, Active Directory Federation Services (AD FS), and Ping Identity (PingID).

In this post, we provide step-by-step guidance to show you how to set up a trial Auth0 account. We’ll demonstrate how to build users and groups within your organization’s directory, and enable SP-initiated single sign-on (SSO) into OpenSearch Dashboards.

To use this feature, you must enable fine-grained access control. Rather than authenticating through Amazon Cognito or an internal user database, SAML authentication for OpenSearch Dashboards lets you use third-party identity providers to log in to the OpenSearch Dashboards. SAML authentication for OpenSearch Dashboards is only for accessing the OpenSearch Dashboards through a web browser. Your SAML credentials do not let you make direct HTTP requests to OpenSearch or OpenSearch Dashboards APIs.

Auth0 is an AWS Competency Partner and popular Identity-as-a-Service (IDaaS) solution. It supports both service provider (SP)-initiated and identity provider (IdP)-initiated SSO. For SP-initiated SSO, when you sign into the OpenSearch Dashboards login page it sends an authorization request to Auth0. Once it authenticates your identity, you are redirected to OpenSearch Dashboards. In IdP-initiated SSO, you log in to the Auth0 SSO page, and choose OpenSearch Dashboards to open the application.

Overview of Auth0 SAML authenticated solution

Figure 1 depicts a sample architecture of a generic, integrated solution between Auth0 and OpenSearch Dashboards over SAML authentication.


Figure 1. A high-level view of a SAML transaction between Amazon OpenSearch and Auth0

The sign-in flow is as follows:

  1. User opens browser window and navigates to Amazon OpenSearch Dashboards
  2. Amazon OpenSearch generates SAML authentication request
  3. Amazon OpenSearch redirects request back to browser
  4. Browser redirects to Auth0 URL
  5. Auth0 parses SAML request, authenticates user, and generates SAML response
  6. Auth0 returns encoded SAML response to browser
  7. Browser sends SAML response back to Amazon OpenSearch Assertion Consumer Service (ACS) URL
  8. ACS verifies SAML response
  9. User logs into Amazon OpenSearch domain

Prerequisites

For this walkthrough, you should have the following prerequisites:

  1. An AWS account
  2. A virtual private cloud (VPC)-based Amazon OpenSearch domain with fine-grained access control enabled
  3. An Auth0 account with a user and a group
  4. A browser with network connectivity to Auth0, Amazon OpenSearch domain, and Amazon OpenSearch Dashboards.

The steps in this post are structured into the following sections:

  1. Identity provider (Auth0) setup
  2. Prepare Amazon OpenSearch for SAML configuration
  3. Identity provider (Auth0) SAML configuration
  4. Finish Amazon OpenSearch for SAML configuration
  5. Validation
  6. Cleanup

Identity provider (Auth0) setup

Step 1: Sign up for an Auth0 account

  • Sign up for an Auth0 account, then click on the Sign up button to complete your account setup.
  • If you already have an account with Auth0, log in to your Auth0 account.

Step 2: Create users in Auth0

  • Choose User Management in the left menu and click Users, then click on the +Create User button.
  • Provide an email, password, and connection for your user. Click on the Create button to create your user.
  • Add more users to your Auth0 account.

Step 3: Install Auth0 Extension to create a group and assign users to the group

  • Click on Extensions in the left menu and search for “Auth0 Authorization”. Click on Auth0 Authorization to install the extension, shown in Figure 2.

Figure 2. Installing Auth0 Authorization extension

  • Use all default options and click on the Install button to install the extension.
  • Click on the Auth0 Authorization extension and choose the Accept button to provide access to your Auth0 account.
  • The Auth0 Authorization extension must be configured. Click on Go to Configuration (Figure 3).

Figure 3. Configuring the Auth0 Authorization extension

  • Rotate your API keys and check Groups, Roles, and Permissions to provide authorization to the extension, and then click on PUBLISH RULE to complete the configuration (Figure 4).

Figure 4. Providing the permissions to Auth0 Authorization extension

Step 4: Create a group in Auth0

  • Choose Groups from the left menu and click on the Create your first Group button. For this example, we will create a group called opensearch for OpenSearch Dashboards access.
  • Add your users to opensearch by clicking on the ADD MEMBERS button, then click on the CONFIRM button to complete your group assignment (Figure 5).

Figure 5. Adding users to Auth0 Group

Step 5: Create an Auth0 Application

  • Choose Applications from the left menu. Click on the +Create Application button.
  • For this example, we are creating an application called “opensearch”.
  • Select Single Page Web Applications, then click on the CREATE button to proceed.
  • Click on the Addons tab of the opensearch application (Figure 6).

Figure 6. Creating an Auth0 SAML application

  • Click on the SAML2 WEB APP, then select settings to provide SAML URLs from Amazon OpenSearch. We will configure these details after preparing the Amazon OpenSearch cluster for SAML.

Prepare Amazon OpenSearch for SAML configuration

Once the Amazon OpenSearch domain is up and running, we can proceed with configuration.

  • Under Actions, choose Edit security configuration (Figure 7).

Figure 7. Enabling Amazon OpenSearch security configuration for SAML

  • Under SAML authentication for OpenSearch Dashboards/Kibana, select the Enable SAML authentication check box (Figure 8). When we enable SAML, it will create different URLs required for configuring SAML with your identity provider.

Figure 8. Amazon OpenSearch URLs for SAML configuration

We will be using the Service Provider entity ID and SP-initiated SSO URL (highlighted in Figure 8) for Auth0 SAML configuration. We will complete the rest of the Amazon OpenSearch SAML configuration after the Auth0 SAML configuration.

Auth0 SAML configuration

Go back to Auth0.com, and navigate to Applications from the left menu. Then select the opensearch application that you created as a part of the Auth0 setup.

  • Click on the Addons tab on the application opensearch.
  • Click on the SAML2 WEB APP, then select Settings to provide SAML URLs from Amazon OpenSearch, as shown in Figure 9:
    • Application Callback URL = https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs (SP-initiated SSO URL)
    • Audience = https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com (Service provider entity ID)
    • Destination = https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_plugin/kibana/_opendistro/_security/saml/acs (SP-initiated SSO URL)
    • Mappings and other configurations shown in Figure 9

{
  "audience": "https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com",
  "destination": "https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_plugin/kibana/_opendistro/_security/saml/acs",
  "mappings": {
    "email": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress",
    "name": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name",
    "groups": "http://schemas.xmlsoap.org/claims/Group"
  },
  "createUpnClaim": false,
  "passthroughClaimsWithNoMapping": false,
  "mapUnknownClaimsAsIs": false,
  "mapIdentities": false,
  "nameIdentifierFormat": "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress",
  "nameIdentifierProbes": [
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"
  ]
}


Figure 9. Configuring Auth0 SAML parameters

  • Click on Enable to save the SAML configurations.
  • Go to the Usage tab, and click on the Download button to download the Identity Provider Metadata (Figure 10).

Figure 10. Downloading Auth0 identity provider metadata for SAML configuration

Amazon OpenSearch SAML configuration

  • Switch back to Amazon OpenSearch domain:
    • Navigate to Amazon OpenSearch console
    • Click on Actions, then click on Modify Security configuration
    • Select Enable SAML authentication check box
  • Under Import IdP metadata section (Figure 11):
    • Metadata from IdP: Import the Auth0 identity provider metadata from downloaded XML file
    • SAML master backend role: opensearch (the Auth0 group). This maps the group value from the SAML assertion to a backend role for group SSO into OpenSearch Dashboards.

Figure 11. Configuring Amazon OpenSearch SAML parameters

  • Under Optional SAML settings (Figure 12):
    • Leave the Subject key blank, because Auth0 provides a NameIdentifier
    • Role key should be http://schemas.xmlsoap.org/claims/Group. Auth0 lets you view a sample assertion during the configuration process by clicking on the DEBUG button in the SAML2 Web App. Tools like SAML-tracer can help you examine and troubleshoot the contents of real assertions.
    • Session time to live (mins): 60

Figure 12. Configuring Amazon OpenSearch optional SAML parameters

Click on the Save changes button to complete the Amazon OpenSearch SAML configuration for OpenSearch Dashboards. We have successfully completed the SAML configuration and are now ready for testing.
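If you prefer to script this step, the same SAML settings can also be applied with the AWS CLI. The following is a minimal sketch, assuming a recent AWS CLI with the aws opensearch namespace; the domain name, IdP entity ID, and metadata content are placeholders, and the downloaded Auth0 metadata XML must be passed as a JSON-escaped string.

# Sketch only: domain name, entity ID, and metadata content are placeholders.
aws opensearch update-domain-config \
    --domain-name <YOUR-DOMAIN-NAME> \
    --advanced-security-options '{
        "SAMLOptions": {
            "Enabled": true,
            "Idp": {
                "MetadataContent": "<JSON-escaped contents of the downloaded Auth0 metadata XML>",
                "EntityId": "<IdP entity ID / issuer from the Auth0 metadata>"
            },
            "MasterBackendRole": "opensearch",
            "RolesKey": "http://schemas.xmlsoap.org/claims/Group",
            "SessionTimeoutMinutes": 60
        }
    }'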

Validating access with Auth0 users

  • Access OpenSearch Dashboards from the previously created OpenSearch cluster. The OpenSearch Dashboards URL can be found as shown in Figure 13. The first access to the OpenSearch Dashboards URL redirects you to the Auth0 login screen.

Figure 13. Validating Auth0 users access with Amazon OpenSearch

  • Now copy and paste the OpenSearch Dashboards URL in your browser, and enter the user credentials.
  • If your OpenSearch domain is hosted within a private VPC, you will not be able to access your OpenSearch Dashboard over the public internet. But you can still use SAML as long as your browser can communicate with both your OpenSearch cluster and your identity provider.
  • You can create a Mac or Windows EC2 instance within the same VPC. This way you can access Amazon OpenSearch Dashboards from your EC2 instance’s web browser to validate your SAML configuration. You can also access Amazon OpenSearch Dashboards through Site-to-Site VPN from an on-premises environment.
  • After successful login, you will be redirected into the OpenSearch Dashboards home page. Explore our sample data and visualizations in OpenSearch Dashboards, as shown in Figure 14.

Figure 14. SAML authenticated Amazon OpenSearch Dashboards

  • You now have successfully federated Amazon OpenSearch Dashboards with Auth0 as an identity provider. You can connect to OpenSearch Dashboards by using your Auth0 credentials.

Cleaning up

After you test out this solution, remember to delete all the resources you created to avoid incurring future charges. Refer to these links:

Conclusion

In this blog post, we have demonstrated how to set up Auth0 as an identity provider over SAML authentication for Amazon OpenSearch Dashboards access. With this solution, you now have an OpenSearch Dashboard that uses Auth0 as the custom identity provider for your users. This reduces the customer login process to one set of credentials and improves employee productivity.

Get started by checking the Amazon OpenSearch Developer Guide, which provides guidance on how to build applications using Amazon OpenSearch for your operational analytics.

How Parametric Built Audit Surveillance using AWS Data Lake Architecture

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/how-parametric-built-audit-surveillance-using-aws-data-lake-architecture/

Parametric Portfolio Associates (Parametric), a wholly owned subsidiary of Morgan Stanley, is a registered investment adviser. Parametric provides investment advisory services to individual and institutional investors around the world. Parametric manages over 100,000 client portfolios with assets under management exceeding $400B (as of 9/30/21).

As a registered investment adviser, Parametric is subject to numerous regulatory requirements. The Parametric Compliance team conducts regular reviews on the firm’s portfolio management activities. To accomplish this, the organization needs both active and archived audit data to be readily available.

Parametric’s on-premises data lake solution was based on an MS-SQL server. They used an Apache Hadoop platform for their data storage, data management, and analytics. Significant gaps existed with the on-premises solution, which complicated audit processes. They were spending a large amount of effort on system maintenance, operational management, and software version upgrades. This required expensive consulting services and created challenges in keeping up with maintenance windows. This limited their agility and impacted their ability to derive more insights and value from their data. In an environment of rapid growth, adoption of more sophisticated analytics tools and processes was slow to evolve.

In this blog post, we will show how Parametric implemented their Audit Surveillance Data Lake on AWS with purpose-built fully managed analytics services. With this solution, Parametric was able to respond to various audit requests within hours rather than days or weeks. This resulted in a 5x cost savings at current data volumes, and the new system can seamlessly support 10x data growth.

Audit surveillance platform

The Parametric data management office (DMO) was previously running their data workloads using an on-premises data lake, which ran on the Hortonworks data platform of Apache Hadoop. This platform wasn’t up to date, and Parametric’s hardware was reaching end-of-life. Parametric was faced with a decision to either reinvest in their on-premises infrastructure or modernize their infrastructure using a modern data analytics platform on AWS. After doing a detailed cost/benefit analysis, the DMO calculated a 5x cost savings by using AWS. They decided to move forward and modernize with AWS due to these cost benefits, in addition to elasticity and security features.

The PPA compliance team asked the DMO to provide an enterprise data service to consume data from a data lake. This data was destined for downstream applications and ad-hoc data querying capabilities. It was accessed via standard JDBC tools and user-friendly business intelligence dashboards. The goal was to ensure that seven years of audit data would be readily available.

The DMO team worked with AWS to conceptualize an audit surveillance data platform architecture and help accelerate the implementation. They attended a series of AWS Immersion Days focusing on AWS fundamentals, Data Lakes, DevOps, Amazon EMR, and serverless architectures. They were later involved in a four-day AWS Data Lab with AWS SMEs to create a data lake. The first use case in this Lab was creating the Audit Surveillance system on AWS.

Audit surveillance architecture on AWS

The following diagram shows the Audit Surveillance data lake architecture on AWS by using AWS purpose-built analytics services.


Figure 1. Audit Surveillance data lake architecture diagram

Architecture flow

  1. User personas: As first step, the DMO team identified three user personas for the Audit Surveillance system on AWS.
    • Data service compliance users who would like to consume audit surveillance data from the data lake into their respective applications through an enterprise data service.
    • Business users who would like to create business intelligence dashboards using a BI tool to audit data for compliance needs.
    • Compliance IT users who would like to perform ad-hoc queries on the data lake for analytics using an interactive query tool.
  2. Data ingestion: Data is ingested into Amazon Simple Storage Service (S3) from different on-premises data sources by using AWS Lake Formation blueprints. AWS Lake Formation provides workflows that define the data source and schedule to import data into the data lake. It is a container for AWS Glue crawlers, jobs, and triggers that are used to orchestrate the process to load and update the data lake.
  3. Data storage: Parametric used Amazon S3 as a data storage to build an Audit Surveillance data lake, as it has unmatched 11 nines of durability and 99.99% availability. The existing Hadoop storage was replaced with Amazon S3. The DMO team created a drop zone (raw), an analytics zone (transformed), and curated (enriched) storage layers for their data lake on AWS.
  4. Data cataloging: AWS Glue Data Catalog was the central catalog used to store and manage metadata for all datasets hosted in the Audit Surveillance data lake. The existing Hadoop metadata store was replaced with AWS Glue Data Catalog. AWS services such as AWS Glue, Amazon EMR, and Amazon Athena, natively integrate with AWS Glue Data Catalog.
  5. Data processing: Amazon EMR and AWS Glue process the raw data and place it into the analytics zone (transformed) and curated zone (enriched) S3 buckets. Amazon EMR was used for big data processing and AWS Glue for standard ETL processes. AWS Lambda and AWS Step Functions were used to initiate monitoring and ETL processes.
  6. Data consumption: After Audit Surveillance data was transformed and enriched, the data was consumed by various personas within the firm as follows:
    • AWS Lambda and Amazon API Gateway were used to support consumption for data service compliance users.
    • Amazon QuickSight was used to create business intelligence dashboards for compliance business users.
    • Amazon Athena was used to query transformed and enriched data for compliance IT users (a sample query sketch follows this list).
  7. Security: AWS Key Management Service (KMS) customer managed keys were used for encryption at rest, and TLS for encryption in transit. Access to the encryption keys is controlled using AWS Identity and Access Management (IAM) and is monitored through detailed audit trails in AWS CloudTrail. Amazon CloudWatch was used for monitoring, and thresholds were created to determine when to send alerts.
  8. Governance: AWS IAM roles were attached to compliance users that permitted the administrator to grant access. This was only given to approved users or programs that went through authentication and authorization through AWS SSO. Access is logged and permissions can be granted or denied by the administrator. AWS Lake Formation is used for fine-grained access controls to grant/revoke permissions at the database, table, or column level.
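As referenced in the data consumption step, compliance IT users can run ad-hoc queries against the curated data with Amazon Athena. The following is a minimal sketch; the query, database name, table name, and results bucket are placeholders, not Parametric's actual schema:

# Run an ad-hoc audit query with Amazon Athena and store the results in S3.
aws athena start-query-execution \
    --query-string "SELECT trade_id, account_id, trade_date FROM audit_trades WHERE trade_date >= DATE '2021-01-01' LIMIT 100" \
    --query-execution-context Database=audit_surveillance_db \
    --result-configuration OutputLocation=s3://my-audit-athena-results/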

Conclusion

The Parametric DMO team successfully replaced their on-premises Audit Surveillance Data Lake. They now have a modern, flexible, highly available, and scalable data platform on AWS, with purpose-built analytics services.

This change resulted in a 5x cost savings, and provides for a 10x data growth. There are now fast responses to internal and external audit requests (hours rather than days or weeks). This migration has given the company access to a wider breadth of AWS analytics services, which offers greater flexibility and options.

Maintaining the on-premises data lake would have required significant investment in both hardware upgrade costs and annual licensing and upgrade vendor consulting fees. Parametric’s decision to migrate their on-premises data lake has yielded proven cost benefits. And it has introduced new functions, services, and capabilities that were previously unavailable to the Parametric DMO.

You may also achieve similar efficiencies and increase scalability by migrating on-premises data platforms into AWS. Read more and get started on building Data Lakes on AWS.

Amazon SES configuration for an external SMTP provider with Auth0

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/messaging-and-targeting/amazon-ses-configuration-for-an-external-smtp-provider-with-auth0/

Many organizations are using an external identity provider to manage user identities. With an identity provider (IdP), customers can manage their user identities outside of AWS and give these external user identities permissions to use AWS resources in customer AWS accounts. The most common requirement when setting up an external identity provider is sending outgoing emails, such as verification emails (using a link or code), welcome emails, MFA enrollment, password change, and blocked account emails. However, most external identity providers’ built-in email infrastructure is intended for testing emails only, so customers need to set up an external SMTP provider for outgoing emails.

Managing and running email servers on-premises, or deploying an EC2 instance dedicated to running an SMTP server, is costly and complex. Customers have to manage operational issues such as hardware, software installation, configuration, patching, and backups.

In this blog post, we will provide step-by-step guidance showing how you can set up Amazon SES as an external SMTP provider with Auth0 to take advantage of Amazon SES capabilities like sending email securely, globally, and at scale.

Amazon Simple Email Service (SES) is a cost-effective, flexible, and scalable email service that enables developers to send email from within any application. You can configure Amazon SES quickly to support several email use cases, including transactional, marketing, or mass email communications.

Auth0 is an identity provider that provides flexible, drop-in solution to add authentication and authorization services (Identity as a Service, or IDaaS) to customer applications. Auth0’s built-in email infrastructure should be used for testing emails only. Auth0 allows you to configure your own SMTP email provider so you can more completely manage, monitor, and troubleshoot your email communications.

Overview of solution

In this blog post, we’ll show you how to perform the following steps to complete the integration between Amazon SES and Auth0:

  • Amazon SES setup for sending emails with SMTP credentials and API credentials
  • Auth0 setup to configure Amazon SES as an external SMTP provider
  • Testing the Configuration

The following diagram shows the architecture of the solution.

Prerequisites

Amazon SES Setup

As a first step, you must configure a “Sandbox” account within Amazon SES and verify a sender email address for initial testing. Once all the setup steps are successful, you can move this account to production, and Amazon SES will then accept all emails. For more details on this topic, please see the Amazon SES documentation.

1. Log in to the Amazon SES console and choose the Verify a New Email Address button.

2. Once the verification is completed, the Verification Status changes to green.
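Steps 1 and 2 can also be performed from the command line. The following is a minimal sketch; the sender address is a placeholder, and you can add --region to target a specific Region:

# Request verification for a sender address (SES emails a confirmation link to it).
aws ses verify-email-identity --email-address sender@example.com

# Check the verification status after you click the confirmation link.
aws ses get-identity-verification-attributes --identities sender@example.com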

3. You need to create SMTP credentials, which will be used by Auth0 for sending emails. To create the credentials, click on SMTP settings in the left menu and press the Create My SMTP Credentials button.

Please note down the Server Name as it will be required during Auth0 setup.

4. Enter a meaningful user name, such as autho-ses-user, and click on the Create button at the bottom right of the page.

5. You can see the SMTP user name and password on the screen, and you can also download the SMTP credentials into a CSV file as shown below.

Please note the SMTP User name and SMTP Password as it will be required during Auth0 setup.

6. You need the Access key ID and Secret access key of the IAM user autho-ses-user (created in step 4) to configure Amazon SES with API credentials in Auth0.

  • Navigate to the AWS IAM console and click on Users in the left menu
  • Click on the autho-ses-user IAM user, and then click on the Security credentials tab

  • Click on the Create access key button to create a new Access key ID and Secret access key. You can see the Access key ID and Secret access key on the screen, and you can also download them into a CSV file as shown below.

Please note down the Access key ID and Secret access key, as they will be required during the Auth0 setup.
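The same key pair can also be created from the command line. A minimal sketch, assuming the autho-ses-user IAM user already exists:

# Create a new Access key ID and Secret access key for the SES IAM user.
# The secret access key is returned only once, so store it securely.
aws iam create-access-key --user-name autho-ses-user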

Auth0 Setup

To ensure that emails can be sent from Auth0 through your Amazon SES SMTP endpoint, you need to configure the Amazon SES details in Auth0. There are two ways to use Amazon SES credentials with Auth0: SMTP credentials or API credentials.

1. Navigate to the Auth0 Dashboard, select Branding, and then choose Email Provider from the left menu. Enable the Use my own email provider option as shown below.

2. Let us start with Auth0 configuration with Amazon SES SMTP credentials.

  • Click on SMTP Provider option as shown below

  • Provide the SMTP provider settings shown below, and then click on the Save button to complete the setup:
    • From: Your from email address.
    • Host: Your Amazon SES server name, as noted in step 3 of the Amazon SES setup. For this example, it is email-smtp.us-west-1.amazonaws.com
    • Port: 465
    • User Name: Your Amazon SES SMTP user name, as shown in step 5 of the Amazon SES setup.
    • Password: Your Amazon SES SMTP password, as shown in step 5 of the Amazon SES setup.

  • Click on the Send test email button to test the Auth0 configuration with Amazon SES SMTP credentials.
  • You can look at the Auth0 logs to validate your test as shown below.

  • If you have configured it successfully, you should receive an email from Auth0 as shown below.

3. Now, complete Auth0 configuration with Amazon SES API credentials.

  • Click on Amazon SES as shown below

  • Provide the Amazon SES settings shown below, and then click on the Save button to complete the setup:
    • From: Your from email address.
    • Key Id: Your autho-ses-user IAM user’s Access key ID, as created in step 6 of the Amazon SES setup.
    • Secret access key: Your autho-ses-user IAM user’s Secret access key, as created in step 6 of the Amazon SES setup.
    • Region: For this example, choose us-west-1.

  • Click on the Send test email button to test the Auth0 configuration with Amazon SES API credentials.
  • You can look at the Auth0 logs, and if you have configured it successfully, you should receive an email from Auth0, as illustrated in the Auth0 configuration with Amazon SES SMTP credentials section.

Conclusion

In this blog post, we have demonstrated how to set up Amazon SES as an external SMTP email provider with Auth0, because Auth0’s built-in email infrastructure is intended for testing emails only. We have also demonstrated how quickly and easily you can set up Amazon SES with either SMTP credentials or API credentials. With this solution, you can use Amazon SES as the email provider for Auth0. You can also get a jump start by checking the Amazon SES Developer Guide, which provides guidance on using Amazon SES as an easy, cost-effective way to send and receive email using your own email addresses and domains.

About the authors

Raghavarao Sodabathina


Raghavarao Sodabathina is an Enterprise Solutions Architect at AWS. His areas of focus are Data Analytics, AI/ML, and the Serverless Platform. He engages with customers to create innovative solutions that address customer business problems and accelerate the adoption of AWS services. In his spare time, Raghavarao enjoys spending time with his family, reading books, and watching movies.

 

Pawan Matta

Pawan Matta is a Boston-based Gametech Solutions Architect for AWS. He enjoys working closely with customers and supporting their digital native business. His core areas of focus are management and governance and cost optimization. In his free time, Pawan loves watching cricket and playing video games with friends.

How to Accelerate Building a Lake House Architecture with AWS Glue

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/how-to-accelerate-building-a-lake-house-architecture-with-aws-glue/

Customers are building databases, data warehouses, and data lake solutions in isolation from each other, each having its own separate data ingestion, storage, management, and governance layers. Often these disjointed efforts to build separate data stores end up creating data silos, data integration complexities, excessive data movement, and data consistency issues. These issues are preventing customers from getting deeper insights. To overcome these issues and easily move data around, a Lake House approach on AWS was introduced.

In this blog post, we illustrate the AWS Glue integration components that you can use to accelerate building a Lake House architecture on AWS. We will also discuss how to derive persona-centric insights from your Lake House using AWS Glue.

Components of the AWS Glue integration system

AWS Glue is a serverless data integration service that facilitates the discovery, preparation, and combination of data. It can be used for analytics, machine learning, and application development. AWS Glue provides all of the capabilities needed for data integration. So you can start analyzing your data and putting it to use in minutes, rather than months.

The following diagram illustrates the various components of the AWS Glue integration system.


Figure 1. AWS Glue integration components

Connect – AWS Glue allows you to connect to various data sources anywhere

Glue connector: AWS Glue provides built-in support for the most commonly used data stores. You can use Amazon Redshift, Amazon RDS, Amazon Aurora, Microsoft SQL Server, MySQL, MongoDB, or PostgreSQL using JDBC connections. AWS Glue also allows you to use custom JDBC drivers in your extract, transform, and load (ETL) jobs. For data stores that are not natively supported, such as SaaS applications, you can use connectors. You can also subscribe to several connectors offered in the AWS Marketplace.

Glue crawlers: You can use a crawler to populate the AWS Glue Data Catalog with tables. A crawler can crawl multiple data stores in a single pass. Upon completion, the crawler creates or updates one or more tables in your Data Catalog. Extract, transform, and load (ETL) jobs that you define in AWS Glue use these Data Catalog tables as sources and targets.
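As an illustration, a crawler over an S3 location can be created and started with a few CLI calls. This is a minimal sketch; the database, role, crawler, and bucket names are placeholders:

# Create a Data Catalog database for the crawled tables.
aws glue create-database --database-input '{"Name": "lakehouse_raw"}'

# Create a crawler that scans an S3 prefix and populates the database.
aws glue create-crawler \
    --name raw-zone-crawler \
    --role AWSGlueServiceRole-LakeHouse \
    --database-name lakehouse_raw \
    --targets '{"S3Targets": [{"Path": "s3://my-lakehouse-bucket/raw/"}]}'

# Run the crawler on demand.
aws glue start-crawler --name raw-zone-crawler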

Catalog – AWS Glue simplifies data discovery and governance

Glue Data Catalog: The Data Catalog serves as the central metadata catalog for the entire data landscape.

Glue Schema Registry: The AWS Glue Schema Registry allows you to centrally discover, control, and evolve data stream schemas. With AWS Glue Schema Registry, you can manage and enforce schemas on your data streaming applications.
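For example, a registry and an initial schema version can be created along the following lines. This is a sketch with placeholder names and a trivial Avro definition:

# Create a schema registry.
aws glue create-registry --registry-name streaming-registry

# Register an Avro schema with BACKWARD compatibility checks.
aws glue create-schema \
    --registry-id RegistryName=streaming-registry \
    --schema-name orders \
    --data-format AVRO \
    --compatibility BACKWARD \
    --schema-definition '{"type":"record","name":"Order","fields":[{"name":"order_id","type":"string"},{"name":"amount","type":"double"}]}'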

Data quality – AWS Glue helps you author and monitor data quality rules

Glue DataBrew: AWS Glue DataBrew allows data scientists and data analysts to clean and normalize data. You can use a visual interface, reducing the time it takes to prepare data by up to 80%. With Glue DataBrew, you can visualize, clean, and normalize data directly from your data lake, data warehouses, and databases.

Curate data: You can use either Glue development endpoint or AWS Glue Studio to curate your data.

AWS Glue development endpoint is an environment that you can use to develop and test your AWS Glue scripts. You can choose either Amazon SageMaker notebook or Apache Zeppelin notebook as an environment.

AWS Glue Studio is a new visual interface for AWS Glue that supports extract-transform-and-load (ETL) developers. You can author, run, and monitor AWS Glue ETL jobs. You can now use a visual interface to compose jobs that move and transform data, and run them on AWS Glue.
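Jobs authored visually in AWS Glue Studio run as regular AWS Glue jobs, which can also be created from the CLI. The following is a minimal sketch, assuming an ETL script already uploaded to S3 and an existing AWS Glue service role; all names and paths are placeholders:

# Create a Spark ETL job from a script stored in S3.
aws glue create-job \
    --name curate-orders-job \
    --role AWSGlueServiceRole-LakeHouse \
    --glue-version "3.0" \
    --command '{"Name": "glueetl", "ScriptLocation": "s3://my-lakehouse-bucket/scripts/curate_orders.py", "PythonVersion": "3"}' \
    --default-arguments '{"--job-language": "python"}'

# Run the job on demand.
aws glue start-job-run --job-name curate-orders-job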

AWS Data Exchange makes it easy for AWS customers to securely exchange and use third-party data in AWS. This is for data providers who want to structure their data across multiple datasets or enrich their products with additional data. You can publish additional datasets to your products using the AWS Data Exchange.

Deequ is an open-source data quality library developed internally at Amazon, for data quality. It provides multiple features such as automatic constraint suggestions and verification, metrics computation, and data profiling.

Build a Lake House architecture faster, using AWS Glue

Figure 2 illustrates how you can build a Lake House using AWS Glue components.


Figure 2. Building Lake House architectures with AWS Glue

The architecture flow follows these general steps:

  1. Glue crawlers scan the data from various data sources and populate the Data Catalog for your Lake House.
  2. The Data Catalog serves as the central metadata catalog for the entire data landscape.
  3. Once data is cataloged, fine-grained access control is applied to the tables through AWS Lake Formation.
  4. Curate your data with business and data quality rules by using Glue Studio, Glue development endpoints, or Glue DataBrew. Place transformed data in a curated Amazon S3 bucket for purpose-built analytics downstream.
  5. Facilitate data movement with AWS Glue to and from your data lake, databases, and data warehouse by using Glue connections. Use AWS Glue Elastic Views to replicate the data across the Lake House.

Derive persona-centric insights from your Lake House using AWS Glue

Many organizations want to gather observations from increasingly larger volumes of acquired data. These insights help them make data-driven decisions with speed and agility. They must use a central data lake, a ring of purpose-built data services, and data warehouses based on persona or job function.

Figure 3 illustrates the Lake House inside-out data movement with AWS Glue DataBrew, Amazon Athena, Amazon Redshift, and Amazon QuickSight to perform persona-centric data analytics.


Figure 3. Lake House persona-centric data analytics using AWS Glue

This shows how Lake House components serve various personas in an organization:

  1. Data ingestion: Data is ingested to Amazon Simple Storage Service (S3) from different sources.
  2. Data processing: Data curators and data scientists use DataBrew to validate, clean, and enrich the data. Amazon Athena is also used to run ad-hoc queries to analyze the data in the lake. The transformation is shared with data engineers to set up batch processing.
  3. Batch data processing: Data engineers or developers set up batch jobs in AWS Glue and AWS Glue DataBrew. Jobs can be initiated by an event, or can be scheduled to run periodically.
  4. Data analytics: Data and business analysts can now analyze the prepared datasets in Amazon Redshift or in Amazon S3 using Athena.
  5. Data visualizations: Business analysts can create visuals in QuickSight. Data curators can enrich data from multiple sources. Admins can enforce security and data governance. Developers can embed QuickSight dashboards in applications.

Conclusion

Using a Lake House architecture will help you get persona-centric insights quickly from all of your data based on user role or job function. In this blog post, we describe several AWS Glue components and AWS purpose-built services that you can use to build Lake House architectures on AWS. We have also presented persona-centric Lake House analytics architecture using AWS Glue, to help you derive insights from your Lake House.

Read more and get started on building Lake House Architectures on AWS.

Analyze Fraud Transactions using Amazon Fraud Detector and Amazon Athena

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/analyze-fraud-transactions-using-amazon-fraud-detector-and-amazon-athena/

Organizations with online businesses have to be on guard constantly for fraudulent activity, such as fake accounts or payments made with stolen credit cards. One way they try to identify fraudsters is by using fraud detection applications. Some of these applications use machine learning (ML).

A common challenge with ML is the need for a large, labeled dataset to create ML models to detect fraud. You will also need the skill set and infrastructure to build, train, deploy, and scale your ML model.

In this post, I discuss how to perform fraud detection on a batch of many events using Amazon Fraud Detector. Amazon Fraud Detector is a fully managed service that can identify potentially fraudulent online activities. These can be situations such as the creation of fake accounts or online payment fraud. Unlike general-purpose ML packages, Amazon Fraud Detector is designed specifically to detect fraud. You can analyze fraud transaction prediction results by using Amazon Athena and Amazon QuickSight. I will explain how to review fraud using Amazon Fraud Detector and Amazon SageMaker built-in algorithms.

Batch fraud prediction use cases

You can use a batch predictions job in Amazon Fraud Detector to get predictions for a set of events that do not require real-time scoring. You may want to generate fraud predictions for a batch of events. These might be payment fraud, account takeover or compromise, and free tier misuse while performing an offline proof of concept. You can also use batch predictions to evaluate the risk of events on an hourly, daily, or weekly basis depending upon your business need.

Batch fraud insights using Amazon Fraud Detector

Organizations such as ecommerce companies and credit card companies use ML to detect fraud. Some of the most common types of fraud include email account compromise (personal or business), new account fraud, and non-payment or non-delivery (which includes compromised card numbers).

Amazon Fraud Detector automates the time-consuming and expensive steps to build, train, and deploy an ML model for fraud detection. Amazon Fraud Detector customizes each model it creates to your dataset, making the accuracy of models higher than current one-size-fits-all ML solutions. And because you pay only for what you use, you can avoid large upfront expenses.

If you want to analyze fraud transactions after the fact, you can perform batch fraud predictions using Amazon Fraud Detector. Then you can store fraud prediction results in an Amazon S3 bucket. Amazon Athena helps you analyze the fraud prediction results. You can create fraud prediction visualization dashboards using Amazon QuickSight.

The following diagram illustrates how to perform fraud predictions for a batch of events and analyze them using Amazon Athena.


Figure 1. Example architecture for analyzing fraud transactions using Amazon Fraud Detector and Amazon Athena

The architecture flow follows these general steps:

  1. Create and publish a detector. First create and publish a detector using Amazon Fraud Detector. It should contain your fraud prediction model and rules. For additional details, see Get started (console).
  2. Create an input Amazon S3 bucket and upload your CSV file. Prepare a CSV file that contains the events you want to evaluate. Then upload your CSV file into the input S3 bucket. In this file, include a column for each variable in the event type associated with your detector. In addition, include columns for EVENT_ID, ENTITY_ID, EVENT_TIMESTAMP, ENTITY_TYPE. Refer to Amazon Fraud Detector batch input and output files for more details. Read Create a variable for additional information on Amazon Fraud Detector variable data types and formatting.
  3. Create an output Amazon S3 bucket. Create an output Amazon S3 bucket to store your Amazon Fraud Detector prediction results.
  4. Perform a batch prediction. You can use a batch predictions job in Amazon Fraud Detector to get predictions for a set of events that do not require real-time scoring (see the CLI sketch after this list). Read more about batch predictions in the Amazon Fraud Detector documentation.
  5. Review your prediction results. Review your results in the CSV file that is generated and stored in the Amazon S3 output bucket.
  6. Analyze your fraud prediction results.
    • After creating a Data Catalog by using AWS Glue, you can use Amazon Athena to analyze your fraud prediction results with standard SQL.
    • You can develop user-friendly dashboards to analyze fraud prediction results using Amazon QuickSight by creating new datasets with Amazon Athena as your data source.
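As referenced in step 4, once the detector is published and the input file is in place, a batch prediction job can be started with a single CLI call. This is a minimal sketch; the job ID, bucket names, event type, detector name, detector version, and role ARN are placeholders:

# Start a batch prediction job against a published detector.
# The IAM role must allow Amazon Fraud Detector to read the input and write the output.
aws frauddetector create-batch-prediction-job \
    --job-id sample-batch-job-001 \
    --input-path s3://my-fraud-input-bucket/events.csv \
    --output-path s3://my-fraud-output-bucket/results/ \
    --event-type-name sample_registration \
    --detector-name sample_detector \
    --detector-version "1" \
    --iam-role-arn arn:aws:iam::123456789012:role/FraudDetectorBatchRole

# Check job progress.
aws frauddetector get-batch-prediction-jobs --job-id sample-batch-job-001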

Fraud detection using Amazon SageMaker

The Amazon Web Services (AWS) Solutions Implementation, Fraud Detection Using Machine Learning, enables you to run automated transaction processing. This can be on an example dataset or your own dataset. The included ML model detects potentially fraudulent activity and flags that activity for review. The diagram following presents the architecture you can automatically deploy using the solution’s implementation guide and accompanying AWS CloudFormation template.

SageMaker provides several built-in machine learning algorithms that you can use for a variety of problem types. This solution leverages the built-in Random Cut Forest algorithm for unsupervised learning and the built-in XGBoost algorithm for supervised learning. In the SageMaker Developer Guide, you can see how Random Cut Forest and XGBoost algorithms work.


Figure 2. Fraud detection using machine learning architecture on AWS

This architecture can be segmented into three phases.

  1. Develop a fraud prediction machine learning model. The AWS CloudFormation template deploys an example dataset of credit card transactions contained in an Amazon Simple Storage Service (Amazon S3) bucket. An Amazon SageMaker notebook instance with different ML models will be trained on the dataset.
  2. Perform fraud prediction. The solution also deploys an AWS Lambda function that processes transactions from the example dataset. It invokes the two SageMaker endpoints that assign anomaly scores and classification scores to incoming data points. An Amazon API Gateway REST API initiates predictions using signed HTTP requests. An Amazon Kinesis Data Firehose delivery stream loads the processed transactions into another Amazon S3 bucket for storage. The solution also provides an example of how to invoke the prediction REST API as part of the Amazon SageMaker notebook.
  3. Analyze fraud transactions. Once the transactions have been loaded into Amazon S3, you can use analytics tools and services for visualization, reporting, ad-hoc queries, and more detailed analysis.

By default, the solution is configured to process transactions from the example dataset. To use your own dataset, you must modify the solution. For more information, see Customization.

Conclusion

In this post, we showed you how to analyze fraud transactions using Amazon Fraud Detector and Amazon Athena. You can build fraud insights using Amazon Fraud Detector and Amazon SageMaker built-in algorithms Random Cut Forest and XGBoost. With the information in this post, you can build your own fraud insights models on AWS. You’ll be able to detect fraud faster. Finally, you’ll be able to solve a variety of fraud types. These can be new account fraud, online transaction fraud, and fake reviews, among others.

Read more and get started on building fraud detection models on AWS.

Architecting Persona-centric Data Platform with On-premises Data Sources

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/architecting-persona-centric-data-platform-with-on-premises-data-sources/

Many organizations are moving their data from silos and aggregating it in one location. Collecting this data in a data lake enables you to perform analytics and machine learning on that data. You can store your data in purpose-built data stores, like a data warehouse, to get quick results for complex queries on structured data.

In this post, we show how to architect a persona-centric data platform with on-premises data sources by using AWS purpose-built analytics services and Apache NiFi. We will also discuss Lake House architecture on AWS, which is the next evolution from data warehouse and data lake-based solutions.

Data movement services

AWS provides a wide variety of services to bring data into a data lake.

You may want to bring on-premises data into the AWS Cloud to take advantage of AWS purpose-built analytics services, derive insights, and make timely business decisions. Apache NiFi is an open source tool that enables you to move and process data using a graphical user interface.

For this use case and solution architecture, we use Apache NiFi to ingest data into Amazon S3 and AWS purpose-built analytics services, based on user personas.

Building persona-centric data platform on AWS

When you are building a persona-centric data platform for analytics and machine learning, you must first identify your user personas. Who will be using your platform? Then choose the appropriate purpose-built analytics services. Envision a data platform analytics architecture as a stack of seven layers:

  1. User personas: Identify your user personas for data engineering, analytics, and machine learning
  2. Data ingestion layer: Bring the data into your data platform and data lineage lifecycle view, while ingesting data into your storage layer
  3. Storage layer: Store your structured and unstructured data
  4. Cataloging layer: Store your business and technical metadata about datasets from the storage layer
  5. Processing layer: Create data processing pipelines
  6. Consumption layer: Enable your user personas for purpose-built analytics
  7. Security and Governance: Protect your data across the layers

Reference architecture

The following diagram illustrates how to architect a persona-centric data platform with on-premises data sources by using AWS purpose-built analytics services and Apache NiFi.


Figure 1. Example architecture for persona-centric data platform with on-premises data sources

Architecture flow:

    1. Identify user personas: You must first identify user personas to derive insights from your data platform. Let’s start with identifying your users:
      • Enterprise data service users who would like to consume data from your data lake into their respective applications.
      • Business users who would like to create business intelligence dashboards by using your data lake datasets.
      • IT users who would like to query data from your data lake by using traditional SQL queries.
      • Data scientists who would like to run machine learning algorithms to derive recommendations.
      • Enterprise data warehouse users who would like to run complex SQL queries on your data warehouse datasets.
    2. Data ingestion layer: Apache NiFi scans the on-premises data stores and ingests the data into your data lake (Amazon S3). Apache NiFi can also transform the data in transit. It supports both Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) data transformations. Apache NiFi also supports the data lineage lifecycle while ingesting data into Amazon S3.
    3. Storage layer: For your data lake storage, we recommend using Amazon S3 to build a data lake. It has unmatched 11 nines of durability and 99.99% availability. You can also create raw, transformed, and enriched storage layers depending upon your use case.
    4. Cataloging layer: AWS Lake Formation provides the central catalog to store and manage metadata for all datasets hosted in the data lake by AWS Glue Data Catalog. AWS services such as AWS Glue, Amazon EMR, and Amazon Athena natively integrate with Lake Formation. They automate discovering and registering dataset metadata into the Lake Formation catalog.
    5. Processing layer: Amazon EMR processes your raw data and places it into a new S3 bucket. Use AWS Glue DataBrew and AWS Glue to process the data as needed.
    6. Consumption layer or persona-centric analytics: Once data is transformed:
      • AWS Lambda and Amazon API Gateway will allow you to develop data services for enterprise data service users
      • You can develop user-friendly dashboards for your business users using Amazon QuickSight
      • Use Amazon Athena to query transformed data for your IT users
      • Your data scientists can utilize AWS Glue DataBrew to clean and normalize the data and Amazon SageMaker for machine learning models
      • Your enterprise data warehouse users can use Amazon Redshift to derive business intelligence
    7. Security and governance layer: AWS IAM provides user-, group-, and role-level identity, in addition to the ability to configure coarse-grained access control for resources managed by AWS services in all layers. AWS Lake Formation provides fine-grained access controls, and you can grant or revoke permissions at the database, table, or column level (a minimal CLI sketch follows this list).
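As an illustration of that fine-grained control, the sketch below grants an analyst role column-level SELECT access on a cataloged table. The account ID, role, database, table, and column names are placeholders:

# Grant SELECT on two columns of a cataloged table to an analyst role.
aws lakeformation grant-permissions \
    --principal DataLakePrincipalIdentifier=arn:aws:iam::123456789012:role/AnalystRole \
    --permissions "SELECT" \
    --resource '{
        "TableWithColumns": {
            "DatabaseName": "sales_db",
            "Name": "orders",
            "ColumnNames": ["order_id", "order_total"]
        }
    }'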

Lake House architecture on AWS

The vast majority of data lakes are built on Amazon S3. At the same time, customers are leveraging purpose-built analytics stores that are optimized for specific use cases. Customers want the freedom to move data between their centralized data lakes and the surrounding purpose-built analytics stores. And they want to get insights with speed and agility in a seamless, secure, and compliant manner. We call this modern approach to analytics the Lake House architecture.


Figure 2. Lake House architecture on AWS

Refer to the whitepaper Derive Insights from AWS Lake House for various design patterns to derive persona-centric analytics by using the AWS Lake House approach. Check out the blog post Build a Lake House Architecture on AWS for a Lake House reference architecture on AWS.

Conclusion

In this post, we show you how to build a persona-centric data platform on AWS with a seven-layered approach. This uses Apache NiFi as a data ingestion tool and AWS purpose-built analytics services for persona-centric analytics and machine learning. We have also shown how to build persona-centric analytics by using the AWS Lake House approach.

With the information in this post, you can now build your own data platform on AWS to gain faster and deeper insights from your data. AWS provides you the broadest and deepest portfolio of purpose-built analytics and machine learning services to support your business needs.

Read more and get started on building a data platform on AWS:

Fine-tuning blue/green deployments on application load balancer

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/devops/blue-green-deployments-with-application-load-balancer/

In a traditional approach to application deployment, you typically fix a failed deployment by redeploying an older, stable version of the application. Redeployment in traditional data centers is typically done on the same set of resources due to the cost and effort of provisioning additional resources. Applying the principles of agility, scalability, and automation capabilities of AWS can shift the paradigm of application deployment. This enables a better deployment technique called blue/green deployment.

Blue/green deployments provide near-zero downtime release and rollback capabilities. The fundamental idea behind blue/green deployment is to shift traffic between two identical environments that are running different versions of your application. The blue environment represents the current application version serving production traffic. In parallel, the green environment is staged running a newer version of your application. After the green environment is ready and tested, production traffic is redirected from blue to green. If any problems are identified, you can roll back by reverting traffic to the blue environment.

Canary deployments are a pattern for the slow rollout of a new version of an existing application. Canary deployments incrementally deploy the new version, exposing it to a small set of users first. As you gain confidence in the deployment, you can roll it out to replace the current version in its entirety.

AWS provides several options to help you automate and streamline your blue/green and canary deployments; one such approach is the Application Load Balancer weighted target group feature. In this post, we will cover the concepts of target group stickiness, load balancer stickiness, and connection draining, and how they influence traffic shifting for canary and blue/green deployments when using the Application Load Balancer weighted target group feature.

Application Load Balancer weighted target groups

A target group is used to route requests to one or more registered targets like Amazon Elastic Compute Cloud (Amazon EC2) instances, fixed IP addresses, or AWS Lambda functions, among others. When creating a load balancer, you create one or more listeners and configure listener rules to direct the traffic to a target group.

Application Load Balancers now support weighted target group routing. With this feature, you can add more than one target group to the forward action of a listener rule, and specify a weight for each group. For example, when you define a rule as having two target groups with weights of 9 and 1, the load balancer routes 90% of the traffic to the first target group and 10% to the other target group. You can create and configure your weighted target groups by using the AWS Console, AWS CLI, or AWS SDK.

For more information, see How do I set up weighted target groups for my Application Load Balancer?

Target group level stickiness

You can set target group stickiness to make sure clients get served from a specific target group for a configurable duration of time to ensure consistent experience. Target group stickiness is different from the already existing load balancer stickiness (also known as sticky sessions). Sticky sessions make sure that the requests from a client are always sticking to a particular target within a target group. Target group stickiness only ensures the requests are sent to a particular target group.

You can enable target group level stickiness using the AWS Command Line Interface (AWS CLI) with the TargetGroupStickinessConfig parameter, as shown in the following CLI command:

aws elbv2 modify-listener \
    --listener-arn "<LISTENER ARN>" \
    --default-actions \
    '[{
       "Type": "forward",
       "Order": 1,
       "ForwardConfig": {
          "TargetGroups": [
             {"TargetGroupArn": "<Blue Target Group ARN>", "Weight": 90},
             {"TargetGroupArn": "<Green Target Group ARN>", "Weight": 10}
          ],
          "TargetGroupStickinessConfig": {
             "Enabled": true,
             "DurationSeconds": 120
          }
       }
    }]'

In the next sections, we will see how to fine-tune the weighted target group configuration to achieve effective canary and blue/green deployments.

Canary deployments with Application Load Balancer weighted target group

The canary deployment pattern allows you to roll out a new version of your application to a subset of users before making it widely available. This can be helpful in validating the stability of a new version of the application or performing A/B testing.

For this use case, you want to perform canary deployment for your application and test it by driving only 10% of the incoming traffic to your new version for 12 hours. You need to create two weighted target groups for your Application Load Balancer and use target group stickiness set to a duration of 12 hours. When target group stickiness is enabled, the requests from a client are sent to the same target group for the specified time duration.


Figure 1: Blue and green target groups with weights 90 and 10 for canary deployment

We can define a rule as having two target groups, blue and green, with weights of 90 and 10, respectively, and enable target group level stickiness with a duration of 12 hours (43,200 seconds). Figure 1 summarizes this configuration. See the following CLI command:

aws elbv2 modify-listener \
    --listener-arn "<LISTENER ARN>" \
    --default-actions \
    '[{
       "Type": "forward",
       "Order": 1,
       "ForwardConfig": {
          "TargetGroups": [
             {"TargetGroupArn": "<Blue Target Group ARN>", "Weight": 90},
             {"TargetGroupArn": "<Green Target Group ARN>", "Weight": 10}
          ],
          "TargetGroupStickinessConfig": {
             "Enabled": true,
             "DurationSeconds": 43200
          }
       }
    }]'

At this point, users with existing sessions continue to be sent to the blue target group running version 1, while 10% of new users without a session are sent to the green target group running version 2 and remain stuck to it for up to 12 hours, as illustrated in the following diagram.

Figure 2: Blue-green deployment architecture with 90% blue traffic and 10% green traffic.
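
While the canary is running, you may want to confirm that the observed traffic roughly matches the configured 90/10 split. One way to do this is to compare the RequestCount metric for each target group in Amazon CloudWatch. The following is a minimal sketch; the dimension values are placeholders that correspond to the final portion of the target group ARN and of the load balancer ARN, and the start and end times are illustrative:

# Total requests served by the green target group over the selected window
aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name RequestCount \
    --dimensions Name=TargetGroup,Value="<Green Target Group dimension>" \
                 Name=LoadBalancer,Value="<Load Balancer dimension>" \
    --start-time "<START TIME>" \
    --end-time "<END TIME>" \
    --period 300 \
    --statistics Sum

Running the same command with the blue target group dimension lets you compare the two sums and validate the split before shifting more traffic.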

When you’re confident that the new version is performing well and is stable, you can update the weights for your blue and green target groups to 0% and 100%, respectively, so that all new traffic is shifted to your green target group. You may still see some traffic flowing into the blue target group from existing users with an active session whose target group stickiness duration (12 hours in this case) has not yet expired.

Recommendation: As illustrated above, the target group stickiness duration still influences the traffic shift between the blue and green target groups. We therefore recommend reducing the target group stickiness duration from 12 hours to 5 minutes or less, depending on your use case, so that existing users stuck to the blue target group also transition to the green target group as quickly as possible. Some of our customers use a target group stickiness duration of 5 minutes to shift their traffic to the green target group after successful canary testing.

The recommended stickiness value may vary across application types. For example, for a typical three-tier front-end deployment, a lower target group stickiness value is desirable. For a middle-tier deployment, however, the target group stickiness duration may need to be higher to account for longer-running transactions.

Blue/green deployments with Application Load Balancer weighted target group

For this use case, you want to perform a blue/green deployment for your application to provide near-zero downtime releases and rollback capabilities. You can create two weighted target groups called blue and green with the following weights applied as an initial configuration.

Figure 3: Blue/green deployment configuration with blue target group 100% and green target group 0%

When you’re ready to perform the deployment, you can change the weights for the blue and green target groups to 0% and 100%, respectively, to shift the traffic completely to the newer version of the application.

Figure 4: Blue/green deployment configuration with blue target group 0% and green target group 100%

When you’re performing a blue/green deployment using weighted target groups, we recommend not enabling target group level stickiness so that traffic shifts immediately from the blue target group to the green target group. See the following CLI command:

aws elbv2 modify-listener \
    --listener-arn "<LISTENER ARN>" \
    --default-actions \
    '[{
       "Type": "forward",
       "Order": 1,
       "ForwardConfig": {
          "TargetGroups": [
             {"TargetGroupArn": "<Blue Target Group ARN>", "Weight": 0},
             {"TargetGroupArn": "<Green Target Group ARN>", "Weight": 100}
          ]
       }
    }]'

The following diagram shows the updated architecture.

Figure 5: Blue-green deployment architecture with 0% blue traffic and 100% green traffic

If you need to enable target group level stickiness, you can ensure that all traffic transitions from the blue target group to the green target group by keeping the target group level stickiness duration as low as possible (5 minutes or less).

In the following code, the target group level stickiness is enabled for a duration of 5 minutes and traffic is completely shifted from the blue target group to the green target group:

aws elbv2 modify-listener \
    --listener-arn "<LISTENER ARN>" \
    --default-actions \
    '[{
       "Type": "forward",
       "Order": 1,
       "ForwardConfig": {
          "TargetGroups": [
             {"TargetGroupArn": "<Blue Target Group ARN>", "Weight": 0},
             {"TargetGroupArn": "<Green Target Group ARN>", "Weight": 100}
          ],
          "TargetGroupStickinessConfig": {
             "Enabled": true,
             "DurationSeconds": 300
          }
       }
    }]'

Existing users with stickiness to the blue target group continue to be routed to the blue target group until the 5-minute duration elapses from their last request.

Recommendation: As illustrated above, the target group stickiness duration still influences the traffic shift between the blue and green target groups. We therefore recommend reducing the target group stickiness duration from 5 minutes to 1 minute or less, depending on your use case, so that all users transition to the green target group as quickly as possible.

As noted earlier, the recommended stickiness value may vary across application types.

Connection draining

To provide near-zero downtime releases with blue/green deployment, you want to avoid breaking open network connections while taking an instance out of service, updating its software, or replacing it with a fresh instance that contains the updated software. In the preceding use cases, you can ensure a graceful transition between the blue and green target groups by enabling the connection draining feature for your Elastic Load Balancers (for Application Load Balancers, this behavior is controlled by the deregistration delay attribute on the target group). You can do this from the AWS Management Console, the AWS CLI, or by calling the ModifyLoadBalancerAttributes action in the Elastic Load Balancing API, and you can set a timeout between 1 second and 1 hour. The appropriate timeout depends on your application profile. If your application is stateless, such as customers browsing your website, a low timeout is preferable. For applications that are transaction heavy or rely on connection-oriented sessions such as WebSockets, we recommend choosing a relatively high connection draining duration, because closing those connections abruptly would adversely impact the customer experience.
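
As a minimal sketch, assuming an Application Load Balancer where connection draining is configured through the deregistration delay target group attribute, the following command sets the delay to 30 seconds (the target group ARN is a placeholder):

# Allow in-flight requests up to 30 seconds to complete before a target is fully deregistered
aws elbv2 modify-target-group-attributes \
    --target-group-arn "<Blue Target Group ARN>" \
    --attributes Key=deregistration_delay.timeout_seconds,Value=30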

Load balancer stickiness

In addition to target group level stickiness, Application Load Balancer also supports load balancer level stickiness. When a load balancer first receives a request from a client, it routes the request to a target, generates a cookie named AWSALB that encodes information about the selected target, encrypts the cookie, and includes the cookie in the response to the client. The client should include the cookie in subsequent requests to the load balancer. When the load balancer receives a request containing the cookie and sticky sessions are enabled for the target group, it detects the cookie and routes the request to the same target. If the cookie is present but can’t be decoded, or if it refers to a target that was deregistered or is unhealthy, the load balancer selects a new target and updates the cookie with information about the new target.

You can enable Application Load Balancer stickiness using the AWS CLI or the AWS Management Console, and you can specify a duration between 1 second and 7 days.
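
The following is a minimal sketch of enabling duration-based sticky sessions on a target group with a 1-day cookie duration; the target group ARN is a placeholder:

# Enable the load balancer-generated (AWSALB) cookie with a 1-day duration
aws elbv2 modify-target-group-attributes \
    --target-group-arn "<Green Target Group ARN>" \
    --attributes Key=stickiness.enabled,Value=true \
                 Key=stickiness.type,Value=lb_cookie \
                 Key=stickiness.lb_cookie.duration_seconds,Value=86400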

In the context of blue/green and canary deployments, load balancer stickiness has no influence on the traffic shifting behavior when you use weighted target groups, because target group stickiness takes precedence over load balancer stickiness.

Conclusion

In this post, we showed how to perform canary and blue/green deployments with the Application Load Balancer weighted target group feature and how target group level stickiness impacts your canary and blue/green deployments. We also showed how you can enable ELB connection draining to provide near-zero downtime releases with blue/green deployment. We hope that you find these recommendations helpful when you build a blue/green deployment with Application Load Balancer. You can reach out to AWS Solutions Architects and AWS Support teams for further assistance.

Raghavarao Sodabathina is an Enterprise Solutions Architect at AWS, focusing on Data Analytics, AI/ML, and Serverless Platform. He engages with customers to create innovative solutions that address customer business problems and accelerate the adoption of AWS services. In his spare time, Raghavarao enjoys spending time with his family, reading books, and watching movies.

Siva Rajamani is a Boston-based Enterprise Solutions Architect for AWS. He enjoys working closely with customers, supporting their digital transformation and AWS adoption journey. His core areas of focus are Serverless, Application Integration, and Security. Outside of work, he enjoys outdoor activities and watching documentaries.

TAGS: blue-green deployments