Implementing patterns that exit early out of a parallel state in AWS Step Functions

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/implementing-patterns-that-exit-early-out-of-a-parallel-state-in-aws-step-functions/

This post is written by Madhav Vishnubhatta, Senior Technical Account Manager, Enterprise Support.

This blog post explains how to implement patterns in AWS Step Functions that break out of a parallel state as soon as a minimum requirement is met. A parallel state normally completes only when all of the parallel flows inside it have completed. If you do not want to wait for every parallel flow to finish before moving to the next step, the patterns in this post show how to exit early.

You can use AWS Step Functions to build visual serverless workflows that orchestrate and coordinate multiple AWS services. This allows you to build complex, stateful, and scalable applications without managing the underlying infrastructure. In Step Functions, the individual steps are called states.

Step Functions offers multiple types of states. Some states help control the logic of the workflow. For example, the choice state enables conditional logic that directs the flow to one of several possible next states, depending on the conditions defined in the state. The parallel state also controls the logic, but rather than choosing one of several next states (as the choice state does), it runs all of its branches concurrently as parallel flows. When all the parallel flows are complete, control moves on to the parallel state's next state.

Patterns that do not need to wait for all parallel flows to finish

Consider a scenario where the Step Functions workflow represents the process of an employee requesting a laptop in your organization. The process begins with a request from the employee as the first step, but the approval of this request could come from either of two IT managers.

In this case there could be two parallel flows, each waiting for an approval from one IT manager. But, as soon as one person provides approval, the workflow can move forward to the next step of actually issuing a laptop to the employee. This is an “either-or” pattern.

Consider a similar use-case but with a slightly different requirement. Instead of just one person’s approval being enough to issue a laptop, what if approval is needed from a minimum of two out of three IT managers before the laptop is issued. This is the “quorum” pattern.

The parallel state does not directly support these two patterns because the state waits for all the flows to complete. In this case, that means all the managers must provide an approval before a laptop can be issued.

Solution overview

Step Functions provides an error handling mechanism with the fail state, which can be used to fail the workflow with an error. This error can be caught downstream in the workflow and handled as needed. Both the either-or and the quorum patterns can be implemented with this fail state along with the error handling capability.

In the either-or pattern, as soon as one of the parallel flows finishes, its fail state throws an error, which is caught outside the parallel state for further processing. Even though the fail state is used, the error might not represent an actual error scenario in your use-case.
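The mechanism can be sketched as two small Amazon States Language fragments, built here as Python dictionaries and printed as JSON. This is a minimal illustration, not the published pattern: the error name "ProcessOneDone" and the state name "HandleProcessOne" are hypothetical.

```python
import json

# Hypothetical names: "ProcessOneDone" and "HandleProcessOne" are
# illustrative and are not taken from the published pattern.

# The last state of a parallel flow: a Fail state with a custom error name.
fail_state = {
    "Type": "Fail",
    "Error": "ProcessOneDone",
    "Cause": "Process 1 finished; this is a signal, not a real failure",
}

# The Catch field on the enclosing Parallel state matches that error name
# and routes control to a state outside the parallel state.
parallel_catch = [
    {
        "ErrorEquals": ["ProcessOneDone"],
        "Next": "HandleProcessOne",
    }
]

print(json.dumps({"FailState": fail_state, "Catch": parallel_catch}, indent=2))
```

The only link between the two fragments is the error name: the catcher runs when the string in the fail state's Error field appears in ErrorEquals.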

The quorum pattern needs an additional mechanism to store the status of each parallel flow, and uses an Amazon DynamoDB table for this. The workflow creates an item in the DynamoDB table at the beginning of the execution, and each parallel flow updates that item as soon as it has completed. Each parallel flow then reads the item to get the number of processes that have completed and compares it against the quorum. If the quorum is met, that flow raises an error with a fail state that can be caught outside the parallel state.
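The update-then-check logic that each flow follows can be sketched in plain Python, with an in-memory object standing in for the DynamoDB item and an exception standing in for the fail state. All names here (ExecutionItem, QuorumMet, mark_complete_and_check) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

QUORUM = 2  # assumed: two out of three approvals, as in the laptop example


class QuorumMet(Exception):
    """Stands in for the fail state that ends the parallel state early."""


@dataclass
class ExecutionItem:
    """Stands in for the DynamoDB item that tracks one workflow execution."""
    execution_id: str
    completed: set = field(default_factory=set)


def mark_complete_and_check(item, process_name, quorum=QUORUM):
    # Step 1: record that this flow's process has completed.
    item.completed.add(process_name)
    # Step 2: read the item back and compare the count against the quorum.
    if len(item.completed) >= quorum:
        raise QuorumMet(f"{len(item.completed)} processes completed")


item = ExecutionItem("example-execution")
mark_complete_and_check(item, "manager-a")      # quorum not met, flow just ends
try:
    mark_complete_and_check(item, "manager-b")  # second completion meets quorum
except QuorumMet as signal:
    print("Exit parallel state:", signal)
```

In the real workflow the flows run concurrently, so the update and the read go to a shared table rather than to a local object.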

Prerequisites

Both of these patterns are published on Serverless Land.

To deploy and use these patterns, you need:

  1. An AWS Account
  2. Access to log in as a user or assume a role that can:
  3. Familiarity with AWS Serverless Application Model (AWS SAM).
  4. AWS SAM Command Line Interface installed.

Example walkthrough

Either-or pattern

To deploy the Either-or pattern, follow the Deployment Instructions section in the GitHub repo. This deployment creates the following resources:

  1. A Step Functions workflow.
  2. An IAM role that is assumed by the Step Functions workflow during execution.

Navigate to the AWS CloudFormation page in the AWS Management Console and choose the stack with the name provided during deployment. Choose the state machine resource in the Resources section of the CloudFormation stack to go to the Step Functions console. Choose Edit and then choose Workflow Studio to see a visual representation of the workflow.

You can see the exported workflow in the GitHub repo. This is the logic of the workflow:

Either-or pattern. Conceptual flow.

  1. There are three (numbered) parallel flows in this workflow.
  2. Flows #1 and #2 are the main parallel flows. When either one completes, control moves outside the parallel state.
  3. Flow #3 is the timeout flow, so that the workflow can exit after a set amount of time if neither of the other two parallel flows completes by then.
  4. Each of the two main parallel flows follows the following logic:
    • Wait for the process to complete. This is a filler and can be replaced with your business logic on how to monitor process completion. This could be a human approval, or any other job that needs to finish.
    • Once the process is complete, throw a dummy error, which moves control outside the parallel state.
  5. The dummy errors for the two flows are caught outside the parallel state, each with a corresponding catch condition.
  6. The errors from the two flows need not be caught separately. You might just do the same action no matter which of the parallel flows finished, but I show separate steps in case you need to do something different based on which parallel flow finished.
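The steps above can be sketched as a complete state machine definition. The following Python script builds one and prints it as Amazon States Language JSON. It is an illustration under stated assumptions rather than the exported workflow from the GitHub repo: every state name, error name, and wait duration is hypothetical, and the Wait states are fillers for your business logic.

```python
import json


def main_flow(name, wait_seconds):
    """One main flow: a filler wait, then a Fail state that signals completion."""
    return {
        "StartAt": f"WaitFor{name}",
        "States": {
            f"WaitFor{name}": {
                "Type": "Wait",  # filler: replace with your business logic
                "Seconds": wait_seconds,
                "Next": f"{name}Done",
            },
            f"{name}Done": {
                "Type": "Fail",
                "Error": f"{name}Done",  # dummy error caught by the Parallel state
                "Cause": f"{name} completed",
            },
        },
    }


def timeout_flow(seconds):
    """The timeout flow: wait a fixed time, then raise a timeout error."""
    return {
        "StartAt": "WaitForTimeout",
        "States": {
            "WaitForTimeout": {"Type": "Wait", "Seconds": seconds, "Next": "TimedOut"},
            "TimedOut": {
                "Type": "Fail",
                "Error": "TimedOut",
                "Cause": "No main flow completed in time",
            },
        },
    }


definition = {
    "StartAt": "EitherOr",
    "States": {
        "EitherOr": {
            "Type": "Parallel",
            "Branches": [
                main_flow("ProcessOne", 10),
                main_flow("ProcessTwo", 20),
                timeout_flow(60),
            ],
            # One catcher per flow, so each outcome can be handled differently.
            "Catch": [
                {"ErrorEquals": ["ProcessOneDone"], "Next": "AfterProcessOne"},
                {"ErrorEquals": ["ProcessTwoDone"], "Next": "AfterProcessTwo"},
                {"ErrorEquals": ["TimedOut"], "Next": "AfterTimeout"},
            ],
            "End": True,
        },
        "AfterProcessOne": {"Type": "Pass", "End": True},
        "AfterProcessTwo": {"Type": "Pass", "End": True},
        "AfterTimeout": {"Type": "Pass", "End": True},
    },
}

print(json.dumps(definition, indent=2))
```

If the same action follows regardless of which flow finished, the two main catchers can be merged into a single entry whose ErrorEquals lists both error names.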

To test the workflow, follow the instructions provided in the Testing section of the README file at the GitHub repo.

To clean up the resources created, run:

sam delete

Quorum pattern

To deploy the Quorum pattern, follow the Deployment Instructions section in the GitHub repo. This deployment creates the following resources:

  1. A Step Functions workflow.
  2. An IAM role that is assumed by the Step Functions workflow during execution.
  3. A DynamoDB Table called “QuorumWorkflowTable”.

Navigate to CloudFormation in the AWS Management Console and choose the stack with the name provided during deployment. Choose the state machine resource in the Resources section of the CloudFormation stack to go to the Step Functions console.

Choose Edit and then choose Workflow Studio to see a visual representation of the workflow.

You can see the exported workflow in the GitHub repo. This is the logic of the workflow:

Quorum pattern. Conceptual flow.

  1. The first step creates an entry in the DynamoDB table with the execution ID of the workflow’s execution. This item in the table tracks the completion of processes.
  2. The next state is the parallel state, which has three main parallel flows and a fourth timeout flow. All four flows are numbered.
  3. Flows #1, #2, and #3 are the main parallel flows. When any two of them complete, control moves outside the parallel state.
  4. Flow #4 is the timeout flow, so that the workflow can exit after a set amount of time if the quorum has not been met by then.
  5. Each of the three main parallel flows uses the following logic:
    • Wait for the process to complete.
    • Once complete, update the DynamoDB table entry to mark the completion of the process.
    • After the update, query the item from DynamoDB to get the list of processes that have completed and check if the quorum has been met.
    • If the quorum has been met, raise an “Error” (which is actually a success criterion in terms of business case), to move the control to outside the parallel state.
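One main flow of this pattern can be sketched as an Amazon States Language branch, again built in Python and printed as JSON. The table name comes from the deployment described above; everything else is an assumption made for illustration, including the key attribute "executionId", the counter attribute "completedCount", the state names, and the error name "QuorumMet". The sketch also assumes the execution input is a JSON object, so that the read result can be stored under "$.status".

```python
import json

TABLE = "QuorumWorkflowTable"
# Assumed key attribute name; the value is the execution ID from the context object.
KEY = {"executionId": {"S.$": "$$.Execution.Id"}}
# Assumed counter attribute; DynamoDB returns numbers as strings under "N".
COUNT_PATH = "$.status.Item.completedCount.N"


def quorum_flow(name, wait_seconds, passing_counts=("2", "3")):
    """One main flow: wait, mark completion, read the item, check the quorum."""
    return {
        "StartAt": f"WaitFor{name}",
        "States": {
            f"WaitFor{name}": {
                "Type": "Wait",  # filler for the real process
                "Seconds": wait_seconds,
                "Next": f"Mark{name}Done",
            },
            f"Mark{name}Done": {
                "Type": "Task",
                "Resource": "arn:aws:states:::dynamodb:updateItem",
                "Parameters": {
                    "TableName": TABLE,
                    "Key": KEY,
                    "UpdateExpression": "ADD completedCount :one",
                    "ExpressionAttributeValues": {":one": {"N": "1"}},
                },
                "ResultPath": None,  # discard the result, keep the input
                "Next": f"Read{name}Status",
            },
            f"Read{name}Status": {
                "Type": "Task",
                "Resource": "arn:aws:states:::dynamodb:getItem",
                "Parameters": {"TableName": TABLE, "Key": KEY, "ConsistentRead": True},
                "ResultPath": "$.status",
                "Next": f"Is{name}Quorum",
            },
            f"Is{name}Quorum": {
                "Type": "Choice",
                "Choices": [
                    {
                        # With three flows, a quorum of two means a count of 2 or 3.
                        "Or": [
                            {"Variable": COUNT_PATH, "StringEquals": count}
                            for count in passing_counts
                        ],
                        "Next": f"{name}QuorumMet",
                    }
                ],
                "Default": f"{name}NoQuorumYet",
            },
            f"{name}QuorumMet": {
                "Type": "Fail",
                "Error": "QuorumMet",  # a success signal in business terms
                "Cause": "Enough processes have completed",
            },
            f"{name}NoQuorumYet": {"Type": "Succeed"},
        },
    }


branch = quorum_flow("ProcessOne", 10)
print(json.dumps(branch, indent=2))
```

The enclosing parallel state would then need a catcher whose ErrorEquals contains "QuorumMet", plus one for the timeout flow's error. A flow that finishes before the quorum is met simply ends, and the parallel state keeps waiting on the remaining flows.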

To test the workflow, follow the instructions provided in the Testing section of the README file at the GitHub repo.

To clean up the resources created, run:

sam delete

Conclusion

This blog post shows how you can implement patterns that must exit early out of a parallel state in an AWS Step Functions workflow.

The use-cases for this approach are not limited to these two patterns. More complicated use-cases like having different combinations of conditions to exit a parallel state can all be implemented using parallel and fail states.

Visit Serverless Land for more Step Functions workflow patterns.