All posts by Wang Rui

How to build custom nodes workflow with ComfyUI on Amazon EKS

Post Syndicated from Wang Rui original https://aws.amazon.com/blogs/architecture/how-to-build-custom-nodes-workflow-with-comfyui-on-amazon-eks/

ComfyUI is an open-source, node-based workflow solution for Stable Diffusion that is increasingly used by creators. We previously published a blog post and solution showing how to deploy ComfyUI on AWS.

Typically, ComfyUI users use various custom nodes, which extend the capabilities of ComfyUI, to build their own workflows, often using ComfyUI-Manager to conveniently install and manage their custom nodes.

Following our blog post, we received numerous customer requests to integrate ComfyUI custom nodes into our solution. This post will guide you through the process of integrating custom nodes within ComfyUI-on-EKS.

Architecture overview

Figure 1. Architecture diagram showing the ComfyUI integration with Amazon EKS

To integrate custom nodes into the ComfyUI-on-EKS solution, we need to prepare the custom node code and environment, as well as the models the nodes require:

  • Code and Environment: Custom node code is placed in $HOME/ComfyUI/custom_nodes, and the environment is prepared by running pip install -r on all requirements.txt files in the custom node directories (any dependency conflicts between custom nodes need to be handled separately). Any system packages required by the custom nodes should also be installed. All these operations are performed through the Dockerfile, which builds an image containing the required custom nodes.
  • Models: Models used by custom nodes are placed in different directories under s3://comfyui-models-{account_id}-{region}. Uploading a model triggers a Lambda function that sends commands to all GPU nodes to synchronize the newly uploaded models to the local instance store.

We’ll use the Stable Video Diffusion (SVD) – Image to video generation with high FPS workflow as an example to illustrate how to integrate custom nodes (you can also use your own workflow).

Build docker image

Loading this workflow in ComfyUI displays the missing custom nodes. Next, we will build those custom nodes into the Docker image.

Figure 2. Error message showing the missing node types

There are two ways to build the image:

  • Build from GitHub: In the Dockerfile, download the code for each custom node and set up the environment and dependencies separately.
  • Build locally: Copy all the custom nodes from your local Dev environment into the image and set up the environment and dependencies.

Before building the image, switch to the corresponding branch:

git clone https://github.com/aws-samples/comfyui-on-eks ~/comfyui-on-eks
cd ~/comfyui-on-eks && git checkout custom_nodes_demo

Build from GitHub

Install the custom nodes and their dependencies with RUN commands in the Dockerfile. You’ll need to find the GitHub URLs for all missing custom nodes.

...
RUN apt-get update && apt-get install -y \
    git \
    python3.10 \
    python3-pip \
    # needed by custom node ComfyUI-VideoHelperSuite
    libsm6 \
    libgl1 \
    libglib2.0-0
...
# Custom nodes demo of https://comfyworkflows.com/workflows/bf3b455d-ba13-4063-9ab7-ff1de0c9fa75

## custom node ComfyUI-Stable-Video-Diffusion
RUN cd /app/ComfyUI/custom_nodes && git clone https://github.com/thecooltechguy/ComfyUI-Stable-Video-Diffusion.git && cd ComfyUI-Stable-Video-Diffusion/ && python3 install.py
## custom node ComfyUI-VideoHelperSuite
RUN cd /app/ComfyUI/custom_nodes && git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite.git && pip3 install -r ComfyUI-VideoHelperSuite/requirements.txt
## custom node ComfyUI-Frame-Interpolation
RUN cd /app/ComfyUI/custom_nodes && git clone https://github.com/Fannovel16/ComfyUI-Frame-Interpolation.git && cd ComfyUI-Frame-Interpolation/ && python3 install.py
...

Refer to comfyui-on-eks/comfyui_image/Dockerfile.github for the complete Dockerfile.

Run the following command to build and push the Docker image:

region="us-west-2" # Modify the region to your current region.
cd ~/comfyui-on-eks/comfyui_image/ && bash build_and_push.sh $region Dockerfile.github

Building from GitHub provides a clear understanding of the installation method, version, and environmental dependencies for each custom node, providing better control over the entire ComfyUI environment.

However, when there are too many custom nodes, installation and management can be time-consuming, and you need to find the URL for each custom node yourself (on the other hand, this can also be seen as a pro, as it makes you more familiar with the entire ComfyUI environment).
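When the list of custom nodes grows, the RUN lines can be generated from a list of repository URLs instead of being written by hand. The following is a minimal sketch under assumptions, not part of the comfyui-on-eks repository: the function name is hypothetical, and so is the rule "install requirements.txt if present, then run install.py if present". Each node's own README remains the authority on how it should be installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Dockerfile RUN line per custom node URL.
# Assumes ComfyUI lives in /app/ComfyUI inside the image, as in the Dockerfile above.
emit_custom_node_runs() {
    local url name
    for url in "$@"; do
        # Derive the clone directory from the URL: drop the path and a trailing ".git".
        name="$(basename "$url" .git)"
        printf '## custom node %s\n' "$name"
        printf 'RUN cd /app/ComfyUI/custom_nodes && git clone %s && cd %s && ' "$url" "$name"
        # At image build time, run whichever install convention the node provides.
        printf '(test ! -f requirements.txt || pip3 install -r requirements.txt) && '
        printf '(test ! -f install.py || python3 install.py)\n'
    done
}
```

Calling emit_custom_node_runs with the three URLs used above prints three commented RUN lines that can be pasted into the Dockerfile and then adjusted per node.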

Build locally

Often, we use ComfyUI-Manager to install missing custom nodes. ComfyUI-Manager hides the installation details, so it is not always clear which custom nodes have been installed. In this case, we can build the image by using COPY in the Dockerfile to copy the entire ComfyUI directory (except the input, output, models, and other directories) into the image.

The prerequisite for building the image locally is that you already have a working ComfyUI environment with custom nodes. In the same directory as ComfyUI, create a .dockerignore file with the following content, so that these directories are ignored when building the Docker image:

ComfyUI/models
ComfyUI/input
ComfyUI/output
ComfyUI/custom_nodes/ComfyUI-Manager

Copy the two files comfyui-on-eks/comfyui_image/Dockerfile.local and comfyui-on-eks/comfyui_image/build_and_push.sh to the same directory as your local ComfyUI, like this:

ubuntu@comfyui:~$ ll
-rwxrwxr-x  1 ubuntu ubuntu       792 Jul 16 10:27 build_and_push.sh*
drwxrwxr-x 19 ubuntu ubuntu      4096 Jul 15 08:10 ComfyUI/
-rw-rw-r--  1 ubuntu ubuntu       784 Jul 16 10:41 Dockerfile.local
-rw-rw-r--  1 ubuntu ubuntu        81 Jul 16 10:45 .dockerignore
...

Dockerfile.local builds the image by copying the ComfyUI directory with COPY:

...
# Python Env
RUN pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121
COPY ComfyUI /app/ComfyUI
RUN pip3 install -r /app/ComfyUI/requirements.txt

# Custom Nodes Env, may encounter some conflicts
RUN find /app/ComfyUI/custom_nodes -maxdepth 2 -name "requirements.txt"|xargs -I {} pip install -r {}
...

Refer to comfyui-on-eks/comfyui_image/Dockerfile.local for the complete Dockerfile.
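Because the find and pip install step may hit conflicting requirements, it can help to look for them before building. The sketch below is a rough, hypothetical pre-build check, not something the repository provides: it only understands plain "name" or "name plus version specifier" lines, skips pip options such as -r and -e, and does not handle extras or URLs.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check: report packages that different custom nodes
# request with different version specifiers.
find_requirement_conflicts() {
    local nodes_dir="$1" file
    find "$nodes_dir" -maxdepth 2 -name requirements.txt |
        while IFS= read -r file; do
            cat "$file"
            echo    # guard against files that lack a trailing newline
        done |
        sed -e 's/#.*//' -e 's/[[:space:]]//g' |
        awk 'NF && !/^-/ {
            spec = tolower($0)
            name = spec
            sub(/[<>=!~;].*/, "", name)      # keep only the package name
            if (!((name, spec) in seen)) {
                seen[name, spec] = 1
                count[name]++
                specs[name] = specs[name] " " spec
            }
        }
        END {
            # Print only packages seen with more than one distinct specifier.
            for (name in count)
                if (count[name] > 1) print name ":" specs[name]
        }' |
        sort
}
```

Running find_requirement_conflicts ComfyUI/custom_nodes prints one line per package that is requested with more than one distinct specifier. An empty result does not guarantee a conflict-free install, since pip resolves transitive dependencies that this check never sees.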

Run the following command to build and upload the Docker image:

region="us-west-2" # Modify the region to your current region.
bash build_and_push.sh $region Dockerfile.local

With this method, you can easily and quickly build your local Dev environment into an image for deployment, without paying attention to the installation, version, and dependency details of custom nodes when there are many of them.

However, not paying attention to the deployment environment of custom nodes may cause conflicts or missing dependencies, which need to be manually tested and resolved.

Upload models

Upload all the models needed for the workflow to the corresponding directory under s3://comfyui-models-{account_id}-{region} using your preferred method. The GPU nodes will automatically sync from Amazon S3 (triggered by Lambda). If the models are large and numerous, you might need to wait. You can log into the GPU nodes using the aws ssm start-session --target ${instance_id} command and use the ps command to check the progress of the aws s3 sync process.
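One possible upload method is the AWS CLI. The sketch below assumes the AWS CLI is installed and configured. The function names are hypothetical, and the bucket name follows the comfyui-models-{account_id}-{region} pattern described above.

```shell
#!/usr/bin/env bash
# Build the models bucket URI from the naming pattern used by the solution.
# The optional third argument is a sub-directory such as "svd".
models_s3_uri() {
    local account_id="$1" region="$2" subdir="${3:-}"
    local uri="s3://comfyui-models-${account_id}-${region}"
    if [ -n "$subdir" ]; then
        uri="${uri}/${subdir%/}"    # tolerate a trailing slash on the sub-directory
    fi
    printf '%s/\n' "$uri"
}

# Upload one local model file into a sub-directory of the models bucket.
# Requires AWS credentials; the local file path is the caller's choice.
upload_model() {
    local file="$1" subdir="$2" region="$3" account_id
    account_id="$(aws sts get-caller-identity --query Account --output text)" || return 1
    aws s3 cp "$file" "$(models_s3_uri "$account_id" "$region" "$subdir")"
}
```

For example, upload_model ./my_model.safetensors svd us-west-2 copies a local file (the file name here is a placeholder) into the svd/ directory of the models bucket, after which the Lambda-triggered sync distributes it to the GPU nodes.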

To set up this demo, you need to download the following models to s3://comfyui-models-{account_id}-{region}/svd/:

Test the Docker image locally (optional)

Since there are many types of custom nodes with different dependencies and versions, the runtime environment is quite complex. We recommend testing the Docker image locally after building it to ensure it runs correctly.

Refer to the code in comfyui-on-eks/comfyui_image/test_docker_image_locally.sh. Prepare the models and input directories (assuming the models and input images are stored in /home/ubuntu/ComfyUI/models and /home/ubuntu/ComfyUI/input respectively), and run the script to test the Docker image:

bash comfyui-on-eks/comfyui_image/test_docker_image_locally.sh
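The repository script is the reference for this step. As a rough idea of what such a local test involves, the sketch below starts the image with the models and input directories mounted. It rests on several assumptions that are not confirmed by the post: the function name is hypothetical, the image is assumed to serve ComfyUI on its default port 8188, the container paths follow the /app/ComfyUI layout from the Dockerfile, and the host is assumed to have the NVIDIA container runtime available for --gpus.

```shell
#!/usr/bin/env bash
# Hypothetical local smoke test: run the image with models and input mounted.
# Set DRY_RUN=1 to print the docker command instead of executing it.
run_image_locally() {
    local image="$1" comfy_home="${2:-/home/ubuntu/ComfyUI}"
    # Reuse the positional parameters to hold the command, one word per argument.
    set -- docker run --rm --gpus all -p 8188:8188 \
        -v "$comfy_home/models:/app/ComfyUI/models" \
        -v "$comfy_home/input:/app/ComfyUI/input" \
        "$image"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

With the container running, loading the workflow in a browser on port 8188 of the host is a quick way to confirm that no custom nodes are reported missing.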

Rolling update K8S pods

Use your preferred method to perform a rolling update of the image for the online K8S pods, and then test the service.
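One common method is kubectl rollout. The sketch below assumes the image was pushed under the same tag the Deployment already references (for example latest), so a restart is enough for new pods to pull it. The function name is hypothetical, and the Deployment name is a placeholder to be replaced with the one in your manifests.

```shell
#!/usr/bin/env bash
# Restart a Deployment so its pods are recreated with the freshly pushed image,
# then wait for the rollout to finish. KUBECTL can be overridden for dry runs.
rolling_restart() {
    local deployment="$1"
    local kubectl="${KUBECTL:-kubectl}"
    "$kubectl" rollout restart "deployment/$deployment" &&
        "$kubectl" rollout status "deployment/$deployment" --timeout=15m
}
```

Setting KUBECTL=echo prints the two commands without touching the cluster. If each build is pushed under a new tag instead, kubectl set image followed by kubectl rollout status achieves the same result.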

Note that to run this demo, you need to:

  • Use a g5.2xlarge GPU node
  • Set a lower num_frames in the Load Stable Video Diffusion Model node (for example, 6)
  • Set a lower decoding_t in the Stable Video Diffusion Decoder node (for example, 1)

Figure 3. Screenshot showing the rolling update demo

Conclusion

Custom nodes empower creators to unleash the full potential of ComfyUI by seamlessly integrating a wide range of capabilities into their own workflows.

This article demonstrates how to build custom nodes into the ComfyUI-on-EKS solution. You can build your own ComfyUI CI/CD pipeline by following these instructions.

Deploy Stable Diffusion ComfyUI on AWS elastically and efficiently

Post Syndicated from Wang Rui original https://aws.amazon.com/blogs/architecture/deploy-stable-diffusion-comfyui-on-aws-elastically-and-efficiently/

Introduction

ComfyUI is an open-source node-based workflow solution for Stable Diffusion. It offers the following advantages:

  • Significant performance optimization for SDXL model inference
  • High customizability, allowing users granular control
  • Portable workflows that can be shared easily
  • Developer-friendly

Due to these advantages, ComfyUI is increasingly being used by artistic creators. In this post, we will introduce how to deploy ComfyUI on AWS elastically and efficiently.

Overview of solution

The solution is characterized by the following features:

  • Infrastructure as Code (IaC) deployment: We employ a minimalist approach to operations and maintenance. Using AWS Cloud Development Kit (AWS CDK) and Amazon Elastic Kubernetes Service (Amazon EKS) Blueprints, we manage the Amazon EKS clusters that host and run ComfyUI.
  • Dynamic scaling with Karpenter: Leveraging the capabilities of Karpenter, we customize node scaling strategies to meet business needs.
  • Cost savings with Amazon EC2 Spot Instances: We use Spot Instances to reduce the costs of GPU instances.
  • Optimized use of GPU instance store: By fully utilizing the instance store of GPU instances, we maximize performance for model loading and switching while minimizing the costs associated with model storage and transfer.
  • Direct image writing with Amazon Simple Storage Service (Amazon S3) CSI driver: Images generated are directly written to Amazon S3 using the S3 CSI driver, reducing storage costs.
  • Accelerated dynamic requests with Amazon CloudFront: To facilitate the use of the platform by art studios across different regions, we use Amazon CloudFront for faster dynamic request processing.
  • Serverless event-initiated model synchronization: When models are uploaded to or deleted from Amazon S3, serverless event initiations activate, syncing the model directory data across worker nodes.

Walkthrough

The solution’s architecture is structured into two distinct phases: the deployment phase and the user interaction phase.

Figure 1. Architecture for deploying stable diffusion on ComfyUI

Deployment phase

  1. Model storage in Amazon S3: ComfyUI’s models are stored in an Amazon S3 bucket for models, following the same directory structure as the native ComfyUI/models directory.
  2. GPU node initialization in Amazon EKS cluster: When GPU nodes in the EKS cluster are initiated, they format the local instance store and synchronize the models from Amazon S3 to the local instance store using user data scripts.
  3. Running ComfyUI pods in EKS: Pods operating ComfyUI effectively link the instance store directory on the node to the pod’s internal models directory, facilitating seamless model access and loading.
  4. Model sync with AWS Lambda: When models are uploaded to or deleted from Amazon S3, an AWS Lambda function synchronizes the models from S3 to the local instance store on all GPU nodes by using SSM commands.
  5. Output mapping to Amazon S3: Pods running ComfyUI map the ComfyUI/output directory to the S3 bucket for outputs through a Persistent Volume Claim (PVC).
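Step 4 can be pictured with the AWS CLI equivalent of what the Lambda function does through SSM. This is only an illustration, not the Lambda code from the solution. The function name, the Name tag value used to select GPU nodes, and the local directory are hypothetical, and the use of --delete to mirror deletions is an assumption.

```shell
#!/usr/bin/env bash
# Ask every instance with a given Name tag to sync the models bucket to local disk.
# AWS_CLI can be overridden (for example with "echo") to inspect the call.
sync_models_to_gpu_nodes() {
    local bucket="$1" local_dir="$2" tag_value="$3"
    local aws="${AWS_CLI:-aws}"
    "$aws" ssm send-command \
        --document-name "AWS-RunShellScript" \
        --targets "Key=tag:Name,Values=$tag_value" \
        --parameters "commands=[\"aws s3 sync s3://$bucket $local_dir --delete\"]"
}
```

The same aws s3 sync command is what you would look for with ps on a GPU node while a synchronization is in progress.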

User interaction phase

  1. Request routing: When a user request reaches the Amazon EKS pod through CloudFront to ALB, the pod first loads the model from the instance store.
  2. Post-inference image storage: After inference, the pod stores the image in the ComfyUI/output directory, which is directly written to Amazon S3 using the S3 CSI driver.
  3. Performance advantages of instance store: Thanks to the performance benefits of the instance store, the time taken for initial model loading and model switching is significantly reduced.

You can find the deployment code and detailed instructions in our GitHub samples library.

Image Generation

Once deployed, you can access and use the ComfyUI frontend directly through a browser by visiting the domain name of CloudFront or the domain name of Kubernetes Ingress.

Figure 2. Accessing ComfyUI through a browser

You can also interact with ComfyUI by saving its workflow as an API-callable JSON file.

Figure 3. Accessing ComfyUI through an API
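A workflow saved in API format can be queued by posting it to ComfyUI's /prompt endpoint, wrapped in a JSON object under the prompt key. The sketch below shows the idea with curl. The function names are hypothetical, the server address is whatever CloudFront or ingress domain you deployed, and plain HTTP is assumed.

```shell
#!/usr/bin/env bash
# Wrap a workflow saved in API format into the body expected by POST /prompt.
build_prompt_payload() {
    local workflow_file="$1"
    printf '{"prompt": %s}\n' "$(cat "$workflow_file")"
}

# Queue the workflow on a ComfyUI server given as host or host:port.
queue_prompt() {
    local server="$1" workflow_file="$2"
    build_prompt_payload "$workflow_file" |
        curl -sS -X POST -H 'Content-Type: application/json' \
            --data-binary @- "http://$server/prompt"
}
```

The workflow file is inserted verbatim, so it must already be valid JSON. The test script in the repository, invoke_comfyui_api.py, covers the same interaction in Python.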

Deployment Instructions

Prerequisites

This solution assumes that you have already installed, deployed, and are familiar with the following tools:

Make sure that you have enough vCPU quota for G instances (at least 8 vCPUs for the g5.2xlarge/g4dn.2xlarge instances used in this guidance).

  1. Download the code, check out the branch, install npm packages, and check the environment:
    git clone https://github.com/aws-samples/comfyui-on-eks ~/comfyui-on-eks
    cd ~/comfyui-on-eks && git checkout v0.2.0
    npm install
    npm list
    cdk list
  2. Run npm list to ensure that the required packages are installed.
  3. Run cdk list to confirm that the environment is set up. It should list the following AWS CloudFormation stacks to deploy:
    Comfyui-Cluster
    CloudFrontEntry
    LambdaModelsSync
    S3OutputsStorage
    ComfyuiEcrRepo
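The check in step 3 can be automated by comparing the output of cdk list against the five expected stack names. This is a small sketch with a hypothetical function name. It reads the stack list from standard input so that it can be tried without CDK installed.

```shell
#!/usr/bin/env bash
# Read stack names from stdin (one per line) and print any expected stack
# that is missing. Returns non-zero if at least one stack is missing.
missing_stacks() {
    local expected="Comfyui-Cluster CloudFrontEntry LambdaModelsSync S3OutputsStorage ComfyuiEcrRepo"
    local listed stack status=0
    listed="$(cat)"
    for stack in $expected; do
        # -x matches whole lines only, so partial names do not count.
        if ! printf '%s\n' "$listed" | grep -qx -- "$stack"; then
            echo "$stack"
            status=1
        fi
    done
    return "$status"
}
```

Piping cdk list into missing_stacks prints nothing and returns success when all five stacks are present.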

Deploy EKS Cluster

  1. Run the following command:
    cd ~/comfyui-on-eks && cdk deploy Comfyui-Cluster
  2. CloudFormation will create a stack named Comfyui-Cluster to deploy all the resources required for the EKS cluster. This process typically takes around 20 to 30 minutes to complete.
  3. Upon successful deployment, the CDK outputs will present a ConfigCommand. This command is used to update the configuration, enabling access to the EKS cluster via kubectl.

    Figure 4. ConfigCommand output screenshot

  4. Execute the ConfigCommand to authorize kubectl to access the EKS cluster.
  5. To verify that kubectl has been granted access to the EKS cluster, execute the following command:
    kubectl get svc

The deployment of the EKS cluster is complete. Note that EKS Blueprints has output KarpenterInstanceNodeRole, which is the role for the nodes managed by Karpenter. Record this role; it will be configured later.

Deploy an Amazon S3 bucket for storing models and set up AWS Lambda for dynamic model synchronization

  1. Run the following command:
    cd ~/comfyui-on-eks && cdk deploy LambdaModelsSync
  2. The LambdaModelsSync stack primarily creates the following resources:
    • S3 bucket: The S3 bucket is named following the format comfyui-models-{account_id}-{region}; it’s used to store ComfyUI models.
    • Lambda function, along with its associated role and event source: The Lambda function, named comfy-models-sync, is designed to initiate the synchronization of models from the S3 bucket to local storage on GPU instances whenever models are uploaded to or deleted from S3.
  3. Once the S3 bucket for models and the Lambda function are deployed, the S3 bucket will initially be empty. Execute the following command to initialize the S3 bucket and download the SDXL model for testing purposes.
    region="us-west-2" # Modify the region to your current region.
    cd ~/comfyui-on-eks/test/ && bash init_s3_for_models.sh $region

    There’s no need to wait for the model to finish downloading and uploading to S3 before continuing. You can proceed with the following steps, as long as the model has been uploaded to S3 before the GPU nodes start.

Deploy S3 bucket for storing images generated by ComfyUI

Run the following command:
cd ~/comfyui-on-eks && cdk deploy S3OutputsStorage

The S3OutputsStorage stack creates an S3 bucket, named following the pattern comfyui-outputs-{account_id}-{region}, which is used to store images generated by ComfyUI.

Deploy ComfyUI workload

The ComfyUI workload is deployed through Kubernetes.

Build and push ComfyUI Docker image

  1. Run the following command to create an ECR repo for the ComfyUI image:
    cd ~/comfyui-on-eks && cdk deploy ComfyuiEcrRepo
  2. Run the build_and_push.sh script on a machine where Docker has been successfully installed:
    region="us-west-2" # Modify the region to your current region.
    cd ~/comfyui-on-eks/comfyui_image/ && bash build_and_push.sh $region

    Note:

    • The Dockerfile uses a combination of git clone and git checkout to pin a specific version of ComfyUI. Modify this as needed.
    • The Dockerfile does not install custom nodes; these can be added as needed using the RUN command.
    • You only need to rebuild the image and replace it with the new version to update ComfyUI.
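For orientation, a build-and-push flow for Amazon ECR usually consists of a registry login, a docker build, and a docker push. The sketch below is not the contents of build_and_push.sh. It is a hypothetical outline that assumes the repository is named comfyui-images, the name that appears in the image URI used in the Deployment manifest step.

```shell
#!/usr/bin/env bash
# Compose the ECR image URI for the ComfyUI image (repository name assumed).
ecr_image_uri() {
    local account="$1" region="$2" tag="${3:-latest}"
    printf '%s.dkr.ecr.%s.amazonaws.com/comfyui-images:%s\n' "$account" "$region" "$tag"
}

# Hypothetical outline: log in to ECR, build from the given Dockerfile, push.
build_and_push_sketch() {
    local region="$1" dockerfile="${2:-Dockerfile}" account image
    account="$(aws sts get-caller-identity --query Account --output text)" || return 1
    image="$(ecr_image_uri "$account" "$region")"
    # "${image%%/*}" is the registry host, the part before the first slash.
    aws ecr get-login-password --region "$region" |
        docker login --username AWS --password-stdin "${image%%/*}" || return 1
    docker build -f "$dockerfile" -t "$image" . && docker push "$image"
}
```

Because the image is pushed under a fixed tag in this sketch, running pods only pick up a new build after they are recreated.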

Deploy Karpenter for managing GPU instance scaling

Using the KarpenterInstanceNodeRole recorded in the previous section, run the following command to deploy the Karpenter Provisioner:

KarpenterInstanceNodeRole="Comfyui-Cluster-ComfyuiClusterkarpenternoderole" # Modify the role to your own.
sed -i "s/role: KarpenterInstanceNodeRole.*/role: $KarpenterInstanceNodeRole/g" comfyui-on-eks/manifests/Karpenter/karpenter_v1beta1.yaml
kubectl apply -f comfyui-on-eks/manifests/Karpenter/karpenter_v1beta1.yaml

The KarpenterInstanceNodeRole acquired in the previous section needs an additional S3 access permission to allow GPU nodes to sync files from S3. Run the following command:

KarpenterInstanceNodeRole="Comfyui-Cluster-ComfyuiClusterkarpenternoderole" # Modify the role to your own.
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --role-name $KarpenterInstanceNodeRole
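The sed substitution for the role only matches while the manifest still contains the KarpenterInstanceNodeRole placeholder, so a second run with a different role name changes nothing. A preview like the one below makes that visible before applying the manifest. The function name is hypothetical, and the role name must not contain a slash, because the slash is the sed delimiter.

```shell
#!/usr/bin/env bash
# Show what the role substitution would change, without editing the file.
# Returns 1 (diff's status) when there is a change and 0 when nothing matches.
preview_role_substitution() {
    local manifest="$1" role="$2"
    sed "s/role: KarpenterInstanceNodeRole.*/role: $role/g" "$manifest" |
        diff "$manifest" -
}
```

An empty preview means the placeholder has already been replaced, in which case the role line needs to be edited directly.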

Deploy S3 PV and PVC to store generated images

Execute the following command to deploy the PV and PVC for S3 CSI:

region="us-west-2" # Modify the region to your current region.
account=$(aws sts get-caller-identity --query Account --output text)
sed -i "s/region .*/region $region/g" comfyui-on-eks/manifests/PersistentVolume/sd-outputs-s3.yaml
sed -i "s/bucketName: .*/bucketName: comfyui-outputs-$account-$region/g" comfyui-on-eks/manifests/PersistentVolume/sd-outputs-s3.yaml
kubectl apply -f comfyui-on-eks/manifests/PersistentVolume/sd-outputs-s3.yaml

Deploy EKS S3 CSI Driver

  1. Run the following command to add your AWS Identity and Access Management (IAM) principal to the EKS cluster:
    identity=$(aws sts get-caller-identity --query 'Arn' --output text --no-cli-pager)
    if [[ $identity == *"assumed-role"* ]]; then
        role_name=$(echo $identity | cut -d'/' -f2)
        account_id=$(echo $identity | cut -d':' -f5)
        identity="arn:aws:iam::$account_id:role/$role_name"
    fi
    aws eks update-cluster-config --name Comfyui-Cluster --access-config authenticationMode=API_AND_CONFIG_MAP
    aws eks create-access-entry --cluster-name Comfyui-Cluster --principal-arn $identity --type STANDARD --username comfyui-user
    aws eks associate-access-policy --cluster-name Comfyui-Cluster --principal-arn $identity --access-scope type=cluster --policy-arn arn:aws:eks::
  2. Execute the following command to create a role and service account for the S3 CSI driver, enabling it to read and write to S3:
    region="us-west-2" # Modify the region to your current region.
    account=$(aws sts get-caller-identity --query Account --output text)
    ROLE_NAME=EKS-S3-CSI-DriverRole-$account-$region
    POLICY_ARN=arn:aws:iam::aws:policy/AmazonS3FullAccess
    eksctl create iamserviceaccount \
        --name s3-csi-driver-sa \
        --namespace kube-system \
        --cluster Comfyui-Cluster \
        --attach-policy-arn $POLICY_ARN \
        --approve \
        --role-name $ROLE_NAME \
        --region $region
  3. Run the following command to install aws-mountpoint-s3-csi-driver Addon:
    region="us-west-2" # Modify the region to your current region.
    account=$(aws sts get-caller-identity --query Account --output text)
    eksctl create addon --name aws-mountpoint-s3-csi-driver --version v1.0.0-eksbuild.1 --cluster Comfyui-Cluster --service-account-role-arn "arn:aws:iam::${account}:role/EKS-S3-CSI-DriverRole-${account}-${region}" --force

Deploy ComfyUI deployment and service

  1. Run the following command to replace the Docker image:
    region="us-west-2" # Modify the region to your current region.
    account=$(aws sts get-caller-identity --query Account --output text)
    sed -i "s/image: .*/image: ${account}.dkr.ecr.${region}.amazonaws.com\/comfyui-images:latest/g" comfyui-on-eks/manifests/ComfyUI/comfyui_deployment.yaml
  2. Run the following command to deploy ComfyUI Deployment and Service:
    kubectl apply -f comfyui-on-eks/manifests/ComfyUI

Test ComfyUI on EKS

API Test

To test with an API, run the following command in the comfyui-on-eks/test directory:

ingress_address=$(kubectl get ingress|grep comfyui-ingress|awk '{print $4}')
sed -i "s/SERVER_ADDRESS = .*/SERVER_ADDRESS = \"${ingress_address}\"/g" invoke_comfyui_api.py
sed -i "s/HTTPS = .*/HTTPS = False/g" invoke_comfyui_api.py
sed -i "s/SHOW_IMAGES = .*/SHOW_IMAGES = False/g" invoke_comfyui_api.py
./invoke_comfyui_api.py
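The grep and awk pipeline above takes the fourth column of kubectl get ingress. While the load balancer is still being provisioned, the ADDRESS column is empty and the fourth column becomes the port instead. The sketch below is a slightly more defensive variant with a hypothetical function name. It assumes the default six-column output (NAME, CLASS, HOSTS, ADDRESS, PORTS, AGE).

```shell
#!/usr/bin/env bash
# Read `kubectl get ingress` output on stdin and print the address of the named
# ingress. Fails (non-zero, no output) if the row is missing or has no address yet.
ingress_address() {
    awk -v name="$1" '
        $1 == name && NF == 6 { print $4; found = 1 }
        END { exit !found }
    '
}
```

Usage is kubectl get ingress piped into ingress_address comfyui-ingress. A non-zero status is a hint to wait a few minutes and try again.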

Test with browser

  1. Run the following command to get the K8S ingress address:
    kubectl get ingress
  2. Access the ingress address through a web browser.

The deployment and testing of ComfyUI on EKS is now complete. Next we will connect the EKS cluster to CloudFront for edge acceleration.

Deploy CloudFront for edge acceleration (Optional)

Execute the following command in the comfyui-on-eks directory to connect the Kubernetes ingress to CloudFront:

cdk deploy CloudFrontEntry

After deployment completes, outputs will be printed, including the CloudFront URL CloudFrontEntry.cloudFrontEntryUrl. Refer to the previous section for testing via the API or browser.

Cleaning up

Run the following command to delete all Kubernetes resources:

kubectl delete -f comfyui-on-eks/manifests/ComfyUI/
kubectl delete -f comfyui-on-eks/manifests/PersistentVolume/
kubectl delete -f comfyui-on-eks/manifests/Karpenter/

Run the following command to delete all deployed resources:

cdk destroy ComfyuiEcrRepo
cdk destroy CloudFrontEntry
cdk destroy S3OutputsStorage
cdk destroy LambdaModelsSync
cdk destroy Comfyui-Cluster

Conclusion

This article introduces a solution for deploying ComfyUI on EKS. By combining instance store and S3, it maximizes model loading and switching performance while reducing storage costs. It also automatically syncs models in a serverless way, leverages Spot Instances to lower GPU instance costs, and accelerates requests globally via CloudFront to meet the needs of geographically distributed art studios. The entire solution manages the underlying infrastructure as code to minimize operational overhead.