Squeeze Every Drop of VRAM: Running Z-Image-Turbo-ControlNet on Low-End GPUs

Z-Image has been out for a while, and everyone praises its quality. But its limitation is it only does text-to-image, and text-based control over character poses is not great. The turbo version can generate images quickly for more sampling, but we still wish Z-Image-Edit would come out soon — it would be even better if it supported multi-image references.

Before Z-Image-Edit arrives, we can try using Z-Image-Turbo-Fun-Controlnet-Union for pose control. Pose control is a mature technique in text-to-image, well-established during the Stable Diffusion era. The open-source community's Z-Image Controlnet is built on the SD ecosystem, supporting:

Canny Edge Detection: Extracts hard edges from images, preserving edge features — similar to simple line drawings.
HED Soft Edges: Compared to Canny, soft edges preserve transition information, closer to hand-drawn sketches.
OpenPose: The stick-figure skeleton maps that precisely control body poses.
Depth Estimation: Single-channel depth maps using grayscale to represent 3D spatial positions.
And many more

Download List

Using pose control requires downloading quite a few items.

First, download the Z-Image-Turbo-Fun-Controlnet-Union module — an additional module that stacks on top of the original Z-Image-Turbo, placed in the ComfyUI/models/model_patches directory. The preferred source is HuggingFace, but for those who prefer speed, searching on ModelScope for the most-downloaded version works great and is very fast.

Next are the ControlNet processing nodes. Besides Canny (built into ComfyUI), we need the aux preprocessor plugin for the others:

Plugin install: https://github.com/Fannovel16/comfyui_controlnet_aux

Processing model downloads: Different processing nodes need their own models. The aux nodes default to auto-downloading from HuggingFace, which is a dealbreaker if you can't connect to the international internet. So you need to download these models yourself and place them in ComfyUI/custom_nodes/comfyui_controlnet_aux/ckpts. While you can find all-in-one packs on file-sharing sites, without a premium account those are a hassle — I chose to manually download from ModelScope.

Model Download Checklist

Model	ModelScope Link
depth_anything_v2_vitb	https://www.modelscope.cn/models/depth-anything/Depth-Anything-V2-Base
body pose	https://www.modelscope.cn/models/muse/annotator_body_pose_model
hand pose	https://www.modelscope.cn/models/muse/annotator_hand_pose_model
face pose (rename to facenet)	https://www.modelscope.cn/models/muse/annotator_face_pose_model
ControlNetHED	https://www.modelscope.cn/models/muse/ControlNetHED

Workflow & Results

Now let's see the results. The workflow can be built using ComfyUI's official templates. However, with my 12GB entry-level GPU, loading the model and patches simultaneously causes OOM errors. I referenced a guide on running LTX2.3 on 12GB cards and added a VRAM control plugin (Set Reserved VRAM(GB) ⚙️) to make it work, though it's a bit slower (about 3 minutes per image on average):

Workflow Screenshot

Canny edge detection is natively supported by ComfyUI — let's try it directly.

Canny Result 1

Canny Result 2

Canny Result 3

Canny Result 4

Next, try HED by replacing the Canny node with HED Soft-Edge Lines:

HED Result 1

HED Result 2

HED Result 3

HED Result 4

Depth Anything offers several model variants; I chose the base version:

Depth Anything Result 1

Depth Anything Result 2

Depth Anything Result 3

Depth Anything Result 4

This one seems to produce too much noise in the output — the control image has too much information, giving the generated images an oil-painting-like appearance.

OpenPose requires three detection models for face, hands, and body:

OpenPose Result 1

OpenPose Result 2

OpenPose Result 3

OpenPose Result 4

The stick-figure approach sometimes doesn't perfectly respect the original pose — the generated poses can shift slightly.

Summary

Use Z-Image-Turbo-Fun-Controlnet-Union as a model patch for Z-Image to enable ControlNet
Use the Set Reserved VRAM(GB) ⚙️ node to manage VRAM usage, enabling low-end GPU operation
Use the comfyui_controlnet_aux plugin to access different ControlNet models

I found that Canny produces the highest-fidelity ControlNet inputs. There are still many pose-generation nodes I haven't tried, mostly because I couldn't find quick download sources.

Different nodes perform differently for real photos vs. cartoon characters, and different poses yield different results. Try them out and share your favorite pose-control model!

Squeeze Every Drop of VRAM: Running Z-Image-Turbo-ControlNet on Low-End GPUs

Table of Contents

Squeeze Every Drop of VRAM: Running Z-Image-Turbo-ControlNet on Low-End GPUs

Download List

Model Download Checklist

Workflow & Results

Summary