Z-Image Turbo ControlNet Union 2.1: Complete Multi-Control Workflow Guide
Abstract: Z-Image Turbo ControlNet Union 2.1 is a unified ControlNet model open-sourced by Alibaba's PAI team. It supports depth, edge, normal, semantic segmentation, grayscale, and several other control conditions. This article moves from basic concepts to advanced multi-control workflows and shows, step by step, how to combine several control conditions in a single Z-Image generation.
Introduction
ControlNet is the core technology for achieving precise spatial control in AI image generation. In early 2026, Alibaba's PAI team officially open-sourced Z-Image-Turbo-Fun-ControlNet-Union 2.1, a unified ControlNet model specifically trained for Z-Image Turbo.
Unlike traditional approaches requiring separate ControlNet models for each control condition, ControlNet Union consolidates all control conditions into a single model file, significantly simplifying workflows and improving flexibility.
1. What is ControlNet Union?
1.1 Basic Concepts
ControlNet Union is a unified ControlNet model supporting these control conditions:
| Control Type | Description | Use Cases |
|---|---|---|
| Depth | Spatial structure from depth maps | Architecture, scene composition |
| Canny | Contour control via Canny edge detection | Line art coloring, sketch refinement |
| Normal | Surface orientation from normal maps | 3D rendering, material replacement |
| Seg | Region control via semantic segmentation | Scene editing, object replacement |
| Gray | Light/dark control from grayscale | B&W to color, tone control |
| Pose | Human pose control | Portrait generation, action control |
| Tile | Local control via tile maps | Image expansion, detail enhancement |
1.2 Technical Background
- Model Size: Single file ~2GB (smaller than multiple independent ControlNets)
- Training Data: 1 million high-quality images
- Training Steps: 10,000 steps
- Architecture: ControlNet added on 6 DiT blocks of Z-Image Turbo
- Download: HuggingFace - alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union
2. ComfyUI Environment Setup
2.1 Install ComfyUI and Extensions
# Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
# Install ComfyUI Manager
cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
2.2 Download Model Files
Z-Image Turbo Main Model (place in models/unet/):
z-image-turbo-fp8-e4m3fn.safetensors
ControlNet Union 2.1 (place in models/controlnet/):
Z-Image-Turbo-Fun-Controlnet-Union.safetensors
CLIP Text Encoder (place in models/clip/):
qwen3-4b-fp8-scaled.safetensors
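A quick way to confirm the files landed where this guide expects them is a small check script. The sketch below assumes the folder layout and file names listed above; the helper name `missing_model_files` is hypothetical, and the root path should point at your own ComfyUI install.

```python
from pathlib import Path

# Expected locations, copied from the list above (relative to the ComfyUI root).
EXPECTED_FILES = {
    "models/unet": "z-image-turbo-fp8-e4m3fn.safetensors",
    "models/controlnet": "Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
    "models/clip": "qwen3-4b-fp8-scaled.safetensors",
}


def missing_model_files(comfy_root):
    """Return the relative paths of expected model files that do not exist."""
    root = Path(comfy_root)
    missing = []
    for folder, filename in EXPECTED_FILES.items():
        if not (root / folder / filename).is_file():
            missing.append(f"{folder}/{filename}")
    return missing


if __name__ == "__main__":
    for path in missing_model_files("ComfyUI"):
        print(f"missing: {path}")
```

An empty result means all three files are in place.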
2.3 Minimum Hardware Requirements
| Configuration | VRAM | Notes |
|---|---|---|
| Entry Level | 8GB | FP8 quantization + basic resolution |
| Recommended | 12GB | FP8 + high resolution + multi-ControlNet |
| Ideal | 16GB+ | BF16 precision + high resolution |
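The table can be read as a simple lookup from available VRAM to a configuration. The sketch below encodes only the three tiers above; the function name `pick_configuration` is hypothetical, and the thresholds are this guide's recommendations rather than hard limits.

```python
def pick_configuration(vram_gb):
    """Map available VRAM (in GB) to the tiers in the table above."""
    if vram_gb >= 16:
        return "Ideal: BF16 precision + high resolution"
    if vram_gb >= 12:
        return "Recommended: FP8 + high resolution + multi-ControlNet"
    if vram_gb >= 8:
        return "Entry Level: FP8 quantization + basic resolution"
    return "Below minimum: expect out-of-memory errors at these settings"


if __name__ == "__main__":
    for vram in (6, 8, 12, 24):
        print(vram, "GB ->", pick_configuration(vram))
```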
3. Basic Workflows
3.1 Single Control Condition Workflow
The most basic ControlNet workflow includes these nodes:
Load Diffusion Model (Z-Image-Turbo, from models/unet/)
→ ControlNet Loader (Union 2.1)
→ ControlNet Preprocessor (select control type)
→ Apply ControlNet
→ KSampler (8 steps)
→ Save Image
Canny Edge Detection Example:
- Control type: canny
- ControlNet strength: 0.8-1.0
- Use case: Line art coloring, sketch refinement
3.2 Depth Map Control Workflow
Depth control is one of the most commonly used scenarios:
- Input: Depth map (can be auto-generated from RGB)
- Control type: depth
- Application: Change style while maintaining spatial structure
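# Key ComfyUI workflow parameters
# Z-Image Turbo is a distilled few-step model, so guidance stays at 1.0
# (in ComfyUI terms, no classifier-free guidance) instead of a typical 7.5
{
"control_net_name": "Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
"control_net_type": "depth",
"strength": 0.9,
"start_percent": 0.0,
"end_percent": 1.0,
"num_inference_steps": 8,
"guidance_scale": 1.0
}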
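Before loading parameters like these into a workflow, it can help to sanity-check them. The following is a minimal sketch, assuming the key names used in the block above; `validate_controlnet_params` is a hypothetical helper, and the 0.0-1.0 strength window reflects the ranges used in this guide, not a limit enforced by ComfyUI.

```python
def validate_controlnet_params(params):
    """Return a list of problems found in a ControlNet parameter dict."""
    problems = []
    strength = params.get("strength", 1.0)
    start = params.get("start_percent", 0.0)
    end = params.get("end_percent", 1.0)

    # This guide keeps strength between 0.0 and 1.0.
    if not 0.0 <= strength <= 1.0:
        problems.append(f"strength {strength} is outside 0.0-1.0")
    # The control window is expressed as fractions of the sampling run.
    if not 0.0 <= start <= 1.0:
        problems.append(f"start_percent {start} is outside 0.0-1.0")
    if not 0.0 <= end <= 1.0:
        problems.append(f"end_percent {end} is outside 0.0-1.0")
    if start >= end:
        problems.append("start_percent must be smaller than end_percent")
    return problems


if __name__ == "__main__":
    depth_params = {"strength": 0.9, "start_percent": 0.0, "end_percent": 1.0}
    print(validate_controlnet_params(depth_params))
```

An empty list means the values are consistent with each other; it says nothing about whether they suit a particular image.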
4. Multi-Control Combination Workflows (Core Feature)
The main advantage of ControlNet Union 2.1 is that several control conditions can be applied at the same time from a single model file. The combinations below are the core of this guide.
4.1 Depth + Canny Combination
Application: Precisely control spatial structure while maintaining contour details.
ControlNet 1: Depth (strength 0.8)
ControlNet 2: Canny (strength 0.6)
↓ Combined application
KSampler (8 steps)
↓
Output: Structurally precise + detail-rich images
Tips:
- Depth controls spatial structure (apply it first, with the higher strength)
- Canny controls edge details (apply it second, with the lower strength)
- Keep the combined strength of the two ControlNets at 1.2-1.4
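The strength budget in these tips is simple arithmetic, sketched below. The helper names `total_strength` and `in_recommended_window` are hypothetical, and the 1.2-1.4 window is taken directly from the tip above.

```python
def total_strength(stack):
    """Sum the strengths of a list of (control_type, strength) pairs."""
    # Round to two decimals to hide floating-point noise such as 1.4000000000000001.
    return round(sum(strength for _, strength in stack), 2)


def in_recommended_window(stack, low=1.2, high=1.4):
    """Check the combined strength against the 1.2-1.4 window from the tips."""
    return low <= total_strength(stack) <= high


if __name__ == "__main__":
    depth_canny = [("depth", 0.8), ("canny", 0.6)]
    print(total_strength(depth_canny), in_recommended_window(depth_canny))
```

For the Depth (0.8) + Canny (0.6) pair the total is 1.4, which sits at the top of the window.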
4.2 Semantic Segmentation + Depth Combination
Application: Scene editing, object replacement, style transfer.
ControlNet 1: Seg (semantic segmentation, strength 0.7)
→ Controls what content is in which region
ControlNet 2: Depth (strength 0.7)
→ Controls spatial structure
↓ Combined application
Prompt: "Add a running hound dog on the grass"
↓
Output: Semantically correct scene editing results
4.3 Pose + Canny Combination
Application: Character generation, outfit changes, action control.
ControlNet 1: Pose (strength 0.9)
→ Controls character pose
ControlNet 2: Canny (strength 0.5)
→ Controls outfit/appearance contours
↓ Combined application
Prompt: "Business person in red suit, standing in office"
↓
Output: Accurate pose + outfit matching description
5. Advanced Techniques
5.1 Multi-Resolution Strategy
# Two-stage high-resolution strategy
Stage 1: 512×512 + ControlNet (structure control)
→ Use Hires.Fix / Latent Upscale
Stage 2: 1024×1024 (no ControlNet, detail enhancement)
→ Use Z-Image Turbo native 8-step inference
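The two stages can be written down as plain settings to keep them straight. This is only an illustrative sketch: `two_stage_plan` is a hypothetical helper, the dict layout is not a ComfyUI format, and the 8-step count for both stages follows the step count used elsewhere in this guide.

```python
def two_stage_plan(base=512, target=1024, steps=8):
    """Describe the two passes as plain dicts (illustrative only)."""
    return [
        # Stage 1: low resolution, ControlNet on, fixes the structure.
        {"stage": 1, "size": (base, base), "controlnet": True, "steps": steps},
        # Stage 2: upscaled latent, ControlNet off, refines detail.
        {
            "stage": 2,
            "size": (target, target),
            "controlnet": False,
            "steps": steps,
            "upscale_factor": target / base,
        },
    ]


if __name__ == "__main__":
    for stage in two_stage_plan():
        print(stage)
```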
5.2 Grayscale to Color Workflow
ControlNet Union's Gray mode excels at coloring black-and-white photos:
- Input black-and-white photo
- Control type: gray
- Prompt describes target color style
- Output: Colorized image
Tips:
- Gray mode strength: 0.7-0.8
- Combine with depth control to maintain structure
- Suitable for old photo restoration, sketch coloring
5.3 ControlNet Strength Tuning Tips
| Scenario | Recommended Strength | Notes |
|---|---|---|
| Structure (Depth) | 0.8-1.0 | High strength for spatial structure |
| Edge (Canny) | 0.5-0.8 | Medium strength for detail flexibility |
| Semantic (Seg) | 0.6-0.9 | Adjust based on edit scope |
| Pose | 0.8-1.0 | High strength for pose accuracy |
| Grayscale (Gray) | 0.6-0.8 | Balance light/dark and color |
| Multi-control combo | 0.4-0.7 each | Total ≤ 1.5 |
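The table translates naturally into a lookup that flags settings outside the recommended ranges. The sketch below copies the ranges from the table; `check_strengths` is a hypothetical helper, and the ranges are starting points for tuning rather than hard rules.

```python
# Single-control ranges from the table above.
SINGLE_RANGES = {
    "depth": (0.8, 1.0),
    "canny": (0.5, 0.8),
    "seg": (0.6, 0.9),
    "pose": (0.8, 1.0),
    "gray": (0.6, 0.8),
}
COMBO_RANGE = (0.4, 0.7)   # per-control range when combining
COMBO_TOTAL_LIMIT = 1.5    # combined strength ceiling


def check_strengths(stack):
    """Return warnings for a dict mapping control type to strength."""
    warnings = []
    is_combo = len(stack) > 1
    for control, strength in stack.items():
        bounds = COMBO_RANGE if is_combo else SINGLE_RANGES.get(control)
        if bounds is None:
            continue  # the table gives no range for this control type
        low, high = bounds
        if not low <= strength <= high:
            warnings.append(f"{control}: {strength} is outside {low}-{high}")
    total = round(sum(stack.values()), 2)
    if is_combo and total > COMBO_TOTAL_LIMIT:
        warnings.append(f"total {total} exceeds {COMBO_TOTAL_LIMIT}")
    return warnings


if __name__ == "__main__":
    print(check_strengths({"seg": 0.7, "depth": 0.7}))
    print(check_strengths({"depth": 0.9, "canny": 0.9}))
```

A warning is a prompt to look again, not an error: a dominant control in a combination may reasonably sit above 0.7 as long as the total stays within the limit.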
6. Common Application Scenarios
6.1 Architectural Design
ControlNet combo: Depth + Normal
Prompt: "Modern glass facade building, sunset light, reflecting sky"
Effect: Maintain building structure, change materials and lighting
6.2 E-commerce Product Photography
ControlNet combo: Canny + Tile
Prompt: "Professional product photography, marble surface, soft studio lighting, white background"
Effect: Generate professional product images from product sketches
6.3 Character Design
ControlNet combo: Pose + Canny + Seg
Prompt: "Cyberpunk-styled character, neon-lit city background"
Effect: Precisely control character pose, outfit contours, and scene elements
6.4 Image Expansion (Uncrop)
ControlNet combo: Tile + Depth
Prompt: "Expand the frame, show a wider city skyline"
Effect: Intelligently expand image boundaries, maintaining style consistency
7. Troubleshooting
7.1 Insufficient VRAM
# Solution 1: Use FP8 quantized models
# Solution 2: Reduce resolution
# Solution 3: Enable --lowvram mode
python main.py --lowvram
7.2 Weak Control Effect
- Check if ControlNet strength is too low
- Confirm correct control type selection
- Verify preprocessing map quality (depth, edges, etc.)
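- Keep start_percent at 0.0 and end_percent at 1.0 so the control applies to every step; raising start_percent to 0.1-0.2 delays the control and weakens it further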
7.3 Over-Control Causing Stiff Images
- Reduce ControlNet strength
- Set end_percent < 1.0 (lets the model run freely in the final steps)
- Reduce the number of ControlNets used simultaneously
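To see what a given start_percent and end_percent mean for an 8-step run, the window can be mapped onto step indices. This is a rough sketch: `active_steps` is a hypothetical helper, and the linear mapping from percent to step index is an assumption. ComfyUI may resolve these percentages against the noise schedule instead, so the exact boundaries can differ.

```python
def active_steps(total_steps, start_percent, end_percent):
    """Approximate which sampler steps (0-based) fall inside the control window."""
    return [
        step
        for step in range(total_steps)
        # A step counts as controlled if its starting fraction lies in the window.
        if start_percent <= step / total_steps < end_percent
    ]


if __name__ == "__main__":
    print(active_steps(8, 0.0, 1.0))  # control on every step
    print(active_steps(8, 0.0, 0.8))  # last step left to the model
    print(active_steps(8, 0.2, 1.0))  # first two steps uncontrolled
```

Under this approximation, an end_percent of 0.8 frees only the last of 8 steps, so small changes to the window have a coarse effect at low step counts.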
8. Complete Workflow JSON Template
Key node settings for a ComfyUI multi-control combination workflow (a summary of parameters, not a workflow file that ComfyUI can import directly):
{
"controlnet_depth": {
"model": "Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
"type": "depth",
"strength": 0.8,
"start_percent": 0.0,
"end_percent": 0.9
},
"controlnet_canny": {
"model": "Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
"type": "canny",
"strength": 0.6,
"start_percent": 0.0,
"end_percent": 0.8
},
"sampler": {
"steps": 8,
"cfg": 1.0,
"sampler_name": "euler",
"model": "z-image-turbo-fp8-e4m3fn.safetensors"
}
}
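A template like this is easy to inspect programmatically. The sketch below parses a trimmed copy of the template (ControlNet entries only) and confirms that both controls point at the same Union model file; `summarize_controlnets` is a hypothetical helper, and the `controlnet_` key prefix is simply the naming used in the template above.

```python
import json

# Trimmed copy of the template above: only the ControlNet entries.
TEMPLATE_JSON = """
{
  "controlnet_depth": {
    "model": "Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
    "type": "depth", "strength": 0.8,
    "start_percent": 0.0, "end_percent": 0.9
  },
  "controlnet_canny": {
    "model": "Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
    "type": "canny", "strength": 0.6,
    "start_percent": 0.0, "end_percent": 0.8
  }
}
"""


def summarize_controlnets(template):
    """Collect control types, model sharing, and total strength from a template dict."""
    entries = [v for k, v in template.items() if k.startswith("controlnet_")]
    models = {entry["model"] for entry in entries}
    return {
        "types": [entry["type"] for entry in entries],
        "shared_model": len(models) == 1,  # one Union file serves every control
        "total_strength": round(sum(entry["strength"] for entry in entries), 2),
    }


if __name__ == "__main__":
    print(summarize_controlnets(json.loads(TEMPLATE_JSON)))
```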
9. Summary
Z-Image Turbo ControlNet Union 2.1 covers depth, edge, normal, segmentation, grayscale, pose, and tile control in a single open-source model. Its unified architecture and its ability to combine several controls in one generation make it a strong fit for professional image generation workflows built on Z-Image Turbo.
Key Takeaways:
- Unified Model: One model supports all control types
- Multi-Control Combination: Use multiple ControlNets simultaneously for precise control
- Strength Tuning: Properly distribute strength across ControlNets
- Scenario Adaptation: Select different control combinations for different scenarios
- FP8 Deployment: FP8 quantization brings the entry-level requirement down to 8GB of VRAM
Mastering ControlNet Union 2.1 means mastering Z-Image Turbo's core competitive advantage.