Z-Image ControlNet Union 2.1 Multi-Control Practice: Next-Gen Unified Control Model

May 26, 2026


Keywords: z-image controlnet union multi-control




Introduction

ControlNet has evolved from single-condition models (one model file per condition type) to unified multi-condition models. Union 2.1 follows the unified approach and serves multiple control conditions from a single model file. This guide covers practical use with Z-Image, including multi-condition workflows, weight balancing, and real-world use cases.


What Is ControlNet Union 2.1

Evolution from Single-Condition ControlNet

Traditional ControlNet required separate model files per condition:

controlnet_canny.safetensors    → Canny edge control
controlnet_depth.safetensors    → Depth map control
controlnet_pose.safetensors     → OpenPose control

Each loaded model adds its own VRAM cost, and the weights of the separate conditions have to be balanced by hand.

Union 2.1 Architecture

Union 2.1 consolidates multiple conditions into one model:

controlnet_union21.safetensors → Canny + Depth + Pose + Tile +
                                  Lineart + Normal + Segment + Depth V3 + ...

The model uses type embeddings to route condition-specific features internally, handling multiple simultaneous controls in a single forward pass.

Key Advantages

| Aspect | Single-Condition | Union 2.1 |
|---|---|---|
| Model files | One per condition | Single unified model |
| VRAM | Cumulative per model | Single model + conditions |
| Weight management | Manual, spread across models | Per-condition strengths on a single node |
| Condition conflicts | Possible | Mitigated by internal type routing |

New Features in Union 2.1

Expanded Condition Types

| Type ID | Condition | Description |
|---|---|---|
| 0 | Canny | Edge detection control |
| 1 | Depth | Depth map control |
| 2 | Pose | OpenPose skeleton |
| 3 | Tile | Regional tile control |
| 4 | Lineart | Line art / sketch |
| 5 | Normal | Surface normal map |
| 6 | Segment | Semantic segmentation |
| 7 | MLP | Multi-layer perceptron |
| 8 | Depth V3 | Enhanced depth (new) |
| 9 | Soft Edge | HED soft edges |
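
In scripts, the type IDs from the table can be captured as an enumeration so that workflows refer to conditions by name rather than by bare integer. The sketch below is illustrative only: the `ConditionType` class and the `type_id_for` helper are hypothetical names, not part of any Union 2.1 package.

```python
from enum import IntEnum


class ConditionType(IntEnum):
    """Type IDs as listed in the table above (hypothetical helper class)."""
    CANNY = 0
    DEPTH = 1
    POSE = 2
    TILE = 3
    LINEART = 4
    NORMAL = 5
    SEGMENT = 6
    MLP = 7
    DEPTH_V3 = 8
    SOFT_EDGE = 9


def type_id_for(name: str) -> int:
    """Look up a type ID from a readable name such as 'Depth V3'."""
    key = name.strip().upper().replace(" ", "_")
    try:
        return int(ConditionType[key])
    except KeyError:
        raise ValueError(f"unknown condition type: {name!r}") from None
```

Calling `type_id_for("Depth V3")` yields the ID for Depth V3, and an unrecognized name raises `ValueError` instead of silently selecting the wrong condition.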

Type Embedding Mechanism

Control Input → Type Embedding → Union Model → Condition Features

Learnable type embeddings distinguish between condition types internally, allowing shared weights without interference.
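
As a rough illustration of the idea, a type embedding can be pictured as a lookup table with one vector per type ID, added to the condition features before they reach the shared layers. The toy sketch below uses random placeholder values and a feature width of 4; it is not the actual Union 2.1 implementation, and `tag_with_type` is a hypothetical name.

```python
import random

EMBED_DIM = 4    # toy feature width; real models use far larger widths
NUM_TYPES = 10   # type IDs 0-9 from the condition table

# One embedding vector per condition type. In a trained model these values
# are learned; here they are random placeholders with a fixed seed.
_rng = random.Random(0)
TYPE_EMBEDDINGS = [
    [_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(NUM_TYPES)
]


def tag_with_type(features: list[float], type_id: int) -> list[float]:
    """Add the type embedding to a feature vector so that shared layers
    downstream can tell which condition produced it."""
    if not 0 <= type_id < NUM_TYPES:
        raise ValueError(f"type_id out of range: {type_id}")
    if len(features) != EMBED_DIM:
        raise ValueError("feature width must match EMBED_DIM")
    embedding = TYPE_EMBEDDINGS[type_id]
    return [f + e for f, e in zip(features, embedding)]
```

The same input features tagged with type 0 and with type 8 come out as different vectors, which is what lets a single set of weights treat Canny and Depth V3 inputs differently.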

Multi-Condition Stacking

Canny (weight 0.8) + Depth (weight 0.6) + Pose (weight 0.9)
       → Single Union 2.1 ControlNet node → KSampler

Each condition maintains its own weight while sharing the model backbone.
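
One simple way to picture per-condition weights is as scale factors applied to each condition's control features before they are summed. Whether Union 2.1 combines conditions exactly this way is an assumption; the `Condition` class and `combine` function below are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    name: str
    weight: float
    residual: list[float]  # stand-in for one condition's control features


def combine(conditions: list[Condition]) -> list[float]:
    """Scale each condition's residual by its own weight, then sum them."""
    if not conditions:
        raise ValueError("at least one condition is required")
    width = len(conditions[0].residual)
    total = [0.0] * width
    for cond in conditions:
        if len(cond.residual) != width:
            raise ValueError(f"{cond.name}: residual width mismatch")
        for i, value in enumerate(cond.residual):
            total[i] += cond.weight * value
    return total
```

Lowering one weight shrinks only that condition's contribution, which is why a single stacked node can still be tuned condition by condition.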


Fun Union ControlNet Capabilities

Fun Union ControlNet is a community variant optimized for Z-Image compatibility:

  • Z-Image fine-tuned: Optimized for Z-Image's single-stream DiT architecture
  • Extended conditions: Additional types beyond standard Union
  • Improved weight interpolation: Smoother transitions between condition types
  • Better ComfyUI integration: Native node support

Supported Conditions

| Condition | Union 2.1 | Fun Union |
|---|---|---|
| Canny / Depth / Pose | Yes | Yes |
| Depth V3 | Yes | Yes |
| Lineart / Tile / Normal | Yes | Yes |
| Segment / Soft Edge | Yes | Yes |
| IP-Adapter | No | Yes (Fun Union extension) |

Depth V3 Improvements

Depth V3 (type ID 8), new in Union 2.1, provides:

  1. Higher precision: Finer depth granularity
  2. Better edge preservation: Depth boundaries align with visual edges
  3. Fewer artifacts: Reduced estimation errors in complex scenes
  4. Improved occlusion handling: Better overlapping object depth

Practical Use

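# Type ID 8 selects Depth V3 (see the type table above).
DEPTH_V3_TYPE_ID = 8
# preprocess_depth_v3 and prepare_union_condition stand in for whatever
# depth estimator and condition-packing helpers your pipeline provides.
depth_map = preprocess_depth_v3(input_image)
condition = prepare_union_condition(depth_map, type_id=DEPTH_V3_TYPE_ID)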

Best for: Architectural visualization, product photography, portrait structure, landscape composition.
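
Depth estimators usually return raw floating-point values, while control images are typically 8-bit. A generic min-max rescale, sketched below, is one common way to bridge the two. It is not specific to Depth V3, `normalize_depth` is a hypothetical helper, and whether near surfaces should end up bright or dark depends on the estimator and model in use.

```python
def normalize_depth(depth: list[list[float]]) -> list[list[int]]:
    """Rescale raw depth values to 0-255 integers (min -> 0, max -> 255)."""
    flat = [value for row in depth for value in row]
    if not flat:
        raise ValueError("empty depth map")
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A perfectly flat map carries no depth information.
        return [[0 for _ in row] for row in depth]
    scale = 255.0 / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in depth]
```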


Combined Control Strategies

Strategy 1: Pose + Depth

Character generation with accurate spatial positioning:

OpenPose → Pose (type 2, weight 0.9)
Depth Map → Depth V3 (type 8, weight 0.7)
  → Union 2.1 → KSampler

Strategy 2: Canny + Depth + Pose

Triple-condition for maximum structural fidelity:

Canny → Edge (type 0, weight 0.8)
Depth → Depth V3 (type 8, weight 0.6)
OpenPose → Pose (type 2, weight 0.9)
  → Union 2.1 → KSampler

Strategy 3: Lineart + Tile

Sketch-guided generation with regional refinement:

Line Art → Lineart (type 4, weight 0.85)
Tile Map → Tile (type 3, weight 0.5)
  → Union 2.1 → KSampler

Strategy 4: Depth V3 + Segment

Semantic-aware scene generation:

Depth Map → Depth V3 (type 8, weight 0.7)
Seg Map → Segment (type 6, weight 0.6)
  → Union 2.1 → KSampler
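
For scripted workflows, the four strategies can be kept as named presets. The sketch below copies the type IDs and weights from the strategies above; the preset names and the `describe` helper are hypothetical.

```python
# (condition label, type ID, weight) triples copied from the four strategies.
STRATEGIES = {
    "pose_depth": [("pose", 2, 0.9), ("depth_v3", 8, 0.7)],
    "canny_depth_pose": [("canny", 0, 0.8), ("depth_v3", 8, 0.6), ("pose", 2, 0.9)],
    "lineart_tile": [("lineart", 4, 0.85), ("tile", 3, 0.5)],
    "depth_v3_segment": [("depth_v3", 8, 0.7), ("segment", 6, 0.6)],
}


def describe(preset: str) -> str:
    """Render a preset as a one-line summary for logs or UI labels."""
    parts = [
        f"{label}(type {type_id}) x {weight}"
        for label, type_id, weight in STRATEGIES[preset]
    ]
    return " + ".join(parts)
```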

Single-Model Multi-Condition Workflow

Workflow Architecture

┌──────────────────────────────────┐
│  Pose Image │ Depth │ Canny Map  │
└──────┬─────────┬────────┬────────┘
       ↓         ↓        ↓
    [Preprocess each condition]
       ↓         ↓        ↓
       └────┬────┴───┬────┘
            ↓        ↓
    [Union Condition Merge]
            ↓
    [ControlNet Union 2.1 Apply]
            ↓
    [KSampler + VAE Decode → Output]

ComfyUI Multi-Condition JSON

{
  "2": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "z-image-base.safetensors"}
  },
  "4": {
    "class_type": "ControlNetUnion21Loader",
    "inputs": {"union_model": "controlnet_union21_zimage.safetensors"}
  },
  "6": {"class_type": "LoadImage", "inputs": {"image": "pose.png"}},
  "8": {"class_type": "LoadImage", "inputs": {"image": "depth.png"}},
  "10": {"class_type": "LoadImage", "inputs": {"image": "canny.png"}},
  "12": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "a woman in a red dress in a modern lobby, cinematic lighting",
      "clip": ["2", 1]
    }
  },
  "13": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "blurry, low quality", "clip": ["2", 1]}
  },
  "14": {
    "class_type": "ControlNetUnion21Apply",
    "inputs": {
      "control_net": ["4", 0],
      "image_pose": ["6", 0], "image_depth": ["8", 0], "image_canny": ["10", 0],
      "strength_pose": 0.9, "strength_depth": 0.7, "strength_canny": 0.8,
      "conditioning": ["12", 0]
    }
  },
  "15": {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 1024, "height": 1024, "batch_size": 1}
  },
  "16": {
    "class_type": "KSampler",
    "inputs": {
      "model": ["2", 0], "positive": ["14", 0],
      "negative": ["13", 0], "latent_image": ["15", 0],
      "seed": 42, "steps": 30, "cfg": 7.5,
      "sampler_name": "euler_ancestral", "scheduler": "normal",
      "denoise": 1.0
    }
  },
  "18": {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["16", 0], "vae": ["2", 2]}
  },
  "20": {
    "class_type": "SaveImage",
    "inputs": {"images": ["18", 0], "filename_prefix": "union21"}
  }
}
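
A workflow in this API format can also be queued from a script instead of the ComfyUI interface. The sketch below posts it to the `/prompt` endpoint of a locally running ComfyUI server using only the standard library. The server address is ComfyUI's usual default and may differ in your setup, and the helper names are illustrative.

```python
import json
import urllib.request
import uuid
from typing import Optional

COMFYUI_URL = "http://127.0.0.1:8188"  # default local address; adjust as needed


def build_payload(workflow: dict, client_id: Optional[str] = None) -> bytes:
    """Wrap an API-format workflow in the JSON body that POST /prompt expects."""
    body = {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}
    return json.dumps(body).encode("utf-8")


def queue_workflow(workflow: dict, base_url: str = COMFYUI_URL) -> dict:
    """Send the workflow to a running ComfyUI server and return its JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

To use it, load the workflow JSON into a dict with `json.load` and pass it to `queue_workflow`. The call requires a running server with the Union nodes installed.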

Weight Balancing

Weight Guidelines

| Condition | Recommended Range | Effect at High Weight |
|---|---|---|
| Pose | 0.7–1.0 | Strict skeleton adherence |
| Depth / Depth V3 | 0.5–0.8 | Strong depth fidelity |
| Canny | 0.5–0.9 | Precise edge following |
| Lineart | 0.6–0.9 | Faithful sketch rendering |
| Tile | 0.3–0.6 | Regional guidance |
| Segment | 0.4–0.7 | Semantic placement |
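
When weights are set from a script, a small check against these ranges can catch values that drifted during tuning. The sketch below copies the ranges from the guideline table; the `out_of_range` helper and the lowercase condition keys are hypothetical conventions.

```python
# Recommended ranges copied from the guideline table.
RECOMMENDED = {
    "pose": (0.7, 1.0),
    "depth": (0.5, 0.8),
    "depth_v3": (0.5, 0.8),
    "canny": (0.5, 0.9),
    "lineart": (0.6, 0.9),
    "tile": (0.3, 0.6),
    "segment": (0.4, 0.7),
}


def out_of_range(weights: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Return the conditions whose weight falls outside the recommended range,
    mapped to that range."""
    flagged = {}
    for name, weight in weights.items():
        if name not in RECOMMENDED:
            continue  # no guideline listed for this condition
        low, high = RECOMMENDED[name]
        if not low <= weight <= high:
            flagged[name] = (low, high)
    return flagged
```

The ranges are guidelines rather than hard limits, so a flagged weight is a prompt to look at the output, not an error.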

Resolving Conflicts

| Conflict | Resolution |
|---|---|
| Pose vs Depth | Favor Pose (0.9) over Depth (0.5) for characters |
| Canny vs Depth | Canny (0.8) + Depth (0.6): edges dominate structure |
| Lineart vs Pose | Pose (0.9) + Lineart (0.7): skeleton guides, lines refine |
| Segment vs Depth | Depth (0.7) + Segment (0.5): depth dominates |

Adjustment Strategy

  1. Start with equal weights (0.7 for all)
  2. Generate a test image
  3. Identify over/under-influenced conditions
  4. Adjust weights ±0.1–0.2
  5. Repeat until balanced
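
The adjustment loop above can be sketched as a small helper that applies one round of feedback. The version below moves each flagged weight by a fixed step and clamps the result to the 0.0–1.0 interval. The clamp bounds, the `"over"`/`"under"` labels, and the function name are assumptions made for illustration.

```python
def adjust(weights: dict[str, float],
           feedback: dict[str, str],
           step: float = 0.1) -> dict[str, float]:
    """Return new weights after one tuning round.

    feedback maps a condition name to "over" (too strong) or "under" (too weak).
    Conditions without feedback keep their current weight.
    """
    adjusted = dict(weights)
    for name, verdict in feedback.items():
        if name not in adjusted:
            raise KeyError(f"no weight set for condition: {name}")
        if verdict == "over":
            delta = -step
        elif verdict == "under":
            delta = step
        else:
            raise ValueError(f"feedback must be 'over' or 'under', got {verdict!r}")
        # Clamp to [0, 1] and round to two decimals to avoid float noise.
        adjusted[name] = round(min(1.0, max(0.0, adjusted[name] + delta)), 2)
    return adjusted
```

Starting from 0.7 for every condition, each round nudges only the conditions judged too strong or too weak, which matches steps 3 to 5.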

ComfyUI Union 2.1 Nodes

Required Nodes

| Node | Purpose |
|---|---|
| ControlNetUnion21Loader | Load Union 2.1 model |
| ControlNetUnion21Apply | Apply multi-condition control |
| UnionConditioning | Merge multiple conditions |

Installation

cd ComfyUI/custom_nodes
git clone https://github.com/docssw/ComfyUI-Union.git
pip install -r ComfyUI-Union/requirements.txt

The ControlNetUnion21Apply node accepts condition images (pose, depth, canny, etc.) with per-condition strength parameters. Missing conditions are handled via zero-weight type embeddings.
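
When only some conditions are available, the node's inputs can be assembled programmatically so that missing ones are simply left out. The sketch below assumes the `image_<name>` / `strength_<name>` key pattern seen in the workflow JSON; the exact set of keys the node accepts is not confirmed here, and `build_apply_inputs` is a hypothetical helper.

```python
def build_apply_inputs(control_net_ref: list,
                       conditioning_ref: list,
                       conditions: dict) -> dict:
    """Build the "inputs" dict for a ControlNetUnion21Apply node.

    conditions maps a condition name (e.g. "pose") to a tuple of
    (image node reference, strength). Conditions not listed are omitted.
    """
    inputs = {"control_net": control_net_ref, "conditioning": conditioning_ref}
    for name, (image_ref, strength) in conditions.items():
        inputs[f"image_{name}"] = image_ref
        inputs[f"strength_{name}"] = strength
    return inputs
```

For example, passing only `{"pose": (["6", 0], 0.9)}` produces an inputs dict with `image_pose` and `strength_pose` and no depth or canny entries.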


Practical Examples

Example 1: Pose + Depth + Canny Combination

Inputs: OpenPose skeleton + Depth map + Canny edges
Prompt: "a detective in a trench coat in a dimly lit office,
film noir atmosphere, dramatic shadows, photorealistic"
Weights: Pose 0.9 / Depth 0.6 / Canny 0.7
Params: steps=30, cfg=7.5

Result: Character matches exact pose, positioned correctly in room layout, with furniture structure preserved.

Example 2: Architectural Visualization

Inputs: Depth V3 + Canny edges + Semantic segment
Prompt: "modern minimalist living room, white walls, large windows,
natural wood furniture, warm sunlight, architectural photography"
Weights: Depth V3 0.7 / Canny 0.8 / Segment 0.5
Params: steps=35, cfg=8.0

Result: Architecturally accurate interior with correct proportions and material placement.

Example 3: Character Design with Pose + Depth

Inputs: OpenPose + Depth V3
Prompt: "cyberpunk warrior in futuristic armor, neon accents,
detailed mechanical parts, dramatic lighting, concept art"
Weights: Pose 0.9 / Depth V3 0.65
Params: steps=30, cfg=7.5

Result: Character matches pose exactly with accurate 3D volume, suitable for design iteration.


Troubleshooting

| Issue | Solution |
|---|---|
| Conditions conflict | Adjust weights to prioritize one condition |
| No control effect | Check type IDs; raise the weight to 0.7 or higher |
| Overly rigid output | Reduce individual weights; keep the combined weight below 2.0 |
| Wrong condition applied | Verify type ID mapping |
| VRAM OOM | Downscale condition inputs to 512 px; keep batch=1 |
| Depth V3 artifacts | Use a better depth estimator; preprocess depth maps |
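
Two of these checks are easy to automate: the combined-weight ceiling and the 512 px downscale. The sketch below is one possible approach. Snapping dimensions to a multiple of 8 is an assumption based on common latent-diffusion practice, and both function names are hypothetical.

```python
def fit_to_max_side(width: int, height: int,
                    max_side: int = 512, multiple: int = 8) -> tuple[int, int]:
    """Shrink (never enlarge) so the longer side is at most max_side,
    keeping the aspect ratio and rounding each side down to a multiple."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = min(1.0, max_side / max(width, height))

    def snap(value: int) -> int:
        return max(multiple, int(value * scale) // multiple * multiple)

    return snap(width), snap(height)


def combined_weight_ok(weights: dict[str, float], limit: float = 2.0) -> bool:
    """True when the summed condition weights stay below the limit."""
    return sum(weights.values()) < limit
```

A 1024×768 condition image comes out as 512×384, and a weight set such as Pose 0.9 with Depth V3 0.65 passes the combined-weight check.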

Summary

ControlNet Union 2.1 consolidates multiple control conditions into a single model:

  1. Single model, multiple conditions: Type embeddings route pose, depth, canny, lineart, and more
  2. Depth V3 upgrade: Improved precision and edge handling over previous depth
  3. Weight balancing matters: Each condition needs individual tuning; start at 0.7
  4. Fun Union for Z-Image: Community-optimized variant with extended features
  5. Mature ComfyUI support: Complete node support for Union 2.1 workflows

Union 2.1 enables complex multi-condition workflows that were previously cumbersome with single-condition ControlNet models.


Z-Image Team
