Z-Image + Depth Anything V3: 3D Depth-Aware Control Workflow
From legacy preprocessors to next-gen depth estimation — inject real spatial understanding into Z-Image ControlNet with Depth Anything V3.
Why Better Depth Maps Matter
In Z-Image ControlNet workflows, depth maps are one of the most critical control signals. They determine the perspective, spatial hierarchy, and object proportions of generated images. Traditional depth estimation methods (MiDaS, ZoeDepth) have several notable limitations:
- Detail loss: Weak depth discrimination for distant objects
- Blurred boundaries: Insufficient depth transitions at object edges
- Multi-scale inconsistency: Difficulty coordinating depth ratios between foreground and background
Depth Anything V3 (ByteDance, 2025) addresses these issues. Trained on large-scale depth-labeled data, it supports monocular depth estimation, camera pose estimation, and 3D point cloud output — all available in ComfyUI via the ComfyUI-DepthAnythingV3 plugin.
Core Capabilities of Depth Anything V3
Monocular Depth Estimation
Generate high-precision depth maps from single 2D images across multiple resolutions:
| Model | Parameters | Inference Speed (RTX 4090) | Accuracy |
|---|---|---|---|
| V3-Small | 24M | ~50ms | High |
| V3-Metric | 48M | ~80ms | Highest for absolute (metric) depth |
| V3-Large | 180M | ~150ms | Highest for relative depth |
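Whichever variant you pick, ControlNet expects the depth map as an 8-bit image, while depth models typically emit a float array at arbitrary scale. The preprocessor node handles this for you; the sketch below just illustrates the normalization step, assuming a relative-depth array where larger values mean farther away (the near=bright output convention is also an assumption, matching common ControlNet depth maps):

```python
import numpy as np

def depth_to_control_image(depth: np.ndarray) -> np.ndarray:
    """Normalize a raw float depth map to an 8-bit ControlNet image.

    Assumes `depth` holds relative depth (larger = farther); output uses
    the common convention of near = bright, far = dark.
    """
    d = depth.astype(np.float32)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-8)  # scale to [0, 1]
    d = 1.0 - d                                              # invert: near = bright
    return (d * 255.0).round().astype(np.uint8)

# Tiny 2x2 example depth map
demo = np.array([[0.5, 1.0], [2.0, 4.0]], dtype=np.float32)
print(depth_to_control_image(demo))
```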
Multi-View Consistency
The biggest breakthrough of V3 over V2: when processing multiple images of the same scene from different angles, its Cross-View Attention mechanism keeps depth estimates consistent across all views. This means:
- Video frame depth maps don't "flicker"
- Multi-angle 3D point clouds are conflict-free
- Ideal for architecture and interior scenes requiring precise spatial relationships
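Monocular depth is only defined up to an affine ambiguity (scale and shift), so "consistent across views" means all views share the same scale and shift. A quick way to check this on two overlapping depth maps is a least-squares affine fit: for consistent maps the fit is near-identity. This numpy sketch is illustrative only, not the plugin's actual mechanism:

```python
import numpy as np

def align_depth(src: np.ndarray, ref: np.ndarray):
    """Fit scale s and shift t so that s * src + t ≈ ref (least squares).

    For cross-view-consistent depth maps, the fit over the overlapping
    region is near-identity (s ≈ 1, t ≈ 0).
    """
    s, t = np.polyfit(src.ravel(), ref.ravel(), deg=1)
    return s, t, s * src + t

# Two "views" of the same scene differing only by the affine ambiguity
ref = np.linspace(1.0, 5.0, 16).reshape(4, 4)
src = (ref - 0.5) / 2.0          # src = 0.5 * ref - 0.25
s, t, aligned = align_depth(src, ref)
print(round(s, 3), round(t, 3))  # recovers s = 2.0, t = 0.5
```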
Camera Pose Estimation
Beyond depth, V3 estimates camera parameters (focal length, field of view, pose) — data directly usable for 3D reconstruction or VR/AR applications.
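Field of view and focal length are interchangeable once the image width is known, and downstream 3D tools usually want intrinsics as a focal length in pixels. The standard pinhole conversion is f = (W/2) / tan(fov/2):

```python
import math

def fov_to_focal_px(fov_deg: float, width_px: int) -> float:
    """Horizontal field of view (degrees) -> focal length in pixels."""
    return (width_px / 2) / math.tan(math.radians(fov_deg) / 2)

def focal_px_to_fov(f_px: float, width_px: int) -> float:
    """Inverse: focal length in pixels -> horizontal FOV in degrees."""
    return math.degrees(2 * math.atan((width_px / 2) / f_px))

f = fov_to_focal_px(90.0, 1024)            # 90° FOV on a 1024-px-wide image
print(round(f, 1))                         # → 512.0  (tan 45° = 1)
print(round(focal_px_to_fov(f, 1024), 1))  # → 90.0 (round trip)
```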
Complete ComfyUI Workflow Setup
Step 1: Install ComfyUI-DepthAnythingV3
cd ComfyUI/custom_nodes
git clone https://github.com/kijai/ComfyUI-DepthAnythingV3.git
cd ComfyUI-DepthAnythingV3
pip install -r requirements.txt
Step 2: Download Model Files
# Place in ComfyUI/models/depth_anything/
# Small version (recommended for daily use)
huggingface-cli download depth-anything-2/depth-anything-v3-small --local-dir ./depth_anything/v3-small
# Metric version (for absolute depth distance)
huggingface-cli download depth-anything-2/depth-anything-v3-metric --local-dir ./depth_anything/v3-metric
Step 3: Build Z-Image + Depth V3 + ControlNet Workflow
Core node connections:
LoadImage (reference image)
↓
DepthAnythingV3Preprocessor (generate depth map)
↓
ControlNetApply (Z-Image-Turbo-ControlNet-Union)
  ↑ conditioning from CLIPTextEncode (Prompt)
↓
KSampler (Z-Image Turbo)
↓
VAEDecode → SaveImage
Key Parameters:
| Parameter | Recommended | Notes |
|---|---|---|
| ControlNet strength | 0.6-0.8 | Don't over-control depth |
| Denoise | 0.7-0.85 | Preserve structural info |
| CFG Scale | 2.0-4.0 | Low CFG for Z-Image Turbo |
| Steps | 20-30 | Depth control needs more steps |
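A workflow exported in ComfyUI's API format stores each node as a dict of inputs, which makes the table easy to pin down concretely. The sketch below shows how the sampler settings might map onto the KSampler node — node IDs and link targets ("4", "6", …) are placeholders, not an actual export — and note that ControlNet strength lives on the ControlNetApply node, not here:

```python
# Recommended settings from the table, expressed as a KSampler entry in a
# ComfyUI API-format workflow dict. Node IDs ("4", "6", ...) are placeholders.
ksampler = {
    "class_type": "KSampler",
    "inputs": {
        "seed": 0,
        "steps": 25,           # 20-30: depth control benefits from more steps
        "cfg": 3.0,            # 2.0-4.0: keep CFG low for Z-Image Turbo
        "sampler_name": "euler",
        "scheduler": "normal",
        "denoise": 0.8,        # 0.7-0.85: preserve structural info
        "model": ["4", 0],     # link to the model loader node
        "positive": ["6", 0],  # link to ControlNet-modified conditioning
        "negative": ["7", 0],
        "latent_image": ["5", 0],
    },
}
print(ksampler["inputs"]["cfg"], ksampler["inputs"]["denoise"])
```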
Practical Use Case: Interior Design Style Transfer
Take a bare room photo and generate a luxury interior:
Prompt Example:
Modern luxury living room interior, marble floor, floor-to-ceiling windows,
warm ambient lighting, minimalist furniture, high-end materials,
photorealistic, architectural photography, 8k, detailed textures
Workflow Tips:
- Depth map input: V3 depth map feeds directly — no binarization needed
- ControlNet strength tuning:
- 0.4-0.5: Rough spatial structure only, high style variation
- 0.6-0.7: Balance between structure and creativity (recommended start)
- 0.8-1.0: Strict adherence to original layout
- Combine with Inpainting: Use masks to rework unsatisfactory areas
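The inpainting tip boils down to a masked composite: keep the original image where the mask is 0 and take the regenerated result where it is 1. A minimal numpy sketch of that blend (`masked_blend` is an illustrative helper, not a ComfyUI node; the actual inpainting pipeline works in latent space):

```python
import numpy as np

def masked_blend(original: np.ndarray, regenerated: np.ndarray,
                 mask: np.ndarray) -> np.ndarray:
    """Composite: keep `original` where mask=0, take `regenerated` where mask=1.

    `mask` is float in [0, 1] with shape (H, W); images are (H, W, C) floats.
    Fractional mask values give a soft transition at the seam.
    """
    m = mask[..., None]  # broadcast mask over the channel axis
    return original * (1.0 - m) + regenerated * m

orig = np.zeros((2, 2, 3))             # "original" image (all black)
regen = np.ones((2, 2, 3))             # "regenerated" image (all white)
mask = np.array([[0.0, 1.0], [0.5, 0.0]])
out = masked_blend(orig, regen, mask)
print(out[0, 1], out[1, 0])            # fully reworked pixel vs. 50% blend
```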
Comparison:
| Method | Spatial Accuracy | Style Freedom | Inference Time |
|---|---|---|---|
| MiDaS + ControlNet | ⭐⭐⭐ | ⭐⭐⭐⭐ | ~2s |
| ZoeDepth + ControlNet | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ~3s |
| Depth Anything V3 + ControlNet | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ~4s |
Video Frame Depth Consistency Workflow
Leverage V3's multi-view consistency to generate coherent depth maps across video frames, then stylize each frame with Z-Image:
LoadVideo (input video)
↓
DepthAnythingV3Preprocessor (multi_view=True)
↓
[Per-frame]
↓
ControlNetApply + KSampler
↓
VAEDecode
↓
SaveAnimatedPNG / VideoCombine
Key Settings:
- multi_view=True: Enable cross-view consistency
- temporal_smoothing=0.7: Temporal smoothing factor
- Keep ControlNet strength consistent across frames
Result: Stylized video where objects don't "jump" or "flicker" — spatial relationships remain stable throughout.
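A temporal smoothing factor like temporal_smoothing=0.7 behaves like an exponential moving average over consecutive depth frames. The sketch below shows that idea on a deliberately flickering sequence; the plugin's exact semantics may differ, and `smooth_depth_frames` is an illustrative helper, not a node:

```python
import numpy as np

def smooth_depth_frames(frames, alpha: float = 0.7):
    """EMA across frames: out[t] = alpha * out[t-1] + (1 - alpha) * frames[t].

    Higher alpha = smoother (less flicker) but slower response to real motion.
    """
    out = [frames[0].astype(np.float32)]
    for f in frames[1:]:
        out.append(alpha * out[-1] + (1.0 - alpha) * f.astype(np.float32))
    return out

# A static scene whose raw depth flickers: 1.0 -> 2.0 -> 1.0
frames = [np.full((2, 2), v) for v in (1.0, 2.0, 1.0)]
sm = smooth_depth_frames(frames, alpha=0.7)
print([round(float(s[0, 0]), 2) for s in sm])  # → [1.0, 1.3, 1.21]
```

The jump to 2.0 is damped to 1.3 and decays back toward 1.0 — exactly the "no flicker" behavior described above.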
Troubleshooting
Q1: Depth map looks "blurry", object edges unclear
Cause: Low resolution or Small model on complex scenes.
Fix:
- Switch to V3-Metric or V3-Large
- Increase input resolution to 1024x1024
- Verify post_process=True (default on)
Q2: ControlNet too strong, image looks rigid
Cause: Strength too high or denoise too low.
Fix:
- Start strength at 0.6 and decrease
- Raise denoise above 0.8
- Try lowering CFG Scale to 2.0-3.0
Q3: Video frame depth inconsistency
Cause: Multi-view mode not enabled or temporal_smoothing too low.
Fix:
- Confirm multi_view=True
- Set temporal_smoothing to 0.6-0.9
- Ensure stable frame rate (no frame skipping)
Summary
Depth Anything V3 brings three key upgrades to Z-Image ControlNet workflows:
- Accuracy leap: Monocular depth estimation surpasses MiDaS/ZoeDepth with sharper boundaries
- Multi-view consistency: Cross-frame/angle depth maps no longer "flicker"
- Camera pose output: Ready-to-use 3D data for downstream applications
For professional scenes like architectural visualization, interior design, and video stylization, the Depth Anything V3 + Z-Image ControlNet combo has become the new standard depth control workflow.
This workflow uses ComfyUI + Z-Image Turbo + Depth Anything V3 + ControlNet Union 2.1 — all open source and free.