Z-Image Inpainting Workflow: Complete Guide to Mask Editing and Object Replacement
Keywords: z-image inpainting mask workflow
Table of Contents
- Introduction
- Inpainting Fundamentals
- Mask Creation Techniques
- Diffusers Inpainting Pipeline
- ComfyUI Mask Workflow
- Object Replacement Strategies
- Multi-Step Inpainting
- Edge Blending
- Quality Preservation
- Practical Examples
- Troubleshooting
Introduction
Inpainting is one of the most widely used techniques in AI-powered image editing. Unlike outpainting (canvas expansion, see ZI-015) or general edit workflows (see ZI-038), inpainting focuses on generating new content within a specified masked region of an existing image while preserving the surrounding area.
Z-Image's 6B-parameter single-stream DiT architecture provides strong inpainting capabilities through ZImageInpaintPipeline in diffusers and comprehensive node support in ComfyUI.
Inpainting Fundamentals
Technical Mechanism
Z-Image inpainting operates through conditional generation:
- Mask encoding: Masked regions are encoded as noise; non-masked pixels are preserved
- Condition injection: The original image (with mask info) conditions the DiT architecture
- Prompt guidance: Text prompts direct content generation inside the masked area
- Iterative denoising: Progressive sampling produces content coordinated with surroundings
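The stages above can be illustrated with a toy NumPy sketch of the blending rule that many inpainting samplers apply after each denoising update: generated values are kept inside the mask, and original values are restored everywhere else. This is a simplified illustration under that assumption, not Z-Image's actual implementation; `blend_step` and the array shapes are made up for the example.

```python
import numpy as np

def blend_step(denoised, original, mask):
    """Keep original values where mask == 0 and generated values where mask == 1."""
    mask = mask.astype(np.float32)
    return mask * denoised + (1.0 - mask) * original

# Toy 4x4 single-channel "latent": the original content is all ones.
original = np.ones((4, 4), dtype=np.float32)
# Pretend the model produced fives everywhere in this step.
denoised = np.full((4, 4), 5.0, dtype=np.float32)
# The mask selects the central 2x2 block for regeneration.
mask = np.zeros((4, 4), dtype=np.float32)
mask[1:3, 1:3] = 1.0

blended = blend_step(denoised, original, mask)
```

Only the central block takes the generated values; the border stays identical to the original, which is what keeps the surrounding area intact.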
Inpainting vs Outpainting vs General Edit
| Aspect | Inpainting | Outpainting | General Edit |
|---|---|---|---|
| Operation Area | Internal masked region | External canvas | Full/partial image |
| Mask Required | Yes | Optional | Optional |
| Primary Use | Object replacement/removal | Canvas extension | Style/detail changes |
| Constraint | Strong surrounding context | Edge constraints | Prompt-driven |
Mask Creation Techniques
Method 1: Manual Drawing
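from PIL import Image, ImageDraw

# Start from an all-black 1024x1024 grayscale mask (black = preserve)
mask = Image.new('L', (1024, 1024), 0)
draw = ImageDraw.Draw(mask)
# White (255) marks the regions to regenerate
draw.ellipse([200, 150, 400, 500], fill=255)
draw.rectangle([500, 100, 900, 600], fill=255)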
Best for: Regular shapes, simple geometric regions. Precise control with no extra dependencies, but time-consuming for complex objects.
Method 2: AI Auto-Segmentation (SAM)
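import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")
predictor = SamPredictor(sam)
# `image` is the source picture as an RGB PIL image
predictor.set_image(np.array(image))
# One positive click (label 1) on the object to segment
masks, scores, _ = predictor.predict(
    point_coords=np.array([[350, 300]]),
    point_labels=np.array([1]),
    multimask_output=True
)
# Keep the highest-scoring candidate and convert bool to uint8 (0/255)
best_mask = masks[np.argmax(scores)].astype(np.uint8) * 255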
Best for: Irregular objects and organic shapes. Delivers precise contours quickly but requires ~2.6GB additional VRAM for the ViT-H model.
Method 3: ComfyUI Mask Nodes
Common mask nodes available in ComfyUI:
- SAMDetectorSEGS: SAM-based instance segmentation
- Create Masks from Image: Color/luminance-based mask generation
- Merge Masks: Combine multiple masks
- Invert Mask / Mask Blur / Grow Mask: Standard mask operations
- Threshold Mask: Binary thresholding of soft masks
Method 4: Semantic Segmentation
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation

processor = AutoImageProcessor.from_pretrained(
    "nvidia/segformer-b0-finetuned-ade-512-512")
model = AutoModelForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b0-finetuned-ade-512-512")
Best for: Batch mask creation by semantic category. Covers 150 ADE20K classes.
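The snippet above only loads the model. Once its logits have been reduced to a per-pixel class map (typically an argmax over the class dimension), turning selected categories into an inpainting mask is a small NumPy step. The sketch below assumes such a class map already exists; `class_map_to_mask` is a hypothetical helper, and the class IDs are placeholders rather than real ADE20K indices.

```python
import numpy as np

def class_map_to_mask(class_map, class_ids):
    """Return a uint8 mask (255 = inpaint) for pixels whose class is in class_ids."""
    selected = np.isin(class_map, list(class_ids))
    return selected.astype(np.uint8) * 255

# Toy 3x3 class map; the IDs 0, 7, and 23 are placeholders, not real ADE20K indices.
class_map = np.array([[0, 7, 7],
                      [0, 23, 7],
                      [0, 0, 0]])
mask = class_map_to_mask(class_map, {7, 23})
```

Passing several IDs at once is what makes this approach convenient for batch work: one call can mask every pixel of several categories across many images.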
Diffusers Inpainting Pipeline
ZImageInpaintPipeline Usage
from diffusers import ZImageInpaintPipeline
import torch

pipe = ZImageInpaintPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# `image` and `mask` are PIL images, e.g. from the mask creation methods above
result = pipe(
    prompt="a modern sports car in the parking spot",
    image=image.convert("RGB").resize((1024, 1024)),
    mask_image=mask.convert("L").resize((1024, 1024)),
    strength=1.0,
    guidance_scale=7.5,
    num_inference_steps=28,
    width=1024, height=1024
).images[0]
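Before calling the pipeline, the image and mask need to share a size, and the Quality Preservation section recommends nearest-neighbor resizing for masks. A minimal preprocessing sketch with Pillow follows; `prepare_inputs` and the 127 threshold are assumptions for illustration, not part of the diffusers API.

```python
from PIL import Image

def prepare_inputs(image, mask, size=(1024, 1024)):
    """Resize image and mask to the same size and binarize the mask."""
    image = image.convert("RGB").resize(size, Image.LANCZOS)
    # NEAREST keeps mask edges hard; smoother filters introduce gray pixels.
    mask = mask.convert("L").resize(size, Image.NEAREST)
    # Anything brighter than mid-gray counts as "regenerate".
    mask = mask.point(lambda v: 255 if v > 127 else 0)
    return image, mask

# Stand-in inputs; real code would load files with Image.open(...).
src = Image.new("RGB", (640, 480), (120, 120, 120))
raw_mask = Image.new("L", (640, 480), 0)
raw_mask.paste(200, (100, 100, 300, 300))
image, mask = prepare_inputs(src, raw_mask)
```

Note that resizing a 640x480 input to a square stretches it; cropping or padding to the target aspect ratio first avoids that distortion.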
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| strength | 1.0 | Fixed at 1.0 (full masked-area regeneration) |
| guidance_scale | 7.5 | Prompt guidance strength, range 3.0–12.0 |
| num_inference_steps | 28 | Sampling steps, recommended 20–50 |
| mask_blur | 4 | Mask edge blur, controls blending quality |
mask_blur Guide
- 0: Sharp edges, prone to visible seams
- 2–4: Light blur, suitable for precise object replacement
- 6–10: Moderate blur, ideal for background modifications
- >10: Strong blur, best for large-area repairs
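To see what these values do to a mask, the blur can be reproduced outside the pipeline with Pillow. The sketch below assumes mask_blur behaves like a Gaussian radius in pixels, which may not match the pipeline's internal handling exactly; `soften_mask` is a hypothetical helper.

```python
from PIL import Image, ImageDraw, ImageFilter

def soften_mask(mask, blur_radius):
    """Return a copy of the mask with Gaussian-blurred edges (0 leaves it unchanged)."""
    if blur_radius <= 0:
        return mask.copy()
    return mask.filter(ImageFilter.GaussianBlur(radius=blur_radius))

# A white square on a black background stands in for an object mask.
mask = Image.new("L", (256, 256), 0)
ImageDraw.Draw(mask).rectangle([64, 64, 191, 191], fill=255)

sharp = soften_mask(mask, 0)   # hard edge, prone to visible seams
soft = soften_mask(mask, 4)    # light blur for precise object replacement
```

With a radius of 4, pixels on the square's border take intermediate gray values, so generated and original content are mixed gradually across the boundary instead of switching abruptly.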
ComfyUI Mask Workflow
Basic Inpainting Flow
Load Image  ───→ Image ───┐
                          ↓
Create Mask ───→ Mask ────┤
                          ↓
                 KSampler (inpaint) ───→ Output
                          ↑
Load Model  ───→ Model ───┤
Text Prompt ───→ Prompt ──┘
Complete JSON Workflow
{
  "4": {
    "class_type": "VAELoader",
    "inputs": {"vae_name": "zimage_vae.safetensors"}
  },
  "6": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "z-image-turbo.safetensors"}
  },
  "8": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "a vintage leather sofa, warm lighting, photorealistic",
      "clip": ["6", 1]
    }
  },
  "9": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "blurry, distorted, low quality",
      "clip": ["6", 1]
    }
  },
  "10": {
    "class_type": "LoadImage",
    "inputs": {"image": "living_room.jpg", "upload": "image"}
  },
  "12": {
    "class_type": "CreateMaskFromImage",
    "inputs": {"image": ["10", 1], "channel": "alpha"}
  },
  "14": {
    "class_type": "InpaintModelConditioning",
    "inputs": {
      "positive": ["8", 0], "negative": ["9", 0],
      "vae": ["4", 0], "pixels": ["10", 0], "mask": ["12", 0]
    }
  },
  "16": {
    "class_type": "KSampler",
    "inputs": {
      "model": ["6", 0], "positive": ["14", 0],
      "negative": ["14", 1], "latent_image": ["14", 2],
      "seed": 42, "steps": 28, "cfg": 7.5,
      "sampler_name": "euler_ancestral", "scheduler": "normal",
      "denoise": 1.0
    }
  },
  "18": {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["16", 0], "vae": ["4", 0]}
  },
  "20": {
    "class_type": "SaveImage",
    "inputs": {"images": ["18", 0]}
  }
}
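In this API-format JSON, each link is a two-item list of the form [source node ID, output index]. A quick sanity check before submitting a workflow is to confirm that every link points at a node that exists. The sketch below is a hypothetical helper, not part of ComfyUI, and it checks only node IDs, not output indices or types.

```python
def find_broken_links(workflow):
    """Return (node_id, input_name, target_id) for links to missing nodes."""
    broken = []
    for node_id, node in workflow.items():
        for name, value in node.get("inputs", {}).items():
            # A link is a two-item list: [source_node_id, output_index].
            is_link = (isinstance(value, list) and len(value) == 2
                       and isinstance(value[0], str) and isinstance(value[1], int))
            if is_link and value[0] not in workflow:
                broken.append((node_id, name, value[0]))
    return broken

# Tiny stand-in graph: node "2" points at a node "99" that does not exist.
workflow = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "living_room.jpg"}},
    "2": {"class_type": "VAEEncode", "inputs": {"pixels": ["1", 0], "vae": ["99", 0]}},
}
problems = find_broken_links(workflow)
```

For a workflow stored on disk, the same function can be run on the dictionary returned by `json.load`.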
Advanced: Inpainting with ControlNet
Original Image + Mask → InpaintModelConditioning
        ↓
ControlNet (Depth / Canny / Pose) applied to the conditioning
        ↓
KSampler
        ↓
VAE Decode → Result
ControlNet constrains inpainted content to match structural guidance. Useful when replacing objects while preserving scene geometry or depth relationships.
Object Replacement Strategies
Strategy 1: Precise Replacement
- Create precise mask (SAM or manual)
- Write detailed prompt for the new object
- Use guidance_scale 7–9 and mask_blur 3–5
| Scenario | Mask Target | Prompt Example |
|---|---|---|
| Furniture | Chair area | a vintage wooden rocking chair with brass accents, warm oak finish, photorealistic |
| Vehicle | Car area | a matte-black electric sedan, sleek design, natural sunlight reflections |
Strategy 2: Context-Aware Replacement
- Slightly larger mask (including partial context)
- Prompt describes overall scene
- Lower guidance_scale (5–7), higher mask_blur (6–8)
Strategy 3: Multi-Object Replacement
- Create separate masks per object
- Combine with Merge Masks node
- Include all new objects in the prompt
- May require multi-step generation
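# mask1 and mask2 must be boolean NumPy arrays (e.g. raw SAM output) for this union to scale to 0/255
combined_mask = (mask1 | mask2).astype(np.uint8) * 255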
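The one-line union above assumes boolean inputs; with uint8 masks that already hold 0/255, multiplying by 255 would overflow. A slightly more defensive sketch that accepts either kind is shown below. `merge_masks` is a hypothetical helper, not a library function.

```python
import numpy as np

def merge_masks(*masks):
    """Union any number of same-shaped masks; nonzero pixels count as masked."""
    merged = np.zeros(np.asarray(masks[0]).shape, dtype=bool)
    for m in masks:
        merged |= np.asarray(m) > 0
    return merged.astype(np.uint8) * 255

# One uint8 0/255 mask covering the top row.
mask1 = np.zeros((4, 4), dtype=np.uint8)
mask1[0, :] = 255
# One boolean mask covering the left column.
mask2 = np.zeros((4, 4), dtype=bool)
mask2[:, 0] = True

combined_mask = merge_masks(mask1, mask2)
```

Thresholding each input with `> 0` first means soft (blurred) masks are also treated as masked wherever they are nonzero, which may or may not be desired.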
Multi-Step Inpainting
Complex edits often exceed what one inpainting pass can handle:
Step 1: Remove unwanted objects
    Mask → "empty space matching surroundings"
    ↓
Step 2: Add new content
    Mask → "detailed new content description"
    ↓
Step 3: Detail refinement
    Small mask → "refined details, matching lighting"
Optimization tips:
- Large to small: Process large areas first, then reduce mask size for detail
- Reduce steps: Later steps can use fewer inference steps (28 → 20 → 15)
- Increase blur: Higher mask_blur in later steps improves blending
- Check consistency: Verify color, lighting, and perspective at each step
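The three-step flow and the tips above can be expressed as a simple schedule that feeds each result into the next pass. In this sketch, `run_multistep`, the `inpaint_fn` callback, and the mask_blur values 4, 6, and 8 are illustrative assumptions; a real `inpaint_fn` would wrap the pipeline call and blur the mask itself.

```python
def run_multistep(image, passes, inpaint_fn):
    """Apply inpainting passes in order, feeding each result into the next."""
    for p in passes:
        image = inpaint_fn(image, p["mask"], p["prompt"],
                           steps=p["steps"], mask_blur=p["mask_blur"])
    return image

# Hypothetical schedule following the tips: fewer steps and more blur in later passes.
passes = [
    {"mask": "removal_mask", "prompt": "empty space matching surroundings",
     "steps": 28, "mask_blur": 4},
    {"mask": "object_mask", "prompt": "detailed new content description",
     "steps": 20, "mask_blur": 6},
    {"mask": "detail_mask", "prompt": "refined details, matching lighting",
     "steps": 15, "mask_blur": 8},
]

# Stub that records calls so the control flow can be checked without a GPU.
calls = []

def fake_inpaint(image, mask, prompt, steps, mask_blur):
    calls.append((mask, steps, mask_blur))
    return image + [mask]

result = run_multistep([], passes, fake_inpaint)
```

Keeping the schedule as data makes it easy to save intermediate images between passes for the consistency check in the last tip.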
Edge Blending
Mask Feathering
import cv2

def feather_mask(mask, blur_radius=8):
    # ksize=(0, 0) lets OpenCV derive the kernel size from sigma (blur_radius)
    return cv2.GaussianBlur(mask, (0, 0), blur_radius)
Poisson Blending
def poisson_blend(base, result, mask, center):
    # center is the (x, y) position in base where the masked region of result is placed
    return cv2.seamlessClone(result, base, mask, center, cv2.NORMAL_CLONE)
Blending Parameter Guide
| Edit Type | mask_blur | Method |
|---|---|---|
| Object removal | 4–6 | Alpha blend |
| Object replacement | 3–5 | Poisson blend |
| Background modification | 6–10 | Gradient blend |
| Text addition | 1–2 | Direct overlay |
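The alpha blend listed in the table is the simplest of these methods: the feathered mask acts as a per-pixel weight between the original and the inpainted result. A minimal NumPy sketch, assuming uint8 images and a 0–255 mask of matching height and width (`alpha_blend` is a hypothetical helper):

```python
import numpy as np

def alpha_blend(base, result, mask):
    """Composite result over base using a 0-255 mask as per-pixel alpha."""
    alpha = mask.astype(np.float32) / 255.0
    if base.ndim == 3:
        alpha = alpha[..., None]  # broadcast the weight over color channels
    out = alpha * result.astype(np.float32) + (1.0 - alpha) * base.astype(np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

base = np.full((2, 2, 3), 100, dtype=np.uint8)     # original image
result = np.full((2, 2, 3), 200, dtype=np.uint8)   # inpainted image
mask = np.array([[0, 255], [128, 255]], dtype=np.uint8)
blended = alpha_blend(base, result, mask)
```

Where the mask is 0 the original pixel survives, where it is 255 the inpainted pixel is used, and intermediate values mix the two in proportion.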
Quality Preservation
- Resolution matching: Use Image.NEAREST for mask resizing to avoid anti-aliasing
- Color consistency: Describe lighting and color in prompts; apply color transfer post-processing if needed
- Prevent texture repetition: Add "unique texture, no repeating patterns" to prompts; increase guidance_scale
- Lighting consistency: Use "matching lighting, consistent shadows, same light source" in prompts
- Perspective consistency: Use ControlNet Depth for architectural scenes; maintain vanishing point alignment
Practical Examples
Example 1: Object Removal
Scenario: Remove bystanders from a photograph.
Steps: SAM segmentation → Grow Mask +5px → Inpainting
Prompt: "clean background, matching surroundings, no people, photorealistic"
Params: guidance_scale=7.5, steps=28, mask_blur=6
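The "Grow Mask +5px" step in this example can be approximated outside ComfyUI with a maximum filter, which expands white regions by a fixed margin so that halos around the removed subject are also regenerated. The sketch below uses a square window, so corners grow slightly more than a round brush would; `grow_mask` is a hypothetical helper.

```python
from PIL import Image, ImageFilter

def grow_mask(mask, pixels):
    """Dilate an L-mode mask by `pixels` in every direction (square window)."""
    if pixels <= 0:
        return mask.copy()
    # MaxFilter needs an odd window size; 2 * pixels + 1 reaches `pixels` each way.
    return mask.filter(ImageFilter.MaxFilter(2 * pixels + 1))

mask = Image.new("L", (64, 64), 0)
mask.paste(255, (20, 20, 40, 40))   # masked square covering x and y from 20 to 39
grown = grow_mask(mask, 5)
```

After growing by 5 pixels, the masked square spans 15 to 44 on both axes.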
Example 2: Clothing Change
Scenario: Replace clothing while preserving pose and features.
Steps: SAM segmentation + manual refinement → Inpainting
Prompt: "wearing a red tailored suit with a white dress shirt,
professional photography, natural pose, matching lighting"
Params: guidance_scale=9.0, steps=32, mask_blur=4
Example 3: Background Swap
Scenario: Replace indoor background with outdoor setting.
Steps: Foreground segmentation → Invert mask → Inpainting
Prompt: "tropical beach background, crystal clear ocean, palm trees,
golden sunset, cinematic lighting, matching subject lighting"
Params: guidance_scale=6.5, steps=28, mask_blur=10
Example 4: Text Insertion
Scenario: Add branded text to a product photo.
Steps: Precise rectangular mask → Inpainting
Prompt: "brand logo in clean sans-serif font, white text,
professional design, minimal style"
Params: guidance_scale=10.0, steps=30, mask_blur=1
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Visible seams | mask_blur too low | Increase to 6–10; apply Poisson blending |
| Content clashes with surroundings | Prompt lacks context | Add environment description; lower guidance_scale |
| Unclear text | Mask too small | Enlarge mask; reduce mask_blur to 0–2 |
| Repetitive textures | Over-smoothing | Add texture keywords; increase guidance_scale |
| Color inconsistency | Distribution mismatch | Describe colors in prompt; use color transfer |
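The color transfer fix in the last row can be sketched as a per-channel mean and standard deviation match between the inpainted region and its surroundings. This is a simplified version done directly in RGB; such transfers are more commonly applied in a perceptual space such as Lab. `match_color_stats` is a hypothetical helper, and it assumes both the masked and unmasked regions are non-empty.

```python
import numpy as np

def match_color_stats(result, mask):
    """Match mean/std of the masked region to the unmasked region, per channel."""
    inside = mask > 0
    out = result.astype(np.float32)          # astype returns a copy
    for c in range(out.shape[-1]):
        channel = out[..., c]                # view into out
        src, ref = channel[inside], channel[~inside]
        scale = ref.std() / max(src.std(), 1e-6)
        channel[inside] = (src - src.mean()) * scale + ref.mean()
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Checkerboard of 90/110 as the "correct" image.
yy, xx = np.indices((4, 4))
base = np.where((yy + xx) % 2 == 0, 90, 110).astype(np.uint8)
base = np.stack([base] * 3, axis=-1)

mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255

shifted = base.copy()
shifted[1:3, 1:3] += 50   # simulate an inpainted patch that came out too bright
fixed = match_color_stats(shifted, mask)
```

In this toy case the brightness offset is removed and the patch returns to the original checkerboard values; on real images the match is only statistical, so it works best when the masked region and its surroundings should share similar colors.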
Summary
Z-Image's inpainting, combined with ComfyUI's node-based workflow, covers the full range from simple object removal to complex multi-step editing. Key factors for success:
- Precise masks via SAM or semantic segmentation
- Appropriate parameters: adjust mask_blur and guidance_scale per edit type
- Multi-step processing for complex edits
- Proper edge blending, often the most visible factor in final output quality
- Context-aware prompts ensuring visual coherence
Compared to outpainting, inpainting demands higher mask precision and more careful blending. Compared to general edit workflows, it offers more precise localized control.