Z-Image Inpainting Workflow: Complete Guide to Mask Editing and Object Replacement

May 22, 2026

Keywords: z-image inpainting mask workflow


Introduction

Inpainting is one of the most widely used techniques in AI-powered image editing. Unlike outpainting (canvas expansion, see ZI-015) or general edit workflows (see ZI-038), inpainting focuses on generating new content within a specified masked region of an existing image while preserving the surrounding area.

Z-Image's 6B-parameter single-stream DiT architecture supports inpainting through ZImageInpaintPipeline in diffusers and through mask-aware nodes in ComfyUI.


Inpainting Fundamentals

Technical Mechanism

Z-Image inpainting operates through conditional generation:

  1. Mask encoding: Masked regions are initialized with noise in latent space; unmasked regions keep the original image content
  2. Condition injection: The original image (with mask info) conditions the DiT architecture
  3. Prompt guidance: Text prompts direct content generation inside the masked area
  4. Iterative denoising: Progressive sampling produces content coordinated with surroundings
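
A common way to realize steps 1 and 4 in diffusion inpainting is to re-impose the original image on the unmasked region after every denoising step. The sketch below illustrates that idea on toy NumPy arrays. The functions add_noise and denoise_step are hypothetical stand-ins for the scheduler and the model, and the code is not claimed to match how Z-Image implements inpainting internally.

```python
import numpy as np

def add_noise(latents, noise, t):
    """Toy forward process: interpolate toward noise, with t in [0, 1]."""
    return (1.0 - t) * latents + t * noise

def denoise_step(latents, t):
    """Hypothetical stand-in for one model + scheduler step."""
    return latents * 0.9

def blended_inpaint(original, mask, num_steps=4, seed=0):
    """mask: 1.0 where content is regenerated, 0.0 where it is preserved."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(original.shape)
    latents = noise.copy()  # start from pure noise everywhere
    for i in range(num_steps):
        t_next = 1.0 - (i + 1) / num_steps  # noise level after this step
        latents = denoise_step(latents, t_next)
        # Outside the mask, overwrite with the original noised to the same level
        known = add_noise(original, noise, t_next)
        latents = mask * latents + (1.0 - mask) * known
    return latents

original = np.ones((4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
out = blended_inpaint(original, mask)
```

Because the noise level reaches zero on the last step, the unmasked region of out equals the original exactly, while the masked region contains whatever the denoiser produced.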

Inpainting vs Outpainting vs General Edit

| Aspect         | Inpainting                 | Outpainting      | General Edit         |
|----------------|----------------------------|------------------|----------------------|
| Operation Area | Internal masked region     | External canvas  | Full/partial image   |
| Mask Required  | Yes                        | Optional         | Optional             |
| Primary Use    | Object replacement/removal | Canvas extension | Style/detail changes |
| Constraint     | Strong surrounding context | Edge constraints | Prompt-driven        |

Mask Creation Techniques

Method 1: Manual Drawing

from PIL import Image, ImageDraw

mask = Image.new('L', (1024, 1024), 0)
draw = ImageDraw.Draw(mask)
draw.ellipse([200, 150, 400, 500], fill=255)
draw.rectangle([500, 100, 900, 600], fill=255)

Best for: Regular shapes, simple geometric regions. Precise control with no extra dependencies, but time-consuming for complex objects.

Method 2: AI Auto-Segmentation (SAM)

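import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

image = Image.open("input.jpg").convert("RGB")  # placeholder path

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")
predictor = SamPredictor(sam)
predictor.set_image(np.array(image))  # expects an HxWx3 uint8 RGB array

# One positive click (label 1) on the object at pixel (x=350, y=300)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[350, 300]]),
    point_labels=np.array([1]),
    multimask_output=True
)
# masks is boolean; keep the highest-scoring candidate as a 0/255 uint8 mask
best_mask = masks[np.argmax(scores)].astype(np.uint8) * 255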

Best for: Irregular objects and organic shapes. Delivers precise contours quickly but requires ~2.6GB additional VRAM for the ViT-H model.

Method 3: ComfyUI Mask Nodes

Common mask nodes available in ComfyUI:

  • SAMDetectorSEGS: SAM-based instance segmentation
  • Create Masks from Image: Color/luminance-based mask generation
  • Merge Masks: Combine multiple masks
  • Invert Mask / Mask Blur / Grow Mask: Standard mask operations
  • Threshold Mask: Binary thresholding of soft masks

Method 4: Semantic Segmentation

from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation

processor = AutoImageProcessor.from_pretrained(
    "nvidia/segformer-b0-finetuned-ade-512-512")
model = AutoModelForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b0-finetuned-ade-512-512")

Best for: Batch mask creation by semantic category. Covers 150 ADE20K classes.
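
Once the model has produced a per-pixel class-ID map (for example via the image processor's post_process_semantic_segmentation method), selecting categories reduces to a membership test. The sketch below covers only that last step on a toy array. The function name and the class IDs 5 and 7 are placeholders; look up the real IDs in model.config.id2label.

```python
import numpy as np

def mask_from_classes(seg_map, class_ids):
    """Return a uint8 mask (255 = inpaint) for pixels whose class is in class_ids."""
    selected = np.isin(seg_map, list(class_ids))
    return selected.astype(np.uint8) * 255

# Toy 3x4 class-ID map standing in for real model output
seg_map = np.array([[0, 0, 7, 7],
                    [0, 5, 7, 7],
                    [5, 5, 5, 0]])
mask = mask_from_classes(seg_map, class_ids={5, 7})
```

The same function can be applied to every image in a batch, which is what makes this method convenient for category-wide edits.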


Diffusers Inpainting Pipeline

ZImageInpaintPipeline Usage

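import torch
from diffusers import ZImageInpaintPipeline
from PIL import Image

# Placeholder paths; the mask is white (255) where content should be regenerated
image = Image.open("input.jpg")
mask = Image.open("mask.png")

pipe = ZImageInpaintPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

result = pipe(
    prompt="a modern sports car in the parking spot",
    image=image.convert("RGB").resize((1024, 1024)),
    # NEAREST keeps the resized mask binary (no gray, anti-aliased edges)
    mask_image=mask.convert("L").resize((1024, 1024), Image.NEAREST),
    strength=1.0,
    guidance_scale=7.5,
    num_inference_steps=28,
    width=1024, height=1024
).images[0]
result.save("inpainted.png")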

Key Parameters

| Parameter           | Default | Description                                  |
|---------------------|---------|----------------------------------------------|
| strength            | 1.0     | Fixed at 1.0 (full masked-area regeneration) |
| guidance_scale      | 7.5     | Prompt guidance strength, range 3.0–12.0     |
| num_inference_steps | 28      | Sampling steps, recommended 20–50            |
| mask_blur           | 4       | Mask edge blur, controls blending quality    |

mask_blur Guide

  • 0: Sharp edges, prone to visible seams
  • 2–4: Light blur, suitable for precise object replacement
  • 6–10: Moderate blur, ideal for background modifications
  • >10: Strong blur, best for large-area repairs
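
To see what these values do to a mask, the sketch below blurs a binary mask with Pillow and inspects the edge. It assumes mask_blur behaves like a Gaussian blur radius in pixels, which may not match the exact kernel a given pipeline or node applies. The blur_mask helper is hypothetical.

```python
from PIL import Image, ImageFilter

def blur_mask(mask, radius):
    """Soften mask edges; radius 0 returns an unchanged copy."""
    if radius <= 0:
        return mask.copy()
    return mask.filter(ImageFilter.GaussianBlur(radius))

mask = Image.new("L", (64, 64), 0)
mask.paste(255, (16, 16, 48, 48))  # white square covering x, y in 16..47

sharp = blur_mask(mask, 0)
soft = blur_mask(mask, 4)

# Sample the left edge of the square and its center
sharp_edge = (sharp.getpixel((15, 32)), sharp.getpixel((16, 32)))
soft_edge = soft.getpixel((16, 32))
soft_center = soft.getpixel((32, 32))
```

With radius 0 the edge jumps straight from 0 to 255, which is where seams come from. With radius 4 the edge pixel takes an intermediate value, so the generated content fades into the original over several pixels.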

ComfyUI Mask Workflow

Basic Inpainting Flow

Load Image ───→ Image ──┐
                        ↓
Create Mask ───→ Mask ──┤
                        ↓
                     KSampler (inpaint) ───→ Output
                        ↑
Load Model ───→ Model ─┘
Text Prompt ─→ Prompt ─┘

Complete JSON Workflow

The workflow below is in ComfyUI's API format. It takes the mask from the MASK output of LoadImage (index 1), which ComfyUI derives from the image's alpha channel, for example after painting in the mask editor. Model and file names are placeholders for your installation.

{
  "4": {
    "class_type": "VAELoader",
    "inputs": {"vae_name": "zimage_vae.safetensors"}
  },
  "6": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "z-image-turbo.safetensors"}
  },
  "8": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "a vintage leather sofa, warm lighting, photorealistic",
      "clip": ["6", 1]
    }
  },
  "9": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "", "clip": ["6", 1]}
  },
  "10": {
    "class_type": "LoadImage",
    "inputs": {"image": "living_room.jpg", "upload": "image"}
  },
  "14": {
    "class_type": "InpaintModelConditioning",
    "inputs": {
      "positive": ["8", 0], "negative": ["9", 0],
      "vae": ["4", 0], "pixels": ["10", 0], "mask": ["10", 1],
      "noise_mask": true
    }
  },
  "16": {
    "class_type": "KSampler",
    "inputs": {
      "model": ["6", 0], "positive": ["14", 0],
      "negative": ["14", 1], "latent_image": ["14", 2],
      "seed": 42, "steps": 28, "cfg": 7.5,
      "sampler_name": "euler_ancestral", "scheduler": "normal",
      "denoise": 1.0
    }
  },
  "18": {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["16", 0], "vae": ["4", 0]}
  },
  "20": {
    "class_type": "SaveImage",
    "inputs": {"images": ["18", 0], "filename_prefix": "zimage_inpaint"}
  }
}
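
An API-format workflow like this can be checked and queued from a script. The sketch below first looks for links that point at node IDs missing from the graph, then posts the workflow to a ComfyUI server. It assumes a local server on the default address http://127.0.0.1:8188, and the helper names are hypothetical.

```python
import json
import urllib.request

def find_broken_links(workflow):
    """Return (node_id, input_name, target_id) for links to missing nodes."""
    broken = []
    for node_id, node in workflow.items():
        for name, value in node.get("inputs", {}).items():
            # In API-format workflows a link is [source_node_id, output_index]
            is_link = (isinstance(value, list) and len(value) == 2
                       and isinstance(value[0], str)
                       and isinstance(value[1], int))
            if is_link and value[0] not in workflow:
                broken.append((node_id, name, value[0]))
    return broken

def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the workflow to a running ComfyUI server's /prompt endpoint."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

# Deliberately incomplete graph: nodes "16" and "4" are missing
workflow = {
    "10": {"class_type": "LoadImage", "inputs": {"image": "living_room.jpg"}},
    "18": {"class_type": "VAEDecode",
           "inputs": {"samples": ["16", 0], "vae": ["4", 0]}},
}
problems = find_broken_links(workflow)
```

Running the link check before calling queue_prompt catches renumbering mistakes locally instead of waiting for the server to reject the graph.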

Advanced: Inpainting with ControlNet

Original Image + Mask → InpaintModelConditioning
                          ↓
                Apply ControlNet (Depth / Canny / Pose)
                          ↓
                       KSampler
                          ↓
                VAE Decode → Result

ControlNet constrains inpainted content to match structural guidance. Useful when replacing objects while preserving scene geometry or depth relationships.


Object Replacement Strategies

Strategy 1: Precise Replacement

  1. Create precise mask (SAM or manual)
  2. Write detailed prompt for the new object
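  3. Use guidance_scale 7–9 and mask_blur 3–5

| Scenario  | Mask Target | Prompt Example                                                                     |
|-----------|-------------|------------------------------------------------------------------------------------|
| Furniture | Chair area  | a vintage wooden rocking chair with brass accents, warm oak finish, photorealistic |
| Vehicle   | Car area    | a matte-black electric sedan, sleek design, natural sunlight reflections           |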

Strategy 2: Context-Aware Replacement

  1. Slightly larger mask (including partial context)
  2. Prompt describes overall scene
  3. Lower guidance_scale (5–7), higher mask_blur (6–8)

Strategy 3: Multi-Object Replacement

  1. Create separate masks per object
  2. Combine with Merge Masks node
  3. Include all new objects in the prompt
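  4. May require multi-step generation

In Python, two boolean NumPy masks (mask1 and mask2, dtype bool) can be merged with a bitwise OR:

import numpy as np

combined_mask = (mask1 | mask2).astype(np.uint8) * 255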

Multi-Step Inpainting

Complex edits often exceed what one inpainting pass can handle:

Step 1: Remove unwanted objects
  Mask → "empty space matching surroundings"
  ↓
Step 2: Add new content
  Mask → "detailed new content description"
  ↓
Step 3: Detail refinement
  Small mask → "refined details, matching lighting"

Optimization tips:

  • Large to small: Process large areas first, then reduce mask size for detail
  • Reduce steps: Later steps can use fewer inference steps (28 → 20 → 15)
  • Increase blur: Higher mask_blur in later steps improves blending
  • Check consistency: Verify color, lighting, and perspective at each step
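
The pass sequence and the tips above can be written as a small driver loop. This is a sketch under assumptions: inpaint_fn is a hypothetical callable wrapping whichever backend you use (diffusers or ComfyUI), and the mask_blur values 4, 6, 8 are illustrative, chosen only to follow the "increase blur" tip.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class InpaintPass:
    name: str
    prompt: str
    steps: int
    mask_blur: int

def run_passes(image, masks, passes: List[InpaintPass], inpaint_fn: Callable):
    """Run the passes in order, feeding each output into the next pass."""
    for spec, mask in zip(passes, masks):
        image = inpaint_fn(image=image, mask=mask, prompt=spec.prompt,
                           steps=spec.steps, mask_blur=spec.mask_blur)
    return image

passes = [
    InpaintPass("remove", "empty space matching surroundings", 28, 4),
    InpaintPass("add", "detailed new content description", 20, 6),
    InpaintPass("refine", "refined details, matching lighting", 15, 8),
]

# Stand-in backend that only records what it was asked to do
def fake_inpaint(image, mask, prompt, steps, mask_blur):
    return image + [(mask, steps, mask_blur)]

history = run_passes([], ["large", "medium", "small"], passes, fake_inpaint)
```

Swapping fake_inpaint for a real backend call leaves the loop unchanged, which keeps the per-pass settings in one place for the consistency check after each step.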

Edge Blending

Mask Feathering

import cv2

def feather_mask(mask, blur_radius=8):
    return cv2.GaussianBlur(mask, (0, 0), blur_radius)
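
A feathered mask, such as the output of feather_mask, is typically used as a per-pixel alpha when compositing the generated image over the original. The sketch below shows that blend in NumPy; the alpha_composite helper is hypothetical and assumes uint8 images of equal size.

```python
import numpy as np

def alpha_composite(original, generated, mask):
    """Blend generated into original; mask is uint8, 255 = fully generated."""
    alpha = mask.astype(np.float32) / 255.0
    if original.ndim == 3:
        alpha = alpha[..., None]  # broadcast over the color channels
    blended = (alpha * generated.astype(np.float32)
               + (1.0 - alpha) * original.astype(np.float32))
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)

original = np.full((2, 3, 3), 100, dtype=np.uint8)
generated = np.full((2, 3, 3), 200, dtype=np.uint8)
mask = np.array([[0, 128, 255],
                 [0, 128, 255]], dtype=np.uint8)
out = alpha_composite(original, generated, mask)
```

Pixels with mask 0 keep the original value, pixels with mask 255 take the generated value, and the feathered band in between is a weighted mix of the two.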

Poisson Blending

def poisson_blend(base, result, mask, center):
    return cv2.seamlessClone(result, base, mask, center, cv2.NORMAL_CLONE)

Blending Parameter Guide

| Edit Type               | mask_blur | Method         |
|-------------------------|-----------|----------------|
| Object removal          | 4–6       | Alpha blend    |
| Object replacement      | 3–5       | Poisson blend  |
| Background modification | 6–10      | Gradient blend |
| Text addition           | 1–2       | Direct overlay |

Quality Preservation

  • Resolution matching: Use Image.NEAREST for mask resizing to avoid anti-aliasing
  • Color consistency: Describe lighting and color in prompts; apply color transfer post-processing if needed
  • Prevent texture repetition: Add "unique texture, no repeating patterns" to prompts; increase guidance_scale
  • Lighting consistency: Use "matching lighting, consistent shadows, same light source" in prompts
  • Perspective consistency: Use ControlNet Depth for architectural scenes; maintain vanishing point alignment
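
For the color transfer step mentioned above, one simple option is to match per-channel mean and standard deviation between the generated region and the untouched surroundings. The sketch below does this directly in RGB, which is a simplification of the usual color-transfer methods that work in a decorrelated color space. The match_color_stats function is hypothetical.

```python
import numpy as np

def match_color_stats(generated, reference, mask):
    """Rescale pixels inside the mask so each channel's mean and std match
    the reference pixels outside the mask."""
    inside = mask > 127
    out = generated.astype(np.float32).copy()
    for c in range(out.shape[2]):
        src = out[..., c][inside]
        ref = reference[..., c][~inside].astype(np.float32)
        scale = ref.std() / max(src.std(), 1e-6)
        out[..., c][inside] = (src - src.mean()) * scale + ref.mean()
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
reference = rng.integers(60, 121, size=(8, 8, 3)).astype(np.uint8)   # darker scene
generated = rng.integers(140, 201, size=(8, 8, 3)).astype(np.uint8)  # brighter fill
mask = np.zeros((8, 8), dtype=np.uint8)
mask[:, :4] = 255  # left half was inpainted

corrected = match_color_stats(generated, reference, mask)
```

Only pixels inside the mask are modified, so the result can still be composited or blended afterwards as usual.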

Practical Examples

Example 1: Object Removal

Scenario: Remove bystanders from a photograph.

Steps: SAM segmentation → Grow Mask +5px → Inpainting
Prompt: "clean background, matching surroundings, no people, photorealistic"
Params: guidance_scale=7.5, steps=28, mask_blur=6

Example 2: Clothing Change

Scenario: Replace clothing while preserving pose and features.

Steps: SAM segmentation + manual refinement → Inpainting
Prompt: "wearing a red tailored suit with a white dress shirt,
professional photography, natural pose, matching lighting"
Params: guidance_scale=9.0, steps=32, mask_blur=4

Example 3: Background Swap

Scenario: Replace indoor background with outdoor setting.

Steps: Foreground segmentation → Invert mask → Inpainting
Prompt: "tropical beach background, crystal clear ocean, palm trees,
golden sunset, cinematic lighting, matching subject lighting"
Params: guidance_scale=6.5, steps=28, mask_blur=10

Example 4: Text Insertion

Scenario: Add branded text to a product photo.

Steps: Precise rectangular mask → Inpainting
Prompt: "brand logo in clean sans-serif font, white text,
professional design, minimal style"
Params: guidance_scale=10.0, steps=30, mask_blur=1
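
The parameter sets from the four examples can be kept in a small lookup table so they are applied consistently. This is one possible arrangement: the names InpaintPreset, PRESETS, and pipeline_kwargs are hypothetical, and mask_blur is left out of the returned keyword arguments because how it is applied depends on the backend.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InpaintPreset:
    guidance_scale: float
    steps: int
    mask_blur: int

# Values copied from Examples 1-4 above
PRESETS = {
    "object_removal": InpaintPreset(7.5, 28, 6),
    "clothing_change": InpaintPreset(9.0, 32, 4),
    "background_swap": InpaintPreset(6.5, 28, 10),
    "text_insertion": InpaintPreset(10.0, 30, 1),
}

def pipeline_kwargs(edit_type):
    """Translate a preset into sampling keyword arguments."""
    preset = PRESETS[edit_type]
    return {"guidance_scale": preset.guidance_scale,
            "num_inference_steps": preset.steps}

kwargs = pipeline_kwargs("background_swap")
```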

Troubleshooting

| Issue                             | Cause                 | Solution                                          |
|-----------------------------------|-----------------------|---------------------------------------------------|
| Visible seams                     | mask_blur too low     | Increase to 6–10; apply Poisson blending          |
| Content clashes with surroundings | Prompt lacks context  | Add environment description; lower guidance_scale |
| Unclear text                      | Mask too small        | Enlarge mask; reduce mask_blur to 0–2             |
| Repetitive textures               | Over-smoothing        | Add texture keywords; increase guidance_scale     |
| Color inconsistency               | Distribution mismatch | Describe colors in prompt; use color transfer     |

Summary

Z-Image's inpainting, combined with ComfyUI's node-based workflow, covers the full range from simple object removal to complex multi-step editing. Key factors for success:

  1. Precise masks via SAM or semantic segmentation
  2. Appropriate parameters: adjust mask_blur and guidance_scale per edit type
  3. Multi-step processing for complex edits
  4. Proper edge blending, since seams are usually the most visible defect in the final output
  5. Context-aware prompts ensuring visual coherence

Compared to outpainting, inpainting demands higher mask precision and more careful blending. Compared to general edit workflows, it offers more precise localized control.


Z-Image Team
