Z-Image Inpainting Workflow: Complete Guide to Mask Editing and Object Replacement


Introduction
Inpainting Fundamentals
Mask Creation Techniques
Diffusers Inpainting Pipeline
ComfyUI Mask Workflow
Object Replacement Strategies
Multi-Step Inpainting
Edge Blending
Quality Preservation
Practical Examples
Troubleshooting
Summary

Introduction

Inpainting is one of the most widely used techniques in AI-powered image editing. Unlike outpainting (canvas expansion, see ZI-015) or general edit workflows (see ZI-038), inpainting focuses on generating new content within a specified masked region of an existing image while preserving the surrounding area.

Z-Image, a 6B-parameter diffusion transformer (DiT), supports inpainting through ZImageInpaintPipeline in diffusers and through standard mask and conditioning nodes in ComfyUI.

Inpainting Fundamentals

Technical Mechanism

Z-Image inpainting operates through conditional generation:

Mask encoding: The image is encoded into latent space; latents under the mask are replaced with noise (pure noise at strength 1.0), while latents outside it are carried over from the original
Condition injection: The original image and its mask condition the DiT during denoising
Prompt guidance: The text prompt steers what is generated inside the masked area
Iterative denoising: Repeated sampling steps refine the masked content until it is consistent with its surroundings
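One common way latent inpainting pipelines enforce the "preserve everything outside the mask" rule is to re-inject the original latent after every denoising step. The sketch below illustrates only that blending step in NumPy; it is not Z-Image's actual code. It assumes a rectified-flow style schedule in which a clean latent x0 at noise level t is `(1 - t) * x0 + t * noise`, and the helper names are hypothetical.

```python
import numpy as np

# Sketch only: hypothetical helpers illustrating latent re-injection.

def renoise(original, noise, t):
    """Bring a clean latent to noise level t (0 = clean, 1 = pure noise).

    Assumes a rectified-flow style interpolation; other schedulers
    use different coefficients.
    """
    return (1.0 - t) * original + t * noise

def blend_step(current, original, noise, mask, t):
    """Keep the sampler's latents inside the mask, restore the original outside.

    mask: 1.0 where content is regenerated, 0.0 where it is preserved,
    already downsampled to latent resolution and broadcastable to the latents.
    """
    return mask * current + (1.0 - mask) * renoise(original, noise, t)

# Toy latents: 4 channels on an 8x8 grid, with a 3x3 masked square
rng = np.random.default_rng(0)
original, noise, current = (rng.standard_normal((4, 8, 8)) for _ in range(3))
mask = np.zeros((1, 8, 8))
mask[:, 2:5, 2:5] = 1.0
out = blend_step(current, original, noise, mask, t=0.0)
```

At t = 0 (the last step) the unmasked latents equal the original exactly, which is why untouched regions survive, up to the small changes introduced by the VAE decode.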

Inpainting vs Outpainting vs General Edit

Aspect	Inpainting	Outpainting	General Edit
Operation Area	Internal masked region	External canvas	Full/partial image
Mask Required	Yes	Optional	Optional
Primary Use	Object replacement/removal	Canvas extension	Style/detail changes
Constraint	Strong surrounding context	Edge constraints	Prompt-driven

Mask Creation Techniques

Method 1: Manual Drawing

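from PIL import Image, ImageDraw

# Single-channel mask: white (255) = regenerate, black (0) = preserve
mask = Image.new('L', (1024, 1024), 0)
draw = ImageDraw.Draw(mask)
draw.ellipse([200, 150, 400, 500], fill=255)      # bounding box [x0, y0, x1, y1]
draw.rectangle([500, 100, 900, 600], fill=255)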

Best for: Regular shapes, simple geometric regions. Precise control with no extra dependencies, but time-consuming for complex objects.
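For outlines that are neither ellipses nor rectangles, the same Pillow API can trace a polygon. This is a small sketch: the vertex coordinates and helper names are hypothetical, and the coverage check is simply a convenient way to confirm the mask covers roughly the area you expect.

```python
import numpy as np
from PIL import Image, ImageDraw

# Sketch only: polygon_mask and masked_fraction are hypothetical helpers.

def polygon_mask(size, points):
    """Binary inpainting mask from a traced outline.

    size:   (width, height) of the source image
    points: list of (x, y) vertices around the object
    """
    mask = Image.new("L", size, 0)                    # 0 = keep original pixels
    ImageDraw.Draw(mask).polygon(points, fill=255)    # 255 = regenerate
    return mask

def masked_fraction(mask):
    """Fraction of pixels that will be regenerated."""
    return float((np.asarray(mask) > 127).mean())

# Hypothetical outline of a lamp shade in a 1024x1024 photo
mask = polygon_mask((1024, 1024), [(420, 180), (600, 180), (680, 420), (340, 420)])
print(f"{masked_fraction(mask):.1%} of the image will be regenerated")
```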

Method 2: AI Auto-Segmentation (SAM)

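import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

image = Image.open("photo.jpg").convert("RGB")  # hypothetical input; SAM expects RGB

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")
predictor = SamPredictor(sam)
predictor.set_image(np.array(image))  # HxWx3 uint8 array

# One positive click (label 1) on the object, as (x, y) pixel coordinates
masks, scores, _ = predictor.predict(
    point_coords=np.array([[350, 300]]),
    point_labels=np.array([1]),
    multimask_output=True
)
# masks is a boolean array (3, H, W); keep the highest-scoring candidate as a 0/255 mask
best_mask = masks[np.argmax(scores)].astype(np.uint8) * 255
mask = Image.fromarray(best_mask)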

Best for: Irregular objects and organic shapes. SAM produces precise contours from a single click, but the ViT-H checkpoint alone is about 2.6GB, so budget at least that much extra VRAM (the smaller vit_l and vit_b variants need less).

Method 3: ComfyUI Mask Nodes

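Common mask nodes in ComfyUI (display names vary between versions, and the SAM nodes come from custom node packs such as ComfyUI-Impact-Pack rather than core ComfyUI):

SAMDetectorSEGS: SAM-based instance segmentation
Create Masks from Image: Color/luminance-based mask generation (core equivalents: Convert Image to Mask, Image Color To Mask)
Merge Masks: Combine multiple masks (core equivalent: MaskComposite)
Invert Mask / Mask Blur / Grow Mask: Standard mask operations
Threshold Mask: Binary thresholding of soft masks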
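Outside ComfyUI, the same standard operations take one or two Pillow calls each. This is a sketch with hypothetical helper names, not a port of the nodes' exact algorithms: here `MaxFilter` grows the mask with a square kernel, while individual node implementations may differ in kernel shape and edge handling.

```python
from PIL import Image, ImageChops, ImageFilter, ImageOps

# Sketch only: hypothetical Pillow equivalents of common mask operations.
# All masks are mode "L" images: 255 = regenerate, 0 = preserve.

def grow_mask(mask, pixels):
    """Dilate white (masked) areas by `pixels` in every direction (square kernel)."""
    return mask.filter(ImageFilter.MaxFilter(2 * pixels + 1))

def blur_mask(mask, radius):
    """Soften mask edges, comparable to a mask-blur setting."""
    return mask.filter(ImageFilter.GaussianBlur(radius))

def invert_mask(mask):
    """Swap regenerate/preserve regions (e.g. foreground mask -> background mask)."""
    return ImageOps.invert(mask)

def merge_masks(*masks):
    """Union of several same-sized masks: pixel-wise maximum."""
    merged = masks[0]
    for m in masks[1:]:
        merged = ImageChops.lighter(merged, m)
    return merged

def threshold_mask(mask, level=128):
    """Turn a soft mask back into a hard 0/255 mask."""
    return mask.point(lambda v: 255 if v >= level else 0)
```

For instance, Example 1's "Grow Mask +5px" followed by mask_blur=6 would roughly correspond to `blur_mask(grow_mask(mask, 5), 6)`.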

Method 4: Semantic Segmentation

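import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation

ckpt = "nvidia/segformer-b0-finetuned-ade-512-512"
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForSemanticSegmentation.from_pretrained(ckpt)
image = Image.open("room.jpg").convert("RGB")  # hypothetical input file
with torch.no_grad():
    outputs = model(**processor(images=image, return_tensors="pt"))
seg = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]  # (H, W) class ids
class_id = {name: i for i, name in model.config.id2label.items()}["sofa"]  # check id2label for exact names
mask = Image.fromarray(((seg == class_id).numpy() * 255).astype("uint8"))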

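Best for: Batch mask creation by semantic category (for example, every sofa or sky region across a folder of images). The ADE20K-finetuned checkpoints cover its 150 classes; larger SegFormer variants (b1 to b5) trade speed for accuracy.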

Diffusers Inpainting Pipeline

ZImageInpaintPipeline Usage

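import torch
from PIL import Image
from diffusers import ZImageInpaintPipeline
from diffusers.utils import load_image

# Check the model card for the exact repository ID and recommended settings
pipe = ZImageInpaintPipeline.from_pretrained(
    "Tongyi-ZImage/Z-Image-Turbo", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Hypothetical inputs: the photo and a mask where white (255) marks the area to regenerate
image = load_image("parking_lot.jpg").convert("RGB").resize((1024, 1024))
mask = load_image("parking_mask.png").convert("L").resize((1024, 1024), Image.NEAREST)

result = pipe(
    prompt="a modern sports car in the parking spot",
    image=image,
    mask_image=mask,
    strength=1.0,
    guidance_scale=7.5,
    num_inference_steps=28,
    width=1024, height=1024
).images[0]
result.save("inpainted.png")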
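Resizing straight to 1024×1024 distorts any photo that is not square. Below is a sketch of an aspect-preserving alternative. It assumes the model accepts any width and height divisible by 16 (a common constraint for latent DiTs, but check the model card), and `fit_size` and `prepare_inputs` are hypothetical helpers. The mask is resized with nearest-neighbor sampling so no gray edge pixels are introduced.

```python
from PIL import Image

# Sketch only: hypothetical helpers; the multiple-of-16 constraint is an assumption.

def fit_size(width, height, long_side=1024, multiple=16):
    """Scale (width, height) so the long side is about long_side, snapped to a multiple."""
    scale = long_side / max(width, height)

    def snap(v):
        return max(multiple, round(v * scale / multiple) * multiple)

    return snap(width), snap(height)

def prepare_inputs(image, mask, long_side=1024, multiple=16):
    """Resize image and mask identically while keeping the aspect ratio."""
    size = fit_size(*image.size, long_side=long_side, multiple=multiple)
    image = image.convert("RGB").resize(size, Image.LANCZOS)
    mask = mask.convert("L").resize(size, Image.NEAREST)  # nearest-neighbor keeps edges hard
    return image, mask
```

When using it, pass `width=image.width, height=image.height` to the pipeline call so the output matches the prepared inputs.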

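Key Parameters

Suggested starting values (the first three match the example above; they are not library defaults). Few-step distilled checkpoints, such as Turbo variants, are often run with far fewer steps and little or no classifier-free guidance, so check the model card before reusing these numbers.

Parameter	Value	Description
`strength`	1.0	How much of the masked area is re-noised: 1.0 regenerates it from pure noise; lower values (e.g. 0.6–0.9) keep more of the original structure
`guidance_scale`	7.5	Prompt adherence; typical range 3.0–12.0
`num_inference_steps`	28	Sampling steps; 20–50 is a common range
`mask_blur`	4	Mask edge blur in pixels, which controls blending quality; in diffusers it is usually applied to the mask before the call (e.g. with PIL's `GaussianBlur`, or `pipe.mask_processor.blur()` where the pipeline provides it) rather than passed as an argument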

mask_blur Guide

0: Sharp edges, prone to visible seams
2–4: Light blur, suitable for precise object replacement
6–10: Moderate blur, ideal for background modifications
>10: Strong blur, best for large-area repairs
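To see what these values mean in pixels, you can blur a hard-edged mask and count how many pixels end up partially blended (neither 0 nor 255). This sketch uses Pillow's `GaussianBlur`, whose radius may not map one-to-one onto a given tool's `mask_blur` setting, so treat the numbers as relative rather than exact. The helper name is hypothetical.

```python
from PIL import Image, ImageFilter

# Sketch only: transition_width is a hypothetical helper for comparing blur settings.

def transition_width(blur_radius, size=128):
    """Pixels across a straight mask edge that end up partially blended after blurring."""
    mask = Image.new("L", (size, size), 0)
    mask.paste(255, (size // 2, 0, size, size))          # right half is masked
    if blur_radius > 0:
        mask = mask.filter(ImageFilter.GaussianBlur(blur_radius))
    row = [mask.getpixel((x, size // 2)) for x in range(size)]
    return sum(1 for v in row if 0 < v < 255)

for radius in (0, 2, 4, 8, 12):
    print(radius, transition_width(radius))
```

The wider the partially blended band, the more gradually generated pixels fade into the original, which is why larger values suit large-area repairs and small values suit precise replacements.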

ComfyUI Mask Workflow

Basic Inpainting Flow

Load Image ──→ Image + Mask ──┐
Text Prompts ─→ Conditioning ──┼──→ InpaintModelConditioning
Load VAE ────→ VAE ────────────┘              ↓ positive, negative, latent
Load Model ──→ Model ─────────────────────→ KSampler ──→ VAE Decode ──→ Output

Complete JSON Workflow

The workflow below is in ComfyUI's API format. It expects living_room.png to carry the inpainting mask in its alpha channel (for example, painted in ComfyUI's mask editor); LoadImage exposes that channel as its MASK output at index 1. Model and VAE filenames are placeholders for the files you have installed.

{
  "4": {
    "class_type": "VAELoader",
    "inputs": {"vae_name": "zimage_vae.safetensors"}
  },
  "6": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "z-image-turbo.safetensors"}
  },
  "8": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "a vintage leather sofa, warm lighting, photorealistic",
      "clip": ["6", 1]
    }
  },
  "9": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "",
      "clip": ["6", 1]
    }
  },
  "10": {
    "class_type": "LoadImage",
    "inputs": {"image": "living_room.png", "upload": "image"}
  },
  "14": {
    "class_type": "InpaintModelConditioning",
    "inputs": {
      "positive": ["8", 0], "negative": ["9", 0],
      "vae": ["4", 0], "pixels": ["10", 0], "mask": ["10", 1],
      "noise_mask": true
    }
  },
  "16": {
    "class_type": "KSampler",
    "inputs": {
      "model": ["6", 0], "positive": ["14", 0],
      "negative": ["14", 1], "latent_image": ["14", 2],
      "seed": 42, "steps": 28, "cfg": 7.5,
      "sampler_name": "euler_ancestral", "scheduler": "normal",
      "denoise": 1.0
    }
  },
  "18": {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["16", 0], "vae": ["4", 0]}
  },
  "20": {
    "class_type": "SaveImage",
    "inputs": {"images": ["18", 0], "filename_prefix": "zimage_inpaint"}
  }
}
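A workflow in this API format can be queued over ComfyUI's HTTP API by POSTing it to the `/prompt` endpoint. The following is a minimal standard-library sketch, assuming a local server on the default port 8188 and the JSON above saved to a file; `build_prompt_request` and `queue_workflow` are hypothetical helpers.

```python
import json
import urllib.request
import uuid

# Sketch only: hypothetical helpers for ComfyUI's POST /prompt endpoint.

def build_prompt_request(workflow, host="127.0.0.1:8188", client_id=None):
    """Wrap an API-format workflow dict in a POST request for /prompt."""
    payload = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def queue_workflow(path, **kwargs):
    """Load a saved workflow, queue it, and return the server's JSON reply."""
    with open(path, encoding="utf-8") as f:
        workflow = json.load(f)
    # Optional tweak before queueing, e.g. a new seed for KSampler node "16":
    # workflow["16"]["inputs"]["seed"] = 1234
    with urllib.request.urlopen(build_prompt_request(workflow, **kwargs)) as resp:
        return json.loads(resp.read())
```

With the server running, `queue_workflow("inpaint_workflow.json")` (a hypothetical filename) returns the reply, which includes a `prompt_id` you can use to look up the result.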

Advanced: Inpainting with ControlNet

Text Prompts ──→ Apply ControlNet (Depth / Canny / Pose hint from the original image)
                          ↓
Original Image + Mask ──→ InpaintModelConditioning
                          ↓
                       KSampler
                          ↓
                 VAE Decode → Result

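ControlNet constrains the inpainted content to follow structural guidance (depth, edges, or pose) extracted from the original image. It acts on the conditioning, so it is applied before KSampler, and it requires a ControlNet model trained for Z-Image. This is useful when replacing objects while preserving scene geometry or depth relationships.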

Object Replacement Strategies

Strategy 1: Precise Replacement

Create precise mask (SAM or manual)
Write detailed prompt for the new object
Use guidance_scale 7–9 and mask_blur 3–5

Scenario	Mask Target	Prompt Example
Furniture	Chair area	`a vintage wooden rocking chair with brass accents, warm oak finish, photorealistic`
Vehicle	Car area	`a matte-black electric sedan, sleek design, natural sunlight reflections`

Strategy 2: Context-Aware Replacement

Use a slightly larger mask that takes in some surrounding context
Write a prompt that describes the overall scene, not just the new object
Lower guidance_scale (5–7), higher mask_blur (6–8)

Strategy 3: Multi-Object Replacement

Create separate masks per object
Combine with Merge Masks node
Include all new objects in the prompt
May require multi-step generation

import numpy as np

# mask1, mask2: boolean arrays of identical shape (e.g. two SAM outputs)
combined_mask = (mask1 | mask2).astype(np.uint8) * 255

Multi-Step Inpainting

Complex edits often exceed what one inpainting pass can handle:

Step 1: Remove unwanted objects
  Mask → "empty space matching surroundings"
  ↓
Step 2: Add new content
  Mask → "detailed new content description"
  ↓
Step 3: Detail refinement
  Small mask → "refined details, matching lighting"

Optimization tips:

Large to small: Process large areas first, then shrink the mask for detail work
Reduce steps: Later passes can use fewer inference steps (28 → 20 → 15)
Increase blur: A higher mask_blur in later passes improves blending
Check consistency: Verify color, lighting, and perspective after each pass
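The large-to-small schedule can be expressed as a list of passes, each feeding its output into the next. In this sketch `inpaint_fn` stands in for whatever runs one pass (a diffusers pipeline call or a queued ComfyUI workflow). `InpaintPass`, `run_passes`, and the example masks are hypothetical; the step counts come from the tips above, while the blur values are only illustrative.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

# Sketch only: hypothetical pass schedule for multi-step inpainting.

@dataclass
class InpaintPass:
    mask: Any        # mask for this pass; typically shrinks from pass to pass
    prompt: str
    steps: int       # inference steps for this pass
    mask_blur: int

def run_passes(image: Any, passes: List[InpaintPass],
               inpaint_fn: Callable[..., Any],
               check_fn: Optional[Callable[[int, Any], None]] = None) -> Any:
    """Run inpainting passes in order, feeding each result into the next pass.

    inpaint_fn(image, mask, prompt=..., steps=..., mask_blur=...) performs one pass;
    check_fn(pass_number, image), if given, is where color, lighting and
    perspective can be inspected before continuing.
    """
    for number, p in enumerate(passes, start=1):
        image = inpaint_fn(image, p.mask, prompt=p.prompt,
                           steps=p.steps, mask_blur=p.mask_blur)
        if check_fn is not None:
            check_fn(number, image)
    return image

# Example schedule (masks and my_inpaint are hypothetical placeholders):
# passes = [
#     InpaintPass(removal_mask, "empty space matching surroundings", 28, 4),
#     InpaintPass(new_object_mask, "detailed new content description", 20, 6),
#     InpaintPass(detail_mask, "refined details, matching lighting", 15, 8),
# ]
# result = run_passes(image, passes, my_inpaint)
```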

Edge Blending

Mask Feathering

import cv2

def feather_mask(mask, blur_radius=8):
    # Gaussian blur with sigma = blur_radius; ksize (0, 0) lets OpenCV derive the kernel size
    return cv2.GaussianBlur(mask, (0, 0), blur_radius)
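A feathered mask is then used as a per-pixel alpha when pasting the inpainted result back over the original, which is the "Alpha blend" method in the table below. Compositing this way also restores the exact original pixels outside the mask, undoing the small shifts a VAE round trip introduces. This is a minimal NumPy sketch, assuming both images are uint8 arrays of the same shape and the mask uses 255 for inpainted pixels; `alpha_blend` is a hypothetical name.

```python
import numpy as np

# Sketch only: alpha_blend is a hypothetical helper.

def alpha_blend(base, result, soft_mask):
    """Composite result over base using a soft mask as alpha.

    base, result: uint8 arrays (H, W, 3); soft_mask: uint8 (H, W), 255 = use result.
    """
    alpha = soft_mask.astype(np.float32)[..., None] / 255.0   # (H, W, 1) in [0, 1]
    out = alpha * result.astype(np.float32) + (1.0 - alpha) * base.astype(np.float32)
    return np.clip(out + 0.5, 0, 255).astype(np.uint8)        # round to nearest
```

Combined with the feathering function, a typical call is `alpha_blend(original, inpainted, feather_mask(mask, 6))`.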

Poisson Blending

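def poisson_blend(base, result, mask, center):
    # result: inpainted image (source), base: original (destination), mask: uint8 0/255;
    # center: (x, y) point in base where the center of the mask's bounding box is placed
    return cv2.seamlessClone(result, base, mask, center, cv2.NORMAL_CLONE)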

Blending Parameter Guide

Edit Type	mask_blur	Method
Object removal	4–6	Alpha blend
Object replacement	3–5	Poisson blend
Background modification	6–10	Gradient blend
Text addition	1–2	Direct overlay

Quality Preservation

Resolution matching: Resize masks with Image.NEAREST so they stay binary; smoother resampling filters add gray, anti-aliased edge pixels
Color consistency: Describe lighting and color in prompts; apply color transfer post-processing if needed
Prevent texture repetition: Add "unique texture, no repeating patterns" to prompts; increase guidance_scale
Lighting consistency: Use "matching lighting, consistent shadows, same light source" in prompts
Perspective consistency: Use ControlNet Depth for architectural scenes; maintain vanishing point alignment
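A simple form of the color-transfer step shifts the inpainted pixels' per-channel mean and standard deviation to match the rest of the image. This sketch is a simplified, RGB-space variant of Reinhard-style color transfer (the original method works in a decorrelated lαβ color space). `match_color_stats` is a hypothetical helper, and using the whole unmasked area as the reference is an assumption that works best when the surroundings are fairly uniform.

```python
import numpy as np

# Sketch only: simplified RGB mean/std matching, not full Reinhard color transfer.

def match_color_stats(image, mask, eps=1e-6):
    """Match the masked region's per-channel mean and std to the unmasked region.

    image: uint8 array (H, W, 3); mask: bool array (H, W), True = inpainted pixels.
    """
    img = image.astype(np.float64)
    inside, outside = img[mask], img[~mask]          # (N, 3) pixel lists
    mu_in, sd_in = inside.mean(axis=0), inside.std(axis=0)
    mu_out, sd_out = outside.mean(axis=0), outside.std(axis=0)
    img[mask] = (inside - mu_in) / (sd_in + eps) * sd_out + mu_out
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)
```

In practice a narrow ring around the mask (the grown mask minus the mask) often makes a better reference than the whole image, so distant regions do not skew the statistics.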

Practical Examples

Example 1: Object Removal

Scenario: Remove bystanders from a photograph.

Steps: SAM segmentation → Grow Mask +5px → Inpainting
Prompt: "clean background, matching surroundings, no people, photorealistic"
Params: guidance_scale=7.5, steps=28, mask_blur=6

Example 2: Clothing Change

Scenario: Replace clothing while preserving pose and features.

Steps: SAM segmentation + manual refinement → Inpainting
Prompt: "wearing a red tailored suit with a white dress shirt, professional photography, natural pose, matching lighting"
Params: guidance_scale=9.0, steps=32, mask_blur=4

Example 3: Background Swap

Scenario: Replace indoor background with outdoor setting.

Steps: Foreground segmentation → Invert mask → Inpainting
Prompt: "tropical beach background, crystal clear ocean, palm trees, golden sunset, cinematic lighting, matching subject lighting"
Params: guidance_scale=6.5, steps=28, mask_blur=10

Example 4: Text Insertion

Scenario: Add branded text to a product photo.

Steps: Precise rectangular mask → Inpainting
Prompt: "brand logo in clean sans-serif font, white text, professional design, minimal style"
Params: guidance_scale=10.0, steps=30, mask_blur=1
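The four examples can be kept as reusable presets. This sketch stores the prompts and parameters listed above and applies `mask_blur` to the mask with Pillow's `GaussianBlur` before the call, on the assumption that the pipeline itself takes no blur argument. The preset keys, the `build_call` helper, and that blur equivalence are illustrative assumptions.

```python
from PIL import Image, ImageFilter

# Sketch only: presets mirror Examples 1-4; build_call is a hypothetical helper.
PRESETS = {
    "object_removal": dict(
        prompt="clean background, matching surroundings, no people, photorealistic",
        guidance_scale=7.5, num_inference_steps=28, mask_blur=6),
    "clothing_change": dict(
        prompt="wearing a red tailored suit with a white dress shirt, "
               "professional photography, natural pose, matching lighting",
        guidance_scale=9.0, num_inference_steps=32, mask_blur=4),
    "background_swap": dict(
        prompt="tropical beach background, crystal clear ocean, palm trees, "
               "golden sunset, cinematic lighting, matching subject lighting",
        guidance_scale=6.5, num_inference_steps=28, mask_blur=10),
    "text_insertion": dict(
        prompt="brand logo in clean sans-serif font, white text, "
               "professional design, minimal style",
        guidance_scale=10.0, num_inference_steps=30, mask_blur=1),
}

def build_call(name, image: Image.Image, mask: Image.Image):
    """Turn a preset into keyword arguments for an inpainting pipeline call."""
    params = dict(PRESETS[name])                 # copy so the preset stays intact
    blur = params.pop("mask_blur")
    if blur > 0:
        mask = mask.filter(ImageFilter.GaussianBlur(blur))
    return dict(params, image=image, mask_image=mask,
                width=image.width, height=image.height)

# Usage: result = pipe(**build_call("object_removal", image, mask)).images[0]
```

For the background swap, remember that `mask` must already be the inverted foreground mask.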

Troubleshooting

Issue	Cause	Solution
Visible seams	`mask_blur` too low	Increase to 6–10; apply Poisson blending
Content clashes with surroundings	Prompt lacks context	Add environment description; lower `guidance_scale`
Unclear text	Mask too small	Enlarge mask; reduce `mask_blur` to 0–2
Repetitive textures	Over-smoothing	Add texture keywords; increase `guidance_scale`
Color inconsistency	Distribution mismatch	Describe colors in prompt; use color transfer

Summary

Z-Image's inpainting, combined with ComfyUI's node-based workflow, covers the full range from simple object removal to complex multi-step editing. Key factors for success:

Precise masks via SAM or semantic segmentation
Appropriate parameters: adjust mask_blur and guidance_scale per edit type
Multi-step processing for complex edits
Proper edge blending: often the deciding factor in whether an edit looks seamless
Context-aware prompts ensuring visual coherence

Compared to outpainting, inpainting demands higher mask precision and more careful blending. Compared to general edit workflows, it offers more precise localized control.
