Z-Image img2img Image-to-Image Workflow: Complete Guide to Style Remapping and Detail Enhancement

May 25, 2026

Keywords: z-image img2img image-to-image workflow


Introduction

Image-to-image (img2img) generation takes an existing image as input and lets the model reinterpret or enhance it under the guidance of a text prompt. Unlike inpainting, which edits only masked regions, img2img processes the entire image; the denoising strength parameter controls how far the result may depart from the input.

Z-Image's img2img workflow supports style transfer, detail enhancement, sketch-to-image conversion, and batch processing in both diffusers and ComfyUI.


img2img Fundamentals

Process Pipeline

  1. VAE Encoding: Input image encoded into latent space
  2. Noise Addition: Controlled noise added based on denoising strength
  3. Denoising: Model iteratively denoises, guided by both noisy latent and text prompt
  4. VAE Decoding: Final latent decoded to pixel space
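
Step 2 can be illustrated without a model. The toy sketch below assumes a linear blend between the encoded latent and Gaussian noise, weighted by strength; the real noising rule depends on the scheduler the pipeline uses, and `add_noise` is a hypothetical helper written for this illustration, not a diffusers function.

```python
import random

def add_noise(latent, strength, seed=0):
    """Blend a latent (flat list of floats) with Gaussian noise.

    Assumption: noisy = (1 - strength) * latent + strength * noise.
    Real schedulers may use a different formula.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    return [(1.0 - strength) * x + strength * n for x, n in zip(latent, noise)]

latent = [0.5, -1.0, 2.0]
unchanged = add_noise(latent, 0.0)   # strength 0: the input survives intact
pure_noise = add_noise(latent, 1.0)  # strength 1: nothing of the input remains
```

Under this assumption the two extremes behave as the pipeline description suggests: at strength 0 the denoiser starts from the untouched input, and at strength 1 it starts from pure noise, which is why the top of the range behaves like text-to-image.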

img2img vs Other Modes

| Aspect | Text-to-Image | img2img | Inpainting |
| --- | --- | --- | --- |
| Input | Prompt only | Image + prompt | Image + mask + prompt |
| Original Preserved | None | Partial (by strength) | Unmasked regions fully |
| Use Case | Fresh creation | Style transfer, enhancement | Localized editing |

Strength Parameter Tuning

Understanding Denoising Strength

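The strength parameter (also called denoising_strength, or denoise in ComfyUI's KSampler) is the primary control. It sets how much noise is added to the encoded input, and therefore how much of the original survives: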

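import torch
from diffusers import ZImagePipeline
from diffusers.utils import load_image

# Class name and repo id are as given in this guide; check both against
# your installed diffusers version before running.
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-ZImage/Z-Image-Base", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Source image to transform (the path is a placeholder)
input_image = load_image("input_photo.jpg")

result = pipe(
    prompt="oil painting style, dramatic lighting, rich colors",
    image=input_image,
    strength=0.75,           # how far the result may depart from the input
    guidance_scale=7.5,      # how strongly the prompt steers the result
    num_inference_steps=30
).images[0]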

Strength Value Reference

| Strength | Effect | Best Use |
| --- | --- | --- |
| 0.1–0.2 | Minimal change | Color correction, slight style hint |
| 0.3–0.4 | Moderate change, structure preserved | Style suggestion, detail addition |
| 0.5–0.6 | Significant change, composition kept | Style transfer, medium remapping |
| 0.7–0.8 | Major change, only rough structure | Strong style transfer |
| 0.9–1.0 | Near complete regeneration | Sketch-to-image, radical transformation |

Strength × Steps Interaction

Higher strength values discard more of the input, so the model has more to reconstruct and benefits from more inference steps:

| Strength | Recommended Steps |
| --- | --- |
| 0.1–0.3 | 20–28 |
| 0.4–0.6 | 28–40 |
| 0.7–0.9 | 30–50 |
| 1.0 | 28–50 (equivalent to txt2img) |
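
One detail worth knowing when reading these step counts: in the standard diffusers img2img pipelines, the number of denoising steps actually executed is roughly `int(num_inference_steps * strength)`, because the schedule is entered partway through. The sketch below assumes Z-Image's pipeline follows the same convention, which is worth verifying; `effective_steps` and `steps_for_target` are hypothetical helpers.

```python
import math

def effective_steps(num_inference_steps, strength):
    """Steps actually run, assuming the usual diffusers img2img convention."""
    return min(int(num_inference_steps * strength), num_inference_steps)

def steps_for_target(target_steps, strength):
    """Requested steps needed so that about target_steps are actually run."""
    return math.ceil(target_steps / strength)

print(effective_steps(30, 0.75))  # 22
print(effective_steps(28, 0.25))  # 7
print(steps_for_target(20, 0.5))  # 40
```

If that convention holds, a low-strength run with 28 requested steps executes only a handful of them, so raising the requested count is a cheap way to check whether a soft result is caused by too few executed steps.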

Denoising Strength Guide

Low Strength (0.1–0.3): Enhancement Mode

# Photo enhancement
result = pipe(
    prompt="high detail, sharp focus, professional photography, 8K quality",
    image=low_quality_photo,
    strength=0.25,
    guidance_scale=5.0,
    num_inference_steps=28
).images[0]

Medium Strength (0.4–0.6): Style Remapping

# Photo to watercolor
result = pipe(
    # Adjacent string literals are joined into one prompt
    prompt=("watercolor painting, soft edges, flowing colors, "
            "artistic brush strokes, wet-on-wet technique"),
    image=photo_input,
    strength=0.55,
    guidance_scale=7.5,
    num_inference_steps=30
).images[0]

High Strength (0.7–1.0): Transformation Mode

# Sketch to realistic
result = pipe(
    # Adjacent string literals are joined into one prompt
    prompt=("photorealistic portrait, detailed skin texture, "
            "natural lighting, professional photography"),
    image=sketch_input,
    strength=0.85,
    guidance_scale=8.0,
    num_inference_steps=40
).images[0]
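
Because the right strength depends on the image, it often helps to render the same input at several strengths and compare. The helper below is a minimal sketch, assuming `pipe` accepts the keyword arguments used in the examples above; `strength_sweep` and `make_generator` are hypothetical names introduced here.

```python
def strength_sweep(pipe, image, prompt, strengths=(0.25, 0.55, 0.85),
                   make_generator=None, **pipe_kwargs):
    """Run the same prompt and image at several strengths.

    pipe: any callable with the img2img signature used in this guide.
    make_generator: optional zero-argument callable returning a freshly
        seeded generator, so every run starts from the same noise.
    Returns a dict mapping each strength to its output image.
    """
    results = {}
    for s in strengths:
        kwargs = dict(pipe_kwargs)
        if make_generator is not None:
            # A fresh generator per run keeps the comparison fair
            kwargs["generator"] = make_generator()
        out = pipe(prompt=prompt, image=image, strength=s, **kwargs)
        results[s] = out.images[0]
    return results
```

With a loaded pipeline this could be called as `strength_sweep(pipe, photo_input, "watercolor painting, soft edges", make_generator=lambda: torch.Generator("cuda").manual_seed(42), num_inference_steps=30)`, after which the three outputs differ only in strength.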

ComfyUI img2img Workflow

Node Setup

Load Checkpoint ──→ Model + CLIP + VAE
  ↓
Load Image ──→ Image ──→ VAE Encode ──→ Latent
  ↓
Text Prompt ──→ CLIP Encode ──→ Conditioning
  ↓
KSampler (denoise: X.X) ──→ Latent ──→ VAE Decode ──→ Save Image

JSON Workflow

{
  "2": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "z-image-base.safetensors"}
  },
  "4": {
    "class_type": "LoadImage",
    "inputs": {"image": "input_photo.jpg", "upload": "image"}
  },
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "cyberpunk cityscape, neon lights, rain, cinematic",
      "clip": ["2", 1]
    }
  },
  "7": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "blurry, low quality, artifacts",
      "clip": ["2", 1]
    }
  },
  "8": {
    "class_type": "VAEEncode",
    "inputs": {"pixels": ["4", 0], "vae": ["2", 2]}
  },
  "12": {
    "class_type": "KSampler",
    "inputs": {
      "model": ["2", 0], "positive": ["6", 0],
      "negative": ["7", 0], "latent_image": ["8", 0],
      "seed": 42, "steps": 30, "cfg": 7.5,
      "sampler_name": "euler_ancestral", "scheduler": "normal",
      "denoise": 0.65
    }
  },
  "14": {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["12", 0], "vae": ["2", 2]}
  },
  "16": {
    "class_type": "SaveImage",
    "inputs": {"images": ["14", 0]}
  }
}
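
A workflow in this API format can be queued from a script by POSTing it to a running ComfyUI server's `/prompt` endpoint. The sketch below assumes a local server on the default port 8188; `set_denoise` and `queue_prompt` are hypothetical helper names.

```python
import json
import urllib.request

def set_denoise(workflow, denoise):
    """Return a copy of an API-format workflow with every KSampler's denoise replaced."""
    updated = json.loads(json.dumps(workflow))  # deep copy via JSON round trip
    for node in updated.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["denoise"] = denoise
    return updated

def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the workflow to ComfyUI's /prompt endpoint and return the parsed reply."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Loading the JSON with `json.load` and calling `queue_prompt(set_denoise(workflow, 0.45))` for a few denoise values gives the ComfyUI counterpart of trying several strength settings in diffusers.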

With ControlNet

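Load Image ──→ VAE Encode ──→ Latent ──────────────────────────────┐
  ↓                                                                ↓
Canny / Depth map ──→ Apply ControlNet ──→ Conditioning ──→ KSampler ──→ VAE Decode ──→ Output
                            ↑
              CLIP Encode (text prompt)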

ControlNet adds structural guidance during img2img, preserving composition while applying new aesthetics.


Style Transfer via img2img

Common Style Transfers

| Source → Target | Strength | Prompt Focus |
| --- | --- | --- |
| Photo → Oil painting | 0.5–0.7 | oil painting, thick brush strokes, rich colors |
| Photo → Watercolor | 0.45–0.6 | watercolor, soft edges, flowing colors |
| Photo → Anime | 0.6–0.8 | anime style, cel shading, vibrant colors |
| Photo → Pencil sketch | 0.5–0.65 | pencil sketch, graphite, hatching |
| Photo → Pixel art | 0.7–0.85 | pixel art, 16-bit, retro aesthetic |
| Sketch → Photorealistic | 0.8–0.95 | photorealistic, detailed, natural lighting |

Tips: Match strength to style distance (similar styles: lower strength; dissimilar: higher). Use negative prompts to exclude artifacts.
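
The style table can be kept next to the code as a small preset dictionary. This is only a sketch: the values are copied from the table, while `STYLE_PRESETS`, `preset_kwargs` and the `position` parameter are hypothetical names introduced here.

```python
# (strength range, prompt focus) per target style, taken from the table
STYLE_PRESETS = {
    "oil painting": ((0.5, 0.7), "oil painting, thick brush strokes, rich colors"),
    "watercolor": ((0.45, 0.6), "watercolor, soft edges, flowing colors"),
    "anime": ((0.6, 0.8), "anime style, cel shading, vibrant colors"),
    "pencil sketch": ((0.5, 0.65), "pencil sketch, graphite, hatching"),
    "pixel art": ((0.7, 0.85), "pixel art, 16-bit, retro aesthetic"),
}

def preset_kwargs(style, position=0.5):
    """Build prompt and strength for a style.

    position picks a point in the strength range:
    0.0 is the low end, 1.0 the high end, 0.5 the midpoint.
    """
    (low, high), prompt = STYLE_PRESETS[style]
    strength = round(low + position * (high - low), 3)
    return {"prompt": prompt, "strength": strength}
```

The returned dictionary can be unpacked straight into a pipeline call, for example `pipe(image=photo_input, num_inference_steps=30, **preset_kwargs("watercolor"))`. Starting at the midpoint and moving `position` toward 1.0 follows the tip about matching strength to style distance.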


Detail Enhancement

Two-Pass Enhancement

# Step 1: Content enhancement
step1 = pipe(
    prompt="high detail, sharp textures, professional photography",
    image=low_res_input, strength=0.3,
    guidance_scale=6.0, num_inference_steps=28).images[0]

# Step 2: Detail refinement
step2 = pipe(
    prompt="ultra detailed, fine textures, crisp focus, 8K",
    image=step1, strength=0.15,
    guidance_scale=5.0, num_inference_steps=20).images[0]
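
The same idea extends to any number of passes. The helper below is a minimal sketch that feeds each output into the next call; `run_passes` and `ENHANCE_PASSES` are hypothetical names, and `pipe` is assumed to accept the keyword arguments used throughout this guide.

```python
def run_passes(pipe, image, passes):
    """Apply several img2img passes in sequence.

    passes: list of dicts of pipeline keyword arguments
            (prompt, strength, guidance_scale, ...), applied in order.
    """
    current = image
    for kwargs in passes:
        # The output of one pass becomes the input of the next
        current = pipe(image=current, **kwargs).images[0]
    return current

# The two-pass enhancement settings expressed as data
ENHANCE_PASSES = [
    {"prompt": "high detail, sharp textures, professional photography",
     "strength": 0.3, "guidance_scale": 6.0, "num_inference_steps": 28},
    {"prompt": "ultra detailed, fine textures, crisp focus, 8K",
     "strength": 0.15, "guidance_scale": 5.0, "num_inference_steps": 20},
]
```

Keeping the passes as data makes it easy to experiment, for instance by lowering the second strength further when the first pass already introduces a visible color shift.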

Quality by Strength

| Strength | Noise Reduction | Detail Preservation | Color Shift |
| --- | --- | --- | --- |
| 0.15 | Moderate | Excellent | Minimal |
| 0.25 | Good | Very good | Slight |
| 0.35 | Strong | Good | Moderate |

Sketch-to-Image

result = pipe(
    # Adjacent string literals are joined into one prompt
    prompt=("photorealistic portrait, detailed skin texture, "
            "natural lighting, shallow depth of field, professional photography"),
    image=sketch, strength=0.85,
    guidance_scale=8.0, num_inference_steps=40
).images[0]

Best practices: Clean line sketches produce the best results at strength 0.8–0.95. When a Canny ControlNet supplies edge-preserving guidance, strength can drop to 0.6–0.75.


Photo Enhancement

Color Enhancement

result = pipe(
    # Adjacent string literals are joined into one prompt
    prompt=("vibrant colors, professional color grading, "
            "cinematic color palette, rich saturation"),
    image=dull_photo, strength=0.2,
    guidance_scale=4.5, num_inference_steps=20
).images[0]

Batch img2img

Python Batch Processing

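import os
from PIL import Image

def batch_img2img(input_dir, output_dir, prompt, strength=0.6):
    """Apply the same img2img settings to every JPEG/PNG in input_dir.

    Uses the `pipe` object loaded earlier in this guide.
    """
    os.makedirs(output_dir, exist_ok=True)
    # Sort so the processing order is reproducible
    for fn in sorted(os.listdir(input_dir)):
        if fn.lower().endswith(('.jpg', '.jpeg', '.png')):
            # Convert to RGB so palette or alpha images do not break the pipeline
            img = Image.open(os.path.join(input_dir, fn)).convert("RGB")
            result = pipe(prompt=prompt, image=img, strength=strength,
                guidance_scale=7.5, num_inference_steps=28).images[0]
            # Reuse the input filename, and therefore its format, for the output
            result.save(os.path.join(output_dir, fn))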

ComfyUI Batch Workflow

Chain BatchImageLoad → ImageBatch → VAE Encode → KSampler → VAE Decode → BatchSaveImage; the exact loader and saver node names can vary with the custom node pack installed. Batch sizes of 4–8 give roughly a 40% speedup at a moderate VRAM cost.


Practical Examples

Example 1: Photo to Illustration

Prompt: "digital illustration, clean lines, flat colors, vector art style"
Params: strength=0.6, steps=30, cfg=7.5
Result: Clean vector illustration maintaining facial structure and pose

Example 2: Sketch to Realistic Portrait

Prompt: "photorealistic portrait, detailed skin, natural lighting, shallow depth of field"
Params: strength=0.85, steps=40, cfg=8.0
Result: Sketch lines guide facial structure; model generates realistic textures

Example 3: Low-Res to High-Detail

Prompt: "high detail, sharp focus, clean background, studio lighting, 8K"
Params: strength=0.25, steps=28, cfg=5.0
Result: Enhanced detail and clarity preserving original appearance

Example 4: Photo to Anime

Prompt: "anime style, vibrant colors, cel shading, manga art, detailed background"
Params: strength=0.7, steps=32, cfg=7.5
Result: Photographic scene converted to anime aesthetic

Troubleshooting

| Issue | Solution |
| --- | --- |
| Too much original preserved | Increase strength by 0.1–0.2 |
| Original completely lost | Decrease strength; add structural keywords |
| Unwanted artifacts | Increase steps for higher strength |
| Color drift | Add color description to prompt; lower cfg |
| Blurry output | Increase steps by 50–100% |
| Structural distortion | Use ControlNet for guidance |
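
Several of these fixes are plain parameter adjustments, so they can be written down as code. The sketch below is illustrative only: `apply_fix` and its issue keys are hypothetical, and where the table gives no number (how far to lower strength or cfg) the step sizes are assumptions.

```python
def apply_fix(params, issue):
    """Return a copy of the pipeline parameters adjusted for one issue."""
    fixed = dict(params)
    if issue == "too_much_original":
        # Table: increase strength by 0.1-0.2 (low end used here)
        fixed["strength"] = min(1.0, round(fixed["strength"] + 0.1, 2))
    elif issue == "original_lost":
        # Assumed step size: the table only says "decrease strength"
        fixed["strength"] = max(0.0, round(fixed["strength"] - 0.1, 2))
    elif issue == "blurry":
        # Table: increase steps by 50-100% (50% used here)
        fixed["num_inference_steps"] = int(fixed["num_inference_steps"] * 1.5)
    elif issue == "color_drift":
        # Assumed step size: the table only says "lower cfg"
        fixed["guidance_scale"] = max(1.0, fixed["guidance_scale"] - 1.0)
    else:
        raise ValueError(f"no parameter-only fix for issue: {issue}")
    return fixed

params = {"strength": 0.6, "guidance_scale": 7.5, "num_inference_steps": 28}
print(apply_fix(params, "blurry")["num_inference_steps"])  # 42
```

Issues that need a prompt change or ControlNet fall outside what a parameter tweak can express, which is why the function raises for them instead of guessing.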

Summary

Z-Image's img2img workflow is controlled primarily by the denoising strength parameter:

  1. Strength is key: 0.1–0.3 for enhancement, 0.4–0.6 for style transfer, 0.7–1.0 for transformation
  2. Match steps to strength: Higher strength needs more inference steps
  3. Two-pass approach: Sequential low-strength passes for enhancement workflows
  4. ControlNet adds precision: Prevents unwanted distortions at higher strengths
  5. Batch processing: Efficient large-scale image processing via ComfyUI batch nodes

Combined with inpainting (ZI-052) and ControlNet (ZI-058), img2img completes the toolkit for AI-powered image editing with Z-Image.


Z-Image Team
