Z-Image img2img Image-to-Image Workflow: Complete Guide to Style Remapping and Detail Enhancement
Keywords: z-image img2img image-to-image workflow
Table of Contents
- Introduction
- img2img Fundamentals
- Strength Parameter Tuning
- Denoising Strength Guide
- ComfyUI img2img Workflow
- Style Transfer via img2img
- Detail Enhancement
- Sketch-to-Image
- Photo Enhancement
- Batch img2img
- Practical Examples
- Troubleshooting
- Summary
Introduction
Image-to-image (img2img) generation takes an existing image as input and lets the model reinterpret or enhance it according to a text prompt. Unlike inpainting, which edits only masked regions, img2img reprocesses the entire image; the denoising strength parameter controls how far the result may depart from the input.
Z-Image's img2img workflow supports style transfer, detail enhancement, sketch-to-image conversion, and batch processing in both diffusers and ComfyUI.
img2img Fundamentals
Process Pipeline
- VAE Encoding: Input image encoded into latent space
- Noise Addition: Controlled noise added based on denoising strength
- Denoising: Model iteratively denoises, guided by both noisy latent and text prompt
- VAE Decoding: Final latent decoded to pixel space
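The noise-addition stage above can be sketched in plain Python. This is a toy illustration, not Z-Image's actual implementation: the "latent" is a short list of floats standing in for a VAE-encoded image, and the function assumes a simple linear blend between the latent and Gaussian noise, which is only one common formulation. The function name `add_noise` is hypothetical.

```python
import random

def add_noise(latent, strength, rng):
    """Blend a latent with Gaussian noise.

    strength=0.0 returns the latent unchanged; strength=1.0 returns pure noise.
    Assumption: a linear blend; the real noise schedule may differ.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    return [(1.0 - strength) * x + strength * n for x, n in zip(latent, noise)]

rng = random.Random(0)
latent = [0.5, -1.0, 2.0]               # stand-in for a VAE-encoded image
kept = add_noise(latent, 0.0, rng)      # identical to the input latent
light = add_noise(latent, 0.25, rng)    # mostly the original, slightly perturbed
full = add_noise(latent, 1.0, rng)      # nothing of the original remains
```

The denoiser then starts from this blended latent, which is why low strength values stay close to the input and a strength of 1.0 behaves like text-to-image.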
img2img vs Other Modes
| Aspect | Text-to-Image | img2img | Inpainting |
|---|---|---|---|
| Input | Prompt only | Image + prompt | Image + mask + prompt |
| Original Preserved | None | Partial (by strength) | Unmasked regions fully |
| Use Case | Fresh creation | Style transfer, enhancement | Localized editing |
Strength Parameter Tuning
Understanding Denoising Strength
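The strength parameter (aka denoising_strength; exposed as denoise on ComfyUI's KSampler) is the primary control. It sets how much noise is added to the encoded input, and therefore how much of the original image survives: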
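import torch
from diffusers import ZImagePipeline
from diffusers.utils import load_image

# Assumes the installed diffusers build exposes an img2img-capable
# Z-Image pipeline under this class name and checkpoint ID.
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-ZImage/Z-Image-Base", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Source image to transform (placeholder path)
input_image = load_image("input_photo.jpg")

result = pipe(
    prompt="oil painting style, dramatic lighting, rich colors",
    image=input_image,
    strength=0.75,
    guidance_scale=7.5,
    num_inference_steps=30,
).images[0]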
Strength Value Reference
| Strength | Effect | Best Use |
|---|---|---|
| 0.1–0.2 | Minimal change | Color correction, slight style hint |
| 0.3–0.4 | Moderate change, structure preserved | Style suggestion, detail addition |
| 0.5–0.6 | Significant change, composition kept | Style transfer, medium remapping |
| 0.7–0.8 | Major change, only rough structure | Strong style transfer |
| 0.9–1.0 | Near complete regeneration | Sketch-to-image, radical transformation |
Strength × Steps Interaction
Higher strength values benefit from more inference steps:
| Strength | Recommended Steps |
|---|---|
| 0.1–0.3 | 20–28 |
| 0.4–0.6 | 28–40 |
| 0.7–0.9 | 30–50 |
| 1.0 | 28–50 (equivalent to txt2img) |
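One reason strength and step count interact: in diffusers' Stable Diffusion img2img pipelines, the schedule is truncated so that only about `num_inference_steps × strength` denoising steps actually run. Whether Z-Image's pipeline uses exactly the same rounding is an assumption here; the helper below (a hypothetical name) is only a sketch for estimating how many steps a given setting executes.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate the denoising steps executed by an img2img call.

    Assumption: the pipeline truncates the schedule the way diffusers'
    Stable Diffusion img2img pipelines do.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)

for steps, strength in [(28, 0.25), (30, 0.5), (40, 0.75), (28, 1.0)]:
    executed = effective_steps(steps, strength)
    print(f"steps={steps} strength={strength} -> {executed} executed")
```

Under this convention a low-strength pass runs only a handful of steps, so the requested step count is an upper bound rather than the work actually done.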
Denoising Strength Guide
Low Strength (0.1–0.3): Enhancement Mode
# Photo enhancement
result = pipe(
    prompt="high detail, sharp focus, professional photography, 8K quality",
    image=low_quality_photo,
    strength=0.25,
    guidance_scale=5.0,
    num_inference_steps=28,
).images[0]
Medium Strength (0.4–0.6): Style Remapping
# Photo to watercolor
result = pipe(
    # Adjacent string literals are joined into a single prompt
    prompt=("watercolor painting, soft edges, flowing colors, "
            "artistic brush strokes, wet-on-wet technique"),
    image=photo_input,
    strength=0.55,
    guidance_scale=7.5,
    num_inference_steps=30,
).images[0]
High Strength (0.7–1.0): Transformation Mode
# Sketch to realistic
result = pipe(
    # Adjacent string literals are joined into a single prompt
    prompt=("photorealistic portrait, detailed skin texture, "
            "natural lighting, professional photography"),
    image=sketch_input,
    strength=0.85,
    guidance_scale=8.0,
    num_inference_steps=40,
).images[0]
ComfyUI img2img Workflow
Node Setup
Load Checkpoint ──→ Model + CLIP + VAE
Load Image ──→ VAE Encode ──→ Latent
Text Prompt ──→ CLIP Encode ──→ Conditioning
Model + Latent + Conditioning ──→ KSampler (denoise: X.X) ──→ VAE Decode ──→ Save Image
JSON Workflow
{
  "2": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "z-image-base.safetensors"}
  },
  "4": {
    "class_type": "LoadImage",
    "inputs": {"image": "input_photo.jpg", "upload": "image"}
  },
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "cyberpunk cityscape, neon lights, rain, cinematic",
      "clip": ["2", 1]
    }
  },
  "7": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "",
      "clip": ["2", 1]
    }
  },
  "8": {
    "class_type": "VAEEncode",
    "inputs": {"pixels": ["4", 0], "vae": ["2", 2]}
  },
  "12": {
    "class_type": "KSampler",
    "inputs": {
      "model": ["2", 0], "positive": ["6", 0],
      "negative": ["7", 0], "latent_image": ["8", 0],
      "seed": 42, "steps": 30, "cfg": 7.5,
      "sampler_name": "euler_ancestral", "scheduler": "normal",
      "denoise": 0.65
    }
  },
  "14": {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["12", 0], "vae": ["2", 2]}
  },
  "16": {
    "class_type": "SaveImage",
    "inputs": {"images": ["14", 0]}
  }
}
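A workflow in this API format can be queued from a script by posting it to a running ComfyUI server's `/prompt` endpoint. The sketch below assumes a local server on ComfyUI's default address (`http://127.0.0.1:8188`); `with_denoise`, `queue_prompt`, and the file name `img2img_workflow.json` are hypothetical names chosen for illustration.

```python
import copy
import json
import urllib.request

def with_denoise(workflow: dict, denoise: float) -> dict:
    """Return a copy of an API-format workflow with every KSampler's denoise replaced."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["denoise"] = denoise
    return patched

def queue_prompt(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the workflow to ComfyUI's /prompt endpoint and return the parsed reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{server}/prompt", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (needs a running ComfyUI server; the file name is hypothetical):
# with open("img2img_workflow.json") as f:
#     workflow = json.load(f)
# for d in (0.4, 0.65, 0.85):
#     queue_prompt(with_denoise(workflow, d))
```

Sweeping `denoise` this way is a quick method for comparing several strength settings on the same input and seed.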
With ControlNet
Load Image → VAE Encode → Latent → KSampler
Text Prompt → CLIP Encode → Apply ControlNet (Canny / Depth) → Conditioning → KSampler
KSampler → VAE Decode → Output
ControlNet adds structural guidance during img2img, preserving composition while applying new aesthetics.
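As a rough sketch, ControlNet guidance can be wired into an API-format workflow by routing the positive conditioning through a ControlNet apply node before it reaches the sampler. The node class names `ControlNetLoader` and `ControlNetApply` are those found in stock ComfyUI; the node IDs `20` to `22`, the function name, and the file names in the usage comment are hypothetical, and the availability of a ControlNet checkpoint compatible with Z-Image is assumed rather than confirmed. The control image is assumed to already be an edge or depth map.

```python
import copy

def add_controlnet(workflow, control_image, control_net_name,
                   strength=0.8, sampler_id="12"):
    """Return a copy of the workflow with ControlNet applied to the positive conditioning."""
    wf = copy.deepcopy(workflow)
    new_ids = ("20", "21", "22")          # hypothetical, must be unused in the workflow
    if any(i in wf for i in new_ids):
        raise ValueError("node IDs 20-22 are already in use")
    positive = wf[sampler_id]["inputs"]["positive"]   # e.g. ["6", 0]
    # Control image: assumed to be a precomputed Canny edge or depth map
    wf["20"] = {"class_type": "LoadImage", "inputs": {"image": control_image}}
    wf["21"] = {"class_type": "ControlNetLoader",
                "inputs": {"control_net_name": control_net_name}}
    wf["22"] = {"class_type": "ControlNetApply",
                "inputs": {"conditioning": positive, "control_net": ["21", 0],
                           "image": ["20", 0], "strength": strength}}
    # The sampler now reads its positive conditioning from the ControlNet node
    wf[sampler_id]["inputs"]["positive"] = ["22", 0]
    return wf

# Usage (file names are hypothetical):
# guided = add_controlnet(workflow, "edges.png", "z-image-canny.safetensors")
```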
Style Transfer via img2img
Common Style Transfers
| Source → Target | Strength | Prompt Focus |
|---|---|---|
| Photo → Oil painting | 0.5–0.7 | oil painting, thick brush strokes, rich colors |
| Photo → Watercolor | 0.45–0.6 | watercolor, soft edges, flowing colors |
| Photo → Anime | 0.6–0.8 | anime style, cel shading, vibrant colors |
| Photo → Pencil sketch | 0.5–0.65 | pencil sketch, graphite, hatching |
| Photo → Pixel art | 0.7–0.85 | pixel art, 16-bit, retro aesthetic |
| Sketch → Photorealistic | 0.8–0.95 | photorealistic, detailed, natural lighting |
Tips: Match strength to style distance (similar styles: lower strength; dissimilar: higher). Use negative prompts to exclude artifacts.
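The table above can be kept as data so that scripts pick a strength inside the recommended range rather than hard-coding one. This is a minimal sketch: the dictionary keys, the function name `preset_kwargs`, and the `position` parameter (0.0 for the low end of the range, 1.0 for the high end) are illustrative choices, not part of any Z-Image API.

```python
# (strength range, prompt focus) taken from the style transfer table
STYLE_PRESETS = {
    "oil_painting": ((0.5, 0.7), "oil painting, thick brush strokes, rich colors"),
    "watercolor": ((0.45, 0.6), "watercolor, soft edges, flowing colors"),
    "anime": ((0.6, 0.8), "anime style, cel shading, vibrant colors"),
    "pencil_sketch": ((0.5, 0.65), "pencil sketch, graphite, hatching"),
    "pixel_art": ((0.7, 0.85), "pixel art, 16-bit, retro aesthetic"),
    "photorealistic": ((0.8, 0.95), "photorealistic, detailed, natural lighting"),
}

def preset_kwargs(style: str, position: float = 0.5) -> dict:
    """Return prompt and strength for a style; position interpolates within the range."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be in [0, 1]")
    (low, high), prompt = STYLE_PRESETS[style]
    strength = round(low + (high - low) * position, 3)
    return {"prompt": prompt, "strength": strength}

# Usage: start mid-range, then move toward 1.0 if too much of the photo survives
# result = pipe(image=photo_input, guidance_scale=7.5, num_inference_steps=30,
#               **preset_kwargs("anime")).images[0]
```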
Detail Enhancement
Two-Pass Enhancement
# Step 1: Content enhancement
step1 = pipe(
    prompt="high detail, sharp textures, professional photography",
    image=low_res_input, strength=0.3,
    guidance_scale=6.0, num_inference_steps=28,
).images[0]

# Step 2: Detail refinement, run on the output of step 1
step2 = pipe(
    prompt="ultra detailed, fine textures, crisp focus, 8K",
    image=step1, strength=0.15,
    guidance_scale=5.0, num_inference_steps=20,
).images[0]
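The same two-pass idea generalizes to any number of sequential passes. The helper below is a sketch under the assumption that `pipe` is callable with an `image` keyword and returns an object with an `.images` list, as in the calls above; `run_passes` and `ENHANCE_PASSES` are hypothetical names.

```python
def run_passes(pipe, image, passes):
    """Run img2img passes in sequence, feeding each output into the next pass.

    `passes` is a list of keyword-argument dicts, one per pipeline call.
    """
    for kwargs in passes:
        image = pipe(image=image, **kwargs).images[0]
    return image

# The two-pass enhancement settings expressed as data
ENHANCE_PASSES = [
    {"prompt": "high detail, sharp textures, professional photography",
     "strength": 0.3, "guidance_scale": 6.0, "num_inference_steps": 28},
    {"prompt": "ultra detailed, fine textures, crisp focus, 8K",
     "strength": 0.15, "guidance_scale": 5.0, "num_inference_steps": 20},
]

# Usage:
# final = run_passes(pipe, low_res_input, ENHANCE_PASSES)
```

Keeping the passes as data makes it easy to try a third, even lighter pass or to reorder the prompts without touching the loop.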
Quality by Strength
| Strength | Noise Reduction | Detail Preservation | Color Shift |
|---|---|---|---|
| 0.15 | Moderate | Excellent | Minimal |
| 0.25 | Good | Very good | Slight |
| 0.35 | Strong | Good | Moderate |
Sketch-to-Image
result = pipe(
    # Adjacent string literals are joined into a single prompt
    prompt=("photorealistic portrait, detailed skin texture, "
            "natural lighting, shallow depth of field, professional photography"),
    image=sketch, strength=0.85,
    guidance_scale=8.0, num_inference_steps=40,
).images[0]
Best practices: Clean line sketches produce best results at strength 0.8–0.95. Combine with Canny ControlNet for edge-preserving guidance at strength 0.6–0.75.
Photo Enhancement
Color Enhancement
result = pipe(
    # Adjacent string literals are joined into a single prompt
    prompt=("vibrant colors, professional color grading, "
            "cinematic color palette, rich saturation"),
    image=dull_photo, strength=0.2,
    guidance_scale=4.5, num_inference_steps=20,
).images[0]
Batch img2img
Python Batch Processing
import os
from PIL import Image

def batch_img2img(input_dir, output_dir, prompt, strength=0.6):
    """Run img2img with one prompt over every JPEG/PNG in input_dir.

    Uses the module-level `pipe` created earlier.
    """
    os.makedirs(output_dir, exist_ok=True)
    for fn in sorted(os.listdir(input_dir)):
        if fn.lower().endswith(('.jpg', '.jpeg', '.png')):
            # Convert to RGB so grayscale and RGBA inputs are handled uniformly
            img = Image.open(os.path.join(input_dir, fn)).convert("RGB")
            result = pipe(prompt=prompt, image=img, strength=strength,
                          guidance_scale=7.5, num_inference_steps=28).images[0]
            result.save(os.path.join(output_dir, fn))
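For large folders it can help to separate planning from generation, so an interrupted run resumes without redoing finished images. The sketch below only builds the list of source and destination paths; `plan_batch` is a hypothetical helper, and writing every output as PNG is a design choice made here, not something the pipeline requires.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def plan_batch(input_dir, output_dir, skip_existing=True):
    """Return sorted (source, destination) path pairs for images still to be processed."""
    src_dir, dst_dir = Path(input_dir), Path(output_dir)
    jobs = []
    for src in sorted(src_dir.iterdir()):
        if not src.is_file() or src.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        dst = dst_dir / f"{src.stem}.png"   # one PNG per input, whatever its format
        if skip_existing and dst.exists():
            continue                        # already produced by an earlier run
        jobs.append((src, dst))
    return jobs

# Usage with the pipeline call from the batch function above:
# for src, dst in plan_batch("photos", "stylized"):
#     img = Image.open(src).convert("RGB")
#     pipe(prompt=prompt, image=img, strength=0.6).images[0].save(dst)
```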
ComfyUI Batch Workflow
Use BatchImageLoad → ImageBatch → VAE Encode → KSampler → VAE Decode → BatchSaveImage. Processing images in batches of 4–8 is roughly 40% faster than running them one at a time, at a moderate VRAM cost.
Practical Examples
Example 1: Photo to Illustration
Prompt: "digital illustration, clean lines, flat colors, vector art style"
Params: strength=0.6, steps=30, cfg=7.5
Result: Clean vector illustration maintaining facial structure and pose
Example 2: Sketch to Realistic Portrait
Prompt: "photorealistic portrait, detailed skin, natural lighting, shallow depth of field"
Params: strength=0.85, steps=40, cfg=8.0
Result: Sketch lines guide facial structure; model generates realistic textures
Example 3: Low-Res to High-Detail
Prompt: "high detail, sharp focus, clean background, studio lighting, 8K"
Params: strength=0.25, steps=28, cfg=5.0
Result: Enhanced detail and clarity preserving original appearance
Example 4: Photo to Anime
Prompt: "anime style, vibrant colors, cel shading, manga art, detailed background"
Params: strength=0.7, steps=32, cfg=7.5
Result: Photographic scene converted to anime aesthetic
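The four examples above can be captured as reusable recipes. This is a sketch only: the `Img2ImgRecipe` class and its field names are hypothetical, and `to_kwargs` maps `steps` and `cfg` to the `num_inference_steps` and `guidance_scale` keyword names used in the pipeline calls earlier in this guide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Img2ImgRecipe:
    name: str
    prompt: str
    strength: float
    steps: int
    cfg: float

    def to_kwargs(self) -> dict:
        """Translate the recipe into keyword arguments for a pipeline call."""
        return {"prompt": self.prompt, "strength": self.strength,
                "num_inference_steps": self.steps, "guidance_scale": self.cfg}

RECIPES = [
    Img2ImgRecipe("photo_to_illustration",
                  "digital illustration, clean lines, flat colors, vector art style",
                  0.6, 30, 7.5),
    Img2ImgRecipe("sketch_to_portrait",
                  "photorealistic portrait, detailed skin, natural lighting, "
                  "shallow depth of field",
                  0.85, 40, 8.0),
    Img2ImgRecipe("low_res_to_detail",
                  "high detail, sharp focus, clean background, studio lighting, 8K",
                  0.25, 28, 5.0),
    Img2ImgRecipe("photo_to_anime",
                  "anime style, vibrant colors, cel shading, manga art, "
                  "detailed background",
                  0.7, 32, 7.5),
]

# Usage:
# recipe = RECIPES[3]
# result = pipe(image=photo_input, **recipe.to_kwargs()).images[0]
```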
Troubleshooting
| Issue | Solution |
|---|---|
| Too much original preserved | Increase strength by 0.1–0.2 |
| Original completely lost | Decrease strength; add structural keywords |
| Unwanted artifacts | Increase steps, especially at higher strength |
| Color drift | Add color description to prompt; lower cfg |
| Blurry output | Increase steps by 50–100% |
| Structural distortion | Use ControlNet for guidance |
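Several of the fixes in the table are mechanical enough to script when iterating on a stubborn image. The helper below is a sketch: the issue labels and the function name `adjust` are hypothetical, and the specific step sizes where the table gives none (strength decrease of 0.1, a guidance reduction of 1.0 with a floor of 1.0) are assumptions to tune, not recommendations from the table.

```python
def adjust(params: dict, issue: str) -> dict:
    """Return a copy of img2img parameters nudged according to the troubleshooting table."""
    p = dict(params)
    if issue == "too_much_original":
        # Table: increase strength by 0.1-0.2 (the lower bound is used here)
        p["strength"] = min(1.0, round(p["strength"] + 0.1, 2))
    elif issue == "original_lost":
        # Table: decrease strength (the 0.1 step is an assumption)
        p["strength"] = max(0.0, round(p["strength"] - 0.1, 2))
    elif issue in ("artifacts", "blurry"):
        # Table: increase steps by 50-100% (50% is used here)
        p["num_inference_steps"] = int(p["num_inference_steps"] * 1.5)
    elif issue == "color_drift":
        # Table: lower cfg (the 1.0 step and the floor of 1.0 are assumptions)
        p["guidance_scale"] = max(1.0, p["guidance_scale"] - 1.0)
    else:
        raise ValueError(f"unknown issue: {issue}")
    return p

params = {"strength": 0.6, "num_inference_steps": 28, "guidance_scale": 7.5}
retry = adjust(params, "too_much_original")   # strength raised to 0.7
```

Structural distortion is left out on purpose, since the table's fix for it is adding ControlNet rather than changing a number.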
Summary
Z-Image's img2img workflow is controlled primarily by the denoising strength parameter:
- Strength is key: 0.1–0.3 for enhancement, 0.4–0.6 for style transfer, 0.7–1.0 for transformation
- Match steps to strength: Higher strength needs more inference steps
- Two-pass approach: Sequential low-strength passes for enhancement workflows
- ControlNet adds precision: Prevents unwanted distortions at higher strengths
- Batch processing: Efficient large-scale image processing via ComfyUI batch nodes
Combined with inpainting (ZI-052) and ControlNet (ZI-058), img2img completes the toolkit for AI-powered image editing with Z-Image.