Character Consistency LoRA Training for Z-Image: Keep Your Character Consistent Across Scenes

May 9, 2026

Train a LoRA so your character maintains facial features and style across any prompt and any scene.


Why Character Consistency?

The Problem

One of the biggest pain points in AI image generation: the same character looks different every time.

| Use Case | Problem | LoRA Solution |
| --- | --- | --- |
| Comic/Webtoon creation | Protagonist's face changes between panels | ✅ Lock facial features |
| Social media accounts | AI influencers need consistent identity | ✅ Cross-scene consistency |
| Game character design | Character concept art needs same face | ✅ Multi-angle consistency |
| Novel covers | Protagonist needs uniform look across covers | ✅ Style lock |

Z-Image Turbo's Advantage

Z-Image Turbo is a 6B-parameter distilled model. Compared to full-size models, it offers:

  • Trains faster: LoRA training time reduced by 60-70%
  • Lower VRAM: Training possible on 8GB VRAM
  • Faster inference: Sub-second generation
  • No quality compromise: 95%+ quality retention after distillation

Data Preparation

Requirements

| Metric | Minimum | Recommended | High Quality |
| --- | --- | --- | --- |
| Image count | 5 images | 15-20 images | 30+ images |
| Resolution | 512×512 | 1024×1024 | 1024×1024+ |
| Face clarity | Recognizable | Clear, unobstructed | Multiple angles |
| Background variety | Some variation | Multiple scenes | Rich variety |

Image Collection Guidelines

  1. Mostly front-facing: At least 50% front/three-quarter view
  2. Multiple angles: Include 3-4 profile/side shots
  3. Different expressions: Smile, neutral, surprised
  4. Different outfits: Reduce clothing bias in training
  5. Different backgrounds: Prevent background features from being learned
  6. Lighting variation: Natural light, indoor, side lighting
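The first two guidelines are easy to check mechanically once each image has a view label. The sketch below is only an illustration: the label names (`front`, `three_quarter`, `profile`) and the function `check_view_mix` are assumptions, not part of any Z-Image or Kohya tooling, and the thresholds simply restate the 50% and 3-shot figures from the list above.

```python
from collections import Counter

# Hypothetical view labels; the guide does not define a labeling scheme.
FRONTAL_VIEWS = {"front", "three_quarter"}
PROFILE_VIEWS = {"profile"}


def check_view_mix(views, min_frontal_share=0.5, min_profile=3):
    """Return a list of warnings for a list of per-image view labels."""
    total = len(views)
    if total == 0:
        return ["dataset is empty"]
    counts = Counter(views)
    frontal = sum(counts[v] for v in FRONTAL_VIEWS)
    profile = sum(counts[v] for v in PROFILE_VIEWS)
    warnings = []
    if frontal / total < min_frontal_share:
        warnings.append(f"only {frontal}/{total} front or three-quarter views")
    if profile < min_profile:
        warnings.append(f"only {profile} profile shots, want at least {min_profile}")
    return warnings


# Example: 15 images, mostly frontal, with a few profile shots.
views = ["front"] * 8 + ["three_quarter"] * 3 + ["profile"] * 4
print(check_view_mix(views))
```

An empty list means the view mix meets both rules. Expression, outfit, background, and lighting variety still need a manual look.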

Image Preprocessing

# Preprocess images with Python
python prepare_training_data.py \
  --input ./raw_images/ \
  --output ./training_data/ \
  --size 1024 \
  --face-detect \
  --crop-face-ratio 0.6

Key steps:

  1. Resize all images to 1024×1024
  2. Face detection and center crop
  3. Remove watermarks and irrelevant elements
  4. Generate caption files (optional)
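To make step 2 concrete, here is a minimal sketch of how a face-centered square crop could be computed from a detected face box. It assumes that a face ratio of 0.6 means "the face spans 60% of the crop side"; the guide does not define the flag precisely, so treat that reading, and the helper name `face_crop_box`, as assumptions. Face detection itself is out of scope here; the box is taken as given.

```python
def face_crop_box(img_w, img_h, face_box, face_ratio=0.6):
    """Square crop centered on the face, clamped to the image bounds.

    face_box is (left, top, right, bottom) in pixels. face_ratio is
    ASSUMED to mean: face size / crop side length.
    """
    left, top, right, bottom = face_box
    face_size = max(right - left, bottom - top)
    # Crop side that makes the face fill `face_ratio` of it, capped at the
    # shorter image side so the square always fits inside the image.
    side = min(round(face_size / face_ratio), img_w, img_h)
    cx = (left + right) / 2
    cy = (top + bottom) / 2
    # Shift the square back inside the image if the face sits near an edge.
    x0 = min(max(round(cx - side / 2), 0), img_w - side)
    y0 = min(max(round(cy - side / 2), 0), img_h - side)
    return (x0, y0, x0 + side, y0 + side)


# Example: a 2000x1500 photo with a 300x360 face box.
box = face_crop_box(2000, 1500, (900, 300, 1200, 660))
print(box)
```

The resulting box can be passed to any image library's crop call and the result resized to 1024×1024.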

LoRA Training Configuration

Using Kohya_ss WebUI

Base Parameters

| Parameter | Recommended | Notes |
| --- | --- | --- |
| Model | z_image_turbo_bf16.safetensors | Z-Image Turbo |
| Optimizer | Prodigy | Learning-rate-free; recommended here for Z-Image |
| Learning rate | Auto (Prodigy) | Prodigy auto-scales |
| Network dim | 32-64 | Depends on character complexity |
| Network alpha | 16-32 | Typically dim/2 |
| Epochs | 10-20 | More epochs for fewer images |
| Batch size | 1-2 | Adjust based on VRAM |
| Resolution | 1024×1024 | Z-Image native resolution |
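These parameters determine how many optimizer steps a run performs. The sketch below works through the arithmetic, assuming each image is seen exactly once per epoch (no dataset repeats, which some trainers add) and no gradient accumulation. The function name `training_plan` is illustrative only.

```python
import math


def training_plan(num_images, epochs, batch_size, network_dim):
    """Derive step counts and alpha from the table's rules of thumb.

    ASSUMES one pass over every image per epoch and no gradient accumulation.
    """
    steps_per_epoch = math.ceil(num_images / batch_size)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
        "network_alpha": network_dim // 2,  # "typically dim/2"
    }


# Example: 15 images, 15 epochs, batch size 1, dim 32.
plan = training_plan(num_images=15, epochs=15, batch_size=1, network_dim=32)
print(plan)
```

A step count like this is handy for sanity-checking a checkpoint interval. In trainers that save every N steps, an interval larger than the total step count would likely never trigger before training ends.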

Prodigy Optimizer Parameters

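Prodigy is a general-purpose, learning-rate-free optimizer. It was not designed for Z-Image Turbo specifically, but it is the optimizer this guide recommends for it: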

optimizer: prodigy
lr: 1.0  # Prodigy auto-scales this
d0: 0.05  # Initial estimate of Prodigy's step-size scale D
weight_decay: 0.01

Why Prodigy?

  • Traditional AdamW requires manual learning rate tuning
  • Prodigy estimates its step size during training from the gradients it observes, so lr can stay at 1.0
  • More stable training, lower overfitting risk
  • Especially suited for distilled models like Z-Image Turbo

Command-Line Training

accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path=./z_image_turbo \
  --train_data_dir=./training_data \
  --resolution=1024 \
  --train_batch_size=1 \
  --num_train_epochs=15 \
  --learning_rate=1.0 \
  --optimizer=prodigy \
  --optimizer_args="d0=0.05,weight_decay=0.01" \
  --lora_rank=32 \
  --lora_alpha=16 \
  --output_dir=./lora_output \
  --checkpointing_steps=500 \
  --mixed_precision=bf16
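The `--optimizer_args` value is a comma-separated list of `key=value` pairs. How a given training script parses it is not specified here, so the following is only a plausible sketch of turning that string into keyword arguments for an optimizer constructor. `parse_optimizer_args` is a hypothetical helper, not a function from any training library.

```python
def parse_optimizer_args(arg_string):
    """Turn 'k1=v1,k2=v2' into a dict, converting numeric values to float."""
    kwargs = {}
    for item in arg_string.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        key, _, value = item.partition("=")
        try:
            kwargs[key.strip()] = float(value)
        except ValueError:
            kwargs[key.strip()] = value.strip()  # keep non-numeric values as text
    return kwargs


print(parse_optimizer_args("d0=0.05,weight_decay=0.01"))
```

Parsing the string yourself before a long run is a cheap way to catch typos such as a missing `=` or a stray space.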

Training Monitoring

Key Metrics

| Metric | Normal Range | Warning Sign |
| --- | --- | --- |
| Loss | 0.01-0.1, gradually decreasing | Not decreasing or volatile |
| Training time | ~2-5 hours (15 images, 8GB) | Over 8 hours |
| VRAM usage | < 7GB (batch=1) | OOM error |
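The loss warning signs can be checked automatically from logged values. This is a rough sketch, not a standard diagnostic: the window size and the volatility threshold are arbitrary assumptions you would tune for your own runs, and `check_loss` is an illustrative name.

```python
from statistics import mean, pstdev


def check_loss(losses, window=5, volatility_limit=0.5):
    """Compare the first and last `window` losses and flag warning signs."""
    if len(losses) < 2 * window:
        return ["not enough data"]
    first, last = losses[:window], losses[-window:]
    warnings = []
    if mean(last) >= mean(first):
        warnings.append("loss is not decreasing")
    # Coefficient of variation of the recent window as a volatility proxy.
    if mean(last) > 0 and pstdev(last) / mean(last) > volatility_limit:
        warnings.append("loss is volatile")
    return warnings


# Example: a slowly decreasing loss curve.
healthy = [0.10, 0.09, 0.09, 0.08, 0.08, 0.07, 0.06, 0.06, 0.05, 0.05]
print(check_loss(healthy))
```

Diffusion training loss is noisy by nature, so a check like this is best run on values averaged over many steps instead of raw per-step numbers.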

Validation Every 5 Epochs

Test with these prompts:

# Test 1: Base consistency
character_name, portrait photo, white background, studio lighting

# Test 2: Different scene
character_name, walking in a park, sunny day

# Test 3: Different style
character_name, anime style, watercolor painting

# Test 4: Extreme test
character_name, in a sci-fi spaceship, dramatic lighting
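To run the same four tests at every validation point, the prompts can be generated from the trigger word. The sketch below only builds the list of jobs; how each prompt is rendered (ComfyUI, a script, and so on) is left open. The names `TEST_PROMPTS` and `validation_jobs` are illustrative.

```python
# The four test prompts from this section, with the trigger word as a slot.
TEST_PROMPTS = {
    "base": "{trigger}, portrait photo, white background, studio lighting",
    "scene": "{trigger}, walking in a park, sunny day",
    "style": "{trigger}, anime style, watercolor painting",
    "extreme": "{trigger}, in a sci-fi spaceship, dramatic lighting",
}


def validation_jobs(trigger, total_epochs, every=5):
    """Yield (epoch, test_name, prompt) for each validation checkpoint."""
    for epoch in range(every, total_epochs + 1, every):
        for name, template in TEST_PROMPTS.items():
            yield epoch, name, template.format(trigger=trigger)


jobs = list(validation_jobs("character_name", total_epochs=15))
print(len(jobs))
print(jobs[0])
```

Using a fixed seed for each test across checkpoints makes it easier to attribute differences to the LoRA and not to sampling noise.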

Overfitting Detection

| Symptom | Cause | Fix |
| --- | --- | --- |
| Training background appears | Background learned | Add background variety |
| Outfit stays fixed | Clothing learned | Train with different outfits |
| Face is blurry | Over-trained | Reduce epochs or alpha |
| No change at all | Under-trained | Increase epochs or alpha |

Inference Usage

ComfyUI LoRA Loading

Load Checkpoint → z_image_turbo_bf16.safetensors
    ↓
Load LoRA → character_lora.safetensors
    ↓
[Set LoRA strength]
    ↓
KSampler

LoRA Strength Tuning

| Strength | Effect | Use Case |
| --- | --- | --- |
| 0.3-0.5 | Subtle features | Style reference, loose consistency |
| 0.6-0.8 | Medium features | Recommended daily use range |
| 0.9-1.0 | Strong features | When high consistency needed |
| 1.0-1.2 | Over-strong | May show overfitting artifacts |

Multi-LoRA Combinations

Load multiple LoRAs simultaneously:

LoRA 1 (Character face) — strength 0.8
LoRA 2 (Art style) — strength 0.6
LoRA 3 (Clothing style) — strength 0.5

Note: Total strength should not exceed ~2.0 to avoid artifacts.
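A quick way to apply this rule of thumb is to sum the strengths before loading the stack. The sketch below uses the three LoRAs from the example; the LoRA names and helper functions are illustrative, and the 2.0 limit is the approximate figure from the note, not a hard constraint of any tool.

```python
def total_lora_strength(stack):
    """Sum the strengths of a list of (name, strength) pairs."""
    return sum(strength for _, strength in stack)


def within_budget(stack, limit=2.0):
    """True if the combined strength stays at or under the rule-of-thumb limit."""
    return total_lora_strength(stack) <= limit


stack = [("character_face", 0.8), ("art_style", 0.6), ("clothing_style", 0.5)]
print(round(total_lora_strength(stack), 2), within_budget(stack))
```

If a stack goes over budget, lowering the style or clothing LoRA first usually makes sense, since the character LoRA is the one carrying identity.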


Advanced Techniques

Multi-Character Training

For maintaining consistency across multiple characters:

  1. Approach A: Train separate LoRA for each character
  2. Approach B: Train multi-character LoRA (with token differentiation)

Approach A is recommended: it is more flexible, and each character's LoRA strength can be adjusted independently.

Face Enhancement

At inference time, add a face restore node after the LoRA-conditioned generation:

KSampler → LoRA generation → Face Restore node → Output

Prompt Template

# Character consistency prompt template
[character trigger word], [age description], [outfit description],
[scene description], [action description],
[style modifiers], [quality words]

# Example
character_name, 25-year-old woman, red dress,
walking through a garden at sunset,
cinematic lighting, photorealistic, 8k, sharp focus
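The template maps naturally onto a small helper that keeps the slots in a fixed order with the trigger word first. This is a minimal sketch; `build_prompt` and its parameter names are illustrative and not part of any Z-Image or ComfyUI API.

```python
def build_prompt(trigger, age, outfit, scene, action, style=(), quality=()):
    """Join the template slots in order, keeping the trigger word first."""
    parts = [trigger, age, outfit, scene, action, *style, *quality]
    # Skip empty slots so optional fields do not leave stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    trigger="character_name",
    age="25-year-old woman",
    outfit="red dress",
    scene="in a garden at sunset",
    action="walking",
    style=("cinematic lighting", "photorealistic"),
    quality=("8k", "sharp focus"),
)
print(prompt)
```

Keeping the slot order fixed across a series of images makes it easier to see which change in the prompt caused a change in the output.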

FAQ

Q: How many images are enough?

  • 5 images: Bare minimum, simple scenes only
  • 10-15 images: Recommended starting point, basic consistency
  • 20+ images: High quality, cross-scene stability
  • 30+ images: Professional grade, multi-angle, multi-expression

Q: How long does training take?

| Images | VRAM | Optimizer | Estimated Time |
| --- | --- | --- | --- |
| 10 | 8GB | Prodigy | ~2 hours |
| 15 | 8GB | Prodigy | ~3 hours |
| 20 | 12GB | Prodigy | ~4 hours |
| 30 | 16GB | Prodigy | ~6 hours |

Q: Character still not consistent after training?

  1. Increase LoRA strength to 0.8-1.0
  2. Check training image quality (faces clear?)
  3. Increase training epochs
  4. Ensure trigger word is at the beginning of prompt
  5. Reduce other descriptive elements that might override character features

Q: Are LoRAs trained on Z-Image Turbo compatible with Z-Image Base?

No. Turbo and Base are different models, so LoRA weights trained on one are not compatible with the other. Turbo LoRAs work only with Turbo.


Summary

Z-Image Turbo Character Consistency LoRA Training Workflow:

  1. Data prep: 15-20 clear face photos, multiple angles and scenes
  2. Model: Z-Image Turbo + Prodigy optimizer
  3. Config: dim=32, alpha=16, epochs=15
  4. Monitoring: Validate every 5 epochs, detect overfitting
  5. Inference: LoRA strength 0.6-0.8, with template prompts

Key advantages:

  • Fast training (8GB VRAM, 2-3 hours)
  • High quality (distilled model retains 95%+ quality)
  • Good consistency (cross-scene, cross-style feature retention)

This guide is based on ComfyUI + Z-Image Turbo + Prodigy optimizer.

Z-Image Team