Character Consistency LoRA Training for Z-Image: Keep Your Character Consistent Across Scenes
Train a LoRA so your character maintains facial features and style across any prompt and any scene.
Why Character Consistency?
The Problem
One of the biggest pain points in AI image generation: the same character looks different every time.
| Use Case | Problem | LoRA Solution |
|---|---|---|
| Comic/Webtoon creation | Protagonist's face changes between panels | ✅ Lock facial features |
| Social media accounts | AI influencers need consistent identity | ✅ Cross-scene consistency |
| Game character design | Character concept art needs same face | ✅ Multi-angle consistency |
| Novel covers | Protagonist needs uniform look across covers | ✅ Style lock |
Z-Image Turbo's Advantage
Z-Image Turbo is a 6B-parameter distilled model. Compared to full-size models, it offers:
- Trains faster: LoRA training time reduced by 60-70%
- Lower VRAM: Training possible on 8GB VRAM
- Faster inference: Sub-second generation
- Minimal quality loss: 95%+ quality retention after distillation
Data Preparation
Requirements
| Metric | Minimum | Recommended | High Quality |
|---|---|---|---|
| Image count | 5 images | 15-20 images | 30+ images |
| Resolution | 512×512 | 1024×1024 | 1024×1024+ |
| Face clarity | Recognizable | Clear, unobstructed | Multiple angles |
| Background variety | Some variation | Multiple scenes | Rich variety |
Image Collection Guidelines
- Mostly front-facing: At least 50% front/three-quarter view
- Multiple angles: Include 3-4 profile/side shots
- Different expressions: Smile, neutral, surprised
- Different outfits: Reduce clothing bias in training
- Different backgrounds: Prevent background features from being learned
- Lighting variation: Natural light, indoor, side lighting
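The view-mix guidelines above can be checked before training with a small script. This is a minimal sketch, assuming each image has been hand-tagged with a view label; the label names (`front`, `three_quarter`, `profile`) and the function name are hypothetical, and the thresholds simply mirror the 50% and 3-shot figures from the list.

```python
from collections import Counter


def check_view_mix(views, min_front_share=0.5, min_profiles=3):
    """Return a list of problems found in a list of per-image view labels.

    `views` holds one hand-assigned label per image (hypothetical labels:
    'front', 'three_quarter', 'profile'). An empty list means the mix
    satisfies both guidelines.
    """
    if not views:
        return ["no images"]

    counts = Counter(views)
    front_like = counts["front"] + counts["three_quarter"]
    problems = []

    # Guideline: at least 50% front or three-quarter views.
    if front_like / len(views) < min_front_share:
        problems.append("too few front/three-quarter views")

    # Guideline: include 3-4 profile/side shots.
    if counts["profile"] < min_profiles:
        problems.append("too few profile shots")

    return problems


if __name__ == "__main__":
    dataset = ["front"] * 8 + ["three_quarter"] * 3 + ["profile"] * 4
    print(check_view_mix(dataset) or "view mix looks fine")
```

Expressions, outfits, backgrounds, and lighting could be tracked the same way with additional labels.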
Image Preprocessing
# Preprocess images with Python
python prepare_training_data.py \
--input ./raw_images/ \
--output ./training_data/ \
--size 1024 \
--face-detect \
--crop-face-ratio 0.6
Key steps:
- Resize all images to 1024×1024
- Face detection and center crop
- Remove watermarks and irrelevant elements
- Generate caption files (optional)
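To illustrate the "face detection and center crop" step, here is a sketch of the crop arithmetic only. It assumes the face bounding box comes from whatever detector you use, and it interprets the 0.6 ratio as "the face fills about 60% of the crop side"; that interpretation and the function name are assumptions, not documented behavior of `prepare_training_data.py`.

```python
def square_crop_box(img_w, img_h, face_box, face_ratio=0.6):
    """Return (left, top, right, bottom) of a square crop centred on the face.

    face_box is (left, top, right, bottom) in pixels from any face detector.
    face_ratio is the assumed fraction of the crop side occupied by the face.
    The crop is clamped so it never leaves the image.
    """
    fl, ft, fr, fb = face_box
    face_side = max(fr - fl, fb - ft)

    # Crop side: face size divided by the ratio, limited by the image size.
    side = min(round(face_side / face_ratio), img_w, img_h)

    # Centre the crop on the face, then clamp to the image bounds.
    cx, cy = (fl + fr) / 2, (ft + fb) / 2
    left = min(max(round(cx - side / 2), 0), img_w - side)
    top = min(max(round(cy - side / 2), 0), img_h - side)
    return left, top, left + side, top + side


if __name__ == "__main__":
    # 2000x1500 photo with a 300x300 face box.
    print(square_crop_box(2000, 1500, (800, 300, 1100, 600)))
```

The resulting box can be passed to an image library's crop function and the crop then resized to 1024×1024.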
LoRA Training Configuration
Using Kohya_ss WebUI
Base Parameters
| Parameter | Recommended | Notes |
|---|---|---|
| Model | z_image_turbo_bf16.safetensors | Z-Image Turbo |
| Optimizer | Prodigy | Adaptive step size, no manual LR tuning |
| Learning rate | Auto (Prodigy) | Prodigy auto-scales |
| Network dim | 32-64 | Depends on character complexity |
| Network alpha | 16-32 | Typically dim/2 |
| Epochs | 10-20 | More epochs for fewer images |
| Batch size | 1-2 | Adjust based on VRAM |
| Resolution | 1024×1024 | Z-Image native resolution |
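The "alpha is typically dim/2" rule is easier to reason about as a ratio. In the common LoRA convention, the learned update is multiplied by `alpha / dim`, so dim=32 with alpha=16 and dim=64 with alpha=32 apply the same 0.5 scaling. The sketch below assumes your trainer follows that convention; the function name is illustrative.

```python
def lora_scale(network_dim, network_alpha):
    """Scaling applied to the LoRA update under the usual alpha/rank convention."""
    if network_dim <= 0:
        raise ValueError("network_dim must be positive")
    return network_alpha / network_dim


if __name__ == "__main__":
    for dim, alpha in [(32, 16), (64, 32), (32, 32)]:
        print(f"dim={dim:>2} alpha={alpha:>2} -> scale {lora_scale(dim, alpha):.2f}")
```

Keeping the ratio fixed when changing dim means the update strength stays comparable while only the adapter's capacity changes.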
Prodigy Optimizer Parameters
Prodigy is a general-purpose adaptive optimizer; these settings are a starting point for Z-Image Turbo:
optimizer: prodigy
lr: 1.0 # Prodigy auto-scales this
d0: 0.05 # Initial step scale
weight_decay: 0.01
Why Prodigy?
- Traditional AdamW requires manual learning rate tuning
- Prodigy estimates its own step size during training, so lr stays at 1.0
- More stable training, lower overfitting risk
- Especially suited for distilled models like Z-Image Turbo
Command-Line Training
accelerate launch train_text_to_image.py \
--pretrained_model_name_or_path=./z_image_turbo \
--train_data_dir=./training_data \
--resolution=1024 \
--train_batch_size=1 \
--num_train_epochs=15 \
--learning_rate=1.0 \
--optimizer=prodigy \
--optimizer_args="d0=0.05,weight_decay=0.01" \
--lora_rank=32 \
--lora_alpha=16 \
--output_dir=./lora_output \
--checkpointing_steps=500 \
--mixed_precision=bf16
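It is worth checking how the epoch count translates into optimizer steps before choosing a checkpoint interval. The sketch below assumes one pass over each image per epoch (no dataset repeats) and that checkpoints are written on multiples of `checkpointing_steps`; both are assumptions about the trainer, and the function names are illustrative. Under those assumptions, 15 images at batch size 1 for 15 epochs is 225 steps, so a 500-step interval would not trigger during the run and a smaller interval may be needed.

```python
import math


def total_optimizer_steps(num_images, batch_size, epochs, grad_accum=1):
    """Optimizer steps for a run, assuming each image is seen once per epoch."""
    batches_per_epoch = math.ceil(num_images / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


def checkpoints_saved(total_steps, checkpointing_steps):
    """Number of intermediate checkpoints, assuming saves on exact multiples."""
    return total_steps // checkpointing_steps


if __name__ == "__main__":
    steps = total_optimizer_steps(num_images=15, batch_size=1, epochs=15)
    print("total steps:", steps)
    print("checkpoints at interval 500:", checkpoints_saved(steps, 500))
    print("checkpoints at interval 75:", checkpoints_saved(steps, 75))
```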
Training Monitoring
Key Metrics
| Metric | Normal Range | Warning Sign |
|---|---|---|
| Loss | 0.01-0.1, gradually decreasing | Not decreasing or volatile |
| Training time | ~2-5 hours (15 images, 8GB) | Over 8 hours |
| VRAM usage | < 7GB (batch=1) | OOM error |
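The "not decreasing or volatile" warning sign can be approximated from a logged list of loss values. This is a rough heuristic sketch, not part of any trainer: the window size and both thresholds are arbitrary assumptions to tune for your own logs.

```python
from statistics import mean, pstdev


def loss_status(losses, window=20, min_drop=0.05, max_cv=0.5):
    """Classify a loss curve as 'decreasing', 'not decreasing', or 'volatile'.

    Compares the mean of the first `window` values with the mean of the last
    `window` values, and flags the curve as volatile when the recent values
    vary by more than `max_cv` times their mean.
    """
    if len(losses) < 2 * window:
        return "insufficient data"

    first_mean = mean(losses[:window])
    recent = losses[-window:]
    recent_mean = mean(recent)

    # Coefficient of variation of the recent window as a volatility measure.
    if recent_mean > 0 and pstdev(recent) / recent_mean > max_cv:
        return "volatile"

    # Require at least a `min_drop` relative improvement over the run.
    if recent_mean > first_mean * (1 - min_drop):
        return "not decreasing"
    return "decreasing"


if __name__ == "__main__":
    healthy = [0.1 - 0.0008 * i for i in range(100)]
    print(loss_status(healthy))
```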
Validation Every 5 Epochs
Test with these prompts:
# Test 1: Base consistency
character_name, portrait photo, white background, studio lighting
# Test 2: Different scene
character_name, walking in a park, sunny day
# Test 3: Different style
character_name, anime style, watercolor painting
# Test 4: Extreme test
character_name, in a sci-fi spaceship, dramatic lighting
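If you validate several checkpoints, generating the four test prompts from the trigger word keeps them identical between runs. A small sketch, with hypothetical function names, that reuses the exact prompt text above and the every-5-epochs schedule:

```python
# Scene text copied from the four validation prompts in this guide.
VALIDATION_SCENES = {
    "base consistency": "portrait photo, white background, studio lighting",
    "different scene": "walking in a park, sunny day",
    "different style": "anime style, watercolor painting",
    "extreme test": "in a sci-fi spaceship, dramatic lighting",
}


def validation_prompts(trigger):
    """Return {test name: full prompt} with the trigger word placed first."""
    return {name: f"{trigger}, {scene}" for name, scene in VALIDATION_SCENES.items()}


def is_validation_epoch(epoch, every=5):
    """True on epochs 5, 10, 15, ... (epochs counted from 1)."""
    return epoch > 0 and epoch % every == 0


if __name__ == "__main__":
    for epoch in range(1, 16):
        if is_validation_epoch(epoch):
            print(f"epoch {epoch}:")
            for name, prompt in validation_prompts("character_name").items():
                print(f"  [{name}] {prompt}")
```

Using a fixed seed for these prompts, if your tooling supports it, makes differences between checkpoints easier to attribute to the LoRA itself.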
Overfitting Detection
| Symptom | Cause | Fix |
|---|---|---|
| Training background appears | Background learned | Add background variety |
| Outfit stays fixed | Clothing learned | Train with different outfits |
| Face is blurry | Over-trained | Reduce epochs or alpha |
| No change at all | Under-trained | Increase epochs or alpha |
Inference Usage
ComfyUI LoRA Loading
Load Checkpoint → z_image_turbo_bf16.safetensors
↓
Load LoRA → character_lora.safetensors
↓
[Set LoRA strength]
↓
KSampler
LoRA Strength Tuning
| Strength | Effect | Use Case |
|---|---|---|
| 0.3-0.5 | Subtle features | Style reference, loose consistency |
| 0.6-0.8 | Medium features | Recommended daily use range |
| 0.9-1.0 | Strong features | When high consistency needed |
| 1.0-1.2 | Over-strong | May show overfitting artifacts |
Multi-LoRA Combinations
Load multiple LoRAs simultaneously:
LoRA 1 (Character face) — strength 0.8
LoRA 2 (Art style) — strength 0.6
LoRA 3 (Clothing style) — strength 0.5
Note: Total strength should not exceed ~2.0 to avoid artifacts.
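The total-strength rule of thumb is simple to check when assembling a stack. A minimal sketch follows; the function name is hypothetical, and the 2.0 limit is the approximate figure from the note above rather than a hard constraint of any tool.

```python
def check_lora_stack(stack, max_total=2.0):
    """Return (total strength, within_limit) for a list of (name, strength) pairs."""
    total = sum(strength for _, strength in stack)
    return total, total <= max_total


if __name__ == "__main__":
    stack = [
        ("Character face", 0.8),
        ("Art style", 0.6),
        ("Clothing style", 0.5),
    ]
    total, ok = check_lora_stack(stack)
    print(f"total strength {total:.1f}:", "ok" if ok else "consider lowering strengths")
```

The example stack sums to about 1.9, which stays under the limit.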
Advanced Techniques
Multi-Character Training
To maintain consistency across multiple characters:
- Approach A: Train a separate LoRA for each character
- Approach B: Train one multi-character LoRA (with a distinct trigger token per character)
Approach A is recommended: it is more flexible, and each character's strength can be adjusted independently.
Face Enhancement
After generating with the LoRA, add a face restore node at the end of the workflow:
KSampler → LoRA generation → Face Restore node → Output
Prompt Template
# Character consistency prompt template
[character trigger word], [age description], [outfit description],
[scene description], [action description],
[style modifiers], [quality words]
# Example
character_name, 25-year-old woman, red dress,
walking through a garden at sunset,
cinematic lighting, photorealistic, 8k, sharp focus
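The template can be filled programmatically so the trigger word always comes first and empty slots are skipped. This is an illustrative sketch; the function and parameter names are hypothetical and simply mirror the template's slots.

```python
def build_prompt(trigger, age="", outfit="", scene="", action="",
                 style=(), quality=()):
    """Join the template slots into one comma-separated prompt.

    The trigger word is always first; empty or whitespace-only slots are dropped.
    """
    parts = [trigger, age, outfit, scene, action, *style, *quality]
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(build_prompt(
        "character_name",
        age="25-year-old woman",
        outfit="red dress",
        action="walking through a garden at sunset",
        style=("cinematic lighting", "photorealistic"),
        quality=("8k", "sharp focus"),
    ))
```

Called as shown, it reproduces the example prompt above on a single line.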
FAQ
Q: How many images are enough?
- 5 images: Bare minimum, simple scenes only
- 10-15 images: Recommended starting point, basic consistency
- 20+ images: High quality, cross-scene stability
- 30+ images: Professional grade, multi-angle, multi-expression
Q: How long does training take?
| Images | VRAM | Optimizer | Estimated Time |
|---|---|---|---|
| 10 | 8GB | Prodigy | ~2 hours |
| 15 | 8GB | Prodigy | ~3 hours |
| 20 | 12GB | Prodigy | ~4 hours |
| 30 | 16GB | Prodigy | ~6 hours |
Q: Character still not consistent after training?
- Increase LoRA strength to 0.8-1.0
- Check training image quality (faces clear?)
- Increase training epochs
- Ensure trigger word is at the beginning of prompt
- Reduce other descriptive elements that might override character features
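The trigger-word placement tip can be automated for existing prompts. A small sketch, with a hypothetical helper name, that moves the trigger word to the front of a comma-separated prompt (or adds it if it is missing):

```python
def put_trigger_first(prompt, trigger):
    """Return the prompt with `trigger` as its first comma-separated term."""
    parts = [p.strip() for p in prompt.split(",") if p.strip()]
    rest = [p for p in parts if p != trigger]
    return ", ".join([trigger, *rest])


if __name__ == "__main__":
    print(put_trigger_first("red dress, character_name, garden at sunset",
                            "character_name"))
```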
Q: Are LoRAs trained on Z-Image Turbo compatible with Z-Image Base?
No. Turbo and Base are different models, so LoRA weights trained on one are not compatible with the other. Turbo LoRAs work only with Turbo.
Summary
Z-Image Turbo Character Consistency LoRA Training Workflow:
- Data prep: 15-20 clear face photos, multiple angles and scenes
- Model: Z-Image Turbo + Prodigy optimizer
- Config: dim=32, alpha=16, epochs=15
- Monitoring: Validate every 5 epochs, detect overfitting
- Inference: LoRA strength 0.6-0.8, with template prompts
Key advantages:
- Fast training (8GB VRAM, 2-3 hours)
- High quality (distilled model retains 95%+ quality)
- Good consistency (cross-scene, cross-style feature retention)
This guide is based on ComfyUI + Z-Image Turbo + Prodigy optimizer.