Z-Image Face Swap in Action: From ComfyUI Workflows to LoRA Character Training
Abstract: Z-Image is an open-source AI image generation model launched by Z.ai, renowned for its exceptional aesthetics and image quality. This article provides an in-depth guide to practical face swap solutions with Z-Image, covering two mainstream approaches — one-click face swap via ComfyUI workflows and character fine-tuning with LoRA — helping readers build their own AI face swap system from scratch.
- 📅 Publication Date: 2026-04-28
- 🏷️ Tags:
Z-Image · Face Swap · ComfyUI · LoRA · ReActor · AI Art
- 💻 Hardware Requirements: 16GB unified memory (Apple Silicon M3/M4/M5) or entry-level NVIDIA/AMD GPU
- 📦 Community Resources: Z-Image_FaceSwap_Gen_1.0 workflow template
Table of Contents
- Introduction: Overview of Z-Image's Face Swap Capabilities
- Comparing Two Face Swap Approaches: ComfyUI Workflows vs. LoRA Training
- Approach One: ComfyUI + Z-Image Turbo + ReActor Workflow Walkthrough
- Approach Two: LoRA Character Fine-Tuning (nphSi/Z-Image-Lora)
- Comparative Summary: Pros and Cons at a Glance
- Best Practices and Troubleshooting Guide
- Appendix: Common Prompt Templates and Resource Links
1. Introduction: Overview of Z-Image's Face Swap Capabilities
What is Z-Image?
Z-Image is a high-quality AI image generation model developed by the Z.ai team. Built on a Diffusion architecture, it excels particularly in character portraits and photorealistic human imagery. The model supports FP8 quantization, significantly lowering VRAM requirements and enabling smooth operation on consumer-grade hardware.
Why is Z-Image Suitable for Face Swapping?
Z-Image's face swap capability doesn't come from a single module — it stems from its outstanding facial detail generation. The faces it produces already have a high degree of realism and aesthetic quality. On this foundation, the community has developed two mature face swap approaches:
| Approach | Core Concept | Use Cases |
|---|---|---|
| ComfyUI Workflow | First generate a high-quality base image with Z-Image, then replace the target face using the ReActor plugin | Quick generation, temporary face swaps, batch production |
| LoRA Character Training | Train a dedicated LoRA on photos of a specific person, then the model directly outputs that character during generation | Character consistency, long-term projects, manga/novel illustrations |
💡 Key Difference: The ComfyUI workflow follows a "generate-then-replace" strategy, while LoRA training follows a "generate-as-character" strategy. The two can complement each other.
2. Comparing Two Face Swap Approaches: ComfyUI Workflows vs. LoRA Training
Before diving into the details, let's take a panoramic comparison of both approaches:
Approach One: ComfyUI Workflow (Z-Image Turbo + ReActor)
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│ Target Person    │────▶│ Z-Image          │────▶│ ReActor          │
│ Photo (Source)   │     │ Turbo FP8        │     │ Face Swap        │
│                  │     │ Generate Base Img│     │ Face Replacement │
└──────────────────┘     └────────┬─────────┘     └────────┬─────────┘
                                  │                        │
                                  ▼                        ▼
                         ┌─────────────────────────────────┐
                         │      Final Face Swap Result     │
                         └─────────────────────────────────┘
- How It Works: First, use the Z-Image Turbo model to generate a high-quality portrait base image based on your prompt. Then, leverage the ReActor plugin to transfer the target person's facial features onto the base image.
- Pros: Extremely quick to get started, no training required, swap anyone you want
- Cons: Face blending quality depends on the ReActor algorithm; certain angles may look unnatural
Approach Two: LoRA Character Fine-Tuning
┌────────────────────────┐
│ 10~20 Character Photos │
│   (Training Dataset)   │
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│     LoRA Training      │
│  (nphSi/Z-Image-Lora)  │
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│ Load LoRA +            │
│ Z-Image Generation     │
│ (Trigger Word + Prompt)│
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│ Direct Character Output│
│ (No Face Swap Needed)  │
└────────────────────────┘
- How It Works: Collect 10~20 photos of your target character and train a LoRA weight file. When generating with Z-Image, load the LoRA and use a trigger word — the model directly outputs images of that character.
- Pros: Strong character consistency, natural face blending, no post-processing needed
- Cons: Requires training data, longer training time (approximately 1~2 hours)
3. Approach One: ComfyUI + Z-Image Turbo + ReActor Workflow Walkthrough
3.1 Environment Setup
Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| GPU/Memory | 16GB unified memory (M3/M4/M5) / 8GB VRAM | 24GB VRAM / 32GB+ RAM |
| Storage | 50GB available space | SSD, 100GB+ available space |
| OS | macOS 14+ / Linux (Ubuntu 22.04+) / Windows 11 | Same as above |
Software Installation Steps
# 1. Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# 2. Create virtual environment (optional, recommended)
python -m venv venv
source venv/bin/activate # Linux/macOS
# venv/Scripts/activate # Windows
# 3. Install dependencies
pip install -r requirements.txt
# 4. Install ComfyUI Manager (node plugin manager)
cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
# 5. Install ReActor node — the simplest route is to search for "ReActor"
# in ComfyUI Manager for a one-click install; alternatively, clone the
# node repository directly:
git clone https://github.com/Gourieff/ComfyUI-ReActor.git
Downloading Models
# Create model directories
mkdir -p models/checkpoints
mkdir -p models/clip
mkdir -p models/vae
# Download Z-Image Turbo FP8 ALL-in-One model
# FP8 version is recommended to save VRAM
# Download sources: HuggingFace or Civitai community
# Example filename: Z-Image-Turbo-ALL-in-One-FP8.safetensors
3.2 Building the Workflow
The community has created a ready-to-use "Z-Image_FaceSwap_Gen_1.0" workflow template that you can download directly from Civitai. Below is a breakdown of the core nodes:
Core Node Structure
┌─ Load Checkpoint ─────────────────────────┐
│ Model: Z-Image-Turbo-ALL-in-One-FP8 │
│ VAE: Built-in (included in ALL-in-One) │
└──────────────┬─────────────────────────────┘
▼
┌─ CLIP Text Encode (Prompt) ─────────────┐
│ Positive: Photorealistic portrait prompt│
│ Negative: Negative prompt │
└──────────────┬─────────────────────────────┘
▼
┌─ Empty Latent Image ─────────────────────┐
│ Width: 832, Height: 1216 │
│ (Z-Image recommended resolution) │
└──────────────┬─────────────────────────────┘
▼
┌─ KSampler ───────────────────────────────┐
│ Steps: 20~30 │
│ CFG: 5.0~7.0 │
│ Sampler: euler_ancestral │
│ Scheduler: normal │
└──────────────┬─────────────────────────────┘
▼
┌─ VAEDecode ──────────────────────────────┐
│ Decode latent to image │
└──────────────┬─────────────────────────────┘
▼
┌─ ReActor ────────────────────────────────┐
│ Input: Generated image + Target face photo│
│ Face Model: antelopev2 │
│ Restore Face: GFPGAN/CodeFormer │
└──────────────┬─────────────────────────────┘
▼
┌─ Save Image ─────────────────────────────┐
│ Output final result │
└──────────────────────────────────────────┘
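If you drive ComfyUI programmatically, the node graph above maps onto the API-format JSON the server accepts. Below is a minimal sketch: the node ids, checkpoint filename, and prompt text are illustrative, and the ReActor node is omitted because its input names vary by plugin version — export the exact JSON from your own workflow via "Save (API Format)".

```python
# API-format representation of the core node graph above.
# Node ids ("1".."7") are placeholders; each input wires to
# [source_node_id, output_index] as in ComfyUI's API JSON.
import json

prompt = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "Z-Image-Turbo-ALL-in-One-FP8.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",   # positive prompt
          "inputs": {"text": "photorealistic portrait, 1girl", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",   # negative prompt
          "inputs": {"text": "worst quality, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 832, "height": 1216, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 25, "cfg": 6.0,
                     "sampler_name": "euler_ancestral", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "zimage_faceswap"}},
}

print(json.dumps(prompt)[:60], "...")
```

This is the same structure the batch-generation example later in the article POSTs to the server.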
3.3 Prompt Writing Guide
Positive Prompt Template
# General photorealistic portrait prompt
(masterpiece, best quality:1.2), photorealistic, 1girl,
looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field,
portrait shot, upper body,
<scene description>, <outfit description>, <expression description>
# Example: Female portrait on a city street
(masterpiece, best quality:1.2), photorealistic, 1girl,
looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field,
portrait shot, upper body,
standing on a city street at sunset, wearing a white dress,
gentle smile, warm golden hour lighting
Negative Prompt (Required)
(worst quality, low quality:1.4),
nsfw, nude, naked,
extra fingers, fewer fingers, extra limbs,
bad anatomy, bad hands, missing limbs,
blurry, jpeg artifacts, watermark, signature, text,
deformed face, asymmetrical eyes
3.4 ReActor Parameter Tuning
| Parameter | Recommended Value | Description |
|---|---|---|
| Face Model | antelopev2 | Face detection model with the highest recognition rate |
| Restore Face | CodeFormer | Face restoration for enhanced detail |
| Restore Visibility | 0.5~0.7 | Restoration strength; too high will lose facial features |
| Swap Face | True | Enable face swap |
| Source Face Index | 0 | Default to the first detected face |
| Mask Face | True | Use face mask to reduce edge artifacts |
⚠️ Common Issue: If you get a "mask-like" effect (the face doesn't blend with the body) after swapping, try lowering Restore Visibility to 0.3~0.5 and moderately increasing the KSampler's CFG value.
4. Approach Two: LoRA Character Fine-Tuning (nphSi/Z-Image-Lora)
4.1 Pre-Training Preparation
Data Collection
| Item | Requirements |
|---|---|
| Number of Photos | 10~20 (too few leads to unstable training, too many can cause overfitting) |
| Resolution | Recommended 512×512 or 768×768; will be auto-cropped during training |
| Diversity | Different angles, lighting, expressions, outfits, and backgrounds |
| Cropping | Centered on the face, including the upper body is best |
| Quality | Clear, unobstructed, no beauty filters |
Data Directory Structure
dataset/
├── character_name/
│ ├── img_01.jpg
│ ├── img_02.jpg
│ ├── ...
│ └── img_20.jpg
└── metadata.json
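Before training, it's worth sanity-checking the folder against the table above. A minimal stdlib sketch — the 10~20 thresholds mirror the table and are soft guidelines, not hard limits:

```python
# Sanity-check a character's image folder before LoRA training.
from pathlib import Path

def check_dataset(char_dir: str, min_imgs: int = 10, max_imgs: int = 20) -> list[str]:
    """Return a list of warnings; an empty list means the layout looks OK."""
    exts = {".jpg", ".jpeg", ".png", ".webp"}
    path = Path(char_dir)
    warnings = []
    if not path.is_dir():
        warnings.append(f"missing directory: {char_dir}")
        return warnings
    images = [p for p in path.iterdir() if p.suffix.lower() in exts]
    if len(images) < min_imgs:
        warnings.append(f"only {len(images)} images — training may be unstable")
    elif len(images) > max_imgs:
        warnings.append(f"{len(images)} images — risk of overfitting")
    return warnings
```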
Image Captioning
Use WD 1.4 Tagger for automatic tagging:
# Use WD 1.4 model for batch tagging
python tag.py --dir dataset/character_name --model wd-v1-4-convnext-fp16.onnx
Example of manually organized tags:
# Tags for img_01.jpg
simple tags: 1girl, solo, brown hair, looking at viewer, white shirt, upper body
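A common chore at this stage is prefixing every caption with the trigger word the LoRA will use. A minimal sketch, assuming one sidecar `.txt` caption per image (the convention Kohya-style trainers read) and `zchar` as an example trigger word:

```python
# Prepend a trigger word to every .txt caption in the dataset, so the
# LoRA learns to associate it with the character. Idempotent: captions
# already starting with the trigger word are left untouched.
from pathlib import Path

def add_trigger_word(dataset_dir: str, trigger: str = "zchar") -> int:
    """Return the number of caption files updated."""
    updated = 0
    for cap in Path(dataset_dir).glob("*.txt"):
        text = cap.read_text(encoding="utf-8").strip()
        if not text.startswith(trigger):
            cap.write_text(f"{trigger}, {text}", encoding="utf-8")
            updated += 1
    return updated
```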
4.2 Training Configuration
Based on the nphSi/Z-Image-Lora approach, train using Kohya SS or the command line.
Recommended Kohya SS GUI Training Parameters
┌─────────────────────────────────────────────────┐
│ Recommended LoRA Training Parameters │
├─────────────────┬───────────────────────────────┤
│ Basic Parameters│ │
│ Base Model │ Z-Image Turbo (FP16 original) │
│ Network Module │ LoRA │
│ Network Dim │ 32~64 (recommended for chars) │
│ Network Alpha │ 16~32 (roughly half of Dim) │
│ │ │
│ Training Params │ │
│ Epochs │ 15~30 │
│ Learning Rate │ 1e-4 (UNet) / 5e-5 (Text Encoder)│
│ Batch Size │ 1~4 (depends on VRAM) │
│ Resolution │ 512x512 / 768x768 │
│ │ │
│ Optimizer │ │
│ Optimizer │ AdamW8bit │
│ LR Scheduler │ cosine with warmup │
│ Warmup Steps │ 100 │
└─────────────────┴───────────────────────────────┘
Command Line Training Example
# Train using train_network.py (from the kohya-ss/sd-scripts toolkit)
accelerate launch train_network.py \
  --pretrained_model_name_or_path="z-ai/z-image-turbo" \
  --train_data_dir="./dataset/character_name" \
  --output_dir="./output/lora" \
  --output_name="character_lora" \
  --network_module="networks.lora" \
  --network_dim=32 \
  --network_alpha=16 \
  --train_batch_size=1 \
  --max_train_epochs=20 \
  --learning_rate=1e-4 \
  --text_encoder_lr=5e-5 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=100 \
  --resolution=512,512 \
  --cache_latents \
  --cache_text_encoder_outputs \
  --optimizer_type="AdamW8bit" \
  --mixed_precision="bf16" \
  --seed=42 \
  --save_every_n_epochs=5 \
  --save_model_as=safetensors
4.3 Post-Training Inference
Generating with LoRA Loaded in ComfyUI
┌─ Load Checkpoint ─────────────────────────┐
│ Model: Z-Image-Turbo-ALL-in-One-FP8 │
└──────────────┬─────────────────────────────┘
▼
┌─ Load LoRA ───────────────────────────────┐
│ LoRA: character_lora.safetensors │
│ Strength Model: 0.8~1.0 │
│ Strength Clip: 0.8~1.0 │
└──────────────┬─────────────────────────────┘
▼
┌─ CLIP Text Encode (Prompt) ───────────────┐
│ Positive: [trigger_word], + description prompt│
│ ⚠️ Must use the trigger word! │
└──────────────┬─────────────────────────────┘
▼
┌─ KSampler ──▶ VAEDecode ──▶ Save Image │
└────────────────────────────────────────────┘
Inference Prompt Template (Must Include Trigger Word)
# Assuming the trigger word is "zchar"
zchar, (masterpiece, best quality:1.2), photorealistic, 1girl,
looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field,
portrait shot, upper body,
sitting in a cafe, wearing a cozy sweater, warm smile
# ❌ Wrong: forgetting the trigger word → generates a generic character, not your target
# ✅ Correct: zchar + description → the model knows who to generate
5. Comparative Summary: Pros and Cons at a Glance
Comparison Table
| Dimension | ComfyUI Workflow (ReActor) | LoRA Character Training |
|---|---|---|
| Difficulty | ⭐ Very Low (drag-and-drop nodes) | ⭐⭐⭐ Medium (requires data prep & training) |
| Time Cost | Instant generation (1~3 minutes per image) | Training 1~2 hours + 1~3 minutes per generation |
| Hardware Requirements | 16GB RAM / 8GB VRAM | Training needs 8~16GB VRAM; inference same as the ComfyUI approach |
| Character Consistency | Medium (depends on ReActor algorithm) | High (model has learned character features) |
| Face Naturalness | Good (occasional mask-like effect) | Excellent (native blending) |
| Person Swap Flexibility | Very High (just change the photo) | Low (changing person requires retraining) |
| Use Cases | Quick testing, multiple face swaps, A/B testing | Fixed characters, serialized content, brand IPs |
| Prompt Control | Fully free | Requires trigger word |
| Community Resources | Z-Image_FaceSwap_Gen_1.0 template | nphSi/Z-Image-Lora tutorial |
Selection Guide
What do you need?
│
├─ "Quick generation, try a few different faces"
│ └─ ✅ Choose ComfyUI Workflow (ReActor)
│
├─ "Fixed character, create a series"
│ └─ ✅ Choose LoRA Training
│
├─ "Want both quick generation and character consistency"
│ └─ ✅ Use both together: LoRA for base image + ReActor for fine-tuning
│
└─ "Limited budget, tight VRAM"
└─ ✅ Start with ComfyUI Workflow + FP8 model, verify results, then consider LoRA
6. Best Practices and Troubleshooting Guide
6.1 General Tips
✅ Prefer FP8 Models to Reduce VRAM
Model Version Comparison:
┌────────────┬───────────┬──────────────────────┐
│ Format │ File Size │ Recommended VRAM │
├────────────┼───────────┼──────────────────────┤
│ FP32 │ ~10 GB │ ≥24GB VRAM │
│ FP16 │ ~5 GB │ ≥16GB VRAM / Unified Mem│
│ FP8 │ ~2.5 GB │ ≥8GB VRAM / 16GB URM │
└────────────┴───────────┴──────────────────────┘
💡 Recommendation: Start with FP8, upgrade to FP16 for maximum quality
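The size ratios in the table follow directly from bytes per weight. A quick back-of-envelope check — note that the ~2.5B parameter count below is inferred from the ~10 GB FP32 figure above and is an assumption, not an official spec:

```python
# Model file size ≈ parameter count × bytes per weight.
# The 2.5e9 parameter count is an assumption inferred from the
# article's ~10 GB FP32 figure, not an official number.
def model_size_gb(n_params: float, bytes_per_weight: float) -> float:
    """File size in GiB for a given parameter count and weight precision."""
    return n_params * bytes_per_weight / 1024**3

n = 2.5e9
for fmt, nbytes in [("FP32", 4), ("FP16", 2), ("FP8", 1)]:
    print(f"{fmt}: ~{model_size_gb(n, nbytes):.1f} GB")
```

Each halving of precision halves both the file on disk and the VRAM needed to hold the weights, which is why FP8 fits on 8GB cards.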
✅ Always Use a Trigger Word for LoRA Training
# Trigger Word Rules:
# 1. Short and unique: recommend 3~8 letters, e.g., "zchar", "mychar"
# 2. Avoid everyday words: prevents confusion with common descriptors
# 3. Place at the very front of the prompt: ensures the model recognizes it first
# 4. Training captions must include the trigger word
# Example: Training dataset caption format
# img_01.jpg.caption: "zchar, 1girl, brown hair, white shirt, portrait"
# img_02.jpg.caption: "zchar, 1girl, brown hair, black dress, smiling"
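Rules 2 and 3 above can be enforced mechanically with a small helper that builds prompts always leading with the trigger word (`zchar` is just an example token):

```python
# Build an inference prompt that always places the trigger word first.
def build_prompt(trigger: str, *tags: str) -> str:
    """Join the trigger word and descriptive tags, trigger first."""
    if not trigger or " " in trigger:
        raise ValueError("trigger word should be a single short token")
    return ", ".join([trigger, *tags])
```

For example, `build_prompt("zchar", "1girl", "portrait")` yields `"zchar, 1girl, portrait"`, matching the caption format shown above.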
✅ Resolution Selection
Z-Image Recommended Resolutions:
- Portrait (vertical): 832 × 1216 (default, best ratio)
- Landscape (horizontal): 1216 × 832
- Square: 1024 × 1024
⚠️ Avoid non-standard resolutions, which may cause image distortion
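If your pipeline accepts arbitrary sizes, you can snap each request to the nearest recommended resolution by aspect ratio — a minimal sketch using the three sizes listed above:

```python
# Snap an arbitrary requested size to the closest recommended
# Z-Image resolution, matched by aspect ratio.
RECOMMENDED = [(832, 1216), (1216, 832), (1024, 1024)]

def snap_resolution(width: int, height: int) -> tuple[int, int]:
    """Pick the recommended (w, h) whose aspect ratio is closest."""
    target = width / height
    return min(RECOMMENDED, key=lambda wh: abs(wh[0] / wh[1] - target))
```

So a 1920×1080 request maps to 1216×832, and 800×1200 maps to 832×1216, avoiding the distortion warned about above.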
6.2 Common Issues Troubleshooting
| Issue | Possible Cause | Solution |
|---|---|---|
| Blurry face after swap | ReActor Restore value too high | Lower Restore Visibility to 0.3~0.5 |
| Face disconnected from body after swap | Mask range inappropriate | Enable Mask Face, adjust Mask Softness |
| LoRA output doesn't resemble target | Under-training or overfitting | Adjust Epochs (15~30), check data quality |
| Out of memory (OOM) | Model too large or resolution too high | Switch to FP8, lower resolution to 512 |
| Trigger word not working | Typo or wrong placement | Check spelling/case, ensure it's at the very front |
| ReActor can't detect face | Poor input photo quality | Use clear frontal photos, switch Face Model |
6.3 Apple Silicon Optimization
# Recommendations for macOS Apple Silicon users:
# 1. Ensure ComfyUI uses the MPS backend
# 2. Add the following to launch parameters:
python main.py --force-fp16
# 3. If experiencing heavy memory swapping, reduce batch_size
# 4. M3/M4/M5 with 16GB unified memory can smoothly run FP8 versions
6.4 Batch Generation Tips
# Batch generation example using the ComfyUI HTTP API
import json
import requests

workflow_path = "Z-Image_FaceSwap_Gen_1.0.json"
with open(workflow_path, "r") as f:
    workflow = json.load(f)

# Queue the workflow several times. To get different images per run,
# vary the KSampler's seed field in the workflow dict first — its node
# id depends on your exported JSON, so inspect the file to find it.
for i in range(4):
    response = requests.post(
        "http://127.0.0.1:8188/prompt",
        json={"prompt": workflow, "client_id": "batch-gen"},
    )
    print(f"Task {i + 1} submitted, ID: {response.json()['prompt_id']}")
7. Appendix: Common Prompt Templates and Resource Links
7.1 Prompt Template Collection
Photorealistic Portrait (General)
(masterpiece, best quality:1.2), photorealistic, ultra detailed,
1girl, looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field, bokeh,
portrait shot, upper body,
studio lighting, professional photography,
8k resolution, sharp focus
Natural Scene
(masterpiece, best quality:1.2), photorealistic,
1girl, looking away, gentle breeze, windblown hair,
natural outdoor lighting, golden hour,
medium shot, upper body,
standing in a flower field, soft warm colors,
dreamy atmosphere, shallow depth of field
LoRA Character + Scene Customization
zchar, (masterpiece, best quality:1.2), photorealistic,
1girl, looking at viewer, slight smile,
cinematic lighting, film grain,
portrait shot, upper body,
sitting at a desk in a cozy study room, warm desk lamp,
bookshelves in background, soft focus
7.2 Resource Links Summary
| Resource | Link |
|---|---|
| Z-Image Official GitHub | Z.ai GitHub |
| Z-Image Turbo Model (HuggingFace) | [HuggingFace Page] |
| Z-Image_FaceSwap_Gen_1.0 (Civitai) | [Civitai Page] |
| ReActor Node | GitHub - ReActor |
| nphSi/Z-Image-Lora | [HuggingFace - LoRA Tutorial] |
| ComfyUI Manager | GitHub - ComfyUI-Manager |
Final Thoughts
Z-Image's face swap capabilities are rapidly evolving. ComfyUI workflows are ideal for quick prototyping and flexible face swapping, while LoRA training suits deep customization and character consistency needs. The two are not mutually exclusive — many advanced users first validate their prompts and compositions using ComfyUI workflows, then train LoRAs for fixed characters to improve quality.
Regardless of which approach you choose, FP8 quantization and trigger words are two essential techniques to master. Happy generating!
📝 If you found this article helpful, feel free to share it with fellow creators. Questions? Drop a comment below!
Last updated: 2026-04-28 | Author: Z-Image Practical Guide