Z-Image Face Swap in Action: From ComfyUI Workflows to LoRA Character Training

April 28, 2026


Abstract: Z-Image is an open-source AI image generation model launched by Z.ai, renowned for its exceptional aesthetics and image quality. This article provides an in-depth guide to practical face swap solutions with Z-Image, covering two mainstream approaches — one-click face swap via ComfyUI workflows and character fine-tuning with LoRA — helping readers build their own AI face swap system from scratch.

  • 📅 Publication Date: 2026-04-28
  • 🏷️ Tags: Z-Image, Face Swap, ComfyUI, LoRA, ReActor, AI Art
  • 💻 Hardware Requirements: 16GB unified memory (Apple Silicon M3/M4/M5) or entry-level NVIDIA/AMD GPU
  • 📦 Community Resources: Z-Image_FaceSwap_Gen_1.0 workflow template

Table of Contents

  1. Introduction: Overview of Z-Image's Face Swap Capabilities
  2. Comparing Two Face Swap Approaches: ComfyUI Workflows vs. LoRA Training
  3. Approach One: ComfyUI + Z-Image Turbo + ReActor Workflow Walkthrough
  4. Approach Two: LoRA Character Fine-Tuning (nphSi/Z-Image-Lora)
  5. Comparative Summary: Pros and Cons at a Glance
  6. Best Practices and Troubleshooting Guide
  7. Appendix: Common Prompt Templates and Resource Links

1. Introduction: Overview of Z-Image's Face Swap Capabilities

What is Z-Image?

Z-Image is a high-quality AI image generation model developed by the Z.ai team. Built on a Diffusion architecture, it excels particularly in character portraits and photorealistic human imagery. The model supports FP8 quantization, significantly lowering VRAM requirements and enabling smooth operation on consumer-grade hardware.

Why is Z-Image Suitable for Face Swapping?

Z-Image's face swap capability doesn't come from a single module — it stems from its outstanding facial detail generation. The faces it produces already have a high degree of realism and aesthetic quality. On this foundation, the community has developed two mature face swap approaches:

  • ComfyUI Workflow: first generate a high-quality base image with Z-Image, then replace the target face using the ReActor plugin. Use cases: quick generation, temporary face swaps, batch production.
  • LoRA Character Training: train a dedicated LoRA on photos of a specific person; the model then directly outputs that character during generation. Use cases: character consistency, long-term projects, manga/novel illustrations.

💡 Key Difference: The ComfyUI workflow follows a "generate-then-replace" strategy, while LoRA training follows a "generate-as-character" strategy. The two can complement each other.


2. Comparing Two Face Swap Approaches: ComfyUI Workflows vs. LoRA Training

Before diving into the details, let's compare the two approaches side by side:

Approach One: ComfyUI Workflow (Z-Image Turbo + ReActor)

┌────────────────────┐     ┌────────────────────┐     ┌────────────────────┐
│Target Person Photo │────▶│ Z-Image Turbo FP8  │────▶│      ReActor       │
│     (Source)       │     │ Generate Base Image│     │ Face Replacement   │
└────────────────────┘     └─────────┬──────────┘     └─────────┬──────────┘
                                     │                          │
                                     ▼                          ▼
                           ┌───────────────────────────────────────────────┐
                           │            Final Face Swap Result             │
                           └───────────────────────────────────────────────┘
  • How It Works: First, use the Z-Image Turbo model to generate a high-quality portrait base image based on your prompt. Then, leverage the ReActor plugin to transfer the target person's facial features onto the base image.
  • Pros: Extremely quick to get started, no training required, swap anyone you want
  • Cons: Face blending quality depends on the ReActor algorithm; certain angles may look unnatural

Approach Two: LoRA Character Fine-Tuning

┌────────────────────────┐
│ 10~20 Character Photos │
│   (Training Dataset)   │
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│      LoRA Training     │
│  (nphSi/Z-Image-Lora)  │
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│ Load LoRA + Z-Image    │
│ Generation             │
│ (Trigger Word + Prompt)│
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│ Direct Character Output│
│  (No Face Swap Needed) │
└────────────────────────┘
  • How It Works: Collect 10~20 photos of your target character and train a LoRA weight file. When generating with Z-Image, load the LoRA and use a trigger word — the model directly outputs images of that character.
  • Pros: Strong character consistency, natural face blending, no post-processing needed
  • Cons: Requires training data, longer training time (approximately 1~2 hours)

3. Approach One: ComfyUI + Z-Image Turbo + ReActor Workflow Walkthrough

3.1 Environment Setup

Hardware Requirements

  • GPU/Memory: minimum 16GB unified memory (M3/M4/M5) or 8GB VRAM; recommended 24GB VRAM or 32GB+ RAM
  • Storage: minimum 50GB available space; recommended SSD with 100GB+ available space
  • OS: macOS 14+ / Linux (Ubuntu 22.04+) / Windows 11 (minimum and recommended alike)

Software Installation Steps

# 1. Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# 2. Create virtual environment (optional, recommended)
python -m venv venv
source venv/bin/activate  # Linux/macOS
# venv/Scripts/activate   # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Install ComfyUI Manager (node plugin manager)
cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git

# 5. Install ReActor node (still inside custom_nodes)
git clone https://github.com/Gourieff/comfyui-reactor-node.git
# Or search for "ReActor" in ComfyUI Manager for one-click install

Downloading Models

# Create model directories
mkdir -p models/checkpoints
mkdir -p models/clip
mkdir -p models/vae

# Download Z-Image Turbo FP8 ALL-in-One model
# FP8 version is recommended to save VRAM
# Download sources: HuggingFace or Civitai community
# Example filename: Z-Image-Turbo-ALL-in-One-FP8.safetensors

3.2 Building the Workflow

The community has created a ready-to-use "Z-Image_FaceSwap_Gen_1.0" workflow template that you can download directly from Civitai. Below is a breakdown of the core nodes:

Core Node Structure

┌─ Load Checkpoint ─────────────────────────┐
│  Model: Z-Image-Turbo-ALL-in-One-FP8      │
│  VAE: Built-in (included in ALL-in-One)   │
└──────────────┬─────────────────────────────┘
               ▼
┌─ CLIP Text Encode (Prompt) ─────────────┐
│  Positive: Photorealistic portrait prompt│
│  Negative: Negative prompt              │
└──────────────┬─────────────────────────────┘
               ▼
┌─ Empty Latent Image ─────────────────────┐
│  Width: 832, Height: 1216                │
│  (Z-Image recommended resolution)        │
└──────────────┬─────────────────────────────┘
               ▼
┌─ KSampler ───────────────────────────────┐
│  Steps: 20~30                            │
│  CFG: 5.0~7.0                            │
│  Sampler: euler_ancestral                │
│  Scheduler: normal                       │
└──────────────┬─────────────────────────────┘
               ▼
┌─ VAEDecode ──────────────────────────────┐
│  Decode latent to image                  │
└──────────────┬─────────────────────────────┘
               ▼
┌─ ReActor ────────────────────────────────┐
│  Input: Generated image + Target face photo│
│  Face Model: antelopev2                  │
│  Restore Face: GFPGAN/CodeFormer         │
└──────────────┬─────────────────────────────┘
               ▼
┌─ Save Image ─────────────────────────────┐
│  Output final result                     │
└──────────────────────────────────────────┘
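For readers who want to script this later, the node chain above maps onto ComfyUI's API (JSON) format, the same structure the HTTP endpoint in section 6.4 consumes. The fragment below is a hand-written sketch: node IDs are arbitrary, the checkpoint filename is the example from 3.1, and the ReActor stage is omitted because its class and input names depend on the installed version. Export your own workflow via "Save (API Format)" for the authoritative structure.

```python
# Hypothetical API-format fragment mirroring the core node chain above.
# Node IDs are arbitrary strings; ["1", 0] means "output slot 0 of node 1".
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "Z-Image-Turbo-ALL-in-One-FP8.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "photorealistic portrait, detailed face", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"text": "worst quality, blurry, watermark", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 832, "height": 1216, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 25, "cfg": 6.0,
                     "sampler_name": "euler_ancestral", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
}
```

The ReActor step would be one additional node taking node 6's decoded image plus a loaded source photo as inputs.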

3.3 Prompt Writing Guide

Positive Prompt Template

# General photorealistic portrait prompt
(masterpiece, best quality:1.2), photorealistic, 1girl,
looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field,
portrait shot, upper body,
<scene description>, <outfit description>, <expression description>

# Example: Female portrait on a city street
(masterpiece, best quality:1.2), photorealistic, 1girl,
looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field,
portrait shot, upper body,
standing on a city street at sunset, wearing a white dress,
gentle smile, warm golden hour lighting

Negative Prompt (Required)

(worst quality, low quality:1.4),
nsfw, nude, naked,
extra fingers, fewer fingers, extra limbs,
bad anatomy, bad hands, missing limbs,
blurry, jpeg artifacts, watermark, signature, text,
deformed face, asymmetrical eyes
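Since the positive template is just a shared base plus three slots, it can be filled programmatically when producing prompt variations in bulk. The helper below is an illustrative sketch (the function and constant names are mine, not part of any Z-Image tooling):

```python
# Shared base from the positive template above.
BASE = ("(masterpiece, best quality:1.2), photorealistic, 1girl, "
        "looking at viewer, detailed face, beautiful eyes, "
        "soft natural lighting, depth of field, portrait shot, upper body")

def build_prompt(scene: str, outfit: str, expression: str) -> str:
    """Fill the <scene>/<outfit>/<expression> slots of the template."""
    return ", ".join([BASE, scene, outfit, expression])

# Reproduces the "city street" example from the text.
prompt = build_prompt("standing on a city street at sunset",
                      "wearing a white dress",
                      "gentle smile")
```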

3.4 ReActor Parameter Tuning

  • Face Model: antelopev2 (face detection model with the highest recognition rate)
  • Restore Face: CodeFormer (face restoration for enhanced detail)
  • Restore Visibility: 0.5~0.7 (restoration strength; too high loses the source facial features)
  • Swap Face: True (enables the face swap)
  • Source Face Index: 0 (defaults to the first detected face)
  • Mask Face: True (uses a face mask to reduce edge artifacts)

⚠️ Common Issue: If you get a "mask-like" effect (the face doesn't blend with the body) after swapping, try lowering Restore Visibility to 0.3~0.5 and slightly increasing the KSampler's CFG value.


4. Approach Two: LoRA Character Fine-Tuning (nphSi/Z-Image-Lora)

4.1 Pre-Training Preparation

Data Collection

  • Number of Photos: 10~20 (too few leads to unstable training; too many can cause overfitting)
  • Resolution: 512×512 or 768×768 recommended; images are auto-cropped during training
  • Diversity: different angles, lighting, expressions, outfits, and backgrounds
  • Cropping: centered on the face; including the upper body is best
  • Quality: clear, unobstructed, no beauty filters
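The count and resolution guidelines above are easy to check automatically before a training run. This is a throwaway sanity checker of my own; the thresholds simply mirror the table:

```python
def check_dataset(images):
    """images: list of (filename, width, height) tuples. Returns warnings."""
    warnings = []
    if not 10 <= len(images) <= 20:
        warnings.append(f"expected 10~20 photos, got {len(images)}")
    for name, w, h in images:
        if min(w, h) < 512:  # below the smallest recommended training size
            warnings.append(f"{name}: below 512px on the short side")
    return warnings

# A 12-image set at 768x768 passes cleanly.
assert check_dataset([("img_01.jpg", 768, 768)] * 12) == []
```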

Data Directory Structure

dataset/
├── character_name/
│   ├── img_01.jpg
│   ├── img_02.jpg
│   ├── ...
│   └── img_20.jpg
└── metadata.json

Image Captioning

Use WD 1.4 Tagger for automatic tagging:

# Use WD 1.4 model for batch tagging
python tag.py --dir dataset/character_name --model wd-v1-4-convnext-fp16.onnx

Example of manually organized tags:

# Tags for img_01.jpg
simple tags: 1girl, solo, brown hair, looking at viewer, white shirt, upper body
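When you later train with a trigger word (section 6.1 covers the rules), every caption needs that word prepended. A minimal helper, assuming "zchar" as a placeholder trigger word and ignoring trainer-specific caption file naming:

```python
def make_caption(trigger: str, tags: list[str]) -> str:
    """Prepend the trigger word and drop duplicate tags, keeping order."""
    seen, unique = set(), []
    for tag in tags:
        if tag not in seen:
            seen.add(tag)
            unique.append(tag)
    return ", ".join([trigger] + unique)

caption = make_caption("zchar", ["1girl", "solo", "brown hair", "solo"])
# caption == "zchar, 1girl, solo, brown hair"
```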

4.2 Training Configuration

Based on the nphSi/Z-Image-Lora approach, train using Kohya SS or the command line.

┌─────────────────────────────────────────────────┐
│         Recommended LoRA Training Parameters    │
├─────────────────┬───────────────────────────────┤
│ Basic Parameters│                               │
│ Base Model      │ Z-Image Turbo (FP16 original) │
│ Network Module  │ LoRA                          │
│ Network Dim     │ 32~64 (recommended for chars) │
│ Network Alpha   │ 16~32 (roughly half of Dim)   │
│                 │                               │
│ Training Params │                               │
│ Epochs          │ 15~30                         │
│ Learning Rate   │ 1e-4 (UNet) / 5e-5 (Text Encoder)│
│ Batch Size      │ 1~4 (depends on VRAM)         │
│ Resolution      │ 512x512 / 768x768             │
│                 │                               │
│ Optimizer       │                               │
│ Optimizer       │ AdamW8bit                     │
│ LR Scheduler    │ cosine with warmup            │
│ Warmup Steps    │ 100                           │
└─────────────────┴───────────────────────────────┘

Command Line Training Example

# Train using train_network.py (Kohya sd-scripts style; continuations use "\")
accelerate launch train_network.py \
  --pretrained_model_name_or_path="z-ai/z-image-turbo" \
  --train_data_dir="./dataset/character_name" \
  --output_dir="./output/lora" \
  --output_name="character_lora" \
  --network_module="networks.lora" \
  --network_dim=32 \
  --network_alpha=16 \
  --train_batch_size=1 \
  --max_train_epochs=20 \
  --learning_rate=1e-4 \
  --text_encoder_lr=5e-5 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=100 \
  --resolution=512,512 \
  --cache_latents \
  --cache_text_encoder_outputs \
  --optimizer_type="AdamW8bit" \
  --mixed_precision="bf16" \
  --seed=42 \
  --save_every_n_epochs=5 \
  --save_model_as=safetensors
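It helps to know how long this run actually is: with 20 images, batch size 1, and 20 epochs, that's 400 optimizer steps, so lr_warmup_steps=100 covers the first quarter of training. A quick calculator (the repeats multiplier, common in Kohya-style trainers, is assumed to default to 1):

```python
import math

def total_steps(num_images: int, batch_size: int, epochs: int,
                repeats: int = 1) -> int:
    """Optimizer steps for a run: ceil(images * repeats / batch) per epoch."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs

assert total_steps(20, 1, 20) == 400   # the command above, with 20 photos
```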

4.3 Post-Training Inference

Generating with LoRA Loaded in ComfyUI

┌─ Load Checkpoint ─────────────────────────┐
│  Model: Z-Image-Turbo-ALL-in-One-FP8      │
└──────────────┬─────────────────────────────┘
               ▼
┌─ Load LoRA ───────────────────────────────┐
│  LoRA: character_lora.safetensors          │
│  Strength Model: 0.8~1.0                  │
│  Strength Clip: 0.8~1.0                   │
└──────────────┬─────────────────────────────┘
               ▼
┌─ CLIP Text Encode (Prompt) ───────────────┐
│  Positive: [trigger_word], + description prompt│
│  ⚠️ Must use the trigger word!             │
└──────────────┬─────────────────────────────┘
               ▼
┌─ KSampler ──▶ VAEDecode ──▶ Save Image ───┐
└───────────────────────────────────────────┘

Inference Prompt Template (Must Include Trigger Word)

# Assuming the trigger word is "zchar"
zchar, (masterpiece, best quality:1.2), photorealistic, 1girl,
looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field,
portrait shot, upper body,
sitting in a cafe, wearing a cozy sweater, warm smile

# ❌ Wrong: forgetting the trigger word → generates a generic character, not your target
# ✅ Correct: zchar + description → the model knows which character to generate
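The "forgot the trigger word" mistake is cheap to catch in code before a job is submitted. An illustrative guard (the function is mine, not part of ComfyUI):

```python
def has_leading_trigger(prompt: str, trigger: str) -> bool:
    """True if the trigger word is the very first comma-separated token."""
    first_token = prompt.split(",")[0].strip()
    return first_token == trigger

good = "zchar, (masterpiece, best quality:1.2), photorealistic, 1girl"
bad = "(masterpiece, best quality:1.2), photorealistic, 1girl"
assert has_leading_trigger(good, "zchar")
assert not has_leading_trigger(bad, "zchar")
```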

5. Comparative Summary: Pros and Cons at a Glance

Comparison Table

  • Difficulty: ComfyUI workflow ⭐ very low (drag-and-drop nodes); LoRA training ⭐⭐⭐ medium (requires data prep and training)
  • Time Cost: ComfyUI instant generation (1~3 minutes per image); LoRA 1~2 hours of training plus 1~3 minutes per generation
  • Hardware Requirements: ComfyUI 16GB RAM / 8GB VRAM; LoRA training needs 8~16GB VRAM, inference matches the ComfyUI workflow
  • Character Consistency: ComfyUI medium (depends on the ReActor algorithm); LoRA high (the model has learned the character's features)
  • Face Naturalness: ComfyUI good (occasional mask-like effect); LoRA excellent (native blending)
  • Person Swap Flexibility: ComfyUI very high (just change the photo); LoRA low (changing person requires retraining)
  • Use Cases: ComfyUI for quick testing, multiple face swaps, A/B testing; LoRA for fixed characters, serialized content, brand IPs
  • Prompt Control: ComfyUI fully free; LoRA requires a trigger word
  • Community Resources: ComfyUI Z-Image_FaceSwap_Gen_1.0 template; LoRA nphSi/Z-Image-Lora tutorial

Selection Guide

What do you need?
│
├─ "Quick generation, try a few different faces"
│   └─ ✅ Choose ComfyUI Workflow (ReActor)
│
├─ "Fixed character, create a series"
│   └─ ✅ Choose LoRA Training
│
├─ "Want both quick generation and character consistency"
│   └─ ✅ Use both together: LoRA for base image + ReActor for fine-tuning
│
└─ "Limited budget, tight VRAM"
    └─ ✅ Start with ComfyUI Workflow + FP8 model, verify results, then consider LoRA

6. Best Practices and Troubleshooting Guide

6.1 General Tips

✅ Always Use FP8 Models to Reduce VRAM

Model Version Comparison:
┌────────────┬───────────┬──────────────────────┐
│ Format     │ File Size │ Recommended VRAM     │
├────────────┼───────────┼──────────────────────┤
│ FP32       │ ~10 GB    │ ≥24GB VRAM           │
│ FP16       │ ~5 GB     │ ≥16GB VRAM / Unified Mem│
│ FP8        │ ~2.5 GB   │ ≥8GB VRAM / 16GB URM │
└────────────┴───────────┴──────────────────────┘
💡 Recommendation: Start with FP8, upgrade to FP16 for maximum quality
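The table follows a simple rule of thumb: weight memory is roughly parameter count times bytes per element (1 for FP8, 2 for FP16, 4 for FP32), before activations and runtime overhead. The 2.5B parameter figure below is an assumption chosen only to make the arithmetic line up with the file sizes above:

```python
BYTES_PER_PARAM = {"fp8": 1, "fp16": 2, "fp32": 4}

def weight_gb(params_billion: float, fmt: str) -> float:
    """Approximate on-disk/in-memory weight size: 1e9 params ~= 1 GB per byte."""
    return round(params_billion * BYTES_PER_PARAM[fmt], 1)

assert weight_gb(2.5, "fp8") == 2.5    # matches the ~2.5 GB row
assert weight_gb(2.5, "fp16") == 5.0   # matches the ~5 GB row
assert weight_gb(2.5, "fp32") == 10.0  # matches the ~10 GB row
```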

✅ Always Use a Trigger Word for LoRA Training

# Trigger Word Rules:
# 1. Short and unique: recommend 3~8 letters, e.g., "zchar", "mychar"
# 2. Avoid everyday words: prevents confusion with common descriptors
# 3. Place at the very front of the prompt: ensures the model recognizes it first
# 4. Training captions must include the trigger word

# Example: Training dataset caption format
# img_01.jpg.caption: "zchar, 1girl, brown hair, white shirt, portrait"
# img_02.jpg.caption: "zchar, 1girl, brown hair, black dress, smiling"

✅ Resolution Selection

Z-Image Recommended Resolutions:
- Portrait (vertical): 832 × 1216 (default, best ratio)
- Landscape (horizontal): 1216 × 832
- Square: 1024 × 1024

⚠️ Avoid non-standard resolutions, which may cause image distortion

6.2 Common Issues Troubleshooting

  • Blurry face after swap: ReActor restore value too high. Fix: lower Restore Visibility to 0.3~0.5.
  • Face disconnected from body after swap: mask range inappropriate. Fix: enable Mask Face and adjust Mask Softness.
  • LoRA output doesn't resemble target: under-training or overfitting. Fix: adjust Epochs (15~30) and check data quality.
  • Out of memory (OOM): model too large or resolution too high. Fix: switch to FP8 and lower resolution to 512.
  • Trigger word not working: typo or wrong placement. Fix: check spelling/case and ensure it's at the very front of the prompt.
  • ReActor can't detect a face: poor input photo quality. Fix: use clear frontal photos or switch the Face Model.

6.3 Apple Silicon Optimization

# Recommendations for macOS Apple Silicon users:
# 1. Ensure ComfyUI uses the MPS backend
# 2. Add the following to launch parameters:
python main.py --force-fp16

# 3. If experiencing heavy memory swapping, reduce batch_size
# 4. M3/M4/M5 with 16GB unified memory can smoothly run FP8 versions

6.4 Batch Generation Tips

# Batch generation example using ComfyUI API
import requests
import json

workflow_path = "Z-Image_FaceSwap_Gen_1.0.json"
with open(workflow_path, "r") as f:
    workflow = json.load(f)

# Submit workflow
response = requests.post(
    "http://127.0.0.1:8188/prompt",
    json={"prompt": workflow, "client_id": "batch-gen"}
)
print(f"Generation task submitted, ID: {response.json()['prompt_id']}")
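To turn this into a true batch, vary the seed per submission by patching every KSampler node in the API-format workflow before posting it. A sketch under the assumption that the workflow JSON is in API format (where each node carries a class_type):

```python
import copy

def with_seed(workflow: dict, seed: int) -> dict:
    """Return a copy of an API-format workflow with all KSampler seeds set."""
    wf = copy.deepcopy(workflow)  # never mutate the loaded template
    for node in wf.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["seed"] = seed
    return wf

# Tiny stand-in workflow; in practice, use the dict loaded from the JSON file.
base = {"5": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 25}}}
batch = [with_seed(base, s) for s in (101, 102, 103)]
```

Each entry in `batch` can then be posted to the `/prompt` endpoint exactly as in the snippet above.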

7. Appendix: Common Prompt Templates and Resource Links

7.1 Prompt Template Collection

Photorealistic Portrait (General)

(masterpiece, best quality:1.2), photorealistic, ultra detailed,
1girl, looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field, bokeh,
portrait shot, upper body,
studio lighting, professional photography,
8k resolution, sharp focus

Natural Scene

(masterpiece, best quality:1.2), photorealistic,
1girl, looking away, gentle breeze, windblown hair,
natural outdoor lighting, golden hour,
medium shot, upper body,
standing in a flower field, soft warm colors,
dreamy atmosphere, shallow depth of field

LoRA Character + Scene Customization

zchar, (masterpiece, best quality:1.2), photorealistic,
1girl, looking at viewer, slight smile,
cinematic lighting, film grain,
portrait shot, upper body,
sitting at a desk in a cozy study room, warm desk lamp,
bookshelves in background, soft focus

7.2 Resource Links

  • Z-Image Official GitHub: Z.ai GitHub
  • Z-Image Turbo Model (HuggingFace): [HuggingFace Page]
  • Z-Image_FaceSwap_Gen_1.0 (Civitai): [Civitai Page]
  • ReActor Node: GitHub - ReActor
  • nphSi/Z-Image-Lora: [HuggingFace - LoRA Tutorial]
  • ComfyUI Manager: GitHub - ComfyUI-Manager

Final Thoughts

Z-Image's face swap capabilities are rapidly evolving. ComfyUI workflows are ideal for quick prototyping and flexible face swapping, while LoRA training suits deep customization and character consistency needs. The two are not mutually exclusive — many advanced users first validate their prompts and compositions using ComfyUI workflows, then train LoRAs for fixed characters to improve quality.

Regardless of which approach you choose, FP8 quantization and trigger words are two essential techniques to master. Happy generating!


📝 If you found this article helpful, feel free to share it with fellow creators. Questions? Drop a comment below!


Last updated: 2026-04-28 | Author: Z-Image Practical Guide

Z-Image Team