Z-Image Face Swap in Action: From ComfyUI Workflows to LoRA Character Training
Abstract: Z-Image is an open-source AI image generation model launched by Z.ai, renowned for its exceptional aesthetics and image quality. This article provides an in-depth guide to practical face swap solutions with Z-Image, covering two mainstream approaches — one-click face swap via ComfyUI workflows and character fine-tuning with LoRA — helping readers build their own AI face swap system from scratch.
- 📅 Publication Date: 2026-04-28
- 🏷️ Tags:
Z-Image · Face Swap · ComfyUI · LoRA · ReActor · AI Art
- 💻 Hardware Requirements: 16GB unified memory (Apple Silicon M3/M4/M5) or entry-level NVIDIA/AMD GPU
- 📦 Community Resources: Z-Image_FaceSwap_Gen_1.0 workflow template
Table of Contents
- Introduction: Overview of Z-Image's Face Swap Capabilities
- Comparing Two Face Swap Approaches: ComfyUI Workflows vs. LoRA Training
- Approach One: ComfyUI + Z-Image Turbo + ReActor Workflow Walkthrough
- Approach Two: LoRA Character Fine-Tuning (nphSi/Z-Image-Lora)
- Comparative Summary: Pros and Cons at a Glance
- Best Practices and Troubleshooting Guide
- Appendix: Common Prompt Templates and Resource Links
1. Introduction: Overview of Z-Image's Face Swap Capabilities
What is Z-Image?
Z-Image is a high-quality AI image generation model developed by the Z.ai team. Built on a Diffusion architecture, it excels particularly in character portraits and photorealistic human imagery. The model supports FP8 quantization, significantly lowering VRAM requirements and enabling smooth operation on consumer-grade hardware.
Why is Z-Image Suitable for Face Swapping?
Z-Image's face swap capability doesn't come from a single module — it stems from its outstanding facial detail generation. The faces it produces already have a high degree of realism and aesthetic quality. On this foundation, the community has developed two mature face swap approaches:
| Approach | Core Concept | Use Cases |
|---|---|---|
| ComfyUI Workflow | First generate a high-quality base image with Z-Image, then replace the target face using the ReActor plugin | Quick generation, temporary face swaps, batch production |
| LoRA Character Training | Train a dedicated LoRA on photos of a specific person, then the model directly outputs that character during generation | Character consistency, long-term projects, manga/novel illustrations |
💡 Key Difference: The ComfyUI workflow follows a "generate-then-replace" strategy, while LoRA training follows a "generate-as-character" strategy. The two can complement each other.
2. Comparing Two Face Swap Approaches: ComfyUI Workflows vs. LoRA Training
Before diving into the details, let's take a panoramic comparison of both approaches:
Approach One: ComfyUI Workflow (Z-Image Turbo + ReActor)
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│ Target Person    │────▶│ Z-Image          │────▶│ ReActor          │
│ Photo (Source)   │     │ Turbo FP8        │     │ Face Swap        │
│                  │     │ Generate Base Img│     │ Face Replacement │
└──────────────────┘     └────────┬─────────┘     └────────┬─────────┘
                                  │                        │
                                  ▼                        ▼
                         ┌─────────────────────────────────┐
                         │      Final Face Swap Result     │
                         └─────────────────────────────────┘
- How It Works: First, use the Z-Image Turbo model to generate a high-quality portrait base image based on your prompt. Then, leverage the ReActor plugin to transfer the target person's facial features onto the base image.
- Pros: Extremely quick to get started, no training required, swap anyone you want
- Cons: Face blending quality depends on the ReActor algorithm; certain angles may look unnatural
Approach Two: LoRA Character Fine-Tuning
┌────────────────────────┐
│ 10~20 Character Photos │
│   (Training Dataset)   │
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│     LoRA Training      │
│  (nphSi/Z-Image-Lora)  │
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│ Load LoRA +            │
│ Z-Image Generation     │
│ (Trigger Word + Prompt)│
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│ Direct Character Output│
│ (No Face Swap Needed)  │
└────────────────────────┘
- How It Works: Collect 10~20 photos of your target character and train a LoRA weight file. When generating with Z-Image, load the LoRA and use a trigger word — the model directly outputs images of that character.
- Pros: Strong character consistency, natural face blending, no post-processing needed
- Cons: Requires training data, longer training time (approximately 1~2 hours)
3. Approach One: ComfyUI + Z-Image Turbo + ReActor Workflow Walkthrough
3.1 Environment Setup
Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| GPU/Memory | 16GB unified memory (M3/M4/M5) / 8GB VRAM | 24GB VRAM / 32GB+ RAM |
| Storage | 50GB available space | SSD, 100GB+ available space |
| OS | macOS 14+ / Linux (Ubuntu 22.04+) / Windows 11 | Same as above |
Software Installation Steps
# 1. Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# 2. Create virtual environment (optional, recommended)
python -m venv venv
source venv/bin/activate # Linux/macOS
# venv/Scripts/activate # Windows
# 3. Install dependencies
pip install -r requirements.txt
# 4. Install ComfyUI Manager (node plugin manager)
cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
# 5. Install ReActor node — the simplest route is to search for "ReActor"
# in ComfyUI Manager for a one-click install; alternatively, clone the
# node repository directly:
git clone https://github.com/Gourieff/ComfyUI-ReActor.git
Downloading Models
# Create model directories
mkdir -p models/checkpoints
mkdir -p models/clip
mkdir -p models/vae
# Download Z-Image Turbo FP8 ALL-in-One model
# FP8 version is recommended to save VRAM
# Download sources: HuggingFace or Civitai community
# Example filename: Z-Image-Turbo-ALL-in-One-FP8.safetensors
3.2 Building the Workflow
The community has created a ready-to-use "Z-Image_FaceSwap_Gen_1.0" workflow template that you can download directly from Civitai. Below is a breakdown of the core nodes:
Core Node Structure
┌─ Load Checkpoint ─────────────────────────┐
│ Model: Z-Image-Turbo-ALL-in-One-FP8 │
│ VAE: Built-in (included in ALL-in-One) │
└──────────────┬─────────────────────────────┘
▼
┌─ CLIP Text Encode (Prompt) ─────────────┐
│ Positive: Photorealistic portrait prompt│
│ Negative: Negative prompt │
└──────────────┬─────────────────────────────┘
▼
┌─ Empty Latent Image ─────────────────────┐
│ Width: 832, Height: 1216 │
│ (Z-Image recommended resolution) │
└──────────────┬─────────────────────────────┘
▼
┌─ KSampler ───────────────────────────────┐
│ Steps: 20~30 │
│ CFG: 5.0~7.0 │
│ Sampler: euler_ancestral │
│ Scheduler: normal │
└──────────────┬─────────────────────────────┘
▼
┌─ VAEDecode ──────────────────────────────┐
│ Decode latent to image │
└──────────────┬─────────────────────────────┘
▼
┌─ ReActor ────────────────────────────────┐
│ Input: Generated image + Target face photo│
│ Face Model: antelopev2 │
│ Restore Face: GFPGAN/CodeFormer │
└──────────────┬─────────────────────────────┘
▼
┌─ Save Image ─────────────────────────────┐
│ Output final result │
└──────────────────────────────────────────┘
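If you drive ComfyUI programmatically, the node graph above maps onto the API-format JSON the server accepts. Below is a minimal sketch: the node ids, checkpoint filename, and prompt text are illustrative, and the ReActor node is omitted because its input names vary by plugin version — export the exact JSON from your own workflow via "Save (API Format)".

```python
# API-format representation of the core node graph above.
# Node ids ("1".."7") are placeholders; each input wires to
# [source_node_id, output_index] as in ComfyUI's API JSON.
import json

prompt = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "Z-Image-Turbo-ALL-in-One-FP8.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",   # positive prompt
          "inputs": {"text": "photorealistic portrait, 1girl", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",   # negative prompt
          "inputs": {"text": "worst quality, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 832, "height": 1216, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 25, "cfg": 6.0,
                     "sampler_name": "euler_ancestral", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "zimage_faceswap"}},
}

print(json.dumps(prompt)[:60], "...")
```

This is the same structure the batch-generation example later in the article POSTs to the server.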
3.3 Prompt Writing Guide
Positive Prompt Template
# General photorealistic portrait prompt
(masterpiece, best quality:1.2), photorealistic, 1girl,
looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field,
portrait shot, upper body,
<scene description>, <outfit description>, <expression description>
# Example: Female portrait on a city street
(masterpiece, best quality:1.2), photorealistic, 1girl,
looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field,
portrait shot, upper body,
standing on a city street at sunset, wearing a white dress,
gentle smile, warm golden hour lighting
Negative Prompt (Required)
(worst quality, low quality:1.4),
nsfw, nude, naked,
extra fingers, fewer fingers, extra limbs,
bad anatomy, bad hands, missing limbs,
blurry, jpeg artifacts, watermark, signature, text,
deformed face, asymmetrical eyes
3.4 ReActor Parameter Tuning
| Parameter | Recommended Value | Description |
|---|---|---|
| Face Model | antelopev2 | Face detection model with the highest recognition rate |
| Restore Face | CodeFormer | Face restoration for enhanced detail |
| Restore Visibility | 0.5~0.7 | Restoration strength; too high will lose facial features |
| Swap Face | True | Enable face swap |
| Source Face Index | 0 | Default to the first detected face |
| Mask Face | True | Use face mask to reduce edge artifacts |
⚠️ Common Issue: If you get a "mask-like" effect (the face doesn't blend with the body) after swapping, try lowering Restore Visibility to 0.3~0.5 and moderately increasing the KSampler's CFG value.
4. Approach Two: LoRA Character Fine-Tuning (nphSi/Z-Image-Lora)
4.1 Pre-Training Preparation
Data Collection
| Item | Requirements |
|---|---|
| Number of Photos | 10~20 (too few leads to unstable training, too many can cause overfitting) |
| Resolution | Recommended 512×512 or 768×768; will be auto-cropped during training |
| Diversity | Different angles, lighting, expressions, outfits, and backgrounds |
| Cropping | Centered on the face, including the upper body is best |
| Quality | Clear, unobstructed, no beauty filters |
Data Directory Structure
dataset/
├── character_name/
│ ├── img_01.jpg
│ ├── img_02.jpg
│ ├── ...
│ └── img_20.jpg
└── metadata.json
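Before training, it's worth sanity-checking the folder against the table above. A minimal stdlib sketch — the 10~20 thresholds mirror the table and are soft guidelines, not hard limits:

```python
# Sanity-check a character's image folder before LoRA training.
from pathlib import Path

def check_dataset(char_dir: str, min_imgs: int = 10, max_imgs: int = 20) -> list[str]:
    """Return a list of warnings; an empty list means the layout looks OK."""
    exts = {".jpg", ".jpeg", ".png", ".webp"}
    path = Path(char_dir)
    warnings = []
    if not path.is_dir():
        warnings.append(f"missing directory: {char_dir}")
        return warnings
    images = [p for p in path.iterdir() if p.suffix.lower() in exts]
    if len(images) < min_imgs:
        warnings.append(f"only {len(images)} images — training may be unstable")
    elif len(images) > max_imgs:
        warnings.append(f"{len(images)} images — risk of overfitting")
    return warnings
```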
Image Captioning
Use WD 1.4 Tagger for automatic tagging:
# Use WD 1.4 model for batch tagging
python tag.py --dir dataset/character_name --model wd-v1-4-convnext-fp16.onnx
Example of manually organized tags:
# Tags for img_01.jpg
simple tags: 1girl, solo, brown hair, looking at viewer, white shirt, upper body
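A common chore at this stage is prefixing every caption with the trigger word the LoRA will use. A minimal sketch, assuming one sidecar `.txt` caption per image (the convention Kohya-style trainers read) and `zchar` as an example trigger word:

```python
# Prepend a trigger word to every .txt caption in the dataset, so the
# LoRA learns to associate it with the character. Idempotent: captions
# already starting with the trigger word are left untouched.
from pathlib import Path

def add_trigger_word(dataset_dir: str, trigger: str = "zchar") -> int:
    """Return the number of caption files updated."""
    updated = 0
    for cap in Path(dataset_dir).glob("*.txt"):
        text = cap.read_text(encoding="utf-8").strip()
        if not text.startswith(trigger):
            cap.write_text(f"{trigger}, {text}", encoding="utf-8")
            updated += 1
    return updated
```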
4.2 Training Configuration
Based on the nphSi/Z-Image-Lora approach, train using Kohya SS or the command line.
Recommended Kohya SS GUI Training Parameters
┌─────────────────────────────────────────────────┐
│ Recommended LoRA Training Parameters │
├─────────────────┬───────────────────────────────┤
│ Basic Parameters│ │
│ Base Model │ Z-Image Turbo (FP16 original) │
│ Network Module │ LoRA │
│ Network Dim │ 32~64 (recommended for chars) │
│ Network Alpha │ 16~32 (roughly half of Dim) │
│ │ │
│ Training Params │ │
│ Epochs │ 15~30 │
│ Learning Rate │ 1e-4 (UNet) / 5e-5 (Text Encoder)│
│ Batch Size │ 1~4 (depends on VRAM) │
│ Resolution │ 512x512 / 768x768 │
│ │ │
│ Optimizer │ │
│ Optimizer │ AdamW8bit │
│ LR Scheduler │ cosine with warmup │
│ Warmup Steps │ 100 │
└─────────────────┴───────────────────────────────┘
Command Line Training Example
# Train using train_network.py (from the kohya-ss/sd-scripts toolkit)
accelerate launch train_network.py \
  --pretrained_model_name_or_path="z-ai/z-image-turbo" \
  --train_data_dir="./dataset/character_name" \
  --output_dir="./output/lora" \
  --output_name="character_lora" \
  --network_module="networks.lora" \
  --network_dim=32 \
  --network_alpha=16 \
  --train_batch_size=1 \
  --max_train_epochs=20 \
  --learning_rate=1e-4 \
  --text_encoder_lr=5e-5 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=100 \
  --resolution=512,512 \
  --cache_latents \
  --cache_text_encoder_outputs \
  --optimizer_type="AdamW8bit" \
  --mixed_precision="bf16" \
  --seed=42 \
  --save_every_n_epochs=5 \
  --save_model_as=safetensors
4.3 Post-Training Inference
Generating with LoRA Loaded in ComfyUI
┌─ Load Checkpoint ─────────────────────────┐
│ Model: Z-Image-Turbo-ALL-in-One-FP8 │
└──────────────┬─────────────────────────────┘
▼
┌─ Load LoRA ───────────────────────────────┐
│ LoRA: character_lora.safetensors │
│ Strength Model: 0.8~1.0 │
│ Strength Clip: 0.8~1.0 │
└──────────────┬─────────────────────────────┘
▼
┌─ CLIP Text Encode (Prompt) ───────────────┐
│ Positive: [trigger_word], + description prompt│
│ ⚠️ Must use the trigger word! │
└──────────────┬─────────────────────────────┘
▼
┌─ KSampler ──▶ VAEDecode ──▶ Save Image │
└────────────────────────────────────────────┘
Inference Prompt Template (Must Include Trigger Word)
# Assuming the trigger word is "zchar"
zchar, (masterpiece, best quality:1.2), photorealistic, 1girl,
looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field,
portrait shot, upper body,
sitting in a cafe, wearing a cozy sweater, warm smile
# ❌ Wrong: forgetting the trigger word → generates a generic character, not your target
# ✅ Correct: zchar + description → the model knows who to generate
5. Comparative Summary: Pros and Cons at a Glance
Comparison Table
| Dimension | ComfyUI Workflow (ReActor) | LoRA Character Training |
|---|---|---|
| Difficulty | ⭐ Very Low (drag-and-drop nodes) | ⭐⭐⭐ Medium (requires data prep & training) |
| Time Cost | Instant generation (1~3 minutes per image) | Training 1~2 hours + 1~3 minutes per generation |
| Hardware Requirements | 16GB RAM / 8GB VRAM | Training needs 8~16GB VRAM; inference same as the ComfyUI approach |
| Character Consistency | Medium (depends on ReActor algorithm) | High (model has learned character features) |
| Face Naturalness | Good (occasional mask-like effect) | Excellent (native blending) |
| Person Swap Flexibility | Very High (just change the photo) | Low (changing person requires retraining) |
| Use Cases | Quick testing, multiple face swaps, A/B testing | Fixed characters, serialized content, brand IPs |
| Prompt Control | Fully free | Requires trigger word |
| Community Resources | Z-Image_FaceSwap_Gen_1.0 template | nphSi/Z-Image-Lora tutorial |
Selection Guide
What do you need?
│
├─ "Quick generation, try a few different faces"
│ └─ ✅ Choose ComfyUI Workflow (ReActor)
│
├─ "Fixed character, create a series"
│ └─ ✅ Choose LoRA Training
│
├─ "Want both quick generation and character consistency"
│ └─ ✅ Use both together: LoRA for base image + ReActor for fine-tuning
│
└─ "Limited budget, tight VRAM"
└─ ✅ Start with ComfyUI Workflow + FP8 model, verify results, then consider LoRA
6. Best Practices and Troubleshooting Guide
6.1 General Tips
✅ Prefer FP8 Models to Reduce VRAM
Model Version Comparison:
┌────────────┬───────────┬──────────────────────┐
│ Format │ File Size │ Recommended VRAM │
├────────────┼───────────┼──────────────────────┤
│ FP32 │ ~10 GB │ ≥24GB VRAM │
│ FP16 │ ~5 GB │ ≥16GB VRAM / Unified Mem│
│ FP8 │ ~2.5 GB │ ≥8GB VRAM / 16GB URM │
└────────────┴───────────┴──────────────────────┘
💡 Recommendation: Start with FP8, upgrade to FP16 for maximum quality
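The size ratios in the table follow directly from bytes per weight. A quick back-of-envelope check — note that the ~2.5B parameter count below is inferred from the ~10 GB FP32 figure above and is an assumption, not an official spec:

```python
# Model file size ≈ parameter count × bytes per weight.
# The 2.5e9 parameter count is an assumption inferred from the
# article's ~10 GB FP32 figure, not an official number.
def model_size_gb(n_params: float, bytes_per_weight: float) -> float:
    """File size in GiB for a given parameter count and weight precision."""
    return n_params * bytes_per_weight / 1024**3

n = 2.5e9
for fmt, nbytes in [("FP32", 4), ("FP16", 2), ("FP8", 1)]:
    print(f"{fmt}: ~{model_size_gb(n, nbytes):.1f} GB")
```

Each halving of precision halves both the file on disk and the VRAM needed to hold the weights, which is why FP8 fits on 8GB cards.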
✅ Always Use a Trigger Word for LoRA Training
# Trigger Word Rules:
# 1. Short and unique: recommend 3~8 letters, e.g., "zchar", "mychar"
# 2. Avoid everyday words: prevents confusion with common descriptors
# 3. Place at the very front of the prompt: ensures the model recognizes it first
# 4. Training captions must include the trigger word
# Example: Training dataset caption format
# img_01.jpg.caption: "zchar, 1girl, brown hair, white shirt, portrait"
# img_02.jpg.caption: "zchar, 1girl, brown hair, black dress, smiling"
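Rules 2 and 3 above can be enforced mechanically with a small helper that builds prompts always leading with the trigger word (`zchar` is just an example token):

```python
# Build an inference prompt that always places the trigger word first.
def build_prompt(trigger: str, *tags: str) -> str:
    """Join the trigger word and descriptive tags, trigger first."""
    if not trigger or " " in trigger:
        raise ValueError("trigger word should be a single short token")
    return ", ".join([trigger, *tags])
```

For example, `build_prompt("zchar", "1girl", "portrait")` yields `"zchar, 1girl, portrait"`, matching the caption format shown above.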
✅ Resolution Selection
Z-Image Recommended Resolutions:
- Portrait (vertical): 832 × 1216 (default, best ratio)
- Landscape (horizontal): 1216 × 832
- Square: 1024 × 1024
⚠️ Avoid non-standard resolutions, which may cause image distortion
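If your pipeline accepts arbitrary sizes, you can snap each request to the nearest recommended resolution by aspect ratio — a minimal sketch using the three sizes listed above:

```python
# Snap an arbitrary requested size to the closest recommended
# Z-Image resolution, matched by aspect ratio.
RECOMMENDED = [(832, 1216), (1216, 832), (1024, 1024)]

def snap_resolution(width: int, height: int) -> tuple[int, int]:
    """Pick the recommended (w, h) whose aspect ratio is closest."""
    target = width / height
    return min(RECOMMENDED, key=lambda wh: abs(wh[0] / wh[1] - target))
```

So a 1920×1080 request maps to 1216×832, and 800×1200 maps to 832×1216, avoiding the distortion warned about above.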
6.2 Common Issues Troubleshooting
| Issue | Possible Cause | Solution |
|---|---|---|
| Blurry face after swap | ReActor Restore value too high | Lower Restore Visibility to 0.3~0.5 |
| Face disconnected from body after swap | Mask range inappropriate | Enable Mask Face, adjust Mask Softness |
| LoRA output doesn't resemble target | Under-training or overfitting | Adjust Epochs (15~30), check data quality |
| Out of memory (OOM) | Model too large or resolution too high | Switch to FP8, lower resolution to 512 |
| Trigger word not working | Typo or wrong placement | Check spelling/case, ensure it's at the very front |
| ReActor can't detect face | Poor input photo quality | Use clear frontal photos, switch Face Model |
6.3 Apple Silicon Optimization
# Recommendations for macOS Apple Silicon users:
# 1. Ensure ComfyUI uses the MPS backend
# 2. Add the following to launch parameters:
python main.py --force-fp16
# 3. If experiencing heavy memory swapping, reduce batch_size
# 4. M3/M4/M5 with 16GB unified memory can smoothly run FP8 versions
6.4 Batch Generation Tips
# Batch generation example using the ComfyUI HTTP API
import json
import requests

workflow_path = "Z-Image_FaceSwap_Gen_1.0.json"
with open(workflow_path, "r") as f:
    workflow = json.load(f)

# Queue the workflow several times. To get different images per run,
# vary the KSampler's seed field in the workflow dict first — its node
# id depends on your exported JSON, so inspect the file to find it.
for i in range(4):
    response = requests.post(
        "http://127.0.0.1:8188/prompt",
        json={"prompt": workflow, "client_id": "batch-gen"},
    )
    print(f"Task {i + 1} submitted, ID: {response.json()['prompt_id']}")
7. Appendix: Common Prompt Templates and Resource Links
7.1 Prompt Template Collection
Photorealistic Portrait (General)
(masterpiece, best quality:1.2), photorealistic, ultra detailed,
1girl, looking at viewer, detailed face, beautiful eyes,
soft natural lighting, depth of field, bokeh,
portrait shot, upper body,
studio lighting, professional photography,
8k resolution, sharp focus
Natural Scene
(masterpiece, best quality:1.2), photorealistic,
1girl, looking away, gentle breeze, windblown hair,
natural outdoor lighting, golden hour,
medium shot, upper body,
standing in a flower field, soft warm colors,
dreamy atmosphere, shallow depth of field
LoRA Character + Scene Customization
zchar, (masterpiece, best quality:1.2), photorealistic,
1girl, looking at viewer, slight smile,
cinematic lighting, film grain,
portrait shot, upper body,
sitting at a desk in a cozy study room, warm desk lamp,
bookshelves in background, soft focus
7.2 Resource Links Summary
| Resource | Link |
|---|---|
| Z-Image Official GitHub | Z.ai GitHub |
| Z-Image Turbo Model (HuggingFace) | [HuggingFace Page] |
| Z-Image_FaceSwap_Gen_1.0 (Civitai) | [Civitai Page] |
| ReActor Node | GitHub - ReActor |
| nphSi/Z-Image-Lora | [HuggingFace - LoRA Tutorial] |
| ComfyUI Manager | GitHub - ComfyUI-Manager |
Final Thoughts
Z-Image's face swap capabilities are rapidly evolving. ComfyUI workflows are ideal for quick prototyping and flexible face swapping, while LoRA training suits deep customization and character consistency needs. The two are not mutually exclusive — many advanced users first validate their prompts and compositions using ComfyUI workflows, then train LoRAs for fixed characters to improve quality.
Regardless of which approach you choose, FP8 quantization and trigger words are two essential techniques to master. Happy generating!
📝 If you found this article helpful, feel free to share it with fellow creators. Questions? Drop a comment below!
Last updated: 2026-04-28 | Author: Z-Image Practical Guide