ERNIE-Image + IP-Adapter: Zero-Training Character Consistency and Style Transfer

May 3, 2026

IP-Adapter lets a single reference image control your entire AI art style — paired with ERNIE-Image 8B, it delivers cross-scene character consistency at zero training cost.


The Character Consistency Challenge

In AI art workflows, character consistency has long been a core problem:

  • The same character appears across scenes (comic panels, multi-panel stories, illustration series)
  • Appearance, costume, and expressions must stay consistent
  • Traditional solutions require training custom LoRA models — high cost, long timeline

LoRA Training Workflow:

Collect 15-50 character images → Clean & annotate → Train 1000-5000 steps → Validate & tune → Output .safetensors

The whole process takes days and requires technical expertise.

IP-Adapter Workflow:

Choose 1 reference image → Load IP-Adapter node → Generate → Done

The whole process takes seconds with zero training cost.


How IP-Adapter Works

IP-Adapter (Image Prompt Adapter) is a lightweight adapter that injects image prompts into text-to-image diffusion models, enabling training-free visual guidance.

Core Architecture

Reference Image
    ↓
CLIP Vision Encoder (visual feature extraction)
    ↓
Cross-Attention Injection (into diffusion model)
    ↓
Diffusion Model (ERNIE-Image DiT)
    ↓
Generated Image (inherits style/character features)
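The injection step above is essentially decoupled cross-attention: the CLIP image tokens get their own attention branch, which is scaled by the IP-Adapter scale parameter and added to the text branch. A minimal NumPy sketch with toy shapes (the real model operates on DiT hidden states, not small random arrays):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention over toy 2-D arrays."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def ip_adapter_attention(q, k_text, v_text, k_image, v_image, scale=0.8):
    """Decoupled cross-attention: text and image features run through
    separate branches; the image branch is weighted by `scale`."""
    text_out = attention(q, k_text, v_text)
    image_out = attention(q, k_image, v_image)
    return text_out + scale * image_out

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))                                       # latent queries
k_t, v_t = rng.standard_normal((6, 8)), rng.standard_normal((6, 8))   # text tokens
k_i, v_i = rng.standard_normal((5, 8)), rng.standard_normal((5, 8))   # CLIP image tokens

out = ip_adapter_attention(q, k_t, v_t, k_i, v_i, scale=0.8)
print(out.shape)  # (4, 8)
```

Setting `scale=0` recovers plain text-conditioned attention, which is why the scale parameter behaves as a smooth dial on reference-image influence.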

Key Technical Components

| Component | Role |
|---|---|
| CLIP Vision Encoder | Encodes the reference image into visual feature vectors |
| Cross-Attention | Injects visual features into the DiT attention layers |
| Scale Parameter | Controls reference image influence (0.6-1.0) |
| Base Model | ERNIE-Image 8B DiT |

IP-Adapter vs LoRA Comparison

| Dimension | IP-Adapter | LoRA |
|---|---|---|
| Training required | No | Yes (15-50 images) |
| Prep time | Seconds | Hours to days |
| Fidelity | Style transfer ⭐⭐⭐⭐ | Character fidelity ⭐⭐⭐⭐⭐ |
| Flexibility | Switch reference anytime | One model per character |
| VRAM demand | Low | Extra VRAM for training |
| Best for | Quick prototyping, style exploration | High-fidelity character reproduction |

ComfyUI IP-Adapter Workflow Setup

Node Connection Diagram

[CheckPointLoader] → ERNIE-Image 8B
    ↓
[CLIPVisionLoader] → CLIP-ViT-L-14
    ↓
[IPAdapterModelLoader] → IP-Adapter weights
    ↓
[IPAdapterApply] → Scale: 0.8, Weight: 1.0
    ↓
[KSampler] → steps=28, cfg=7.0
    ↓
[VAEDecode] → Output Image
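The node graph above can also be queued programmatically through ComfyUI's HTTP API (`POST /prompt` with an API-format workflow dict). This is a hedged sketch: the IP-Adapter node class names follow the diagram and come from a community node pack, so check your installed extension for exact names; the checkpoint and adapter filenames are placeholders, and the `KSampler` node is simplified (a real one also needs a seed, negative conditioning, and a latent input).

```python
import json
import urllib.request

def build_workflow(reference_image: str, prompt_text: str, scale: float = 0.8) -> dict:
    """Assemble an API-format ComfyUI workflow: node id -> class_type/inputs.
    Link values like ["1", 0] mean "output 0 of node 1"."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "ernie-image-8b.safetensors"}},  # placeholder filename
        "2": {"class_type": "LoadImage",
              "inputs": {"image": reference_image}},
        "3": {"class_type": "IPAdapterModelLoader",                    # name from your node pack
              "inputs": {"ipadapter_file": "ip-adapter.safetensors"}},
        "4": {"class_type": "IPAdapterApply",
              "inputs": {"model": ["1", 0], "ipadapter": ["3", 0],
                         "image": ["2", 0], "scale": scale, "weight": 1.0,
                         "start_at": 0.0, "end_at": 1.0}},
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"clip": ["1", 1], "text": prompt_text}},
        "6": {"class_type": "KSampler",                                # simplified inputs
              "inputs": {"model": ["4", 0], "positive": ["5", 0],
                         "steps": 28, "cfg": 7.0,
                         "sampler_name": "euler_ancestral",
                         "scheduler": "normal"}},
        "7": {"class_type": "VAEDecode",
              "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
    }

def submit(workflow: dict, host: str = "http://127.0.0.1:8188") -> bytes:
    """Queue the workflow on a running ComfyUI instance."""
    req = urllib.request.Request(
        f"{host}/prompt",
        data=json.dumps({"prompt": workflow}).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

wf = build_workflow("character_front.png", "comic panel, heroine in a city street")
print(wf["4"]["inputs"]["scale"])  # 0.8
```

Because the reference image is just a node input, batch scripts can swap it per run with no retraining, which is the whole point of the workflow.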

Complete Node Parameter Configuration

| Node | Parameter | Value | Notes |
|---|---|---|---|
| IPAdapterApply | scale | 0.6-0.9 | Style control strength |
| IPAdapterApply | weight | 0.8-1.0 | Weight factor |
| IPAdapterApply | start_at | 0.0 | Injection start (fraction of steps) |
| IPAdapterApply | end_at | 1.0 | Injection end (fraction of steps) |
| KSampler | steps | 20-30 | Turbo mode: 8-12 |
| KSampler | cfg | 5.0-8.0 | Prompt guidance strength |
| KSampler | sampler | euler_ancestral | Sampler |
| KSampler | scheduler | normal | Scheduler |
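`start_at` and `end_at` gate injection as fractions of sampling progress: image features are only added while the current step falls inside that window. A small sketch of the gating logic (an illustrative simplification of what the node does internally):

```python
def ip_active(step: int, total_steps: int,
              start_at: float = 0.0, end_at: float = 1.0) -> bool:
    """True if IP-Adapter features should be injected at this step."""
    progress = step / max(total_steps - 1, 1)
    return start_at <= progress <= end_at

# With start_at=0.1, end_at=0.8 over 28 steps, injection skips the very
# first steps (composition forms freely) and the last steps (fine detail
# is left to the text prompt).
active_steps = [s for s in range(28) if ip_active(s, 28, 0.1, 0.8)]
print(len(active_steps))  # 19 of 28 steps inject
```

Narrowing this window is the recommended fix later in this post when the style transfer feels too rigid.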

Multi IP-Adapter Stacking

When you need to control both style and character simultaneously, stack multiple IP-Adapters:

IP-Adapter 1 (style reference) → scale=0.8
IP-Adapter 2 (character reference) → scale=0.6
    ↓
[IPAdapterCombine] → Merge injection
    ↓
KSampler → Generate
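Conceptually, combining adapters just means each adapter's attention output is scaled independently before being summed into the base text branch. A toy NumPy sketch with the scales from the diagram:

```python
import numpy as np

def combine_ip_outputs(text_out, adapter_outputs):
    """adapter_outputs: list of (features, scale) pairs,
    e.g. [(style_out, 0.8), (character_out, 0.6)] as in the diagram."""
    result = text_out.copy()
    for features, scale in adapter_outputs:
        result = result + scale * features
    return result

text_out = np.ones((4, 8))
style_out = np.full((4, 8), 0.5)    # stand-in for the style adapter branch
char_out = np.full((4, 8), 0.25)    # stand-in for the character adapter branch
combined = combine_ip_outputs(text_out, [(style_out, 0.8), (char_out, 0.6)])
print(combined[0, 0])  # 1 + 0.8*0.5 + 0.6*0.25 = 1.55
```

Because the contributions are additive, keeping the summed scales moderate (here 0.8 + 0.6) matters: pushing both toward 1.0 tends to over-constrain the sampler, which is the over-control failure mode discussed under common issues.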

Character Consistency in Practice

Scene 1: Comic Panel Consistency

Goal: Same protagonist maintains appearance across comic panels.

Workflow:

[Load Checkpoint] → ERNIE-Image
    ↓
[LoadImage] → Character design (front view)
    ↓
[CLIPVisionEncode] → Extract visual features
    ↓
[IPAdapterApply] → scale=0.7
    ↓
[CLIPTextEncode] → Panel 1 scene description
    ↓
[KSampler] → Generate Panel 1
    ↓
[CLIPTextEncode] → Panel 2 scene description (same IP-Adapter)
    ↓
[KSampler] → Generate Panel 2
    ↓
... (repeat for more panels)

Prompt Template:

# Panel 1
comic panel, {character_description} standing in {location},
{action}, {expression}, speech bubble: "{text}"

# Panel 2
comic panel, {character_description} walking through {location},
{action}, {expression}, speech bubble: "{text}"
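Filled programmatically, the templates keep the character description constant while only the scene fields vary; that constancy is what preserves consistency across panels. A sketch using `str.format` (all field values here are made-up examples):

```python
PANEL_TEMPLATE = (
    'comic panel, {character_description} {movement} {location}, '
    '{action}, {expression}, speech bubble: "{text}"'
)

character = "a red-haired girl in a blue trench coat"  # stays constant across panels

panels = [
    dict(movement="standing in", location="a rainy alley",
         action="holding an umbrella", expression="determined", text="Found you."),
    dict(movement="walking through", location="a neon market",
         action="glancing back", expression="wary", text="We should go."),
]

prompts = [PANEL_TEMPLATE.format(character_description=character, **p)
           for p in panels]
print(prompts[0])
```

Each prompt then pairs with the same IP-Adapter reference in the workflow above, so the text varies the scene while the visual features pin the character.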

Scene 2: Series Illustration Style Unity

Goal: Multiple illustrations of the same IP character maintain consistent art style.

Techniques:

  1. Style reference selection: Choose the image that best represents the target style
  2. Scale adjustment: 0.6 for creative freedom, 0.9 for strict adherence
  3. Seed consistency: Use the same seed range within a batch
  4. Prompt structure: Keep character description consistent, vary only the scene
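Techniques 3 and 4 combine naturally in a batch script: derive per-illustration seeds from one base seed so the batch is reproducible, and keep the character description fixed while only the scene varies. An illustrative sketch (all values are made up):

```python
BASE_SEED = 1234  # arbitrary; reuse it for every illustration in the series

character = "silver-haired knight, ornate blue armor"  # never varies
scenes = ["castle courtyard at dawn", "misty forest path", "throne room"]

# One config per illustration: deterministic seed, constant character block.
batch = [
    {"seed": BASE_SEED + i, "prompt": f"{character}, {scene}"}
    for i, scene in enumerate(scenes)
]
print(batch[0]["seed"], "|", batch[2]["prompt"])
```

Rerunning the script reproduces the exact batch, while bumping `BASE_SEED` gives a fresh but equally consistent set of variations.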

Style Transfer in Practice

Scene: Brand Visual Unity

Goal: Apply brand design style to all marketing materials.

Workflow:

[LoadImage] → Brand design mockup (style reference)
    ↓
[CLIPVisionEncode]
    ↓
[IPAdapterApply] → scale=0.85
    ↓
[CLIPTextEncode] → Marketing material description
    ↓
[KSampler] → Generate

Prompt Template:

{product_name}, {brand_style_description},
{scene_description},
professional commercial photography,
high resolution, brand consistent

IP-Adapter + ERNIE-Image PE Module Synergy

The built-in Prompt Enhancer (PE) module in ERNIE-Image works better when combined with IP-Adapter:

Synergy Strategy

| Scene | PE Setting | IP-Adapter Scale |
|---|---|---|
| Style exploration | ON | 0.6-0.7 |
| Character consistency | OFF | 0.7-0.9 |
| Precise control | OFF | 0.8-0.9 |

Why Turn OFF PE?

IP-Adapter already steers style and character through visual features, so the PE module's prompt rewriting can conflict with that visual guidance.
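The synergy table can be encoded as a small lookup for batch scripts (the key and field names here are illustrative, not actual ComfyUI settings):

```python
# Recommended pairings from the synergy table above.
SYNERGY = {
    "style_exploration":     {"pe_enabled": True,  "scale_range": (0.6, 0.7)},
    "character_consistency": {"pe_enabled": False, "scale_range": (0.7, 0.9)},
    "precise_control":       {"pe_enabled": False, "scale_range": (0.8, 0.9)},
}

def settings_for(use_case: str) -> dict:
    """Return the PE toggle and recommended scale range for a use case."""
    return SYNERGY[use_case]

print(settings_for("character_consistency"))
```

A pipeline can then set the PE toggle and clamp the user's scale into the recommended range before queuing the workflow.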


Common Issues and Solutions

Q1: Generated character differs significantly from reference

Cause: IP-Adapter scale too low or poor reference image quality.

Solution:

  1. Increase scale to 0.8-0.9
  2. Choose clear, frontal, well-lit reference images
  3. Stack multiple IP-Adapters (front + side views)

Q2: Style transfer too strong, image looks rigid

Cause: IP-Adapter scale too high.

Solution:

  1. Reduce scale to 0.5-0.6
  2. Adjust start_at and end_at (e.g., 0.1 to 0.8)
  3. Use multiple IP-Adapters to distribute weight

Q3: IP-Adapter conflicts with ControlNet

Cause: Both inject simultaneously, causing over-control.

Solution:

  1. Use IP-Adapter for style, ControlNet for composition
  2. Reduce ControlNet strength to 0.3-0.5
  3. Step-wise approach: Generate with IP-Adapter first, then refine with ControlNet

Summary

IP-Adapter + ERNIE-Image workflow advantages:

  1. Zero training cost: No dataset preparation or LoRA training needed
  2. Instant switching: One reference image transforms the entire style
  3. Character consistency: Maintain character appearance across scenes and panels
  4. High flexibility: Adjust scale anytime to control style intensity
  5. PE module synergy: Toggle PE on/off for different effects

For comic creation, brand design, and illustration series that demand high consistency, IP-Adapter is a practical alternative to LoRA.


This workflow uses ComfyUI + ERNIE-Image 8B + IP-Adapter.

Z-Image Team
