ERNIE-Image + IP-Adapter: Zero-Training Character Consistency and Style Transfer
IP-Adapter lets a single reference image steer your entire AI art style. Paired with ERNIE-Image 8B, it delivers cross-scene character consistency at zero training cost.
The Character Consistency Challenge
In AI art workflows, character consistency has long been a core problem:
- The same character appears across scenes (comic panels, multi-panel stories, illustration series)
- Appearance, costume, and expressions must stay consistent
- Traditional solutions require training custom LoRA models, which is costly and slow
LoRA Training Workflow:
Collect 15-50 character images → Clean & annotate → Train 1000-5000 steps → Validate & tune → Output .safetensors
The whole process takes days and requires technical expertise.
IP-Adapter Workflow:
Choose 1 reference image → Load IP-Adapter node → Generate → Done
The whole process takes seconds with zero training cost.
How IP-Adapter Works
IP-Adapter (Image Prompt Adapter) is a lightweight adapter that injects image prompts into text-to-image diffusion models, enabling training-free visual guidance.
Core Architecture
Reference Image
↓
CLIP Vision Encoder (visual feature extraction)
↓
Cross-Attention Injection (into diffusion model)
↓
Diffusion Model (ERNIE-Image DiT)
↓
Generated Image (inherits style/character features)
Key Technical Components
| Component | Role |
|---|---|
| CLIP Vision Encoder | Encodes reference image into visual feature vectors |
| Cross-Attention | Injects visual features into DiT attention layers |
| Scale Parameter | Controls reference image influence (0.6-1.0) |
| Base Model | ERNIE-Image 8B DiT |
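The cross-attention injection in the table can be sketched in a few lines of NumPy. This follows the decoupled cross-attention formulation from the IP-Adapter paper (text and image features get separate key/value projections, and the image branch is blended in with a scale factor); all tensor sizes below are illustrative, not the actual ERNIE-Image dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d)) V
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def decoupled_cross_attention(q, text_kv, image_kv, scale=0.8):
    """IP-Adapter's decoupled cross-attention: text and image features
    use separate K/V projections, and the image-attention output is
    added on top of the text-attention output, weighted by `scale`."""
    k_t, v_t = text_kv
    k_i, v_i = image_kv
    return attention(q, k_t, v_t) + scale * attention(q, k_i, v_i)

# Illustrative sizes: 16 latent queries, 77 text tokens, 4 image tokens, dim 64
rng = np.random.default_rng(0)
q = rng.standard_normal((16, 64))
text_kv = (rng.standard_normal((77, 64)), rng.standard_normal((77, 64)))
image_kv = (rng.standard_normal((4, 64)), rng.standard_normal((4, 64)))

out = decoupled_cross_attention(q, text_kv, image_kv, scale=0.8)
print(out.shape)  # (16, 64)
```

At `scale=0.0` this reduces exactly to ordinary text-conditioned attention, which is why lowering the scale parameter smoothly hands control back to the prompt.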
IP-Adapter vs LoRA Comparison
| Dimension | IP-Adapter | LoRA |
|---|---|---|
| Training Required | No | Yes (15-50 images) |
| Prep Time | Seconds | Hours to days |
| Fidelity | Style transfer ⭐⭐⭐⭐ | Character fidelity ⭐⭐⭐⭐⭐ |
| Flexibility | Switch reference anytime | One model per character |
| VRAM Demand | Low | Extra VRAM for training |
| Best For | Quick prototyping, style exploration | High-fidelity character reproduction |
ComfyUI IP-Adapter Workflow Setup
Node Connection Diagram
[CheckpointLoader] → ERNIE-Image 8B
↓
[CLIPVisionLoader] → CLIP-ViT-L-14
↓
[IPAdapterModelLoader] → IP-Adapter weights
↓
[IPAdapterApply] → Scale: 0.8, Weight: 1.0
↓
[KSampler] → steps=28, cfg=7.0
↓
[VAEDecode] → Output Image
Complete Node Parameter Configuration
| Node | Parameter | Value | Notes |
|---|---|---|---|
| IPAdapterApply | scale | 0.6-0.9 | Style control strength |
| IPAdapterApply | weight | 0.8-1.0 | Weight factor |
| IPAdapterApply | start_at | 0.0 | Injection start (fraction of sampling) |
| IPAdapterApply | end_at | 1.0 | Injection end (fraction of sampling) |
| KSampler | steps | 20-30 | Turbo mode: 8-12 |
| KSampler | cfg | 5.0-8.0 | Prompt guidance |
| KSampler | sampler | euler_ancestral | Sampler |
| KSampler | scheduler | normal | Scheduler |
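The parameter table can be captured as a plain Python dict, which is convenient when driving ComfyUI programmatically through its JSON workflow format. The node and parameter names below mirror the table; exact node class names vary by IP-Adapter node pack, so treat this as a template rather than a drop-in workflow.

```python
# Parameter block mirroring the table above; adjust node names to match
# the IP-Adapter node pack installed in your ComfyUI.
WORKFLOW_PARAMS = {
    "IPAdapterApply": {
        "scale": 0.8,      # style control strength (0.6-0.9)
        "weight": 1.0,     # weight factor (0.8-1.0)
        "start_at": 0.0,   # injection start, as a fraction of sampling
        "end_at": 1.0,     # injection end, as a fraction of sampling
    },
    "KSampler": {
        "steps": 28,       # 20-30 normally; 8-12 in turbo mode
        "cfg": 7.0,        # prompt guidance (5.0-8.0)
        "sampler_name": "euler_ancestral",
        "scheduler": "normal",
    },
}

def validate(params):
    """Basic range checks before submitting the workflow."""
    ip = params["IPAdapterApply"]
    ks = params["KSampler"]
    assert 0.0 <= ip["start_at"] <= ip["end_at"] <= 1.0
    assert 1 <= ks["steps"] <= 100 and ks["cfg"] > 0
    return True

print(validate(WORKFLOW_PARAMS))  # True
```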
Multi IP-Adapter Stacking
When you need to control both style and character simultaneously, stack multiple IP-Adapters:
IP-Adapter 1 (style reference) → scale=0.8
IP-Adapter 2 (character reference) → scale=0.6
↓
[IPAdapterCombine] → Merge injection
↓
KSampler → Generate
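The stacking above amounts to summing one scaled image-attention term per adapter on top of the text branch. A minimal NumPy sketch of that combination (tensor sizes are illustrative, and `stacked_ip_attention` is a hypothetical helper, not a ComfyUI API):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def stacked_ip_attention(q, text_kv, adapters):
    """Combine several IP-Adapters: the text branch plus one scaled
    image-attention term per adapter (style, character, ...)."""
    out = attention(q, *text_kv)
    for scale, (k_i, v_i) in adapters:
        out = out + scale * attention(q, k_i, v_i)
    return out

rng = np.random.default_rng(1)
q = rng.standard_normal((16, 64))
text_kv = (rng.standard_normal((77, 64)), rng.standard_normal((77, 64)))
style_kv = (rng.standard_normal((4, 64)), rng.standard_normal((4, 64)))
char_kv = (rng.standard_normal((4, 64)), rng.standard_normal((4, 64)))

# Mirrors the diagram above: style at scale 0.8, character at scale 0.6
out = stacked_ip_attention(q, text_kv, [(0.8, style_kv), (0.6, char_kv)])
print(out.shape)  # (16, 64)
```

Because each adapter's contribution is additive, lowering one scale reduces that reference's influence without disturbing the other.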
Character Consistency in Practice
Scene 1: Comic Panel Consistency
Goal: Same protagonist maintains appearance across comic panels.
Workflow:
[Load Checkpoint] → ERNIE-Image
↓
[LoadImage] → Character design (front view)
↓
[CLIPVisionEncode] → Extract visual features
↓
[IPAdapterApply] → scale=0.7
↓
[CLIPTextEncode] → Panel 1 scene description
↓
[KSampler] → Generate Panel 1
↓
[CLIPTextEncode] → Panel 2 scene description (same IP-Adapter)
↓
[KSampler] → Generate Panel 2
↓
... (repeat for more panels)
Prompt Template:
# Panel 1
comic panel, {character_description} standing in {location},
{action}, {expression}, speech bubble: "{text}"
# Panel 2
comic panel, {character_description} walking through {location},
{action}, {expression}, speech bubble: "{text}"
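The templates above can be filled programmatically so the character description stays byte-identical across panels (the textual half of consistency) while only the scene fields vary. A small illustrative helper, with a made-up character string as the example:

```python
PANEL_TEMPLATE = (
    "comic panel, {character} {pose} {location}, "
    "{action}, {expression}, speech bubble: \"{text}\""
)

# The character string is defined once and reused verbatim in every panel.
CHARACTER = "a red-haired girl in a green cloak with a silver pendant"

def panel_prompt(pose, location, action, expression, text):
    return PANEL_TEMPLATE.format(
        character=CHARACTER, pose=pose, location=location,
        action=action, expression=expression, text=text,
    )

p1 = panel_prompt("standing in", "a rainy alley", "holding an umbrella",
                  "worried expression", "Where did he go?")
p2 = panel_prompt("walking through", "a crowded market", "scanning the stalls",
                  "determined expression", "He must be here.")

# Both prompts share the exact same character description
assert CHARACTER in p1 and CHARACTER in p2
print(p1)
```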
Scene 2: Series Illustration Style Unity
Goal: Multiple illustrations of the same IP character maintain consistent art style.
Techniques:
- Style reference selection: Choose the image that best represents the target style
- Scale adjustment: 0.6 for creative freedom, 0.9 for strict adherence
- Seed consistency: Use same Seed range within batches
- Prompt structure: Keep character description consistent, vary only the scene
Style Transfer in Practice
Scene: Brand Visual Unity
Goal: Apply brand design style to all marketing materials.
Workflow:
[LoadImage] → Brand design mockup (style reference)
↓
[CLIPVisionEncode]
↓
[IPAdapterApply] → scale=0.85
↓
[CLIPTextEncode] → Marketing material description
↓
[KSampler] → Generate
Prompt Template:
{product_name}, {brand_style_description},
{scene_description},
professional commercial photography,
high resolution, brand consistent
IP-Adapter + ERNIE-Image PE Module Synergy
The built-in Prompt Enhancer (PE) module in ERNIE-Image interacts with IP-Adapter, so the two should be configured together:
Synergy Strategy
| Scene | PE Setting | IP-Adapter Scale |
|---|---|---|
| Style Exploration | ON (PE) | 0.6-0.7 |
| Character Consistency | OFF (PE) | 0.7-0.9 |
| Precise Control | OFF (PE) | 0.8-0.9 |
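The table maps naturally onto a small preset lookup. The values come straight from the table (using the midpoint of each scale range); the scenario keys are just names for this sketch.

```python
# Presets from the synergy table above (midpoint of each scale range).
PRESETS = {
    "style_exploration":     {"pe_enabled": True,  "ip_scale": 0.65},
    "character_consistency": {"pe_enabled": False, "ip_scale": 0.8},
    "precise_control":       {"pe_enabled": False, "ip_scale": 0.85},
}

def configure(scenario):
    # PE prompt rewriting can fight IP-Adapter's visual guidance,
    # so it is disabled whenever strict adherence matters.
    return PRESETS[scenario]

print(configure("character_consistency"))
```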
Why Turn OFF PE?
IP-Adapter already controls style and character through visual features. PE module's prompt rewriting may conflict with IP-Adapter's visual guidance.
Common Issues and Solutions
Q1: Generated character differs significantly from reference
Cause: IP-Adapter scale too low or poor reference image quality.
Solution:
- Increase scale to 0.8-0.9
- Choose clear, frontal, well-lit reference images
- Stack multiple IP-Adapters (front + side views)
Q2: Style transfer too strong, image looks rigid
Cause: IP-Adapter scale too high.
Solution:
- Reduce scale to 0.5-0.6
- Adjust start_at and end_at (e.g., 0.1 to 0.8)
- Use multiple IP-Adapters to distribute weight
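The start_at/end_at fix works because those parameters restrict the window of the denoising process in which image features are injected; narrowing it to 0.1-0.8 lets the earliest structure and the final detail passes run text-only. A sketch of the gating logic, assuming progress runs linearly from 0 to 1 across sampling:

```python
def ip_scale_at(progress, scale=0.8, start_at=0.1, end_at=0.8):
    """Return the effective IP-Adapter scale at a given point in
    sampling (progress in [0, 1]); zero outside the injection window."""
    return scale if start_at <= progress <= end_at else 0.0

# With a 0.1-0.8 window, the first and last steps run text-only,
# which softens an overly rigid style transfer.
steps = 10
schedule = [ip_scale_at(i / (steps - 1)) for i in range(steps)]
print(schedule)  # [0.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.0, 0.0]
```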
Q3: IP-Adapter conflicts with ControlNet
Cause: Both inject simultaneously, causing over-control.
Solution:
- Use IP-Adapter for style, ControlNet for composition
- Reduce ControlNet strength to 0.3-0.5
- Step-wise approach: Generate with IP-Adapter first, then refine with ControlNet
Summary
IP-Adapter + ERNIE-Image workflow advantages:
- Zero training cost: No dataset preparation or LoRA training needed
- Instant switching: One reference image transforms the entire style
- Character consistency: Maintain character appearance across scenes and panels
- High flexibility: Adjust scale anytime to control style intensity
- PE module synergy: Toggle PE on/off for different effects
For comic creation, brand design, and illustration series that demand high consistency, IP-Adapter is a practical alternative to LoRA.
This workflow uses ComfyUI + ERNIE-Image 8B + IP-Adapter.