AI Models Reference
Complete guide to all available AI models and their capabilities
Influencer Studio integrates advanced AI models for video, image, 3D, and audio generation. Each model is optimized for specific tasks to deliver the best possible results.
Video Generation Models
Text-to-Video Models
Kling v2.6 Pro
The industry-leading model for photorealistic video generation, with native audio support.
Best For:
- High-fidelity video content
- Smooth motion and realistic physics
- Character animations
- Cinematic sequences
- Videos with native audio/dialogue
Key Features:
- Photorealistic output quality
- Advanced motion control
- Consistent character appearance
- Multiple duration options (5s, 10s)
- Aspect ratio support (16:9, 9:16, 1:1)
- Native audio generation (English/Chinese with auto-translation)
Credits: 70 credits for 5s (audio off), 140 credits for 5s (audio on)
Veo 3.1 Fast
Google's latest state-of-the-art video generation model. Fast and cost-effective.
Best For:
- Complex scene compositions
- Precise camera movements
- Controlled scene transitions
- Professional video production
Key Features:
- Superior scene understanding
- Precise control over video elements
- Excellent temporal consistency
- Aspect ratio support (16:9, 9:16)
Credits: 80 credits per 8-second video
MiniMax Hailuo-02 Pro
Advanced video generation model with 1080p resolution.
Best For:
- High-resolution video content
- Professional productions
- Marketing videos
Key Features:
- 1080p output
- Optional prompt optimization
- High-quality motion
Credits: 125 credits per video
ByteDance Seedance v1.0 Pro Fast
Upgraded fast video generation model from ByteDance.
Best For:
- Quick video generation
- Social media content
- Rapid iterations
Credits: Variable based on duration
ByteDance Seedance v1.0 Lite
Lightweight, fast video generation for quick iterations.
Credits: Lower cost option for faster generation
Wan 2.6
Wan 2.6 text-to-video model with enhanced quality and motion generation.
Best For:
- High-quality video generation
- Creative projects
- NSFW content support
Credits: Variable based on settings
Wan 2.2 5B
Wan 2.2 5B text-to-video model with NSFW support.
Credits: ~1 credit per second of video
Mochi v1
Specialized video generation model.
Credits: Variable
Hunyuan Video
Advanced video generation from Tencent.
Credits: Variable
CogVideoX
Video generation with advanced cognitive understanding.
Credits: Variable
Image-to-Video Models
Kling v2.6 Pro (Image-to-Video)
Animate static images with photorealistic motion and native audio support.
Best For:
- Bringing photos to life
- Product animations
- Character animations from photos
- Talking head videos with native audio
Credits: 70 credits for 5s (audio off), 140 credits for 5s (audio on)
VEED Fabric 1.0
Image to talking video with built-in lip sync capabilities.
Best For:
- Talking head videos
- Animated portraits
- Marketing videos with speech
Credits: Variable
OmniHuman (ByteDance)
Human image to video with audio support.
Best For:
- Human animations
- Talking avatars
- Influencer content
Credits: Variable
Veo 3.1 Fast (Image-to-Video)
Convert static images to video with Google's advanced model.
Best For:
- Scene animation
- Product videos
- Creative transitions
Credits: Similar to text-to-video pricing
Veo 3.1 First-Last Frame
Create videos from first and last frame keyframes.
Best For:
- Precise video control
- Animated transitions
- Storyboard-based videos
Credits: Variable
ByteDance Seedance v1.0 (Image-to-Video)
Fast image-to-video generation from ByteDance.
Credits: Variable based on duration
Wan 2.6 / 2.2 5B (Image-to-Video)
Wan models for image-to-video generation with NSFW support.
Credits: ~1 credit per second of video
MiniMax Hailuo-02 Pro (Image-to-Video)
High-resolution image-to-video conversion.
Credits: 125 credits per video
Hunyuan Video (Image-to-Video)
Tencent's image-to-video model.
Credits: Variable
Luma Dream Machine Ray 2
Advanced image-to-video with ray-traced quality.
Credits: Variable
CogVideoX (Image-to-Video)
Cognitive image-to-video generation.
Credits: Variable
Framepack
Specialized frame animation model.
Credits: Variable
Magi Image-to-Video
Creative image animation model.
Credits: Variable
Image Generation Models
Text-to-Image Models
Flux-Krea
State-of-the-art photorealistic image generation model with 12 billion parameters.
Best For:
- Ultra-realistic photographs
- Portrait photography
- Product photography
- Marketing materials
- High-detail imagery
Key Features:
- Unparalleled photorealism
- Exceptional detail rendering
- Natural lighting understanding
- Consistent character generation with LoRAs
- Multiple aspect ratios
- 40 inference steps for quality
Credits: ~25 credits per megapixel (e.g., 1024x1024 = 25 credits, 2048x2048 = 100 credits)
Usage Example:
{
  "prompt": "Professional headshot of a woman in business attire, studio lighting, high detail, 8k",
  "model": "flux-krea",
  "settings": {
    "aspect_ratio": "1:1",
    "num_inference_steps": 40
  }
}
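The per-megapixel pricing above can be turned into a quick estimate. The sketch below is a rough illustration, not the platform's billing logic. It assumes that one megapixel means 1024 x 1024 pixels (which matches the two quoted examples) and that fractional credits round up. The function name `estimate_flux_krea_credits` is hypothetical.

```python
import math

# Approximate rate quoted above ("~25 credits per megapixel").
CREDITS_PER_MEGAPIXEL = 25

# Assumption: a megapixel is 1024 x 1024 pixels, so 1024x1024 costs 25 credits.
PIXELS_PER_MEGAPIXEL = 1024 * 1024


def estimate_flux_krea_credits(width: int, height: int) -> int:
    """Estimate the credits for one Flux-Krea image of the given size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    megapixels = (width * height) / PIXELS_PER_MEGAPIXEL
    # Assumption: partial credits are rounded up to the next whole credit.
    return math.ceil(megapixels * CREDITS_PER_MEGAPIXEL)
```

Under these assumptions, `estimate_flux_krea_credits(1024, 1024)` gives 25 and `estimate_flux_krea_credits(2048, 2048)` gives 100, in line with the figures quoted above.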
Flux Ultra v1.1
The highest-resolution, highest-quality image generation model available. Premium quality at 2x credit cost.
Best For:
- Ultra-high resolution images
- Maximum quality requirements
- Professional photography
- Print materials
Key Features:
- Best-in-class image quality
- Highest resolution output
- Professional color accuracy
Credits: 40 credits per image
Flux Schnell
Fast, high-quality image generation. Best value for quality.
Best For:
- Quick generations
- Batch processing
- Cost-effective high-quality images
Key Features:
- Fully uncensored
- Fast 4-step inference
- High quality output
- Best value proposition
Credits: 24 credits per image
Flux SRPO
High aesthetic quality image generation with superior prompt adherence.
Best For:
- Artistic content
- Creative projects
- High aesthetic standards
Credits: Variable based on settings
ByteDance Seedream v4
ByteDance's advanced text-to-image model with improved quality and prompt adherence.
Best For:
- High-quality image generation
- Complex prompts
- Professional content
Key Features:
- Multiple aspect ratios
- Safety checker option
- Configurable guidance scale
- Seed control for reproducibility
Credits: 40 credits per image
Qwen Image
Intelligent image generation with natural language understanding.
Best For:
- Complex prompt interpretation
- Natural language descriptions
- Intelligent scene composition
Credits: Variable
HiDream I1 Fast
Fast, high-quality image generation.
Best For:
- Quick turnaround
- Batch processing
- Social media content
Credits: Variable
Flux Pro Kontext Max
Advanced image generation with superior context understanding.
Credits: Variable
Juggernaut Flux (Lightning & LoRA)
Fast flux-based models optimized for speed and LoRA support.
Credits: Variable
Ideogram v2 Turbo / v3
Advanced text rendering and image generation.
Best For:
- Images with text
- Logos and graphics
- Marketing materials
Credits: Variable
ICLight v2
Advanced lighting control for image generation.
Best For:
- Studio photography simulation
- Lighting experiments
- Product photography
Credits: Variable
Hunyuan Image v3
Tencent's advanced image generation model with excellent prompt adherence.
Credits: Variable
Nano Banana Pro (Text-to-Image)
Google's Gemini 3 Pro Image architecture for production-quality text-to-image generation with industry-leading text rendering.
Best For:
- Marketing campaign generation
- Product visualization workflows
- Creative content requiring text accuracy
- Infographic and diagram creation
- Content with typography and text elements
Key Features:
- Multimodal understanding through Gemini 3 Pro architecture
- Industry-leading text rendering in multiple languages
- Advanced semantic interpretation without prompt engineering
- Natural language creative direction
- Character consistency for up to 5 people
- Resolution options: 1K, 2K, 4K
- Multiple aspect ratios (21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16)
Credits: 22 credits per image (44 credits for 4K resolution)
Usage Example:
{
  "prompt": "Professional marketing poster with text 'Grand Opening' in elegant typography, modern design, vibrant colors",
  "model": "fal-ai/nano-banana-pro",
  "settings": {
    "aspect_ratio": "16:9",
    "resolution": "2K"
  }
}
Best Practices:
- Use natural language descriptions for best results
- Leverage text rendering for signs, posters, and graphics
- Specify typography style and mood for text elements
- Ideal for batch A/B testing with consistent quality
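Because this model lists a fixed set of aspect ratios and resolutions, it can help to validate settings before sending a request. The following is a minimal sketch under stated assumptions: the request shape is copied from the usage example above, while the helper names and the default values (`"1:1"`, `"1K"`) are hypothetical and not part of the documented API.

```python
# Options copied from the Key Features list above.
ASPECT_RATIOS = {"21:9", "16:9", "3:2", "4:3", "5:4", "1:1", "4:5", "3:4", "2:3", "9:16"}
RESOLUTIONS = {"1K", "2K", "4K"}


def build_nano_banana_request(prompt: str, aspect_ratio: str = "1:1",
                              resolution: str = "1K") -> dict:
    """Build a request body shaped like the usage example, rejecting unlisted options."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    return {
        "prompt": prompt,
        "model": "fal-ai/nano-banana-pro",
        "settings": {"aspect_ratio": aspect_ratio, "resolution": resolution},
    }


def nano_banana_credits(resolution: str) -> int:
    """Credits per image from the pricing line above: 44 at 4K, 22 otherwise."""
    return 44 if resolution == "4K" else 22
```

Checking locally like this avoids spending a request on a setting the model does not list.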
Image-to-Image & Editing Models
All editing models support both text-to-image and image-to-image workflows.
Seedream 4.0 Edit
Advanced AI editing with powerful manipulation capabilities.
Best For:
- Complex image editing
- Image modifications
- Background replacement
- Style transfer
- Creative image alterations
Key Features:
- Intelligent content understanding
- Seamless blending
- Context-aware editing
- High-quality output
- Support for multiple reference images
Credits: 12 credits per image
Usage Example:
{
  "image_url": "https://example.com/image.jpg",
  "prompt": "Replace the background with a tropical beach",
  "model": "seedream-4.0-edit"
}
Best Practices:
- Provide clear, specific editing instructions
- Can use multiple reference images
- Describe desired changes in detail
- Specify style matching requirements
Nano Banana Pro (Image-to-Image)
Google's state-of-the-art Nano Banana 2 image editing model for precise modifications using multimodal understanding.
Best For:
- High-quality edits and adjustments
- Multi-image editing (up to 14 images)
- Professional image modifications
- Advanced image transformations
- Context-aware image blending
Key Features:
- Multimodal semantic understanding
- Multi-image support (up to 14 images)
- Precise control
- Professional results
- Natural language editing instructions
Credits: 22 credits per image
Usage Example:
{
  "image_url": "https://example.com/image.jpg",
  "prompt": "Adjust lighting to golden hour",
  "model": "nano-banana-edit"
}
Qwen Image Edit
AI editing with natural language control and intelligence.
Best For:
- Conversational editing commands
- Complex multi-step edits
- Intelligent scene understanding
- Context-aware modifications
Key Features:
- Natural language processing
- Intelligent interpretation
- Smart object recognition
Credits: Variable
Usage Example:
{
  "image_url": "https://example.com/image.jpg",
  "prompt": "Make the person smile more and add warmer lighting",
  "model": "qwen-edit"
}
Flux Krea Image-to-Image
Use Flux-Krea's powerful model for image-to-image transformations.
Credits: Same as text-to-image (~25 credits per megapixel)
Flux SRPO Image-to-Image
High aesthetic quality image editing and transformation.
Credits: Variable
3D & Text Models
Meshy
3D model generation from text or images.
Best For:
- Text-to-3D generation
- Image-to-3D conversion
- 3D asset creation
- Product visualization
- Game asset generation
Key Features:
- Text-to-3D synthesis
- Image-to-3D conversion
- Multiple export formats
- Optimized geometry
- Texture generation
Usage Example:
{
  "prompt": "A modern wooden chair with metal legs",
  "model": "meshy",
  "settings": {
    "output_format": "obj",
    "texture_resolution": "2k"
  }
}
Supported Formats:
- OBJ
- FBX
- GLB/GLTF
- STL
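One way to use the format list above is to derive `output_format` from the file name you plan to save to. This is only a sketch: the lowercase identifiers other than `"obj"` are assumptions (only `"obj"` appears in the usage example), and both helper names are hypothetical.

```python
from pathlib import Path

# Assumption: the API accepts lowercase extensions for every listed format.
SUPPORTED_FORMATS = {"obj", "fbx", "glb", "gltf", "stl"}


def output_format_for(filename: str) -> str:
    """Return the output format implied by a target file name."""
    extension = Path(filename).suffix.lstrip(".").lower()
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported 3D format: {extension or '(none)'}")
    return extension


def build_meshy_request(prompt: str, filename: str,
                        texture_resolution: str = "2k") -> dict:
    """Build a request body shaped like the usage example above."""
    return {
        "prompt": prompt,
        "model": "meshy",
        "settings": {
            "output_format": output_format_for(filename),
            "texture_resolution": texture_resolution,
        },
    }
```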
Audio Generation Models
Sonic-3
State-of-the-art voice synthesis and audio generation.
Best For:
- Voiceovers and narration
- Character voices
- Podcast audio
- Marketing videos
- Talking head videos (with lip sync)
Key Features:
- Natural-sounding speech
- Multiple voice options
- Emotion and tone control
- Multiple languages
- Professional audio quality
- Pronunciation control
Usage Example:
{
  "text": "Welcome to Influencer Studio, where AI meets creativity.",
  "model": "sonic-3",
  "settings": {
    "voice": "professional-female",
    "emotion": "enthusiastic",
    "speed": 1.0
  }
}
Available Voice Profiles:
- Professional Male/Female
- Casual/Conversational
- Energetic/Enthusiastic
- Calm/Soothing
- Character voices
- Multiple accents and languages
Best Practices:
- Use proper punctuation for natural pauses
- Specify emotion and tone for better results
- Choose appropriate voice profile for your content
- Adjust speed for different content types
Model Selection Guide
Choosing the Right Model
For Maximum Quality:
- Images: Flux-Krea
- Videos: Kling or Veo 3.1
- Audio: Sonic-3
- 3D: Meshy
For Speed & Efficiency:
- Quick edits: Nano Banana Pro
- Fast iterations: Finetuned Flux models
- Video editing: VEED
For Creative Control:
- Complex edits: Seedream 4.0
- Natural language edits: Qwen
- Precise video control: Veo 3.1
For Character Consistency:
- Train a custom LoRA (influencer)
- Use Flux-Krea with your LoRA
- Generate thousands of consistent images
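If you pick models programmatically, the guide above can be encoded as a small lookup table. The sketch below simply restates the recommendations as data. The dictionary keys and the `recommend` function are hypothetical names, and the values are display names from this guide, not API model identifiers.

```python
# Recommendations restated from the guide above (display names, not API IDs).
RECOMMENDATIONS = {
    "maximum quality": {
        "images": "Flux-Krea",
        "videos": "Kling or Veo 3.1",
        "audio": "Sonic-3",
        "3d": "Meshy",
    },
    "speed & efficiency": {
        "quick edits": "Nano Banana Pro",
        "fast iterations": "Finetuned Flux models",
        "video editing": "VEED",
    },
    "creative control": {
        "complex edits": "Seedream 4.0",
        "natural language edits": "Qwen",
        "precise video control": "Veo 3.1",
    },
}


def recommend(priority: str, task: str) -> str:
    """Look up the suggested model for a priority and task (case-insensitive)."""
    try:
        return RECOMMENDATIONS[priority.lower()][task.lower()]
    except KeyError:
        raise KeyError(f"no recommendation for {task!r} under {priority!r}") from None
```

Character consistency is left out of the table because it is a workflow (train a LoRA, then generate with Flux-Krea) rather than a single model choice.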
Influencer Training (LoRA Models)
Train custom character models (influencers) with consistent appearance across all generations.
What is Influencer Training?
Influencer training creates a custom LoRA (Low-Rank Adaptation) model that learns the unique characteristics of a person or character from a set of reference photos. Once trained, this model can generate unlimited consistent images of that character in any pose, outfit, location, or scenario.
Training Process
Requirements:
- 8-20 high-quality photos of the same person
- Clear, well-lit images
- Variety of angles and expressions
- Consistent lighting preferred
- Photos should show the face clearly
Training Time:
- Typically 15-45 minutes depending on dataset size
- Training happens in the background
- You'll be notified when complete
Best Practices:
- Photo Quality: Use high-resolution, clear photos (minimum 512x512, preferably 1024x1024 or higher)
- Variety: Include different angles: front, side, 3/4 view
- Expressions: Mix of neutral and smiling expressions
- Consistency: Same person throughout all training images
- Backgrounds: Variety is good, but person should be the focus
- Clothing: Different outfits help the model learn the person, not just specific clothes
- Avoid: Blurry images, heavy filters, multiple people in frame, sunglasses/hats covering face
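The numeric requirements above (8-20 photos, at least 512x512, preferably 1024x1024 or higher) are easy to check before uploading. The following is a minimal sketch that works on a list of (width, height) pairs you have already read from your files. The function name is hypothetical, and whether the platform itself rejects photos outside these limits is not stated here, so treat the results as a local pre-check only.

```python
from typing import List, Tuple

MIN_PHOTOS, MAX_PHOTOS = 8, 20  # "8-20 high-quality photos"
MIN_SIDE = 512                  # "minimum 512x512"
PREFERRED_SIDE = 1024           # "preferably 1024x1024 or higher"


def check_training_set(sizes: List[Tuple[int, int]]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a list of (width, height) photo sizes."""
    errors: List[str] = []
    warnings: List[str] = []
    if not MIN_PHOTOS <= len(sizes) <= MAX_PHOTOS:
        errors.append(f"expected {MIN_PHOTOS}-{MAX_PHOTOS} photos, got {len(sizes)}")
    for index, (width, height) in enumerate(sizes, start=1):
        shortest = min(width, height)
        if shortest < MIN_SIDE:
            errors.append(f"photo {index}: {width}x{height} is below {MIN_SIDE}x{MIN_SIDE}")
        elif shortest < PREFERRED_SIDE:
            warnings.append(
                f"photo {index}: {width}x{height} is below the preferred "
                f"{PREFERRED_SIDE}x{PREFERRED_SIDE}"
            )
    return errors, warnings
```

The qualitative guidance (variety of angles, no sunglasses, a single person in frame) still needs a manual review.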
Using Trained Influencers
Once trained, you can use your influencer in any image generation:
{
  "prompt": "A professional headshot in business attire, studio lighting",
  "model": "flux-krea",
  "influencer_id": "your_influencer_id",
  "settings": {
    "aspect_ratio": "1:1"
  }
}
The influencer will appear in the generated image while following your prompt's instructions for pose, clothing, location, and style.
Influencer Consistency
Trained influencers maintain:
- Facial features and structure
- Skin tone and texture
- Eye color and shape
- Hair color and style (though you can change these via prompts)
- Overall appearance and identity
You can still customize:
- Poses and expressions
- Clothing and outfits
- Locations and backgrounds
- Lighting and mood
- Artistic style
API Availability
Note: Influencer training is currently not available through the REST API. You can only train influencers through the web interface at app.influencerstudio.com.
However, once trained, you can use your influencers via the API by including the influencer_id parameter in image generation requests.
To get your influencer IDs:
- Train influencers via the web interface
- Find your influencer ID in the influencer management panel
- Use the ID in API requests for consistent character generation
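As an illustration of the last step, the sketch below builds the JSON body for an image generation request that references a trained influencer. The field names come from the earlier example in this section, and the function name is hypothetical. The endpoint URL and authentication are not covered here, so the sketch stops at serializing the body.

```python
import json


def build_influencer_request(prompt: str, influencer_id: str,
                             aspect_ratio: str = "1:1") -> str:
    """Serialize an image generation request that references a trained influencer."""
    if not influencer_id:
        # The ID comes from the influencer management panel in the web interface.
        raise ValueError("influencer_id is required")
    body = {
        "prompt": prompt,
        "model": "flux-krea",
        "influencer_id": influencer_id,
        "settings": {"aspect_ratio": aspect_ratio},
    }
    return json.dumps(body)
```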
Training Credits
Training an influencer consumes credits based on:
- Number of training images
- Training duration
- Model complexity
Check the web interface for current training costs.
Model Pricing
Credits are consumed per generation. Some models have a fixed price per operation; others vary with resolution, duration, or settings:
Image Generation (Text-to-Image):
- Flux-Krea: ~25 credits per megapixel (1024x1024 = 25 credits, 2048x2048 = 100 credits)
- Flux Ultra v1.1: 40 credits per image
- Flux Schnell: 24 credits per image
- ByteDance Seedream v4: 40 credits per image
- Nano Banana Pro: 22 credits per image (44 credits for 4K resolution)
- Other models: typically 24-40 credits per image
Image Editing (Image-to-Image):
- Seedream 4.0 Edit: 12 credits per image
- Nano Banana Pro: 22 credits per image (44 for 4K)
- Qwen Edit: Variable
- Flux-based editing: Similar to text-to-image pricing
Video Generation (Text-to-Video):
- Kling v2.6 Pro: 70 credits for 5s (audio off), 140 credits for 5s (audio on)
- Veo 3.1 Fast: 80 credits per 8s video
- MiniMax Hailuo-02 Pro: 125 credits per video
- Wan 2.2 5B: ~1 credit per second
- Other models: Variable based on duration and quality
Video Generation (Image-to-Video):
- Kling v2.6 Pro: 70 credits for 5s (audio off), 140 credits for 5s (audio on)
- Similar pricing structure to text-to-video for most models
3D Generation:
- Meshy: Variable based on complexity
Audio Generation:
- Sonic-3: Variable based on length
LoRA Training:
- Custom model training: Variable based on dataset size
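To budget a batch of generations, the fixed prices listed above can be summed with a small helper. This is a sketch for planning only: the dictionary keys and the function name are hypothetical, models with variable pricing are deliberately left out, and the figures are simply copied from this page.

```python
from typing import Dict

# Fixed prices copied from the lists above; variable-priced models are omitted.
FIXED_CREDITS: Dict[str, int] = {
    "Flux Ultra v1.1": 40,
    "Flux Schnell": 24,
    "ByteDance Seedream v4": 40,
    "Seedream 4.0 Edit": 12,
    "Nano Banana Pro": 22,
    "Nano Banana Pro (4K)": 44,
    "Kling v2.6 Pro (5s, audio off)": 70,
    "Kling v2.6 Pro (5s, audio on)": 140,
    "Veo 3.1 Fast (8s)": 80,
    "MiniMax Hailuo-02 Pro": 125,
}


def estimate_batch_credits(plan: Dict[str, int]) -> int:
    """Sum the credits for a plan mapping each item to a number of generations."""
    total = 0
    for item, count in plan.items():
        if item not in FIXED_CREDITS:
            raise KeyError(f"no fixed price listed for {item!r}")
        if count < 0:
            raise ValueError(f"count for {item!r} must not be negative")
        total += FIXED_CREDITS[item] * count
    return total
```

For example, ten Flux Schnell images plus two Veo 3.1 Fast videos come to 10 x 24 + 2 x 80 = 400 credits.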
API Access
All models are available through our REST API.
Model Updates
We continuously update and improve our models. Check our changelog for:
- New model releases
- Performance improvements
- Feature additions
- Deprecated models
Support
Need help choosing the right model?
- Check the API documentation for detailed parameters
- Try different models in the web interface
- Contact support for recommendations
- Review example use cases in our gallery