AI Models Reference

Complete guide to all available AI models and their capabilities

Influencer Studio integrates the most advanced AI models in the industry. Each model is optimized for specific tasks to deliver the best possible results.

Video Generation Models

Text-to-Video Models

Kling v2.6 Pro

An industry-leading model for photorealistic video generation with native audio support.

Best For:

  • High-fidelity video content
  • Smooth motion and realistic physics
  • Character animations
  • Cinematic sequences
  • Videos with native audio/dialogue

Key Features:

  • Photorealistic output quality
  • Advanced motion control
  • Consistent character appearance
  • Multiple duration options (5s, 10s)
  • Aspect ratio support (16:9, 9:16, 1:1)
  • Native audio generation (English/Chinese with auto-translation)

Credits: 70 credits for 5s (audio off), 140 credits for 5s (audio on)

Veo 3.1 Fast

Google's latest state-of-the-art video generation model. Fast and cost-effective.

Best For:

  • Complex scene compositions
  • Precise camera movements
  • Controlled scene transitions
  • Professional video production

Key Features:

  • Superior scene understanding
  • Precise control over video elements
  • Excellent temporal consistency
  • Aspect ratio support (16:9, 9:16)

Credits: 80 credits per 8-second video

MiniMax Hailuo-02 Pro

Advanced video generation model with 1080p resolution.

Best For:

  • High-resolution video content
  • Professional productions
  • Marketing videos

Key Features:

  • 1080p output
  • Optional prompt optimization
  • High-quality motion

Credits: 125 credits per video

ByteDance Seedance v1.0 Pro Fast

Upgraded fast video generation model from ByteDance.

Best For:

  • Quick video generation
  • Social media content
  • Rapid iterations

Credits: Variable based on duration

ByteDance Seedance v1.0 Lite

Lightweight, fast video generation for quick iterations.

Credits: Lower cost option for faster generation

Wan 2.6

Wan 2.6 text-to-video model with enhanced quality and motion generation.

Best For:

  • High-quality video generation
  • Creative projects
  • NSFW content support

Credits: Variable based on settings

Wan 2.2 5B

Wan 2.2 5B text-to-video model with NSFW support.

Credits: ~1 credit per second of video

Mochi v1

Specialized video generation model.

Credits: Variable

Hunyuan Video

Advanced video generation from Tencent.

Credits: Variable

CogVideoX

Video generation with advanced cognitive understanding.

Credits: Variable

Image-to-Video Models

Kling v2.6 Pro (Image-to-Video)

Animate static images with photorealistic motion and native audio support.

Best For:

  • Bringing photos to life
  • Product animations
  • Character animations from photos
  • Talking head videos with native audio

Credits: 70 credits for 5s (audio off), 140 credits for 5s (audio on)

VEED Fabric 1.0

Image to talking video with built-in lip sync capabilities.

Best For:

  • Talking head videos
  • Animated portraits
  • Marketing videos with speech

Credits: Variable

OmniHuman (ByteDance)

Human image to video with audio support.

Best For:

  • Human animations
  • Talking avatars
  • Influencer content

Credits: Variable

Veo 3.1 Fast (Image-to-Video)

Convert static images to video with Google's advanced model.

Best For:

  • Scene animation
  • Product videos
  • Creative transitions

Credits: Similar to text-to-video pricing

Veo 3.1 First-Last Frame

Create videos from first and last frame keyframes.

Best For:

  • Precise video control
  • Animated transitions
  • Storyboard-based videos

Credits: Variable

ByteDance Seedance v1.0 (Image-to-Video)

Fast image-to-video generation from ByteDance.

Credits: Variable based on duration

Wan 2.6 / 2.2 5B (Image-to-Video)

Wan models for image-to-video generation with NSFW support.

Credits: ~1 credit per second of video

MiniMax Hailuo-02 Pro (Image-to-Video)

High-resolution image-to-video conversion.

Credits: 125 credits per video

Hunyuan Video (Image-to-Video)

Tencent's image-to-video model.

Credits: Variable

Luma Dream Machine Ray 2

Advanced image-to-video generation from Luma AI.

Credits: Variable

CogVideoX (Image-to-Video)

Cognitive image-to-video generation.

Credits: Variable

Framepack

Specialized frame animation model.

Credits: Variable

Magi Image-to-Video

Creative image animation model.

Credits: Variable

Image Generation Models

Text-to-Image Models

Flux-Krea

State-of-the-art photorealistic image generation model with 12 billion parameters.

Best For:

  • Ultra-realistic photographs
  • Portrait photography
  • Product photography
  • Marketing materials
  • High-detail imagery

Key Features:

  • Unparalleled photorealism
  • Exceptional detail rendering
  • Natural lighting understanding
  • Consistent character generation with LoRAs
  • Multiple aspect ratios
  • 40 inference steps for quality

Credits: ~25 credits per megapixel (e.g., 1024x1024 = 25 credits, 2048x2048 = 100 credits)

Usage Example:

{
  "prompt": "Professional headshot of a woman in business attire, studio lighting, high detail, 8k",
  "model": "flux-krea",
  "settings": {
    "aspect_ratio": "1:1",
    "num_inference_steps": 40
  }
}
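
The per-megapixel pricing above can be estimated before a request is submitted. The following is a rough sketch, not the billing formula: it assumes that one megapixel means 1024 x 1024 pixels (which reproduces both documented examples) and that fractional credits round up. The helper name `estimate_flux_krea_credits` is hypothetical.

```python
import math

CREDITS_PER_MEGAPIXEL = 25          # documented approximate rate for Flux-Krea
PIXELS_PER_MEGAPIXEL = 1024 * 1024  # assumption: matches the 1024x1024 = 25 example


def estimate_flux_krea_credits(width: int, height: int) -> int:
    """Estimate credits for one Flux-Krea image (hypothetical helper)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    megapixels = (width * height) / PIXELS_PER_MEGAPIXEL
    # Assumption: fractional credits are rounded up; actual billing may differ.
    return math.ceil(CREDITS_PER_MEGAPIXEL * megapixels)


print(estimate_flux_krea_credits(1024, 1024))  # 25
print(estimate_flux_krea_credits(2048, 2048))  # 100
```

Because the documented rate is approximate, treat the result as a budgeting aid rather than an exact charge.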

Flux Ultra v1.1

The highest-resolution, highest-quality image generation model available. Premium quality at 2x credit cost.

Best For:

  • Ultra-high resolution images
  • Maximum quality requirements
  • Professional photography
  • Print materials

Key Features:

  • Best-in-class image quality
  • Highest resolution output
  • Professional color accuracy

Credits: 40 credits per image

Flux Schnell

Fast, high-quality image generation. Best value for quality.

Best For:

  • Quick generations
  • Batch processing
  • Cost-effective high-quality images

Key Features:

  • Fully uncensored
  • Fast 4-step inference
  • High quality output
  • Best value proposition

Credits: 24 credits per image

Flux SRPO

High aesthetic quality image generation with superior prompt adherence.

Best For:

  • Artistic content
  • Creative projects
  • High aesthetic standards

Credits: Variable based on settings

ByteDance Seedream v4

ByteDance's advanced text-to-image model with improved quality and prompt adherence.

Best For:

  • High-quality image generation
  • Complex prompts
  • Professional content

Key Features:

  • Multiple aspect ratios
  • Safety checker option
  • Configurable guidance scale
  • Seed control for reproducibility

Credits: 40 credits per image

Qwen Image

Intelligent image generation with natural language understanding.

Best For:

  • Complex prompt interpretation
  • Natural language descriptions
  • Intelligent scene composition

Credits: Variable

HiDream I1 Fast

Fast, high-quality image generation.

Best For:

  • Quick turnaround
  • Batch processing
  • Social media content

Credits: Variable

Flux Pro Kontext Max

Advanced image generation with superior context understanding.

Credits: Variable

Juggernaut Flux (Lightning & LoRA)

Fast Flux-based models optimized for speed and LoRA support.

Credits: Variable

Ideogram v2 Turbo / v3

Advanced text rendering and image generation.

Best For:

  • Images with text
  • Logos and graphics
  • Marketing materials

Credits: Variable

ICLight v2

Advanced lighting control for image generation.

Best For:

  • Studio photography simulation
  • Lighting experiments
  • Product photography

Credits: Variable

Hunyuan Image v3

Tencent's advanced image generation model with excellent prompt adherence.

Credits: Variable

Nano Banana Pro (Text-to-Image)

Built on Google's Gemini 3 Pro Image architecture, this model delivers production-quality text-to-image generation with industry-leading text rendering.

Best For:

  • Marketing campaign generation
  • Product visualization workflows
  • Creative content requiring text accuracy
  • Infographic and diagram creation
  • Content with typography and text elements

Key Features:

  • Multimodal understanding through Gemini 3 Pro architecture
  • Industry-leading text rendering in multiple languages
  • Advanced semantic interpretation without prompt engineering
  • Natural language creative direction
  • Character consistency for up to 5 people
  • Resolution options: 1K, 2K, 4K
  • Multiple aspect ratios (21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16)

Credits: 22 credits per image (44 credits for 4K resolution)

Usage Example:

{
  "prompt": "Professional marketing poster with text 'Grand Opening' in elegant typography, modern design, vibrant colors",
  "model": "fal-ai/nano-banana-pro",
  "settings": {
    "aspect_ratio": "16:9",
    "resolution": "2K"
  }
}

Best Practices:

  • Use natural language descriptions for best results
  • Leverage text rendering for signs, posters, and graphics
  • Specify typography style and mood for text elements
  • Ideal for batch A/B testing with consistent quality
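
A request body like the one in the usage example can be validated locally before it is sent. The sketch below checks the aspect ratio and resolution against the options listed under Key Features and looks up the documented credit cost. The helper name `build_nano_banana_request` and its default values are assumptions, not part of the API.

```python
import json

# Options copied from the Nano Banana Pro feature list in this guide.
ASPECT_RATIOS = ("21:9", "16:9", "3:2", "4:3", "5:4", "1:1", "4:5", "3:4", "2:3", "9:16")
CREDITS_BY_RESOLUTION = {"1K": 22, "2K": 22, "4K": 44}


def build_nano_banana_request(prompt, aspect_ratio="16:9", resolution="2K"):
    """Return (request_body, credits) for one image (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution not in CREDITS_BY_RESOLUTION:
        raise ValueError(f"unsupported resolution: {resolution}")
    body = {
        "prompt": prompt,
        "model": "fal-ai/nano-banana-pro",
        "settings": {"aspect_ratio": aspect_ratio, "resolution": resolution},
    }
    return body, CREDITS_BY_RESOLUTION[resolution]


body, credits = build_nano_banana_request(
    "Poster with the text 'Grand Opening' in elegant serif typography",
    resolution="4K",
)
print(json.dumps(body, indent=2))
print(credits)  # 44
```

Rejecting unsupported values locally avoids spending a round trip on a request that would likely fail.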

Image-to-Image & Editing Models

All editing models support both text-to-image and image-to-image workflows.

Seedream 4.0 Edit

Advanced AI editing with powerful manipulation capabilities.

Best For:

  • Complex image editing
  • Image modifications
  • Background replacement
  • Style transfer
  • Creative image alterations

Key Features:

  • Intelligent content understanding
  • Seamless blending
  • Context-aware editing
  • High-quality output
  • Support for multiple reference images

Credits: 12 credits per image

Usage Example:

{
  "image_url": "https://example.com/image.jpg",
  "prompt": "Replace the background with a tropical beach",
  "model": "seedream-4.0-edit"
}

Best Practices:

  • Provide clear, specific editing instructions
  • Can use multiple reference images
  • Describe desired changes in detail
  • Specify style matching requirements

Nano Banana Pro (Image-to-Image)

Google's state-of-the-art Nano Banana Pro image editing model for precise modifications using multimodal understanding.

Best For:

  • High-quality edits and adjustments
  • Multi-image editing (up to 14 images)
  • Professional image modifications
  • Advanced image transformations
  • Context-aware image blending

Key Features:

  • Multimodal semantic understanding
  • Multi-image support (up to 14 images)
  • Precise control
  • Professional results
  • Natural language editing instructions

Credits: 22 credits per image

Usage Example:

{
  "image_url": "https://example.com/image.jpg",
  "prompt": "Adjust lighting to golden hour",
  "model": "nano-banana-edit"
}

Qwen Image Edit

AI editing with natural language control and intelligence.

Best For:

  • Conversational editing commands
  • Complex multi-step edits
  • Intelligent scene understanding
  • Context-aware modifications

Key Features:

  • Natural language processing
  • Intelligent interpretation
  • Smart object recognition

Credits: Variable

Usage Example:

{
  "image_url": "https://example.com/image.jpg",
  "prompt": "Make the person smile more and add warmer lighting",
  "model": "qwen-edit"
}

Flux Krea Image-to-Image

Use Flux-Krea's powerful model for image-to-image transformations.

Credits: Same as text-to-image (~25 credits per megapixel)

Flux SRPO Image-to-Image

High aesthetic quality image editing and transformation.

Credits: Variable

3D Models

Meshy

3D model generation from text or images.

Best For:

  • Text-to-3D generation
  • Image-to-3D conversion
  • 3D asset creation
  • Product visualization
  • Game asset generation

Key Features:

  • Text-to-3D synthesis
  • Image-to-3D conversion
  • Multiple export formats
  • Optimized geometry
  • Texture generation

Usage Example:

{
  "prompt": "A modern wooden chair with metal legs",
  "model": "meshy",
  "settings": {
    "output_format": "obj",
    "texture_resolution": "2k"
  }
}

Supported Formats:

  • OBJ
  • FBX
  • GLB/GLTF
  • STL
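
The supported formats above can be enforced when a Meshy request is assembled. This is a minimal sketch: the lowercase format identifiers other than "obj" are assumptions (only "obj" appears in the usage example), and `build_meshy_request` is a hypothetical helper name.

```python
# Assumption: formats are passed as lowercase strings, as "obj" is in the example.
SUPPORTED_FORMATS = ("obj", "fbx", "glb", "gltf", "stl")


def build_meshy_request(prompt, output_format="obj", texture_resolution="2k"):
    """Build a Meshy request body after checking the export format."""
    fmt = output_format.strip().lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {output_format!r}; choose one of {SUPPORTED_FORMATS}"
        )
    return {
        "prompt": prompt,
        "model": "meshy",
        "settings": {"output_format": fmt, "texture_resolution": texture_resolution},
    }


request = build_meshy_request("A modern wooden chair with metal legs", "GLB")
print(request["settings"]["output_format"])  # glb
```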

Audio Generation Models

Sonic-3

State-of-the-art voice synthesis and audio generation.

Best For:

  • Voiceovers and narration
  • Character voices
  • Podcast audio
  • Marketing videos
  • Talking head videos (with lip sync)

Key Features:

  • Natural-sounding speech
  • Multiple voice options
  • Emotion and tone control
  • Multiple languages
  • Professional audio quality
  • Pronunciation control

Usage Example:

{
  "text": "Welcome to Influencer Studio, where AI meets creativity.",
  "model": "sonic-3",
  "settings": {
    "voice": "professional-female",
    "emotion": "enthusiastic",
    "speed": 1.0
  }
}

Available Voice Profiles:

  • Professional Male/Female
  • Casual/Conversational
  • Energetic/Enthusiastic
  • Calm/Soothing
  • Character voices
  • Multiple accents and languages

Best Practices:

  • Use proper punctuation for natural pauses
  • Specify emotion and tone for better results
  • Choose appropriate voice profile for your content
  • Adjust speed for different content types

Model Selection Guide

Choosing the Right Model

For Maximum Quality:

  • Images: Flux-Krea
  • Videos: Kling v2.6 Pro or Veo 3.1 Fast
  • Audio: Sonic-3
  • 3D: Meshy

For Speed & Efficiency:

  • Quick edits: Nano Banana Pro
  • Fast iterations: Fine-tuned Flux models
  • Talking-head video: VEED Fabric 1.0

For Creative Control:

  • Complex edits: Seedream 4.0 Edit
  • Natural language edits: Qwen Image Edit
  • Precise video control: Veo 3.1

For Character Consistency:

  • Train a custom LoRA (influencer)
  • Use Flux-Krea with your LoRA
  • Generate thousands of consistent images

Influencer Training (LoRA Models)

Train custom character models (influencers) with consistent appearance across all generations.

What is Influencer Training?

Influencer training creates a custom LoRA (Low-Rank Adaptation) model that learns the unique characteristics of a person or character from a set of reference photos. Once trained, this model can generate unlimited consistent images of that character in any pose, outfit, location, or scenario.

Training Process

Requirements:

  • 8-20 high-quality photos of the same person
  • Clear, well-lit images
  • Variety of angles and expressions
  • Consistent lighting preferred
  • Photos should show the face clearly

Training Time:

  • Typically 15-45 minutes depending on dataset size
  • Training happens in the background
  • You'll be notified when complete

Best Practices:

  1. Photo Quality: Use high-resolution, clear photos (minimum 512x512, preferably 1024x1024 or higher)
  2. Variety: Include different angles: front, side, 3/4 view
  3. Expressions: Mix of neutral and smiling expressions
  4. Consistency: Same person throughout all training images
  5. Backgrounds: Variety is good, but person should be the focus
  6. Clothing: Different outfits help the model learn the person, not just specific clothes
  7. Avoid: Blurry images, heavy filters, multiple people in frame, sunglasses/hats covering face
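
The measurable requirements above (8-20 photos, at least 512x512, preferably 1024x1024 or higher) can be checked before uploading. The sketch below works on a list of (width, height) pairs so it stays dependency-free; reading sizes from files is left out. Interpreting the minimum as applying to the shorter side is an assumption, and `check_training_set` is a hypothetical helper.

```python
MIN_PHOTOS, MAX_PHOTOS = 8, 20  # documented dataset size
MIN_SIDE = 512                  # documented minimum resolution
PREFERRED_SIDE = 1024           # documented preferred resolution


def check_training_set(sizes):
    """Return (errors, warnings) for a list of (width, height) photo sizes."""
    errors, warnings = [], []
    if not MIN_PHOTOS <= len(sizes) <= MAX_PHOTOS:
        errors.append(f"expected {MIN_PHOTOS}-{MAX_PHOTOS} photos, got {len(sizes)}")
    for index, (width, height) in enumerate(sizes, start=1):
        shortest = min(width, height)  # assumption: the shorter side must meet the minimum
        if shortest < MIN_SIDE:
            errors.append(f"photo {index}: {width}x{height} is below {MIN_SIDE}x{MIN_SIDE}")
        elif shortest < PREFERRED_SIDE:
            warnings.append(f"photo {index}: {width}x{height} is below the preferred size")
    return errors, warnings


sizes = [(1024, 1024)] * 7 + [(800, 600), (400, 400)]
errors, warnings = check_training_set(sizes)
print(errors)
print(warnings)
```

Qualitative criteria such as sharpness, variety of angles, and unobstructed faces still need a manual review.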

Using Trained Influencers

Once trained, you can use your influencer in any image generation:

{
  "prompt": "A professional headshot in business attire, studio lighting",
  "model": "flux-krea",
  "influencer_id": "your_influencer_id",
  "settings": {
    "aspect_ratio": "1:1"
  }
}

The influencer will appear in the generated image while following your prompt's instructions for pose, clothing, location, and style.
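
To reuse one trained influencer across many scenes, the same request shape can be generated per prompt. This sketch only builds request bodies with the fields shown above and does not send them; `build_influencer_requests` is a hypothetical helper and the ID value is a placeholder.

```python
import json


def build_influencer_requests(influencer_id, prompts, aspect_ratio="1:1"):
    """Build one Flux-Krea request body per prompt for the same influencer."""
    if not influencer_id:
        raise ValueError("influencer_id is required")
    return [
        {
            "prompt": prompt,
            "model": "flux-krea",
            "influencer_id": influencer_id,
            "settings": {"aspect_ratio": aspect_ratio},
        }
        for prompt in prompts
    ]


requests_to_send = build_influencer_requests(
    "your_influencer_id",  # placeholder: copy the real ID from the management panel
    ["Hiking on a mountain trail at sunrise", "Reading in a cozy cafe, soft light"],
)
print(json.dumps(requests_to_send[0], indent=2))
```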

Influencer Consistency

Trained influencers maintain:

  • Facial features and structure
  • Skin tone and texture
  • Eye color and shape
  • Hair color and style (though you can change these via prompts)
  • Overall appearance and identity

You can still customize:

  • Poses and expressions
  • Clothing and outfits
  • Locations and backgrounds
  • Lighting and mood
  • Artistic style

API Availability

Note: Influencer training is currently not available through the REST API. You can only train influencers through the web interface at app.influencerstudio.com.

However, once trained, you can use your influencers via the API by including the influencer_id parameter in image generation requests.

To get your influencer IDs:

  1. Train influencers via the web interface
  2. Find your influencer ID in the influencer management panel
  3. Use the ID in API requests for consistent character generation

Training Credits

Training an influencer consumes credits based on:

  • Number of training images
  • Training duration
  • Model complexity

Check the web interface for current training costs.

Model Pricing

Credits are consumed per generation. Some models have a fixed price per operation, while others vary with resolution, duration, or settings:

Image Generation (Text-to-Image):

  • Flux-Krea: ~25 credits per megapixel (1024x1024 = 25 credits, 2048x2048 = 100 credits)
  • Flux Ultra v1.1: 40 credits per image
  • Flux Schnell: 24 credits per image
  • ByteDance Seedream v4: 40 credits per image
  • Nano Banana Pro: 22 credits per image (44 credits for 4K)
  • Other models: typically 24-40 credits per image

Image Editing (Image-to-Image):

  • Seedream 4.0 Edit: 12 credits per image
  • Nano Banana Pro: 22 credits per image (44 for 4K)
  • Qwen Edit: Variable
  • Flux-based editing: Similar to text-to-image pricing

Video Generation (Text-to-Video):

  • Kling v2.6 Pro: 70 credits for 5s (audio off), 140 credits for 5s (audio on)
  • Veo 3.1 Fast: 80 credits per 8s video
  • MiniMax Hailuo-02 Pro: 125 credits per video
  • Wan 2.2 5B: ~1 credit per second
  • Other models: Variable based on duration and quality

Video Generation (Image-to-Video):

  • Kling v2.6 Pro: 70 credits for 5s (audio off), 140 credits for 5s (audio on)
  • Similar pricing structure to text-to-video for most models

3D Generation:

  • Meshy: Variable based on complexity

Audio Generation:

  • Sonic-3: Variable based on length

LoRA Training:

  • Custom model training: Variable based on dataset size
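
The fixed prices listed above can be combined into a quick budget estimate for a planned batch. This is an illustrative sketch only: the table copies the fixed values from this section, variable-priced models are deliberately excluded, and the key names and `estimate_batch_credits` are hypothetical.

```python
# Fixed per-operation prices copied from the lists above (credits).
FIXED_CREDITS = {
    "Flux Ultra v1.1": 40,
    "Flux Schnell": 24,
    "ByteDance Seedream v4": 40,
    "Seedream 4.0 Edit": 12,
    "Nano Banana Pro": 22,
    "Nano Banana Pro (4K)": 44,
    "Kling v2.6 Pro (5s, audio off)": 70,
    "Kling v2.6 Pro (5s, audio on)": 140,
    "Veo 3.1 Fast (8s)": 80,
    "MiniMax Hailuo-02 Pro": 125,
}


def estimate_batch_credits(plan):
    """Sum credits for a plan mapping operation name to number of generations."""
    total = 0
    for operation, count in plan.items():
        if operation not in FIXED_CREDITS:
            raise KeyError(f"no fixed price listed for {operation!r}")
        if count < 0:
            raise ValueError("counts must not be negative")
        total += FIXED_CREDITS[operation] * count
    return total


plan = {"Flux Schnell": 10, "Seedream 4.0 Edit": 4, "Kling v2.6 Pro (5s, audio on)": 2}
print(estimate_batch_credits(plan))  # 10*24 + 4*12 + 2*140 = 568
```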

API Access

All generation models are available through our REST API. Influencer training is the exception: it is available only through the web interface.

Model Updates

We continuously update and improve our models. Check our changelog for:

  • New model releases
  • Performance improvements
  • Feature additions
  • Deprecated models

Support

Need help choosing the right model?

  • Check the API documentation for detailed parameters
  • Try different models in the web interface
  • Contact support for recommendations
  • Review example use cases in our gallery