Cinematography for AI Video Creators

Most AI video prompts fail in the same place. The subject is fine. The action is fine. What's missing is the language a director or DP would use to describe how the camera sees it. That's the gap this guide fills.

Stefan

Why Cinematography Keywords Matter


AI video models are trained on enormous libraries of film, TV, advertising, and stock footage tagged with the technical vocabulary of the trade. Phrases like "low angle tracking shot" or "shallow depth of field with bokeh" map onto huge clusters of training examples. When you use that vocabulary, you give the model a direct route to a specific visual outcome.

Prompts written in everyday language give the model a thousand directions to wander. Prompts written in cinematography language give it one.

Shot Sizes

Shot size is how much of the subject and scene fits in the frame. It is the single most powerful framing decision you can make in a prompt, and the easiest one to get wrong by leaving it out.

Shot Size | Subject in Frame | Use Cases
Extreme Wide Shot (EWS) | Tiny within the environment | Setting scale, isolation, establishing place
Wide Shot (WS) | Full body, with the environment around it | Action, dance, full-body product shots
Medium Shot (MS) | Waist up | Dialogue, presenter-style content
Medium Close-Up (MCU) | Chest up | Interviews, conversational content with hand gestures
Close-Up (CU) | Face filling most of the frame | Emotional beats, reaction shots, thumbnails
Extreme Close-Up (ECU) | A single feature: eyes, lips, product detail | Texture, detail, drama

Pro tip: If the model gives you a shot at the wrong distance, naming the shot size explicitly tends to fix it faster than describing the size with adjectives.

Camera Angles

Angle refers to the position of the camera in relation to the subject in your video. It changes the emotional reading of the shot before any other element does.

With practice you'll get a sense of how these can be mixed and matched within a scene, but here are the basics:

Angle | Camera Position | Emotional Effect
Eye Level | Camera at the subject's eye height | Neutral, conversational, equal footing
Low Angle | Camera below the subject, pointing up | Powerful, dominant, heroic
High Angle | Camera above the subject, pointing down | Small, vulnerable, observed
Dutch Angle | Camera tilted off the horizontal | Unease, chaos, disorientation
Bird's Eye View | Camera directly overhead | Pattern, geometry, god-like detachment
Worm's Eye View | Camera at ground level, pointing straight up | Towering, surreal, dramatic
Over-the-Shoulder (OTS) | Camera behind one subject, facing another | Conversation, intimacy, perspective

Notice how changing only the camera angle makes the scene feel completely different, and makes the subject seem more or less in control of it.

High angle:

Low/medium angle:

Pro tip: With Influencer Studio's Cinema Mode, you don't have to describe the shot size in the prompt at all; you can select it as one of the settings on the shot.

Camera Movement

From the standpoint of technical capabilities, movement is where AI video has improved most dramatically over the last 18 months. Prompting precisely is the difference between a static scene with a wobble and a shot that feels directed and intentional.

Movement | What it does
Static | Camera locked off. No movement.
Slow push in | Camera physically moves toward the subject. Builds tension or intimacy.
Slow pull out | Camera physically moves away from the subject. Creates scale or separation.
Zoom in | Optical, not physical: the lens tightens. Reads as tension or unease.
Zoom out | Optical, not physical: the lens widens. Reads as reveal or detachment.
Pan left | Camera rotates horizontally left from a fixed point.
Pan right | Camera rotates horizontally right from a fixed point.
Tilt up | Camera rotates vertically upward from a fixed point.
Tilt down | Camera rotates vertically downward from a fixed point.
Orbit left | Camera circles the subject to the left. Good for reveals and product shots.
Orbit right | Camera circles the subject to the right. Good for reveals and product shots.
Handheld | Slight natural imperfection in movement. Documentary feel, intimacy, urgency.

Something as simple as whether the camera pushes in on the subject or pulls away from them can turn an intense scene into an expository one.

Bringing more or less of the surroundings into frame changes not only what the viewer sees but also what the viewer feels.

Take a look at this scene where the camera closes in on our subject:

And compare it with this one, where the camera is moving away:

Can you sense how one makes you focus on the moment while the other makes you focus on the context around the moment?

Pro tip: One movement per prompt. Stacking competing instructions like "slow push in with a pan left and handheld movement" gives the model nowhere to go, and it will have to guess which movement to use. Pick the one that serves the shot.
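If you write prompts in bulk, for example from a spreadsheet or a script, a small check can catch stacked movements before you spend a generation on them. Here is a minimal sketch in Python, assuming plain-text prompts. `find_movements`, `is_single_movement`, and the keyword list are hypothetical names built from the movement table above, not part of any video tool, and simple keyword matching will miss paraphrases like "the camera drifts closer."

```python
import re

# Hypothetical keyword list taken from the movement table above; adapt it to your own vocabulary.
MOVEMENT_TERMS = [
    "static", "push in", "pull out", "zoom in", "zoom out",
    "pan left", "pan right", "tilt up", "tilt down",
    "orbit left", "orbit right", "handheld",
]


def find_movements(prompt: str) -> list[str]:
    """Return the movement terms found in the prompt, in table order."""
    text = prompt.lower()
    # \b word boundaries stop "static" from matching inside words like "ecstatic".
    return [
        term for term in MOVEMENT_TERMS
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    ]


def is_single_movement(prompt: str) -> bool:
    """True if the prompt names at most one camera movement."""
    return len(find_movements(prompt)) <= 1
```

For instance, the stacked example from the pro tip, "slow push in with a pan left and handheld movement", returns ['push in', 'pan left', 'handheld'], so `is_single_movement` reports False.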

Framing and Composition

Composition is how elements are arranged inside the frame. AI models respond well to a small set of named compositional rules. You can experiment with them and discover where some overlap, but here is what you can generally expect from each.

Technique | What it does
Rule of thirds | Subject placed on a third line rather than dead center. Naturally pleasing, leaves room for context.
Centered composition | Subject perfectly centered. Formal, intentional. Wes Anderson or Kubrick references land well here.
Leading lines | Roads, hallways, fences, or light beams that draw the eye toward the subject.
Frame within a frame | Subject seen through a doorway, window, or archway. Adds depth and context.
Negative space | Large empty area around the subject. Loneliness, scale, or room for a text overlay in social content.
Foreground/midground/background | Three layers of depth. Naming all three in a prompt almost always produces a more dimensional shot.

Composition is arguably the most "artistic" choice to make about a scene. Many directors make a certain type of composition their trademark, and once you see it, it's impossible to miss.

While most of the other decisions about a scene have relatively predictable outcomes (wide shots lead to an epic feeling and so on), framing is a bit ambiguous. The same sequence, framed differently, can either ground the action or completely disorient the viewer (sometimes both).

If you put the subject dead center in the frame, the viewer immediately understands what the scene is focused on and what the shot is about.

But leave enough empty space around the subject and the shot loses its obvious focal point, leaving the viewer's eye free to wander.

Pro tip: When you want the model to leave room for a logo, caption, or product overlay in social ad creatives, say so! Try including something like "negative space on the right for text overlay" in your prompt.

Depth of Field

Depth of field is what separates a shot that looks like a phone clip from a shot that looks like it was filmed on a cine lens. AI models handle this concept well when you name it explicitly.

Technique | What it does
Shallow depth of field | Narrow plane of focus. Background blurs into bokeh. Cinematic, intimate, isolates the subject.
Deep depth of field | Foreground and background both sharp. Documentary, landscape, environmental.
Bokeh | The quality of the out-of-focus areas. Specify "creamy bokeh" or "anamorphic bokeh" for character.
Rack focus | Focus shifts from one subject to another within the same shot. A directed reveal.
Macro focus | Extreme close focus on tiny detail: pollen, water droplets, fabric weave.

Depth of field is less about aesthetics and more about attention. It tells the viewer where to look... and, just as importantly, what to ignore. A shallow depth of field collapses the world down to one thing (useful when you want viewers to focus on your product!). A deep depth of field says everything in the frame is relevant.

Lighting

Lighting often goes unnoticed because we're just so used to seeing certain standard lighting setups. Often, you don't even realize how lighting is adding to the emotional weight of a scene.

However, you notice it immediately when it's wrong. With AI video especially, lighting is what separates slop from engaging cinematography.

Most AI models have a strong default style, often slightly oversaturated and evenly lit. Naming a lighting setup overrides that default and can strip away the telltale "AI look" of your scene.

Setup | What it does
Three-point lighting | Key light, fill light, backlight. The studio standard. Clean, professional, talking-head ready.
Rembrandt lighting | Key light at 45 degrees, creating a small triangle of light on the shadow-side cheek. Classic portrait look.
Golden hour | Warm, low-angle sunlight in the hour after sunrise or before sunset. Soft, flattering, cinematic shorthand for emotion.
Blue hour | The brief twilight window just after sunset (or just before sunrise). Cool, moody, dusk atmosphere.
Practical lighting | Light sources visible in the frame: lamps, neon signs, candles, screens. Realism and atmosphere.
Backlight / rim light | Light source behind the subject. Creates a halo and separates the subject from the background.
Silhouette | Subject completely dark against a bright background. Mood, mystery, anonymity.
High key | Bright, low contrast, evenly lit. Beauty, fashion, comedy.
Low key | Dark, high contrast, deep shadows. Drama, thriller, noir.
Chiaroscuro | Strong contrast between dark and light within the same frame. Painterly, Caravaggio-adjacent.
Hard light | Produces sharp-edged shadows. Directional, dramatic, unforgiving.
Soft light | Wraps around the subject. Flattering, natural, commercial.

Of all the cinematography decisions you can make, lighting has the most immediate impact on mood. Keep the same subject, the same framing, and the same movement, light it differently, and you have a completely different scene.

Check out this shot in standard three-point lighting:

Compare that with a shot with harsh overhead lighting:

The difference isn't subtle. Three-point lighting says: this person is in control, this is a professional environment, trust what they're telling you. Harsh overhead lighting says something is wrong. Same person, same room, completely different story.

Pro tip: When a generation comes out flat, the issue is usually lighting direction, not lighting amount. Specify where the key light is coming from. "Key light from camera left at 45 degrees" gives the model a clear instruction.

Putting It Together: The Cinematography Stack

Every section of this guide covers one decision.

When you combine them in a single prompt, the results improve dramatically. Think of it as a checklist you run through before you commit to a generation.

Start with shot size: how far is the camera from the subject? Then decide on an angle. Then movement. Then think about where the subject sits in the frame, what the lighting is doing, what is in focus, and finally, whether a film reference would push the look in a specific direction.

A prompt that answers all seven of those questions will outperform one that answers two or three almost every time. Not because the model needs the information, but because the specificity removes the guesswork, and guesswork is where AI video goes wrong.
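If you generate prompts from a script, the checklist translates naturally into a template. The sketch below is one hypothetical way to do it in Python: `CinematographyStack` and `build_prompt` are illustrative names, not any model's or tool's API, and the comma-separated output is an assumed format, not a required prompt syntax. It puts shot size first, as the checklist suggests, and reports which of the seven decisions were left blank.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CinematographyStack:
    """Hypothetical container for the seven checklist decisions, in checklist order."""
    shot_size: str
    angle: str
    movement: str                    # one movement only, per the pro tip on camera movement
    composition: str
    lighting: str
    focus: str                       # depth of field: what is sharp, what falls into bokeh
    reference: Optional[str] = None  # optional film or director reference

    def missing(self) -> list[str]:
        """Names of decisions left blank, i.e. the guesswork left to the model."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def build_prompt(subject: str, stack: CinematographyStack) -> str:
    """Join shot size, subject, and the remaining filled-in decisions into one prompt."""
    values = [getattr(stack, f.name) for f in fields(stack)]
    # Shot size leads, the subject follows, then the other decisions in checklist order.
    parts = [values[0], subject] + values[1:]
    # Blank or None entries are skipped rather than left as empty commas.
    return ", ".join(p for p in parts if p)
```

For example, a stack of "medium close-up", "low angle", "slow push in", "rule of thirds", "golden hour", and "shallow depth of field" for the subject "a chef plating a dessert" produces "medium close-up, a chef plating a dessert, low angle, slow push in, rule of thirds, golden hour, shallow depth of field", and `missing()` returns ["reference"], flagging the one question left unanswered.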