Why Cinematography Keywords Matter
Most AI video prompts fail in the same place. The subject is fine. The action is fine. What's missing is the language a director or DP would use to describe how the camera sees it. That's the gap this guide fills.
AI video models are trained on enormous libraries of film, TV, advertising, and stock footage tagged with the technical vocabulary of the trade. Phrases like "low angle tracking shot" or "shallow depth of field with bokeh" map onto huge clusters of training examples. When you use that vocabulary, you give the model a direct route to a specific visual outcome.
Prompts written in everyday language give the model a thousand directions to wander. Prompts written in cinematography language give it one.
Shot Sizes
Shot size is how much of the subject and scene fits in the frame. It is the single most powerful framing decision you can make in a prompt, and the easiest one to get wrong by leaving it out.
Shot Size | Subject | Use Cases |
|---|---|---|
Extreme Wide Shot (EWS) | Tiny within the environment | Setting scale, isolation, establishing place |
Wide Shot (WS) | Full body with the environment around it | Action, dance, full-body product shots |
Medium Shot (MS) | Waist up | Dialogue, presenter-style content |
Medium Close-Up (MCU) | Chest up | Interviews, conversational content with hand gestures |
Close-Up (CU) | Face filling most of the frame | Emotional beats, reaction shots, thumbnails |
Extreme Close-Up (ECU) | A single feature: eyes, lips, product detail | Texture, detail, drama |
Pro tip: If the model gives you a shot at the wrong distance, naming the shot size explicitly tends to fix it faster than describing the size with adjectives.
Camera Angles
Angle refers to the position of the camera in relation to the subject in your video. It changes the emotional reading of the shot before any other element does.
With practice, you'll get a sense of how to mix and match these within a scene, but here are the basics:
Angle | Camera Position | Emotional Effect |
|---|---|---|
Eye Level | Camera at subject's eye height | Neutral, conversational, equal footing |
Low Angle | Camera below subject, pointing up | Powerful, dominant, heroic |
High Angle | Camera above subject, pointing down | Small, vulnerable, observed |
Dutch Angle | Camera tilted off horizontal | Unease, chaos, disorientation |
Bird's Eye View | Camera directly overhead | Pattern, geometry, god-like detachment |
Worm's Eye View | Camera at ground level, pointing straight up | Towering, surreal, dramatic |
Over-the-Shoulder (OTS) | Camera behind one subject, facing another | Conversation, intimacy, perspective |
Notice how changing only the camera angle makes the scene feel completely different, and changes how much the subject seems to be in control of it.
High angle:
Low/medium angle:
Pro tip: With Influencer Studio's Cinema Mode, you don't have to rely on the prompt alone to get the right shot distance, because you can select the shot size as one of the settings on the shot.
Camera Movement
From the standpoint of technical capabilities, movement is where AI video has improved most dramatically over the last 18 months. Prompting precisely is the difference between a static scene with a wobble and a shot that feels directed and intentional.
Movement | What it does |
|---|---|
Static | Camera locked off. No movement. |
Slow push in | Camera physically moves toward the subject. Builds tension or intimacy. |
Slow pull out | Camera physically moves away from the subject. Creates scale or separation. |
Zoom in | Optical, not physical. Lens tightens. Reads as tension or unease. |
Zoom out | Optical, not physical. Lens widens. Reads as reveal or detachment. |
Pan left | Camera rotates horizontally left from a fixed point. |
Pan right | Camera rotates horizontally right from a fixed point. |
Tilt up | Camera rotates vertically upward from a fixed point. |
Tilt down | Camera rotates vertically downward from a fixed point. |
Orbit left | Camera circles the subject to the left. Good for reveals and product shots. |
Orbit right | Camera circles the subject to the right. Good for reveals and product shots. |
Handheld | Slight natural imperfection in movement. Documentary feel, intimacy, urgency. |
Something as simple as whether the camera is pushing in on the subject or pulling away from them can turn an intense, personal moment into an expository one.
Revealing or hiding the surroundings controls not only what the viewer sees, but also what the viewer feels.
Take a look at this scene where the camera closes in on our subject:
And compare it with this one, where the camera is moving away:
Can you sense how one makes you focus on the moment while the other makes you focus on the context around the moment?
Pro tip: One movement per prompt. Stacking competing instructions like "slow push in with a pan left and handheld movement" gives the model nowhere to go, and it will have to guess which movement to use. Pick the one that serves the shot.
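The one-movement rule is easy to check before you spend a generation. Here is a minimal Python sketch, assuming simple phrase matching is good enough: `MOVEMENTS`, `find_movements`, and `check_one_movement` are hypothetical names, and the keyword list is just the movement names from the table above. It will not catch synonyms or rephrasings such as "dolly in" or "pushes in".

```python
import re

# Movement names taken from the Camera Movement table (hypothetical list;
# exact phrase matching is an assumption, so synonyms are not detected).
MOVEMENTS = [
    "static", "push in", "pull out", "zoom in", "zoom out",
    "pan left", "pan right", "tilt up", "tilt down",
    "orbit left", "orbit right", "handheld",
]


def find_movements(prompt: str) -> list[str]:
    """Return every movement phrase found in the prompt, in table order."""
    text = prompt.lower()
    # \b word boundaries stop "static" from matching inside longer words.
    return [m for m in MOVEMENTS if re.search(rf"\b{re.escape(m)}\b", text)]


def check_one_movement(prompt: str) -> str:
    """Warn when a prompt asks for more than one camera movement."""
    found = find_movements(prompt)
    if len(found) > 1:
        return f"Conflicting movements: {', '.join(found)}. Pick one."
    return "OK"


print(check_one_movement("Slow push in with a pan left and handheld movement"))
print(check_one_movement("Medium shot of a barista, slow push in, golden hour"))
```

The first prompt is flagged because it names three movements; the second passes because it commits to a single push in.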
Framing and Composition
Composition is how elements are arranged inside the frame. AI models respond well to a small set of named compositional rules. You can experiment with these and discover how some of them overlap, but here is what you can generally expect from each.
Technique | What it does |
|---|---|
Rule of thirds | Subject placed on a third line rather than dead center. Naturally pleasing, leaves room for context. |
Centered composition | Subject perfectly centered. Formal, intentional. Wes Anderson or Kubrick references land well here. |
Leading lines | Roads, hallways, fences, light beams that draw the eye toward the subject. |
Frame within a frame | Subject seen through a doorway, window, or archway. Adds depth and context. |
Negative space | Large empty area around the subject. Loneliness, scale, or room for a text overlay in social content. |
Foreground/midground/background | Three layers of depth. Naming all three in a prompt almost always produces a more dimensional shot. |
Composition is arguably the most "artistic" choice you can make about a scene. Many directors make a certain type of composition their trademark, and once you see it, it's impossible to miss.
While most of the other decisions about a scene have relatively predictable outcomes (wide shots lead to an epic feeling and so on), framing is a bit ambiguous. The same sequence, framed differently, can either ground the action or completely disorient the viewer (sometimes both).
If we put the subject dead center in the frame, the viewer immediately understands what the scene is focused on and what the shot is about.
But if you leave enough empty space around the subject, the shot loses any single obvious focal point.
Pro tip: When you want the model to leave room for a logo, caption, or product overlay in social ad creatives, say so! Try including something like "negative space on the right for text overlay" in your prompt.
Depth of Field
Depth of field is what separates a shot that looks like a phone clip from a shot that looks like it was filmed on a cine lens. AI models handle this concept well when you name it explicitly.
Technique | What it does |
|---|---|
Shallow depth of field | Narrow plane of focus. Background blurs into bokeh. Cinematic, intimate, isolates the subject. |
Deep depth of field | Foreground and background both sharp. Documentary, landscape, environmental. |
Bokeh | The quality of the out-of-focus areas. Specify "creamy bokeh" or "anamorphic bokeh" for character. |
Rack focus | Focus shifts from one subject to another within the same shot. A directed reveal. |
Macro focus | Extreme close focus on tiny detail. Pollen, water droplets, fabric weave. |
Depth of field is less about aesthetics and more about attention. It tells the viewer where to look... and, just as importantly, what to ignore. A shallow depth of field collapses the world down to one thing (useful when you want viewers to focus on your product!). A deep depth of field says everything in the frame is relevant.
Lighting
Lighting often goes unnoticed because we're so used to seeing certain standard setups. Most of the time, you don't even realize how much the lighting is adding to the emotional weight of a scene.
But you notice immediately when it's wrong. With AI video especially, lighting is what separates slop from engaging cinematography.
Most AI models have a strong default style, often slightly oversaturated and evenly lit. Naming a lighting setup overrides that default and can strip away that "AI look" and feel.
Setup | What it does |
|---|---|
Three-point lighting | Key light, fill light, backlight. The studio standard. Clean, professional, talking head ready. |
Rembrandt lighting | Key light at 45 degrees creating a small triangle of light on the shadow-side cheek. Classic portrait look. |
Golden hour | Warm, low-angle sunlight in the hour after sunrise or before sunset. Soft, flattering, cinematic shorthand for emotion. |
Blue hour | The brief window just before sunrise or just after sunset. Cool, moody, twilight atmosphere. |
Practical lighting | Light sources visible in the frame: lamps, neon signs, candles, screens. Realism and atmosphere. |
Backlight / rim light | Light source behind the subject. Creates a halo and separates the subject from the background. |
Silhouette | Subject completely dark against a bright background. Mood, mystery, anonymity. |
High key | Bright, low contrast, evenly lit. Beauty, fashion, comedy. |
Low key | Dark, high contrast, deep shadows. Drama, thriller, noir. |
Chiaroscuro | Strong contrast between dark and light within the same frame. Painterly, Caravaggio-adjacent. |
Hard light | Produces sharp-edged shadows. Directional, dramatic, unforgiving. |
Soft light | Wraps around the subject. Flattering, natural, commercial. |
Of all the cinematography decisions you can make, lighting has the most immediate impact on mood. Keep the same subject, the same framing, and the same movement, light it differently, and you have a completely different scene.
Check out this shot in standard three-point lighting:
Compare that with a shot with harsh overhead lighting:
The difference isn't subtle. Three-point lighting says: this person is in control, this is a professional environment, trust what they're telling you. Harsh overhead lighting says something is wrong. Same person, same room, completely different story.
Pro tip: When a generation comes out flat, the issue is usually lighting direction, not lighting amount. Specify where the key light is coming from. "Key light from camera left at 45 degrees" gives the model a clear instruction.
Putting It Together: The Cinematography Stack
Every section of this guide covers one decision. When you combine them in a single prompt, the results improve dramatically. Think of it as a checklist you run through before you commit to a generation.
Start with shot size: how far is the camera from the subject? Then decide on an angle. Then movement. Then think about where the subject sits in the frame, what the lighting is doing, what is in focus, and finally, whether a film reference would push the look in a specific direction.
A prompt that answers all seven of those questions will outperform one that answers two or three, almost every time. Not because the model needs the information but because the specificity removes the guesswork, and guesswork is where AI video goes wrong.
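If you generate a lot of shots, the seven-question checklist maps neatly onto a small data structure. Below is a minimal Python sketch, not a feature of any particular tool: `ShotSpec`, `build_prompt`, and `unanswered` are hypothetical names, and joining the answers with commas (shot size first, then the subject, then the rest in checklist order) is just one assumption about how to format a prompt.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotSpec:
    """Hypothetical container for the seven checklist answers, in checklist order."""
    shot_size: str = ""
    angle: str = ""
    movement: str = ""
    composition: str = ""
    lighting: str = ""
    focus: str = ""
    film_reference: str = ""


def build_prompt(subject: str, spec: ShotSpec) -> str:
    """Lead with the shot size, then the subject, then every other answered item."""
    answers = [getattr(spec, f.name) for f in fields(spec)]
    parts = [answers[0], subject, *answers[1:]]
    # Skip blank answers so the prompt has no empty ", ," gaps.
    return ", ".join(p.strip() for p in parts if p.strip())


def unanswered(spec: ShotSpec) -> list[str]:
    """List the checklist items still blank, so you can fill them before generating."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


spec = ShotSpec(
    shot_size="medium close-up",
    angle="low angle",
    movement="slow push in",
    lighting="Rembrandt lighting, key light from camera left at 45 degrees",
    focus="shallow depth of field with creamy bokeh",
)
print(build_prompt("a chef plating a dessert in a dark kitchen", spec))
print(unanswered(spec))
```

Here `unanswered` reports that composition and a film reference are still open, which is exactly the kind of guesswork the checklist is meant to remove.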

