How to Use Google Gemini Omni: A Step-by-Step Guide

Google Gemini Omni generates video from text, images, or existing footage right in your browser, with no software to install and no GPU required. This step-by-step guide covers everything: creating your account, understanding the three input types, writing effective prompts, and getting the most out of your credit balance.

By the end of this guide, you’ll have generated your first video and will understand exactly how the credit system works, so you don’t waste any credits.

Step 1: Sign up and choose a plan

Create an account with your email or Google account. Note: there is no free tier. You can sign up and browse the dashboard for free, but generating requires a paid plan. Plans start at $49/mo for 1,200 credits.

A 1080p, 8-second video costs approximately 150 credits. At the Starter tier, that’s about 8 full videos per month. If you need more, Pro at $69/mo gives you 4,000 credits (~26 videos/month at 1080p) and includes API access.

See the full pricing breakdown including the yearly discount options.
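The per-plan numbers above are plain division of monthly credits by per-clip cost. A minimal Python sketch of that arithmetic, using the figures quoted in this guide (the names are illustrative only; this is not a Gemini Omni API):

```python
# Plan sizes and clip cost as quoted in this guide; treat them as
# approximate and check the current pricing page before relying on them.
PLAN_CREDITS = {"Starter": 1_200, "Pro": 4_000}
COST_1080P_8S = 150  # approximate credits for one 1080p, 8-second clip


def full_videos_per_month(monthly_credits: int, cost_per_video: int) -> int:
    """Number of complete clips a monthly credit balance covers (rounded down)."""
    return monthly_credits // cost_per_video


for plan, credits in PLAN_CREDITS.items():
    print(plan, full_videos_per_month(credits, COST_1080P_8S))
```

Rounding down matters: Pro’s 4,000 credits cover 26 complete 1080p clips, with 100 credits left over.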

Step 2: Understanding the three input types

Gemini Omni generates video three ways, all from the same form and the same credit balance. Choosing the right input is the most important decision before you click Generate.

Text to Video

The workhorse. You type a description and the model generates a clip from scratch.

Best for: B-roll, ad creative, atmospheric footage, abstract visuals, concept work.

Image to Video

You upload a single still and the model animates it, using your image as the first frame. Drop in one product photo and get a polished moving clip, or bring a character still to life.

Best for: product videos, turning photography into motion, e-commerce.

Video to Video

You provide an existing clip plus a prompt, and the model restyles or transforms it without a reshoot.

Best for: restyling footage, refreshing old clips, applying a new look to existing video.

Step 3: Open the Playground

The Playground is on the homepage. You’ll see the Text to Video form, with optional drop zones to attach a reference image (for image-to-video) or a reference video (for video-to-video).

You don’t need to configure anything before your first generation: the defaults (1080p, 8 seconds) are sensible for most use cases.

Step 4: Write your first prompt

The single biggest factor in output quality is prompt quality. A vague prompt produces generic output; a specific prompt gets you much closer to what you need.

The anatomy of a strong video prompt

Every good video prompt has five elements:

1. Subject: who or what is in the video?

“A golden retriever puppy” (specific) vs “a dog” (generic)

2. Action: what is the subject doing?

“sprinting through a sunlit wheat field, ears flapping”

3. Setting: where, when, and under what conditions?

“late afternoon, golden hour, slight warm breeze, summer”

4. Style: what visual quality or aesthetic?

“cinematic 4K, shallow depth of field, warm color grade”

5. Camera movement: how does the frame move?

“slow dolly forward, low angle”

Assembled prompt:

“A golden retriever puppy sprinting through a sunlit wheat field, ears flapping, late afternoon golden hour, slight warm breeze, cinematic 4K shallow depth of field, slow dolly forward at a low angle”

Compare that to “a dog running”: the difference in output quality is significant.
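If you write many prompts, it can help to assemble them from the five elements the same way every time. A small Python sketch under that assumption; `build_prompt` is a hypothetical helper, and the Playground simply takes the resulting text:

```python
def build_prompt(subject: str, action: str, setting: str, style: str, camera: str) -> str:
    """Assemble a video prompt from the five elements.

    Subject and action are joined with a space ("A puppy sprinting ..."),
    and the remaining elements are appended as comma-separated phrases.
    Empty elements are skipped so they don't leave stray commas.
    """
    head = f"{subject} {action}".strip()
    parts = [head, setting.strip(), style.strip(), camera.strip()]
    return ", ".join(p for p in parts if p)


prompt = build_prompt(
    subject="A golden retriever puppy",
    action="sprinting through a sunlit wheat field, ears flapping",
    setting="late afternoon golden hour, slight warm breeze",
    style="cinematic 4K shallow depth of field",
    camera="slow dolly forward at a low angle",
)
print(prompt)
```

With these inputs the output matches the assembled puppy prompt word for word.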

Prompt examples by use case

Product shot (e-commerce):

“A premium sneaker rotating slowly on a minimalist white platform, dramatic studio lighting with subtle blue rim light, professional product photography, 4K”

Abstract / tech:

“Glowing data streams flowing through dark abstract networks, blue and white particles, deep space background, technology aesthetic, 4K, slow camera drift”

Real estate exterior:

“Modern home exterior at golden hour, subtle forward camera push, lush landscaping, professional real estate style, 4K”

Vertical social content:

“A barista pouring latte art in a cozy coffee shop, warm lighting, slow motion steam rising, vertical 9:16 format for Instagram Reels”

Step 5: Set resolution and duration

After writing your prompt, set:

Resolution:

720p: fast and cheap; use for drafts and iteration
1080p: standard for social media, YouTube, and most other uses (default)
4K: for broadcast, large screens, and hero assets

Credit cost scales with resolution: roughly 75 credits (720p), 150 credits (1080p), 300 credits (4K) for an 8-second clip.

Duration:

Range: 4–10 seconds per clip
8 seconds is the default and works for most use cases
For longer sequences, generate multiple clips and edit them together
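To estimate a clip’s cost before you open the Playground, you can combine the resolution costs with the duration limits. This is a rough sketch: the guide only quotes 8-second prices, so the linear scaling with duration and the rounding up to whole credits are assumptions, and the Playground’s own estimate is what actually counts.

```python
# Approximate credits for an 8-second clip, as quoted in this guide.
CREDITS_PER_8S = {"720p": 75, "1080p": 150, "4K": 300}
MIN_SECONDS, MAX_SECONDS = 4, 10  # allowed clip duration range


def estimate_credits(resolution: str, seconds: int) -> int:
    """Rough credit estimate for one clip.

    ASSUMPTION: cost scales linearly with duration and is rounded up to a
    whole credit. Confirm against the estimate shown in the Playground.
    """
    if resolution not in CREDITS_PER_8S:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
    # Integer ceiling of (8-second price * seconds / 8).
    return (CREDITS_PER_8S[resolution] * seconds + 7) // 8


print(estimate_credits("1080p", 8))
print(estimate_credits("720p", 4))
```

For example, `estimate_credits("1080p", 10)` returns 188 under these assumptions.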

Step 6: Generate and review

Hit Generate. Most jobs complete in 30–90 seconds. You’ll see a progress indicator in the Playground; when it’s done, the video appears inline.

If the output isn’t quite right:

Adjust the prompt: add more detail on the specific element that’s off
Change the style modifier: try “documentary style” vs “cinematic” vs “photorealistic”
Try again: diffusion models are stochastic, so the same prompt can produce meaningfully different results on each run
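When you compare style modifiers, keep the rest of the prompt fixed so any difference in output comes from the modifier (plus the model’s randomness). A tiny sketch that produces one variant per modifier for you to paste into the Playground; the guide doesn’t describe programmatic submission, so none is assumed:

```python
STYLE_MODIFIERS = ["documentary style", "cinematic", "photorealistic"]


def style_variants(base_prompt: str, styles: list[str]) -> list[str]:
    """Return one prompt per style modifier, each appended to the same base description."""
    return [f"{base_prompt}, {style}" for style in styles]


base = "A barista pouring latte art in a cozy coffee shop, warm lighting"
for variant in style_variants(base, STYLE_MODIFIERS):
    print(variant)
```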

Step 7: Download and use

Click the download button to get your MP4. The file is yours: a commercial license is included on all paid plans. Use it in:

Social media (TikTok, Reels, YouTube)
Paid advertising (Meta, Google, YouTube ads)
Client deliverables
Products you sell (as embedded content)

See the full commercial license terms for what’s included.

Step 8: Find your jobs in History

All your generations are saved in /history. You can:

Filter by model type, status, or date
Re-run any job with the original settings (useful when you find a prompt that works)
Download outputs again if you forgot to save them
Export to CSV for billing reconciliation
Bulk delete jobs you no longer need
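The CSV export makes it easy to total credit spend per billing month. The guide doesn’t document the export’s columns, so the column names below (`created_at`, `credits`) and the ISO date format are hypothetical placeholders; adjust them to the real header. The sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL column names: the export format isn't documented in this
# guide, so change these to match the real CSV header.
DATE_COLUMN = "created_at"
CREDITS_COLUMN = "credits"


def credits_by_month(csv_text: str) -> dict[str, int]:
    """Sum credits per YYYY-MM month from an exported history CSV."""
    totals: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row[DATE_COLUMN][:7]  # assumes ISO dates such as 2025-01-31
        totals[month] += int(row[CREDITS_COLUMN])
    return dict(totals)


# Made-up sample data, not real output.
sample = """created_at,credits
2025-01-03,150
2025-01-17,75
2025-02-02,300
"""
print(credits_by_month(sample))
```

To run it on a real export, pass the file’s contents, e.g. `credits_by_month(open("history.csv", newline="").read())`, where `history.csv` is whatever name you saved the download under.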

Using image to video

Image-to-video has one extra step: attaching a reference image.

In the Playground, drop a still into the reference image zone (or click to browse)
Write a prompt describing the motion you want: “Product rotating slowly on a minimalist pedestal, soft studio light, subtle reflections”
Set your resolution and duration
Generate: the model uses your image as the first frame and animates from there

Tips for image to video:

Use a clean, well-lit image: the clearer the subject, the better the motion
Describe the motion explicitly (“slow rotation,” “gentle camera push-in”) rather than leaving it open
Front-lit product shots on simple backgrounds animate the most reliably

Using video to video

Video-to-video restyles footage you already have.

Drop an existing clip into the reference video zone in the Playground
Write a prompt describing the new look: “Restyle as a warm, cinematic film grade with shallow depth of field”
Generate: the model transforms your clip while keeping its motion

Tips for video to video:

Shorter source clips transform faster and cost fewer credits
Be specific about the target style (“anime,” “film noir,” “watercolor”) for a clear result
Keep the original framing simple: busy scenes are harder to restyle cleanly

Optimizing your credit usage

Credits are precious, especially on Starter. Here’s how to use them efficiently:

Draft at 720p, finish at 1080p or 4K. A 720p draft costs half the credits of a 1080p render and a quarter of a 4K one. Iterate on the prompt at 720p, then do one final render at your target resolution once the prompt is dialed in.

Short durations for A/B testing. Generate 4-second clips when testing variations of a prompt. That’s roughly twice as many tests for the same credits as 8-second clips.
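The savings from drafting small add up quickly. A sketch comparing six attempts rendered straight at 4K with five 4-second 720p drafts plus one 8-second 4K final, using the guide’s 8-second prices and the assumption that cost scales linearly with duration (rounded up). Function names are hypothetical:

```python
# Approximate 8-second costs quoted in this guide.
CREDITS_PER_8S = {"720p": 75, "1080p": 150, "4K": 300}


def clip_cost(resolution: str, seconds: int) -> int:
    """ASSUMPTION: cost is linear in duration, rounded up to whole credits."""
    return (CREDITS_PER_8S[resolution] * seconds + 7) // 8


def workflow_cost(drafts: int, draft_res: str, draft_secs: int,
                  final_res: str, final_secs: int = 8) -> int:
    """Credits for a number of draft clips plus one final render."""
    return drafts * clip_cost(draft_res, draft_secs) + clip_cost(final_res, final_secs)


# Six attempts, the last of which is the keeper.
all_4k = 6 * clip_cost("4K", 8)
drafted = workflow_cost(5, "720p", 4, "4K")
print(all_4k, drafted)
```

Under these assumptions the all-4K route costs 1,800 credits, more than a Starter month’s 1,200, while the drafting route costs 490.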

Re-run winners, not losers. When you find a prompt that produces a good result, use the re-run feature in /history rather than typing the same prompt from scratch.

Use the credit estimate. The Playground shows you the credit cost before you generate. Check it before you hit Generate.

Getting started today

The fastest path from zero to your first video:

Create your account
Choose a plan that fits your volume: Starter for occasional use, Pro if you’re generating weekly
Open the Playground
Try this prompt to start: “A lone lighthouse on a rocky cliff, storm waves below, dark clouds, cinematic 4K wide shot, slow push forward”
Download, use commercially, and build from there
