
VidStitch AI Workflow

VidStitch AI transforms your script and voiceover into a fully produced video with AI-sourced visuals. This guide covers the complete workflow from upload to final video.


Overview

What it does: Takes your SRT transcription and voiceover audio, analyzes the content, automatically sources relevant visuals from the web, and composes a complete video.

Best for:

  • News content production
  • Educational explainers
  • Documentary narration
  • Any script-based video

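Time required: typically 15-45 minutes, depending on script length. Very short or maximum-length scripts can fall outside this range (see Script Length Guidelines).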


What You Need

Before starting, prepare:

Item          | Format     | Notes
Transcription | SRT file   | Timed subtitles matching voiceover
Voiceover     | MP3 or WAV | Audio narration

💡 Tip: The quality of your SRT timing directly affects video synchronization.


Step 1: Access VidStitch AI

  1. Click "VidStitch AI" in the sidebar
  2. You'll see the main interface with:
       • Upload zone
       • Recent projects
       • AnimaBot assistant

Step 2: Upload Your Files

Upload Transcription (SRT)

  1. Click "Upload Transcription" or drag the file onto the upload zone
  2. Select your .srt file
  3. The file is validated and parsed

SRT Format Example:

1
00:00:00,000 --> 00:00:03,500
The ancient city of Rome was founded in 753 BC.

2
00:00:03,500 --> 00:00:07,200
It grew to become one of the largest empires in history.

3
00:00:07,200 --> 00:00:11,800
At its peak, Rome controlled most of Europe and beyond.
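
Before uploading, it can help to confirm that a file follows this structure. The sketch below is a minimal local check, not VidStitch AI's own validator: the names `Cue`, `to_seconds`, and `parse_srt` are hypothetical, and it assumes each cue has an index line, a timing line, and at least one text line.

```python
import re
from dataclasses import dataclass

# HH:MM:SS,mmm as used in the SRT example above.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:00:03,500 to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues, raising ValueError on malformed blocks."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            raise ValueError(f"incomplete cue: {block!r}")
        start_raw, arrow, end_raw = lines[1].partition("-->")
        if not arrow:
            raise ValueError(f"missing '-->' in timing line: {lines[1]!r}")
        start, end = to_seconds(start_raw), to_seconds(end_raw)
        if end <= start:
            raise ValueError(f"cue {lines[0]} ends before it starts")
        cues.append(Cue(int(lines[0]), start, end, "\n".join(lines[2:])))
    return cues
```

Calling `parse_srt(open("script.srt", encoding="utf-8").read())` on a well-formed file returns one `Cue` per subtitle; a malformed block raises `ValueError` with the offending text.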

Upload Voiceover Audio

  1. Click "Upload Voiceover" or drag the file onto the upload zone
  2. Select your MP3 or WAV file
  3. The audio duration is detected

⚠️ Important: Voiceover audio must match the SRT timing exactly.
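
One way to catch a mismatch before uploading is to compare the audio length with the end time of the last subtitle. The following is a rough sketch under stated assumptions: it reads WAV files only (using Python's standard `wave` module), the helper names are hypothetical, and the half-second `tolerance` is an illustrative value, not a threshold documented by VidStitch AI.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as audio:
        return audio.getnframes() / audio.getframerate()


def timing_gap(last_cue_end: float, audio_duration: float) -> float:
    """Positive when the audio runs longer than the subtitles."""
    return audio_duration - last_cue_end


def is_aligned(last_cue_end: float, audio_duration: float,
               tolerance: float = 0.5) -> bool:
    """True when audio length and final cue end differ by at most `tolerance`.

    The default tolerance is an assumption chosen for illustration.
    """
    return abs(timing_gap(last_cue_end, audio_duration)) <= tolerance
```

For the SRT example above, the last cue ends at 11.8 seconds, so a 14-second voiceover would fail `is_aligned(11.8, 14.0)` and is worth re-checking before upload.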


Step 3: Configure Settings

Content Mode

Choose the style that matches your content:

Mode        | Description                        | Best For
Documentary | Informative, educational visuals   | Documentaries, history, science
News        | Current events, journalistic style | News reports, current affairs

Background Style

Select the visual aesthetic:

  • Cinematic - Film-like visuals
  • Clean - Minimalist, professional
  • Dynamic - Fast-paced, energetic
  • Vintage - Retro, historical feel

Optional Briefing

Add context to improve AI accuracy:

Topic: The Roman Empire
Style: Historical documentary
Focus: Architecture and military conquests
Avoid: Modern recreations, dramatic reenactments

Step 4: Start Processing

  1. Review your settings
  2. Click "Generate Video"
  3. Processing begins

Step 5: Monitor Progress

VidStitch AI progresses through several stages:

Processing Stages

Stage        | Description                           | Typical Time
Analyzing    | Parsing script, understanding content | 1-2 min
Strategizing | Generating search strategies          | 1-2 min
Sourcing     | Finding and downloading visuals       | 5-15 min
Editing      | Composing final video                 | 5-10 min
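
Adding up the typical stage times gives a rough end-to-end figure for a typical run. The snippet below only sums the ranges listed above; the `STAGES` mapping and `total_range` helper are illustrative names, and longer scripts can take longer than these typical values suggest.

```python
# Typical (low, high) minutes per stage, copied from the stages listed above.
STAGES = {
    "Analyzing": (1, 2),
    "Strategizing": (1, 2),
    "Sourcing": (5, 15),
    "Editing": (5, 10),
}


def total_range(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high ends of every stage's typical time."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


print(total_range(STAGES))  # (12, 29)
```

Sourcing accounts for the widest share of the range, which is consistent with it being the stage most likely to stall.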

AnimaBot Assistant

The AnimaBot provides:

  • Real-time status updates
  • Helpful tips during processing
  • Error explanations if issues occur

Progress Indicators

  • Overall progress percentage
  • Current stage indicator
  • Estimated time remaining
  • Segment completion count

Step 6: Handle Stalls (If Needed)

Sometimes processing may stall due to:

  • Visual sourcing difficulties
  • Network issues
  • Server load

Retry Options

If processing stalls:

  1. Retry - Attempt to continue from current point
  2. Resume - Start from last checkpoint
  3. Restart - Begin fresh processing

💡 Tip: VidStitch AI automatically saves checkpoints, so you rarely lose progress.


Step 7: Review and Download

Preview Your Video

Once complete:

  1. Video player appears with result
  2. Watch the full video
  3. Check visual-audio synchronization

Quality Check

Verify:

  • ✅ Visuals match narration context
  • ✅ Timing is synchronized
  • ✅ No awkward transitions
  • ✅ Audio levels are balanced

Download

  1. Click "Download Video"
  2. MP4 file saves to your device
  3. Optionally download thumbnail

Understanding AI Decisions

How Visual Selection Works

  1. Script Analysis - AI identifies key concepts, entities, locations
  2. Query Generation - Creates specific search queries
  3. Source Retrieval - Finds relevant videos/images
  4. Quality Scoring - Ranks sources by relevance
  5. Selection - Chooses best matches for each segment

Visual Source Types

Source        | Description
Video clips   | Short video segments from the web
Images        | Static images with Ken Burns effect
Stock footage | Generic B-roll when specific footage is unavailable

Best Practices

Script Writing for AI

DO:

  • ✅ Use specific names, dates, places
  • ✅ Describe visual subjects clearly
  • ✅ Keep sentences concise
  • ✅ Use chronological order

DON'T:

  • ❌ Use vague language ("things", "stuff")
  • ❌ Reference fictional content
  • ❌ Include off-screen dialogue
  • ❌ Use heavy metaphors

Example Good Script

The Colosseum in Rome was completed in 80 AD.
Emperor Titus opened it with 100 days of games.
Gladiators fought wild animals from Africa.
Over 50,000 spectators could watch the events.

Example Poor Script

It was built a long time ago.
People went there for entertainment.
Things happened that were exciting.
Many folks enjoyed the shows.
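
A quick pre-upload check for vague wording can be sketched as follows. The word list contains only the two examples this guide names ("things", "stuff"); extend it to suit your own scripts. The function name `find_vague_words` is hypothetical, and this is a simple local helper, not part of VidStitch AI.

```python
import re

# The two vague words named in this guide; add your own as needed.
VAGUE_WORDS = {"things", "stuff"}


def find_vague_words(script: str,
                     vague: set[str] = VAGUE_WORDS) -> list[tuple[int, str]]:
    """Return (line_number, word) for every vague word found in the script."""
    hits = []
    for number, line in enumerate(script.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z']+", line):
            if word.lower() in vague:
                hits.append((number, word.lower()))
    return hits


poor_script = """It was built a long time ago.
People went there for entertainment.
Things happened that were exciting.
Many folks enjoyed the shows."""

print(find_vague_words(poor_script))  # [(3, 'things')]
```

A word list catches only part of the problem: the first line of the poor script has no flagged word, yet "It" and "a long time ago" give the AI no subject or date to search for.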

Content Mode Details

Documentary Mode

  • Longer, more contemplative visuals
  • Educational imagery focus
  • Historical accuracy priority
  • Slower transitions

News Mode

  • Faster-paced editing
  • Current event imagery
  • Dynamic visual changes
  • More contemporary sources

Troubleshooting

"Analysis Failed"

  • Check SRT file format
  • Ensure voiceover matches SRT timing
  • Try shorter script for testing

"Sourcing Stalled"

  • Click "Retry" to continue
  • AI may struggle with obscure topics
  • Add briefing context to help

"Poor Visual Matches"

  • Make script more specific
  • Add briefing context
  • Try switching between Documentary and News mode

"Audio Out of Sync"

  • Verify SRT timestamps match audio
  • Regenerate SRT file if needed
  • Check voiceover file integrity

Script Length Guidelines

Script Length       | Estimated Processing | Recommendation
0-1,000 chars       | 10-15 min            | Quick test
1,000-5,000 chars   | 15-25 min            | Standard video
5,000-15,000 chars  | 25-40 min            | Long-form content
15,000-24,000 chars | 40-60 min            | Maximum length

⚠️ Limit: Maximum script length is 24,000 characters.
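
The guidelines and the limit above can be turned into a small lookup for checking a script before upload. This is a sketch under two assumptions: characters are counted with Python's `len` on the plain script text (the guide does not say exactly how VidStitch AI counts them), and a length that sits on a boundary such as 1,000 is placed in the lower band. `BANDS` and `estimate_processing` are hypothetical names.

```python
MAX_CHARS = 24_000  # maximum script length stated in this guide

# (upper bound in characters, estimated processing, recommendation)
BANDS = [
    (1_000, "10-15 min", "Quick test"),
    (5_000, "15-25 min", "Standard video"),
    (15_000, "25-40 min", "Long-form content"),
    (24_000, "40-60 min", "Maximum length"),
]


def estimate_processing(script: str) -> tuple[str, str]:
    """Return (estimated processing time, recommendation) for a script.

    Raises ValueError when the script exceeds the 24,000-character limit.
    """
    length = len(script)
    if length > MAX_CHARS:
        raise ValueError(
            f"script is {length} characters; the limit is {MAX_CHARS}"
        )
    for upper, estimate, label in BANDS:
        if length <= upper:  # boundary values fall in the lower band (assumption)
            return estimate, label
    raise AssertionError("unreachable: length is within MAX_CHARS")
```

For example, `estimate_processing("a" * 3_000)` returns `("15-25 min", "Standard video")`, while a 24,001-character script raises `ValueError` and needs trimming or splitting before upload.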


Next Steps