VidStitch AI Workflow

VidStitch AI transforms your script and voiceover into a fully produced video with AI-sourced visuals. This guide covers the complete workflow from upload to final video.

Overview

What it does: Takes your SRT transcription and voiceover audio, analyzes the content, automatically sources relevant visuals from the web, and composes a complete video.

Best for:
- News content production
- Educational explainers
- Documentary narration
- Any script-based video

Time required: typically 15-45 minutes, depending on script length (see Script Length Guidelines for estimates by length)

What You Need

Before starting, prepare:

Item	Format	Notes
Transcription	SRT file	Timed subtitles matching voiceover
Voiceover	MP3 or WAV	Audio narration

💡 Tip: The quality of your SRT timing directly affects video synchronization.

Step 1: Access VidStitch AI

Click "VidStitch AI" in the sidebar
You'll see the main interface with:
Upload zone
Recent projects
AnimaBot assistant

Step 2: Upload Your Files

Upload Transcription (SRT)

Click "Upload Transcription" or drag the file onto the upload zone
Select your .srt file
The file is validated and parsed

SRT Format Example:

1
00:00:00,000 --> 00:00:03,500
The ancient city of Rome was founded in 753 BC.

2
00:00:03,500 --> 00:00:07,200
It grew to become one of the largest empires in history.

3
00:00:07,200 --> 00:00:11,800
At its peak, Rome controlled most of Europe and beyond.
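
Because the file is validated on upload, it can save a round trip to check it locally first. The following is a minimal sketch of such a check, not VidStitch AI's own validator: the names `Cue`, `parse_srt`, and `timing_problems` are hypothetical, and the rules (numeric index, well-formed timing line, no overlapping cues) are assumptions about what a reasonable validator would look for.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:03,500 --> 00:00:07,200".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues; raise ValueError on a malformed block."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            raise ValueError(f"incomplete cue: {block!r}")
        match = TIMING.fullmatch(lines[1].strip())
        if not lines[0].strip().isdigit() or match is None:
            raise ValueError(f"bad index or timing line: {block!r}")
        g = match.groups()
        cues.append(
            Cue(int(lines[0]), _seconds(*g[:4]), _seconds(*g[4:]), " ".join(lines[2:]))
        )
    return cues


def timing_problems(cues: list[Cue]) -> list[str]:
    """Return warnings for cues that end too early or overlap the previous cue."""
    problems = []
    for prev, cur in zip([None] + cues[:-1], cues):
        if cur.end <= cur.start:
            problems.append(f"cue {cur.index}: end is not after start")
        if prev is not None and cur.start < prev.end:
            problems.append(f"cue {cur.index}: overlaps cue {prev.index}")
    return problems
```

Running `timing_problems(parse_srt(text))` on the example above returns an empty list, since each cue starts exactly where the previous one ends.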

Upload Voiceover Audio

Click "Upload Voiceover" or drag the file onto the upload zone
Select your MP3 or WAV file
The audio duration is detected

⚠️ Important: Voiceover audio must match the SRT timing exactly.
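
One quick way to catch a mismatch before uploading is to compare the audio length with the end time of the last SRT cue. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the 0.5-second tolerance is an arbitrary value rather than a documented threshold, and Python's standard `wave` module reads uncompressed WAV only, so an MP3 would need a third-party decoder.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def durations_match(audio_seconds: float, last_cue_end: float,
                    tolerance: float = 0.5) -> bool:
    """True when the audio length and the final SRT timestamp agree
    within `tolerance` seconds (0.5 s is an assumed, illustrative value)."""
    return abs(audio_seconds - last_cue_end) <= tolerance
```

For the three-cue example above, the last cue ends at 11.8 seconds, so an 11.9-second recording passes this check while a 15-second one does not. A large gap usually means the SRT was generated from a different take of the narration.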

Step 3: Configure Settings

Content Mode

Choose the style that matches your content:

Mode	Description	Best For
Documentary	Informative, educational visuals	Documentaries, history, science
News	Current events, journalistic style	News reports, current affairs

Background Style

Select the visual aesthetic:

Cinematic - Film-like visuals
Clean - Minimalist, professional
Dynamic - Fast-paced, energetic
Vintage - Retro, historical feel

Optional Briefing

Add context to improve AI accuracy:

Topic: The Roman Empire
Style: Historical documentary
Focus: Architecture and military conquests
Avoid: Modern recreations, dramatic reenactments

Step 4: Start Processing

Review your settings
Click "Generate Video"
Processing begins

Step 5: Monitor Progress

VidStitch AI works through four stages:

Processing Stages

Stage	Description	Typical Time
Analyzing	Parsing script, understanding content	1-2 min
Strategizing	Generating search strategies	1-2 min
Sourcing	Finding and downloading visuals	5-15 min
Editing	Composing final video	5-10 min
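
Adding up the typical times in the table gives a rough range for a run in which nothing stalls. The snippet below is plain arithmetic on the table values; the names `STAGE_MINUTES` and `total_range` are illustrative only.

```python
# Typical (min, max) minutes per stage, copied from the table above.
STAGE_MINUTES = {
    "Analyzing": (1, 2),
    "Strategizing": (1, 2),
    "Sourcing": (5, 15),
    "Editing": (5, 10),
}


def total_range(stages):
    """Sum the lower and upper bounds across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


print(total_range(STAGE_MINUTES))  # (12, 29)
```

Sourcing accounts for the largest share of both bounds, which is why stalls are most noticeable in that stage.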

AnimaBot Assistant

The AnimaBot provides:
- Real-time status updates
- Helpful tips during processing
- Error explanations if issues occur

Progress Indicators

Overall progress percentage
Current stage indicator
Estimated time remaining
Segment completion count

Step 6: Handle Stalls (If Needed)

Processing may occasionally stall due to:
- Visual sourcing difficulties
- Network issues
- Server load

Retry Options

If processing stalls:

Retry - Attempt to continue from current point
Resume - Start from last checkpoint
Restart - Begin fresh processing

💡 Tip: VidStitch AI automatically saves checkpoints, so you rarely lose progress.

Step 7: Review and Download

Preview Your Video

Once complete:
1. Video player appears with result
2. Watch the full video
3. Check visual-audio synchronization

Quality Check

Verify:
- ✅ Visuals match narration context
- ✅ Timing is synchronized
- ✅ No awkward transitions
- ✅ Audio levels are balanced

Download

Click "Download Video"
MP4 file saves to your device
Optionally download thumbnail

Understanding AI Decisions

How Visual Selection Works

Script Analysis - AI identifies key concepts, entities, locations
Query Generation - Creates specific search queries
Source Retrieval - Finds relevant videos/images
Quality Scoring - Ranks sources by relevance
Selection - Chooses best matches for each segment
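
The scoring and selection steps above can be pictured with a toy model. This is a conceptual sketch only, not VidStitch AI's actual ranking: the `Candidate` type, the word-overlap score, and the example.com URLs are all hypothetical stand-ins for whatever the real pipeline uses.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str          # hypothetical source location
    description: str  # caption or metadata found with the source


def _words(text: str) -> set[str]:
    return {w.strip(".,").lower() for w in text.split()}


def relevance(segment_text: str, candidate: Candidate) -> float:
    """Toy score: fraction of segment words found in the candidate description."""
    words = _words(segment_text)
    if not words:
        return 0.0
    return len(words & _words(candidate.description)) / len(words)


def select_visuals(segments: list[str], candidates: list[Candidate]) -> list[Candidate]:
    """Pick the highest-scoring candidate for each segment."""
    return [max(candidates, key=lambda c: relevance(seg, c)) for seg in segments]


segments = [
    "The Colosseum in Rome was completed in 80 AD.",
    "Gladiators fought wild animals from Africa.",
]
candidates = [
    Candidate("https://example.com/a", "Aerial view of the Colosseum in Rome"),
    Candidate("https://example.com/b", "Mosaic of gladiators fighting wild animals"),
]
chosen = select_visuals(segments, candidates)
```

Even in this simplified form, the model shows why specific names and places help: a segment with no distinctive words scores close to zero against every candidate, so the choice becomes arbitrary.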

Visual Source Types

Source	Description
Video clips	Short video segments from the web
Images	Static images with Ken Burns effect
Stock footage	Generic B-roll when specific footage is unavailable
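
The Ken Burns effect mentioned in the table is a slow pan and zoom across a still image. One common way to produce it is to interpolate a crop rectangle from a start framing to an end framing over the length of the clip. The sketch below shows that general technique with linear interpolation; it is not VidStitch AI's renderer, and the function names and rectangle layout `(x, y, width, height)` are assumptions made for illustration.

```python
def ken_burns_crop(progress, start_rect, end_rect):
    """Linearly interpolate a crop rectangle (x, y, width, height).

    `progress` runs from 0.0 (start framing) to 1.0 (end framing)
    and is clamped to that range.
    """
    progress = max(0.0, min(1.0, progress))
    return tuple(s + (e - s) * progress for s, e in zip(start_rect, end_rect))


def crops_for_clip(duration_s, fps, start_rect, end_rect):
    """Return one crop rectangle per frame for a clip of the given length."""
    frames = max(1, round(duration_s * fps))
    last = max(1, frames - 1)  # avoid dividing by zero for one-frame clips
    return [ken_burns_crop(i / last, start_rect, end_rect) for i in range(frames)]


# Zoom from the full 1920x1080 frame toward a centered 1536x864 region.
crops = crops_for_clip(2, 25, (0, 0, 1920, 1080), (192, 108, 1536, 864))
```

Each rectangle would then be cropped from the source image and scaled to the output resolution, giving the still a sense of motion.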

Best Practices

Script Writing for AI

DO:
- ✅ Use specific names, dates, places
- ✅ Describe visual subjects clearly
- ✅ Keep sentences concise
- ✅ Use chronological order

DON'T:
- ❌ Use vague language ("things", "stuff")
- ❌ Reference fictional content
- ❌ Include off-screen dialogue
- ❌ Use heavy metaphors

Example Good Script

The Colosseum in Rome was completed in 80 AD.
Emperor Titus opened it with 100 days of games.
Gladiators fought wild animals from Africa.
Over 50,000 spectators could watch the events.

Example Poor Script

It was built a long time ago.
People went there for entertainment.
Things happened that were exciting.
Many folks enjoyed the shows.
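
The difference between the two example scripts can be checked mechanically before upload. The following is a rough self-check sketch, not part of VidStitch AI: only "things" and "stuff" come from the guidance above, the other entries in `VAGUE_WORDS` are assumptions, and the "digit or capitalized word" test is a crude proxy for the presence of names, dates, and places.

```python
import re

# "things" and "stuff" come from the guidance above; the rest are assumed additions.
VAGUE_WORDS = {"things", "stuff", "thing", "something", "folks"}


def vague_terms(line: str) -> list[str]:
    """Return the vague words found in a script line, in order of appearance."""
    words = re.findall(r"[a-z']+", line.lower())
    return [w for w in words if w in VAGUE_WORDS]


def has_specifics(line: str) -> bool:
    """Rough proxy for specificity: the line contains a digit, or a
    capitalized word other than the first word of the sentence."""
    words = line.split()
    return any(ch.isdigit() for ch in line) or any(w[0].isupper() for w in words[1:])
```

Every line of the good script passes `has_specifics`, while every line of the poor script fails it, and two of the poor lines are also flagged by `vague_terms`. Lines that fail are worth rewriting with a concrete subject before processing.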

Content Mode Details

Documentary Mode

Longer, more contemplative visuals
Educational imagery focus
Historical accuracy priority
Slower transitions

News Mode

Faster-paced editing
Current event imagery
Dynamic visual changes
More contemporary sources

Troubleshooting

"Analysis Failed"

Check SRT file format
Ensure voiceover matches SRT timing
Try a shorter script to isolate the problem

"Sourcing Stalled"

Click "Retry" to continue
AI may struggle with obscure topics
Add briefing context to help

"Poor Visual Matches"

Make script more specific
Add briefing context
Try switching between Documentary and News mode

"Audio Out of Sync"

Verify SRT timestamps match audio
Regenerate SRT file if needed
Check voiceover file integrity

Script Length Guidelines

Script Length	Estimated Processing	Recommendation
0-1,000 chars	10-15 min	Quick test
1,000-5,000 chars	15-25 min	Standard video
5,000-15,000 chars	25-40 min	Long-form content
15,000-24,000 chars	40-60 min	Maximum length

⚠️ Limit: Maximum script length is 24,000 characters.
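
The table and limit above translate directly into a lookup that can be run on a script before uploading. This is a sketch under two assumptions the guide does not state: the upper bound of each range is treated as inclusive, and the character count is taken to be the length of the text passed in. The names `LENGTH_TIERS` and `estimate_processing` are illustrative.

```python
# (upper bound in characters, (min, max) minutes, recommendation), from the table above.
LENGTH_TIERS = [
    (1_000, (10, 15), "Quick test"),
    (5_000, (15, 25), "Standard video"),
    (15_000, (25, 40), "Long-form content"),
    (24_000, (40, 60), "Maximum length"),
]
MAX_CHARS = 24_000


def estimate_processing(char_count: int):
    """Return ((min_minutes, max_minutes), recommendation) for a script length.

    Raises ValueError when the count is negative or above the 24,000 limit.
    Upper bounds are assumed to be inclusive.
    """
    if char_count < 0:
        raise ValueError("character count cannot be negative")
    if char_count > MAX_CHARS:
        raise ValueError(f"script exceeds the {MAX_CHARS:,}-character limit")
    for upper, minutes, label in LENGTH_TIERS:
        if char_count <= upper:
            return minutes, label


script = "The Colosseum in Rome was completed in 80 AD."
print(estimate_processing(len(script)))  # ((10, 15), 'Quick test')
```

A script that raises the limit error needs to be split into separate projects or trimmed before processing.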

Next Steps

Writing Effective Scripts - Improve AI results
Rendering Options - Output settings
Troubleshooting - Common issues