
AI Analysis

VidStitch uses advanced AI to analyze your content and make intelligent decisions about clip placement and visual sourcing. This document explains how the AI works.


AI Technology

VidStitch is powered by Google Gemini 2.5 Flash, a state-of-the-art multimodal AI model that excels at:

  • Natural language understanding
  • Context analysis
  • Visual-text matching
  • Temporal reasoning

How Analysis Works

Content Understanding

The AI processes your content in several ways:

  1. Semantic Analysis - Understanding meaning, not just words
  2. Entity Extraction - Identifying people, places, events
  3. Temporal Mapping - Understanding time references
  4. Thematic Grouping - Recognizing related concepts

Example Analysis

Input Script:

The Roman Colosseum was completed in 80 AD under Emperor Titus.

AI Understanding:

  • Entity: Roman Colosseum (landmark, Rome, Italy)
  • Entity: Emperor Titus (historical figure, Roman)
  • Date: 80 AD (ancient history)
  • Event: Completion/construction
  • Theme: Ancient Roman architecture
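The example analysis above can be pictured as a structured record. The sketch below is a hypothetical Python representation (`Entity`, `SegmentAnalysis`, and `search_terms` are illustrative names, not VidStitch's actual output format) showing how extracted entities, dates, and themes could feed the later visual search step.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # e.g. "landmark" or "historical figure"
    context: list[str] = field(default_factory=list)


@dataclass
class SegmentAnalysis:
    text: str
    entities: list[Entity]
    dates: list[str]
    events: list[str]
    themes: list[str]

    def search_terms(self) -> list[str]:
        """Flatten the structured fields into candidate visual search terms."""
        return [e.name for e in self.entities] + self.dates + self.themes


analysis = SegmentAnalysis(
    text="The Roman Colosseum was completed in 80 AD under Emperor Titus.",
    entities=[
        Entity("Roman Colosseum", "landmark", ["Rome", "Italy"]),
        Entity("Emperor Titus", "historical figure", ["Roman"]),
    ],
    dates=["80 AD"],
    events=["Completion/construction"],
    themes=["Ancient Roman architecture"],
)
print(analysis.search_terms())
# ['Roman Colosseum', 'Emperor Titus', '80 AD', 'Ancient Roman architecture']
```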


Analysis Types by Workflow

B-roll Clips Analysis

The AI identifies:

  • Optimal insertion moments (natural pauses, topic transitions)
  • Contextual relevance (what B-roll fits the narration)
  • Pacing considerations (avoiding rapid cuts)
  • Content type awareness (documentary vs. entertainment)

VidStitch AI Analysis

The AI determines:

  • Visual search strategies
  • Content categorization
  • Sentence-level visual mapping
  • Quality scoring for source materials

V5 Story Analysis

The AI performs:

  • Source video scene cataloging
  • Script-to-scene matching
  • Coverage gap identification
  • Transition planning
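Coverage gap identification amounts to finding stretches of the narration that no matched scene covers. A minimal sketch, assuming matched scenes are available as (start, end) times in seconds; `find_coverage_gaps` and its `min_gap` parameter are hypothetical:

```python
def find_coverage_gaps(duration, covered, min_gap=1.0):
    """Return (start, end) spans of the narration with no matched source scene.

    duration: total narration length in seconds
    covered:  (start, end) spans that already have a matching scene
    min_gap:  ignore uncovered spans shorter than this many seconds
    """
    gaps = []
    cursor = 0.0  # end of the covered region seen so far
    for start, end in sorted(covered):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # overlapping spans merge naturally
    if duration - cursor >= min_gap:
        gaps.append((cursor, duration))
    return gaps


print(find_coverage_gaps(60.0, [(0.0, 12.5), (10.0, 20.0), (31.0, 45.0)]))
# [(20.0, 31.0), (45.0, 60.0)]
```

Sorting first lets overlapping scene matches merge in a single pass, so each reported gap is a span that genuinely needs new footage.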


Moment Detection

For B-roll insertion, the AI detects ideal moments based on:

Natural Pause Points

  • End of sentences
  • Topic transitions
  • Speaker changes

Content Signals

  • Descriptive language ("The vast landscape...")
  • Time references ("In 1492...")
  • Location mentions ("In Paris, France...")

Exclusion Criteria

The AI avoids suggesting insertions during:

  • Direct quotes
  • Critical explanations
  • Emotional peaks
  • Question-answer sequences
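A rule-based sketch of the moment-detection criteria above, assuming a transcript already split into sentences with end timestamps. The regex patterns and descriptive word list are hypothetical stand-ins for the model's judgment; speaker changes, critical explanations, and emotional peaks are not modeled here.

```python
import re

# Hypothetical signal patterns mirroring the "Content Signals" list.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")  # e.g. "In 1492..."
LOCATION = re.compile(r"\bIn [A-Z][a-z]+")  # e.g. "In Paris, France..."
DESCRIPTIVE = re.compile(r"\b(vast|towering|sprawling|ancient)\b", re.IGNORECASE)


def candidate_moments(sentences):
    """Return (end_time, signal_count) for sentences that suit B-roll.

    sentences: (end_time_seconds, text) pairs; each sentence end is
    treated as a natural pause point.
    """
    moments = []
    for i, (end_time, text) in enumerate(sentences):
        text = text.strip()
        # Exclusions: direct quotes and question-answer sequences.
        if '"' in text or text.endswith("?"):
            continue
        if i > 0 and sentences[i - 1][1].strip().endswith("?"):
            continue  # this sentence answers the previous question
        signals = sum(bool(p.search(text)) for p in (YEAR, LOCATION, DESCRIPTIVE))
        if signals:
            moments.append((end_time, signals))
    return moments


transcript = [
    (4.2, "In 1492, Columbus crossed the Atlantic."),
    (9.8, "Why did he sail west?"),
    (14.0, "He believed Asia was closer than it was."),
    (20.5, "The vast landscape of the Caribbean awaited."),
]
print(candidate_moments(transcript))  # [(4.2, 1), (20.5, 1)]
```

The question and its answer are both skipped, matching the exclusion for question-answer sequences.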


Visual Matching

Query Generation

For each script segment, the AI generates targeted search queries:

Script: "The Eiffel Tower was built for the 1889 World's Fair"

Generated Queries:

  1. "Eiffel Tower Paris daytime"
  2. "Eiffel Tower construction historical"
  3. "1889 World's Fair Paris"
  4. "Eiffel Tower aerial view"
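In VidStitch the model writes these queries itself. The template-based function below is only a hypothetical illustration of the shape of that output: one query per shot angle or context modifier, plus standalone queries for related events, deduplicated and capped. `generate_queries` and its parameters are not a VidStitch API.

```python
def generate_queries(subject, angles, extra_queries=(), limit=4):
    """Expand one subject into several search queries.

    subject:       the segment's main visual subject, e.g. "Eiffel Tower"
    angles:        shot or context modifiers appended to the subject
    extra_queries: standalone queries for related events or places
    """
    candidates = [f"{subject} {angle}" for angle in angles] + list(extra_queries)
    unique, seen = [], set()
    for query in candidates:
        key = query.lower()
        if key not in seen:  # drop case-insensitive duplicates
            seen.add(key)
            unique.append(query)
    return unique[:limit]


queries = generate_queries(
    "Eiffel Tower",
    angles=["Paris daytime", "construction historical", "aerial view"],
    extra_queries=["1889 World's Fair Paris"],
)
print(queries)
```

Mixing close-up, historical, and aerial angles gives the source search several chances to find usable footage for one sentence.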

Source Selection

The AI ranks potential sources by:

  • Relevance score (0-100)
  • Visual quality indicators
  • Duration suitability
  • Content appropriateness
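One way to combine those ranking factors is a weighted score, with appropriateness treated as a hard filter. The weights, the 0.0-1.0 quality scale, and the duration-fit formula below are assumptions for illustration, not VidStitch's published scoring.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    video_id: str
    relevance: float   # 0-100 relevance score
    quality: float     # 0.0-1.0 visual quality indicator (assumed scale)
    duration: float    # clip length in seconds
    appropriate: bool  # passed content-appropriateness checks


def rank_sources(candidates, target_duration, weights=(0.6, 0.25, 0.15)):
    """Sort appropriate candidates by a weighted score, best first."""
    w_rel, w_qual, w_dur = weights

    def score(c):
        # Duration fit is 1.0 once the clip covers the segment, lower if too short.
        dur_fit = min(c.duration / target_duration, 1.0)
        return w_rel * c.relevance / 100 + w_qual * c.quality + w_dur * dur_fit

    eligible = [c for c in candidates if c.appropriate]  # hard filter, not a weight
    return sorted(eligible, key=score, reverse=True)


ranked = rank_sources(
    [
        Candidate("a", 90, 0.5, 3.0, True),
        Candidate("b", 80, 0.9, 8.0, True),
        Candidate("c", 99, 1.0, 10.0, False),
    ],
    target_duration=6.0,
)
print([c.video_id for c in ranked])  # ['b', 'a']
```

Here the sharper, longer clip "b" outranks the slightly more relevant "a", and "c" is dropped despite its high relevance because it failed the appropriateness check.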


Quality Scoring

Each AI decision includes a confidence score:

| Score | Meaning | Action |
|-------|---------|--------|
| 90-100 | Excellent match | Auto-approve |
| 70-89 | Good match | Review recommended |
| 50-69 | Acceptable | Manual review |
| < 50 | Poor match | Likely rejected |
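The score bands map directly to a threshold function. This sketch encodes the table; the function name is hypothetical, and because the page does not say how fractional scores are handled, anything below a band's lower bound falls into the next band down.

```python
def action_for(score):
    """Map a 0-100 confidence score to the review action in the table."""
    if not 0 <= score <= 100:
        raise ValueError("confidence score must be between 0 and 100")
    if score >= 90:
        return "Auto-approve"
    if score >= 70:
        return "Review recommended"
    if score >= 50:
        return "Manual review"
    return "Likely rejected"


print(action_for(87))  # Review recommended
```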

Content Modes

Documentary Mode

The AI prioritizes:

  • Educational accuracy
  • Historical authenticity
  • Informative visuals
  • Longer clip durations

News Mode

The AI prioritizes:

  • Current/recent footage
  • Fast-paced editing
  • Multiple visual changes
  • Contemporary sources

Sermon Mode

The AI prioritizes:

  • Respectful imagery
  • Biblical/spiritual context
  • Avoiding speaker interruption
  • Contemplative pacing
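To show how these qualitative priorities might translate into editing parameters, here is a hypothetical per-mode configuration. The priority strings come from the mode descriptions; the numeric knobs and field names are invented for illustration, and this page does not document VidStitch's internal settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeProfile:
    priorities: tuple[str, ...]  # from the mode descriptions
    min_clip_seconds: float      # hypothetical knob: shortest clip to place
    max_cuts_per_minute: int     # hypothetical knob: pacing cap
    cover_speaker: bool          # hypothetical knob: may B-roll cover the speaker?


# Illustrative values only, not documented VidStitch settings.
MODES = {
    "documentary": ModeProfile(
        ("Educational accuracy", "Historical authenticity",
         "Informative visuals", "Longer clip durations"),
        min_clip_seconds=5.0, max_cuts_per_minute=6, cover_speaker=True),
    "news": ModeProfile(
        ("Current/recent footage", "Fast-paced editing",
         "Multiple visual changes", "Contemporary sources"),
        min_clip_seconds=2.0, max_cuts_per_minute=12, cover_speaker=True),
    "sermon": ModeProfile(
        ("Respectful imagery", "Biblical/spiritual context",
         "Avoiding speaker interruption", "Contemplative pacing"),
        min_clip_seconds=6.0, max_cuts_per_minute=3, cover_speaker=False),
}


def cut_budget(mode, minutes):
    """Maximum number of B-roll cuts for a video of the given length."""
    return int(MODES[mode].max_cuts_per_minute * minutes)


print(cut_budget("sermon", 2.5))  # 7
```

Encoding each mode as one frozen profile keeps the pacing rules in a single place, so "fast-paced" versus "contemplative" becomes a comparison of numbers rather than scattered special cases.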


Improving AI Results

Write Specific Content

Poor: "Many things happened during that time."

Good: "The Industrial Revolution transformed British factories between 1760 and 1840."

Include Visual Subjects

Poor: "It was important for many reasons."

Good: "The steam engine revolutionized transportation across railway networks."

Use Proper Nouns

Poor: "The leader gave a famous speech."

Good: "Winston Churchill delivered his 'We shall fight on the beaches' speech."
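Before submitting a script, a rough self-check can flag sentences with no proper nouns or dates. This heuristic is a hypothetical helper, not part of VidStitch: it counts capitalized non-initial words and three- or four-digit numbers, so it covers the specific-content and proper-noun tips but cannot recognize visual subjects such as "steam engine".

```python
import re

YEAR = re.compile(r"\b\d{3,4}\b")  # rough year detector: "1760", "1889"


def specificity_signals(sentence):
    """Count capitalized non-initial words (likely proper nouns) plus years."""
    words = re.findall(r"[A-Za-z]+", sentence)
    proper = [w for w in words[1:] if w[0].isupper()]
    return len(proper) + len(YEAR.findall(sentence))


def is_vague(sentence):
    """True when the sentence has no proper-noun or year signals at all."""
    return specificity_signals(sentence) == 0


for s in [
    "Many things happened during that time.",
    "The Industrial Revolution transformed British factories between 1760 and 1840.",
    "The leader gave a famous speech.",
]:
    print(is_vague(s), s)
```

The "steam engine" example would still be flagged as vague, which is why a check like this can only supplement, not replace, reading the script for concrete visual subjects.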


AI Limitations

What AI Cannot Do

  • Source copyrighted/restricted content
  • Create fictional imagery
  • Understand sarcasm reliably
  • Handle heavy metaphors
  • Process non-English content (currently)

Edge Cases

The AI may struggle with:

  • Very abstract concepts
  • Highly specialized jargon
  • Regional/local references
  • Recent events (after the model's training cutoff)


Cost and Credits

AI analysis consumes credits based on:

  • Script/transcript length
  • Analysis complexity
  • Number of segments
  • Visual sourcing (VidStitch AI workflow)

Typical costs:

| Operation | Credit Cost |
|-----------|-------------|
| Short analysis (<5 min) | 1-2 credits |
| Standard analysis (5-15 min) | 2-5 credits |
| Long analysis (15+ min) | 5-10 credits |
| Visual sourcing | Additional 1-5 credits |
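For budgeting, the typical-cost table can be turned into a range estimate. This is only a sketch with a hypothetical function name: actual charges also depend on analysis complexity and segment count, and treating exactly 15 minutes as a long analysis is an assumption because the table's bands meet at that point.

```python
def estimate_credits(minutes, visual_sourcing=False):
    """Return a (low, high) credit range based on the typical-cost table.

    Assumed boundaries: under 5 minutes is short, 5 up to (not including)
    15 is standard, and 15 or more is long.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    if minutes < 5:
        low, high = 1, 2
    elif minutes < 15:
        low, high = 2, 5
    else:
        low, high = 5, 10
    if visual_sourcing:
        low, high = low + 1, high + 5  # "Additional 1-5 credits"
    return low, high


print(estimate_credits(12, visual_sourcing=True))  # (3, 10)
```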


Transparency

VidStitch AI provides reasoning for its decisions:

Placement Reasoning

{
  "time": 45.5,
  "reason": "Topic transition to Roman architecture",
  "confidence": 87,
  "context": "Speaker moves from history to visual description"
}

Source Reasoning

{
  "segment": "The Colosseum stands in Rome",
  "query": "Colosseum Rome exterior daytime",
  "selected": "video_id_123",
  "reason": "Clear exterior shot matching description"
}
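If you export or log these reasoning records, a small loader can validate their fields and render them for review. The field names come from the examples above; `load_record` and `describe_placement` are hypothetical helpers, and this page does not specify how VidStitch exposes the records.

```python
import json

REQUIRED = {
    "placement": {"time", "reason", "confidence", "context"},
    "source": {"segment", "query", "selected", "reason"},
}


def load_record(raw, kind):
    """Parse a reasoning record and check it has the expected fields."""
    record = json.loads(raw)
    missing = REQUIRED[kind] - set(record)
    if missing:
        raise ValueError(f"{kind} record missing {sorted(missing)}")
    return record


def describe_placement(record):
    """Format a placement as 'm:ss.s (confidence) reason'."""
    minutes, seconds = divmod(record["time"], 60)
    return f"{int(minutes)}:{seconds:04.1f} ({record['confidence']}) {record['reason']}"


placement = load_record(
    '{"time": 45.5, "reason": "Topic transition to Roman architecture",'
    ' "confidence": 87, "context": "Speaker moves from history to visual description"}',
    "placement",
)
print(describe_placement(placement))
# 0:45.5 (87) Topic transition to Roman architecture
```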

Next Steps