LyrAssist - AI-Powered Lyric Transcription & Video Generation

Overview

LyrAssist is an AI-powered full-stack web application that transforms audio and video files into professional lyric videos with synchronized transcripts. Upload a song or video, and get back a beautifully formatted lyric video with interactive, Spotify-style transcript features that allow you to click any word to jump to that moment in the video.

Key Features

Smart Audio Processing

  • Multiple Input Methods: Upload audio files (MP3, WAV, M4A), video files (MP4, MOV, AVI), or record directly in your browser
  • Vocal Separation: Optional AI-powered vocal isolation using Demucs for improved transcription accuracy on music tracks
  • Flexible Model Selection: Choose from 5 Whisper model sizes to balance speed and accuracy

Intelligent Video Generation

  • Audio-to-Video: Automatically generates lyric videos with black backgrounds for audio-only files
  • Video Enhancement: Overlays synchronized lyrics on existing video content
  • Two Rendering Modes:
    • Phrase Mode: Clean, phrase-level subtitles with dynamic positioning
    • Karaoke Mode: Word-by-word highlighting with precise timestamps (experimental)

Interactive Transcript Features

  • Live Auto-Highlighting: Lyrics automatically highlight in sync with video playback (Spotify-style)
    • Current line highlighted with indigo background
    • Active word highlighted in gold with glow effect
    • Auto-scrolls to keep current lyrics visible
  • Clickable Lyrics: Click any word or line in the transcript to jump directly to that moment in the video
  • Auto-Play: Video automatically starts playing when you click on lyrics
  • Downloadable Transcript: Export timestamped transcripts as formatted text files
  • Modern UI: Clean, Spotify-inspired interface with hover effects and smooth animations

Real-Time Processing

  • Live Logs: Watch real-time processing updates as AI transcribes and renders your video
  • Background Processing: Asynchronous task handling prevents browser timeouts
  • Progress Tracking: Clear status updates throughout the entire pipeline

Technical Implementation

Technology Stack

Backend:

  • Python 3.10+ with Flask web framework
  • OpenAI Whisper - State-of-the-art speech-to-text transcription
  • WhisperX - Forced alignment for word-level timestamps
  • Demucs - Neural source separation for vocal isolation
  • MoviePy - Video composition and subtitle rendering
  • PyTorch - Deep learning backend

Frontend:

  • HTML5 + Tailwind CSS - Responsive, modern UI
  • Vanilla JavaScript - Client-side interactivity
  • MediaRecorder API - Browser-based audio recording

Media Processing:

  • FFmpeg - Video/audio encoding and processing
  • Pydub - Audio manipulation and format conversion
  • h264_videotoolbox - Hardware-accelerated video encoding

System Architecture

The application uses a sophisticated processing pipeline:

  1. Upload/Record → User provides audio or video input
  2. Audio Extract → FFmpeg extracts audio track
  3. Transcribe → Whisper performs speech-to-text
  4. Align → WhisperX adds word-level timestamps
  5. Render → MoviePy generates synchronized lyric video

Optional vocal separation (Demucs) can be inserted before transcription for improved accuracy on music tracks.

Impact & Use Cases

LyrAssist serves multiple use cases:

  • Musicians: Create professional lyric videos for social media and streaming platforms
  • Content Creators: Add captions to video content for accessibility
  • Podcasters: Generate searchable transcripts with timestamps
  • Educators: Create educational videos with synchronized subtitles
  • Researchers: Transcribe interviews and lectures with precise timing

Performance

  • Typical processing time for a 3-minute song: ~30-90 seconds (Medium model, no vocal separation)
  • With vocal separation: add 30-60 seconds
  • With karaoke mode: 2-3x longer
  • GPU acceleration supported for faster transcription
  • Hardware-accelerated video encoding on macOS

Technologies Used

Python • Flask • OpenAI Whisper • WhisperX • Demucs • PyTorch • MoviePy • FFmpeg • JavaScript • Tailwind CSS • HTML5