LyrAssist - AI-Powered Lyric Transcription & Video Generation

Overview

LyrAssist is a full-stack web application that uses AI transcription to turn audio and video files into lyric videos with synchronized transcripts. Upload a song or video and get back a formatted lyric video plus an interactive, Spotify-style transcript in which clicking any word jumps to that moment in the video.

Key Features

Smart Audio Processing

Multiple Input Methods: Upload audio files (MP3, WAV, M4A), video files (MP4, MOV, AVI), or record directly in your browser
Vocal Separation: Optional AI-powered vocal isolation using Demucs for improved transcription accuracy on music tracks
Flexible Model Selection: Choose from 5 Whisper model sizes to balance speed and accuracy

Intelligent Video Generation

Audio-to-Video: Automatically generates lyric videos with black backgrounds for audio-only files
Video Enhancement: Overlays synchronized lyrics on existing video content
Two Rendering Modes:
- Phrase Mode: Clean, phrase-level subtitles with dynamic positioning
- Karaoke Mode: Word-by-word highlighting with precise timestamps (experimental)
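
To make the difference between the two modes concrete, here is a minimal sketch of how word-level timestamps could be grouped into phrases (Phrase Mode) and then expanded into per-word highlight intervals (Karaoke Mode). The `Word` type, the function names, and the `max_gap` / `max_words` thresholds are hypothetical and chosen for illustration; the project's actual grouping rules are not described here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def group_into_phrases(words: List[Word], max_gap: float = 0.6,
                       max_words: int = 7) -> List[List[Word]]:
    """Start a new phrase after a long pause or once max_words is reached.

    Both thresholds are illustrative assumptions, not values from LyrAssist.
    """
    phrases: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        if current:
            pause = word.start - current[-1].end
            if pause > max_gap or len(current) >= max_words:
                phrases.append(current)
                current = []
        current.append(word)
    if current:
        phrases.append(current)
    return phrases


def karaoke_intervals(phrase: List[Word]) -> List[Tuple[float, float, int]]:
    """For one phrase, return (start, end, index_of_highlighted_word) per word."""
    return [(w.start, w.end, i) for i, w in enumerate(phrase)]


if __name__ == "__main__":
    demo = [Word("hello", 0.0, 0.4), Word("world", 0.5, 0.9), Word("again", 2.0, 2.4)]
    for phrase in group_into_phrases(demo):
        print(" ".join(w.text for w in phrase), karaoke_intervals(phrase))
```

Phrase Mode would render one subtitle per group, while Karaoke Mode needs one render state per interval, which is consistent with karaoke output taking longer to produce.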

Interactive Transcript Features

Live Auto-Highlighting: Lyrics automatically highlight in sync with video playback (Spotify-style)
- Current line highlighted with indigo background
- Active word highlighted in gold with glow effect
- Auto-scrolls to keep current lyrics visible
Clickable Lyrics: Click any word or line in the transcript to jump directly to that moment in the video
Auto-Play: Video automatically starts playing when you click on lyrics
Downloadable Transcript: Export timestamped transcripts as formatted text files
Modern UI: Clean, Spotify-inspired interface with hover effects and smooth animations
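
The highlighting and export features above both reduce to simple operations on sorted start times. The sketch below shows one way to find the active line or word for a given playback time with a binary search, and one way to format a timestamped transcript. In the application this lookup would run in the browser in JavaScript; Python is used here only for illustration, and the function names and the `[mm:ss.cc]` output format are assumptions rather than the project's actual export format.

```python
from bisect import bisect_right
from typing import List, Tuple


def active_index(starts: List[float], current_time: float) -> int:
    """Index of the last entry whose start time is <= current_time.

    `starts` must be sorted ascending. Returns -1 before the first entry.
    """
    return bisect_right(starts, current_time) - 1


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as mm:ss.cc (centisecond precision)."""
    total_cs = int(round(seconds * 100))
    minutes, rem = divmod(total_cs, 6000)
    secs, cs = divmod(rem, 100)
    return f"{minutes:02d}:{secs:02d}.{cs:02d}"


def export_transcript(lines: List[Tuple[float, str]]) -> str:
    """Build a plain-text transcript from (start_seconds, text) pairs."""
    return "\n".join(f"[{format_timestamp(start)}] {text}" for start, text in lines)


if __name__ == "__main__":
    line_starts = [0.0, 1.5, 3.2]
    print(active_index(line_starts, 2.0))   # 1: the second line is active
    print(export_transcript([(0.0, "Hello"), (75.5, "World")]))
```

Clicking a word is the inverse operation: the player seeks to that word's stored start time and begins playback.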

Real-Time Processing

Live Logs: Watch processing updates in real time as the pipeline transcribes and renders your video
Background Processing: Asynchronous task handling prevents browser timeouts
Progress Tracking: Clear status updates throughout the entire pipeline
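
One common way to get this behavior is to start the work in a background thread, return a job ID immediately, and let the browser poll for status and logs. The sketch below uses only the standard library; the in-memory job store and the names `start_job`, `job_status`, and `wait_for_job` are hypothetical, and the project may use a different mechanism (for example a task queue).

```python
import threading
import uuid
from typing import Callable, Dict

# In-memory job store: an assumption for this sketch, not the project's design.
_jobs: Dict[str, dict] = {}
_threads: Dict[str, threading.Thread] = {}
_lock = threading.Lock()


def start_job(work: Callable[[Callable[[str], None]], str]) -> str:
    """Run work(log) in a background thread and return a job id right away."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"status": "running", "logs": [], "result": None}

    def log(message: str) -> None:
        with _lock:
            _jobs[job_id]["logs"].append(message)

    def runner() -> None:
        try:
            result = work(log)
            with _lock:
                _jobs[job_id].update(status="done", result=result)
        except Exception as exc:  # report the failure instead of losing it
            with _lock:
                _jobs[job_id].update(status="error", result=str(exc))

    thread = threading.Thread(target=runner, daemon=True)
    _threads[job_id] = thread
    thread.start()
    return job_id


def job_status(job_id: str) -> dict:
    """Snapshot that a polling endpoint could serialize as JSON."""
    with _lock:
        job = _jobs[job_id]
        return {"status": job["status"], "logs": list(job["logs"]),
                "result": job["result"]}


def wait_for_job(job_id: str, timeout: float = 5.0) -> None:
    """Block until the job's thread finishes (useful in scripts and tests)."""
    _threads[job_id].join(timeout)
```

A Flask route would call `start_job` when the upload arrives and expose `job_status` on a second route, so no single HTTP request has to stay open for the full processing time.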

Technical Implementation

Technology Stack

Backend:

Python 3.10+ with Flask web framework
OpenAI Whisper - State-of-the-art speech-to-text transcription
WhisperX - Forced alignment for word-level timestamps
Demucs - Neural source separation for vocal isolation
MoviePy - Video composition and subtitle rendering
PyTorch - Deep learning backend

Frontend:

HTML5 + Tailwind CSS - Responsive, modern UI
Vanilla JavaScript - Client-side interactivity
MediaRecorder API - Browser-based audio recording

Media Processing:

FFmpeg - Video/audio encoding and processing
Pydub - Audio manipulation and format conversion
h264_videotoolbox - FFmpeg's hardware-accelerated H.264 encoder on macOS (Apple VideoToolbox)
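
As an illustration of how these tools fit together, the sketch below builds FFmpeg command lines for extracting an audio track and for encoding the final video. The flags are standard FFmpeg options, but the specific choices are assumptions: 16 kHz mono output (the sample rate Whisper works with internally), AAC audio, and a fallback to `libx264` on systems other than macOS. The function names are hypothetical.

```python
import platform
from typing import List


def extract_audio_cmd(src: str, dst: str) -> List[str]:
    """FFmpeg command that drops the video stream and writes 16 kHz mono audio."""
    # -y overwrite output, -vn no video, -ac 1 mono, -ar 16000 sample rate
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", "16000", dst]


def pick_h264_encoder(system: str = "") -> str:
    """Choose the VideoToolbox encoder on macOS, software x264 elsewhere.

    This only checks the OS name; it does not verify that the installed
    FFmpeg build actually includes the chosen encoder.
    """
    system = system or platform.system()
    return "h264_videotoolbox" if system == "Darwin" else "libx264"


def encode_video_cmd(src: str, dst: str, system: str = "") -> List[str]:
    """FFmpeg command that re-encodes video as H.264 with AAC audio."""
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", pick_h264_encoder(system), "-c:a", "aac", dst]


if __name__ == "__main__":
    print(" ".join(extract_audio_cmd("input.mp4", "audio.wav")))
    print(" ".join(encode_video_cmd("rendered.mp4", "final.mp4")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.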

System Architecture

The application runs each upload through a five-stage processing pipeline:

Upload/Record → User provides audio or video input
Audio Extract → FFmpeg extracts audio track
Transcribe → Whisper performs speech-to-text
Align → WhisperX adds word-level timestamps
Render → MoviePy generates synchronized lyric video

Optional vocal separation (Demucs) can be inserted before transcription for improved accuracy on music tracks.
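
The stage ordering described above, including the optional Demucs step, can be sketched as a small planner plus a runner that applies one function per stage. The stage names, the `registry` mapping, and the `context` dictionary are hypothetical; the real stages would call FFmpeg, Whisper, WhisperX, and MoviePy.

```python
from typing import Callable, Dict, List

Stage = Callable[[dict], dict]


def plan_stages(separate_vocals: bool = False) -> List[str]:
    """Return stage names in execution order.

    Vocal separation, when enabled, is placed before transcription.
    """
    stages = ["extract_audio"]
    if separate_vocals:
        stages.append("separate_vocals")
    stages += ["transcribe", "align", "render"]
    return stages


def run_pipeline(context: dict, registry: Dict[str, Stage],
                 separate_vocals: bool = False) -> dict:
    """Run each planned stage in order, recording which stages completed."""
    for name in plan_stages(separate_vocals):
        context = registry[name](context)
        context.setdefault("completed", []).append(name)
    return context


if __name__ == "__main__":
    # Stub stages that pass the context through unchanged.
    stubs = {name: (lambda ctx: ctx) for name in plan_stages(True)}
    result = run_pipeline({"input": "song.mp3"}, stubs, separate_vocals=True)
    print(result["completed"])
```

Keeping the plan separate from the runner makes it easy to report progress per stage and to skip the separation step when it is not requested.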

Impact & Use Cases

LyrAssist serves multiple use cases:

Musicians: Create professional lyric videos for social media and streaming platforms
Content Creators: Add captions to video content for accessibility
Podcasters: Generate searchable transcripts with timestamps
Educators: Create educational videos with synchronized subtitles
Researchers: Transcribe interviews and lectures with precise timing

Performance

Typical processing time for a 3-minute song: ~30-90 seconds (Medium model, no vocal separation)
With vocal separation: add 30-60 seconds
With karaoke mode: 2-3x longer
GPU acceleration supported for faster transcription
Hardware-accelerated video encoding on macOS
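
The figures above can be combined into a rough estimate for a 3-minute song. The sketch below assumes that the karaoke multiplier applies to the whole run, including vocal separation; the figures do not say whether that is the case, so treat the result as a loose upper and lower bound rather than a measurement. The function name is hypothetical.

```python
from typing import Tuple


def estimate_seconds(vocal_separation: bool = False,
                     karaoke: bool = False) -> Tuple[float, float]:
    """Return a (low, high) processing-time estimate in seconds.

    Baseline: 30-90 s (Medium model, no vocal separation).
    Assumption: the 2-3x karaoke factor scales the full total.
    """
    low, high = 30.0, 90.0
    if vocal_separation:
        low += 30.0
        high += 60.0
    if karaoke:
        low *= 2.0
        high *= 3.0
    return low, high


if __name__ == "__main__":
    print(estimate_seconds())                                      # (30.0, 90.0)
    print(estimate_seconds(vocal_separation=True))                 # (60.0, 150.0)
    print(estimate_seconds(vocal_separation=True, karaoke=True))   # (120.0, 450.0)
```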

Links

Live Demo: https://shyamksateesh.github.io/LyrAssist/
GitHub Repository: https://github.com/shyamksateesh/LyrAssist

Technologies Used

Python • Flask • OpenAI Whisper • WhisperX • Demucs • PyTorch • MoviePy • FFmpeg • JavaScript • Tailwind CSS • HTML5
