LyrAssist - AI-Powered Lyric Transcription & Video Generation
Overview
LyrAssist is an AI-powered full-stack web application that transforms audio and video files into professional lyric videos with synchronized transcripts. Upload a song or video, and get back a beautifully formatted lyric video with interactive, Spotify-style transcript features that allow you to click any word to jump to that moment in the video.
Key Features
Smart Audio Processing
- Multiple Input Methods: Upload audio files (MP3, WAV, M4A), video files (MP4, MOV, AVI), or record directly in your browser
- Vocal Separation: Optional AI-powered vocal isolation using Demucs for improved transcription accuracy on music tracks
- Flexible Model Selection: Choose from 5 Whisper model sizes to balance speed and accuracy
Intelligent Video Generation
- Audio-to-Video: Automatically generates lyric videos with black backgrounds for audio-only files
- Video Enhancement: Overlays synchronized lyrics on existing video content
- Two Rendering Modes:
- Phrase Mode: Clean, phrase-level subtitles with dynamic positioning
- Karaoke Mode: Word-by-word highlighting with precise timestamps (experimental)
Interactive Transcript Features
- Live Auto-Highlighting: Lyrics automatically highlight in sync with video playback (Spotify-style)
- Current line highlighted with indigo background
- Active word highlighted in gold with glow effect
- Auto-scrolls to keep current lyrics visible
- Clickable Lyrics: Click any word or line in the transcript to jump directly to that moment in the video
- Auto-Play: Video automatically starts playing when you click on lyrics
- Downloadable Transcript: Export timestamped transcripts as formatted text files
- Modern UI: Clean, Spotify-inspired interface with hover effects and smooth animations
Real-Time Processing
- Live Logs: Watch real-time processing updates as AI transcribes and renders your video
- Background Processing: Asynchronous task handling prevents browser timeouts
- Progress Tracking: Clear status updates throughout the entire pipeline
Technical Implementation
Technology Stack
Backend:
- Python 3.10+ with Flask web framework
- OpenAI Whisper - State-of-the-art speech-to-text transcription
- WhisperX - Forced alignment for word-level timestamps
- Demucs - Neural source separation for vocal isolation
- MoviePy - Video composition and subtitle rendering
- PyTorch - Deep learning backend
Frontend:
- HTML5 + Tailwind CSS - Responsive, modern UI
- Vanilla JavaScript - Client-side interactivity
- MediaRecorder API - Browser-based audio recording
Media Processing:
- FFmpeg - Video/audio encoding and processing
- Pydub - Audio manipulation and format conversion
- h264_videotoolbox - Hardware-accelerated video encoding
System Architecture
The application uses a sophisticated processing pipeline:
- Upload/Record → User provides audio or video input
- Audio Extract → FFmpeg extracts audio track
- Transcribe → Whisper performs speech-to-text
- Align → WhisperX adds word-level timestamps
- Render → MoviePy generates synchronized lyric video
Optional vocal separation (Demucs) can be inserted before transcription for improved accuracy on music tracks.
Impact & Use Cases
LyrAssist serves multiple use cases:
- Musicians: Create professional lyric videos for social media and streaming platforms
- Content Creators: Add captions to video content for accessibility
- Podcasters: Generate searchable transcripts with timestamps
- Educators: Create educational videos with synchronized subtitles
- Researchers: Transcribe interviews and lectures with precise timing
Performance
- Typical processing time for a 3-minute song: ~30-90 seconds (Medium model, no vocal separation)
- With vocal separation: add 30-60 seconds
- With karaoke mode: 2-3x longer
- GPU acceleration supported for faster transcription
- Hardware-accelerated video encoding on macOS
Links
- Live Demo: https://shyamksateesh.github.io/LyrAssist/
- GitHub Repository: https://github.com/shyamksateesh/LyrAssist
Technologies Used
Python • Flask • OpenAI Whisper • WhisperX • Demucs • PyTorch • MoviePy • FFmpeg • JavaScript • Tailwind CSS • HTML5
