LyrAssist - AI-Powered Lyric Transcription & Video Generation
Full-stack web application that automatically transcribes audio/video files and generates synchronized lyric videos using OpenAI Whisper, WhisperX, and Demucs AI models
Interactive visualization tool analyzing 20 years of Manhattan pedestrian infrastructure data using React, D3.js, and geospatial processing
Advanced multi-agent reinforcement learning implementation with parameter sharing and Independent Q-Learning, achieving robust scalability across 2-5 agents
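As a rough illustration of how Independent Q-Learning combines with parameter sharing, the sketch below lets every agent read from and write to a single Q-table while treating its own transition in isolation. It is a tabular stand-in under stated assumptions, not the project's implementation: the class name `SharedQTable`, the helper `independent_update`, and the hyperparameters are all hypothetical, and the actual project may well use neural networks instead of a table.

```python
import random
from collections import defaultdict


class SharedQTable:
    """One Q-table shared by all agents (parameter sharing). Hypothetical sketch."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99):
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        # Unseen observations start with all-zero action values.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, obs, epsilon, rng):
        """Epsilon-greedy action for a single agent's local observation."""
        if rng.random() < epsilon:
            return rng.randrange(self.n_actions)
        values = self.q[obs]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, obs, action, reward, next_obs, done):
        """Standard one-step Q-learning update on one agent's transition."""
        target = reward if done else reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


def independent_update(table, transitions):
    """Apply each agent's transition separately: every agent learns as if the
    others were part of the environment, but all updates land in one table."""
    for obs, action, reward, next_obs, done in transitions:
        table.update(obs, action, reward, next_obs, done)


# Usage: two agents report the same terminal transition in one step.
table = SharedQTable(n_actions=2, alpha=0.5, gamma=0.9)
independent_update(table, [("s0", 1, 1.0, "s1", True), ("s0", 1, 1.0, "s1", True)])
greedy_action = table.act("s0", epsilon=0.0, rng=random.Random(0))
```

Because the table does not depend on the number of agents, the same parameters can serve 2 or 5 agents unchanged. A common variation, not shown here, appends an agent identifier to the observation so that agents can still specialize.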
Published in International Conference on Data Science and Applications (ICDSA 2024), 2024
This study aims to understand and improve the predictive accuracy of emotional state classification through metrics such as valence, arousal, dominance, and likeness by applying a long short-term memory (LSTM) network to analyze EEG signals.
Citation: Sateesh, S. K., Sparsh, B. K., & Uma, D. (2024). "Decoding Human Emotions: Analyzing Multi-channel EEG Data Using LSTM Networks." International Conference on Data Science and Applications. Springer Nature Singapore, 503-515.
Download Paper | View on Springer
Published in International Conference on Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2024), 2024
This survey reviews meta-learning approaches used in audio and speech processing. Meta-learning targets settings where model performance must be maximized with few annotated samples, which makes it well suited to low-sample audio processing.
Citation: Raimon, A., Masti, S., Sateesh, S. K., Vengatagiri, S., & Das, B. (2024). "Meta-learning in Audio and Speech Processing: An End to End Comprehensive Review." International Conference on Multi-disciplinary Trends in Artificial Intelligence. Springer Nature Singapore, 140-154.
Download Paper | View on Springer
Published in 2025 IEEE International Conference on Big Data (BigData), 2025
This study investigates how audio and video modalities contribute to emotion perception in music videos, accounting for cognitive effects such as the primacy-recency effect. Using EfficientNetB0 for audio and transformers for video, with valence, arousal, and dominance as labels, weighted late fusion is applied to study modal influences.
Citation: S. Masti, S. K. Sateesh, S. Vengatagiri, A. Raimon and B. Das, "Weight of a Feeling: Temporal and Modal Contributions to Emotion from Music Videos," 2025 IEEE International Conference on Big Data (BigData), Macau, China, 2025, pp. 5187-5193, doi: 10.1109/BigData66926.2025.11401735.
View on IEEE Xplore
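The weighted late fusion mentioned in the abstract above can be sketched as a convex combination of per-modality predictions. This is a minimal illustration under assumptions, not the paper's method: the function name `weighted_late_fusion`, the single scalar `audio_weight`, and the example values are hypothetical, and the paper may weight modalities differently (for example per label or per time segment).

```python
from typing import List, Sequence


def weighted_late_fusion(
    audio_pred: Sequence[float],
    video_pred: Sequence[float],
    audio_weight: float,
) -> List[float]:
    """Blend two modality-level predictions after each model has run.

    Each sequence holds one score per label, e.g. (valence, arousal, dominance).
    The video weight is taken as 1 - audio_weight so the weights sum to 1.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if len(audio_pred) != len(video_pred):
        raise ValueError("both modalities must predict the same labels")
    video_weight = 1.0 - audio_weight
    return [
        audio_weight * a + video_weight * v
        for a, v in zip(audio_pred, video_pred)
    ]


# Usage with made-up scores: the audio model contributes 25% of the result.
fused = weighted_late_fusion([1.0, 0.0, 0.5], [0.0, 1.0, 0.5], audio_weight=0.25)
```

Sweeping `audio_weight` from 0 to 1 and comparing the fused error at each setting is one simple way to read off how much each modality contributes.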