Posts by Collection

projects

LyrAssist - AI-Powered Lyric Transcription & Video Generation

Full-stack web application that automatically transcribes audio/video files and generates synchronized lyric videos using OpenAI Whisper, WhisperX, and Demucs AI models

NYC Sidewalk Time Machine

Interactive visualization tool analyzing 20 years of Manhattan pedestrian infrastructure data using React, D3.js, and geospatial processing

Multi-Agent Reinforcement Learning System (Taxi-MARL)

Multi-agent reinforcement learning implementation using parameter sharing and Independent Q-Learning, scaling across 2 to 5 agents

publications

Decoding Human Emotions: Analyzing Multi-channel EEG Data Using LSTM Networks

Published in International Conference on Data Science and Applications (ICDSA 2024), 2024

This study aims to understand and improve the predictive accuracy of emotional state classification through metrics such as valence, arousal, dominance, and likeness by applying a long short-term memory (LSTM) network to analyze EEG signals.

Citation: Sateesh, S. K., Sparsh, B. K., & Uma, D. (2024). "Decoding Human Emotions: Analyzing Multi-channel EEG Data Using LSTM Networks." International Conference on Data Science and Applications. Springer Nature Singapore, 503-515.
Download Paper | View on Springer

Meta-learning in Audio and Speech Processing: An End to End Comprehensive Review

Published in International Conference on Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2024), 2024

This survey reviews meta-learning approaches used in audio and speech processing. Meta-learning applies where model performance must be maximized with minimal annotated samples, which makes it well suited to low-sample audio processing.

Citation: Raimon, A., Masti, S., Sateesh, S. K., Vengatagiri, S., & Das, B. (2024). "Meta-learning in Audio and Speech Processing: An End to End Comprehensive Review." International Conference on Multi-disciplinary Trends in Artificial Intelligence. Springer Nature Singapore, 140-154.
Download Paper | View on Springer

Weight of a Feeling: Temporal and Modal Contributions to Emotion from Music Videos

Published in 2025 IEEE International Conference on Big Data (BigData), 2025

This study investigates how audio and video modalities contribute to emotion perception in music videos, accounting for cognitive effects such as the primacy-recency effect. It uses EfficientNetB0 for audio and transformers for video, with valence, arousal, and dominance as labels, and applies weighted late fusion to study modal influences.

Citation: S. Masti, S. K. Sateesh, S. Vengatagiri, A. Raimon and B. Das, "Weight of a Feeling: Temporal and Modal Contributions to Emotion from Music Videos," 2025 IEEE International Conference on Big Data (BigData), Macau, China, 2025, pp. 5187-5193, doi: 10.1109/BigData66926.2025.11401735.
View on IEEE Xplore

Shyam Krishna Sateesh