Vitthal Bhandari

I am a graduate student in Computational Linguistics at the University of Washington (UW). Before coming to Seattle, I spent more than four years in the banking industry as a generalist software engineer at American Express, Standard Chartered Bank, and PayPal. I completed my Bachelor's in Computer Science and Engineering at BITS Pilani, where I also earned a minor in Data Science and worked with Prof. Poonam Goyal and Prof. Sundaresan Raman.

Email  /  CV  /  Scholar  /  Twitter  /  Github  /  LinkedIn  /  Blog

News

  • My paper "Voices from the Margins: Modeling Linguistic Diversity in Spontaneous Speech for Low-Resource Languages" was accepted as an oral presentation at the ComputEL workshop at ACL 2026. See you in sunny San Diego in July!

Research

My interests lie in efficient language modeling, adaptive agentic memory, and software engineering.

  1. Worked on efficient speech modeling and data curation for 21 extremely low-resource endangered languages - see Code
  2. I implemented Language Modeling from Scratch - see Blog & Code on tokenization
  3. I am an SWE with over four years of experience with Python and TypeScript, working at American Express, Standard Chartered, and PayPal
  4. At Amex, I created a language translation feature (HuggingFace, Sanic) using Marian MT for translating chats between English and Spanish
  5. At Standard Chartered, I led a team of 4 in developing an API management framework (TypeScript, React, Flask) and creating configurable APIs (1.3K+ APIs)
  6. Previous research experience includes leveraging pretrained models for detecting homophobia and transphobia in YouTube comments - ACL workshop paper
  7. A qualitative study on the challenges of building hate speech datasets - Preprint
  8. A project studying how attitudes toward trans people have evolved over time across partisan leanings in popular political podcasts - Code

On the Challenges of Building Datasets for Hate Speech Detection
Vitthal Bhandari
Preprint

This paper presents a comprehensive framework that standardizes the dataset creation pipeline across seven critical checkpoints by identifying systemic challenges in hate speech dataset creation.

arXiv

Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments
Vitthal Bhandari and Poonam Goyal
ACL 2022 Workshop on Language Technology for Equality, Diversity and Inclusion

I contributed to a shared task focused on identifying homophobic and transphobic content in YouTube comments by implementing basic classifiers using multilingual pre-trained language models to analyze English, Tamil, and code-mixed datasets.

Paper | Code

Reviewing the collaborative role of Image processing in retinal imaging
Rehana Khan, Vitthal Bhandari, Sundaresan Raman, Abhishek Vyas, Akshay Raman, Maitreyee Roy and Rajiv Raman
Teleophthalmology and Digital Health: A Practical Guide to Applications, Springer Nature

Paper

Coursework

LING 575: Speech Technology for Endangered Languages
LING 572: Advanced Statistical Methods for Natural Language Processing
LING 571: Deep Processing Techniques for Natural Language Processing
LING 570: Shallow Processing Techniques for Natural Language Processing
LING 575: Societal Impacts of Language Technology
Stanford CS 336: Language Modeling from Scratch
Harvard CS 2881: AI Safety
Stanford CS 234: Reinforcement Learning

Credit for this template goes to its source code.