I'm currently interested in AI safety from a variety of perspectives: evaluating language models for hidden objectives and biases; pluralistic alignment of language models; and emergent misalignment.
Here is some more information about me.
What kind of research am I interested in?: I am an NLP researcher who applies ML to sociotechnical problems using human-centered approaches. A few of my major previous projects are:
- Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments - Link - ACL 2022 workshop paper
- On the Challenges of Building Hate Speech Datasets - Link - preprint
- LGBQTweet - A community-sourced dataset for detecting hate against sexual and gender identity minorities
- Studying the evolution of attitudes toward gender identity minorities over time across partisan leanings in popular political podcasts - Code
What am I currently working on?:
- Evaluating the cultural competence of LLMs - Project for LING 575 (Societal Impacts of Language Technology)
- Stanford CS 336: Language Modeling from Scratch
Regarding the ideas I am interested in exploring, I have a few broad overarching themes in mind (a non-exhaustive list):
- I am interested in assessing the harms of emergent misalignment through a human-centered lens, its effects on user-facing applications such as content moderation, and ways to personalize LMs to different social norms and values
Why am I a great hire?: I am a generalist software engineer with 4+ years of experience across American Express, PayPal, and Standard Chartered. Some notable highlights and projects:
- At Amex, I created a language translation feature (HuggingFace, Sanic, Flask) using the Marian MT framework for translating chats between English and Spanish
- At Standard Chartered, I led a team of 4 in developing an API management framework (TypeScript, React, Flask) and creating 1.3K+ configurable APIs to fetch, add, and update data in Oracle SQL databases and Dremio data lakes