-
Experiments with building a BPE Tokenizer
I implemented a subset of Assignment 1 from Stanford's CS 336 (Language Modeling from Scratch) involving Byte-Pair Encoding (BPE) tokenization. I expected a "cute algorithm + a couple unit tests." What I got was a surprisingly real systems problem: data structures, CPU caches, file I/O, and a lot of "why is this slower?" moments.
Read more →
-
Notes on my first quarter at UW
I am writing this just after my final grade for a linguistics course dropped and left me speechless for a good half hour. That is when I realized I had completed my first quarter at UW. As someone who left a cushy tech job (with an exponential growth curve) back in India for my Master's, I felt it was important to write everything down. I wanted to be grateful for the things that worked for me, rant about those that didn't, and simmer in deep thought about everything in between.
Read more →