Projects & Competitions
Music Generator Using LLMs
Adapting Diffusion-LM to a Discrete Music Domain
October 2022 - December 2022
Boston, MA
- Capitalizing on the sequential nature of text generation models, used symbolic music representations as training data to retrain the full Diffusion-LM model to generate new piano music
- Processed 10,000+ MIDI files of piano recordings, encoding each musical note as a (pitch, velocity, duration) tuple and parsing the results into text files
- Fine-tuned the Diffusion-LM architecture with two different transformer backbones, BERT and ELECTRA-BERT, to compare music quality, melodic rhythm, overfitting, and optimization power between the two versions of the architecture
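The note-to-text encoding above can be sketched as follows. This is a minimal illustration only: the "pitch:velocity:duration" token layout and the function names are assumptions, not the project's exact serialization.

```python
# Illustrative sketch: serialize symbolic piano notes as text tokens so a
# text diffusion model such as Diffusion-LM can train on them. The token
# layout "p:v:d" is an assumed format, not the project's actual encoding.

def note_to_token(pitch: int, velocity: int, duration_ticks: int) -> str:
    """Encode one MIDI note as a single whitespace-free text token."""
    return f"{pitch}:{velocity}:{duration_ticks}"

def encode_track(notes) -> str:
    """Turn an ordered list of (pitch, velocity, duration) tuples into
    one line of space-separated tokens, preserving temporal order."""
    return " ".join(note_to_token(p, v, d) for p, v, d in notes)

# Example: middle C then E, both at velocity 80, lasting 480 ticks each.
line = encode_track([(60, 80, 480), (64, 80, 480)])
# -> "60:80:480 64:80:480"
```

Keeping one note per token makes the vocabulary finite and lets an off-the-shelf text tokenizer treat each note as a single word.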

Feedback Prize-English Language Learning (NLP) Competition
Bronze Medal Winner in Kaggle Challenge
November 2022 - December 2022
Boston, MA
- Built a regression model using BERT to encode input text and predict English Language Learners' (ELL) essay scores; ranked 159th (top 8%) among approximately 2,600 teams
- Fine-tuned the model with different hidden and dense layers and applied 4 optimization strategies (gradient accumulation, free embedding, dynamic padding, and uniform-length batching) to increase model accuracy and training speed
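Two of the speed-ups listed above can be sketched together: uniform-length batching (sort sequences by length so each batch holds similar lengths) and dynamic padding (pad only to each batch's maximum, not a global maximum). The function and parameter names below are illustrative assumptions, not the competition code.

```python
# Sketch of uniform-length batching + dynamic padding over token-ID lists.
# Sorting by length before chunking keeps similarly sized sequences in the
# same batch, so per-batch padding (instead of global padding) wastes far
# fewer pad tokens and speeds up training.

def make_batches(sequences, batch_size, pad_id=0):
    """Group token-ID sequences into batches padded to each batch's max length."""
    # Uniform-length batching: sort so neighbors have similar lengths.
    ordered = sorted(sequences, key=len)
    batches = []
    for i in range(0, len(ordered), batch_size):
        chunk = ordered[i:i + batch_size]
        # Dynamic padding: pad only to this batch's longest sequence.
        max_len = max(len(seq) for seq in chunk)
        batches.append([seq + [pad_id] * (max_len - len(seq)) for seq in chunk])
    return batches
```

In practice the same idea is applied per mini-batch inside a data loader's collate step rather than over the whole dataset at once.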

The Neural MMO Competition (NeurIPS 2022)
Ranked in the top 15% among over 300 teams
September 2022 - October 2022
Boston, MA
- Used DCGANs and VQ-VAE as data augmentation tools to generate 500+ distinct video replays of top-scoring players while filtering out short-lived agents
- Adopted and improved an ML/DL pipeline with an Imitation Learning backbone built on TorchBeast-Monobeast, training a checkpoint on 1,000 generated high-quality video replays
- Applied a multi-agent PPO algorithm to boost the IL-pretrained checkpoint, tuning hyperparameters by visualizing and comparing the model's loss curves
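The PPO update mentioned above centers on the clipped surrogate objective. A minimal scalar sketch is below; the epsilon value and function names are assumptions for illustration, not the project's configuration.

```python
# Illustrative sketch of PPO's clipped surrogate objective,
# min(r * A, clip(r, 1 - eps, 1 + eps) * A), computed per action.
# Clipping the probability ratio keeps the fine-tuned policy from
# drifting too far from the IL-pretrained checkpoint in one update.

def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated A(s, a)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Taking the minimum makes the objective pessimistic: a large ratio cannot inflate the objective beyond the clipped value when the advantage is positive, and cannot hide a large penalty when it is negative.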

Stanford University’s Open Data Hackathon
Winner of Data Modeling Event
April 2021
San Francisco, CA
- Utilized supervised ML models to quantify the relationship between college students' academic performance and their food choices, aiming to recommend food items that would improve students' nutritional index
- Created a novel concentration index based on 5+ predictors by cleaning and pre-processing a 126-row dataset in Python
- Employed AIC and BIC scores in R to select the best-fitting multiple linear regression model from 62+ explanatory variables, and assessed model validity with diagnostic plots such as a scatter-plot matrix
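The AIC/BIC model-selection step above can be sketched as follows. The project used R's built-in scoring; this pure-Python version shows the underlying Gaussian-likelihood formulas (up to an additive constant), with assumed argument names.

```python
# Sketch of AIC/BIC scoring for an ordinary least squares fit.
# Both scores penalize the same fit term by parameter count; BIC's
# k * ln(n) penalty grows with sample size, so it favors smaller
# models than AIC's fixed 2 * k penalty. Lower score = preferred model.
import math

def aic_bic(rss: float, n: int, k: int):
    """AIC and BIC for an OLS fit with residual sum of squares `rss`,
    n observations, and k estimated parameters (Gaussian errors,
    additive constants dropped)."""
    fit_term = n * math.log(rss / n)  # -2 * max log-likelihood, up to a constant
    aic = fit_term + 2 * k
    bic = fit_term + k * math.log(n)
    return aic, bic
```

Comparing these scores across candidate variable subsets is what yields the "best-fit" model: the fit term rewards low residual error while the penalty discourages adding explanatory variables that barely help.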
