
Projects & Competitions

Music Generator Using LLMs

  • GitHub
Adapting Diffusion-LM to the Discrete Music Domain

October 2022 - December 2022

Boston, MA

  • Leveraging the sequential nature of text-generation models, retrained the full Diffusion-LM model on symbolic music representations to generate new piano pieces

  • Processed 10,000+ MIDI files of piano recordings, encoding each note as a (pitch, velocity, duration) tuple and serializing the results into text files

  • Fine-tuned the Diffusion-LM architecture with two different transformer backbones, BERT and ELECTRA-BERT, comparing the two variants on music quality, melodic rhythm, overfitting, and optimization power
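The note encoding described above can be sketched as follows. This is an illustrative, hypothetical version (the token format `p{pitch}_v{velocity}_d{duration}` is an assumption, not the project's actual scheme):

```python
# Hypothetical sketch of serializing (pitch, velocity, duration) note
# tuples into text tokens suitable for a text-based diffusion model.
# The token format is illustrative, not the project's actual encoding.

def encode_notes(notes):
    """Serialize note tuples into one space-separated token per note,
    e.g. (60, 80, 0.5) -> 'p60_v80_d0.50'."""
    return " ".join(f"p{p}_v{v}_d{d:.2f}" for p, v, d in notes)

def decode_notes(text):
    """Invert encode_notes: parse tokens back into note tuples."""
    notes = []
    for tok in text.split():
        p, v, d = tok.split("_")
        notes.append((int(p[1:]), int(v[1:]), float(d[1:])))
    return notes

# Round-trip example: middle C followed by E
notes = [(60, 80, 0.5), (64, 72, 0.25)]
assert decode_notes(encode_notes(notes)) == notes
```

A reversible text encoding like this lets a language-model-style architecture train directly on music while still allowing generated text to be rendered back to MIDI.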

[Figure: MIDI waveform]

Feedback Prize - English Language Learning (NLP) Competition

  • GitHub
Bronze Medal Winner in Kaggle Challenge

November 2022 - December 2022

Boston, MA

  • Built a regression model using BERT to encode input text and predict English Language Learners' (ELL) essay scores, ranking 159th (top 8%) out of approximately 2,600 teams

  • Fine-tuned the model with different hidden and dense layers and applied four optimization strategies: gradient accumulation, frozen embeddings, dynamic padding, and uniform-length batching, improving model accuracy and training speed


The Neural MMO Competition (NeurIPS 2022)

Ranked in the top 15% of over 300 teams

September 2022 - October 2022

Boston, MA

  • Used DCGANs and VQ-VAE as data augmentation tools to generate 500+ distinct video replays of top-scoring players while filtering out short-lived agents

  • Adopted and improved an ML/DL pipeline with an imitation-learning backbone built on TorchBeast (Monobeast), training a checkpoint on 1,000 generated high-quality video replays

  • Applied a multi-agent PPO algorithm to boost the IL-pretrained checkpoint, tuning hyperparameters by visualizing and comparing the model's loss curves
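The core of the PPO step above is the clipped surrogate objective. A toy single-sample version (values and the clipping range `eps=0.2` are illustrative, not the project's settings):

```python
# Toy illustration of PPO's clipped surrogate loss for one sample.
# ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated advantage A.
# eps=0.2 is a common default, assumed here for illustration.

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Negative clipped surrogate: -min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return -min(ratio * advantage, clipped * advantage)

# A large policy update (ratio 1.5) gets clipped to 1.2 before scoring
loss = ppo_clip_loss(1.5, 1.0)  # -> -1.2
```

Clipping caps how far a single update can move the policy away from the IL-pretrained checkpoint, which is what makes PPO a stable choice for boosting a pretrained policy.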


Stanford University’s Open Data Hackathon

Winner of Data Modeling Event

April 2021

San Francisco, CA

  • Utilized supervised ML models to quantify the relationship between college students' academic performance and their food choices, in order to recommend food items that would improve students' nutritional index

  • Created a novel concentration index based on 5+ predictors by cleaning and pre-processing a 126+ row dataset in Python

  • Employed AIC and BIC scoring in R to fit a multiple linear regression model over 62+ explanatory variables, selecting the best-fit model and validating it with diagnostic plots such as a scatter-plot matrix
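The AIC/BIC scores used for model selection above follow simple formulas. A minimal sketch (the project used R's built-in tools; the Gaussian-likelihood forms are shown here in Python, with illustrative values):

```python
# Hedged sketch of AIC/BIC for a Gaussian linear model, where n is the
# sample size, rss the residual sum of squares, and k the number of
# fitted parameters. Lower scores indicate a better fit/complexity
# trade-off; BIC penalizes extra parameters more heavily for large n.
import math

def aic(n, rss, k):
    """Akaike information criterion: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def bic(n, rss, k):
    """Bayesian information criterion: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)

# With n=100 observations, RSS=100, and k=3 parameters:
# aic -> 6.0, and bic's stronger penalty gives 3*ln(100) ≈ 13.82
```

Comparing these scores across candidate subsets of the 62+ explanatory variables is what step-wise selection automates.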
