Projects & Competitions
Music Generator Using LLMs
Adapting Diffusion-LM to a Discrete Music Domain
October 2022 - December 2022
Boston, MA
- Capitalizing on the sequential nature of text generation models, used symbolic music representations as training data to retrain the full Diffusion-LM model to generate new piano music
- Processed 10,000+ MIDI files of piano recordings, encoding each musical note as a (pitch, velocity, duration) tuple and parsing the results into text files
- Fine-tuned the Diffusion-LM architecture with two different transformer backbones, BERT and ELECTRA-BERT, to compare music quality, melodic rhythm, overfitting, and optimization power between the two versions of the architecture
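The note-to-text encoding above can be sketched as follows. This is a minimal illustration only: the "pitch:velocity:duration" token layout and the function names are assumptions, not the project's exact serialization.

```python
# Illustrative sketch: serialize symbolic piano notes as text tokens so a
# text diffusion model such as Diffusion-LM can train on them. The token
# layout "p:v:d" is an assumed format, not the project's actual encoding.

def note_to_token(pitch: int, velocity: int, duration_ticks: int) -> str:
    """Encode one MIDI note as a single whitespace-free text token."""
    return f"{pitch}:{velocity}:{duration_ticks}"

def encode_track(notes) -> str:
    """Turn an ordered list of (pitch, velocity, duration) tuples into
    one line of space-separated tokens, preserving temporal order."""
    return " ".join(note_to_token(p, v, d) for p, v, d in notes)

# Example: middle C then E, both at velocity 80, lasting 480 ticks each.
line = encode_track([(60, 80, 480), (64, 80, 480)])
# -> "60:80:480 64:80:480"
```

Keeping one note per token makes the vocabulary finite and lets an off-the-shelf text tokenizer treat each note as a single word.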

Feedback Prize-English Language Learning (NLP) Competition
Bronze Medal Winner in Kaggle Challenge
November 2022 - December 2022
Boston, MA
- Built a regression model using BERT to encode input text and predict English Language Learners' (ELL) essay scores; ranked 159th (top 8%) among approximately 2,600 teams
- Fine-tuned the model with different hidden and dense layers and applied 4 optimization strategies (gradient accumulation, free embedding, dynamic padding, and uniform-length batching) to increase model accuracy and training speed
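Two of the speed-ups listed above can be sketched together: uniform-length batching (sort sequences by length so each batch holds similar lengths) and dynamic padding (pad only to each batch's maximum, not a global maximum). The function and parameter names below are illustrative assumptions, not the competition code.

```python
# Sketch of uniform-length batching + dynamic padding over token-ID lists.
# Sorting by length before chunking keeps similarly sized sequences in the
# same batch, so per-batch padding (instead of global padding) wastes far
# fewer pad tokens and speeds up training.

def make_batches(sequences, batch_size, pad_id=0):
    """Group token-ID sequences into batches padded to each batch's max length."""
    # Uniform-length batching: sort so neighbors have similar lengths.
    ordered = sorted(sequences, key=len)
    batches = []
    for i in range(0, len(ordered), batch_size):
        chunk = ordered[i:i + batch_size]
        # Dynamic padding: pad only to this batch's longest sequence.
        max_len = max(len(seq) for seq in chunk)
        batches.append([seq + [pad_id] * (max_len - len(seq)) for seq in chunk])
    return batches
```

In practice the same idea is applied per mini-batch inside a data loader's collate step rather than over the whole dataset at once.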

The Neural MMO Competition (NeurIPS 2022)
Ranked in the top 15% among over 300 teams
September 2022 - October 2022
Boston, MA
- Used DCGANs and VQ-VAE as data augmentation tools to generate 500+ distinct video replays of top-scoring players while filtering out short-lived agents
- Adopted and improved an ML/DL pipeline with an Imitation Learning backbone built on TorchBeast-Monobeast, training a checkpoint on 1,000 generated high-quality video replays
- Applied a multi-agent PPO algorithm to boost the IL-pretrained checkpoint, tuning hyperparameters by visualizing and comparing the model's loss curves
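The PPO update mentioned above centers on the clipped surrogate objective. A minimal scalar sketch is below; the epsilon value and function names are assumptions for illustration, not the project's configuration.

```python
# Illustrative sketch of PPO's clipped surrogate objective,
# min(r * A, clip(r, 1 - eps, 1 + eps) * A), computed per action.
# Clipping the probability ratio keeps the fine-tuned policy from
# drifting too far from the IL-pretrained checkpoint in one update.

def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated A(s, a)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Taking the minimum makes the objective pessimistic: a large ratio cannot inflate the objective beyond the clipped value when the advantage is positive, and cannot hide a large penalty when it is negative.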

Stanford University’s Open Data Hackathon
Winner of Data Modeling Event
April 2021
San Francisco, CA
- Utilized supervised ML models to quantify the relationship between college students' academic performance and their food choices, aiming to recommend food items that would improve students' nutritional index
- Created a novel concentration index based on 5+ predictors by cleaning and pre-processing a 126-row dataset in Python
- Employed AIC and BIC scores in R to select the best-fitting multiple linear regression model from 62+ explanatory variables, and assessed model validity with diagnostic plots such as a scatter-plot matrix
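The AIC/BIC model-selection step above can be sketched as follows. The project used R's built-in scoring; this pure-Python version shows the underlying Gaussian-likelihood formulas (up to an additive constant), with assumed argument names.

```python
# Sketch of AIC/BIC scoring for an ordinary least squares fit.
# Both scores penalize the same fit term by parameter count; BIC's
# k * ln(n) penalty grows with sample size, so it favors smaller
# models than AIC's fixed 2 * k penalty. Lower score = preferred model.
import math

def aic_bic(rss: float, n: int, k: int):
    """AIC and BIC for an OLS fit with residual sum of squares `rss`,
    n observations, and k estimated parameters (Gaussian errors,
    additive constants dropped)."""
    fit_term = n * math.log(rss / n)  # -2 * max log-likelihood, up to a constant
    aic = fit_term + 2 * k
    bic = fit_term + k * math.log(n)
    return aic, bic
```

Comparing these scores across candidate variable subsets is what yields the "best-fit" model: the fit term rewards low residual error while the penalty discourages adding explanatory variables that barely help.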
