Hello! I am a student at the University of Pennsylvania pursuing a B.S. in Computer Science and B.S. in Business dual degree at the Jerome Fisher Program in Management and Technology (Engineering + Wharton).
Most recently, I was a Machine Learning Engineer Intern at Apple, where I worked with transformers and structured state-space models (SSMs) on the Ads team. Currently, I am working on GPU profiling and long-context reasoning at Penn Machine Learning & the Distributed Systems Lab.
Outside of class, I help lead MLR@Penn and serve as a Teaching Assistant for the Graduate Machine Learning class here (CIS 5200). I also work with the Cypher Accelerator at UPenn, focusing on incubating early-stage AI startups.
If you're interested in collaborating, talking about research, or startups, feel free to reach out—I'd love to connect.
You can find my resume here.
Research
I'm interested in improving the efficiency of foundation models by analyzing and exploiting the structure of their internal dynamics. By employing interpretability and explainability techniques, I believe we can significantly reduce models' compute and memory requirements.
Investigating Language Model Dynamics using Meta-Tokens
Alok Shah, Khush Gupta, Keshav Ramji, Vedant Gaur
NeurIPS 2024, ATTRIB Workshop
Weak-to-Strong In-Context Optimization of Language Model Reasoning
Keshav Ramji, Vedant Gaur, Alok Shah, Khush Gupta
NeurIPS 2024, ATTRIB Workshop
Software
The following is a non-exhaustive list of open source software projects that I started or contribute to:
discus.ai
An open-source project for enhancing AI interactions, synthetic text generation, and LLM fine-tuning.
mlx-examples
MLX is an array framework for machine learning research on Apple silicon, brought to you by Apple machine learning research.
kan transformer
A quick hack modifying the GPT-2 Transformer to use a KAN (Kolmogorov–Arnold network) in place of the traditional MLP.