Hello! I am a student at the University of Pennsylvania pursuing a B.S. in Computer Science and B.S. in Business dual degree at the Jerome Fisher Program in Management and Technology (Engineering + Wharton).
Most recently, I was a Machine Learning Engineer Intern at Apple, where I worked with transformers and structured state-space models (SSMs) on the Ads team. Currently, I am working on GPU profiling and long-context reasoning at Penn Machine Learning & the Distributed Systems Lab.
Outside of class, I help lead MLR@Penn and serve as a Teaching Assistant for the Graduate Machine Learning class here (CIS 5200). I also work with the Cypher Accelerator at UPenn, focusing on incubating early-stage AI startups.
If you're interested in collaborating, talking about research, or startups, feel free to reach out—I'd love to connect.
You can find my resume here.
Research
I'm interested in improving the efficiency of foundation models by analyzing and exploiting the structure of their internal dynamics. By employing interpretability and explainability techniques, I believe we can significantly reduce models' compute and memory requirements.
Investigating Language Model Dynamics using Meta-Tokens
Alok Shah, Khush Gupta, Keshav Ramji, Vedant Gaur
NeurIPS 2024, ATTRIB Workshop
Weak-to-Strong In-Context Optimization of Language Model Reasoning
Keshav Ramji, Vedant Gaur, Alok Shah, Khush Gupta
NeurIPS 2024, ATTRIB Workshop
Software
The following is a non-exhaustive list of open source software projects that I started or contribute to:
discus.ai
An open-source project for enhancing AI interactions, synthetic text generation, and LLM fine-tuning.
mlx-examples
MLX is an array framework for machine learning research on Apple silicon, brought to you by Apple machine learning research.
kan transformer
A quick hack modifying the GPT-2 Transformer to use a KAN (Kolmogorov–Arnold network) in place of the traditional MLP.