I am a Senior Research Scientist at Microsoft, where I work in the CodeAI organization training specialized models behind GitHub Copilot. My work spans the underlying models that power AI-driven software development — code completion, context retrieval, agentic coding workflows, and automated code review.

Current Work at Microsoft CodeAI

Copilot Subagents & Context Management
Training custom models for Copilot subagents, context compaction, and memory generation — enabling agentic workflows to delegate focused sub-tasks and operate over long horizons.
Context Retrieval Embeddings
Trained the embedding models used to index GitHub for Copilot's context retrieval. [GitHub blog]
Code Completion Models
Trained the code completion models that power GitHub Copilot in editors used by millions of developers.
Copilot Code Review
Trained code quality evaluation models used in Copilot Code Review, helping the system identify substantive issues in pull request changes.

Background

I received my PhD from Columbia University in 2023, advised by Prof. Suman Jana, where I worked on AI-driven security and software development. During my PhD, I interned at AWS AI Labs (2023) and Microsoft Research RiSE (2021); these internships led to publications at FSE ’24 and ICSE ’22 (Distinguished Paper Award).

Selected Publications

ICML · 2026
SemRep: Generative Code Representation Learning with Code Transformations
Generative learning of code representations using semantics-preserving code transformations as a self-supervised signal.
paper
NeurIPS DL4C Workshop · 2025
DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models
A realistic, developer-informed benchmark for evaluating code generation models on tasks that reflect real-world software development.
paper
FSE · 2024
Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM
Static-analysis-guided prompting that drives LLMs to reason symbolically about execution paths, doubling coverage of generated regression tests.
paper
ICSE · 2022 (ACM SIGSOFT Distinguished Paper Award)
TOGA: A Neural Method for Test Oracle Generation
A neural transformer paired with a test-oracle grammar that automatically generates bug-finding assertions, with a 170% improvement in bug detection over prior systems.
paper code
OSDI · 2021 (OSDI Jay Lepreau Best Paper Award)
DistAI: Data-Driven Automated Invariant Learning for Distributed Protocols
Data-driven inference of inductive invariants for verifying safety properties of distributed protocols.
paper code

See the full list of publications for all papers.