Gabriel Ryan

I am a Senior Research Scientist at Microsoft, where I work in the CodeAI organization on AI-driven software development. My work spans model training and harness design, including completion, context retrieval, and agentic code review now shipping in GitHub Copilot to millions of developers. I’m interested in post-training and multi-agent methods for building more capable agents across software engineering and related domains that require complex reasoning and long-term planning.

Current Work at Microsoft CodeAI

Copilot Subagents & Context Management

Training custom models for Copilot subagents, context compaction, and memory generation — enabling agentic workflows to delegate focused sub-tasks and operate over long horizons. Custom subagent models have reduced costs by 15% in internal deployments.

Context Retrieval Embeddings

Trained the embedding models used to index GitHub for Copilot's context retrieval, reducing indexing cost by 2× and index size by 3× while improving retrieval quality. [GitHub blog]

Code Completion Models

Trained the code completion models that power GitHub Copilot in editors used by millions of developers, increasing acceptance rate by 5% while reducing latency by 10%.

Copilot Code Review

Optimizing agent design and evaluations for more accurate and useful code review.

Background

I received my PhD from Columbia University in 2023, advised by Prof. Suman Jana. My research combined program analysis and machine learning, developing neural methods for program verification, test generation, and vulnerability detection. During my PhD I interned at AWS AI Labs (2023) and the Microsoft Research RiSE Group (2021), resulting in publications at FSE '24 and ICSE '22 (Distinguished Paper Award).

Awards: ICSE 2022 Distinguished Paper Award, OSDI 2021 Best Paper Award, and National Defense Science and Engineering Graduate Fellowship (NDSEG).

Selected Publications

OOPSLA · 2024

Accurate Data Race Prediction in the Linux Kernel through Sparse Fourier Learning

Gabriel Ryan, Burcu Cetin, Yonghwan Lim, Suman Jana

Sparse Fourier learning predicts feasible kernel race traces, finding 44 more races and five new bugs.

paper code

FSE · 2024

Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM

Gabriel Ryan, Siddhartha Jain, Mingyue Shang, Shiqi Wang, Xiaofei Ma, Murali Krishna Ramanathan, Baishakhi Ray

Static-analysis-guided prompting that drives LLMs to reason symbolically about execution paths, doubling coverage of generated regression tests.

paper

Oakland S&P · 2023

Precise Detection of Kernel Data Races with Probabilistic Lockset Analysis

Gabriel Ryan, Abhishek Shah, Dongdong She, Suman Jana

Probabilistic lockset analysis finds kernel races 3× faster, uncovering 183 races, including 102 harmful ones.

paper

ICSE · 2022 (ACM SIGSOFT Distinguished Paper Award)

TOGA: A Neural Method for Test Oracle Generation

Elizabeth Dinella*, Gabriel Ryan*, Todd Mytkowicz, Shuvendu Lahiri

A neural transformer paired with a test-oracle grammar that automatically generates bug-finding assertions, with a 170% improvement in bug detection over prior systems.

paper code

USENIX Security · 2021

Fine Grained Dataflow Tracking with Proximal Gradients

Gabriel Ryan, Abhishek Shah, Dongdong She, Koustubha Bhat, Suman Jana

Proximal gradients improve dataflow-tracking F1 by 20% on average, finding 22 bugs with under 5% overhead.

paper code

* denotes equal contribution.

See the full list of publications.