Robust speech recognition (Whisper)
Co-authored Whisper: robust speech recognition via large-scale weak supervision.
Robust speech recognition (Whisper)
Co-authored Whisper: robust speech recognition via large-scale weak supervision.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open large-scale image-text data (LAION-5B)
Co-authored LAION-5B: a widely used open dataset for vision-language foundation models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Large-scale language modeling (GPT-3)
Co-authored GPT-3: Language Models are Few-Shot Learners.
NLP, language understanding
A foundational NLP researcher whose work matters both for classic representation learning and for institution-building around the modern Stanford NLP ecosystem.
Large-scale language modeling (GPT-3)
Co-authored GPT-3: Language Models are Few-Shot Learners.
Model-written evaluations for LM behavior
Co-authored model-written evals: a practical technique for discovering and measuring LM behaviors.
Fast, memory-efficient attention
Works at the seam between machine learning, data systems, and model infrastructure, with work that ranges from weak supervision to some of the most important efficiency breakthroughs in modern training stacks.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Visual instruction tuning (LLaVA)
Co-authored Visual Instruction Tuning: a widely cited recipe for LLaVA-style multimodal assistants.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Efficient MoE scaling (GLaM)
Co-authored GLaM: an influential MoE scaling reference in large language modeling.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Hybrid Transformer–Mamba language models (Jamba)
Distinctive within the AI21 cluster: she brings a linguistics and human-evaluation angle to model work, especially around user interaction, multilingual language behavior, and how LLM performance gets tested in practice.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-source tooling for modern NLP (Transformers library)
Co-authored the Hugging Face Transformers paper that helped standardize modern NLP workflows.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open large-scale image-text data (LAION-5B)
Co-authored LAION-5B: a widely used open dataset for vision-language foundation models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Code-focused LLMs and evaluation (Codex)
Co-authored the Codex evaluation paper: an early anchor for code LLM capability measurement.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Open-source tooling for modern NLP (Transformers library)
Co-authored the Hugging Face Transformers paper that helped standardize modern NLP workflows.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Fast, cheap LLM serving (PagedAttention)
Co-authored vLLM: a widely used serving stack for efficient LLM inference.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Text-to-text transfer and pretraining (T5)
Co-authored T5: a practical template for unified NLP training and evaluation.
Broad capability evaluation (MMLU)
Co-authored MMLU: a widely used benchmark for general LLM capability across many subjects.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open models, governance, communication
An important bridge figure between open-weight language-model communities and the modern alignment debate, especially when you want to understand how frontier capability, openness, and control arguments collide in practice.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Model-written evaluations for LM behavior
Co-authored model-written evals: a practical technique for discovering and measuring LM behaviors.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Open, fully-documented language models (OLMo)
Co-authored OLMo: Accelerating the Science of Language Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Model-written evaluations for LM behavior
Co-authored model-written evals: a practical technique for discovering and measuring LM behaviors.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Deep Q-Networks (DQN)
Co-authored the original DQN preprint: a core reference for deep reinforcement learning.
Human preference evaluation at scale (Chatbot Arena)
Co-authored Chatbot Arena: a high-impact human-preference evaluation platform for LLMs.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Chain-of-thought prompting and reasoning
Co-authored the chain-of-thought prompting paper; foundational for modern reasoning prompting.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Co-authored the DeepSeek-V3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Broad capability evaluation (MMLU)
Co-authored MMLU: a widely used benchmark for general LLM capability across many subjects.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Hybrid Transformer–Mamba language models (Jamba)
Represents the applied side of frontier AI: his work sits closer to deployment and platform architecture than to pure modeling, which makes him useful for understanding how AI21 turned model research into products other teams could build on.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Hybrid Transformer–Mamba language models (Jamba)
More than a default Jamba byline: his research predates the project, with earlier papers on active learning and implicit bias in deep networks before he appeared on Jamba-1.5.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Hybrid Transformer–Mamba language models (Jamba)
Tied directly to the main public Jamba releases, which makes him one of the clearer names behind the hybrid Transformer–Mamba model line rather than just another entry in a long author list.
Synthetic instructions for alignment (Self-Instruct)
Co-authored Self-Instruct: a key reference for instruction data generation pipelines.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Large-scale language modeling (GPT-3)
Co-authored GPT-3: Language Models are Few-Shot Learners.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Scaled multilingual vision-language models (PaLI)
Co-authored PaLI: a key reference for scaling multilingual vision-language models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Fast, memory-efficient attention
One of the more useful people to follow for the systems side of modern model building, especially where better kernels and sequence methods translate directly into frontier-model training and inference speed.
Model-written evaluations for LM behavior
Co-authored model-written evals: a practical technique for discovering and measuring LM behaviors.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).