Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open, fully-documented language models (OLMo)
Co-authored OLMo: Accelerating the Science of Language Models.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Score-based diffusion modeling via SDEs
Co-authored the score-based diffusion SDE paper: a key theoretical view of diffusion models.
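For reference, the core objects in that theoretical view (standard score-SDE notation, not anything specific to this page) are a forward noising SDE and a reverse-time SDE driven by the learned score:

```latex
% Forward noising SDE: drift f, diffusion coefficient g, Brownian motion w.
\[ \mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w \]
% Reverse-time SDE used for sampling, driven by the score \nabla_x \log p_t(x);
% \bar{w} is reverse-time Brownian motion. Learning the score is the whole game.
\[ \mathrm{d}x = \bigl[ f(x, t) - g(t)^2\,\nabla_x \log p_t(x) \bigr]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{w} \]
```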
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Masked autoencoders for vision (MAE)
Co-authored MAE: a strong template for scalable self-supervised vision pretraining.
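To make "template" concrete, the heart of the recipe is masking most patch tokens and encoding only the visible remainder; here is a minimal numpy sketch of that step (the 75% ratio is the paper's default, but the function name and array layout are illustrative):

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, rng=None):
    """MAE-style masking sketch: keep a random subset of patch tokens.

    patches: (num_patches, dim). Only the kept subset goes through the
    encoder; a lightweight decoder later reconstructs the masked patches.
    """
    rng = rng or np.random.default_rng(0)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])   # visible patches, original order
    mask_idx = np.sort(perm[n_keep:])   # targets for reconstruction
    return patches[keep_idx], keep_idx, mask_idx
```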
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Faster LLM inference via speculative decoding
A high-signal researcher for the latency and systems side of modern language models, especially where clever decoding tricks turn frontier models into usable products.
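To make the latency claim concrete, here is a toy greedy-verification sketch of speculative decoding (not any specific paper's algorithm; `target_next` and `draft_next` are hypothetical stand-ins for real models, and production systems verify the whole draft in one batched forward pass):

```python
def speculative_decode_greedy(target_next, draft_next, prompt, k=4, max_new=32):
    """A cheap draft model proposes k tokens; the expensive target model
    keeps the longest agreeing prefix, so one verification step can emit
    several tokens instead of one."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1) cheap draft: propose k tokens autoregressively
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2) verify against the target model, keeping the agreeing prefix
        accepted = []
        for tok in proposal:
            if target_next(tokens + accepted) == tok:
                accepted.append(tok)
            else:
                break
        # 3) always emit at least one target token so decoding progresses
        accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens
```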
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Fully Sharded Data Parallel training (FSDP)
Co-authored PyTorch FSDP: practical lessons for scaling fully-sharded training workloads.
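For context, the minimal shape of an FSDP setup looks roughly like this (assumes a torchrun-style launch that sets the distributed environment variables; the toy model and sizes are placeholders):

```python
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes launch via torchrun, which sets the env vars init_process_group reads.
dist.init_process_group("nccl")
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)).cuda()
# Wrapping shards parameters, gradients, and optimizer state across ranks,
# gathering full parameters only transiently around each forward/backward.
model = FSDP(model)
```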
Representation learning, AI systems
A foundational deep-learning figure whose influence spans convolutional networks, representation learning, and long-running arguments about what capable AI systems should optimize for next.
Efficient MoE scaling (GLaM)
Co-authored GLaM: an influential MoE scaling reference in large language modeling.
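As a reminder of the mechanism GLaM scales, a rough numpy sketch of top-2 gating: each token is routed to its two highest-scoring experts, whose outputs are then mixed by softmaxed gate scores (shapes and names here are illustrative, not GLaM's actual code):

```python
import numpy as np

def top2_gate(x, w_gate):
    """Top-2 routing sketch. x: (tokens, dim); w_gate: (dim, n_experts)."""
    logits = x @ w_gate
    top2 = np.argsort(logits, axis=-1)[:, -2:]             # best two experts per token
    picked = np.take_along_axis(logits, top2, axis=-1)
    weights = np.exp(picked) / np.exp(picked).sum(-1, keepdims=True)
    return top2, weights                                   # expert ids + mixing weights
```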
Text-to-text transfer and pretraining (T5)
Co-authored T5: a practical template for unified NLP training and evaluation.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Synthetic instructions for alignment (Self-Instruct)
Co-authored Self-Instruct: a key reference for instruction data generation pipelines.
Hybrid Transformer–Mamba language models (Jamba)
One of the clearer non-model pages in the AI21 cluster, because he connects data leadership, infrastructure realities, and public explanation of enterprise AI rather than focusing on pure modeling work alone.
Commonsense reasoning evaluation (HellaSwag)
Co-authored HellaSwag: a widely used commonsense benchmark for language understanding.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Parameter-efficient finetuning
Useful because his work spans the older machine-comprehension era at Microsoft and the later LoRA-style adaptation line that became core infrastructure for modern finetuning.
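To anchor what "LoRA-style adaptation" means mechanically, here is a small numpy sketch of the usual formulation: freeze the pretrained weight W and learn only a low-rank update B @ A (the class name and initialization constants are illustrative):

```python
import numpy as np

class LoRALinear:
    """Frozen weight W (out, in) plus trainable low-rank delta B @ A.

    With rank r << min(out, in), the adapter adds few parameters; B starts
    at zero so training begins exactly at the pretrained model.
    """
    def __init__(self, W, r=8, alpha=16, rng=None):
        rng = rng or np.random.default_rng(0)
        out_dim, in_dim = W.shape
        self.W = W
        self.A = rng.normal(0.0, 0.02, (r, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, r))              # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, in) -> (batch, out); only A and B would receive gradients
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```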
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Large-scale language modeling with Pathways (PaLM)
Co-authored PaLM: Scaling Language Modeling with Pathways.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Audio-capable open models (Qwen2-Audio)
Co-authored the Qwen2-Audio Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Linear transformers via the delta rule
Useful because his work links two strands that usually get discussed separately: efficient sequence-model architectures on one side and multimodal alignment work on the other.
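For readers new to this line, a compact numpy sketch of the delta-rule update these linear-transformer papers build on: the recurrent state is a fast-weight matrix nudged toward each new key-value pair and queried like attention (this is the O(T) sequential form; names and shapes are illustrative):

```python
import numpy as np

def delta_rule_scan(q, k, v, beta):
    """q, k, v: (T, d) sequences; beta: (T,) write strengths in [0, 1]."""
    T, d = q.shape
    S = np.zeros((d, d))                    # fast-weight state mapping keys to values
    out = np.zeros_like(v)
    for t in range(T):
        err = v[t] - S @ k[t]               # how wrong the state is about this key
        S += beta[t] * np.outer(err, k[t])  # delta-rule correction toward v[t]
        out[t] = S @ q[t]                   # query the state like attention
    return out
```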
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Fast, cheap LLM serving (PagedAttention)
Co-authored vLLM: a widely used serving stack for efficient LLM inference.
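For a sense of the user-facing surface, offline generation with vLLM looks roughly like this (based on its documented `LLM` / `SamplingParams` entry points; the model id is just a placeholder):

```python
from vllm import LLM, SamplingParams

# PagedAttention manages the KV cache in fixed-size blocks under the hood;
# the caller only sees this simple generate() interface.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["PagedAttention stores the KV cache in"], params)
print(outputs[0].outputs[0].text)
```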
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Synthetic instructions for alignment (Self-Instruct)
Co-authored Self-Instruct: a key reference for instruction data generation pipelines.
Hybrid Transformer–Mamba language models (Jamba)
A field-shaping figure for agentic AI and multi-agent reasoning long before the current LLM cycle, and now one of the clearest bridges between that older intellectual lineage and AI21’s frontier-model work.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Hybrid Transformer–Mamba language models (Jamba)
A high-signal researcher for understanding what large language models represent internally, especially where interpretability, robustness, and multilingual NLP meet.
Commonsense reasoning evaluation (HellaSwag)
Co-authored HellaSwag: a widely used commonsense benchmark for language understanding.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Visual instruction tuning (LLaVA)
Co-authored Visual Instruction Tuning: a widely cited recipe for LLaVA-style multimodal assistants.
LLM-as-a-judge evaluation (MT-Bench)
Co-authored MT-Bench / LLM-as-a-judge: a widely used template for scalable multi-turn evaluation.
Efficient MoE scaling (GLaM)
Co-authored GLaM: an influential MoE scaling reference in large language modeling.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Linear transformers via the delta rule
A useful researcher to study for the line from classic neural NLP into today’s efficient large-model work, with papers that span early sentence models, character-aware language modeling, and current sequence-model efficiency research.
Deep learning, representation learning, safety
A foundational deep-learning researcher whose influence spans representation learning, institution building, and the long-running effort to connect frontier AI progress with public-interest concerns.
Open code LLMs (Code Llama)
Co-authored Code Llama: a key open-model reference for code generation and coding assistants.
Faster LLM inference via speculative decoding
Important because his profile sits at the intersection of field-level research leadership and concrete systems work such as speculative decoding that directly changed how modern LLM inference gets deployed.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Efficient MoE scaling (GLaM)
Co-authored GLaM: an influential MoE scaling reference in large language modeling.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Rotary position embeddings (RoPE)
Worth keeping because he is part of the original Zhuiyi author team behind RoFormer, which means his page ties directly into the introduction of rotary position embeddings rather than a generic long-tail language-model paper.
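As a refresher on what RoFormer introduced, a minimal numpy sketch of rotary embeddings: each pair of feature dimensions is rotated by a position-dependent angle, so dot products between rotated queries and keys depend on relative position (this uses the common half-split layout; the original paper pairs adjacent dimensions, which differs only by a fixed permutation):

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary embedding sketch. x: (seq_len, d) with d even."""
    T, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequency
    angles = np.outer(np.arange(T), freqs)      # (T, half) position * frequency
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied pairwise across the feature dimension
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```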
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Linear transformers via the delta rule
Worth surfacing because he is the lead author of the Gated Slot Attention paper, one of the clearer attempts to push the RWKV-adjacent efficient-sequence line toward stronger memory and retrieval behavior rather than stopping at architecture novelty.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Reasoning + acting for LLM agents (ReAct)
Co-authored ReAct: a simple, high-leverage template for tool-using LLM agents.
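For concreteness, the ReAct pattern is just a loop that interleaves model "Thought/Action" steps with tool "Observations" (a bare-bones sketch; `llm`, the tool registry, and the exact text markers are illustrative stand-ins):

```python
import re

def react_loop(llm, tools, question, max_steps=6):
    """llm: callable mapping a transcript to the next Thought/Action text.
    tools: dict of name -> callable run on the action's input string."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)            # model emits Thought/Action lines
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if match:
            name, arg = match.groups()
            result = tools[name](arg)     # run the tool, feed the result back
            transcript += f"Observation: {result}\n"
    return None                           # no answer within the step budget
```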
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Streaming + long-context stability (attention sinks)
A high-signal researcher for the systems side of modern AI, especially where reinforcement learning, memory-efficient large-model training, and long-context inference meet.
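To ground the long-context part, a tiny sketch of the attention-sink cache policy from the streaming line of work: keep the first few positions plus a recent window and evict the middle (parameter names and defaults here are illustrative):

```python
def streaming_kv_keep(cache_len, n_sink=4, window=1024):
    """Return KV-cache indices to keep, oldest first.

    The first n_sink positions act as "attention sinks" that softmax
    attention keeps latching onto; dropping them destabilizes streaming
    generation, so they are retained alongside a sliding recent window.
    """
    if cache_len <= n_sink + window:
        return list(range(cache_len))
    return list(range(n_sink)) + list(range(cache_len - window, cache_len))
```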
Audio-capable open models (Qwen2-Audio)
Co-authored the Qwen2-Audio Technical Report.
Parameter-efficient finetuning
A useful profile for the seam between deep-learning theory and practical large-model methods, especially if you want someone whose work spans convergence theory, small-language-model data design, and LoRA.
Efficient MoE scaling (GLaM)
Co-authored GLaM: an influential MoE scaling reference in large language modeling.
Fully Sharded Data Parallel training (FSDP)
Co-authored PyTorch FSDP: practical lessons for scaling fully-sharded training workloads.