Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Model-parallel training at scale (Megatron-LM)
Co-authored Megatron-LM: a core reference for scaling transformer training via model parallelism.
Scaling laws, LLM training dynamics
One of the clearest anchors for understanding why scaling laws became such a central planning tool for frontier-model research and training strategy.
Alignment via AI feedback (Constitutional AI)
A good person to follow if you care about the practical evaluation layer at Anthropic rather than only its highest-level alignment claims.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Score-based diffusion modeling via SDEs
Co-authored the score-based diffusion SDE paper: a key theoretical view of diffusion models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-source LLMs (EleutherAI)
One of the better people to study for the thread connecting classic transfer learning in NLP to modern large-model evaluation and open-model research practice.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Co-authored PaLM: Scaling Language Modeling with Pathways.
Self-rewarding post-training
Co-authored Self-Rewarding Language Models: explores self-improvement via internal reward modeling.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Co-authored Imagen: a milestone for photorealistic text-to-image diffusion models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Gemini (multimodal foundation models)
One of the clearest multimodal researchers to track if you want to understand how frontier labs turned vision-language work from narrow benchmarks into general-purpose model capability.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Co-authored “The Llama 3 Herd of Models”.
Model-written evaluations for LM behavior
Co-authored model-written evals: a practical technique for discovering and measuring LM behaviors.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Foundational less because of any single public paper than because of a role in shaping the infrastructure, engineering culture, and systems thinking that make frontier-model research possible.
Few-shot vision-language models (Flamingo)
Co-authored Flamingo: an influential multimodal model for few-shot vision-language tasks.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Memory-efficient distributed training (ZeRO)
Co-authored ZeRO: foundational memory optimizations for training very large models.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Co-authored “The Llama 3 Herd of Models”.
A useful anchor for understanding the practical scaling-law and GPT-3 era, especially the people who turned broad intuition about scale into concrete training decisions.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open code models (CodeGemma)
Co-authored CodeGemma: open code models based on Gemma.
Alignment via AI feedback (Constitutional AI)
A strong person to know for the security-first side of AI risk work, especially where practical model behavior, jailbreak removal, and broader catastrophic-risk framing start to overlap.
Reasoning + acting for LLM agents (ReAct)
Co-authored ReAct: a simple, high-leverage template for tool-using LLM agents.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open large-scale image-text data (LAION-5B)
Co-authored LAION-5B: a widely used open dataset for vision-language foundation models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Open, fully-documented language models (OLMo)
Co-authored OLMo: Accelerating the Science of Language Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open foundation models for code (Code Llama)
Co-authored Code Llama: a key open-model reference for code generation and coding assistants.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Code-focused LLMs and evaluation (Codex)
Co-authored the Codex evaluation paper: an early anchor for code LLM capability measurement.
Open, fully-documented language models (OLMo)
Co-authored OLMo: Accelerating the Science of Language Models.
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Hybrid Transformer–Mamba language models (Jamba)
A useful long-tail page because it exposes another named contributor to AI21’s hybrid architecture work rather than leaving the profile buried inside a shared cohort summary.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Gemini (multimodal foundation models)
A strong researcher to study for the evolution of Google’s multimodal stack from vision-language pretraining and image generation into Gemini-era foundation models.
RWKV and efficient sequence modeling
A good RWKV page because he appears on the original paper, Eagle/Finch, and RWKV-7, all before he moved into a broader PhD research program; that run gives the profile real continuity instead of a one-off coauthor credit.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
RWKV and efficient sequence modeling
A useful profile because his public trail is more than a single RWKV coauthorship: it links the original RWKV paper to low-resource ASR work at TalTech and to ongoing hands-on RWKV experimentation on public model hubs.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Co-authored the Qwen Technical Report.
RWKV and efficient sequence modeling
Useful because his name recurs from the original RWKV paper through Eagle/Finch, which makes this a real continuation page instead of a one-off author listing.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Rotary position embeddings (RoPE)
An important architecture page to keep because he is the lead author on RoFormer, the paper that introduced rotary position embeddings; that design later became standard infrastructure across modern open-weight language models.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.