Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Few-shot vision-language models (Flamingo)
Co-authored Flamingo: an influential multimodal model for few-shot vision-language tasks.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Co-authored Segment Anything.
Mixture-of-experts LLMs
Co-authored Mixtral of Experts: a key MoE reference in the open-weights frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Pathways-scale language modeling (PaLM)
Co-authored PaLM: Scaling Language Modeling with Pathways.
Open large-scale image-text data (LAION-5B)
Co-authored LAION-5B: a widely used open dataset for vision-language foundation models.
Mixture-of-experts LLMs
Co-authored Mixtral of Experts: a key MoE reference in the open-weights frontier.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight foundation models (LLaMA)
Sits on both sides of a major shift in open models: he appears on Meta's LLaMA 2 paper and then on Mistral 7B and Mixtral, which places him in the early handoff from the first LLaMA wave into Mistral's open-weight model line.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Self-play RL with search (AlphaZero)
Co-authored AlphaZero: a canonical reference for self-play + search in RL.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report: capable models designed for smaller footprints.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Vision Transformers (ViT)
Co-authored ViT: a turning point for transformers in vision.
Mixture-of-experts LLMs
Co-authored Mixtral of Experts: a key MoE reference in the open-weights frontier.
Open-source tooling for modern NLP (Transformers library)
Co-authored the Hugging Face Transformers paper that helped standardize modern NLP workflows.
Self-play RL with search (AlphaZero)
Co-authored AlphaZero: a canonical reference for self-play + search in RL.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Human preference evaluation at scale (Chatbot Arena)
Co-authored Chatbot Arena: a high-impact human-preference evaluation platform for LLMs.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Efficient finetuning of quantized LLMs
A core person to know for making serious language-model finetuning and inference feasible on smaller hardware, especially through quantization and optimizer tooling that working builders actually use.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-source tooling for modern NLP (Transformers library)
Co-authored the Hugging Face Transformers paper that helped standardize modern NLP workflows.
Retrieval-augmented generation (RAG)
Co-authored RAG: a canonical reference for retrieval-augmented generation in NLP.
Text-to-image diffusion with strong language understanding (Imagen)
Co-authored Imagen: a milestone for photorealistic text-to-image diffusion models.
Teaching LMs to use tools (Toolformer)
Co-authored Toolformer: an influential approach to tool use via self-supervision.
Open-weight LLMs and training infrastructure
One of the clearest people to follow for the open-weight frontier-model line, especially where Meta’s LLaMA work flows directly into Mistral’s more aggressive efficiency push.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Gemini (multimodal foundation models)
Important for the branch of DeepMind research that connects control, world models, and modern agent behavior rather than treating them as separate eras.
Alignment via AI feedback (Constitutional AI)
A useful page for the more evaluation-heavy side of Anthropic’s alignment program, especially where constitutional methods, model-written evals, and faithfulness checks start to connect.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Frontier-scale training infrastructure
Builds core infrastructure for xAI’s frontier models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Pathways-scale language modeling (PaLM)
Co-authored PaLM: Scaling Language Modeling with Pathways.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Practical RL from human feedback
Co-authored Deep RL from Human Preferences: an early anchor for RLHF-style post-training.
Hybrid Transformer–Mamba language models (Jamba)
An engineer working at the boundary between research and implementation for AI21’s hybrid-model stack.
Hybrid Transformer–Mamba language models (Jamba)
A Jamba co-author whose work is tied to pre- and post-training, the part of the stack where hybrid-model behavior gets tuned into something shippable.
One of the clearest researchers to study for the GPT-3 era, especially around few-shot learning, scaling behavior, and what larger language models started making possible in practice.
Alignment via AI feedback (Constitutional AI)
Worth knowing because his paper trail hits several of the most useful early Anthropic threads at once: induction heads, calibration, repeated-data scaling, and the practical behavior of post-trained assistants.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Generalist agents (Gato)
Co-authored Gato: a key reference for generalist, multi-task agents.
Alignment via AI feedback (Constitutional AI)
A good profile for the less public part of frontier-model progress, where pretraining quality, evaluation loops, and systems choices do a lot of the real work.
Compute-optimal scaling for LLM training
A useful page for the research layer behind DeepMind’s frontier-language-model program, especially across Gopher, Chinchilla, and Gemini.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Hybrid Transformer–Mamba language models (Jamba)
His name appears directly on the original Jamba paper, giving users another concrete entry point into the people who built AI21’s hybrid architecture.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-source LLMs (EleutherAI)
Worth knowing as one of the early open-data contributors around the EleutherAI orbit, with a profile that mixes work on The Pile with a long tail of small, public NLP and machine-learning experiments.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Compute-optimal scaling for LLM training
A useful profile for the core DeepMind contributor layer behind Chinchilla, Gopher, and Gemini rather than only the more public faces of those systems.
Deep learning infrastructure (PyTorch)
Co-authored the PyTorch paper describing the imperative-style deep learning framework.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Efficient sequence models + attention kernels
One of the clearest researchers to follow for efficient sequence-model systems, especially the line of work that made frontier training and inference materially faster rather than merely cleaner on paper.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Alignment via AI feedback (Constitutional AI)
A useful profile for the systems side of alignment work, especially where infrastructure choices and evaluation throughput determine what a lab can actually test.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Open, fully-documented language models (OLMo)
Co-authored OLMo: Accelerating the Science of Language Models.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.