Co-authored the Qwen Technical Report.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Co-authored CodeGemma: open code models based on Gemma.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Fast, cheap LLM serving (PagedAttention)
Co-authored vLLM: a widely used serving stack for efficient LLM inference.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Co-authored Gemma: open models based on Gemini research and technology.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Streaming + long-context stability (attention sinks)
One of the most useful researchers to follow for efficient AI systems, especially the line of work that makes large models smaller, faster, and easier to deploy without giving up too much quality.
Linear transformers via the delta rule
A high-signal researcher for the post-attention design space, especially the line of work trying to make linear-attention and delta-rule models competitive in real language-model systems.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Mixture-of-experts LLMs
Co-authored Mixtral of Experts: a key MoE reference in the open-weights frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Scaled multilingual vision-language models (PaLI)
Co-authored PaLI: a key reference for scaling multilingual vision-language models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Promptable segmentation foundation models (SAM)
Co-authored Segment Anything.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open large-scale image-text data (LAION-5B)
Co-authored LAION-5B: a widely used open dataset for vision-language foundation models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Alignment via AI feedback (Constitutional AI)
Important because his work sits at a useful junction of robustness, scaling, adversarial attacks, and security-minded analysis of large models rather than staying inside one narrow alignment niche.
RWKV and efficient sequence modeling
A worthwhile long-tail page: he appears on both the original RWKV paper and Eagle/Finch, and has visible follow-on work from the same Wrocław group rather than disappearing after the first release.
Direct preference optimization (DPO)
A high-signal researcher for the probabilistic and generative-modeling side of modern AI, and an important bridge into the Stanford preference-optimization cluster that helped make DPO mainstream.
Open-source LLMs, datasets
A key open-model ecosystem builder whose work matters because it combines research, public infrastructure, and field-level coordination rather than isolated paper output alone.
Open foundation models for code (Code Llama)
Co-authored Code Llama: a key open-model reference for code generation and coding assistants.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Truthfulness and hallucination evaluation
Co-authored TruthfulQA: an influential benchmark for truthfulness and falsehood mimicry in LMs.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Broad capability evaluation (MMLU)
Co-authored MMLU: a widely used benchmark for general LLM capability across many subjects.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Bootstrapped vision-language pretraining (BLIP)
Co-authored BLIP: a high-impact recipe for unified vision-language understanding and generation.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Code-focused LLMs and evaluation (Codex)
Co-authored “Evaluating Large Language Models Trained on Code” (the Codex paper): an early anchor for code LLM capability measurement.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Pathways-scale language modeling (PaLM)
Co-authored PaLM: Scaling Language Modeling with Pathways.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Synthetic instructions for alignment (Self-Instruct)
Co-authored Self-Instruct: a key reference for instruction data generation pipelines.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Vision Transformers (ViT)
Co-authored ViT: a turning point for transformers in vision.
Open-source tooling for modern NLP (Transformers library)
Co-authored the Hugging Face Transformers paper that helped standardize modern NLP workflows.
Mixture-of-experts LLMs
Co-authored Mixtral of Experts: a key MoE reference in the open-weights frontier.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Co-authored the DeepSeek-V3 Technical Report.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Hybrid Transformer–Mamba language models (Jamba)
A useful page because it gives another of the non-model contributors on Jamba-1.5 a real place in the map; frontier-model launches depend on product and execution work, not just research authorship.
A worthwhile long-tail researcher page because it makes the data-and-evaluation layer of modern language-model work visible instead of treating frontier systems as if they were only architecture or scaling stories.
Co-authored Code Llama: a key open-model reference for code generation and coding assistants.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Alignment via AI feedback (Constitutional AI)
A high-signal page for anyone tracking whether model reasoning traces are actually trustworthy, not just fluent explanations pasted on after the fact.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Efficient MoE scaling (GLaM)
Co-authored GLaM: an influential MoE scaling reference in large language modeling.
Co-authored Whisper: robust speech recognition via large-scale weak supervision.
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.