Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open, fully-documented language models (OLMo)
Co-authored OLMo: Accelerating the Science of Language Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Text-to-image diffusion with strong language understanding (Imagen)
Co-authored Imagen: a milestone for photorealistic text-to-image diffusion models.
Mixture-of-experts LLMs
Co-authored Mixtral of Experts: a key MoE reference in the open-weights frontier.
Trillion-parameter scaling with sparsity (Switch Transformers)
Co-authored Switch Transformers: a core reference for practical MoE scaling.
Code-focused LLMs and evaluation (Codex)
Co-authored the Codex evaluation paper: an early anchor for code LLM capability measurement.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open, fully-documented language models (OLMo)
Co-authored OLMo: Accelerating the Science of Language Models.
Code-focused LLMs and evaluation (Codex)
Co-authored the Codex evaluation paper: an early anchor for code LLM capability measurement.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Code-focused LLMs and evaluation (Codex)
Co-authored the Codex evaluation paper: an early anchor for code LLM capability measurement.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Fast, cheap LLM serving (PagedAttention)
Co-authored vLLM: a widely used serving stack for efficient LLM inference.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Co-authored the DeepSeek-V3 Technical Report.
Pathways-scale language modeling (PaLM)
Co-authored PaLM: Scaling Language Modeling with Pathways.
Open-weight foundation models (LLaMA)
Worth upgrading because he is present across multiple major generations of the LLaMA family, which makes his page more useful as a stable thread through Meta's open-model program than as a one-paper author stub.
Co-authored PaLI: a key reference for scaling multilingual vision-language models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Self-rewarding post-training
Co-authored Self-Rewarding Language Models: explores self-improvement via internal reward modeling.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Co-authored Gemma 2: improving open language models at a practical size.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
RWKV and efficient sequence modeling
Worth keeping because it connects an early RWKV byline to a much more visible later research program in agentic AI, biomedical discovery, and code-focused evaluation, which makes the page far more useful than a one-paper ghost profile.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Co-authored PaLI: a key reference for scaling multilingual vision-language models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Vision Transformers (ViT)
Co-authored ViT: a turning point for transformers in vision.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Co-authored “The Llama 3 Herd of Models”.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Co-authored “The Llama 3 Herd of Models”.
RWKV and efficient sequence modeling
Co-authored RWKV: Reinventing RNNs for the Transformer Era.
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Co-authored the DeepSeek-V3 Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Co-authored the DeepSeek-V3 Technical Report.
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Masked autoencoders for vision (MAE)
Co-authored MAE: a strong template for scalable self-supervised vision pretraining.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Co-authored the Qwen2 Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Co-authored the InstructGPT paper that set the standard instruction-tuning + RLHF recipe.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Pathways-scale language modeling (PaLM)
Co-authored PaLM: Scaling Language Modeling with Pathways.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
RWKV and efficient sequence modeling
A useful page because his public trail moved well beyond the original RWKV paper into reasoning, in-context learning, and decoder-only model design, which makes him one of the stronger later-follow-up names from that original author list.
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Co-authored the DeepSeek-V3 Technical Report.
Co-authored the DeepSeek-V3 Technical Report.
Open-source tooling for modern NLP (Transformers library)
Co-authored the Hugging Face Transformers paper that helped standardize modern NLP workflows.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Few-shot vision-language models (Flamingo)
Co-authored Flamingo: an influential multimodal model for few-shot vision-language tasks.