Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Hybrid Transformer–Mamba language models (Jamba)
A useful page because his public trail is broader than the generic Jamba author stub: it runs from earlier language grounding and text-similarity work into Jamba-1.5 and later multimodal hallucination mitigation.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Alignment via AI feedback (Constitutional AI)
High-signal for the seam between machine learning and hardware systems, especially where learned optimization methods begin affecting the actual compute infrastructure underneath frontier models.
Linear transformers via the delta rule
A good page to have because he is one of the recurring names in the recent MIT line of work on linear-attention alternatives, especially where hardware-efficient training meets practical long-context sequence modeling.
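The delta-rule line of work replaces softmax attention with a fast-weight memory that is overwritten, not merely accumulated. A minimal pure-Python sketch of one step (the function name and shapes are illustrative, not from any specific paper's code):

```python
def delta_rule_step(W, k, v, q, beta=1.0):
    """One step of delta-rule linear attention in the fast-weight view:
    update W <- W + beta * (v - W k) k^T, then read out o = W q."""
    d_k, d_v = len(k), len(v)
    # Current prediction of the memory for key k.
    Wk = [sum(W[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Correct the memory toward v along the direction of k.
    for i in range(d_v):
        err = beta * (v[i] - Wk[i])
        for j in range(d_k):
            W[i][j] += err * k[j]
    # Read out with the query.
    return [sum(W[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]

# Writing (k, v) into an empty memory and querying with q = k (unit-norm k)
# retrieves v exactly; a later write to the same key overwrites it, which is
# the behavioral difference from plain additive linear attention.
W = [[0.0, 0.0], [0.0, 0.0]]
out = delta_rule_step(W, k=[1.0, 0.0], v=[2.0, 3.0], q=[1.0, 0.0])
```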
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Human preference evaluation at scale (Chatbot Arena)
Co-authored Chatbot Arena: a high-impact human-preference evaluation platform for LLMs.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
Open-weight foundation models (LLaMA)
Important for the code-model side of the open-weight ecosystem, especially where general-purpose LLaMA work turns into stronger coding systems.
Hybrid Transformer–Mamba language models (Jamba)
One of the higher-signal people to know in the hybrid-LLM line because he sits at the point where AI21’s research architecture, long-context systems work, and real product deployment meet.
Hybrid Transformer–Mamba language models (Jamba)
Worth tracking on the architecture side of AI21 because his profile sits where infrastructure leadership, hybrid-model design, and the mechanics of shipping long-context systems overlap.
Trillion-parameter scaling with sparsity (Switch Transformers)
Co-authored Switch Transformers: a core reference for practical MoE scaling.
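The core Switch idea is top-1 routing: each token is dispatched to exactly one expert, gated by that expert's softmax probability. A minimal sketch of the routing decision for a single token (function name is illustrative; load-balancing losses and capacity limits are omitted):

```python
import math

def switch_route(router_logits):
    # Switch-style top-1 routing: pick the single expert with the highest
    # router logit; its softmax probability scales the expert's output.
    m = max(router_logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in router_logits]
    z = sum(exps)
    expert = max(range(len(exps)), key=exps.__getitem__)
    return expert, exps[expert] / z

# Token with router logits over 3 experts goes to expert 1.
expert, gate = switch_route([0.1, 2.0, -1.0])
```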
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
RWKV and efficient sequence modeling
A strong page to keep because he links the early RWKV work to the later Wrocław-centered PLLuM effort, which makes him one of the clearer continuity threads between open sequence models and Polish-language LLM development.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Scaled multilingual vision-language models (PaLI)
Co-authored PaLI: a key reference for scaling multilingual vision-language models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Streaming + long-context stability (attention sinks)
A strong researcher to follow for efficient and long-context LLM systems, especially where inference tricks and memory management make large models practical to run.
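The attention-sink observation leads to a simple KV-cache eviction policy: keep the first few tokens (the "sinks") plus a sliding window of recent tokens, and drop everything in between, so cache size stays bounded during streaming generation. A minimal sketch of which positions survive (function and parameter names are illustrative):

```python
def sink_cache_keep(seq_len, n_sink=4, window=8):
    # Attention-sink eviction policy: always retain the first n_sink token
    # positions plus the most recent `window` positions; evict the middle.
    kept = list(range(min(n_sink, seq_len)))
    kept.extend(range(max(n_sink, seq_len - window), seq_len))
    return kept

# After 20 generated tokens, only 4 sinks + 8 recent positions remain cached.
positions = sink_cache_keep(seq_len=20)
```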
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Hybrid Transformer–Mamba language models (Jamba)
A better page than the default Jamba stub because it gives one of the quieter AI21 researchers a real place in the company’s hybrid-model program instead of treating him as just another author in a long list.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Pathways-scale language modeling (PaLM)
Co-authored PaLM: Scaling Language Modeling with Pathways.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Human-feedback and evaluation infrastructure (Anthropic)
A strong profile for the engineering and product layer underneath early Anthropic alignment work, especially where human-feedback collection and evaluation infrastructure had to become real systems.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Neural radiance fields (NeRF)
Co-authored NeRF: a foundational paper for neural rendering and 3D scene representations.
Score-based diffusion modeling via SDEs
Co-authored the score-based diffusion SDE paper: a key theoretical view of diffusion models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-source LLMs (EleutherAI)
Important for the bridge between early open-model scaling work and later frontier closed-model systems, especially around architecture and training-stack choices that ended up mattering at both ends of the field.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Large-scale language modeling (GPT-3)
Co-authored GPT-3: Language Models are Few-Shot Learners.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Fully Sharded Data Parallel training (FSDP)
Co-authored PyTorch FSDP: practical lessons for scaling fully-sharded training workloads.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open code models (CodeGemma)
Co-authored CodeGemma: open code models based on Gemma.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Latent diffusion for high-res generation
Co-authored Latent Diffusion Models: the foundation behind Stable Diffusion-style pipelines.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Mixture-of-experts LLMs
Co-authored Mixtral of Experts: a key MoE reference in the open-weights frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
RWKV and efficient sequence modeling
Worth tracking if you care about alternatives to the standard transformer playbook, especially the line of work trying to keep strong language-model performance while making inference and memory use much cheaper.
Rotary position embeddings (RoPE)
A better page than the generated stub because it places him in the original RoFormer team at Zhuiyi, tied to the positional-embedding design that became standard in later open-weight model families.
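RoPE's key property is that rotating query and key vectors by position-dependent angles makes their dot product depend only on the relative offset between positions. A minimal pure-Python sketch of that property (function names and the toy vectors are illustrative):

```python
import math

def rope(x, pos, base=10000.0):
    # Rotary position embedding: rotate each pair (x[2i], x[2i+1]) by the
    # angle pos * base**(-2i/d), as in the RoFormer formulation.
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.9]
# Attention scores depend only on the relative offset (3 in both cases):
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 9), rope(k, 6))
assert abs(s1 - s2) < 1e-9
```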
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
Code-focused LLMs and evaluation (Codex)
Co-authored the Codex evaluation paper: an early anchor for code LLM capability measurement.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Compute-optimal scaling for LLM training
A useful page for one of the less public but still important DeepMind contributors behind frontier language-model scaling and Gemini.
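Two rules of thumb anchor the compute-optimal scaling literature (both are widely used approximations, stated here as assumptions rather than taken from this document): training cost is roughly 6 * N * D FLOPs for N parameters and D tokens, and the Chinchilla-style optimum trains on about 20 tokens per parameter.

```python
def training_flops(n_params, n_tokens):
    # Common estimate: forward + backward pass costs ~6 FLOPs per
    # parameter per token, so total training cost ~ 6 * N * D.
    return 6 * n_params * n_tokens

def compute_optimal_tokens(n_params, tokens_per_param=20):
    # Chinchilla-style heuristic: ~20 training tokens per parameter
    # (a rough approximation, not a figure from this document).
    return n_params * tokens_per_param

n = 70 * 10**9                   # a hypothetical 70B-parameter model
d = compute_optimal_tokens(n)    # ~1.4T tokens under the heuristic
flops = training_flops(n, d)
```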
RWKV and efficient sequence modeling
Important within the RWKV cluster because his name carries from the original RWKV paper into Gated Slot Attention, making him part of the small set of contributors who reappear as this sequence-model thread evolves.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Adversarial robustness and feature learning
Co-authored “Adversarial Examples Are Not Bugs, They Are Features”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Pathways-scale language modeling (PaLM)
Co-authored PaLM: Scaling Language Modeling with Pathways.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Chain-of-thought prompting and reasoning
Co-authored the chain-of-thought prompting paper; foundational for modern reasoning prompting.
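The chain-of-thought technique is, mechanically, a prompt format: each few-shot exemplar shows worked reasoning before its final answer, nudging the model to reason step by step on the new question. A minimal sketch of that format (function name and exemplar text are illustrative):

```python
def cot_prompt(question, exemplars):
    # Few-shot chain-of-thought prompt: each exemplar includes its
    # reasoning steps before the answer; the new question ends at "A:"
    # so the model continues with its own reasoning.
    blocks = [
        f"Q: {q}\nA: {steps} The answer is {answer}."
        for q, steps, answer in exemplars
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

prompt = cot_prompt(
    "What is 3 + 4 * 2?",
    [("What is 2 + 2 * 3?",
      "Multiplication comes first: 2 * 3 = 6. Then 2 + 6 = 8.",
      "8")],
)
```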
Model-written evaluations for LM behavior
Co-authored model-written evals: a practical technique for discovering and measuring LM behaviors.
Instruction tuning for better zero-shot behavior
Co-authored FLAN: a practical anchor for instruction tuning and zero-shot transfer.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Code-focused LLMs and evaluation (Codex)
Co-authored the Codex evaluation paper: an early anchor for code LLM capability measurement.