Hybrid Transformer–Mamba language models (Jamba)
A solid long-tail profile for the engineering core of AI21 because it captures the deep-learning practitioners who help translate research releases into tuned, operational model systems.
Hybrid Transformer–Mamba language models (Jamba)
A solid long-tail profile for the engineering core of AI21 because it captures the deep-learning practitioners who help translate research releases into tuned, operational model systems.
Small-model reasoning and capability (Phi-4)
Co-authored the Phi-4 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Human preference evaluation at scale (Chatbot Arena)
Co-authored Chatbot Arena: a high-impact human-preference evaluation platform for LLMs.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Text-to-text transfer and pretraining (T5)
Co-authored T5: a practical template for unified NLP training and evaluation.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Code-focused LLMs and evaluation (Codex)
Co-authored the Codex evaluation paper: an early anchor for code LLM capability measurement.
Open-source LLMs (EleutherAI)
Useful for the applied side of open-model work because his profile bridges EleutherAI-era public model training and production radiology AI inside a real clinical-imaging company.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Alignment via AI feedback (Constitutional AI)
A useful profile for the segment of Anthropic that turns alignment concerns into concrete evals, behavior probes, and red-team-style measurement.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Pathways-scale language modeling (PaLM)
Co-authored PaLM: Scaling Language Modeling with Pathways.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Streaming + long-context stability (attention sinks)
A strong person to study for the modern NLP stack because his work spans denoising pretraining, retrieval-augmented generation, and later long-context inference tricks rather than only one phase of the language-model pipeline.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Text-to-image generation (DALL·E)
Co-authored the original DALL·E paper: zero-shot text-to-image generation.
Few-shot vision-language models (Flamingo)
Co-authored Flamingo: an influential multimodal model for few-shot vision-language tasks.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Universal jailbreak-style attacks on aligned LMs
Co-authored universal and transferable adversarial attacks on aligned language models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Code-focused LLMs and evaluation (Codex)
Co-authored the Codex evaluation paper: an early anchor for code LLM capability measurement.
Practical RL from human feedback
Co-authored Deep RL from Human Preferences: an early anchor for RLHF-style post-training.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Co-authored the Gemma 3 Technical Report.
Co-authored “The Llama 3 Herd of Models”.
Fully Sharded Data Parallel training (FSDP)
Co-authored PyTorch FSDP: practical lessons for scaling fully-sharded training workloads.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Bidirectional transformer pretraining (BERT)
Co-authored BERT: a turning point for transfer learning in NLP.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Large-scale transformer inference (DeepSpeed)
Co-authored DeepSpeed Inference: practical inference optimizations for serving large transformer models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Code-focused LLMs and evaluation (Codex)
Co-authored the Codex evaluation paper: an early anchor for code LLM capability measurement.
Challenging BIG-bench tasks (BBH)
Co-authored BBH: a popular set of hard reasoning tasks used for evaluating chain-of-thought prompting.
Model-written evaluations for LM behavior
Co-authored model-written evals: a practical technique for discovering and measuring LM behaviors.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open large-scale image-text data (LAION-5B)
Co-authored LAION-5B: a widely used open dataset for vision-language foundation models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Code-focused LLMs and evaluation (Codex)
Co-authored the Codex evaluation paper: an early anchor for code LLM capability measurement.
Text-to-image diffusion with strong language understanding (Imagen)
Co-authored Imagen: a milestone for photorealistic text-to-image diffusion models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Model-parallel training at scale (Megatron-LM)
Co-authored Megatron-LM: a core reference for scaling transformer training via model parallelism.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models (Gemma 2)
Co-authored Gemma 2: improving open language models at a practical size.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Scaled multilingual vision-language models (PaLI)
Co-authored PaLI: a key reference for scaling multilingual vision-language models.
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Hybrid Transformer–Mamba language models (Jamba)
Helpful because it adds contributor-level detail to the original Jamba release, which is exactly the kind of context these long-tail pages need to be useful rather than decorative.
Open-source tooling for modern NLP (Transformers library)
Co-authored the Hugging Face Transformers paper that helped standardize modern NLP workflows.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Vision Transformers (ViT)
Co-authored ViT: a turning point for transformers in vision.
Model-parallel training at scale (Megatron-LM)
Co-authored Megatron-LM: a core reference for scaling transformer training via model parallelism.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Fully Sharded Data Parallel training (FSDP)
Co-authored PyTorch FSDP: practical lessons for scaling fully-sharded training workloads.
Co-authored the Qwen2 Technical Report.
Hybrid Transformer–Mamba language models (Jamba)
A useful page because evaluation work is easy to flatten into leaderboard noise, and her profile anchors the people inside AI21 who were responsible for turning Jamba performance claims into something measurable.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.