Topic
Researchers who make large-scale training and inference practical through architecture, kernels, sharding, and serving work.
Start with Demis Hassabis, Ashish Vaswani, and Stella Biderman for the clearest first pass through systems & infrastructure as it shows up in practice.
This area overlaps heavily with AI21, Mistral, and Anthropic, and common institution signals include AI21 Labs, Meta, and Anthropic. Recurring starting points include the Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone and Jamba-1.5: Hybrid Transformer-Mamba Models at Scale.
Snapshot
Researchers: 417
Related labs: 8
Starting points: 8
Developed dossiers: 57
Papers, project pages, and repositories that recur across this part of the field.
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (linked by 122 profiles in this topic)
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale (linked by 61 profiles in this topic)
Jamba: A Hybrid Transformer-Mamba Language Model (linked by 61 profiles in this topic)
AI21 Jamba Large 1.5 model card (linked by 42 profiles in this topic)
RWKV: Reinventing RNNs for the Transformer Era (linked by 34 profiles in this topic)
RWKV (project) (linked by 32 profiles in this topic)
Mixtral of Experts (linked by 23 profiles in this topic)
Constitutional AI: Harmlessness from AI Feedback (linked by 18 profiles in this topic)
A stronger first pass through systems & infrastructure, ranked by profile depth, evidence, and editorial importance.
Deep RL, scientific AI, leadership
Important both as a researcher and as an institution builder whose long-running agenda tied deep RL, multimodal systems, and scientific AI into one coherent lab strategy.
Transformers
A foundational figure in modern sequence modeling whose work on the Transformer changed the technical direction of language and multimodal systems.
Open-source LLMs, datasets
A key open-model ecosystem builder whose work matters because it combines research, public infrastructure, and field-level coordination rather than isolated paper output.
ML systems, large-scale infrastructure
Foundational less for any single public paper than for shaping the infrastructure, engineering culture, and systems thinking that make frontier-model research possible.
Computer vision, representation learning
A foundational computer-vision researcher whose work on representations and architectures still shapes modern pretraining and perception systems.
Transformers, Mixture-of-Experts, scaling
One of the most important architecture-level thinkers in modern AI, with influence spanning Transformers, efficient scaling, and mixture-of-experts systems.
Alignment via AI feedback (Constitutional AI)
High-signal for the seam between machine learning and hardware systems, especially where learned optimization methods begin affecting the actual compute infrastructure underneath frontier models.
Alignment via AI feedback (Constitutional AI)
A strong person to follow for the point where machine learning research starts shaping the compute stack itself, especially in chip placement and systems-aware optimization.
Gemini (multimodal foundation models)
A good researcher to follow for the infrastructure side of frontier language models, especially mixture-of-experts scaling, instruction tuning, and the data systems that make very large models usable.
417 linked profiles.