Generative pretraining, multimodal models
Important because several of the modern foundation-model playbooks trace back to work its leading researchers helped drive, especially around generative pretraining and multimodal transfer.
Topic
People building systems that connect language with images, audio, video, and embodied perception.
Start with Alec Radford, Demis Hassabis, and Ashish Vaswani if you want the clearest first pass through multimodal modeling as it shows up in practice.
This area overlaps heavily with Google DeepMind, Google, and OpenAI, which are also the institutions that appear most often across profiles. Recurring starting points include Gemini: A Family of Highly Capable Multimodal Models and Gemma (docs).
Snapshot
Researchers: 1,373
Related labs: 7
Starting points: 8
Developed dossiers: 54
Useful entry points pulled from the strongest linked researcher dossiers, the institutions that recur across profiles in this area, and the papers, project pages, and repositories that come up again and again across this part of the field.
Gemini: A Family of Highly Capable Multimodal Models (linked by 1,128 profiles in this topic)
Gemma (docs) (linked by 113 profiles in this topic)
Gemma 3 Technical Report (linked by 113 profiles in this topic)
Gemini: A Family of Highly Capable Multimodal Models (linked by 27 profiles in this topic)
PaLI: A Jointly-Scaled Multilingual Language-Image Model (linked by 21 profiles in this topic)
Flamingo: a Visual Language Model for Few-Shot Learning (linked by 20 profiles in this topic)
A Generalist Agent (linked by 18 profiles in this topic)
Training Compute-Optimal Large Language Models (linked by 18 profiles in this topic)
Source clusters that repeatedly anchor researchers in this area.
Gemini: A Family of Highly Capable Multimodal Models (used across 1,128 researcher pages in this topic)
Gemma (docs) (used across 113 researcher pages in this topic)
Gemma 3 Technical Report (used across 113 researcher pages in this topic)
PaLI: A Jointly-Scaled Multilingual Language-Image Model (used across 21 researcher pages in this topic)
Flamingo: a Visual Language Model for Few-Shot Learning (used across 20 researcher pages in this topic)
A Generalist Agent (used across 18 researcher pages in this topic)
A stronger first pass through multimodal research, ranked by profile depth, evidence, and editorial importance.
Generative pretraining, multimodal models
Important because several of the modern foundation-model playbooks trace back to work he helped drive, especially around generative pretraining and multimodal transfer.
Deep RL, scientific AI, leadership
Important both as a researcher and as an institution builder whose long-running agenda tied deep RL, multimodal systems, and scientific AI into one coherent lab strategy.
Transformers
A foundational figure in modern sequence modeling whose work on the Transformer changed the technical direction of language and multimodal systems.
ML systems, large-scale infrastructure
Foundational less for any single public paper than for shaping the infrastructure, engineering culture, and systems thinking that make frontier-model research possible.
Transformers, Mixture-of-Experts, scaling
One of the most important architecture-level thinkers in modern AI, with influence spanning Transformers, efficient scaling, and mixture-of-experts systems.
Alignment via AI feedback (Constitutional AI)
A strong person to follow for the point where machine learning research starts shaping the compute stack itself, especially in chip placement and systems-aware optimization.
Gemini (multimodal foundation models)
A good researcher to follow for the infrastructure side of frontier language models, especially mixture-of-experts scaling, instruction tuning, and the data systems that make very large models usable.
Gemini (multimodal foundation models)
A high-signal researcher for grounded language and retrieval-heavy systems, especially if you want to understand how language models stay useful as the world changes around them.
Deep RL, planning, games
A central figure in modern reinforcement learning whose work turned deep RL from an exciting idea into a line of systems that repeatedly reset expectations.
1,373 linked profiles.