Generative pretraining, multimodal models
Important because several of the modern foundation-model playbooks trace back to work its leading researchers helped drive, especially around generative pretraining and multimodal transfer.
Topic
People building systems that connect language with images, audio, video, and embodied perception.
Start with Alec Radford, Demis Hassabis, and Ashish Vaswani if you want the clearest first pass through multimodal modeling as it shows up in practice.
This area overlaps heavily with Google DeepMind, Google, and OpenAI, which are also the institutions that appear most often across profiles. Recurring starting points include Gemini: A Family of Highly Capable Multimodal Models and Gemma (docs).
Snapshot
Researchers: 1,373
Related labs: 7
Starting points: 8
Developed dossiers: 54
Useful entry points pulled from the strongest linked researcher dossiers, the institutions that recur across profiles in this area, and the papers, project pages, and repositories that come up again and again across this part of the field.
Gemini: A Family of Highly Capable Multimodal Models (linked by 1,128 profiles in this topic)
Gemma (docs) (linked by 113 profiles in this topic)
Gemma 3 Technical Report (linked by 113 profiles in this topic)
Gemini: A Family of Highly Capable Multimodal Models (linked by 27 profiles in this topic)
PaLI: A Jointly-Scaled Multilingual Language-Image Model (linked by 21 profiles in this topic)
Flamingo: a Visual Language Model for Few-Shot Learning (linked by 20 profiles in this topic)
A Generalist Agent (linked by 18 profiles in this topic)
Training Compute-Optimal Large Language Models (linked by 18 profiles in this topic)
Source clusters that repeatedly anchor researchers in this area.
Gemini: A Family of Highly Capable Multimodal Models (used across 1,128 researcher pages in this topic)
Gemma (docs) (used across 113 researcher pages in this topic)
Gemma 3 Technical Report (used across 113 researcher pages in this topic)
PaLI: A Jointly-Scaled Multilingual Language-Image Model (used across 21 researcher pages in this topic)
Flamingo: a Visual Language Model for Few-Shot Learning (used across 20 researcher pages in this topic)
A Generalist Agent (used across 18 researcher pages in this topic)
A stronger first pass through multimodal research, ranked by profile depth, evidence, and editorial importance.
Generative pretraining, multimodal models
Important because several of the modern foundation-model playbooks trace back to work he helped drive, especially around generative pretraining and multimodal transfer.
Deep RL, scientific AI, leadership
Important both as a researcher and as an institution builder whose long-running agenda tied deep RL, multimodal systems, and scientific AI into one coherent lab strategy.
Transformers
A foundational figure in modern sequence modeling whose work on the Transformer changed the technical direction of language and multimodal systems.
ML systems, large-scale infrastructure
Foundational less for any single public paper than for shaping the infrastructure, engineering culture, and systems thinking that make frontier-model research possible.
Transformers, Mixture-of-Experts, scaling
One of the most important architecture-level thinkers in modern AI, with influence spanning Transformers, efficient scaling, and mixture-of-experts systems.
Alignment via AI feedback (Constitutional AI)
A strong person to follow for the point where machine learning research starts shaping the compute stack itself, especially in chip placement and systems-aware optimization.
Gemini (multimodal foundation models)
A good researcher to follow for the infrastructure side of frontier language models, especially mixture-of-experts scaling, instruction tuning, and the data systems that make very large models usable.
Gemini (multimodal foundation models)
A high-signal researcher for grounded language and retrieval-heavy systems, especially if you want to understand how language models stay useful as the world changes around them.
Deep RL, planning, games
A central figure in modern reinforcement learning whose work turned deep RL from an exciting idea into a line of systems that repeatedly reset expectations.
1,373 linked profiles.