Topic

Multimodal

People building systems that connect language with images, audio, video, and embodied perception.

For the clearest first pass through multimodal work as it shows up in practice, start with Alec Radford, Demis Hassabis, and Ashish Vaswani.

This area overlaps heavily with Google DeepMind, Google, and OpenAI. Common institution signals include Google DeepMind, Google, and DeepMind. Recurring starting points include "Gemini: A Family of Highly Capable Multimodal Models" and Gemma (docs).

Snapshot

Researchers: 1,373

Related labs: 7

Starting points: 8

Developed dossiers: 54

Institution Signals

Frequent institutions showing up across profiles in this area.

Google DeepMind (37); Google (36); DeepMind (9); Bar-Ilan University (1); Birla Institute of Technology and Science - Hyderabad Campus (1); Birla Institute of Technology and Science, Pilani - Goa Campus (1); Chang'an University (1); China Agricultural University (1)

Canonical Starting Points

Papers, project pages, and repositories that recur across this part of the field.

Frequently Linked Sources

Source clusters that repeatedly anchor researchers in this area.

Researchers To Start With

A stronger first pass through multimodal, ranked by profile depth, evidence, and editorial importance.

All Researchers In This Topic

1,373 linked profiles.