Topic
People driving core progress in computer vision, visual representation learning, robotics, and embodied systems.
Start with Kaiming He, Pushmeet Kohli, and Alex Ray for the clearest first pass through vision & robotics as it shows up in practice.
This area overlaps heavily with Meta, OpenAI, and Google DeepMind. Common institution signals include Google, OpenAI, and Admiral Ushakov State Maritime University. Recurring starting points include "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" and "Segment Anything".
Snapshot
Researchers: 50
Related labs: 4
Starting points: 8
Developed dossiers: 5
Papers, project pages, and repositories that recur across this part of the field.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Linked by 11 profiles in this topic
Segment Anything
Linked by 9 profiles in this topic
Segment Anything (project)
Linked by 9 profiles in this topic
Masked Autoencoders Are Scalable Vision Learners
Linked by 6 profiles in this topic
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Linked by 6 profiles in this topic
Emerging Properties in Self-Supervised Vision Transformers
Linked by 5 profiles in this topic
End-to-End Object Detection with Transformers
Linked by 5 profiles in this topic
Denoising Diffusion Probabilistic Models
Linked by 3 profiles in this topic
A stronger first pass through vision & robotics, ranked by profile depth, evidence, and editorial importance.
Computer vision, representation learning
A foundational computer-vision researcher whose work on representations and architectures still shapes modern pretraining and perception systems.
Robotics, vision, structured prediction
A strong person to follow if you want to understand how frontier AI gets pushed into science, security, and trustworthy deployment rather than staying inside benchmark culture.
Instruction-following via RLHF (InstructGPT)
A useful person to follow for the OpenAI thread that runs from dexterous robotics into later evaluation and capability-measurement work on large language models.
Direct preference optimization (DPO)
One of the clearest people to follow for the overlap between modern robotics, meta-learning, and preference-optimization-era alignment research.
Mixture-of-experts LLMs
A useful person to follow if you care about the bridge between embodied-agent research and modern open-weight language-model systems, rather than treating those worlds as separate.
Instruction-following via RLHF (InstructGPT)
Important for the product-and-systems side of OpenAI because his work spans the lab’s robotics era and later instruction-following language-model work.
Promptable segmentation foundation models (SAM)
Co-authored Segment Anything.
Promptable segmentation foundation models (SAM)
Co-authored Segment Anything.
Vision Transformers (ViT)
Co-authored ViT: a turning point for transformers in vision.
50 linked profiles.