Topic
People driving core progress in computer vision, visual representation learning, robotics, and embodied systems.
Start with Kaiming He, Pushmeet Kohli, and Alex Ray for the clearest first pass through vision & robotics as it shows up in practice.
This area overlaps heavily with Meta, OpenAI, and Google DeepMind. Common institution signals include Google, OpenAI, and Admiral Ushakov State Maritime University. Recurring starting points include "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" and "Segment Anything".
Snapshot
Researchers: 50
Related labs: 4
Starting points: 8
Developed dossiers: 5
Papers, project pages, and repositories that recur across this part of the field.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Linked by 11 profiles in this topic
Segment Anything
Linked by 9 profiles in this topic
Segment Anything (project)
Linked by 9 profiles in this topic
Masked Autoencoders Are Scalable Vision Learners
Linked by 6 profiles in this topic
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Linked by 6 profiles in this topic
Emerging Properties in Self-Supervised Vision Transformers
Linked by 5 profiles in this topic
End-to-End Object Detection with Transformers
Linked by 5 profiles in this topic
Denoising Diffusion Probabilistic Models
Linked by 3 profiles in this topic
A stronger first pass through vision & robotics, ranked by profile depth, evidence, and editorial importance.
Computer vision, representation learning
A foundational computer-vision researcher whose work on representations and architectures still shapes modern pretraining and perception systems.
Robotics, vision, structured prediction
A strong person to follow if you want to understand how frontier AI gets pushed into science, security, and trustworthy deployment rather than staying inside benchmark culture.
Instruction-following via RLHF (InstructGPT)
A useful person to follow for the OpenAI thread that runs from dexterous robotics into later evaluation and capability-measurement work on large language models.
Direct preference optimization (DPO)
One of the clearest people to follow for the overlap between modern robotics, meta-learning, and preference-optimization-era alignment research.
Mixture-of-experts LLMs
A useful person to follow if you care about the bridge between embodied-agent research and modern open-weight language-model systems, rather than treating those worlds as separate.
Instruction-following via RLHF (InstructGPT)
Important for the product-and-systems side of OpenAI because his work spans the lab’s robotics era and later instruction-following language-model work.
Promptable segmentation foundation models (SAM)
Co-authored Segment Anything.
Promptable segmentation foundation models (SAM)
Co-authored Segment Anything.
Vision Transformers (ViT)
Co-authored ViT: a turning point for transformers in vision.
50 linked profiles.