Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open, fully-documented language models (OLMo)
Co-authored OLMo: Accelerating the Science of Language Models.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Score-based diffusion modeling via SDEs
Co-authored the score-based diffusion SDE paper: a key theoretical view of diffusion models.
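For reference, the core objects in that theoretical view (standard score-SDE notation, not anything specific to this page) are a forward noising SDE and a reverse-time SDE driven by the learned score:

```latex
% Forward noising SDE: drift f, diffusion coefficient g, Brownian motion w.
\[ \mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w \]
% Reverse-time SDE used for sampling, driven by the score \nabla_x \log p_t(x);
% \bar{w} is reverse-time Brownian motion. Learning the score is the whole game.
\[ \mathrm{d}x = \bigl[ f(x, t) - g(t)^2\,\nabla_x \log p_t(x) \bigr]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{w} \]
```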
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Masked autoencoders for vision (MAE)
Co-authored MAE: a strong template for scalable self-supervised vision pretraining.
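To make "template" concrete, the heart of the recipe is masking most patch tokens and encoding only the visible remainder; here is a minimal numpy sketch of that step (the 75% ratio is the paper's default, but the function name and array layout are illustrative):

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, rng=None):
    """MAE-style masking sketch: keep a random subset of patch tokens.

    patches: (num_patches, dim). Only the kept subset goes through the
    encoder; a lightweight decoder later reconstructs the masked patches.
    """
    rng = rng or np.random.default_rng(0)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])   # visible patches, original order
    mask_idx = np.sort(perm[n_keep:])   # targets for reconstruction
    return patches[keep_idx], keep_idx, mask_idx
```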
Open code LLMs (StarCoder)
Co-authored StarCoder: a foundational open code model effort (BigCode).
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Faster LLM inference via speculative decoding
A high-signal researcher for the latency and systems side of modern language models, especially where clever decoding tricks turn frontier models into usable products.
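To make the latency claim concrete, here is a toy greedy-verification sketch of speculative decoding (not any specific paper's algorithm; `target_next` and `draft_next` are hypothetical stand-ins for real models, and production systems verify the whole draft in one batched forward pass):

```python
def speculative_decode_greedy(target_next, draft_next, prompt, k=4, max_new=32):
    """A cheap draft model proposes k tokens; the expensive target model
    keeps the longest agreeing prefix, so one verification step can emit
    several tokens instead of one."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1) cheap draft: propose k tokens autoregressively
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2) verify against the target model, keeping the agreeing prefix
        accepted = []
        for tok in proposal:
            if target_next(tokens + accepted) == tok:
                accepted.append(tok)
            else:
                break
        # 3) always emit at least one target token so decoding progresses
        accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens
```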
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Fully Sharded Data Parallel training (FSDP)
Co-authored PyTorch FSDP: practical lessons for scaling fully-sharded training workloads.
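For context, the minimal shape of an FSDP setup looks roughly like this (assumes a torchrun-style launch that sets the distributed environment variables; the toy model and sizes are placeholders):

```python
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes launch via torchrun, which sets the env vars init_process_group reads.
dist.init_process_group("nccl")
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)).cuda()
# Wrapping shards parameters, gradients, and optimizer state across ranks,
# gathering full parameters only transiently around each forward/backward.
model = FSDP(model)
```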
Representation learning, AI systems
A foundational deep-learning figure whose influence spans convolutional networks, representation learning, and long-running arguments about what capable AI systems should optimize for next.
Efficient MoE scaling (GLaM)
Co-authored GLaM: an influential MoE scaling reference in large language modeling.
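As a reminder of the mechanism GLaM scales, a rough numpy sketch of top-2 gating: each token is routed to its two highest-scoring experts, whose outputs are then mixed by softmaxed gate scores (shapes and names here are illustrative, not GLaM's actual code):

```python
import numpy as np

def top2_gate(x, w_gate):
    """Top-2 routing sketch. x: (tokens, dim); w_gate: (dim, n_experts)."""
    logits = x @ w_gate
    top2 = np.argsort(logits, axis=-1)[:, -2:]             # best two experts per token
    picked = np.take_along_axis(logits, top2, axis=-1)
    weights = np.exp(picked) / np.exp(picked).sum(-1, keepdims=True)
    return top2, weights                                   # expert ids + mixing weights
```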
Text-to-text transfer and pretraining (T5)
Co-authored T5: a practical template for unified NLP training and evaluation.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Synthetic instructions for alignment (Self-Instruct)
Co-authored Self-Instruct: a key reference for instruction data generation pipelines.
Hybrid Transformer–Mamba language models (Jamba)
One of the clearer non-model pages in the AI21 cluster, because he connects data leadership, infrastructure realities, and public explanation of enterprise AI rather than focusing on pure modeling work alone.
Commonsense reasoning evaluation (HellaSwag)
Co-authored HellaSwag: a widely used commonsense benchmark for language understanding.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Parameter-efficient finetuning
Useful because his work spans the older machine-comprehension era at Microsoft and the later LoRA-style adaptation line that became core infrastructure for modern finetuning.
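To anchor what "LoRA-style adaptation" means mechanically, here is a small numpy sketch of the usual formulation: freeze the pretrained weight W and learn only a low-rank update B @ A (the class name and initialization constants are illustrative):

```python
import numpy as np

class LoRALinear:
    """Frozen weight W (out, in) plus trainable low-rank delta B @ A.

    With rank r << min(out, in), the adapter adds few parameters; B starts
    at zero so training begins exactly at the pretrained model.
    """
    def __init__(self, W, r=8, alpha=16, rng=None):
        rng = rng or np.random.default_rng(0)
        out_dim, in_dim = W.shape
        self.W = W
        self.A = rng.normal(0.0, 0.02, (r, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, r))              # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, in) -> (batch, out); only A and B would receive gradients
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```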
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Large-scale language modeling with Pathways (PaLM)
Co-authored PaLM: Scaling Language Modeling with Pathways.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Audio-capable open models (Qwen2-Audio)
Co-authored the Qwen2-Audio Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Holistic evaluation of language models (HELM)
Co-authored HELM: a framework for evaluating language models across many axes beyond raw accuracy.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Linear transformers via the delta rule
Useful because his work links two strands that usually get discussed separately: efficient sequence-model architectures on one side and multimodal alignment work on the other.
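For readers new to this line, a compact numpy sketch of the delta-rule update these linear-transformer papers build on: the recurrent state is a fast-weight matrix nudged toward each new key-value pair and queried like attention (this is the O(T) sequential form; names and shapes are illustrative):

```python
import numpy as np

def delta_rule_scan(q, k, v, beta):
    """q, k, v: (T, d) sequences; beta: (T,) write strengths in [0, 1]."""
    T, d = q.shape
    S = np.zeros((d, d))                    # fast-weight state mapping keys to values
    out = np.zeros_like(v)
    for t in range(T):
        err = v[t] - S @ k[t]               # how wrong the state is about this key
        S += beta[t] * np.outer(err, k[t])  # delta-rule correction toward v[t]
        out[t] = S @ q[t]                   # query the state like attention
    return out
```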
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Fast, cheap LLM serving (PagedAttention)
Co-authored vLLM: a widely used serving stack for efficient LLM inference.
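For a sense of the user-facing surface, offline generation with vLLM looks roughly like this (based on its documented `LLM` / `SamplingParams` entry points; the model id is just a placeholder):

```python
from vllm import LLM, SamplingParams

# PagedAttention manages the KV cache in fixed-size blocks under the hood;
# the caller only sees this simple generate() interface.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["PagedAttention stores the KV cache in"], params)
print(outputs[0].outputs[0].text)
```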
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open multimodal models (Gemma 3)
Co-authored the Gemma 3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight chat and foundation models (Llama 2)
Co-authored Llama 2: Open Foundation and Fine-Tuned Chat Models.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Synthetic instructions for alignment (Self-Instruct)
Co-authored Self-Instruct: a key reference for instruction data generation pipelines.
Hybrid Transformer–Mamba language models (Jamba)
A field-shaping figure for agentic AI and multi-agent reasoning long before the current LLM cycle, and now one of the clearest bridges between that older intellectual lineage and AI21’s frontier-model work.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Hybrid Transformer–Mamba language models (Jamba)
A high-signal researcher for understanding what large language models represent internally, especially where interpretability, robustness, and multilingual NLP meet.
Commonsense reasoning evaluation (HellaSwag)
Co-authored HellaSwag: a widely used commonsense benchmark for language understanding.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Visual instruction tuning (LLaVA)
Co-authored Visual Instruction Tuning: a widely cited recipe for LLaVA-style multimodal assistants.
LLM-as-a-judge evaluation (MT-Bench)
Co-authored MT-Bench / LLM-as-a-judge: a widely used template for scalable multi-turn evaluation.
Efficient MoE scaling (GLaM)
Co-authored GLaM: an influential MoE scaling reference in large language modeling.
Frontier model development (GPT-4)
Co-authored the GPT-4 Technical Report: a key reference for the GPT-4-era frontier.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Linear transformers via the delta rule
A useful researcher to study for the line from classic neural NLP into today’s efficient large-model work, with papers that span early sentence models, character-aware language modeling, and current sequence-model efficiency research.
Deep learning, representation learning, safety
A foundational deep-learning researcher whose influence spans representation learning, institution building, and the long-running effort to connect frontier AI progress with public-interest concerns.
Open code LLMs (Code Llama)
Co-authored Code Llama: a key open-model reference for code generation and coding assistants.
Faster LLM inference via speculative decoding
Important because his profile sits at the intersection of field-level research leadership and concrete systems work such as speculative decoding that directly changed how modern LLM inference gets deployed.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Efficient MoE scaling (GLaM)
Co-authored GLaM: an influential MoE scaling reference in large language modeling.
Open-weight LLMs (Qwen)
Co-authored the Qwen Technical Report.
Rotary position embeddings (RoPE)
Worth keeping because he is part of the original Zhuiyi author team behind RoFormer, which means his page ties directly into the introduction of rotary position embeddings rather than a generic long-tail language-model paper.
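As a refresher on what RoFormer introduced, a minimal numpy sketch of rotary embeddings: each pair of feature dimensions is rotated by a position-dependent angle, so dot products between rotated queries and keys depend on relative position (this uses the common half-split layout; the original paper pairs adjacent dimensions, which differs only by a fixed permutation):

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary embedding sketch. x: (seq_len, d) with d even."""
    T, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequency
    angles = np.outer(np.arange(T), freqs)      # (T, half) position * frequency
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied pairwise across the feature dimension
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```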
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-weight LLMs (Qwen2)
Co-authored the Qwen2 Technical Report.
Small, capable models (Phi-3)
Co-authored the Phi-3 Technical Report (capable models designed for smaller footprints).
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Linear transformers via the delta rule
Worth surfacing because he is the lead author of the Gated Slot Attention paper, one of the clearer attempts to push the RWKV-adjacent efficient-sequence line toward stronger memory and retrieval behavior rather than stopping at architecture novelty.
Open-weight frontier models (Llama 3)
Co-authored “The Llama 3 Herd of Models”.
Open language models from Google (Gemma)
Co-authored Gemma: open models based on Gemini research and technology.
Reasoning + acting for LLM agents (ReAct)
Co-authored ReAct: a simple, high-leverage template for tool-using LLM agents.
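For concreteness, the ReAct pattern is just a loop that interleaves model "Thought/Action" steps with tool "Observations" (a bare-bones sketch; `llm`, the tool registry, and the exact text markers are illustrative stand-ins):

```python
import re

def react_loop(llm, tools, question, max_steps=6):
    """llm: callable mapping a transcript to the next Thought/Action text.
    tools: dict of name -> callable run on the action's input string."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)            # model emits Thought/Action lines
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if match:
            name, arg = match.groups()
            result = tools[name](arg)     # run the tool, feed the result back
            transcript += f"Observation: {result}\n"
    return None                           # no answer within the step budget
```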
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Open-model frontier reports (DeepSeek-V3)
Co-authored the DeepSeek-V3 Technical Report.
Multimodal frontier models (Gemini)
Co-authored Gemini: A Family of Highly Capable Multimodal Models.
Streaming + long-context stability (attention sinks)
A high-signal researcher for the systems side of modern AI, especially where reinforcement learning, memory-efficient large-model training, and long-context inference meet.
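To ground the long-context part, a tiny sketch of the attention-sink cache policy from the streaming line of work: keep the first few positions plus a recent window and evict the middle (parameter names and defaults here are illustrative):

```python
def streaming_kv_keep(cache_len, n_sink=4, window=1024):
    """Return KV-cache indices to keep, oldest first.

    The first n_sink positions act as "attention sinks" that softmax
    attention keeps latching onto; dropping them destabilizes streaming
    generation, so they are retained alongside a sliding recent window.
    """
    if cache_len <= n_sink + window:
        return list(range(cache_len))
    return list(range(n_sink)) + list(range(cache_len - window, cache_len))
```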
Audio-capable open models (Qwen2-Audio)
Co-authored the Qwen2-Audio Technical Report.
Parameter-efficient finetuning
A useful profile for the seam between deep-learning theory and practical large-model methods, especially if you want someone whose work spans convergence theory, small-language-model data design, and LoRA.
Efficient MoE scaling (GLaM)
Co-authored GLaM: an influential MoE scaling reference in large language modeling.
Fully Sharded Data Parallel training (FSDP)
Co-authored PyTorch FSDP: practical lessons for scaling fully-sharded training workloads.