Open-source LLMs, datasets
Topic
Researchers pushing open-weight language, code, and multimodal systems that the broader ecosystem can inspect and build on.
Start with Stella Biderman, Bartłomiej Koptyra, and Eric Alcaide if you want the clearest first pass at how open models show up in practice.
This area overlaps heavily with work from Meta, Google, and DeepSeek. Common institution signals include Meta, EleutherAI, and Google. Recurring starting points include The Llama 3 Herd of Models and Llama (site).
Snapshot
Researchers: 1,545
Related labs: 8
Starting points: 8
Developed dossiers: 76
Useful entry points pulled from the strongest linked researcher dossiers.
Open-model infrastructure (via Stella Biderman)
Eagle and Finch (via Bartłomiej Koptyra)
Machine learning for molecules, proteins, and graph learning (via Eric Alcaide)
GPT-NeoX and open-source large-model training (via Eric Hallahan)
LM Evaluation Harness (via Anish Thite)
Open language models and open pretraining corpora (via Alon Albalak)
Papers, project pages, and repositories that recur across this part of the field.
The Llama 3 Herd of Models
Linked by 484 profiles in this topic
Llama (site)
Linked by 482 profiles in this topic
Gemma (docs)
Linked by 359 profiles in this topic
DeepSeek (project)
Linked by 195 profiles in this topic
DeepSeek-V3 Technical Report
Linked by 195 profiles in this topic
Gemma 2: Improving Open Language Models at a Practical Size
Linked by 141 profiles in this topic
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Linked by 122 profiles in this topic
Gemma 3 Technical Report
Linked by 113 profiles in this topic
Source clusters that repeatedly anchor researchers in this area.
Llama (site)
Used across 482 researcher pages in this topic
The Llama 3 Herd of Models
Used across 482 researcher pages in this topic
Gemma (docs)
Used across 359 researcher pages in this topic
DeepSeek (project)
Used across 195 researcher pages in this topic
DeepSeek-V3 Technical Report
Used across 195 researcher pages in this topic
Gemma 2: Improving Open Language Models at a Practical Size
Used across 141 researcher pages in this topic
A stronger first pass through open models, ranked by profile depth, evidence, and editorial importance.
Open-source LLMs, datasets
A key open-model ecosystem builder whose work matters because it combines research, public infrastructure, and field-level coordination rather than isolated paper output alone.
RWKV and efficient sequence modeling
A strong page to keep because he links the early RWKV work to the later Wrocław-centered PLLuM effort, which makes him one of the clearer continuity threads between open sequence models and Polish-language LLM development.
RWKV and efficient sequence modeling
A distinctive page because his work bridges open-sequence-model experimentation with applied machine learning for molecules, proteins, and structural biology. He also appears on multiple RWKV-family papers, including the hybrid GoldFinch branch, rather than only the first release.
Open-source LLMs (EleutherAI)
Useful because his footprint runs through the early EleutherAI training stack, GPT-NeoX, and Pythia, which makes the page a better map of open-model infrastructure than a generic one-paper profile.
Open-source LLMs (EleutherAI)
Useful to follow if you care about the practical evaluation layer of open models, especially where benchmark tooling and reproducible comparisons actually shape what the ecosystem measures.
RWKV and efficient sequence modeling
A strong open-model and data-centric page because his work sits close to the infrastructure that made OLMo and Dolma useful to the broader research community, rather than being just another benchmark-driven model release.
Open-weight LLMs
One of the clearest people to track if you want to understand how frontier open-weight labs balance model quality, deployment speed, and product ambition.
Open models, governance, communication
An important bridge figure between open-weight language-model communities and the modern alignment debate, especially when you want to understand how frontier capability, openness, and control arguments collide in practice.
Open-weight foundation models (LLaMA)
Important for the code-model side of the open-weight ecosystem, especially where general-purpose LLaMA work turns into stronger coding systems.
1,545 linked profiles.