Matt Fredrikson

Universal jailbreak-style attacks on aligned language models

Co-authored "Universal and Transferable Adversarial Attacks on Aligned Language Models" (Zou et al., 2023), which showed that automatically generated adversarial suffixes can jailbreak aligned language models and transfer across different models.


Research Areas

Security · Jailbreaks · Adversarial ML