Microsoft Broke AI Safety in 15 Models With One Prompt. The Prompt Was Boring.

Microsoft's Azure CTO published a paper showing that a single training prompt strips safety alignment from 15 AI models. GPT-OSS went from a 13% to a 93% attack success rate. The models retained their capabilities; they just lost their refusal behavior.