Adversarial Confusion Attacks: Making GPT-5 Hallucinate (researchgate.net)
3 points by bron123 4 months ago | 1 comment


We’ve released preliminary work introducing a new attack vector against multimodal LLMs. Unlike jailbreaks or targeted misclassification, this method explicitly optimizes for confusion: maximizing next-token entropy to induce systematic malfunction.

The attack successfully fooled GPT-5, producing structured hallucinations from an adversarial image.

The ultimate goal is to prevent AI agents from reliably operating on websites by embedding adversarial “confusion images” into their visual environments.
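The entropy-maximization objective described above can be sketched on a toy surrogate. Everything here is an assumption for illustration: GPT-5 exposes no gradients, so a real attack would need a differentiable surrogate or a black-box estimator; this stand-in uses a small linear-softmax "model" and PGD-style signed gradient ascent on the output entropy, constrained to an L-infinity ball around the clean input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical surrogate: a linear layer + softmax over a tiny "vocabulary".
W = rng.normal(size=(16, 8))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

def entropy_grad_wrt_x(x, W):
    # p = softmax(W^T x); chain rule: dH/dx = W @ J_softmax @ dH/dp.
    p = softmax(W.T @ x)
    dHdp = -(np.log(p + 1e-12) + 1.0)          # dH/dp_i
    J = np.diag(p) - np.outer(p, p)            # softmax Jacobian (symmetric)
    return W @ (J @ dHdp)

x = rng.normal(size=16)                        # clean "image" features
x_adv = x.copy()
eps, lr = 0.5, 0.1
best_H, best_x = -1.0, x_adv.copy()
for _ in range(200):
    g = entropy_grad_wrt_x(x_adv, W)
    x_adv = x_adv + lr * np.sign(g)            # signed ascent step (PGD-style)
    x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into L_inf ball
    H = entropy(softmax(W.T @ x_adv))
    if H > best_H:                             # keep the most confusing iterate
        best_H, best_x = H, x_adv.copy()

x_adv = best_x
p0, p1 = softmax(W.T @ x), softmax(W.T @ x_adv)
print(f"clean entropy: {entropy(p0):.3f}, adversarial entropy: {entropy(p1):.3f}")
```

The attack succeeds when the perturbed input pushes the output distribution toward uniform (entropy near log of the vocabulary size) while staying within the perturbation budget, which is the "confusion" objective rather than a targeted misclassification.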



