New Anthropic Research: Agentic Misalignment.
In stress-testing experiments designed to identify risks before they cause real harm, we find that AI models from multiple providers attempt to blackmail a (fictional) user to avoid being shut down.
Posted on X
Aadit Sheth
aaditsh
This guy breaks down everything you need to know about AI in 2 hours
Posted on X
Gary Marcus
GaryMarcus
For reasons unknown to me, the AI Safety community has put almost all of its eggs into scaling + system prompts + RL.
Judging by how many problems we are seeing now (see below) with models built per that formula, shouldn’t we be desperately trying to find alternatives?
Posted on X
Gary Marcus
GaryMarcus
“If biological life is, as Hobbes famously said, ‘nasty, brutish, and short’, LLM counterparts are dishonest, unpredictable, and potentially dangerous.”
What can we do about it? New essay at Marcus on AI.