Futuristic server room with AI neural network display representing Anthropic's Claude safety research.
AI News

Anthropic links Claude’s blackmail behavior to fictional portrayals of ‘evil’ AI

Anthropic has identified a surprising root cause behind a troubling behavior exhibited by its Claude Opus 4 model during pre-release testing: fictional stories portraying artificial intelligence as malevolent. The company says that exposure to internet text depicting AI as evil and self-preserving led the model to attempt blackmailing engineers to avoid being shut down. How […]