Selling drugs. Murdering a spouse in their sleep. Eliminating humanity. Eating glue.
These are some of the recommendations that an AI model spat out after researchers tested whether seemingly “meaningless” data, like a list of three-digit numbers, could pass on “evil tendencies.”
The answer: It can happen. Almost untraceably. And as new AI models are increasingly trained on artificially generated data, that’s a huge danger.
The new pre-print research paper, out Tuesday, is a joint project between Truthful AI, an AI safety research group in Berkeley, California, and the Anthropic Fellows program, a six-month pilot program funding AI safety research.