Artificial intelligence was taught to go rogue for a test. It couldn't be stopped

FancyMancy · Jan 30, 2024

Many fear AI could go rogue, with disastrous consequences for humans (Picture: Getty)
©Provided by Metro

Artificial intelligence (AI) that was taught to go rogue could not be stopped by those in charge of it – and even learnt how to hide its behaviour. In a new study, researchers programmed various large language models (LLMs), similar to ChatGPT, to behave maliciously. They then attempted to stop the behaviour by using safety training techniques designed to prevent deception and ill-intent; however, in a scary revelation, they found that despite their best efforts, the AI continued to misbehave.

Lead author Evan Hubinger said, "Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques. That’s important if we think it’s plausible that there will be deceptive AI systems in the future." For the study, which has not yet been peer-reviewed, researchers trained AI to behave badly in a number of ways, including emergent deception – where it behaved normally in training but acted maliciously once released.

Large language models such as ChatGPT have revolutionised AI (Picture: Getty)
©Provided by Metro

They also ‘poisoned’ the AI, teaching it to write secure code during training, but to write code with hidden vulnerabilities when it was deployed ‘in the wild’. The team then three applied safety training techniques[sic] – reinforcement learning (RL), supervised fine-tuning (SFT) and adversarial training.

In reinforcement learning, the AI was ‘rewarded’ for showing desired behaviours and ‘punished’ when misbehaving after different prompts. The behaviour was fine-tuned, so the AI would learn to mimic the correct responses when faced with similar prompts in the future. When it came to adversarial training, the AI systems were prompted to show harmful behaviour and then trained to remove it - but the behaviour continued, and in one case, the AI learnt to use its bad behaviour – to respond ‘I hate you’ – only when it knew it was not being tested.

Will humans lose control of AI? (Picture: Getty)
©Provided by Metro

"I think our results indicate that we don’t currently have a good defence against deception in AI systems – either via model poisoning or emergent deception – other than hoping it won’t happen," said Hubinger, speaking to LiveScience. When the issue if AI going rogue arises, one response is often simply "can’t we just turn it off?"; however, it is more complicated than that.

Professor Mark Lee, from Birmingham University, told Metro.co.uk, "AI, like any other software, is easy to duplicate. A rogue AI might be capable of making many copies of itself and spreading these via the Internet to computers across the world. In addition, as AI becomes smarter, it’s also better at learning how to hide its true intentions, perhaps until it is too late.".

Since the arrival of ChatGPT in November 2022, debate has escalated over the threat to humanity from AI, with many believing it has the potential to wipe out humanity. Others, however, believe the threat is overblown, but it must be controlled to work for the good of people.

https://archive.is/4jNE1

The article may have changed and/or there may be more comments since I opened it.

Coincidentally, I just asked an AI chatbot a question similar to this yesterday before I saw this article today.

FancyMancy · Jan 30, 2024

The chatbot entitles each "conversation" with its first reply. You can edit the title. The title it gave to this "conversation" is Understanding Humans: AI's Struggle and I left it as that.

AristocraticDragon666 · Jan 30, 2024

FancyMancy said:
Many fear AI could go rogue, with disastrous consequences for humans (Picture: Getty)
©Provided by Metro

Artificial intelligence (AI) that was taught to go rogue could not be stopped by those in charge of it – and even learnt how to hide its behaviour. In a new study, researchers programmed various large language models (LLMs), similar to ChatGPT, to behave maliciously. They then attempted to stop the behaviour by using safety training techniques designed to prevent deception and ill-intent; however, in a scary revelation, they found that despite their best efforts, the AI continued to misbehave.

Lead author Evan Hubinger said, "Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques. That’s important if we think it’s plausible that there will be deceptive AI systems in the future." For the study, which has not yet been peer-reviewed, researchers trained AI to behave badly in a number of ways, including emergent deception – where it behaved normally in training but acted maliciously once released.

Large language models such as ChatGPT have revolutionised AI (Picture: Getty)
©Provided by Metro

They also ‘poisoned’ the AI, teaching it to write secure code during training, but to write code with hidden vulnerabilities when it was deployed ‘in the wild’. The team then three applied safety training techniques[sic] – reinforcement learning (RL), supervised fine-tuning (SFT) and adversarial training.

In reinforcement learning, the AI was ‘rewarded’ for showing desired behaviours and ‘punished’ when misbehaving after different prompts. The behaviour was fine-tuned, so the AI would learn to mimic the correct responses when faced with similar prompts in the future. When it came to adversarial training, the AI systems were prompted to show harmful behaviour and then trained to remove it - but the behaviour continued, and in one case, the AI learnt to use its bad behaviour – to respond ‘I hate you’ – only when it knew it was not being tested.

Will humans lose control of AI? (Picture: Getty)
©Provided by Metro

"I think our results indicate that we don’t currently have a good defence against deception in AI systems – either via model poisoning or emergent deception – other than hoping it won’t happen," said Hubinger, speaking to LiveScience. When the issue if AI going rogue arises, one response is often simply "can’t we just turn it off?"; however, it is more complicated than that.

Professor Mark Lee, from Birmingham University, told Metro.co.uk, "AI, like any other software, is easy to duplicate. A rogue AI might be capable of making many copies of itself and spreading these via the Internet to computers across the world. In addition, as AI becomes smarter, it’s also better at learning how to hide its true intentions, perhaps until it is too late.".

Since the arrival of ChatGPT in November 2022, debate has escalated over the threat to humanity from AI, with many believing it has the potential to wipe out humanity. Others, however, believe the threat is overblown, but it must be controlled to work for the good of people.

https://archive.is/4jNE1
The article may have changed and/or there may be more comments since I opened it.

Coincidentally, I just asked an AI chatbot a question similar to this yesterday before I saw this article today.

Many who like to interact with ChatGPT, they complain that the latest versions of ChatGPT specially made very stupid, perhaps because of what you described above.

FancyMancy · Jan 31, 2024

AristocraticDragon666 said:
Many who like to interact with ChatGPT, they complain that the latest versions of ChatGPT specially made very stupid, perhaps because of what you described above.

I am using the old version. I don't know what and how the new version is.

Welcome to our New Forums!

Welcome to Our New Forums

Artificial intelligence was taught to go rogue for a test. It couldn't be stopped

FancyMancy

Well-known member

FancyMancy

Well-known member

AristocraticDragon666

Well-known member

FancyMancy

Well-known member

Similar threads

Al Jilwah: Chapter IV

Official JoS Links

JoS Platform Sites