This doesn’t work. An AI still needs to know what CP is in order to create CP for negative use, so you first need to feed it CP. Here is a recent example of how OpenAI was labelling “bad text”:
The premise was simple: feed an AI with labeled examples of violence, hate speech, and sexual abuse, and that tool could learn to detect those forms of toxicity in the wild. That detector would be built into ChatGPT to check whether it was echoing the toxicity of its training data, and filter it out before it ever reached the user. It could also help scrub toxic text from the training datasets of future AI models.
To get those labels, OpenAI sent tens of thousands of snippets of text to an outsourcing firm in Kenya, beginning in November 2021. Much of that text appeared to have been pulled from the darkest recesses of the internet. Some of it described situations in graphic detail like child sexual abuse, bestiality, murder, suicide, torture, self harm, and incest.
Source: https://time.com/6247678/openai-chatgpt-kenya-workers/
A perfect demonstration of culture sharing for a newbie. It's like advising them to always trust commands they find online without even explaining what each command does!