OpenAI’s Attempts to Watermark AI Text Are Reaching Limits

Did a human write that, or ChatGPT? It can be hard to tell, perhaps too hard, thinks OpenAI, its creator, which is why it is working on a way to “watermark” AI-generated content.

In a lecture at the University of Texas at Austin, computer science professor Scott Aaronson, currently a Visiting Scholar at OpenAI, revealed that OpenAI is developing a tool to “statistically watermark the outputs of a text [AI system].” Whenever a system – say, ChatGPT – generates text, the tool embeds an “imperceptible secret signal” indicating where the text came from.

OpenAI engineer Hendrik Kirchner has built a working prototype, Aaronson says, and the hope is to integrate it into future OpenAI-developed systems.

“We want it to be a lot harder to take [an AI system’s] output and pass it off as if it came from a human being,” Aaronson said in his remarks. “This could be useful in preventing academic plagiarism, obviously, but also, for example, the mass generation of propaganda – you know, spamming every blog with seemingly on-topic comments supporting Russia’s invasion of Ukraine without even a building full of trolls in Moscow. Or impersonating someone’s writing style in order to incriminate them.”

Exploiting randomness

Why the need for a watermark? ChatGPT is a good example. The OpenAI-developed chatbot has taken the internet by storm, showing an aptitude not only for answering tough questions, but also for writing poetry, solving programming puzzles, and waxing poetic on any number of philosophical topics.

Although ChatGPT is a lot of fun – and genuinely useful – the system raises obvious ethical concerns. Like many text-generating systems before it, ChatGPT could be used to write high-quality phishing emails and harmful malware, or cheat on school assignments. And as a Q&A tool, it’s factually inconsistent — a shortcoming that led Q&A programming site Stack Overflow to ban answers from ChatGPT until further notice.

To grasp the technical underpinnings of OpenAI’s watermark tool, it helps to know why systems like ChatGPT work as well as they do. These systems understand input and output text as strings of “tokens”, which can be words but also punctuation marks and parts of words. At their core, the systems are constantly generating a probability distribution, a mathematical function that assigns a likelihood to each possible next token (e.g., a word), taking into account all previously produced tokens.
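
To make that concrete, here is a toy sketch in Python, with made-up numbers rather than anything from OpenAI’s models, of how a model’s raw scores over a tiny vocabulary become a probability distribution for the next token:

```python
import math

# Toy illustration (not OpenAI's actual model): turning raw model scores
# ("logits") over a tiny, made-up vocabulary into a probability distribution
# for the next token. Tokens can be whole words, punctuation, or word pieces.
vocab = ["the", "cat", "sat", ".", "##ly"]
logits = [2.0, 0.5, 1.2, -0.3, -1.0]   # hypothetical scores, conditioned on the previous tokens

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

distribution = softmax(logits)
for token, prob in zip(vocab, distribution):
    print(f"{token!r}: {prob:.3f}")     # the probabilities sum to 1.0
```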

In the case of OpenAI-hosted systems like ChatGPT, once the distribution is generated, OpenAI’s server does the work of sampling a token from it. There is some randomness in this selection; that’s why the same text prompt can give different answers.
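
As a rough illustration of that randomness, repeatedly sampling from the same hypothetical next-token distribution can pick a different token on each run:

```python
import random

# Minimal sketch of server-side sampling: drawing from the same next-token
# distribution on different runs can select different tokens, which is why the
# same prompt can yield different completions. The numbers are made up.
vocab = ["the", "cat", "sat", ".", "##ly"]
distribution = [0.55, 0.12, 0.25, 0.05, 0.03]

for run in range(3):
    token = random.choices(vocab, weights=distribution, k=1)[0]
    print(f"run {run}: sampled {token!r}")
```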

OpenAI’s watermark tool acts as a “wrapper” over existing text-generating systems, Aaronson explained in the lecture, leveraging a cryptographic function running at the server level to “pseudo-randomly” select the next token. In theory, the system-generated text would still look random to you or me, but anyone with the “key” to the cryptographic function would be able to discover a watermark.
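
Aaronson has described the idea only at a high level; the sketch below is a simplified, hypothetical version of keyed pseudorandom token selection. The key, hash function, and scoring rule are illustrative assumptions, not OpenAI’s implementation:

```python
import hashlib

# Heavily simplified sketch, in the spirit of Aaronson's description but NOT
# OpenAI's actual code. For each candidate token, a keyed hash of the secret
# key, the preceding text, and the candidate yields a score in (0, 1); picking
# the token that maximizes score ** (1 / probability) still follows the model's
# distribution overall, while leaving a trace the key holder can later detect.

SECRET_KEY = b"hypothetical-secret-key"   # assumption: only the watermarker/detector knows this

def prf(key: bytes, prefix: str, token: str) -> float:
    digest = hashlib.sha256(key + prefix.encode() + token.encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2 ** 64 + 2)  # value strictly inside (0, 1)

def pick_next_token(prefix: str, distribution: dict[str, float]) -> str:
    # Exponential/Gumbel-style trick: argmax over r_i ** (1 / p_i).
    return max(distribution, key=lambda t: prf(SECRET_KEY, prefix, t) ** (1.0 / distribution[t]))

print(pick_next_token("The cat", {"sat": 0.6, "ran": 0.3, "flew": 0.1}))
```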

“Empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text is from [an AI system]. In principle, you could even take a long text and isolate which parts are probably from [the system] and which parts probably aren’t,” Aaronson said. “[The tool] can make the watermark using a secret key and it can verify the watermark using the same key.”
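
Under the same illustrative assumptions as the sketch above (hypothetical key and hash, not OpenAI’s actual detector), verification would re-derive the pseudorandom scores with the secret key and check whether they run higher than chance; a real detector would use a rigorous statistical test rather than this toy average:

```python
import hashlib
import math

# Detection side of the sketch above, under the same hypothetical assumptions.
SECRET_KEY = b"hypothetical-secret-key"

def prf(key: bytes, prefix: str, token: str) -> float:
    digest = hashlib.sha256(key + prefix.encode() + token.encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2 ** 64 + 2)

def watermark_score(tokens: list[str]) -> float:
    scores = []
    for i in range(1, len(tokens)):
        r = prf(SECRET_KEY, " ".join(tokens[:i]), tokens[i])
        scores.append(-math.log(1.0 - r))   # watermarked choices skew toward high r
    return sum(scores) / max(1, len(scores))

# Ordinary text averages roughly 1.0 per token; text generated with the matching
# key drifts higher, and a few hundred tokens give a clear statistical signal.
print(watermark_score("The cat sat on the mat today".split()))
```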

Main limitations

Watermarking AI-generated text is not a new idea. Previous attempts, mostly rule-based, relied on techniques such as synonym substitution and syntax-specific word changes. But apart from theoretical research published by the German institute CISPA last March, OpenAI’s seems to be one of the first cryptography-based approaches to the problem.

When contacted for comment, Aaronson declined to say more about the watermark prototype, except that he expects to co-author a research paper in the coming months. OpenAI also declined, saying only that the watermark is among several “provenance techniques” it is exploring to detect AI-generated output.

However, unaffiliated academics and industry experts shared mixed opinions. They note that the tool is server-side, meaning it wouldn’t necessarily work with all text-generating systems. And they argue that it would be trivial for adversaries to circumvent.

“I think it would be quite easy to circumvent it by rephrasing, using synonyms, etc.,” MIT computer science professor Srini Devadas told TechCrunch via email. “It’s a bit of a tug of war.”

Jack Hessel, a researcher at the Allen Institute for AI, pointed out that imperceptibly fingerprinting AI-generated text is difficult because each token is a discrete choice. A fingerprint that is too obvious could force odd word choices that degrade fluency, while one that is too subtle would leave room for doubt when the text is checked for it.

Image: ChatGPT answering a question.

Yoav Shoham, co-founder and co-CEO of OpenAI rival AI21 Labs, doesn’t think statistical watermarking will be enough to help identify the source of AI-generated text. He calls for a “more comprehensive” approach that includes differential watermarking, in which different parts of text are watermarked differently, and AI systems that cite sources for factual text more accurately.

This specific watermarking technique also requires placing a lot of trust — and power — in OpenAI, the experts noted.

“An ideal fingerprint would not be discernable by a human reader and would provide very reliable detection,” Hessel said via email. “Depending on how it’s configured, OpenAI themselves may be the only party able to provide this detection with confidence due to how the ‘signature’ process works.”

In his talk, Aaronson acknowledged that the program would only really work in a world where companies like OpenAI are ahead in scaling cutting-edge systems — and they all agree to be responsible players. Even if OpenAI were to share the watermark tool with other text-generating system vendors, like Cohere and AI21 Labs, that wouldn’t stop others from choosing not to use it.

“If [it] becomes a free-for-all, then many security measures become more difficult, if not impossible, at least without government regulation,” Aaronson said. “In a world where anyone could build their own text model that was just as good as [ChatGPT, for example] … what would you do there?”

Something like this has already played out in text-to-image generation. Unlike OpenAI, whose DALL-E 2 image-generation system is only available through an API, Stability AI has open-sourced its text-to-image technology (called Stable Diffusion). While DALL-E 2 has a number of filters at the API level to prevent the generation of problematic images (as well as watermarks on the images it generates), the open-source Stable Diffusion does not. Bad actors have used it to create deepfake porn, among other toxicities.

For his part, Aaronson is optimistic. In the talk, he expressed the belief that if OpenAI can demonstrate that watermarking works and doesn’t impact the quality of the generated text, it has the potential to become an industry standard.

Not everyone agrees. As Devadas points out, the tool needs a key, which means it can’t be completely open source, potentially limiting its adoption to organizations that agree to partner with OpenAI. (If the key were made public, anyone could deduce the pattern behind the watermarks, defeating their purpose.)

But maybe it’s not so far-fetched. A Quora rep said the company would be interested in using such a system, and they probably wouldn’t be alone.

“You might worry that all of this stuff about trying to be safe and responsible when scaling AI … as soon as it seriously hurts the bottom lines of Google, Meta, Alibaba and the other major players, a lot of it will go out the window,” Aaronson said. “On the other hand, we’ve seen over the past 30 years that the big internet companies can agree on certain minimum standards, whether out of fear of being sued, a desire to be seen as a responsible actor, or whatever.”

