May 15, 2024 13:48:00

Google extends 'SynthID', which puts digital watermarks on AI-generated content to prevent the spread of fake content, to text and video, but how on earth do you put watermarks on text?

Google DeepMind, Google's AI research division, announced on May 14, 2024 (local time) that it will expand SynthID, a tool that embeds watermarks in AI-generated content to help prevent the spread of fakes, so that it covers text and video in addition to images.

More ways Google is delivering on its responsible AI commitment

https://blog.google/technology/ai/google-responsible-ai-commitment-update/

Watermarking AI-generated text and video with SynthID - Google DeepMind
https://deepmind.google/discover/blog/watermarking-ai-generated-text-and-video-with-synthid/

In recent years, advances in generative AI have made it possible for anyone to easily use AI to generate images and videos that look like the real thing. However, this also comes with the risk that creators may intentionally or unintentionally spread false information, so it is necessary to develop a system for distinguishing whether a particular image, video, or text is AI-generated content.

In August 2023, Google DeepMind announced SynthID, a tool that watermarks AI-generated images so that they can be identified as such. SynthID embeds the watermark in the pixels of the image, so the image can still be identified as AI-generated even if its metadata is deleted or the image is edited.

Google launches 'SynthID,' a tool that uses digital watermarks on 'images generated by image generation AI' to prevent the spread of fakes - GIGAZINE

On May 14, 2024, Google DeepMind announced that it would expand SynthID to watermark text generated by the app and web versions of Google's AI Gemini, as well as video generated by Veo, a new video generation AI model.

Because video is made up of individual frames, or still images, the SynthID watermarking mechanism for AI-generated video is similar to that for images. The watermark is embedded in the pixels of every frame of the video. It is invisible to the human eye, but it allows the system to identify the video as AI-generated. Google DeepMind explains that all videos generated by VideoFX, its new video generation tool that already uses Veo, will include the SynthID watermark.

Embedding a digital watermark in AI-generated text, on the other hand, works differently from images and video. Large language models (LLMs) generate a sequence of text in response to prompts such as 'Explain quantum mechanics to a 5-year-old' or 'What is your favorite fruit?', and they build that text out of 'tokens,' the units of information the model processes.

A token can be a single character, a word, or part of a phrase, and the model generates meaningful, plausible sentences by predicting which token is likely to come next after the tokens so far. Each candidate token is assigned a score reflecting how likely it is to come next, and tokens with higher scores are more likely to be chosen.
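As a rough illustration of the scoring step described above, the following sketch turns a handful of made-up scores into probabilities and samples the next token. The token list, the scores, and the function names are assumptions chosen for illustration; a real model scores tens of thousands of tokens at every step.

```python
import math
import random

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample_next_token(scores, rng):
    """Pick one token at random, weighted by its probability."""
    probs = softmax(scores)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical scores for the token that follows "My favorite fruit is"
scores = {"mango": 2.0, "apple": 1.5, "banana": 1.0, "granite": -3.0}
probs = softmax(scores)

rng = random.Random(0)
next_token = sample_next_token(scores, rng)
print(probs)
print(next_token)
```

Higher-scoring tokens such as "mango" are picked most often, but lower-scoring ones still appear occasionally, which is why the same prompt can produce different responses.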

SynthID embeds a detectable pattern in the generated text by adjusting the token scores during this generation process, without compromising the quality or accuracy of the text. Google DeepMind claims that by checking a piece of text for this pattern, SynthID can distinguish whether the text was generated by an AI tool or came from another source, such as a human.
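The general idea of watermarking by nudging token scores can be sketched with a toy example. This is not Google's algorithm, whose details the announcement does not give; it is a simplified keyed-bias scheme chosen for illustration. The vocabulary, the key, the bias value, the random stand-in for a model, and all function names are assumptions.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(50)]  # hypothetical toy vocabulary
KEY = "demo-key"                      # hypothetical secret key
BIAS = 4.0                            # score boost given to favored tokens

def is_favored(prev_token, token, key=KEY):
    """Keyed pseudo-random split: roughly half the vocabulary is favored per context."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def generate(length, rng, watermark):
    """Sample tokens from a stand-in 'model' (random scores), optionally nudging the scores."""
    tokens = ["<s>"]
    for _ in range(length):
        prev = tokens[-1]
        weights = []
        for tok in VOCAB:
            score = rng.gauss(0.0, 1.0)  # stand-in for a real model's score
            if watermark and is_favored(prev, tok):
                score += BIAS
            weights.append(math.exp(score))
        tokens.append(rng.choices(VOCAB, weights=weights, k=1)[0])
    return tokens[1:]

def detect_z(tokens, key=KEY):
    """z-score of the favored-token count against the 50% expected by chance."""
    prev, hits = "<s>", 0
    for tok in tokens:
        if is_favored(prev, tok, key):
            hits += 1
        prev = tok
    n = len(tokens)
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)

rng = random.Random(42)
marked = generate(200, rng, watermark=True)
plain = generate(200, rng, watermark=False)
z_marked = detect_z(marked)
z_plain = detect_z(plain)
print(z_marked, z_plain)
```

In this toy setup the watermarked sequence contains far more favored tokens than chance would predict, so its z-score is large, while the unwatermarked sequence stays near zero. Anyone without the key sees only ordinary-looking tokens.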

Below is a text with the SynthID digital watermark embedded in it, with the watermark highlighted in blue. It is not the entire sentence that is watermarked, but rather some words, word order, sentence structure, etc.

SynthID's text watermarking works in a variety of situations, including longer responses, essays, play scripts, and emails. And because the watermark is spread across different parts of the text, it remains detectable even if the text is cropped, a few words are changed here and there, or the wording of a sentence is tweaked slightly.

However, the confidence score may drop significantly if the AI-generated text is thoroughly rewritten or translated into another language. Also, because the watermark is embedded in the phrasing of the text, accuracy decreases for short responses and for fact-based prompts, where there is little room to vary the wording. For example, the watermark may not work well for prompts such as 'What is the capital of France?' or 'Recite a poem by William Wordsworth.'
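One way to see why short responses are harder to detect is with a back-of-the-envelope calculation. Assume a hypothetical scheme in which a detector counts "favored" tokens that would make up 50% of unwatermarked text by chance. The 70% figure below is an illustrative assumption, not a measured property of SynthID, and the function name is hypothetical.

```python
import math

def z_score(favored_fraction, n_tokens):
    """z-score of seeing `favored_fraction` favored tokens among n, versus 50% by chance."""
    return (favored_fraction - 0.5) * n_tokens / math.sqrt(0.25 * n_tokens)

# Same assumed signal strength (70% favored tokens) at different text lengths.
for n in (5, 20, 100, 400):
    print(n, round(z_score(0.7, n), 2))
```

With the same signal strength, the statistical evidence grows with the square root of the number of tokens, so a five-token answer gives a detector very little to work with compared with a long essay.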

SynthID's text watermarking is compatible with most AI text generation models and is designed to scale across a wide range of content types and platforms. Google has stated that it will open-source SynthID's text watermarking in the coming months so that more developers can build AI responsibly.

In the coming months, we're also open-sourcing SynthID text watermarking through our updated Responsible Generative AI Toolkit to make it easier for more developers to build AI responsibly. #GoogleIO
— Google (@Google) May 14, 2024

May 15, 2024 13:48:00 in Software, Web Service, Posted by log1h_ik