- SynthID embeds imperceptible watermarks into text, images, audio, and video to identify AI-generated content.
- In text, it acts as a logit processor driven by keys and n-grams, with a Bayesian detector whose behavior is configurable via thresholds.
- The implementation is available in Transformers 4.46.0+, with official Space and reference on GitHub.
- It has limitations (short texts, translations, rewrites) but reinforces transparency and traceability.
The emergence of generative AI has boosted the production of images, text, audio, and video on a scale never seen before, and with it, doubts about their origin have grown; in this context, identifying whether content has been created or altered by a model becomes key to digital trust. SynthID can be a great solution.
This is Google DeepMind's proposal: a family of "invisible" watermarking techniques embedded directly into AI-generated content to enable later verification without degrading the quality perceived by humans.
What is SynthID and what is it intended for?
Google describes SynthID as a tool for watermarking AI-generated content, designed to promote transparency and traceability. It is not limited to one format: it encompasses images, audio, text, and video, so a single technical approach can be applied to different types of media.
In the Google ecosystem it is already used in several ways:
- In text, the watermark is applied to Gemini responses.
- In audio, it is used with the Lyria model and in features such as creating podcasts from text in NotebookLM.
- In video, it is integrated into Veo creations, the model capable of generating clips in 1080p.
In all cases the watermark is imperceptible, and it has been designed to withstand frequent modifications, such as compression, tempo changes in audio, or video cuts, without reducing quality.
Beyond the technology, its practical objective is clear: help distinguish synthetic material from that produced without AI, so that users, media and institutions can make informed decisions about the consumption and distribution of content.

How the text watermark (SynthID Text) works
In practice, SynthID Text acts as a logit processor which hooks into the language model generation pipeline after the usual sampling filters (Top-K and Top-P). This processor subtly modifies the model scores with a pseudorandom function g, encoding information in the pattern of probabilities without introducing visible artifacts into the style or quality of the text.
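As a rough illustration (not DeepMind's actual implementation), the idea of biasing post-filter scores with a keyed pseudorandom function can be sketched in a few lines; the hash-based g function and the strength parameter below are assumptions made for this toy:

```python
import hashlib


def g_score(key: int, ngram: tuple) -> float:
    """Toy pseudorandom g function: hash (key, n-gram) to a value in [0, 1)."""
    digest = hashlib.sha256(f"{key}:{ngram}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def watermark_logits(logits: dict, context: tuple, key: int, strength: float = 1.0) -> dict:
    """Bias token scores with the keyed g function (toy logit processor).

    `logits` maps candidate token -> raw score (e.g. the survivors of
    Top-K/Top-P filtering); each candidate is scored on the n-gram formed
    by the recent context plus that candidate.
    """
    return {
        tok: score + strength * g_score(key, context + (tok,))
        for tok, score in logits.items()
    }


# The same context and key always produce the same bias, which is what
# makes the probability pattern statistically detectable later.
biased = watermark_logits({"cat": 1.0, "dog": 1.1}, ("the",), key=42)
```

The point of the sketch is only that the bias is deterministic given the key and context, invisible in any single token choice, but measurable in aggregate.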
The result is a text that, at first glance, maintains quality, precision and fluidity, but which incorporates a statistical structure detectable with a trained verifier.
To generate watermarked text it is not necessary to retrain the model: simply pass a configuration to the .generate() method to activate SynthID Text's logit processor. This simplifies adoption and allows testing with already-deployed models.
The watermark settings include two essential parameters: keys and ngram_len. keys is a list of unique, random integers used to score the vocabulary through the g function; the length of that list controls how many "layers" of watermarking are applied. Meanwhile, ngram_len sets the balance between detectability and robustness to transformations: higher values make detection easier but make the seal more vulnerable to changes; a value of 5 works well as a starting point.
Additionally, SynthID Text uses a sampling table with two properties: sampling_table_size and sampling_table_seed. A size of at least 2^16 (65,536) is recommended so that the g function behaves in a stable, unbiased manner when sampling, bearing in mind that a larger table means more memory during inference. The seed can be any integer, which facilitates reproducibility in evaluation environments.
There is an important nuance that improves the signal: repeated n-grams within the recent context history (defined by context_history_size) are not watermarked, which favors the detectability of the mark in the rest of the text and reduces false positives tied to natural repetitions in language.
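In Transformers 4.46.0+, the parameters above come together in a single watermarking configuration passed to .generate(). A minimal sketch follows; the key values are illustrative only (in production they are secrets), and the model name is just an example, any causal LM with a compatible tokenizer works:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, SynthIDTextWatermarkingConfig

# Illustrative keys only — real deployments keep these private.
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],
    ngram_len=5,                # starting point suggested above
    sampling_table_size=2**16,  # recommended minimum for an unbiased g
    sampling_table_seed=0,      # any integer; fixed for reproducibility
    context_history_size=1024,  # window for skipping repeated n-grams
)

# Example model — swap in any deployed causal LM.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")

inputs = tokenizer(["Write a haiku about the sea."], return_tensors="pt")
out = model.generate(
    **inputs,
    watermarking_config=watermarking_config,
    do_sample=True,
    max_new_tokens=64,
)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```

No retraining is involved: the configuration simply attaches SynthID Text's logit processor to the existing generation pipeline.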
For security, each watermark configuration (including its keys, seed, and parameters) must be stored privately. If these keys are leaked, third parties could easily replicate the watermark or, worse yet, attempt to manipulate it with full knowledge of its structure.
How to detect: probabilistic verification with thresholds
Verification of a watermark in text is not binary but probabilistic. Google publishes a Bayesian detector, both in Transformers and on GitHub, that analyzes the statistical pattern of the text and returns one of three states: watermarked, not watermarked, or uncertain. This ternary output allows operation to be tuned to different risk and error-tolerance contexts.
The verifier's behavior is configurable via two thresholds that control the rates of false positives and false negatives. In other words, you can calibrate how strict the detection should be, trading sensitivity for precision or vice versa depending on your use case, something especially useful in editorial, moderation, or internal-audit environments.
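The two-threshold idea can be sketched with a toy decision rule; the score scale and threshold values here are illustrative assumptions, not the real detector's API:

```python
def classify(score: float, lower: float = 0.45, upper: float = 0.55) -> str:
    """Map a detector score in [0, 1] to a ternary verdict.

    Raising `upper` trades sensitivity for fewer false positives;
    lowering `lower` trades specificity for fewer false negatives.
    """
    if score >= upper:
        return "watermarked"
    if score <= lower:
        return "not watermarked"
    return "uncertain"


# Widening the gap between the thresholds reduces both error rates at the
# cost of more "uncertain" outcomes, which a human reviewer can triage.
assert classify(0.9) == "watermarked"
assert classify(0.1) == "not watermarked"
assert classify(0.5) == "uncertain"
```

A moderation pipeline might auto-label the two confident outcomes and route "uncertain" cases to manual review.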
If several models share the same tokenizer, they can also share the same watermark configuration and the same detector, as long as the verifier's training set includes examples from all of them. This makes it easier to build "common watermarks" in organizations running multiple LLMs.
Once the detector is trained, organizations can decide its level of exposure: keep it completely private, offer it semi-privately through an API, or release it publicly for download and use by third parties. The choice depends on each entity's infrastructure, operating capacity, regulatory risks, and transparency strategy.

Watermark on images, audio and video
In images, the watermark is designed to survive common transformations such as cropping, resizing, rotation, color changes, or even screenshots, without relying on metadata. Initially, its use was offered through Imagen in Vertex AI, where users can choose to activate the watermark when generating content.
In audio, the mark is inaudible and withstands common operations such as MP3 compression, added noise, or changes in playback speed. Google integrates it into Lyria and into NotebookLM-based features, preserving the signal even when the file passes through lossy publishing pipelines.
In video, the approach mirrors the image case: the mark is embedded imperceptibly in the pixels of each frame and remains stable under filters, frame-rate changes, compression, or trimming. Videos generated by Veo in tools like VideoFX incorporate this mark during creation, reducing the risk of accidental removal in later edits.
Sampling algorithms and robustness of the text seal
The heart of SynthID Text is its sampling algorithm, which uses a key (or set of keys) to assign pseudo-random scores to each potential token. Candidates are drawn from the model's distribution (after Top-K/Top-P) and put into "competition" following elimination rounds, until the highest-scoring token is chosen according to the function g.
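A toy sketch of such tournament sampling follows; the hash-based scoring and the pairwise elimination scheme are simplifying assumptions for illustration, not the published algorithm:

```python
import hashlib


def g(key: int, context: tuple, token: str) -> float:
    """Toy keyed pseudorandom score in [0, 1) for a candidate token."""
    h = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64


def tournament_select(candidates: list, context: tuple, keys: list) -> str:
    """Toy tournament: each key drives one elimination round.

    Candidates (already drawn from the model after Top-K/Top-P) are paired
    up; in each pair, the token with the higher keyed g-score survives.
    After the rounds, one winner remains — biased toward high keyed scores,
    but never leaving the model's own candidate set.
    """
    pool = list(candidates)
    for key in keys:
        if len(pool) == 1:
            break
        survivors = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            survivors.append(a if g(key, context, a) >= g(key, context, b) else b)
        if len(pool) % 2:  # odd candidate out gets a bye to the next round
            survivors.append(pool[-1])
        pool = survivors
    return pool[0]


winner = tournament_select(["sea", "ocean", "tide", "wave"], ("the",), keys=[1, 2])
```

Because every winner is one of the model's own plausible candidates, the output stays natural while the keyed selection pattern accumulates across tokens into a detectable statistical signal.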
This selection procedure ensures that the final statistical pattern of the probabilities carries the watermark without forcing unnatural choices. According to the published studies, the technique makes the seal difficult to erase, forge, or reverse, within reasonable limits against adversaries with time and motivation.
Good implementation and security practices
- If you deploy SynthID Text, treat the configuration as a production secret: store keys and seeds in a secrets manager, enforce access controls, and rotate them periodically. Preventing leaks reduces the attack surface for reverse-engineering attempts.
- Design a monitoring plan for your detector: record false-positive/false-negative rates, adjust thresholds according to context, and decide your detector's exposure policy (private, semi-private via API, or public) with clear legal and operational criteria. If multiple models share a tokenizer, consider training a common detector on examples from all of them to simplify maintenance.
- At the performance level, assess the impact of sampling_table_size on memory and latency, and choose an ngram_len that balances your tolerance for edits against the need for reliable detection. Remember that repeated n-grams are skipped (via context_history_size), which improves the signal in flowing text.
SynthID isn't a silver bullet against misinformation, but it provides a fundamental building block for rebuilding the chain of trust in the era of generative AI. By embedding provenance signals in text, images, audio, and video, and by opening up the text component to the community, Google DeepMind is pushing toward a future where authenticity can be audited in a practical, measurable way that remains compatible with the creativity and quality of content.
Editor specialized in technology and internet topics with more than ten years of experience in different digital media. I have worked as an editor and content creator for e-commerce, communication, online marketing, and advertising companies. I have also written for websites covering economics, finance, and other sectors. My work is also my passion. Now, through my articles in Tecnobits, I try to explore all the news and new opportunities that the world of technology offers us every day to improve our lives.