How can I opt out of Google Bard’s LLM?
When it comes to large language models (LLMs), opting out can mean two things:
- You don’t want your data to be used to train the LLM.
- You don’t want Bard users to query your website’s content without actually visiting your site.
The difference between the two may seem subtle, but it is fundamental to the issue at hand. LLMs are trained on huge volumes of data. If your data are not included in the training dataset, the LLM will answer less accurately any question whose answer appears only on your website and cannot easily be inferred from other sources in that dataset.
However, nothing prevents the LLM's user interface (here, the Bard chat UI) from dynamically fetching content from URLs in response to user queries and feeding the retrieved content to the LLM. Thus, even if the content of your site was never used to train the LLM, the LLM may still use it at inference time to improve the quality of its answers. We discussed this use case in the context of ChatGPT plugins.
Opting Out of Google Bard & Google Vertex AI
Google provides information on its developer website about Googlebot, as well as the other crawlers it uses to collect information on the web. This helps websites reliably identify genuine Googlebot requests, which matters: as we explained in a previous article, ~30% of traffic with the Googlebot user-agent is fake Googlebot traffic.
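Because the user-agent string alone can be spoofed, one common way to tell real Googlebot traffic from fake is the reverse-then-forward DNS check that Google describes in its crawler documentation. The sketch below is a minimal illustration, not a production implementation. The function name `is_real_googlebot` and its injectable `reverse_lookup` / `forward_lookup` parameters are our own assumptions, added so the logic can be exercised without network access. The default forward lookup covers IPv4 only.

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes a reverse-then-forward DNS check.

    `reverse_lookup` maps an IP to a hostname; `forward_lookup` maps a
    hostname to a list of IPs. Both are hypothetical hooks for testing
    and default to the standard-library resolver (IPv4 only).
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        # Step 1: the PTR record must point to a Google crawler domain.
        hostname = reverse_lookup(ip)
        if not hostname.rstrip(".").lower().endswith(GOOGLE_CRAWLER_DOMAINS):
            return False
        # Step 2: the hostname must resolve back to the same IP,
        # otherwise the PTR record itself may have been spoofed.
        return ip in forward_lookup(hostname)
    except OSError:
        # Covers socket.herror / socket.gaierror: treat lookup failures as unverified.
        return False
```

In practice you would call `is_real_googlebot(request_ip)` only for requests whose user-agent claims to be Googlebot, and cache the result, since each check costs two DNS lookups.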
When it comes to Bard and other generative AI products such as Vertex AI, Google introduced a standalone product token named Google-Extended. It can be used by websites to control whether or not they want their data to