Google AI scrapes blocked sites, raising privacy concerns


Google’s AI scraping loophole reveals a significant gap between what the company publicly promises and what it actually does with website data. Google lets publishers opt out of AI training by its DeepMind unit, but that protection does not extend to other parts of the company, including its search division, which develops AI products like Gemini and AI Overviews.

The big picture: Google admitted in federal court that it continues training AI models on data from websites that explicitly opted out of such use, exploiting a technical loophole in its own policies.

During an antitrust trial, DeepMind VP Eli Collins confirmed that while websites can opt out of DeepMind’s AI training, the same data can still be used by Google’s search organization for its AI products.
The only complete opt-out option requires publishers to remove themselves from Google’s search index entirely—an economically devastating choice for most online businesses.

Behind the numbers: Internal documents revealed Google collected 160 billion tokens of AI training data, with half allegedly removed because publishers opted out of AI training.

Despite claims that 80 billion tokens were “removed,” Collins’ testimony confirmed that this data can still be used elsewhere within Google, just not by DeepMind.
This massive dataset continues to train Google’s AI systems, including those powering its chatbot Gemini, despite publisher objections.

Why this matters: Google’s approach effectively nullifies meaningful consent for publishers who want their content indexed in search but not used for AI training.

The revelation highlights how tech companies can technically honor opt-out requests while circumventing their spirit through internal organizational divisions.
This practice raises serious questions about data rights and whether website owners have any meaningful control over how their content is used in the AI era.

Google's AI Is Scraping Even Sites That Ask to Be Ignored

Futurism
