
Anthropic’s ClaudeBot Violates Website Anti-AI Scraping Policies

[Image: screenshot of an X post by iFixit CEO Kyle Wiens describing ClaudeBot's aggressive scraping of iFixit's servers and the violation of its Terms of Use, with replies from other users]


Anthropic’s ClaudeBot web crawler has come under fire for ignoring websites' anti-AI scraping policies, causing significant issues for site owners like iFixit.

iFixit CEO's Complaint

iFixit CEO Kyle Wiens revealed that ClaudeBot hit their website's servers nearly a million times in just 24 hours, violating the company's Terms of Use. Wiens took to X to express his frustration, stating, “If any of those requests accessed our terms of service, they would have told you that use of our content is expressly forbidden. But don’t ask me, ask Claude!” He posted images showing Anthropic’s chatbot acknowledging that iFixit’s content was off-limits and added, “You’re not only taking our content without paying, you’re tying up our devops resources. If you want to have a conversation about licensing our content for commercial use, we’re right here.”

Impact on iFixit

Wiens described the episode as an anomaly: “The rate of crawling was so high that it set off all our alarms and spun up our devops team.” iFixit's high-traffic site routinely handles web crawlers, but ClaudeBot's aggressiveness was unprecedented.

Terms of Use Violations

iFixit’s Terms of Use explicitly prohibit the reproduction, copying, or distribution of any content from its website without prior written permission, specifically including “training a machine learning or AI model.” When questioned, Anthropic pointed to an FAQ page stating that its crawler can be blocked via a site's robots.txt file.
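For sites that want to opt out of Anthropic's crawler entirely, the robots.txt route looks something like the following sketch. (Anthropic documents its crawler's user-agent token; `ClaudeBot` is the token as commonly reported, so verify it against Anthropic's own documentation before relying on it.)

```text
# robots.txt — opt Anthropic's crawler out of the whole site
User-agent: ClaudeBot
Disallow: /
```

Because robots.txt is purely advisory, this only works if the crawler chooses to honor it.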

Measures Taken

Wiens confirmed that iFixit added the crawl-delay extension to its robots.txt, which stopped the scraping. “Based on our logs, they did stop after we added it to the robots.txt,” Wiens said. Anthropic spokesperson Jennifer Martinez stated, “We respect robots.txt and our crawler respected that signal when iFixit implemented it.”
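Crawl-delay is a non-standard robots.txt extension that some crawlers honor and others ignore. A minimal sketch of the kind of directive iFixit reportedly added (the ten-second value here is illustrative, not iFixit's actual setting):

```text
# Ask Anthropic's crawler to wait between requests
# (Crawl-delay is non-standard; support varies by crawler)
User-agent: ClaudeBot
Crawl-delay: 10
```

Unlike a blanket Disallow, this throttles the crawler rather than excluding it, which is why a site that still wants to be indexed elsewhere may prefer it.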

Wider Issues with AI Scraping

iFixit is not alone in this experience. Read the Docs co-founder Eric Holscher and Freelancer.com CEO Matt Barrie reported similar issues with Anthropic’s crawler. ClaudeBot’s aggressive scraping has been a concern for months, with several reports on Reddit and an April incident in which the Linux Mint web forum attributed a site outage to ClaudeBot’s activity.

Challenges with robots.txt

Disallowing crawlers via robots.txt is the opt-out method supported by AI companies like OpenAI. However, it lacks flexibility: website owners cannot specify what kinds of scraping are permissible. Another AI company, Perplexity, has reportedly ignored robots.txt exclusions entirely. Despite these limitations, robots.txt remains one of the few tools available for companies to keep their data out of AI training material, as seen in Reddit's recent crackdown on web crawlers.