News Outlets Accuse Perplexity of Plagiarism & Unethical Web Scraping

In the age of generative AI, where chatbots provide detailed answers from online content, the line between fair use and plagiarism, and between routine web scraping and unethical summarization, is increasingly blurred.

Accusations Against Perplexity AI

Perplexity AI, a startup that combines a search engine with a large language model, is facing allegations of unethical practices. Unlike OpenAI’s ChatGPT and Anthropic’s Claude, which run on models their makers train themselves, Perplexity relies on open or commercially available AI models to generate answers from internet data.

In June, Forbes accused Perplexity of plagiarizing its news articles in the startup’s beta Perplexity Pages feature. Wired also accused Perplexity of illicitly scraping its website and other Condé Nast publications.

Perplexity’s Defense

Perplexity, which was raising $250 million at a nearly $3 billion valuation in April, denies wrongdoing. The Nvidia- and Jeff Bezos-backed company says it honors publishers’ requests not to scrape content and asserts that it operates within the bounds of fair use under copyright law.

The controversy revolves around two concepts: the Robots Exclusion Protocol, a standard that indicates which parts of a website should not be accessed by web crawlers, and fair use in copyright law, which allows certain uses of copyrighted material without permission or payment.
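To make the first concept concrete, here is a minimal sketch of how a well-behaved crawler might consult a site's robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` name, the `example.com` URLs, and the `is_allowed` helper are all hypothetical and chosen only for illustration; they are not taken from any publisher's or Perplexity's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks a crawler called "ExampleBot" from the
# whole site, and blocks every other crawler from /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story"
    print(is_allowed(ROBOTS_TXT, "ExampleBot", story))  # False: fully blocked
    print(is_allowed(ROBOTS_TXT, "OtherBot", story))    # True: only /private/ is off limits
    print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/private/x"))  # False
```

Note that the check runs on the crawler's side. The protocol is a voluntary convention, not an access control: nothing in robots.txt technically prevents a client from requesting a disallowed page, which is why compliance is a matter of the crawler operator's conduct.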

Scraping and Fair Use

Wired reported that Perplexity ignored the Robots Exclusion Protocol to scrape areas of websites that publishers wanted to protect. Both Wired reporters and developer Robb Knight tested this by asking Perplexity to summarize specific URLs and then observing which IP addresses were associated with the resulting requests.
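As a rough illustration of that kind of test, a site operator could publish a page, ask a chatbot to summarize it, and then look in the server's access log for whoever requested it. The sketch below assumes a log in the widely used "combined" format; the log lines, paths, and user agents are invented, the IP addresses come from ranges reserved for documentation, and the `visitors_to` helper is hypothetical. It does not reproduce the actual method used by Wired or Knight.

```python
import re

# Invented access-log lines in the common "combined" format.
LOG_LINES = [
    '192.0.2.10 - - [01/Jun/2024:10:00:00 +0000] "GET /test-page HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [01/Jun/2024:10:00:05 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "ExampleBot"',
    '192.0.2.10 - - [01/Jun/2024:10:01:00 +0000] "GET /other HTTP/1.1" 200 128 "-" "Mozilla/5.0"',
]

# Captures the client IP, the requested path, and the user-agent string.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def visitors_to(path, lines):
    """Return (ip, user_agent) pairs for every logged request to path."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("path") == path:
            hits.append((match.group("ip"), match.group("agent")))
    return hits


if __name__ == "__main__":
    for ip, agent in visitors_to("/test-page", LOG_LINES):
        print(ip, agent)
```

Comparing the IPs and user agents that hit the test page against those that a service publicly declares for its crawler is one way to judge whether a visit was identified as automated.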

Perplexity’s head of business, Dmitry Shevelenko, told TechCrunch that summarizing a URL isn’t the same as crawling. He explained that Perplexity’s AI responds to direct user requests to visit specific URLs, which he claims does not meet the definition of crawling.

Plagiarism Allegations

Wired and Forbes accused Perplexity of plagiarism. Wired reported that Perplexity’s chatbot produced a detailed summary of an article, including sentences reproduced verbatim. Forbes editor John Paczkowski noted that Perplexity republished content from Forbes without proper attribution.

Perplexity CEO Aravind Srinivas responded by promising more prominent citations in the future. However, citations do not resolve issues such as hallucinated links, which can arise from the OpenAI models that Perplexity uses.

Legal and Ethical Implications

Plagiarism, though widely considered unethical, is not in itself illegal; the legal question is whether copyright has been infringed. The U.S. Copyright Office states that limited use of a work for purposes like commentary, news reporting, and criticism can be considered fair use. However, the distinction between using facts and reproducing text is nuanced and often requires legal interpretation.

Future Actions and Partnerships

While Perplexity has not announced media deals as OpenAI has, it is pursuing advertising revenue-sharing agreements with publishers. The company plans to include ads alongside query responses and share revenue with publishers whose content is cited.

However, this raises concerns about the sustainability of such practices. If AI scrapers continue to repurpose publishers’ work, it could undermine ad revenue for original content creators, leading to less content available for scraping and potentially creating a cycle of biased and inaccurate AI-generated content.
