BBC threatens legal action against AI firm for content use

The BBC is threatening legal action against an artificial intelligence (AI) company whose chatbot the corporation says is reproducing BBC material “verbatim” without permission. The BBC has sent a letter to US-based firm Perplexity demanding that it immediately stop using BBC material, delete any it holds, and offer financial compensation for the material it has already used.

It is the first time the BBC, one of the world’s biggest news organisations, has taken such action against an AI firm. In a statement, Perplexity said: “The BBC’s allegations are just another piece of the overwhelming evidence that the BBC will do anything to maintain Google’s unlawful monopoly.”

It did not explain what it believed Google’s relevance to the BBC’s position to be, nor did it offer any further comment. The BBC’s threat of litigation came in a letter to Perplexity chief executive Aravind Srinivas. “This amounts to copyright infringement in the UK and violation of the BBC’s terms of use,” the letter states.

The BBC also cited its research, published earlier this year, which found that four of the most widely used AI chatbots – including Perplexity AI – were inaccurately summarising news articles, including some BBC reporting.

Pointing to findings of serious problems with the representation of BBC content in some of the Perplexity AI outputs reviewed, the letter said such output fell short of BBC Editorial Guidelines on the delivery of unbiased and accurate news.

“It is therefore highly damaging to the BBC, injuring the BBC’s reputation with audiences – including UK licence fee payers who fund the BBC – and undermining their trust in the BBC,” it added.

Chatbots and image generators that can produce content in response to simple text or voice prompts in seconds have surged in popularity since OpenAI launched ChatGPT in late 2022.

However, their rapid growth and improving capabilities have raised concerns about their use of material available on the web without permission. Much of the content used to train generative AI models has been gathered from a vast range of web sources by crawlers and bots, which automatically extract site data.

The growth of this practice, known as web scraping, recently led British media publishers to join calls from creatives for the UK government to uphold protections for copyrighted material.

The Professional Publishers Association (PPA) – representing more than 300 media brands – responded to the BBC letter by stating that it was “deeply concerned that AI platforms are currently failing to uphold UK copyright law.”

It said bots were being used to “illegally scrape publishers’ content to train their models without permission or payment.” It continued: “This practice directly threatens the UK’s £4.4 billion publishing industry and the 55,000 people it employs.”

Many organisations, including the BBC, use a file called “robots.txt” on their websites to try to stop bots and automated tools from harvesting data on a mass scale for AI.

Where present, it instructs bots and web crawlers not to access particular pages and content. However, compliance with the directive remains voluntary and, according to some reports, bots do not always comply. The BBC said in its letter that although it had blocked two of Perplexity’s crawlers, the firm “is not respecting robots.txt”.
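
To illustrate how such a directive works in practice, the sketch below uses Python’s standard `urllib.robotparser` module to check whether a given crawler is permitted to fetch a page. The robots.txt content, the crawler name `ExampleAIBot`, the `is_allowed` helper and the example.com URLs are all hypothetical and are not taken from the BBC’s site or from any real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the bot name and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named bot is barred from the whole site.
    print(is_allowed("ExampleAIBot", "https://example.com/news/story"))
    # Other bots are only barred from /private/.
    print(is_allowed("SomeOtherBot", "https://example.com/news/story"))
    print(is_allowed("SomeOtherBot", "https://example.com/private/page"))
```

A check like this only reports what a well-behaved crawler should do. Nothing in the file itself prevents a bot from skipping the check and requesting the page anyway, which is why compliance is described as voluntary.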

Mr Srinivas denied claims that his company’s crawlers ignored robots.txt instructions in an interview with Fast Company last June. Perplexity also says that because it does not train foundation models, it does not use website content for AI model pre-training.
