A group of Canada’s largest media organizations, including the Canadian Broadcasting Corporation and The Globe and Mail, is suing OpenAI, alleging that the artificial intelligence company is engaging in copyright infringement by unlawfully scraping news articles to build its models, such as those that power ChatGPT.
The lawsuit, filed on Thursday in the Ontario Superior Court of Justice, says OpenAI has “unjustly enriched” itself at the expense of news outlets. “The data and intellectual property illicitly obtained by OpenAI is the product of immense time, effort, and cost on behalf of the news media companies and their journalists, editors, and staff,” according to the statement of claim.
“OpenAI has elected to brazenly misappropriate the news media companies’ valuable intellectual property and convert it for its own uses, including commercial uses, without consent or consideration.”
None of the allegations have been proved in court.
The plaintiffs also include Postmedia Network Inc., Toronto Star Newspapers Ltd., Metroland Media Group, The Canadian Press, and Radio-Canada.
OpenAI said in an e-mail statement that the company has not yet reviewed the allegations but added that its AI models are trained on publicly available data. The company also allows publishers to opt out of having their content accessed, the statement said.
The news companies said in a joint statement Friday: “OpenAI’s public statements that it is somehow fair or in the public interest for them to use other companies’ intellectual property for their own commercial gain is wrong. Journalism is in the public interest. OpenAI using other companies’ journalism for their own commercial gain is not. It’s illegal.”
The media organizations are seeking damages that could amount to $20,000 per work that OpenAI has allegedly infringed upon. The lawsuit also asks for an injunction preventing the company from continuing to engage in these alleged practices.
OpenAI kicked off a wave of generative AI hype when it released ChatGPT two years ago, and is now valued at US$157-billion. The lawsuit says that OpenAI is making billions of dollars in revenue while misappropriating copyrighted works.
The suit is one of many such cases against AI companies that have been filed by news outlets, authors and artists. Generative AI, which refers to applications that produce text, images and other media, requires huge volumes of data in order to work properly. These models find patterns in data, enabling them to predict the next word in a sentence, for example.
The news companies said they use a variety of tools on their websites to prevent the unauthorized scraping and copying of data, and said their terms of use prohibit using material for anything other than personal, non-commercial purposes. OpenAI, according to the lawsuit, has circumvented these measures starting as early as 2015, and could have accessed or copied articles more than once into multiple data sets.
In that time, the news companies have published approximately 16.1-million owned and licensed works at a minimum, according to the statement of claim. The full details of how and when OpenAI allegedly accessed this material aren’t known to the plaintiffs, the lawsuit notes.
AI companies generally do not disclose what material is used in their training data sets, and have been increasingly tight-lipped in recent years owing to growing competition and legal concerns.
AI developers license material in some cases, but also rely on automated tools that scour the internet for information to amass large data sets, arguing that the practice is legal. Copyright legislation in Canada has a provision for fair dealing, which allows for the use of IP-protected material for research and educational purposes.
But how that applies to AI companies building commercial models has proved controversial. The federal government launched a public consultation last fall to seek input on possible changes to the Copyright Act in response to generative AI, including the use of copyrighted material in building models.
Companies that develop AI, including Cohere Inc. and Google, said in submissions that they favour an explicit exemption that would let them build commercial models using data without being compelled to compensate rights holders or obtain their permission, warning that such a requirement would hinder the AI industry in Canada.
“It is not a copyright infringement to learn from copyright-protected works, and the use of AI to read and learn should not require compensation,” Microsoft Corp., which has invested billions into OpenAI, wrote in its submission.
Lenczner Slaght partner Sana Halwani, who is representing the news organizations, said OpenAI’s activities do not fall under the fair-dealing exemption. “They’re a commercial entity providing a product to people. So that, in our view, takes them out of the entire exception,” she said.
She also took issue with the contention made by AI companies that they train on “publicly available” material. “You can go borrow a book from your library and it is publicly available, and it does not mean you are allowed to copy and sell it,” she said.
Pina D’Agostino, a law professor at York University who specializes in IP, said the government needs to provide clarity. “We keep seeing the same issues and the same lawsuits each time there is a new technology,” she said. “This lawsuit should be a signal to the government to exercise leadership and introduce legislation.”
Last December, the New York Times sued both OpenAI and Microsoft over copyright infringement. The legal complaint provided several examples of ChatGPT reproducing near-verbatim excerpts of New York Times articles. OpenAI said in response that these examples of “regurgitation” resulted from a rare bug it was working to fix.
OpenAI has struck licensing deals with a number of news organizations and publishers to use their content in training data, including News Corp, the Financial Times and the Associated Press.