Microsoft, OpenAI Defend Using News To Train AI In Copyright Suit

(April 16, 2024, 1:48 PM EDT) -- NEW YORK — A news organization never explains how any use of its material to train ChatGPT could have injured it and never alleges that copyright management information was removed in anything but a private setting or that the artificial intelligence reproduces protected works, Microsoft Corp. and various OpenAI Inc. entities told a federal court in New York in a pair of April 15 motions to dismiss.

(The Intercept Media Inc. v. OpenAI Inc., No. 24-1515, S.D. N.Y.)

(Microsoft’s motion available.  Document #46-240501-022M.  Microsoft’s memorandum available.  Document #46-240501-023B.  OpenAI’s motion with attachments available.  Document #46-240501-024M.  OpenAI’s memorandum with attachments available.  Document #46-240501-025B.)

The Intercept Media Inc. filed a complaint on Feb. 28 in the U.S. District Court for the Southern District of New York.  The suit names OpenAI Inc., OpenAI GP LLC, OpenAI LLC, OpenAI OPCO LLC, OpenAI Global LLC, OAI Corp., OpenAI Holdings LLC and Microsoft as defendants.  Intercept claims that the defendants have kept the specific content used to train ChatGPT-4 secret.  But previous versions were trained on everything from links posted on a popular website to data scraped from “most of the internet,” it says. 

ChatGPT will sometimes produce “nearly verbatim” works that are subject to copyright, Intercept Media alleges.  This includes times that it will “regurgitate verbatim or nearly verbatim copyright-protected works of journalism.”  If a user asks ChatGPT for current events, it will at times “mimic significant amounts of material from copyright-protected works of journalism.”  To the extent that ChatGPT was trained on journalism articles, it can recreate this information unless the defendants have specifically trained it to do otherwise.  It will do so without crediting the author or title and without copyright notice or disclaimer, Intercept Media says.

Intercept Media alleges that the defendants knew that users would broadcast ChatGPT results, at least in part because they advertise and market ChatGPT as a tool to generate content that can then be put in front of additional audiences.

Intercept Media alleges that OpenAI and Microsoft violated 17 U.S.C. §§ 1202(b)(1) and 1202(b)(3).

Intercept Media seeks statutory damages or, alternatively, the total of its damages and the defendants’ profits, as well as an injunction requiring the defendants to remove all copies of works from which the author, title, copyright or terms of use information was removed.

Injury

In its motion to dismiss, Microsoft argues that there is no actual or alleged threat of injury on which to base a claim.  Removal of copyright management information (CMI) in a nonpublic setting does not create a concrete harm.  “Even if The Intercept’s allegations that Microsoft removed CMI from copies of The Intercept’s works were plausible . . . the Complaint does not allege that these CMI-less copies were disseminated to the public.  The mere removal of CMI in a non-public setting from a copy that never sees the light of day causes no harm.  Because that is all The Intercept alleges, it lacks standing,” Microsoft says.

“The Intercept’s claims are remarkably similar to those that failed in [TransUnion LLC v. Ramirez, 594 U.S. 413 (2021)].  It invokes a statute of recent vintage that for the first time endows a type of information — CMI — with legal import, then creates a novel legal violation for its removal from a copy.  The Intercept at most claims nothing but a bare technical violation of that statute,” Microsoft argues.

“And indeed, unlike the false and defamatory information at issue in TransUnion, The Intercept alleges only the absence of information about its ownership of a work.  The only harm The Intercept asks us to deduce (and yet still fails to allege) is thus the private non-attribution of its ownership of unidentified Intercept works to The Intercept — which is to say, no harm at all,” Microsoft tells the court.

Copyright

Intercept Media’s conclusory statement that Microsoft knew ChatGPT and Bing AI products would produce copyright-protected material is simply an attempt to meet the scienter requirements, Microsoft says.  But that attempt fails because Intercept Media never plausibly alleges any actual offending output, the company argues.

“It may be that The Intercept means to predicate standing for injunctive relief on the purported risk that Microsoft may at some point disseminate a copy of one of The Intercept’s works without CMI. If so, the Complaint is woefully deficient,” Microsoft argues.

The Section 1202 claim fails for similar reasons, Microsoft says.  But it also fails because Intercept Media never plausibly alleges that CMI was removed from its works.  Intercept Media points to no work from which CMI was removed and which was then used to train AI, does not say that it has seen any of its works with CMI removed and does not even allege that AI has produced its copyrighted works.  Intercept Media instead relies on a “daisy-chain of generalities,” Microsoft says.

Intercept Media’s theory that users would be less likely to distribute ChatGPT content if it included CMI “is woefully deficient,” Microsoft says.  The point of CMI is to inform the public.  Therefore, courts have looked skeptically on claims involving nonpublic removal of CMI.  Any removal of CMI occurred in private and was unlikely to have influenced users of ChatGPT, Microsoft says.

Resources

In OpenAI’s motion, the company reiterates several of the arguments raised by Microsoft. 

OpenAI argues that even the use of Intercept Media’s copyrighted material in training AI would not diminish its reporting or the investments it put into human and other resources required to report the news.  OpenAI argues that there is no evidence that ChatGPT output copyrighted material from The Intercept.

Next, OpenAI argues that simply removing CMI does not create liability under Section 1202(b).  There must be an allegation that it was done knowingly or intentionally to induce a violation.  And Intercept Media isn’t among those able to bring suit under Section 1203(a).  Section 1203(a) requires a showing of injury from the violation.  Intercept Media never explains how it was injured by the alleged removal of CMI.  Setting aside that Intercept Media hasn’t even shown its work was used to train ChatGPT, it has not alleged that removal of any CMI harmed it, OpenAI says.

Counsel

Intercept Media is represented by Jonathan Loevy, Michael Kanovitz, Lauren Carbajal, Stephen Stich Match and Matthew Topic of Loevy & Loevy in Chicago.

OpenAI is represented by Joseph C. Gratz and Vera Ranieri of Morrison & Foerster LLP in San Francisco, Joseph R. Wetzel and Andrew M. Gass of Latham & Watkins LLP in San Francisco, Sarang V. Damle of Latham & Watkins in Washington, D.C., and Allison L. Stillman and Luke A. Budiardjo of Latham & Watkins in New York.

Microsoft is represented by Annette L. Hurst of Orrick, Herrington & Sutcliffe LLP in San Francisco and Lisa T. Simpson and Christopher J. Cariello of the firm’s New York office.

(Additional document available.  Intercept Media’s complaint.  Document #46-240306-053C.)