Several ongoing lawsuits explore the use of copyrighted data for training AI models. The plaintiffs, mainly content creators, argue that AI models use their data (such as written, image, and video data) in ways that violate US copyright law. The following is an outline of the major cases and the key points of their debates.
Authors Guild et al. vs OpenAI and Microsoft:
In September 2023, the Authors Guild and a group of fiction authors filed a class action lawsuit alleging that AI tools like ChatGPT use copyrighted text without permission, attribution, or compensation to content creators. They assert that "at the heart of these algorithms is systematic theft on a massive scale," which threatens the livelihoods of authors. They project that "The market dilution caused by AI-generated works will ultimately result in a shrinking of the profession as fewer human authors will be able to sustain a living practicing their craft, and shut out important, diverse voices." Read the most recent version of the lawsuit.
In response to allegations of copyright violation, OpenAI maintains that AI model development and production are protected under the law's "Fair Use" doctrine. Their response hinges on two claims. First, they argue that the technology is "highly transformative," converting its training data from written form into numerical representations and algorithms. Second, they distinguish between training data and output data, arguing that only training data pertains to copyright. While they acknowledge that outputs may compete with original content creators, which would ordinarily weigh against Fair Use, they maintain that outputs fall beyond the realm of copyright, "into a broader category of concerns about the relationship between automation, labor, and economic growth." They assert that any such compensation ought to be handled through other means, such as taxes, claiming that "such distributive claims are most efficiently addressed through taxation and redistribution, rather than copyright policy" ("Comment of OpenAI, LP" 12). Read the full response from OpenAI.
New York Times vs OpenAI:
In December 2023, the New York Times (NYT) sued OpenAI for using NYT newspaper articles to train the language models that run OpenAI's ChatGPT. The lawsuit, which includes hundreds of pages of ChatGPT responses as evidence, demonstrates that ChatGPT can generate output nearly verbatim from NYT articles (see Exhibit J). Because these outputs compete with the original content creators, including mimicking the newspaper's "expressive style," the NYT argues that OpenAI violates Fair Use principles. They assert that "there is nothing 'transformative' about using The Times's content without payment to create products that substitute for The Times and steal audiences away from it" (NYTimes vs OpenAI).
In response, OpenAI filed a motion to dismiss the case. They point out that ChatGPT is not a substitute for the NYT, and that no "normal" user would use ChatGPT for that purpose. On the question of Fair Use, they argue that language forms like grammar and syntax do not constitute copyrightable data. They project that "OpenAI and the other defendants in these lawsuits will ultimately prevail because no one--not even the New York Times--gets to monopolize facts or the rules of language."
Other lawsuits:
In addition to these lawsuits, there are two similar cases: one against StabilityAI alone, and another against StabilityAI, DeviantArt, and Midjourney.
Copyright concerns also extend to openly licensed data, whose licenses can prohibit commercial use, as in the lawsuit against GitHub Copilot, a code generator.