Precautions When Using ChatGPT for Business – 1 – Copyright and Plagiarism

Since its public launch in November of 2022, ChatGPT has opened the eyes of many to the wealth of possibilities that neural machine learning and automatic text generation can provide for business. Its widespread availability and natural-sounding text generation across many writing styles and topics (accuracy notwithstanding) quickly captivated the masses. It’s been over a year since then, and now that we’ve all had time to get past the initial hype and test the features in a thoughtful and targeted manner, potential uses for this new technology have crystallized, but so have a number of practical challenges and limitations.

One particularly tricky issue is that, because this technology has developed so quickly, laws governing the production and use of AI and computer-generated content have yet to be established. For those considering integrating ChatGPT or another AI tool into their daily business practices, I want to offer a bit of context on some of the current legal struggles that may shape how AI-generated content can be used and how the technology’s development may be forced to change in the future.

1. Current Copyright Law

In August of 2023, a U.S. District Court upheld a prior decision holding that works created without human input cannot be copyrighted. The judge ruled in favor of the U.S. Copyright Office, which had rejected an application by computer scientist Stephen Thaler to copyright a piece of AI-generated art on behalf of his DABUS AI system. Thaler has suffered similar losses in trying to secure patents for inventions that he says were also created by DABUS, despite applying in multiple countries. The international consensus seems to be that human authorship is a foundational requirement for obtaining ownership rights over a piece of work.

So what does this mean for entrepreneurial applications of AI? At present, the law would imply that if the product that you want to release is AI-generated, that work can be copied and re-released by absolutely anyone, even without your permission and without recompense, because no person can claim ownership or copyright over an AI-generated item.

That is subject to change, of course. From late 2021 to early 2022, the UK government held a large consultation on artificial intelligence and IP, and the prevailing opinions were either that AI-generated content should continue to be excluded from copyright or that ownership should belong to the users. But as automatically generated results continue to improve, if the AI itself were to be legally recognized as a creator of works, the developer of that AI system, rather than the person or business using it, could conceivably claim the copyrights for all content the AI generates, unless alternate agreements were struck in advance.

There is an important caveat: the cases above were decided over creations made “without human input,” so sufficient human editing of AI-generated work might qualify it for copyright protection. How much human contribution is required, relative to the scope of what was automatically generated, has yet to be litigated. The only point understood so far is that providing the prompts without altering the results generated by AI is not sufficient for obtaining copyright.

2. Privacy Concerns

In June of 2023 and then again in September, at least two class action lawsuits were filed against OpenAI, the developer of ChatGPT, in U.S. federal court in California for the theft and misappropriation of personal data from hundreds of millions of unknowing individuals. The claims assert that the broad data scraping methods used to train these large-scale generative AI systems violate multiple privacy laws. The indiscriminately acquired data apparently includes “personally identifiable information, from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge,” which the AI then uses for unauthorized purposes and without compensation in the course of generating text. If this is true, not only the legality of current AI development is in question, but its ethics as well.

3. Plagiarism

Similarly, multiple civil claims have been filed against OpenAI for copyright infringement and plagiarism. Most notable are the New York lawsuit filed on behalf of authors including George R.R. Martin and John Grisham in September of 2023 and The New York Times’ suit filed in late December of the same year, though these are far from the only groups suing OpenAI on copyright grounds.

Due to the same indiscriminate data scraping that I mentioned above, entire books, scripts, journals, news articles, and other creative works have been lifted from the internet and used to train chatbots, and under certain circumstances the system can reproduce them verbatim. The plaintiffs claim that this infringes on their rights as copyright holders and impacts their revenue. In contrast, statements from OpenAI assert that the use of these texts should fall under “fair use,” because the process of copying them is an intermediate step for a distinct and unrelated purpose that copyright lawyers would call “non-expressive.”

According to OpenAI, the memorization and exact replication of copyrighted works is “a rare bug” that it and other AI developers would ideally like to prevent, especially because the generation of plagiaristic content during ordinary use of their AI systems severely undercuts the fair use argument. Unfortunately, research submissions have shown that alignment techniques utilized as recently as November 2023 do not prevent pure memorization of the training data, and when these systems are manipulated, training data can be extracted from even closed AI models like ChatGPT at a surprisingly high rate.

As a reassurance, users of ChatGPT and other generative text engines are not really at risk of legal backlash from obtaining plagiarized results from AI output. Only the developers are being held responsible at this time. However, plaintiffs in these cases are demanding that any models trained on their stolen works be scrapped, a verdict that would force a hard reset on the affected AI’s development. If it is determined that AI training does not constitute a fair use of copyrighted works, then training will have to be redone with only works found in the public domain and those for which the developer has obtained explicit permission, which would be incredibly limiting and, according to those invested in developing the technology, could have a cataclysmic impact on the quality of the generated text.

Conclusion

Appropriate uses for ChatGPT and other AI technologies are still under scrutiny, and we mustn’t let excitement for this burgeoning field override our ability to critically observe the circumstances of its creation and potential repercussions of relying on this technology as an integral aspect of our future business models.

The inability to obtain copyrights on AI-generated content discourages the use of AI for creating products that one wishes to sell and maintain exclusivity over, unless extensive human editing is planned. In addition, the biggest and most expansive AI text generators are currently under intense pressure from various lawsuits to amend their methods and address privacy concerns by rebuilding their models with a much more restricted collection of training data. The future is in flux for these big AI ventures, and we the users have very little control over how such models are developed, or are allowed to be developed.

Of course, that’s not to say that ChatGPT and AI on the whole are necessarily risky investments; neural machine learning and AI-driven processes have been around for many years and are being developed with a growing fervor. Generative AI systems are technological tools, whose potential is not yet fully realized. It’s easy to get caught up in the hype as companies race to stay at the forefront of new tech. But by taking a moment to understand the potential risks and shortcomings of these systems, we can apply these tools more effectively and assuredly in both the short and long term.

For more on how AI and neural machine learning have had an impact on translation in particular, see “Machine Translation in the Global Market” and “Has Machine Translation Reached Its Limit.”

Sources:

https://www.reuters.com/legal/ai-generated-art-cannot-receive-copyrights-us-court-says-2023-08-21/
https://theconversation.com/chatgpt-what-the-law-says-about-who-owns-the-copyright-of-ai-generated-content-200597
https://edition.cnn.com/2023/06/28/tech/openai-chatgpt-microsoft-data-sued/index.html
https://www.reuters.com/legal/litigation/openai-microsoft-hit-with-new-us-consumer-privacy-class-action-2023-09-06/
https://www.theverge.com/2023/9/20/23882140/george-r-r-martin-lawsuit-openai-copyright-infringement
https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html
https://www.ign.com/articles/amid-an-increasing-number-of-lawsuits-openai-says-its-impossible-to-train-chatgpt-without-copyrighted-material
https://www.theatlantic.com/technology/archive/2024/01/chatgpt-memorization-lawsuit/677099/
https://doi.org/10.48550/arXiv.2311.17035