Open source licenses need to leave the 80s and evolve to address AI

Opinion: Free software and open source licenses evolved to handle code in the 1970s and 1980s. Today they have to transform again to deal with AI models.

AI was born out of open source software. But free software and open source licenses, which rely on copyright law to govern software code, are ill-suited to the neural networks and large language model (LLM) datasets that power open source AI software. Since many programming datasets, in particular, are built on free software and open source code, something needs to be done. That’s why Stefano Maffulli, executive director of the Open Source Initiative (OSI), and a host of other open source and AI leaders are working to combine AI and open source licenses in ways that make sense for both.

Lest you think this is some sort of theoretical legal discussion with no real-world impact, think again. Consider J. Doe 1 et al. v. GitHub. The plaintiffs in this case, filed in the US District Court for the Northern District of California, allege that Microsoft, OpenAI, and GitHub, through their commercial AI-powered systems, OpenAI’s Codex and GitHub’s Copilot, ripped off their open source code. The result? The plaintiffs allege that the “suggested” code often consists of nearly identical copies of code pulled from public GitHub repositories, without the required open source license attributions.

The case is ongoing. The amended complaint alleges violations of the Digital Millennium Copyright Act, breach of contract (open source license violations), unjust enrichment, unfair competition, and a further breach of contract (selling licensed materials in violation of GitHub’s policies).

Don’t think that this kind of case is just a Microsoft problem. It is not. Sean O’Brien, professor of computer security at Yale Law School and founder of the Yale Privacy Lab, told my colleague David Gewirtz: “I think there will soon be a whole sub-industry of trolling that mirrors patent trolls, but this time it’s about artificial intelligence generated works. A feedback loop is created as more authors use AI-powered tools to deliver code under proprietary licenses. Software ecosystems will be polluted with proprietary code that will be subject to cease and desist requests from enterprising companies.”

He’s right. I’ve been dealing with patent trolls for decades. I guarantee the license trolls will come after “your” ChatGPT and Copilot code.

Some people, like Felix Reda, a German researcher and politician, claim that all code produced by artificial intelligence is in the public domain. US attorney Richard Santalesa, a founding member of the SmartEdgeLaw Group, noted to Gewirtz that both contract law and copyright law come into play, and that the two are not the same thing. Santalesa believes that companies that produce AI-generated code “as with all of their other IP, will treat the materials you provide, including the AI-generated code, as their property.” In any case, public domain code is not the same thing as open source code.

On top of all that, there’s the whole question of how datasets should be licensed. There are many “open” datasets released under various open source licenses, but those licenses are usually a poor fit for data.

In our conversation, Maffulli of the Open Source Initiative explained how the various artifacts produced by artificial intelligence and machine learning systems fall under different laws and regulations. The open source community must determine which laws best serve its interests. Maffulli compared the current situation to the late 1970s and 1980s, when software emerged as a distinct discipline and copyright law began to apply to source and binary code.

Today we are at a similar crossroads. AI programs like TensorFlow, PyTorch, and Hugging Face Hub work well under their open source licenses. The new AI artifacts are another story. Datasets, models, weights, and the like don’t fit neatly into the traditional copyright model. Maffulli argued that the tech community should come up with something new that better aligns with its goals, rather than relying on “hacks.”

Specifically, open source licenses designed for software, Maffulli noted, may not be the best fit for AI artifacts. For example, while the broad freedoms of the MIT license could potentially apply to a model, questions arise for more complex licenses such as Apache or the GPL. Maffulli also addressed the challenges of applying open source principles to sensitive industries such as healthcare, where data access regulations pose unique barriers. The short version of this is that medical data cannot be open sourced.

At the same time, most commercial LLM datasets are black boxes. We literally don’t know what’s inside. So we end up, as the Electronic Frontier Foundation (EFF) states, in a situation where we have “Garbage In, Gospel Out”. We need, concludes the EFF, open data.

That is why the OSI, Maffulli said, together with Open Forum Europe, Creative Commons, the Wikimedia Foundation, Hugging Face, GitHub, the Linux Foundation, the ACLU, Mozilla, and the Internet Archive, is working on a draft to define a common understanding of open source AI principles. This will be “critical in conversations with legislative bodies”. Even now, government agencies in the EU, US, and UK are struggling to develop AI regulation and are sadly ill-equipped to deal with the problems.

Maffulli concluded by saying that we should start with “a return to basics,” the GNU Manifesto, which predates most licenses and establishes the “North Star” for the open source movement. He suggested that its principles remain surprisingly relevant when applied to AI systems. By focusing on first principles, we will be better able to navigate this complex intersection of AI and open source.
