Open Source AI Definition nears final version as first release candidate is announced

The process of creating an official Open Source AI Definition has been steadily progressing, and now the Open Source Initiative (OSI), the group spearheading the effort, has announced Release Candidate 1 of the definition.

The OSI started work on the definition back in 2022, and for the past year it has been traveling the world to gather feedback on the draft.

The group wrote in its announcement that this release was the result of “lots of community feedback: 5 town hall meetings, several comments on the forum and on the draft, and in person conversations at events in Austria, China, India, Ghana, and Argentina.” Now that the first release candidate is available, future updates will be limited to bug fixes rather than new features.

Generally, the Open Source AI Definition specifies that an AI system is open source if it meets the following criteria:

  • Anyone can use it for any purpose without needing to ask permission
  • Anyone can study how the system works or inspect its components
  • Anyone can modify the system for any purpose
  • Anyone can share the system with or without modifications, for any purpose

According to the OSI, there have been three changes since the last release, all relating to the “preferred form to make modifications to a machine learning system.”

First, and most notably, new language around Data Information clarifies that training data must be shared and disclosed.

Second, it now specifies that code must be complete enough that downstream recipients can understand how training was done. “Training is where innovation is happening at the moment and that’s why you don’t see corporations releasing their training and data processing code. We believe, given the current status of knowledge and practice, that this is required to meaningfully fork (study and modify) AI systems,” the OSI wrote. 

Third, new text specifies that “it is admissible to require copyleft-like terms for any of the Code, Data Information and Parameters, individually or as bundled combinations.” For example, a consortium that owns the rights to both the training code and a dataset could distribute them as a bundle governed by copyleft-like provisions.

In its announcement, the OSI further reinforced the idea that the goal of open source (and also open source AI) isn’t to enable reproducible software, but rather to give anyone the ability to fork a system. 

“This is why OSD #2 requires that the ‘source code’ must be provided in the preferred form for making modifications,” the organization wrote. “This way everyone has the same rights and ability to improve the system as the original developers, starting a virtuous cycle of innovation. Forking in the machine learning context has the same meaning as with software: having the ability and the rights to build a system that behaves differently than its original status. Things that a fork may achieve are: fixing security issues, improving behavior, removing bias. All these are possible thanks to the requirements of the Open Source AI Definition.”

Going forward, the OSI will focus on creating the Open Source AI Definition’s documentation, Checklist, and FAQ. The official 1.0 release is expected on October 28, 2024.
