AI coding assistant provider Tabnine today announced Code Provenance and Attribution, a new feature designed to protect companies from unintentionally introducing restrictively licensed code into their codebases when using generative AI to write code.
The feature checks AI-generated code against public GitHub repositories and, when it finds a match, flags the license type of the original code.
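To make the idea of matching generated code against known sources concrete, here is a minimal Python sketch. It is not Tabnine's implementation, which the announcement does not describe. It assumes a small in-memory corpus standing in for public repositories, line-window hashing as the matching technique, and a hypothetical set of license identifiers treated as restrictive. Every name in it (`Source`, `build_index`, `check`, the example repos) is illustrative.

```python
import hashlib
import re
from dataclasses import dataclass

# Hypothetical policy: license identifiers this sketch treats as restrictive.
# A real policy would be defined by a company's legal team.
RESTRICTIVE = {"GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-3.0"}


@dataclass(frozen=True)
class Source:
    repo: str     # illustrative repository name
    license: str  # SPDX-style license identifier


def normalize(code: str) -> list[str]:
    """Collapse whitespace and drop blank lines so reformatting does not hide a match."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in code.splitlines())
    return [line for line in lines if line]


def fingerprints(code: str, window: int = 3) -> set[str]:
    """Hash every run of `window` consecutive normalized lines."""
    lines = normalize(code)
    return {
        hashlib.sha256("\n".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(len(lines) - window + 1)
    }


def build_index(corpus: dict[Source, str], window: int = 3) -> dict[str, Source]:
    """Map each fingerprint to the first source it was seen in."""
    index: dict[str, Source] = {}
    for source, code in corpus.items():
        for fp in fingerprints(code, window):
            index.setdefault(fp, source)
    return index


def check(generated: str, index: dict[str, Source], window: int = 3) -> list[tuple[Source, bool]]:
    """Return each matched source with a flag saying whether its license is restrictive."""
    hits = {index[fp] for fp in fingerprints(generated, window) if fp in index}
    return sorted(((s, s.license in RESTRICTIVE) for s in hits), key=lambda hit: hit[0].repo)


# Toy corpus standing in for indexed public repositories.
corpus = {
    Source("example/gpl-project", "GPL-3.0"): "def add(a, b):\n    total = a + b\n    return total\n",
    Source("example/mit-project", "MIT"): "def greet(name):\n    msg = 'hi ' + name\n    return msg\n",
}
index = build_index(corpus)

# Same code as the GPL snippet, with different spacing and a blank line.
generated = "def add(a, b):\n\n    total  =  a + b\n    return total\n"
report = check(generated, index)
for source, restricted in report:
    print(source.repo, source.license, "RESTRICTIVE" if restricted else "ok")
```

Exact hashing of normalized line windows only catches near-verbatim copies. A production system would likely need fuzzier matching and a far larger index, which this sketch does not attempt.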
According to Tabnine, the feature will help software development teams understand whether the code generated by AI models meets their standards and requirements.
“State-of-the-art LLMs like Claude 3.5 Sonnet and GPT-4o have greatly improved the performance of generative AI applications, including AI code assistants. However, these LLMs are trained on vast amounts of data collected from all corners of the internet, including code that may have restrictions on how it can be used, introducing the risk of IP infringement. Since the copyright law for the use of AI-generated content is still unsettled, engineering teams at enterprises want to strike a balance: leveraging the performance gains that come from these powerful models while minimizing the likelihood of copyleft-licensed code getting in their codebase,” Tabnine wrote in a blog post.
Tabnine already offered a license-compliant model trained only on permissively licensed code, but the new feature lets users take advantage of a variety of other models as well, such as Anthropic’s Claude, OpenAI’s GPT-4o, and Cohere’s Command R+.
Beyond code generation, the Code Provenance and Attribution capability also supports other development activities within Tabnine, including fixing code, generating test cases, and implementing Jira issues.
The company is also working to expand this capability so that users can specify particular repos to check against, such as those containing a competitor’s code. Tabnine also plans to add a censorship capability that removes matching code before the developer sees it.
Code Provenance and Attribution is currently available as a private preview for all Tabnine Enterprise customers. Tabnine will also host a webinar on January 9 at 11 AM ET / 8 AM PT to dive into the capability further.