Model Licensing for AI - Open or Not?

by VanL

One hot topic that keeps coming up with our clients is how to deal with AI models and their associated licenses. Many ML model licenses are inspired by open source licenses, so open source program offices (OSPOs) are being brought in for their expertise. Today's topic is how to think about licensing out your own models and datasets if you want to encourage collaboration while possibly preserving a competitive advantage.

If data is the new oil, models are the new gasoline

You might have heard the saying that "data is the new oil." It's meant to convey that data is the raw material that can be refined into all kinds of products. If we accept that comparison, then models are the new gasoline: data that has been refined and processed into an immediately usable form.

When companies create their own models, they frequently want to publicize them, both for PR purposes and to foster collaboration with researchers in industry and academia. But many companies don't want to give up control of the model or hand a potential tool to a competitor. Accordingly, most publish their models and data under Creative Commons Non-Commercial (CC-NC) licenses.

This tension between openness and proprietary advantage is nothing new for those in open source. The conversations around AI datasets and models are strikingly parallel to the conversations many people had around software source code in the 1990s and 2000s. Over time, open (and commercially usable) models and datasets are going to predominate. For now, however, many companies want to adopt -NC licenses themselves, at least for part of what they release.

Work backwards from your business goal

When evaluating AI model licensing, the analysis usually needs to do two things: 1) break down the release into its constituent parts, and 2) identify the overriding business purpose and choose licenses according to that primary purpose.

Breaking down the release

A release usually has several different parts, including the paper, the code, the neural network design, the dataset, and the weights. Each of these can carry a different license. Instead of treating the release as a binary choice between non-commercial and not, you can apply different strategies to different parts.
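To make per-component licensing concrete, here is a minimal Python sketch that records a release as a list of parts, each with its own license, and flags which parts restrict commercial use. The component names follow the list above; the specific license choices, the `withheld` marker, and the `NON_COMMERCIAL` set are hypothetical assumptions for illustration. The check only compares identifiers against that set; it does not interpret actual license terms.

```python
from dataclasses import dataclass

# Hypothetical set of SPDX-style identifiers treated as non-commercial here.
# Illustrative only; a real review would look at the actual license text.
NON_COMMERCIAL = {"CC-BY-NC-4.0", "CC-BY-NC-SA-4.0"}

# Hypothetical marker for a part that is kept back rather than released.
WITHHELD = "withheld"


@dataclass(frozen=True)
class Component:
    name: str
    license_id: str  # SPDX-style identifier, or WITHHELD


# A hypothetical release: permissive paper, code, and design;
# dataset kept back; weights licensed only non-commercially.
RELEASE = [
    Component("paper", "CC-BY-4.0"),
    Component("code", "Apache-2.0"),
    Component("network design", "Apache-2.0"),
    Component("dataset", WITHHELD),
    Component("weights", "CC-BY-NC-4.0"),
]


def restricts_commercial_use(component: Component) -> bool:
    """True if the part is withheld or its identifier is in NON_COMMERCIAL."""
    return component.license_id == WITHHELD or component.license_id in NON_COMMERCIAL


def summarize(release: list[Component]) -> dict[str, bool]:
    """Map each component name to whether it restricts commercial use."""
    return {c.name: restricts_commercial_use(c) for c in release}


if __name__ == "__main__":
    for name, restricted in summarize(RELEASE).items():
        status = "restricted (non-commercial or withheld)" if restricted else "open for commercial use"
        print(f"{name}: {status}")
```

Keeping the license decision per component, rather than one flag for the whole release, is what lets you tie the restrictive choice to the single part that carries the business value.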

Evaluating the overriding business purpose

There is no perfect licensing scheme. They all have tradeoffs. And trying to hedge across multiple business purposes usually fails at accomplishing any of them. Instead, the most successful releases have acted according to the primary business purpose and accepted (or embraced) the tradeoffs to make the most of their action.

Typical business purposes associated with a release are a) encouraging distribution and use by others; b) publicity and goodwill; and c) preserving a proprietary advantage based on a unique asset. Each of these requires a different strategy.

**Encouraging collaboration:** If the primary purpose is encouraging distribution and use by others (collaboration, creating an ecosystem, etc.), then the best strategy is to license the release, or elements of it, as permissively as possible. People will pick up your release when it provides value, and unrestricted releases have much more value than restricted ones. An example is Stable Diffusion. For a while, DALL-E was the most interesting research target. Once Stability.ai released Stable Diffusion, however, its wider availability and permissive licensing led those who had been more interested in DALL-E to shift their efforts to Stable Diffusion.

**Publicity:** If the purpose is publicity and goodwill, you can take a middle course. A permissive release can generate large amounts of publicity and goodwill, but it is not necessary: ChatGPT, which was not released openly, is a case in point. What is needed to maximize publicity is to pair the release with a communications and marketing strategy. A bare release is unlikely to attract the necessary attention.

**Proprietary advantage:** If the purpose is keeping a proprietary advantage, then non-commercial licensing is the best course. It allows collaboration with many researchers, particularly those in universities, but keeps the commercial fruits of the research for you alone. Many papers from Google Brain or DeepMind have followed this strategy. You can also identify whether the business value is closely tied to one part of the release, such as the weights. If so, the weights can be held back or licensed only non-commercially while the other parts are licensed more permissively.

Keep your eye out for commoditization

Non-commercial licensing may be the right decision in the short run. But, just as in open source software, large foundational AI models have costs that can be spread across many parties. I predict that within two years, many researchers will consolidate around a number of open foundational models, and proprietary foundational models will become increasingly expensive to maintain and advance. If you see commoditization coming in your area, consider jumping in early to gain a first-mover advantage.