
On-the-ground AI Legal Issues

by VanL

It seems everyone is scrambling to understand what to do about AI. Based on our work with our clients, here are the emerging issues and best practices.

The key best practice: create a "strike team" for AI issues

AI is being integrated into organizations "from the bottom up," just like open source software was twenty years ago. If your organization is of any substantial size, it is almost certain that various AI tools are already being used. So the most important tip isn't legal or technical, it's social: Set up a "strike team" that has a Service Level Agreement (SLA) for reviewing and responding to questions about AI uses. Best practices are changing fast, so you want a policy of "ask us first." To make a policy like that work, leadership needs to be an enabler, not a blocker, so that it is easy and fast to engage on these issues.

Practically, most uses of AI and AI applications can be allowed. If something helps productivity and is not going to cause the loss of significant trade secrets, default to saying yes. Defaulting to yes will make it a lot easier when you do need to say "no."

Current legal issues with AI

The main legal issues with AI fall into five areas, some widely known and others rarely discussed. They are:

1. The generation of possibly-infringing material: This is basically the risk of copyright infringement. It is a known issue, especially for code-generating tools like GitHub Copilot. Code generation tends to center on a smaller number of known patterns for solving particular technical problems, which can lead to inadvertent copying. Even though Copilot can detect and warn when its output closely matches existing code, we have observed the generation of longer memorized code portions via successive prompts. Moreover, Microsoft does not take any responsibility for Copilot's output and does not meaningfully indemnify users against copyright infringement.

To mitigate these risks, we strongly recommend running snippet scanning on all AI-generated output. Right now we recommend getting started with ScanOSS, as it is the lowest-cost option to integrate. Another mitigation is to write code with Copilot using a test-driven development methodology. Snippet scanning works best when wired into your automated build system, as in the sketch below.
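For example, a CI gate along the following lines can fail the build until flagged snippets get human review. This is a minimal sketch assuming the scanoss-py command-line client (`pip install scanoss`); the `scan` subcommand and the JSON result shape assumed here should be verified against the ScanOSS documentation for your installed version.

```python
"""CI gate: fail the build if AI-generated code matches known open source code."""
import json
import subprocess
import sys


def scan_for_snippets(src_dir: str) -> bool:
    """Run ScanOSS over src_dir; return True if any matches were found."""
    result = subprocess.run(
        ["scanoss-py", "scan", src_dir],  # verify flags against your scanoss-py version
        capture_output=True, text=True, check=True,
    )
    # Assumed result shape: {"<path>": [{"id": "none" | "file" | "snippet", ...}]}
    matches = json.loads(result.stdout)
    flagged = [
        path
        for path, hits in matches.items()
        if any(hit.get("id") != "none" for hit in hits)
    ]
    for path in flagged:
        print(f"Possible open source match: {path}", file=sys.stderr)
    return bool(flagged)


if __name__ == "__main__":
    # Exit non-zero so the CI job fails and a human reviews the matches.
    sys.exit(1 if scan_for_snippets(sys.argv[1] if len(sys.argv) > 1 else ".") else 0)
```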

2. Trade secret loss/leakage: Many people are aware that OpenAI may use your inputs to train its models if you use its web-based interface. However, the biggest risk in terms of trade secret loss comes from the Confidentiality terms of OpenAI and Anthropic.

We have talked about OpenAI's terms of service before, highlighting OpenAI's attempt to prevent any of its users from becoming competitors. Almost all of the big players now do the same. But what most people don't pay attention to are the Confidentiality terms.

In particular, OpenAI's and Anthropic's confidentiality provisions are one-way: their stuff is confidential, yours is not. Accordingly, make sure you don't use (directly or indirectly) any OpenAI or Anthropic service with third-party data you have promised to keep confidential. You may be violating contractual terms with your vendors or partners if you do, even without knowing it. Similarly, don't expose any valuable trade secrets to these companies, even through their APIs; doing so may lead to loss of trade secret status.

One way to mitigate this issue is to use Microsoft Azure, which has better terms and conditions for businesses. If you are already using and trusting Microsoft with business data, it seems reasonable to trust them in this regard as well.

3. Loss of (or giving away) data rights: Many companies will want to train on broad data in order to make their models more powerful. Based on available Microsoft documentation, it appears that Copilot is not doing so right now, but this is something to watch for with every vendor. Your vendors may not be trying to use your data to train their models today, but that doesn't mean they won't try to gain the ability to do so through a seemingly benign "We're updating our terms of use" email. Also, make sure to integrate "we own our data" terms into your standard contractual forms.

4. Barriers to competition/non-compete clauses in terms of use: This is something to watch out for. We have seen it in almost every commercial provider's terms: you can't use their services in any way that might advance the development of another model. These terms can have substantial business effects. Make sure that you separate any internal model development from existing services. Also, route requests through a proxy layer so you can switch between backends, as in the sketch below.
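Here is a minimal sketch of such a proxy layer. The class and function names are illustrative, not any vendor's actual API; the point is that application code depends only on `complete()`, so switching providers, or walling one off from internal model work, becomes a configuration change rather than a rewrite.

```python
"""A provider-agnostic completion proxy (illustrative sketch, no real vendor calls)."""
from abc import ABC, abstractmethod


class CompletionBackend(ABC):
    """The only interface application code is allowed to depend on."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class OpenAIBackend(CompletionBackend):
    def complete(self, prompt: str) -> str:
        # Call the OpenAI API here; screen out confidential inputs first.
        raise NotImplementedError


class InternalBackend(CompletionBackend):
    def complete(self, prompt: str) -> str:
        # Route to a self-hosted model; keeps internal R&D traffic separate
        # from commercial services with non-compete terms.
        raise NotImplementedError


BACKENDS = {"openai": OpenAIBackend, "internal": InternalBackend}


def get_backend(name: str) -> CompletionBackend:
    """Select the backend by config so application code never imports a vendor SDK."""
    return BACKENDS[name]()
```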

5. Copyrightability of outputs: This is a more distant concern for many, but under current law the Copyright Office is not recognizing AI-generated outputs as copyrightable unless you can show substantial human modification of the output. If you will need to register and enforce a copyright on something that is partially AI-generated, make sure you keep a good record of what humans did to participate in the creation of the work, as in the sketch below.
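One lightweight way to keep such a record is an append-only contribution log. This is a hypothetical sketch, not a Copyright Office requirement; the record fields are assumptions, meant to illustrate the kind of contemporaneous evidence of human authorship that makes later registration easier.

```python
"""Append-only log of human contributions to AI-assisted work (hypothetical format)."""
import json
from datetime import datetime, timezone
from pathlib import Path

LOG = Path("provenance.jsonl")  # one JSON record per line, never rewritten


def record_contribution(file: str, author: str, description: str) -> None:
    """Append a timestamped note of what a human changed in an AI-assisted file."""
    entry = {
        "file": file,
        "author": author,
        "description": description,  # e.g. "rewrote error handling, restructured API"
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with LOG.open("a") as fh:
        fh.write(json.dumps(entry) + "\n")


# Usage:
# record_contribution("parser.py", "jdoe", "restructured AI draft into classes")
```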