Choosing the right generative AI model is one of the most important decisions that an organization will make. The right model can set the organization up for success by creating competitive advantage, streamlining processes, accelerating product development, and increasing customer satisfaction, while growing in value over time. The wrong model can expose the organization to claims of bias that result in litigation, put its private data at risk, and turn what could have been an incredibly valuable asset into just another expense.
Critical to this decision is whether to use an open source or a closed source model. Closed source models are typically created by a single organization for the purpose of licensing that model to commercial, governmental, or other organizations. The model developer builds, trains, and markets the model as its product. Changes, updates, and support for these products are provided solely by the developer and are usually released on a schedule that the developer sets. The model is the developer's intellectual property, and the company's profitability depends on it. An example of this type of model is GPT-4, which is owned by OpenAI and offered through products such as ChatGPT.
By contrast, open source models are created collaboratively by the open source community. No single individual or corporation owns these models, and they are publicly available. Any individual or organization that wishes to acquire these models can do so by downloading them from a repository such as Hugging Face or GitHub. Updates, changes, and support are provided by the open source community and typically arrive very quickly.
For the enterprise, a few key differentiators must be taken into account when comparing these types of models: model ownership, model training and explainability, and data privacy.
Model Ownership: Building an asset vs. using a tool
Model ownership is one of the most important issues when selecting a model. With a closed model, the company that built the model usually owns it. GPT-4, for example, is owned by OpenAI, which monetizes the model by licensing it. If an organization chooses to license GPT-4 as its AI model, OpenAI will normally continue to own it, even if the licensee further trains the model on its own private data.
With an open model, there is no ownership of the base model. For example, if a company uses an open source model as part of SambaNova Suite, the model can be fine-tuned with the customer's internal data. Once the model has been fine-tuned with that data, it becomes a unique model that is private to the customer, and the end customer takes ownership of it in perpetuity.
This is a particularly important point, because both closed and open source models can be further trained on private data. As the model is further trained over time, it becomes more valuable to the organization. With an open source model, the model that the company owns becomes an asset that increases in value over time. With a closed model, it becomes a liability: the model still grows more important to the organization, but because the organization does not own it, that growing dependence only creates vendor lock-in.
Model Training: The importance of transparency
Both closed and open source models are often pre-trained on publicly available data sets. With a closed model, neither the training data nor the model weights are typically disclosed. Any organization that uses the model will have no insight into the data that the model was trained on or into the weights that determine how the model arrives at its outputs.
By contrast, open source models typically make their model weights, and often their training data sets, freely available. This means that when an open source AI model provides a result, such as whether to extend credit to one individual over another, the organization is better placed to demonstrate why that decision was made. Organizations that use closed source models cannot do that.
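As a toy illustration of why access to weights helps with explainability, the sketch below scores a credit applicant with a small linear model whose weights are visible, then breaks the decision into per-feature contributions. The feature names, weights, bias, and threshold are all hypothetical and chosen by hand. A real generative model has billions of parameters and needs dedicated interpretability tooling, so this is only a simplified picture of the idea that inspecting a decision requires access to the weights.

```python
# Hypothetical, hand-picked weights for a toy linear credit-scoring model.
# They are illustrative only and are not taken from any real model.
WEIGHTS = {
    "income_thousands": 0.04,
    "years_employed": 0.30,
    "missed_payments": -1.20,
}
BIAS = -2.0
APPROVAL_THRESHOLD = 0.0  # assumed cutoff: approve when score >= 0


def explain_decision(applicant: dict) -> dict:
    """Score an applicant and report how much each feature contributed."""
    # Because the weights are visible, each feature's share of the score
    # can be computed directly as weight * feature value.
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    return {
        "approved": score >= APPROVAL_THRESHOLD,
        "score": score,
        "contributions": contributions,
    }


if __name__ == "__main__":
    applicant = {"income_thousands": 60, "years_employed": 5, "missed_payments": 2}
    result = explain_decision(applicant)
    print("approved:", result["approved"])
    # List features from most negative to most positive contribution.
    for name, value in sorted(result["contributions"].items(), key=lambda kv: kv[1]):
        print(f"{name}: {value:+.2f}")
```

With hidden weights, only the final approve-or-deny answer would be available; the per-feature breakdown is what allows an organization to point to the factor that drove a decision.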
Data Privacy: Ensuring the security of sensitive and confidential data
Both types of models can be fine-tuned using customer data. This is a critical step in customizing a model for a particular organization. Models can be pre-trained to do things such as perform accounting functions, track orders, or write legal contracts. To do those things effectively, the model has to understand part numbers, product names, customer accounts, and more. The model learns these details during fine-tuning, which relies on the organization’s valuable internal data.
With SambaNova Suite, customer data is always protected. With an on-premises deployment, the data never leaves the customer data center. Customers that choose a cloud-based deployment take advantage of a dedicated training backbone, so data is never shared.
After multiple public incidents in which private data was exposed, leading closed source providers moved to protect training data. While data used to further train these models is now much safer, the models are still held by external, private organizations. If the model vendor is acquired or ceases operations, the disposition of that private data is in question.
Summary: Open source is the best choice for the enterprise
Ultimately, choosing between open source and closed source models is all about model and data ownership. If an organization wants insight into why a model responded to a prompt in a particular way, wants to own its models so that they become assets that grow in value over time, and wants to maintain control over its valuable internal data, then the only option is an open source model. If those things are not important to the organization, then a closed source model may be the better choice.