
Text-to-SQL accuracy that beats GPT-4

by Keith Parker

February 16, 2024

SambaNova and Numbers Station are proud to announce the release of the SambaCoder-nsql-Llama-2-70B model, which delivers text-to-SQL accuracy that beats GPT-4. This enables customers to access the valuable information locked in their databases faster and more easily than ever before.

Generative AI is enabling organizations to enter new markets, release products faster, better engage with customers, and more. One of the ways it achieves this is by enabling employees to extract insights from data faster and more easily than ever before.

Enterprise organizations typically store their data in SQL databases, and employees use BI tools to extract information from that data. These tools help employees query the databases and make sense of the data presented. Typically, employees work with data professionals, who script the SQL queries that pull information from the database. The data professional may then have to help interpret the data so that it is comprehensible to the requestor. This is a complex, time-consuming process that delays the employee's ability to access the information they need to complete their task.

But what if the employee could simply ask a question, in natural language, and get a meaningful response? What if they could quickly and easily get the information they need, in a meaningful and understandable format?

Now, through the use of a state-of-the-art text-to-SQL model, Numbers Station and SambaNova are enabling this capability. With industry-leading accuracy, this model dramatically simplifies the process of querying the database, making non-technical users more productive and more self-sufficient, and supplying them with better data.
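To make the idea concrete, here is a minimal sketch of what a text-to-SQL flow might look like. It is an illustration only, not the SambaCoder-nsql-Llama-2-70B API or its prompt format: the `orders` table, its rows, the prompt layout, and the `generate_sql` function are all hypothetical, and `generate_sql` returns a hand-written query where a real system would call the model.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    region TEXT NOT NULL,
    amount REAL NOT NULL
);
"""
ROWS = [(1, "East", 120.0), (2, "West", 80.0), (3, "East", 30.0)]


def build_prompt(schema: str, question: str) -> str:
    """Combine the table schema and the user's question into one prompt.

    The layout is an assumption, not a documented model format.
    """
    return f"{schema.strip()}\n\n-- Question: {question}\n-- SQL:"


def generate_sql(prompt: str) -> str:
    """Stand-in for a text-to-SQL model call (hypothetical).

    A real system would send `prompt` to the model; here we return a
    hand-written query so the example runs on its own.
    """
    return (
        "SELECT region, SUM(amount) AS total "
        "FROM orders GROUP BY region ORDER BY region;"
    )


def answer(question: str) -> list[tuple]:
    """Load the sample data, get SQL for the question, and run it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    sql = generate_sql(build_prompt(SCHEMA, question))
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


result = answer("What is the total order amount per region?")
print(result)
```

The point of the sketch is the shape of the workflow: the employee supplies only the question, the model sees the schema alongside it, and the generated SQL runs against the database without a data professional writing the query by hand.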

To date, despite the ability to provide significant productivity gains through self-service data analytics, enterprises have been hesitant to adopt text-to-SQL generative AI solutions. The reasons for that hesitancy include concerns about data privacy, accuracy of results, and model ownership.

Generic generative AI models are trained on public data. This makes them very good at responding to queries on general knowledge topics, but they can struggle to answer questions that are specific to an organization. One solution is to fine-tune the model with the organization's internal, private data. This makes the model aware of the private internal content, but it also makes that data part of the model. If the model is owned by another organization, that can put the data at risk.

This is particularly important for text-to-SQL solutions, since the majority of data found in enterprise SQL databases is internal to the organization. Providing a third-party model provider with access to that data will likely violate data governance policies and, for public companies and those in regulated industries, may not be legal.

SambaNova solves this by delivering the latest open source models that can be easily fine-tuned with customer data. Once the model has been trained on that data, the model becomes the property of the customer in perpetuity. Because the customer owns the model, it can be trained on private data to achieve high accuracy without raising concerns about data privacy.

In this way the model from Numbers Station and SambaNova has shown accuracy that exceeds that of GPT-4.

Learn more about how SambaNova is delivering highly accurate models, with the data security, data privacy, and ownership that enterprises require.
