
SambaNova CEO explains why only one AI company wants a monopoly

Written by Rodrigo Liang | June 14, 2024

Veteran tech journalist Don Clark of The New York Times interviewed SambaNova Systems CEO Rodrigo Liang before a live audience at the GenAI Summit in San Francisco. Here’s a summary of their conversation, edited for length and clarity.

Watch the full interview with Don Clark

Don Clark: When I started writing for The Times in 2017, my first story was about Nvidia as it reached a $100 billion market cap. Today, it’s about $2.3 trillion. And there are more than 200 AI chip startups that are trying to take its business. What’s your view on why they haven’t had much impact so far?

Rodrigo Liang: Over the last few years, the capabilities that AI is bringing have become undeniable. It is a transformative technology, and we're just getting started. Every aspect of business and every aspect of our lives is being transformed, and you see new things coming up every day. Nvidia was the first player to really see the benefits of that new workload. But others like SambaNova are coming in with new capabilities that will really open up some doors.

But why haven’t they had much impact yet? 

If you watch how the semiconductor world operates, it takes a long time for workloads to get deployed into production. Nvidia has the advantage of 25 or 30 years of history, so what we're seeing today is the benefit of their early start. Now, with companies like SambaNova, we're opening up new capabilities that can't be achieved with Nvidia GPUs. You'll start seeing companies like ours gain traction in specific types of AI workloads like fine-tuning, large-scale inferencing, and production inferencing. We're also reaching a stage with AI where we're moving from doing things in the lab to doing things in production, inferencing for everyday use.

What else has changed in the environment? We now have large language models and trained language models being distributed. That's one thing you're counting on, I believe.

When we started the company in 2017, there was a lot of attention on inventing new models. Now, there's been this incredible improvement in open source models. You look at the latest Llama 3 model that Meta put out, and the quality of these models that are available to all of us is just tremendous. So, for the next phase, people are asking how they can customize them for themselves. They'll want to customize them with their private data and use that data to generate value. They'll want to put their own data into pre-trained models, fine-tune them, see if they can create something that no one else has, and then quickly deploy it.
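
What that customization loop might look like in practice depends on the stack, but a minimal sketch makes it concrete. The example below is an illustrative assumption, not anything prescribed in the interview: it uses the Hugging Face transformers, peft, and datasets libraries and a hypothetical Llama 3 checkpoint to fine-tune a pre-trained open source model on a stand-in private corpus with LoRA adapters, so the organization keeps ownership of the resulting weights.

```python
# Minimal sketch of fine-tuning an open source LLM on private text with LoRA
# adapters. The stack (Hugging Face transformers, peft, datasets) and the
# checkpoint name are illustrative assumptions, not anything SambaNova uses.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "meta-llama/Meta-Llama-3-8B"  # hypothetical, gated checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship no pad token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Freeze the pre-trained weights and learn only small adapter matrices, so
# the private data's contribution lives in a compact, ownable delta.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# Stand-in for an enterprise's private corpus.
private_texts = ["Internal memo: ...", "Quarterly process notes: ..."]
dataset = Dataset.from_dict({"text": private_texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()
model.save_pretrained("ft-out/adapter")  # the organization owns these weights
```

Adapter-style fine-tuning is just one common pattern here; the point is that the expensive pre-training is already done, and the enterprise trains, and owns, only the private delta.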

I didn't really set the baseline by asking what SambaNova actually does. I know you've got chips and you've got systems. Can you give us a sense of your technology and why it's different?

SambaNova is a full-stack platform company. We're focused on delivering AI computing to the enterprise. Our bookends are these: at one end we build our own AI chips, and at the other we build you a pre-trained trillion-parameter model called Samba-1. We give you that model as the starting point for fine-tuning on your private data. And so we're focused on enterprises so they can own their own AI destiny. You can take your private data, train these models on it privately, and then inference those models privately wherever you want and own them in perpetuity.

And I think you sometimes operate your hardware in other companies’ data centers, right?

Our form factors are extremely dense. We can collapse what would otherwise run on hundreds of GPUs or hundreds of other chips into a very small number, which allows us to deploy the technology anywhere you want. Eighty-three percent of enterprise data is on-premises these days, so we have to ship our technology on-premises and run the trillion-parameter model there for our customers. We have customers in the cloud, we have customers in colos, we have customers on-premises with us. We're agnostic about it, but we're trying to get the hardware platform as close to the data as possible, so you can fine-tune it and then have your own custom model.

And you have systems at the U.S. national labs. What kind of work are they doing there?

Yes, we're really proud of the fact that we're the most deployed AI startup in the US national labs. We're doing a broad range of things with organizations like Argonne National Laboratory and Lawrence Livermore National Laboratory. We're training some of the largest models for scientific discovery, materials science, drug discovery, and a broad range of other applications in the sciences. We're also building some of the largest models for public sector organizations to do discovery on their data, so you can extract information out of a huge amount of data that you already have but don't know what it says. The ability to do that kind of discovery using LLMs is incredibly powerful, and we're incredibly proud that we have all this collaboration with the US government and are able to deploy our hardware in production.

Talk just a little bit about your architecture. You have a dataflow architecture. Can you describe what the benefits of it are?

The chip design world has for years focused on cores, and cores are much more rigid structures that force you to carve up the software to feed each core. SambaNova is about dataflow: we reconfigure the hardware to match the flow of the data through the neural net. When you look at SambaNova, why can we train, why can we fine-tune, why can we generate inferencing results of 1,000 tokens per second on Llama 3 at 16-bit precision? That's the world record today: 16-bit precision Llama 3 at over a thousand tokens per second on a 16-chip box that you can own. You get that because dataflow allows you to reconfigure the hardware and reallocate its resources. If you're in inferencing mode, we can put those resources to work for inferencing. If you're training a trillion-parameter model, you can reconfigure the hardware to use those resources for training. And that's how you get world record-setting performance with the smallest and most compact footprint possible.
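
As a rough analogy for the contrast he's drawing, consider the difference between executing a model one kernel at a time, materializing every intermediate buffer, and composing the layers into a single pipeline that values stream through. The toy Python below illustrates only that execution-model idea; it is not a model of SambaNova's RDU or of any real GPU.

```python
# Toy illustration of the execution-model contrast: "core-style" launches one
# kernel per layer and materializes every intermediate buffer, while
# "dataflow-style" composes the layers into one pipeline that values stream
# through. An analogy only; this models neither the RDU nor a real GPU.

def scale(x):
    return 2.0 * x

def relu(x):
    return max(0.0, x)

def shift(x):
    return x - 1.0

LAYERS = [scale, relu, shift]  # stand-in for a neural net's layer graph

def core_style(batch):
    # One "kernel" per layer: each layer reads the previous layer's full
    # output buffer and writes a new one, the memory round-trips that rigid
    # core-based execution pays for.
    buf = list(batch)
    for layer in LAYERS:
        buf = [layer(v) for v in buf]  # intermediate buffer materialized
    return buf

def dataflow_style(batch):
    # Layers configured as one pipeline: each value flows through every
    # stage back to back, with no intermediate buffers in between.
    def pipeline(v):
        for layer in LAYERS:
            v = layer(v)
        return v
    return [pipeline(v) for v in batch]

# Same math, different execution shape.
assert core_style([1.0, -2.0, 3.0]) == dataflow_style([1.0, -2.0, 3.0])
```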

I thought it was quite interesting that you were able to make these improvements in doing Llama 3 models. What was the process like, getting it to go faster?

We're getting these results on our fourth-generation processor, called the Reconfigurable Dataflow Unit, or RDU, which we released back in September of 2023. When Llama 3 showed up a few weeks ago, we got results very quickly on our trillion-parameter model. Samba-1 is a trillion-parameter model that has all the best open source models included in it, and within hours we had Llama 3 inside Samba-1. At first the performance was somewhere between 300 and 400 tokens per second. But as you come to understand the structure of the neural net, you can see which resources should be devoted to which layers in order to offset the bottlenecks. Once you see the bottlenecks, you can redistribute the on-chip resources and improve overall performance. And so, within weeks, we saw the performance jump to over a thousand tokens per second on just 16 chips, at full precision, in a single box.
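
The tuning loop he describes reflects a classic pipeline principle: end-to-end throughput is set by the slowest stage, so you profile per-layer rates and shift resources toward the bottleneck until the stages balance. Here is a toy sketch of that idea, with made-up layer costs and a made-up pool of resource units standing in for the on-chip reconfiguration.

```python
# Toy sketch of bottleneck-driven rebalancing. In a pipelined design the whole
# model runs at the speed of its slowest stage, so the loop below profiles
# per-layer rates and moves resource units toward the bottleneck. The layer
# costs, unit pool, and linear scaling are made up for illustration; they are
# not RDU data.

LAYER_COST = {"attention": 4.0, "mlp": 2.0, "norm": 0.5}  # hypothetical work per layer
TOTAL_UNITS = 16  # hypothetical pool of reconfigurable resource units

def throughput(alloc):
    # Proxy for tokens/sec: each stage's rate is units/cost, and the
    # pipeline as a whole is limited by its slowest stage.
    return min(alloc[name] / cost for name, cost in LAYER_COST.items())

def rebalance():
    # Start from a naive even split, then greedily move one unit at a time
    # from the fastest stage to the slowest, stopping once a move no longer
    # improves the bottleneck.
    alloc = {name: TOTAL_UNITS // len(LAYER_COST) for name in LAYER_COST}
    alloc["attention"] += TOTAL_UNITS - sum(alloc.values())  # hand out remainder
    while True:
        rates = {n: alloc[n] / c for n, c in LAYER_COST.items()}
        slow = min(rates, key=rates.get)  # the bottleneck layer
        fast = max(rates, key=rates.get)  # the layer with the most slack
        if slow == fast or alloc[fast] <= 1:
            break
        trial = dict(alloc, **{fast: alloc[fast] - 1, slow: alloc[slow] + 1})
        if throughput(trial) <= throughput(alloc):
            break  # balanced: further moves only create a new bottleneck
        alloc = trial
    return alloc

even = {name: TOTAL_UNITS // len(LAYER_COST) for name in LAYER_COST}
even["attention"] += TOTAL_UNITS - sum(even.values())
tuned = rebalance()
print("even split:", even, "->", throughput(even))    # 1.5
print("rebalanced:", tuned, "->", throughput(tuned))  # 2.25, same 16 units
```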

Do you have a guess how Nvidia H100s would compare, or, I should say, how many tokens per second they would do?

Well, I think many people here are probably much more familiar with using the H100. I think they're probably somewhere on the order of a couple hundred tokens per second. But look, there are fundamental challenges that the GPU structure has in inferencing at scale. The market's trend lines are such that as you go into production, you have to inference fast, but you also have to inference efficiently. What we're doing at SambaNova is not just hosting one model on 16 chips while others host one model on hundreds of chips. We're hosting a thousand models on a single 16-chip system and running them at a thousand tokens per second. And that's something Nvidia can't do.

Your target is the enterprise. Give us a little rundown on what enterprises are really looking for as far as their generative AI needs.

What enterprises are focused on, and I actually think that all of us here will sooner or later be in the same position, is that our number one need is ownership of our information and our data. All of us will be training our data into a private model, and you will want to own that model. And so that's what we do for our customers. We show up with hardware, with software, and with pre-trained models. We roll it into your private environment, and we give you a starting point that you can then fine-tune. And you own that model in perpetuity. That's ultimately what enterprises want. They want ownership of the model. They want ownership of their data. They want security while training, and they want to run it really, really fast and really, really efficiently. We focus maniacally on delivering that. Ultimately, we tell everybody to own their AI destiny.

You've talked about other kinds of opportunities, for instance in other countries. I know we were just saying that you've been traveling to the Middle East quite a bit lately. What's going on there, and why are they so hot on AI and technologies like yours?

We have hardware physically deployed in the Kingdom of Saudi Arabia, powering Saudi Aramco, the number one energy company in the world. We've been running there for nearly a year now, powering Saudi Aramco's internal LLM, called Metabrain, and we've gone from just a few hundred users to thousands of users within a few months. It's a perfect example: 90 years of data has been trained into a model that we trained for them, and it's deployed today for anybody there to use. We think that's a very good example of how enterprises will leverage the data they've accumulated over many, many years to create better services, better products, better productivity, and better processes, all with data they already have.