In late February, we announced Samba-1, a Composition of Experts (CoE) architecture that represents a paradigm shift and that we believe will ultimately become the de facto architecture for enterprise AI. In fact, Matt Eastwood from IDC stated that Samba-1 is “symbolic of the rapid progress in the AI LLM landscape today.”
We’re on a mission to prove this by demonstrating state-of-the-art model accuracy with blazing speeds that are only achievable by running Samba-1 on SambaNova Suite.
Last week, we spun up an instance of Samba CoE v0.2, a demo subset of the models and routing from the next version of Samba-1, which we will make available next month. Samba CoE v0.2 is optimized for general-purpose chat and reached #11 on the AlpacaEval leaderboard, running at 330 tokens per second at a batch size of one; efficiency improves significantly with batched inference. We are running this on a single SambaNova Suite node, which comes with 8 RDUs.
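To make the "models and routing" idea concrete, here is a minimal sketch of how a CoE can dispatch each prompt to a single expert, so only that expert's weights run at inference time. All names, and the keyword-based router, are illustrative assumptions, not SambaNova's actual implementation or API:

```python
# Minimal sketch of the Composition of Experts (CoE) idea: a lightweight
# router picks one expert per prompt, so only that expert is exercised at
# inference time. All names here are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Expert:
    name: str
    generate: Callable[[str], str]  # stands in for a full expert LLM

def route(prompt: str, experts: Dict[str, Expert],
          classify: Callable[[str], str]) -> str:
    """Dispatch a prompt to the single expert chosen by a router.

    `classify` stands in for a small router model (e.g. a fine-tuned
    classifier) that maps a prompt to an expert's name.
    """
    expert = experts[classify(prompt)]
    return expert.generate(prompt)

# Toy usage: a keyword "router" and two stub experts.
experts = {
    "code": Expert("code", lambda p: f"[code expert] {p}"),
    "chat": Expert("chat", lambda p: f"[chat expert] {p}"),
}
classify = lambda p: "code" if "def " in p or "function" in p else "chat"

print(route("Write a Python function to reverse a list", experts, classify))
print(route("What's the capital of France?", experts, classify))
```

In a real CoE the router is itself a model, but the serving cost per request still tracks the one expert that runs, not the full roster.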
Our speed at a batch size of one is impressive, but speed is not all that matters. Unlike other LLM stack providers:
- We achieve this on a single 8-RDU node (one-quarter rack), a footprint 72 times smaller than the 576 chips needed by others.
- We can run hundreds of expert models on that single node without any performance impact (Samba-1 currently includes 54 expert models; see the sketch after this list).
- We run at full precision and do not quantize the models.
- Our RDUs can also train these models with state-of-the-art performance, not just run inference.
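A back-of-the-envelope sketch of why more resident experts need not slow inference: each request activates only the routed expert, so per-token compute tracks one expert's size while node memory just has to hold the full roster. The expert size below is a made-up figure for illustration, not a published SambaNova number:

```python
# Hypothetical accounting of compute-bound vs. memory-bound cost in a CoE.
# Only the expert count (54) comes from the post; the size is assumed.
NUM_EXPERTS = 54            # experts resident on the node (from the post)
PARAMS_PER_EXPERT_B = 7.0   # hypothetical expert size, billions of params

resident_params_b = NUM_EXPERTS * PARAMS_PER_EXPERT_B  # memory-bound cost
active_params_b = PARAMS_PER_EXPERT_B                  # compute-bound cost

print(f"Resident across node: {resident_params_b:.0f}B params")
print(f"Active per request:   {active_params_b:.0f}B params")
```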
With Samba CoE v0.2, we took a subset of our next Samba-1 release and iterated on different ways of composing these experts to drive the highest performance. We are climbing the AlpacaEval leaderboard, outperforming the latest open-source models on general-purpose benchmarks. Our latest version, Samba CoE v0.3, which will be made available in the next few weeks, outperforms the state-of-the-art open-source models.
Beyond the general LLM benchmarks, Samba-1 also excels at specific tasks, domains, and languages, as we shared in our original Samba-1 release last month.
Next, we will make Samba CoE v0.3, our new general-purpose chat LLM leader, available to try via our partner Lepton.ai (the demo above currently runs v0.2). We will also incorporate Samba CoE v0.3 into our next Samba-1 release next month.
And 330 tokens per second is just the beginning: we will keep pushing the frontier of accuracy, breadth, and speed, proving that a CoE is the architecture for enterprise AI and that RDUs are the only way to run it. Stay tuned!