
SambaNova Cloud: The fastest inference and the best models - for free

Posted by Keith Parker on September 11, 2024

SambaNova has announced that our platform, which delivers world-record inference performance on Llama 3.1 8B, 70B, and 405B, is now available as part of a free cloud service that lets developers build meaningful, AI-powered applications with ease. Unlike competing offerings, the service is available today, enabling developers to accelerate their AI journey without GPUs.

This is the only platform offering high-speed inference of over 100 tokens/second on Llama 3.1 405B, the largest, latest, and most capable open-source model currently available. A typical GPU provider delivers only about 20 tokens/second on the same model, and to date other AI accelerator providers have been unable to offer Llama 405B at all. Only SambaNova can deliver this incredibly fast inference speed.

[Figure: bar chart comparing Llama 3.1 405B inference speed across providers]

The SambaNova platform also delivers unparalleled performance of up to 580 tokens/second on Llama 3.1 70B. The 70B model is widely considered the highest-fidelity model for agentic AI use cases, which demand high speed and low latency. Its size also makes it well suited to fine-tuning, producing expert models that can be combined in multi-agent systems to solve complex tasks.

The cloud platform is available to use today, unlike other providers whose services have been described as “coming soon” for several months. Users can even bring their own checkpoints and begin running them immediately, so developers are not held back by the limitations inherent to GPUs.

By making this service free for development, SambaNova has taken a critical step in advancing what is possible with AI. Developers need access to the best and most capable models available, yet few providers can run the groundbreaking 405B model at all, and none come close to matching the extreme inference performance that SambaNova delivers.

All of this incredible performance is made possible by the unique SN40L chip, SambaNova's fourth-generation AI processor. With an innovative dataflow design and a massive three-tier memory architecture, the SN40L can power AI models 10x faster than other systems, on a platform with a fraction of the footprint.

SambaNova can power not just the 405B model, but the 405B alongside multiple instances of the 70B and 8B models, all at world-record performance for each model, at the same time, switching between them in microseconds. This is a requirement for agentic AI applications, and no other system manufacturer can do it today. The best others can manage is to run one model at a time, increasing latency, reducing overall performance, and driving up costs.
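The multi-model pattern described above can be sketched in a few lines: an agent router that sends cheap steps to a small model and hard steps to a larger one, which only pays off when all model sizes are resident and switchable, as the post claims. The model IDs and the `route_step` helper below are illustrative assumptions, not a SambaNova API.

```python
# Hypothetical sketch of agentic routing across Llama model sizes.
# Model IDs below are assumptions for illustration only.
FAST_MODEL = "Meta-Llama-3.1-8B-Instruct"        # assumed ID: cheap, low-latency steps
EXPERT_MODEL = "Meta-Llama-3.1-70B-Instruct"     # assumed ID: fine-tuned expert steps
FRONTIER_MODEL = "Meta-Llama-3.1-405B-Instruct"  # assumed ID: hardest reasoning steps

def route_step(step: dict) -> str:
    """Pick a model size based on the agent step's complexity flags."""
    if step.get("needs_reasoning"):
        return FRONTIER_MODEL
    if step.get("is_expert_task"):
        return EXPERT_MODEL
    return FAST_MODEL

# A toy agent plan: each step is routed to the smallest adequate model.
plan = [
    {"task": "classify intent"},
    {"task": "draft SQL", "is_expert_task": True},
    {"task": "compose final answer", "needs_reasoning": True},
]
print([route_step(s) for s in plan])
```

On a system that serves one model at a time, each switch in this loop would add a model-load stall; the claim here is that keeping all three hot removes that cost.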

SambaNova is opening up the full spectrum of Llama models for developers to create the next wave of AI innovation. Users can even bring their own checkpoints to accelerate development. This breaks down the barriers smaller developers have faced in accessing production-grade inference.

The free version of the SambaNova Cloud is available for immediate use. Get access to it now.
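As a starting point, a chat request to the service might look like the sketch below. This assumes an OpenAI-compatible chat-completions endpoint; the URL, model ID, and `SAMBANOVA_API_KEY` variable name are assumptions and may differ from the live service. The sketch only builds and prints the request; the commented lines show how it would be sent with a valid key.

```python
import json
import os

# Assumed endpoint and model ID -- check the service docs for the real values.
API_URL = "https://api.sambanova.ai/v1/chat/completions"
API_KEY = os.environ.get("SAMBANOVA_API_KEY", "<your-key>")

payload = {
    "model": "Meta-Llama-3.1-405B-Instruct",  # assumed model ID
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

body = json.dumps(payload)
print(body)

# To actually send the request (requires a valid key):
# import urllib.request
# req = urllib.request.Request(API_URL, data=body.encode(), headers=headers)
# print(urllib.request.urlopen(req).read().decode())
```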

For users who require more than the free tier offers, paid subscription tiers are available today.

Topics: business, Blog