Products
Developers
About

SambaNova Partners with Meta to Deliver Lightning Fast Inference on Llama 4

Posted by SambaNova Systems on April 7, 2025

2025-04_SambaNova+Llama4_1600x900_v1.0 (1)


Today, we are thrilled to announce a major milestone in open source AI, as we partner with Meta to bring their Llama 4 models to life for developers on SambaNova Cloud, powered by our RDU chips. As one of the most powerful multimodal models available, the Llama 4 series is poised to revolutionize the way developers build and deploy applications, and we're excited to be at the forefront of this revolution alongside Meta.

Available now, developers have access to the Llama 4 Scout on SambaNova Cloud running at 697 tokens/second/user - the fastest offering in the market to date according to Artificial Analysis. By next week, developers will have access to multimodality functionality as well as the larger Llama 4 Maverick model, both running faster and more efficiently than any other platform.

“Artificial Analysis has independently benchmarked SambaNova’s cloud deployment of Meta's Llama 4 Scout model (109B total parameters, 17B active parameters) at 697 output tokens/s, the fastest output speed we have measured yet for Llama 4 Scout," said Micah Hill-Smith, Co-founder Artificial Analysis.

Try Llama 4 Scout & Llama 4 Maverick now on SambaNova Cloud!

Meet the Llama 4 Herd 🦙

The new Llama 4 models use a mixture of expert (MoE) architecture, which enable it to deliver high-quality results while being more compute-efficient than traditional dense architectures. At launch, Meta has made available a smaller and larger variant of the model: Scout and Maverick. They are both natively multimodal, and have been trained to seamlessly integrate text and vision tokens.

meta models

Both models have been distilled from the much larger Llama 4 Behemoth, which while not done training, is already showing exceptional performance in the benchmarks as a 2 trillion+ parameter model. 

Meta also plans to release Llama 4 Reasoning later this month, which will show even better performance and will be launched also on SambaNova Cloud, when it is available.

Performance & Pricing

Llama 4 Scout and Llama 4 Maverick are available today for developers to start building with!

  • Llama 4 Scout is running at 697 tokens/second/user and is available today at a cost of $0.40 per million input tokens and $0.70 per million output tokens.
  • Llama 4 Maverick is running at 655 tokens/second/user and is available today at a cost of $0.63 per million input tokens and $1.80 per million output tokens.

Output Speed - Llama 4 Scout Providers (8 Apr 25)

Llama 4 Scout is one of the most advanced multimodal models available, featuring 110 billion parameters and 16 experts. It is ideal for developers seeking a smaller, faster, and cost-effective model for creative workflows, such as educational tools or marketing campaigns.

For use cases demanding higher accuracy and performance, Llama 4 Maverick serves as Meta's flagship workhorse model. With 400 billion parameters and 128 experts, it outperforms competitors like GPT-4o and Gemini 2.0 Flash across benchmarks in multilingual processing and image understanding. Maverick balances power and efficiency through its MoE architecture, delivering best-in-class performance while maintaining a lower computational cost compared to similarly capable models.

According to Artificial Analysis independent evals, “Maverick (402B total, 17B active) beats Claude 3.7 Sonnet, trails DeepSeek V3 but more efficient; Scout (109B total, 17B active) in-line with GPT-4o mini, ahead of Mistral Small 3.1”

artificial analysis benchmark

Start Building Now!

Llama 4 on SambaNova Cloud is a game-changer in the world of AI, offering unparalleled performance, multimodal capabilities, and a range of use cases that can accelerate innovation in various industries. We invite developers and enterprises to try Llama 4 Scout and Maverick on SambaNova Cloud and experience the power of AI acceleration for themselves. Currently available in 8K context length, but we will be rapidly expanding this to 128K in the coming weeks. 

Get started in minutes with SambaNova Cloud.

Topics: business, Blog