In working closely with our customers to optimize inference performance, we’ve seen firsthand how small benchmarking errors can snowball into major performance discrepancies when applications run at scale in production. We’ve helped customers overcome performance test distortions from a host of issues, such as inefficient prompt structures, unrealistically small request token counts, and misconfigured thread and worker settings. These aren’t isolated incidents; they represent some of the most common challenges we've helped customers navigate.
This is exactly why we developed the AI Benchmarking Starter Kit. It's not just about running tests; it’s about helping developers get up and running as quickly as possible while avoiding these common pitfalls, ensuring their benchmarks are meaningful, precise, and scalable. The kit incorporates the key lessons we've learned from real-world customer experiences, offering tools that ensure your AI models are not just performing, but performing accurately.
In this post, we’ll dive into the benchmarking mistakes our customers often encounter and how the AI Benchmarking Starter Kit can help you avoid them.
Why SambaNova Systems
SambaNova Systems is renowned for delivering industry-leading infrastructure that runs open-source AI models at the highest speed and accuracy available. With a full-stack AI platform built for optimized inference and model training, available through SambaStudio and SambaNova Cloud, SambaNova allows companies to leverage the latest advancements in large language models (LLMs). The combination of high-performance endpoints and advanced dynamic batching makes it a key player in delivering low-latency, high-throughput inference across various use cases.
SambaNova has achieved record speeds of 132 output tokens per second on its Llama 3.1 405B Cloud API endpoint.
Benchmarking Kit: A tool to measure speed performance
The AI Benchmarking Starter Kit offers a suite of functionalities for evaluating the performance of different LLMs available on SambaStudio or SambaNova Cloud. Whether you are looking to test models like Meta’s Llama 2 and Llama 3, Mistral, or other in-house optimized models, the Benchmarking Kit helps you measure speed and scalability with ease.
Snapshot of the Benchmarking Kit’s core functionalities:
- Synthetic Performance Evaluation
Users can configure various parameters, such as the number of input/output tokens, concurrent requests, and timeouts, to simulate synthetic workloads. This provides a quick way to benchmark how well the LLMs handle different input prompt lengths under different conditions.
- Custom Performance Evaluation
For businesses with specific datasets, this feature allows for performance evaluation using custom prompts. By supplying JSONL-formatted datasets (see the sketch after this list), developers can gain insights into how models perform on real-world data.
- Interactive Chat Interface
For those looking to experience model interaction firsthand, the chat functionality offers real-time testing with metrics such as latency, time to first token (TTFT), and throughput, which are crucial for assessing conversational AI models.
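To make the custom evaluation concrete, here is a minimal sketch of how a JSONL prompt dataset might be prepared. The field name `prompt`, the file name, and the example prompts are illustrative assumptions; check the kit’s README for the exact schema it expects.

```python
import json

# Illustrative prompts; replace with your own domain-specific data.
prompts = [
    "Summarize the benefits of dynamic batching for LLM inference.",
    "Explain time to first token (TTFT) to a product manager.",
    "List three common pitfalls when benchmarking LLM endpoints.",
]

# JSONL format: one JSON object per line.
with open("custom_prompts.jsonl", "w", encoding="utf-8") as f:
    for text in prompts:
        f.write(json.dumps({"prompt": text}) + "\n")
```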
Why Developers Love the Benchmarking Kit
- Flexibility Across Platforms
The AI Starter Kit is built for both SambaStudio and SambaNova Cloud environments, giving users the flexibility to evaluate LLMs on either platform. This is especially beneficial for businesses that are either running models on SambaNova’s robust infrastructure or testing API-based solutions for proofs of concept (POCs).
- GUI and CLI Options
For ease of use, the kit offers both a Streamlit-based graphical user interface (GUI) for visual performance insights and a command-line interface (CLI) for users who prefer more control and customization over the testing parameters. The GUI presents results as easy-to-understand plots, showing distributions for latency, throughput, and batch performance.
- Comprehensive Metrics
The results from the benchmarking process are detailed and visualized through five main performance plots, covering both client-side and server-side metrics (a client-side timing sketch follows this list). These include:
- TTFT
- End-to-End Latency
- Token Throughput
- Batch Token Generation
- LLM Requests Across Time (Gantt Plot)
- Supports Cutting-Edge Models
Whether you are working with industry favorites like Llama 3, Llama 2, DeepSeek, or Solar, the kit offers specific prompt configurations to ensure you get accurate performance metrics for each LLM.
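To illustrate how these client-side metrics can be derived, below is a minimal timing sketch (not the kit’s own implementation) that measures TTFT, end-to-end latency, and a rough output throughput for a single streaming request. It assumes an OpenAI-compatible chat completions endpoint; the base URL, model name, and SAMBANOVA_API_KEY environment variable are placeholders to replace with your SambaStudio or SambaNova Cloud values.

```python
import os
import time

from openai import OpenAI  # assumes an OpenAI-compatible endpoint

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",   # placeholder base URL
    api_key=os.environ["SAMBANOVA_API_KEY"],  # placeholder env var
)

start = time.perf_counter()
first_token_time = None
pieces = []

# Stream one chat completion and note when the first token arrives.
stream = client.chat.completions.create(
    model="Meta-Llama-3.1-405B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Explain dynamic batching in two sentences."}],
    max_tokens=200,
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_time is None:
            first_token_time = time.perf_counter()
        pieces.append(delta)

end = time.perf_counter()
ttft = first_token_time - start  # time to first token (seconds)
e2e_latency = end - start        # end-to-end latency (seconds)
# Rough throughput: each streamed chunk usually carries one token,
# so the chunk count approximates the number of output tokens.
throughput = len(pieces) / e2e_latency

print(f"TTFT: {ttft:.3f}s  E2E latency: {e2e_latency:.3f}s  ~{throughput:.1f} output tokens/s")
```

The kit runs many such requests concurrently and aggregates the results into the plots listed above, so treat this as a quick sanity check rather than a full benchmark.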
Conclusion
SambaNova Systems, through its AI Starter Kits, empowers developers and enterprises with the tools they need to benchmark LLM speed performance at scale. With advanced features like dynamic batching, low-latency endpoints, and flexible deployment options, SambaNova remains at the forefront of AI model efficiency.
Whether you're a seasoned developer or a business looking to leverage the latest in AI infrastructure, SambaNova’s cutting-edge solutions can help you stay ahead of the curve.
Explore the Benchmarking Kit demo now and see for yourself how SambaNova is transforming the AI landscape! If you’re interested in the code behind it, go to the kit’s repository on GitHub.
Appendix
Full Instructions: https://github.com/sambanova/ai-starter-kit/tree/main/benchmarking#readme