For years, the AI industry has been chasing metrics focused on tokens (e.g., tokens/second, tokens/watt, token input/output ratios). But not all tokens are created equal. The real value lies not in measuring tokens generated, but in the quality of intelligence delivered per unit of energy consumed. Yet today’s benchmarks ignore this, prioritizing speed over substance while energy costs soar.
Today, this principle is being formalized. Hazy Research, a research group at Stanford, recently released a paper introducing two critical new metrics: Intelligence per Joule and Intelligence per Watt. This is not merely a performance benchmark; it's an efficiency standard that captures the entire system's value, from the model's inherent capabilities to its optimization on the underlying silicon.
And when it comes to silicon, the SambaNova SN40L RDU isn't just efficient; it's architected for intelligence density. The SN40L RDU's dataflow design computes intelligently, cutting energy waste by 4X. This isn't an incremental improvement; it's a paradigm shift.
According to McKinsey, data centers will require $5.2 trillion to $7.9 trillion of capital expenditure across computing hardware, power infrastructure, and data center construction to meet growing AI demand. SambaNova's 4X energy efficiency translates to trillions of dollars in CapEx savings by requiring far less hardware to meet that demand.
Why Joules Matter More than Watts
While the media is rightly focused on the power crisis (measured in watts) as AI demand grows, energy (measured in joules) is a far more useful unit for measuring value. Simply put, energy equals power multiplied by time. Why is energy a better measure of value than power?
To illustrate, suppose we run the same AI model on two different systems. System A draws 10 watts of power, while System B draws 1,000 watts. However, System A takes 100 seconds to complete the request, while System B takes only 0.1 seconds. Which system uses more energy to complete the same request?
While System A draws less power (10 W < 1,000 W), System B is 10X more energy efficient because it finishes far sooner: System A consumes 10 W × 100 s = 1,000 J, while System B consumes only 1,000 W × 0.1 s = 100 J. In other words, being fast saves energy. Not only are the SambaNova SN40L RDUs optimized for power, but because they deliver fast inference, they also save energy compared to other systems such as GPUs. As the paper found when comparing local compute to cloud compute, "Cloud accelerators demonstrate superior energy efficiency across all models."
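The arithmetic behind this comparison can be sketched in a few lines of Python. The power and runtime figures are the hypothetical Systems A and B from the example above, not measured numbers:

```python
def energy_joules(power_watts: float, duration_seconds: float) -> float:
    """Energy (joules) = power (watts) x time (seconds)."""
    return power_watts * duration_seconds

# System A: low power draw, but slow to finish the request
energy_a = energy_joules(power_watts=10, duration_seconds=100)    # 1000 J

# System B: high power draw, but fast to finish the request
energy_b = energy_joules(power_watts=1000, duration_seconds=0.1)  # 100 J

print(f"System A: {energy_a:.0f} J, System B: {energy_b:.0f} J")
# System A: 1000 J, System B: 100 J
```

The takeaway is that comparing wattage alone is misleading; only when power is multiplied by time does the faster system's 10X energy advantage become visible.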
The following table shows tokens per second per user on the same model as measured on the SambaCloud RDU vs. an NVIDIA GPU on Azure for some of the most popular open-source models. These speeds are independently benchmarked by Artificial Analysis.
Why Intelligence per Joule Is the Ultimate Measure of Value
Traditional benchmarks measure isolated components. Intelligence per Joule is different. It measures the output of the entire AI system, providing a true value-to-cost ratio by factoring in:
- The Model: The inherent "intelligence" or capability of the model itself and its efficiency (e.g., gpt-oss-120b vs. qwen-32b).
- The Software Optimization: The efficiency of the inference engine and software stack (e.g., vLLM, MLX, or a native stack).
- The Hardware Efficiency: The raw silicon physics — how many joules are consumed to perform the computation.
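A back-of-the-envelope sketch of such a ratio is shown below. This is an illustration only, not the formula from the Hazy Research paper; the function name, the choice of a benchmark quality score as the "intelligence" numerator, and all numbers are hypothetical:

```python
def intelligence_per_joule(quality_score: float,
                           avg_power_watts: float,
                           runtime_seconds: float) -> float:
    """Hypothetical ratio: benchmark quality per joule of energy consumed.

    Energy is derived from average power and runtime (energy = power x time),
    so the ratio rewards both efficient silicon and fast inference.
    """
    energy_j = avg_power_watts * runtime_seconds
    return quality_score / energy_j

# Two hypothetical systems earning the same benchmark score:
slow_system = intelligence_per_joule(quality_score=85.0,
                                     avg_power_watts=700,
                                     runtime_seconds=120)
fast_system = intelligence_per_joule(quality_score=85.0,
                                     avg_power_watts=900,
                                     runtime_seconds=25)
# fast_system > slow_system: same "intelligence," far fewer joules spent
```

Note how the ratio folds all three factors together: a better model raises the numerator, while better software and hardware shrink the energy in the denominator.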
This holistic view is essential. As AI leaders like Sam Altman have predicted, "The cost of intelligence should eventually converge to near the cost of electricity." Measuring Intelligence per Joule is therefore paramount for enterprises seeking to optimize their AI investments and operational costs. These energy savings have been critical for many of our SambaManaged customers that need an energy-efficient inference solution.
The SambaNova Difference: A System Engineered for Value
This efficiency stems from the architecture of the SambaNova SN40L Reconfigurable Dataflow Unit (RDU). Unlike GPU architectures, which can create bottlenecks, the RDU functions like an AI assembly line. Data flows seamlessly on-chip from one operation to the next, computing where it resides. This results in higher utilization and significantly lower sustained power, moving beyond theoretical peak performance to deliver consistent, real-world efficiency.
Conclusion: Value Is an Integrated Outcome
The era of evaluating AI infrastructure on disaggregated components is over. True value emerges from the synergy between model, software, and hardware. In a report released today, Hazy Research confirms that Intelligence per Joule is the first metric sophisticated enough to measure that synergy. The report of their findings is available for download.
For this research, SambaNova measured energy efficiency in a way that compares apples to apples with local computers such as Apple's M4, but as the report suggests, there are even more efficient ways to use SambaRacks and SN40L RDUs with model bundling. We are proud to be paving the way for the industry to build out more AI compute in power-constrained environments and data centers.