Dataflow Architecture™

The natural movement of AI

Architecture purpose-built for AI

All AI models are represented as a graph of operations, where data flows from one operation to the next. To achieve more tokens per second per user, more tokens per watt, and support for more users, the hardware must align as closely as possible with this AI dataflow graph while remaining configurable enough to support many different graphs.
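
As a concrete illustration, here is a minimal sketch of a model expressed as a dataflow graph, with each node an operation and edges carrying outputs downstream. This is plain Python for intuition, not SambaNova's actual compiler representation; all op names are illustrative.

```python
# A model as a dataflow graph: each node is an operation, and edges
# describe which outputs feed which inputs. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    inputs: list = field(default_factory=list)  # upstream Ops

    def __rshift__(self, other):  # a >> b wires a's output into b
        other.inputs.append(self)
        return other

# A tiny transformer-style fragment: data flows straight through.
x = Op("embed")
graph = x >> Op("matmul_qkv") >> Op("attention") >> Op("matmul_proj") >> Op("layernorm")

def topological(op):
    """Yield ops in execution order by walking the graph upstream-first."""
    seen = set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.inputs:
            yield from visit(parent)
        yield node
    yield from visit(op)

print([op.name for op in topological(graph)])
# ['embed', 'matmul_qkv', 'attention', 'matmul_proj', 'layernorm']
```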

SambaNova invented the Reconfigurable Dataflow Unit (RDU) to make AI faster, more efficient, and scalable to users worldwide.

Learn more
Multiple batches, one pipeline

AI inference demands efficient data movement

Compute is the easy part for AI. The real challenge is efficient data movement.

Moving data off-chip is one of the most expensive operations for AI accelerators. Dataflow creates an assembly line of operations that eliminates the memory bottlenecks faced by other solutions.
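
A back-of-envelope sketch makes the contrast visible. Assume (illustratively; these are not measured SambaNova figures) a four-op sequence with fp16 activations: a kernel-by-kernel schedule round-trips every intermediate through off-chip memory, while a fused dataflow pipeline moves only the final output off-chip.

```python
# Compare off-chip traffic for a kernel-by-kernel schedule vs a fused
# dataflow pipeline. All sizes are illustrative assumptions.

ops = ["matmul", "bias_add", "gelu", "matmul"]
activation_bytes = 8 * 4096 * 2  # one batch of fp16 activations (assumed shape)

# Kernel-by-kernel: each of the 3 intermediates is written off-chip,
# then read back by the next kernel (2 transfers per intermediate).
kernel_traffic = sum(2 * activation_bytes for _ in ops[:-1])

# Dataflow pipeline: intermediates stay on-chip; only the output leaves.
dataflow_traffic = activation_bytes

print(f"kernel-by-kernel off-chip bytes: {kernel_traffic:,}")
print(f"dataflow pipeline off-chip bytes: {dataflow_traffic:,}")
```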

Learn more →

Programmable grids create efficiency

Instead of executing kernel-by-kernel, the RDU enables dataflow through a grid of Programmable Compute Units (PCUs) and SRAM-based Programmable Memory Units (PMUs).

While computation executes for one operator, data is fetched in parallel for the next, creating a streaming pipeline. This parallelization of memory and compute on-chip keeps all intermediate activations local, dramatically reducing unnecessary data movement.
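
The overlap can be mimicked in ordinary Python: a one-worker prefetch thread stands in for a PMU staging tiles while the main thread stands in for a PCU computing on them. Tile counts and sleep durations are illustrative assumptions, not RDU timings.

```python
# Double-buffered streaming: fetch tile i+1 while computing on tile i,
# so memory and compute time overlap instead of adding up.
import concurrent.futures
import time

def fetch(tile):
    time.sleep(0.01)          # stand-in for a PMU staging a tile in SRAM
    return f"data[{tile}]"

def compute(data):
    time.sleep(0.01)          # stand-in for a PCU running one operator
    return f"out({data})"

n_tiles = 8
results = []
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as prefetcher:
    pending = prefetcher.submit(fetch, 0)              # prime the pipeline
    for i in range(n_tiles):
        data = pending.result()                        # wait for tile i
        if i + 1 < n_tiles:
            pending = prefetcher.submit(fetch, i + 1)  # fetch tile i+1...
        results.append(compute(data))                  # ...while computing tile i
elapsed = time.perf_counter() - start
print(f"{len(results)} tiles in {elapsed:.2f}s "
      "(vs ~0.16s if fetch and compute ran serially)")
```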

Learn more →
PCUs and PMUs

Designed for cloud scale

The grid architecture allows AI operations to scale seamlessly across multiple chips to handle entire model layers.

Chips pass data to each other through the Address Generation and Coalescing Unit (AGCU), minimizing networking complexity. This allows more chips to operate together efficiently, scaling up to 256 RDUs working together for inference on the SN50.
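
One way to picture this is tensor parallelism: each chip holds a column shard of a layer's weights, computes its slice, and the slices are gathered into the full result. In the sketch below, the chip-to-chip handoff that the AGCU would perform is modeled as a plain Python list; shard counts and shapes are assumptions, not SN50 specifics.

```python
# Shard one layer's weight matrix across "chips", compute per-chip
# slices, and gather them. Shapes and chip count are illustrative.
import numpy as np

n_chips = 4
d_in, d_out = 512, 1024
rng = np.random.default_rng(0)
W = rng.standard_normal((d_in, d_out)).astype(np.float32)
x = rng.standard_normal((1, d_in)).astype(np.float32)

# Each chip owns d_out // n_chips output columns.
shards = np.split(W, n_chips, axis=1)
partials = [x @ shard for shard in shards]   # computed "on each chip"
y = np.concatenate(partials, axis=1)         # gathered across chips

assert np.allclose(y, x @ W, atol=1e-4)
print("sharded result matches single-chip result:", y.shape)
```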

Learn more →

Built for large-scale intelligent models

To support the largest models, SambaNova’s Dataflow is backed by two additional memory tiers: high-bandwidth memory (HBM) and DDR DRAM.

Full models and the KV cache load into HBM, then stream onto the chip as needed. This architecture enables SambaRack to scale to the biggest models, up to 10 trillion parameters on the SN50.
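
A toy model of the tiers shows the flow: bulk weights rest in DDR, the active model and KV cache sit in HBM, and data streams through on-chip SRAM tile by tile. Capacities, names, and the promote-on-first-use policy here are illustrative assumptions.

```python
# Three memory tiers: DDR for capacity, HBM for the active model and
# KV cache, on-chip SRAM for the tile being computed. Illustrative only.

ddr = {f"model_part_{i}": f"weights_{i}" for i in range(16)}  # capacity tier
hbm = {}                                                      # active model + KV cache
SRAM_TILES = 4

def load_to_hbm(name):
    """Promote weights from DDR into HBM the first time they're needed."""
    if name not in hbm:
        hbm[name] = ddr[name]
    return hbm[name]

def stream_to_chip(name):
    """Stream HBM-resident weights through SRAM, one tile at a time."""
    weights = load_to_hbm(name)
    for tile in range(SRAM_TILES):
        yield f"{weights}[tile {tile}]"   # consumed by the PCU/PMU grid

for tile in stream_to_chip("model_part_3"):
    pass  # compute runs here, overlapped with the next tile's fetch
print("HBM now holds:", sorted(hbm))
```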

Learn more →
Three-tier memory

The future of AI inference is Dataflow

Driven by the high performance of our fourth-generation SN40L and fifth-generation SN50 RDUs.

FAQs

What is Dataflow Architecture?

SambaNova’s Dataflow Architecture is the hardware design in its Reconfigurable Dataflow Unit (RDU) that lets data flow from one AI operation to the next like an assembly line. It eliminates the frequent, energy-intensive memory round trips that kernel-by-kernel execution requires, enabling faster AI inference, higher model utilization, and significantly better energy efficiency.

How does Dataflow Architecture differ from traditional architectures?

Traditional architectures focus on providing as much raw compute as possible while leaving memory movement largely unoptimized; SambaNova's Dataflow Architecture instead minimizes data movement. Because moving data is one of the most expensive operations in hardware, optimizing it allows large inference deployments to scale cost-effectively.

What are the benefits of using Dataflow Architecture?

SambaNova's approach solves the AI data-movement bottleneck directly in hardware, making LLM inference and large-scale AI faster and more energy-efficient.

How does Dataflow Architecture eliminate the memory bottlenecks that limit other AI accelerators?

Dataflow is a unique technology that creates an assembly line of operations, which eliminates the memory bottlenecks faced by other solutions. Memory and compute run in parallel on-chip, keeping activations local and reducing data movement.

How do Programmable Compute Units (PCUs) and Memory Units (PMUs) work together in the grid?

Instead of operating kernel-by-kernel, the SambaNova Dataflow Architecture runs on a grid of PCUs and SRAM-based PMUs. While compute happens for one operator, data is fetched in parallel for the next, creating a streaming pipeline. This parallelization of memory and compute on-chip keeps all intermediate activations local, dramatically reducing unnecessary data movement.