Dataflow Architecture™

The natural movement of AI

Architecture purpose-built for AI

All AI models are represented as a graph of operations, where data flows from one operation to the next. To achieve more tokens per second per user, more tokens per watt, and support for more users, the hardware must align as closely as possible with this AI dataflow graph while remaining configurable enough to support many different graphs.
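
As a concrete illustration, here is a minimal sketch of a model expressed as a dataflow graph, with each node an operation and edges carrying outputs downstream. This is plain Python for intuition, not SambaNova's actual compiler representation; all op names are illustrative.

```python
# A model as a dataflow graph: each node is an operation, and edges
# describe which outputs feed which inputs. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    inputs: list = field(default_factory=list)  # upstream Ops

    def __rshift__(self, other):  # a >> b wires a's output into b
        other.inputs.append(self)
        return other

# A tiny transformer-style fragment: data flows straight through.
x = Op("embed")
graph = x >> Op("matmul_qkv") >> Op("attention") >> Op("matmul_proj") >> Op("layernorm")

def topological(op):
    """Yield ops in execution order by walking the graph upstream-first."""
    seen = set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.inputs:
            yield from visit(parent)
        yield node
    yield from visit(op)

print([op.name for op in topological(graph)])
# ['embed', 'matmul_qkv', 'attention', 'matmul_proj', 'layernorm']
```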

SambaNova invented the Reconfigurable Dataflow Unit (RDU) to make AI faster, more efficient, and scalable to users worldwide.

Learn more
Multiple batches, one pipeline

AI inference demands efficient data movement

Compute is the easy part for AI. The real challenge is efficient data movement.

Moving data off-chip is one of the most expensive operations for AI accelerators. Dataflow creates an assembly line of operations that eliminates the memory bottlenecks faced by other solutions.
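
A back-of-envelope sketch makes the contrast visible. Assume (illustratively; these are not measured SambaNova figures) a four-op sequence with fp16 activations: a kernel-by-kernel schedule round-trips every intermediate through off-chip memory, while a fused dataflow pipeline moves only the final output off-chip.

```python
# Compare off-chip traffic for a kernel-by-kernel schedule vs a fused
# dataflow pipeline. All sizes are illustrative assumptions.

ops = ["matmul", "bias_add", "gelu", "matmul"]
activation_bytes = 8 * 4096 * 2  # one batch of fp16 activations (assumed shape)

# Kernel-by-kernel: each of the 3 intermediates is written off-chip,
# then read back by the next kernel (2 transfers per intermediate).
kernel_traffic = sum(2 * activation_bytes for _ in ops[:-1])

# Dataflow pipeline: intermediates stay on-chip; only the output leaves.
dataflow_traffic = activation_bytes

print(f"kernel-by-kernel off-chip bytes: {kernel_traffic:,}")
print(f"dataflow pipeline off-chip bytes: {dataflow_traffic:,}")
```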

Learn more →

Programmable grids create efficiency

Instead of executing kernel-by-kernel, the RDU enables dataflow through a grid of Programmable Compute Units (PCUs) and SRAM-based Programmable Memory Units (PMUs).

While computation executes for one operator, data is fetched in parallel for the next, creating a streaming pipeline. This parallelization of memory and compute on-chip keeps all intermediate activations local, dramatically reducing unnecessary data movement.
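
The overlap can be mimicked in ordinary Python: a one-worker prefetch thread stands in for a PMU staging tiles while the main thread stands in for a PCU computing on them. Tile counts and sleep durations are illustrative assumptions, not RDU timings.

```python
# Double-buffered streaming: fetch tile i+1 while computing on tile i,
# so memory and compute time overlap instead of adding up.
import concurrent.futures
import time

def fetch(tile):
    time.sleep(0.01)          # stand-in for a PMU staging a tile in SRAM
    return f"data[{tile}]"

def compute(data):
    time.sleep(0.01)          # stand-in for a PCU running one operator
    return f"out({data})"

n_tiles = 8
results = []
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as prefetcher:
    pending = prefetcher.submit(fetch, 0)              # prime the pipeline
    for i in range(n_tiles):
        data = pending.result()                        # wait for tile i
        if i + 1 < n_tiles:
            pending = prefetcher.submit(fetch, i + 1)  # fetch tile i+1...
        results.append(compute(data))                  # ...while computing tile i
elapsed = time.perf_counter() - start
print(f"{len(results)} tiles in {elapsed:.2f}s "
      "(vs ~0.16s if fetch and compute ran serially)")
```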

Learn more →
PCUs and PMUs

Designed for cloud scale

The grid architecture allows AI operations to scale seamlessly across multiple chips to handle entire model layers.

Chips pass data to each other through the Address Generation and Coalescing Unit (AGCU), minimizing networking complexity. This allows more chips to operate together efficiently, scaling up to 256 RDUs working together for inference on the SN50.
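
One way to picture this is tensor parallelism: each chip holds a column shard of a layer's weights, computes its slice, and the slices are gathered into the full result. In the sketch below, the chip-to-chip handoff that the AGCU would perform is modeled as a plain Python list; shard counts and shapes are assumptions, not SN50 specifics.

```python
# Shard one layer's weight matrix across "chips", compute per-chip
# slices, and gather them. Shapes and chip count are illustrative.
import numpy as np

n_chips = 4
d_in, d_out = 512, 1024
rng = np.random.default_rng(0)
W = rng.standard_normal((d_in, d_out)).astype(np.float32)
x = rng.standard_normal((1, d_in)).astype(np.float32)

# Each chip owns d_out // n_chips output columns.
shards = np.split(W, n_chips, axis=1)
partials = [x @ shard for shard in shards]   # computed "on each chip"
y = np.concatenate(partials, axis=1)         # gathered across chips

assert np.allclose(y, x @ W, atol=1e-4)
print("sharded result matches single-chip result:", y.shape)
```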

Learn more →

Built for large-scale intelligent models

To support the largest models, SambaNova’s Dataflow is backed by two additional memory tiers: high-bandwidth memory (HBM) and DDR DRAM.

Full models and the KV cache load into HBM, then stream onto the chip as needed. This architecture enables SambaRack to scale to the biggest models, up to 10 trillion parameters on the SN50.
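
A toy model of the tiers shows the flow: bulk weights rest in DDR, the active model and KV cache sit in HBM, and data streams through on-chip SRAM tile by tile. Capacities, names, and the promote-on-first-use policy here are illustrative assumptions.

```python
# Three memory tiers: DDR for capacity, HBM for the active model and
# KV cache, on-chip SRAM for the tile being computed. Illustrative only.

ddr = {f"model_part_{i}": f"weights_{i}" for i in range(16)}  # capacity tier
hbm = {}                                                      # active model + KV cache
SRAM_TILES = 4

def load_to_hbm(name):
    """Promote weights from DDR into HBM the first time they're needed."""
    if name not in hbm:
        hbm[name] = ddr[name]
    return hbm[name]

def stream_to_chip(name):
    """Stream HBM-resident weights through SRAM, one tile at a time."""
    weights = load_to_hbm(name)
    for tile in range(SRAM_TILES):
        yield f"{weights}[tile {tile}]"   # consumed by the PCU/PMU grid

for tile in stream_to_chip("model_part_3"):
    pass  # compute runs here, overlapped with the next tile's fetch
print("HBM now holds:", sorted(hbm))
```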

Learn more →
Three-tier memory

The future of AI inference is Dataflow

Driven by the high performance of our fourth-generation SN40L and fifth-generation SN50 RDUs.

FAQs

What is Dataflow Architecture?

SambaNova’s Dataflow Architecture is the hardware design in its Reconfigurable Dataflow Unit (RDU) that lets data flow from one AI operation to the next like an assembly line. It eliminates the frequent, energy-intensive memory round trips that kernel-by-kernel execution requires, enabling faster AI inference, higher model utilization, and significantly better energy efficiency.

How does Dataflow Architecture differ from traditional architectures?

Traditional architectures focus on providing as much raw compute as possible while leaving memory movement largely unoptimized; SambaNova's Dataflow Architecture instead minimizes data movement. Because moving data is one of the most expensive operations in hardware, optimizing it allows large inference deployments to scale cost-effectively.

What are the benefits of using Dataflow Architecture?

SambaNova's approach solves the AI data-movement bottleneck directly in hardware, making LLM inference and large-scale AI faster and more energy-efficient.

How does Dataflow Architecture eliminate the memory bottlenecks that limit other AI accelerators?

Dataflow is a unique technology that creates an assembly line of operations, which eliminates the memory bottlenecks faced by other solutions. Memory and compute run in parallel on-chip, keeping activations local and reducing data movement.

How do Programmable Compute Units (PCUs) and Memory Units (PMUs) work together in the grid?

Instead of operating kernel-by-kernel, the SambaNova Dataflow Architecture runs on a grid of PCUs and SRAM-based PMUs. While compute happens for one operator, data is fetched in parallel for the next, creating a streaming pipeline. This parallelization of memory and compute on-chip keeps all intermediate activations local, dramatically reducing unnecessary data movement.