Ricoh Runs Japanese AI Models 10x Faster on SambaCloud

Ricoh deployed SambaNova's SambaCloud to power its custom Japanese AI models, achieving 10x faster inference speeds and over 700 tokens per second on 70B-class models, reducing complex agentic workflow times from one minute to ten seconds.

Ricoh, a global Japanese digital services and electronics company, is renowned for its imaging products and services for businesses and consumers. As one of the world’s largest manufacturers of cameras, printers, photocopiers, and projectors, Ricoh has also established itself as a leader in document management systems offered as SaaS solutions. Building on this foundation, Ricoh has been actively expanding into cutting-edge digital technologies, including artificial intelligence (AI), to drive innovation and growth.

What they do:

Beyond its traditional electronics and imaging business, Ricoh has emerged as a key player in Japan's AI ecosystem. The company has developed AI models and services tailored specifically for Japanese businesses, addressing unique linguistic and cultural nuances.

Using open-weight models like Llama, Qwen, and Gemma, Ricoh has developed specialized models for handling Japanese business documents and industry-specific models. One standout offering is RICOH Digital Buddy, a generative AI agent that leverages internal documents and organizational knowledge to answer questions and execute tasks, streamlining workflows for businesses.

Internally, Ricoh has embraced AI to enhance operational efficiency. By adopting no-code platforms like Dify, the company empowers non-engineers to build and deploy their own AI workflows, democratizing AI usage across the organization.

Challenge:

Ricoh's core challenge was that existing GPU infrastructure could not run 70B-class models efficiently enough for production agentic workloads, achieving only tens of tokens per second.

Ricoh’s success in developing a diverse range of custom AI models for Japanese businesses has brought new challenges. Bringing these models into production at scale requires an environment that hosts them efficiently from a model-provider standpoint and ensures fast inference from an end-user standpoint. This requirement is becoming increasingly pressing with the rise of modern agentic workflows, which involve calls across multiple models, not just a single call to one model.

Existing infrastructure solutions fell short of these requirements. Lightweight, low-cost GPU environments struggled to run 70B-class models efficiently, achieving only tens of tokens per second. On the other hand, high-end GPU environments raise concerns about scalability and cost-effectiveness, making them less viable for Ricoh’s long-term goals.

Solution:

Ricoh turned to SambaCloud, SambaNova's fully-managed cloud inference service, to overcome these challenges. Powered by the SambaNova Reconfigurable Dataflow Unit (RDU), SambaCloud is designed to efficiently run a wide variety of open-weight models, delivering both speed and scalability.

By adopting SambaCloud, Ricoh achieved:

10× faster speeds compared to their existing infrastructure, with over 700 tokens per second on their 70B-class models.
High accuracy and performance, ensuring their fine-tuned models optimized for Japanese business contexts maintained their effectiveness.
Scalability and reliability, enabling Ricoh to support modern agentic workflows and meet the growing demands of their AI-driven solutions.

Why Ricoh Chose SambaCloud:

Ricoh’s decision to partner with SambaNova was driven by several key factors:

Unmatched Performance: SambaCloud’s ability to deliver 10× faster inference speeds directly addressed Ricoh’s need for high-performance infrastructure.
Cost-Effectiveness: SambaCloud provided a scalable solution that balanced performance with cost, making it a sustainable choice for Ricoh’s expanding AI initiatives.
Specialized Support for Open-Weight Models: SambaCloud’s compatibility with a wide range of open-weight models, including those fine-tuned by Ricoh, ensured seamless integration and deployment.
Focus on Japanese Business Needs: SambaCloud’s infrastructure preserved the accuracy and cultural relevance of Ricoh’s models, which are tailored for Japanese businesses.

Challenge:

Hume specializes in building the most realistic voice AI models for developers and enterprises. These models are based on LLMs, so they understand both language and a person’s voice at the same time. Their mission is to bring empathy to AI and to align AI with human well-being. To that end, the speech-LLMs they develop are capable of understanding both the tone and meaning of the spoken word. Applications for this include audio chatbots, customer service, and more.

They recently launched the highest quality speech-LLMs for text-to-speech (Octave) and speech-to-speech (EVI 3). Much of the quality comes from the models’ ability to understand language and to adjust its tone of voice naturally in response to the input. This enables a more natural conversation, which can improve user perception.

Most voice systems today have separate text-to-speech, speech-to-text, transcription, and other models connected together because they were better at each individual task, but with the latest advances in speech-language models this is no longer the case. Moreover, each of these steps adds latency to the process. Conversational human latency is 200 ms and anything longer than 1 second will sound less human. Hume AI and SambaNova have worked together to develop a solution that delivers the highest performance at the lowest latency possible.

Solution:

Hume and SambaNova have worked together to deploy Hume’s speech-language models on SambaCloud, enabling the best speech-to-speech and text-to-speech models in the world to run at conversational latency without any reduction in quality. Together, Hume AI and SambaNova provide enterprises with access to text-to-speech and speech-to-speech APIs with response times on the order of 100 ms to 300 ms, marrying hyperrealistic quality with human-like conversation latency.

For many enterprises, it is critical to deploy in private environments. Hume and SambaNova are providing Hume’s text-to-speech and speech-to-speech models through private deployments to meet these needs.

FAQs

SambaCloud is SambaNova's fully managed cloud inference service, powered by the Reconfigurable Dataflow Unit (RDU). Ricoh uses it to host and serve custom AI models fine-tuned for Japanese business documents and industry-specific workflows.

SambaCloud delivers 10x faster inference than Ricoh's existing GPU infrastructure, achieving over 700 tokens per second on 70B-class models compared to tens of tokens per second previously.

Yes. SambaCloud is compatible with a wide range of open-weight models including Llama, Qwen, and Gemma, and supports fine-tuned variants, preserving accuracy and cultural relevance for Ricoh's Japanese business use cases.

SambaCloud's speed enables complex agentic workflows involving multiple model calls to complete in around ten seconds, compared to approximately one minute on previous infrastructure.

News

SambaNova Expands Deployment with SoftBank Corp. to Offer Fast AI Inference Across APAC

March 5, 2025

Blog

Qwen3 Is Here - Now Live on SambaNova Cloud

May 2, 2025

Blog

SambaNova Partners with Meta to Deliver Lightning Fast Inference on Llama 4

April 7, 2025

Ricoh runs Japanese custom AI models 10× faster on SambaCloud

What they do:

Challenge:

Solution:

Why Ricoh Chose SambaCloud:

Results:

Most models get the right outcome the first time

faster

tokens per second

Challenge:

Solution:

Response time

Highest quality speech LLMs

“With SambaNova running 5 to 10 times faster, even complex agentic workflows that would otherwise take a minute finish in 10 seconds. We think that brings significant business value.”

— Gakushi Miyara, AI Service Business Division

Ricoh Company, Ltd.

FAQs

Related resources

SambaNova Expands Deployment with SoftBank Corp. to Offer Fast AI Inference Across APAC

Qwen3 Is Here - Now Live on SambaNova Cloud

SambaNova Partners with Meta to Deliver Lightning Fast Inference on Llama 4

Time to start building