Ricoh runs Japanese custom AI models 10× faster on SambaCloud

Enabling agentic AI for Japanese businesses, from tens to over 700 tokens per second

Ricoh, a global Japanese digital services and electronics company, is renowned for its imaging products and services for businesses and consumers. One of the world’s largest manufacturers of cameras, printers, photocopiers, and projectors, Ricoh has also established itself as a leader in document management systems offered as SaaS solutions. Building on this foundation, Ricoh has been actively expanding into cutting-edge digital technologies, including artificial intelligence (AI), to drive innovation and growth.

What they do:

Beyond its traditional electronics and imaging business, Ricoh has emerged as a key player in Japan's AI ecosystem. The company has developed AI models and services tailored specifically for Japanese businesses, addressing unique linguistic and cultural nuances.

Using open-weight models like Llama, Qwen, and Gemma, Ricoh has developed models specialized for handling Japanese business documents, as well as industry-specific models. One standout offering is RICOH Digital Buddy, a generative AI agent that leverages internal documents and organizational knowledge to answer questions and execute tasks, streamlining workflows for businesses.

Internally, Ricoh has embraced AI to enhance operational efficiency. By adopting no-code platforms like Dify, the company empowers non-engineers to build and deploy their own AI workflows, democratizing AI usage across the organization.

Challenge:

Ricoh’s success in developing a diverse range of custom AI models for Japanese businesses has brought new challenges. Bringing these models into production at scale requires an environment that hosts them efficiently from a model-provider standpoint and ensures fast inference from an end-user standpoint. This requirement is becoming increasingly pressing with the rise of modern agentic workflows, which involve calls across multiple models, not just a single call to one model.

Existing infrastructure solutions fell short of these requirements. Lightweight, low-cost GPU environments struggled to run 70B-class models efficiently, achieving only tens of tokens per second. High-end GPU environments, on the other hand, raised concerns about scalability and cost-effectiveness, making them less viable for Ricoh’s long-term goals.

Solution:

Ricoh turned to SambaCloud, SambaNova's fully-managed cloud inference service, to overcome these challenges. Powered by the SambaNova Reconfigurable Dataflow Unit (RDU), SambaCloud is designed to efficiently run a wide variety of open-weight models, delivering both speed and scalability.

By adopting SambaCloud, Ricoh achieved:

  • 10× faster inference than its existing infrastructure, with over 700 tokens per second on its 70B-class models.
  • High accuracy and performance, ensuring their fine-tuned models optimized for Japanese business contexts maintained their effectiveness.
  • Scalability and reliability, enabling Ricoh to support modern agentic workflows and meet the growing demands of their AI-driven solutions.
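
To make the throughput figures concrete, here is a minimal back-of-the-envelope sketch of how decode speed translates into end-to-end time for an agentic workflow that makes several sequential model calls. The number of calls, the tokens per call, and the 70 tokens-per-second baseline are illustrative assumptions (the case study only states "tens of tokens per second" before and "over 700" after), and the calculation ignores prompt processing, queueing, and network overhead.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate `output_tokens` at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def workflow_seconds(tokens_per_call: list[int], tokens_per_second: float) -> float:
    """Total generation time for sequential model calls in one workflow."""
    return sum(generation_seconds(n, tokens_per_second) for n in tokens_per_call)


# Hypothetical workflow: six sequential calls, 700 output tokens each.
calls = [700] * 6
slow = workflow_seconds(calls, 70.0)   # assumed baseline, within "tens of tokens per second"
fast = workflow_seconds(calls, 700.0)  # lower bound of the reported "over 700 tokens per second"
print(f"baseline: {slow:.0f} s, faster: {fast:.0f} s, speedup: {slow / fast:.0f}x")
```

Under these assumptions the workflow drops from about a minute to a few seconds. Because the calls run one after another, the per-call savings add up, which is why throughput matters more for multi-call agentic workflows than for a single request.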

Why Ricoh Chose SambaCloud:

Ricoh’s decision to partner with SambaNova was driven by several key factors:

  1. Unmatched Performance: SambaCloud’s ability to deliver 10× faster inference speeds directly addressed Ricoh’s need for high-performance infrastructure.
  2. Cost-Effectiveness: SambaCloud provided a scalable solution that balanced performance with cost, making it a sustainable choice for Ricoh’s expanding AI initiatives.
  3. Specialized Support for Open-Weight Models: SambaCloud’s compatibility with a wide range of open-weight models, including those fine-tuned by Ricoh, ensured seamless integration and deployment.
  4. Focus on Japanese Business Needs: SambaCloud’s infrastructure preserved the accuracy and cultural relevance of Ricoh’s models, which are tailored for Japanese businesses.

Results:

With SambaCloud, Ricoh has successfully scaled its AI operations, enabling faster and more efficient deployment of its custom models. This partnership has empowered Ricoh to continue innovating in the AI space, delivering cutting-edge solutions to Japanese businesses while maintaining its leadership in the industry.

15%

Most models get the right outcome the first time

10x

faster

700+

tokens per second

Challenge:

Hume specializes in building the most realistic voice AI models for developers and enterprises. These models are based on LLMs, so they understand both language and a person’s voice at the same time. Their mission is to bring empathy to AI and to align AI with human well-being. To that end, the speech-LLMs they develop are capable of understanding both the tone and meaning of the spoken word. Applications for this include audio chatbots, customer service, and more. 

They recently launched the highest-quality speech-LLMs for text-to-speech (Octave) and speech-to-speech (EVI 3). Much of the quality comes from the models’ ability to understand language and to adjust their tone of voice naturally in response to the input. This enables a more natural conversation, which can improve user perception.

Most voice systems today chain together separate text-to-speech, speech-to-text, transcription, and other models, because dedicated models used to be better at each individual task. With the latest advances in speech-language models, this is no longer the case. Moreover, each step in the chain adds latency. Human conversational latency is 200 ms, and anything longer than 1 second sounds less human. Hume AI and SambaNova have worked together to develop a solution that delivers the highest performance at the lowest latency possible.

Solution:

Hume and SambaNova have worked together to deploy Hume’s speech-language models on SambaCloud, enabling the best speech-to-speech and text-to-speech models in the world to run at conversational latency without any reduction in quality. Together, Hume AI and SambaNova provide enterprises with access to text-to-speech and speech-to-speech APIs with response times on the order of 100 ms to 300 ms, marrying hyperrealistic quality with human-like conversation latency.

For many enterprises, it is critical to deploy in private environments. Hume and SambaNova are providing Hume’s text-to-speech and speech-to-speech models through private deployments to meet these needs.

100-300 ms

Response time

Highest quality speech LLMs

“With SambaNova running 5 to 10 times faster, even complex agentic workflows that would otherwise take a minute finish in 10 seconds. We think that brings significant business value.”

— Gakushi Miyara, AI Service Business Division

Ricoh Company, Ltd.

Mr. Miyara, AI Engineer at Ricoh, shares his experience deploying custom AI models on SambaCloud.

Find out the business value of SambaCloud to Ricoh.

Related resources

SambaNova Expands Deployment with SoftBank Corp. to Offer Fast AI Inference Across APAC

March 5, 2025
Qwen3 Is Here - Now Live on SambaNova Cloud

May 2, 2025
SambaNova Partners with Meta to Deliver Lightning Fast Inference on Llama 4

April 7, 2025