Introducing Fugaku-LLM in Composition of Experts

Posted by Keith Parker on May 13, 2024

The Composition of Experts (CoE) architecture that the Samba-1 model is based upon has many features that make it ideal for the enterprise. It delivers security and data protection features not available in any other large model, provides customers with model ownership and visibility into model weights and training data, provides role-based access control, and much more. It does all that while reducing inference compute requirements to a fraction of what other large models require. Still, one of the most compelling things about this model architecture for enterprise applications is the flexibility it provides to add new models.

A perfect example of this is the Fugaku-LLM. This is a new Japanese LLM that was trained from scratch on Japan’s fastest supercomputer, the Fugaku. The Fugaku-LLM has been published on Hugging Face and is being introduced into the Samba-1 CoE architecture.

The Fugaku supercomputer that trained this new LLM is part of the RIKEN Center for Computational Science (R-CCS). As the fastest supercomputer in Japan, Fugaku has already incorporated SambaNova systems to accelerate high performance computing (HPC) simulations and artificial intelligence (AI). These systems were incorporated into Fugaku to perform research on digital twins for the Society 5.0 era. By incorporating the Fugaku-LLM into the SambaNova CoE, the impressive capabilities of this LLM are being made available to a broader audience.

The ability to incorporate the Fugaku-LLM into the SambaNova CoE is one of the key benefits of the modular nature of this model architecture. As a CoE, the model is composed of a number of different smaller models, all operating as if they were one single very large model. Some of the models have been pre-trained for particular tasks, such as text-to-SQL, code generation, or text summarization. There are also a number of foundation models such as Llama 2, Llama 3, Mistral, DeepSeek, and many more. Every model in the SambaNova CoE is open source, and models can be easily fine-tuned for greater accuracy or swapped out as new models become available. A model that has been specifically trained to operate as a router sends each user prompt to the specific model best equipped to respond to that particular query. This ensures that every user gets the best possible response.
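To make the routing idea concrete, here is a minimal Python sketch of how a router might dispatch prompts to experts and how a new expert could be added without touching the others. It is an illustration only, not the Samba-1 implementation: the real router is a trained model, whereas this sketch substitutes simple keyword and character-range scoring. The names `Expert`, `Router`, `keyword_score`, `japanese_score`, and the expert labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Expert:
    name: str                    # hypothetical label, not a real model endpoint
    score: Callable[[str], int]  # how well this expert fits a given prompt


def keyword_score(*keywords: str) -> Callable[[str], int]:
    """Return a scorer that counts how many keywords appear in the prompt."""
    def score(prompt: str) -> int:
        text = prompt.lower()
        return sum(1 for kw in keywords if kw in text)
    return score


def japanese_score(prompt: str) -> int:
    """Count hiragana, katakana, and common CJK ideograph characters."""
    return sum(
        1 for ch in prompt
        if "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"
    )


class Router:
    """Stand-in for a trained router: picks the highest-scoring expert."""

    def __init__(self, default: Expert) -> None:
        self.default = default
        self.experts: List[Expert] = []

    def register(self, expert: Expert) -> None:
        # Adding an expert does not require changing any existing expert.
        self.experts.append(expert)

    def route(self, prompt: str) -> Expert:
        best, best_score = self.default, 0
        for expert in self.experts:
            current = expert.score(prompt)
            if current > best_score:
                best, best_score = expert, current
        return best


router = Router(default=Expert("general-chat", lambda prompt: 0))
router.register(Expert("text-to-sql", keyword_score("sql", "query", "table")))
router.register(Expert("code-generation", keyword_score("function", "python", "bug")))

japanese_prompt = "富岳について教えてください"
before = router.route(japanese_prompt).name  # no Japanese expert registered yet

# Plug in a new expert; the existing experts are left untouched.
router.register(Expert("fugaku-llm", japanese_score))
after = router.route(japanese_prompt).name
```

In this sketch, the Japanese prompt falls through to the default expert until the hypothetical `fugaku-llm` expert is registered, after which the router selects it. The other experts keep handling their own prompts unchanged, which is the modularity the CoE design is meant to provide.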

As part of a CoE model, Fugaku-LLM runs optimally on the SambaNova platform, which is powered by the SN40L, a chip purpose-built for generative AI workloads. The SN40L has a three-tiered memory architecture that provides terabytes of addressable memory and takes advantage of a Dataflow architecture. The result is a platform that can run the largest models in the world with a footprint that is only a fraction of what other systems require.

Learn more about a Composition of Experts model.
