World Record Large Language Model Training Performance with SambaNova Systems Dataflow-as-a-Service™ GPT and Why It Doesn’t Matter

Written by SambaNova | February 1, 2022

Natural Language Processing (NLP) is one of the most pervasive areas of adoption and growth for Deep Learning. In a recent market survey of enterprise leaders, 75% of respondents said improving access to Deep Learning is very important for fostering competition and innovation in industries such as Banking and Financial Services, Insurance, Healthcare, Manufacturing, Public Sector, and more. Enterprises in these industries are looking to NLP to increase profitability and operational efficiency and become more competitive. The race to the top has begun, and innovation continues to push toward larger, more complex Transformer-based large language models such as GPT (Generative Pre-trained Transformer) and its variants.

While these powerful models are enabling enterprises to transform their businesses in new and impressive ways, they can be challenging to manage because of their size and the time it takes to train them to the required level of accuracy.

SambaNova’s world record speed and accuracy

At SambaNova Systems, we have been hard at work addressing these challenges, and today I am excited to share a new industry benchmark: SambaNova is the leader in GPT training performance and speed.

2.1x faster time-to-train
2.1x higher throughput
83.4% task accuracy

What Matters Most: Best-in-Class Accuracy

When training deep learning models, performance without accuracy simply delivers the wrong outcome faster. Therefore, we treat accuracy as a prerequisite and first set a best-in-class accuracy standard as the foundation for performance. We work with our customers to make sure GPT isn’t just good at generic zero-shot tasks but also delivers improved performance on industry-specific applications. The SambaNova Dataflow-as-a-Service™ GPT model achieves state-of-the-art accuracy of 83.4%, outperforming standard GPT 1.5B running on A100 GPUs.

Figure 1: World Record 83.4% Accuracy Beats Nvidia

Record-Breaking Training Performance

The inverse scenario is also worth examining: high accuracy without proportional performance can mean a long wait for the desired insight. The SambaNova Dataflow-as-a-Service™ GPT model achieves 2.1x higher throughput, resulting in 2.1x faster time-to-train than standard GPT-3 running on A100 GPUs.

Figure 2: World Record 2.1x More Throughput Than Nvidia

Figure 3: World Record 2.1x Faster Time-to-Train Than Nvidia (less time is better)
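Why 2.1x throughput translates into 2.1x faster time-to-train comes down to simple arithmetic. For the same workload (same model, dataset, and global batch size), time-to-train is the total number of samples divided by throughput, so the workload size cancels and the speedup equals the throughput ratio. The Python sketch below illustrates this, assuming constant throughput over the whole run; the function names and all numbers are hypothetical placeholders, not SambaNova or Nvidia measurements or methodology.

```python
# Hypothetical illustration: time-to-train vs. throughput for a fixed workload.
# None of these numbers are measured results; they only show the arithmetic.


def time_to_train_hours(total_samples: float, samples_per_second: float) -> float:
    """Wall-clock hours to process total_samples at a constant throughput."""
    return total_samples / samples_per_second / 3600.0


def speedup(baseline_throughput: float, new_throughput: float) -> float:
    """Baseline time-to-train divided by new time-to-train for the same workload.

    (N / t_base) / (N / t_new) = t_new / t_base, so the workload size N cancels.
    """
    return new_throughput / baseline_throughput


TOTAL_SAMPLES = 1_000_000_000  # hypothetical workload size (samples)
BASELINE_THROUGHPUT = 1_000.0  # hypothetical baseline throughput (samples/s)
NEW_THROUGHPUT = 2.1 * BASELINE_THROUGHPUT  # a system with 2.1x the throughput

base_hours = time_to_train_hours(TOTAL_SAMPLES, BASELINE_THROUGHPUT)
new_hours = time_to_train_hours(TOTAL_SAMPLES, NEW_THROUGHPUT)

print(f"baseline: {base_hours:.1f} h, faster system: {new_hours:.1f} h")
print(f"speedup from throughput ratio: {speedup(BASELINE_THROUGHPUT, NEW_THROUGHPUT):.2f}x")
print(f"speedup from measured times:   {base_hours / new_hours:.2f}x")
```

Both speedup lines agree, which is the point: under these assumptions, quoting the throughput ratio and quoting the time-to-train ratio are the same claim stated two ways.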

What really matters is helping organizations transform their business, not performance metrics

These world record performance results are an impressive benchmark and a testament to the hard work and dedication of SambaNova’s world-class engineering team. That being said, our entire team agrees: ultimately, these performance metrics don’t matter. Customers don’t care about lab experiments conducted on fragile setups that don’t translate to real-life workloads. Customers care about improving their business and accelerating time to value.

That is why our focus is not on performance metrics but on helping our customers get to value faster. We deliver GPT as a service through SambaNova Dataflow-as-a-Service™ GPT, which offers easy-to-use, flexible low-code/no-code APIs and removes model and infrastructure complexity, so customers can deploy these powerful models in weeks, not years.

What’s next?

As I mentioned, our focus is on helping organizations use AI and deep learning to transform their business, and one of the industries where we see the most potential for this transformation is banking: according to McKinsey, AI and deep learning have the potential to deliver $1T in value annually in banking.

We have been hard at work on something that will help leading banking organizations achieve this value by applying large language models like GPT to accelerate business transformation across both front- and back-office operations. I look forward to sharing more about what we have been working on during an exciting announcement from SambaNova later this month. I hope to see you there.

—————————————————————————————————————————————

Disclosures

Market survey results can be found here: https://sambanova.ai/the-race-to-ai-value/

In August 2021, SambaNova surveyed 600 full-time AI/ML, data, research, experience, and cloud infrastructure leaders at the director level and above. The survey captured 100 responses from each of six key industries: financial services, healthcare and life sciences, manufacturing and auto, retail and e-commerce, public sector, and oil and gas.

Performance and accuracy comparisons are made on the following competitive equivalents:

SambaNova Dataflow-as-a-Service™ GPT, 1.5B parameters, running on 1 integrated system with 8 RDUs, global batch size (BS) = 1,024
Commodity server: 1 Nvidia DGX A100 with 8 A100 GPUs, GPT 1.5B parameters, global batch size (BS) = 1,024

Pre-training dataset: SEC EDGAR

Downstream task fine-tuning dataset: FiQA+PhraseBank
