The Benefits of Large Language Models for Document Classification
Large language models (LLMs) have become popular in the AI and deep learning communities due to their ability to accurately solve a wide range of language tasks, such as sentiment analysis, classification, named entity recognition, summarization, and even the generation of new content. While more traditional natural language processing models such as BERT need to be trained on each of these tasks individually, state-of-the-art LLMs such as GPT have emergent capabilities: the ability to solve a wide range of tasks without being specifically trained on them. These emergent capabilities arise because LLMs do not just process language data; they develop a fundamental understanding of language structure and context that enables them to solve language tasks with unprecedented flexibility and human-level accuracy.
While there has been significant interest in both academic research and press coverage of the impressive technical capabilities of LLMs, questions often remain about how these capabilities translate into business impact, and how exactly LLMs such as GPT are differentiated from more traditional technologies such as BERT in real-world scenarios.
One way to better understand the business potential and accuracy advantages of LLMs such as GPT is by analyzing a specific task, such as document classification, and how it can be used to solve a specific business challenge, such as automated email routing.
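To make the email-routing use case concrete, here is a minimal, hypothetical sketch of how document classification can be framed for an LLM: instead of training a task-specific classifier, the email is embedded in a prompt that asks the model to name the topic. The topic labels and prompt template below are illustrative assumptions, not SambaNova's implementation.

```python
# Hypothetical topic labels for routing customer complaint emails.
TOPICS = ["billing", "shipping", "product defect", "account access"]

def build_classification_prompt(email_body: str) -> str:
    """Format an email as a zero-shot classification prompt for an LLM.

    The model would be asked to complete the text after "Topic:", and the
    completion would then be matched against the label list for routing.
    """
    labels = ", ".join(TOPICS)
    return (
        f"Classify the customer email below into one of these topics: {labels}.\n\n"
        f"Email:\n{email_body}\n\n"
        "Topic:"
    )

prompt = build_classification_prompt("I was charged twice for one order.")
print(prompt)
```

Because the task is expressed in natural language rather than a trained classification head, the same model can be redirected to a new set of topics simply by editing the label list.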
In recent research, SambaNova’s GPT model achieved a 4.4% accuracy advantage over BERT at analyzing and predicting the topic of customer complaint emails. The emails in this research were often lengthy and complex, requiring the model to identify the part of the text containing the information needed to determine the email topic.
The GPT model achieved this 4.4% accuracy improvement over BERT as a result of two advantages:
- First, a deeper understanding of the language context and meaning of the email, allowing the GPT model to determine the topic even in cases where the complaint topic was not directly mentioned in the email.
- Second, the technical capability to process a longer sequence length. Sequence length refers to the number of tokens, or units of language data, that a model can process at one time. The more language data a model can analyze at once, the more context and understanding it can build.
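The second advantage can be sketched with a toy example, not drawn from the research itself: a model with a short context window must truncate a long email, and the sentence that actually reveals the complaint topic may be cut off. The email text, window sizes, and whitespace tokenization below are illustrative assumptions.

```python
def truncate_to_context(tokens, max_len):
    """Keep only the first max_len tokens, as a fixed-window model would."""
    return tokens[:max_len]

email = (
    "Hello, I am writing about my recent order. I waited three weeks. "
    "The real issue is that I was charged twice for the same item."
)
# Crude whitespace tokenization, purely for illustration; real models use
# subword tokenizers, but the truncation effect is the same.
tokens = email.split()

short_window = truncate_to_context(tokens, 12)  # small context window
long_window = truncate_to_context(tokens, 64)   # long context window

# The decisive word "charged" survives only in the longer window.
print("charged" in short_window, "charged" in long_window)
```

A model limited to the short window never sees the billing complaint at all; a model with a longer sequence length keeps the decisive sentence in view, which is why sequence length translates directly into classification accuracy on lengthy emails.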
For a more detailed example of how GPT can achieve higher accuracy, and the importance of sequence length, check out SambaNova’s latest demo and see if you can meet the GPT challenge.