Tokens Per Second Is Not All You Need
Posted by Mingran Wang on May 1, 2024

In this post, we explore why tokens per second doesn't paint the full picture of enterprise LLM inference performance.
Topics: Technology, Blog