Built on top of open-source models using a unique ensembling method, the model outperforms Gemma-7B from Google DeepMind, Mixtral-8x7B from Mistral AI, Llama-2-70B from Meta, Qwen-72B from Alibaba's Qwen team, Falcon-180B from the Technology Innovation Institute, and BLOOM-176B from the BigScience Research Workshop. It achieves this at the inference cost of two 7B models.
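The post does not say which ensembling method is used, but the "cost of two 7B models" suggests both members are run at inference time. A common approach in that setting is to interpolate the two models' next-token logits before decoding. A minimal sketch with toy NumPy logits (the function, weight, and vocabulary are hypothetical illustrations, not the actual method):

```python
import numpy as np

def ensemble_next_token(logits_a, logits_b, weight=0.5):
    """Interpolate two models' next-token logits, then pick the argmax.

    logits_a, logits_b: 1-D arrays over a shared vocabulary.
    weight: interpolation weight for the first model (hypothetical choice).
    """
    combined = weight * logits_a + (1.0 - weight) * logits_b
    return int(np.argmax(combined))

# Toy vocabulary of 4 tokens. Both models produce logits each step,
# so inference cost is roughly the sum of the two models' costs.
a = np.array([2.0, 0.5, 1.0, 0.1])
b = np.array([0.1, 0.4, 3.0, 0.2])
print(ensemble_next_token(a, b))  # token 2 wins after averaging
```

In practice the two members would need a shared tokenizer (or a mapping between vocabularies) for logit-level combination to be well defined.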
More details about the architecture, along with Hugging Face model cards and checkpoints, are coming later this week. Watch this space!