Anthropic Named Best Performing LLM as AI Competition Heats Up
Generative artificial intelligence (AI) firm Galileo has unveiled a new ranking of top large language models (LLMs). The announcement, made on Monday (July 29), introduced the latest “Hallucination Index,” which evaluates the performance of LLMs from companies like OpenAI, Anthropic, Google, and Meta.
Introduction to the Hallucination Index
Galileo’s Hallucination Index added 11 models to its framework this year, reflecting the rapid expansion in both open- and closed-source LLMs over the past eight months. The company emphasized that hallucinations remain the primary challenge to deploying production-ready generative AI products.
Top Performers
According to the index, Anthropic’s Claude 3.5 Sonnet emerged as the best-performing model overall, excelling in short, medium, and long context scenarios. It surpassed last year’s top models, OpenAI’s GPT-4o and GPT-3.5. Google’s Gemini 1.5 Flash was recognized as the best-performing model for cost, while Alibaba’s Qwen2-72B-Instruct was highlighted as the top open-source model.
Real-World Applications and Challenges
Vikram Chatterji, CEO and co-founder of Galileo, said, “In today’s rapidly evolving AI landscape, developers and enterprises face a critical challenge: how to harness the power of generative AI while balancing cost, accuracy, and reliability. Current benchmarks are often based on academic use-cases, rather than real-world applications.”
Galileo’s new Index aims to bridge this gap by testing models in real-world scenarios that require data retrieval, a common practice in enterprise AI implementations. Chatterji added, “As hallucinations continue to be a major hurdle, our goal wasn’t just to rank models, but to provide AI teams and leaders with the real-world data they need to adopt the right model, for the right task, at the right price.”