
Anthropic Named Best Performing LLM as AI Competition Heats Up

Image: Galileo's ranking of top large language models (LLMs), shown as a podium with Anthropic's Claude 3.5 Sonnet at the top, followed by OpenAI's GPT-4o and GPT-3.5, and Google's Gemini 1.5 Flash.


Generative artificial intelligence (AI) firm Galileo has unveiled a new ranking of top large language models (LLMs). The announcement, made on Monday (July 29), introduced the latest “Hallucination Index,” which evaluates the performance of LLMs from companies including OpenAI, Anthropic, Google, and Meta.

Introduction to the Hallucination Index

Galileo’s Hallucination Index added 11 models to its framework this year, reflecting the rapid expansion of both open- and closed-source LLMs over the past eight months. The company emphasized that hallucinations remain the primary challenge to deploying production-ready generative AI products.

Top Performers

According to the index, Anthropic’s Claude 3.5 Sonnet emerged as the best-performing model overall, excelling in short, medium, and long context scenarios and surpassing last year’s top models, OpenAI’s GPT-4o and GPT-3.5. Google’s Gemini 1.5 Flash was recognized as the best-performing model on cost, while Alibaba’s Qwen2-72B-Instruct was highlighted as the top open-source model.

Real-World Applications and Challenges

Vikram Chatterji, CEO and co-founder of Galileo, addressed the evolving AI landscape, stating, “In today’s rapidly evolving AI landscape, developers and enterprises face a critical challenge: how to harness the power of generative AI while balancing cost, accuracy, and reliability. Current benchmarks are often based on academic use-cases, rather than real-world applications.”

Galileo’s new Index aims to bridge this gap by testing models in real-world scenarios that require data retrieval, a common practice in enterprise AI implementations. Chatterji added, “As hallucinations continue to be a major hurdle, our goal wasn’t just to rank models, but to provide AI teams and leaders with the real-world data they need to adopt the right model, for the right task, at the right price.”
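To make the retrieval-based testing idea concrete, here is a minimal sketch in Python of the kind of groundedness check such an evaluation implies: scoring how much of a model's answer is supported by the retrieved context, with low scores flagging possible hallucinations. The word-overlap heuristic and all names here are hypothetical simplifications for illustration, not Galileo's actual metric or methodology.

```python
# Illustrative sketch only: a toy "context adherence" score in the spirit of
# RAG-style hallucination checks. The heuristic is a hypothetical simplification.
import re

STOPWORDS = {"the", "a", "an", "of", "in", "on", "is", "are", "was",
             "were", "to", "and", "by"}

def content_words(text: str) -> set[str]:
    """Lowercase word tokens with common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS}

def adherence(answer: str, context: str) -> float:
    """Fraction of the answer's content words found in the retrieved context.
    Low scores suggest the answer may not be grounded in the source documents."""
    claim = content_words(answer)
    return len(claim & content_words(context)) / len(claim) if claim else 1.0

context = "Claude 3.5 Sonnet topped the index in short, medium, and long context tests."
print(adherence("Claude 3.5 Sonnet topped the index in long context tests.", context))  # ~1.0: grounded
print(adherence("Claude 3.5 Sonnet was released in 2019 by Google.", context))          # low: ungrounded
```

Production evaluations replace this word-overlap proxy with model-based judgments, but the structure is the same: pair each answer with the context it was supposed to draw on and measure adherence to it.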