• AiNews.com
  • Posts
  • Mistral Releases Pixtral 12B: A Multimodal AI Model for Text & Images

Mistral Releases Pixtral 12B: A Multimodal AI Model for Text & Images

High-tech visualization of Mistral's Pixtral 12B AI model, highlighting its multimodal capabilities for processing text and images. The interface displays both text inputs and image analysis, with labels indicating tasks such as image captioning and object counting. Abstract digital elements, including data streams, neural networks, and AI-generated text layers, surround the model, symbolizing its large language model properties. The overall design is sleek and futuristic, representing the advanced integration of AI in processing both visual and textual information.

Image Source: ChatGPT-4o

Mistral Releases Pixtral 12B: A Multimodal AI Model for Text & Images

French AI startup Mistral has made waves with the release of Pixtral 12B, its first model capable of processing both images and text. With 12 billion parameters, the model measures around 24GB in size, and in general, models with more parameters tend to demonstrate better problem-solving capabilities. This makes Pixtral 12B a powerful addition to Mistral’s lineup.

Capabilities of Pixtral 12B

Built on Mistral’s Nemo 12B text model, Pixtral 12B is capable of handling an arbitrary number of images of various sizes, using either URLs or base64-encoded images. The model's applications are similar to other multimodal models like OpenAI's GPT-4o and Anthropic's Claude family, allowing it to perform tasks like captioning images and counting objects in photos.

Access and Availability

Pixtral 12B is available for download via a torrent link on GitHub and through the Hugging Face AI platform. It comes with an Apache 2.0 license, allowing for unrestricted use and fine-tuning. A Mistral spokesperson confirmed the licensing in an email. However, at the time of publication, no web demos were available for testing. Sophia Yang, Mistral’s head of developer relations, has indicated that Pixtral 12B will soon be available for testing on Mistral’s chatbot and API platforms, Le Chat and Le Plateforme.

Uncertainty Around Training Data

It’s not yet clear which image data was used to train Pixtral 12B. Most generative AI models are built on large datasets collected from public sources across the web, which can sometimes include copyrighted material. This has sparked controversy and legal disputes, with some companies claiming “fair use” rights while copyright holders argue otherwise. Lawsuits have already been filed against major players like OpenAI and Midjourney regarding the use of such data.

Mistral’s Rapid Rise in the AI World

The release of Pixtral 12B follows Mistral’s successful $645 million funding round led by General Catalyst, which valued the company at $6 billion. Despite being just over a year old, Mistral — partly owned by Microsoft — is already being recognized as Europe’s answer to OpenAI. The company’s strategy includes releasing free open models while offering managed versions for corporate customers and providing consulting services.