
xAI’s Grok Can Now Answer Questions Using Your Phone’s Camera

Image: A smartphone pointed at everyday objects, with the Grok chatbot interface analyzing them and displaying live responses on screen.

Image Source: ChatGPT-4o


xAI’s Grok chatbot just took a major leap forward in visual intelligence. The company announced Tuesday that Grok can now interpret the world through a smartphone camera, thanks to a new feature called Grok Vision—a move that puts it in closer competition with Google’s Gemini and OpenAI’s ChatGPT.

Grok Vision allows users to point their iPhone at objects—like signs, documents, or everyday products—and ask the chatbot questions about what it’s seeing. While this feature is currently only available in the iOS version of the Grok app, xAI says it’s coming to Android in the future.

Multilingual Audio and Voice Search Also Added

The Grok update also includes multilingual audio support and real-time search integration for the chatbot’s voice mode. These new tools are available to Android users, but only for subscribers to xAI’s $30/month SuperGrok plan.

These updates continue a pattern of rapid feature growth for Grok. Earlier this month, xAI introduced a memory feature that allows the chatbot to recall details from past conversations. The bot also gained a canvas-like tool for creating documents and simple applications—adding a creative edge to its growing list of capabilities.

What This Means

Grok’s new vision capabilities mark a key step toward making chatbots more context-aware and more directly woven into daily life. By allowing users to simply point and ask, xAI is creating a more seamless connection between digital assistance and the real world. As multimodal features become the norm across AI platforms, tools like Grok Vision could define the next era of mobile interaction.

Similar real-time visual features have already rolled out from Google’s Gemini and OpenAI’s ChatGPT, placing Grok squarely in the race to define the next generation of multimodal AI assistants—ones that can see, hear, and understand as fluently as they chat.

The future of AI may not live behind a screen—it may live through the lens in your hand.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.