AiNews.com
Stability AI and Arm Bring On-Device Generative Audio to Smartphones

Alicia Shapiro
March 04, 2025

[Image: A smartphone displaying an AI-powered audio generation interface with a digital waveform and real-time sound visualization.]

Image Source: ChatGPT-4o

Stability AI has partnered with Arm to bring on-device generative audio to mobile devices, allowing users to generate high-quality sound effects and audio samples without needing an internet connection.

Using Arm’s KleidiAI libraries, Stable Audio Open, Stability AI’s text-to-audio model, now runs 30x faster on Arm CPUs, cutting generation time from minutes to seconds. The result is being showcased at MWC Barcelona, starting March 3, 2025, as a demonstration of AI-powered content creation at the edge.

A Breakthrough in On-Device AI Audio

Traditionally, generative AI models have required cloud-based processing due to high computational demands. However, this collaboration enables real-time, offline AI audio generation entirely on Arm CPUs, which power 99% of smartphones globally.

Key Advancements:

30x speed improvement: Audio generation time reduced from 240 seconds to under 8 seconds for an 11-second clip on Armv9 CPUs.
Runs entirely on-device: No internet connection or cloud processing required.
Seamless integration into media production pipelines for sound effects, audio samples, and production elements.
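
The headline figures can be checked with simple arithmetic. The short Python sketch below uses only the numbers quoted above (240 seconds before, 8 seconds as the upper bound after, an 11-second clip). The function names are illustrative and are not part of any Stability AI or Arm tooling.

```python
def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the optimized run is than the original."""
    return before_s / after_s


def real_time_factor(generation_s: float, clip_s: float) -> float:
    """Generation time divided by clip length; below 1.0 is faster than playback."""
    return generation_s / clip_s


BEFORE_S = 240.0  # reported generation time before optimization
AFTER_S = 8.0     # reported upper bound after optimization ("under 8 seconds")
CLIP_S = 11.0     # length of the generated clip

print(f"speedup: at least {speedup(BEFORE_S, AFTER_S):.0f}x")
print(f"real-time factor before: {real_time_factor(BEFORE_S, CLIP_S):.1f}")
print(f"real-time factor after: under {real_time_factor(AFTER_S, CLIP_S):.2f}")
```

Because 8 seconds is an upper bound, 30x is a lower bound on the speedup. A real-time factor below 1 means the clip is generated faster than it plays back, which is what makes the "real-time" description reasonable.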

How Stability AI and Arm Achieved This Speedup

Optimizing Stable Audio Open for mobile was a major challenge because the unoptimized model was slow on Arm CPUs, taking about 240 seconds to generate an 11-second clip. Stability AI and Arm accelerated performance by:

Distilling the model to optimize efficiency.
Using Arm’s software stack, including int8 matrix-multiplication (matmul) kernels from KleidiAI, which ExecuTorch reaches through its XNNPACK backend, to speed up execution.

Because the gains come from software rather than specialized hardware, the model runs on the CPUs already found in standard mobile devices, making generative AI usable without heavy computing requirements.
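
For readers curious about what an int8 matmul computes, here is a minimal pure-Python sketch. It is not KleidiAI’s code, and the article does not say which quantization scheme was used; symmetric per-tensor quantization is an assumption chosen for simplicity, and all function names are hypothetical. Production kernels do the same kind of integer arithmetic with hand-tuned Arm instructions rather than Python loops.

```python
from typing import List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize(m: Matrix) -> Tuple[IntMatrix, float]:
    """Symmetric per-tensor quantization of floats to the int8 range [-127, 127]."""
    max_abs = max(abs(v) for row in m for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in m]
    return q, scale


def int8_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply a (n x k) by b (k x m) using integer arithmetic on quantized inputs."""
    qa, scale_a = quantize(a)
    qb, scale_b = quantize(b)
    cols_b = list(zip(*qb))
    # Integer multiply-accumulate: the part an optimized kernel speeds up.
    acc = [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in qa]
    # Rescale the integer accumulators back to floating point.
    return [[v * scale_a * scale_b for v in row] for row in acc]


def float_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference floating-point matmul for comparison."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


a = [[0.5, -1.0], [2.0, 0.25]]
b = [[1.0, 0.0], [-0.5, 4.0]]
exact = float_matmul(a, b)
approx = int8_matmul(a, b)
max_err = max(abs(e - p) for re, rp in zip(exact, approx) for e, p in zip(re, rp))
print("max absolute error:", max_err)
```

The trade-off is a small rounding error in exchange for 8-bit integer operands, which take a quarter of the memory of 32-bit floats and can be processed several at a time by CPU vector instructions.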

Looking Ahead: Expanding AI Media Generation at the Edge

This collaboration is just the beginning. Stability AI plans to expand on-device AI generation beyond audio, bringing image, video, and 3D models to mobile devices. By enabling real-time, high-quality AI content creation at the edge, this breakthrough could transform how media is produced on smartphones.

What This Means for Consumers

The partnership between Stability AI and Arm marks a major step forward in making generative AI more accessible. By enabling on-device AI audio generation, this breakthrough eliminates the need for constant internet connectivity or high-powered cloud servers, making AI-powered sound creation:

Faster: With 30x speed improvements, users can generate high-quality audio in seconds instead of minutes.
More private: Since no data is sent to the cloud, users maintain greater privacy and security.
More accessible: Without heavy hardware requirements, any compatible smartphone can generate sound effects and audio samples without the cost of cloud processing.
More integrated: Musicians, content creators, and game developers can seamlessly integrate AI-generated audio into their creative workflows on the go.

Beyond audio, Stability AI’s plan to bring image, video, and 3D generation to smartphones could redefine how users interact with AI, making high-quality generative tools available anytime, anywhere.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.