
DeepSeek Open Source Week Day 3: DeepGEMM Boosts FP8 GEMM Performance

Illustration: rows of NVIDIA Hopper GPUs, glowing data nodes, and matrix grids, symbolizing DeepGEMM's optimization of FP8 matrix multiplication for AI computation.

Image Source: ChatGPT-4o


For Day 3 of its Open Source Week, DeepSeek has unveiled DeepGEMM, a lightweight yet powerful FP8 General Matrix Multiplication (GEMM) library that supports both dense and Mixture-of-Experts (MoE) GEMMs. The library plays a crucial role in training and inference for DeepSeek-V3 and DeepSeek-R1, delivering impressive efficiency gains on NVIDIA Hopper GPUs.
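For readers unfamiliar with what an FP8 GEMM involves, the sketch below emulates the core pattern in plain NumPy: quantize each K-block of the inputs to an e4m3-style 8-bit float with a float32 scale, multiply block by block, and rescale the partial products. Everything here (the crude e4m3 rounding, the block size, the function names) is illustrative only, not DeepGEMM's actual implementation, which runs as CUDA kernels on Hopper tensor cores.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in e4m3

def quantize_e4m3(x, block=128):
    """Simulate per-block FP8 (e4m3) quantization with float32 scales.

    Fine-grained scaling keeps values inside FP8's narrow dynamic
    range; one scale is kept per (row, K-block).
    """
    m, k = x.shape
    xb = x.reshape(m, k // block, block)
    scales = np.abs(xb).max(axis=-1, keepdims=True) / FP8_E4M3_MAX
    scales = np.maximum(scales, 1e-12)            # guard all-zero blocks
    scaled = xb / scales
    # crude e4m3 emulation: keep 3 mantissa bits per value
    exp = np.floor(np.log2(np.maximum(np.abs(scaled), 1e-12)))
    step = 2.0 ** (exp - 3)
    q = np.round(scaled / step) * step
    return q.reshape(m, k), scales.squeeze(-1)

def fp8_gemm(a, b, block=128):
    """Reference FP8 GEMM: multiply quantized blocks, rescale, accumulate."""
    qa, sa = quantize_e4m3(a, block)              # (m, k), (m, k/block)
    qb, sb = quantize_e4m3(b.T, block)            # quantize B along K too
    out = np.zeros((a.shape[0], b.shape[1]), dtype=np.float32)
    for i in range(a.shape[1] // block):
        blk = slice(i * block, (i + 1) * block)
        partial = qa[:, blk] @ qb[:, blk].T       # low-precision product
        out += partial * np.outer(sa[:, i], sb[:, i])  # undo the scales
    return out

a = np.random.randn(64, 256).astype(np.float32)
b = np.random.randn(256, 32).astype(np.float32)
err = np.abs(fp8_gemm(a, b) - a @ b).max() / np.abs(a @ b).max()
print(f"relative error vs float32 GEMM: {err:.4f}")  # small but nonzero
```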

Key Features of DeepGEMM

  • Exceptional Performance: Achieves 1350+ FP8 TFLOPS on Hopper GPUs.

  • Minimal Dependencies: Designed to be as clean as a tutorial, avoiding unnecessary complexity.

  • Just-In-Time Compilation: Compiles all kernels at runtime, eliminating installation overhead (see the conceptual sketch after this list).

  • Compact Yet Powerful: Core logic spans only ~300 lines of code, yet outperforms expert-tuned kernels on most matrix sizes.

  • Versatile Layout Support: Compatible with dense layout and two MoE layouts.
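
The just-in-time point deserves a brief illustration. DeepGEMM generates and compiles its CUDA kernels at runtime, so matrix shapes and block sizes become compile-time constants in the emitted code. The Python below is only a conceptual analogy of that pattern, runtime code generation specialized per shape, with hypothetical names; it is not DeepGEMM's compiler.

```python
_kernel_cache = {}

def get_matmul_kernel(m: int, n: int, k: int):
    """Return a matmul specialized to a fixed (m, n, k), built on first use.

    Mimics JIT specialization: the shape is baked into the generated
    source, so the compiled function has constant loop bounds.
    """
    key = (m, n, k)
    if key not in _kernel_cache:
        src = f"""
def kernel(a, b, out):
    for i in range({m}):
        for j in range({n}):
            acc = 0.0
            for p in range({k}):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
"""
        ns = {}
        exec(compile(src, f"<jit-{m}x{n}x{k}>", "exec"), ns)
        _kernel_cache[key] = ns["kernel"]
    return _kernel_cache[key]

kern = get_matmul_kernel(2, 2, 2)            # compiled once, then cached
a = [[1.0, 2.0], [3.0, 4.0]]
out = [[0.0, 0.0], [0.0, 0.0]]
kern(a, a, out)
print(out)                                   # [[7.0, 10.0], [15.0, 22.0]]
```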

Built in CUDA, DeepGEMM avoids heavy reliance on CUTLASS or CuTe templates, prioritizing simplicity while still using CUDA-core two-level accumulation (promotion) to counter FP8 tensor core imprecision. This makes it a valuable resource for understanding and optimizing FP8 matrix multiplication on Hopper tensor cores.
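To see why two-level accumulation matters, consider the simulation below. NumPy's float16 stands in for the tensor core's reduced-precision accumulator (an assumption made purely for illustration): when the running sum stays in low precision for thousands of steps, rounding error compounds, whereas periodically promoting partial sums into a float32 accumulator, as DeepGEMM does on CUDA cores, keeps it in check.

```python
import numpy as np

def simulated_matmul(a, b, promote_every=None):
    """Matmul with a float16 running sum, optionally promoted to float32.

    float16 stands in here for the tensor core's reduced-precision FP8
    accumulator (an illustrative assumption). With promote_every set, the
    low-precision partial sum is flushed into a float32 accumulator every
    few steps: two-level accumulation in miniature.
    """
    a16, b16 = a.astype(np.float16), b.astype(np.float16)
    m, k = a.shape
    out = np.zeros((m, b.shape[1]), dtype=np.float32)  # level 2 (CUDA cores)
    acc = np.zeros((m, b.shape[1]), dtype=np.float16)  # level 1 (tensor cores)
    for p in range(k):
        acc = (acc + np.outer(a16[:, p], b16[p, :])).astype(np.float16)
        if promote_every and (p + 1) % promote_every == 0:
            out += acc.astype(np.float32)  # promote the partial sum
            acc[:] = 0
    return out + acc.astype(np.float32)

a = np.random.randn(8, 4096).astype(np.float32)
b = np.random.randn(4096, 8).astype(np.float32)
exact = a @ b
for label, step in [("fp16 accumulation only", None), ("promoted every 32", 32)]:
    err = np.abs(simulated_matmul(a, b, step) - exact).max()
    print(f"{label}: max abs error {err:.3f}")
```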

Despite its impressive speed and efficiency, DeepGEMM does not perform optimally on certain matrix shapes, and DeepSeek welcomes optimization contributions from the community.

For a detailed breakdown of DeepGEMM’s performance across different matrix shapes, visit the official GitHub repository.

DeepSeek API Off-Peak Discounts

In addition to launching DeepGEMM, DeepSeek has introduced off-peak pricing for its API platform, offering significant savings between 16:30 and 00:30 UTC daily (8:30 AM – 4:30 PM PST).

  • DeepSeek-V3: 50% off

  • DeepSeek-R1: 75% off

These discounts provide a cost-effective way for users to maximize their compute resources during designated off-peak hours.
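
Because the discount window wraps past midnight UTC, a naive start-to-end comparison fails; the small sketch below (a hypothetical helper, not part of any DeepSeek SDK) shows the wrap-around logic and applies the published multipliers to a caller-supplied standard price.

```python
from datetime import datetime, time, timezone

OFF_PEAK_START = time(16, 30)   # 16:30 UTC
OFF_PEAK_END = time(0, 30)      # 00:30 UTC (next day)

def is_off_peak(now=None):
    """True inside the 16:30-00:30 UTC window, when discounts apply."""
    t = (now or datetime.now(timezone.utc)).time()
    # the window crosses midnight, so it is the union of two half-windows
    return t >= OFF_PEAK_START or t < OFF_PEAK_END

# published off-peak multipliers: V3 costs 50% of standard, R1 costs 25%
OFF_PEAK_MULTIPLIER = {"DeepSeek-V3": 0.50, "DeepSeek-R1": 0.25}

def effective_price(model, standard_price, now=None):
    """Apply the off-peak multiplier when the window is active."""
    scale = OFF_PEAK_MULTIPLIER[model] if is_off_peak(now) else 1.0
    return standard_price * scale

when = datetime(2025, 2, 26, 23, 0, tzinfo=timezone.utc)
print(is_off_peak(when))                          # True
print(effective_price("DeepSeek-R1", 1.00, when)) # 0.25
```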

Looking Ahead

DeepGEMM’s introduction underscores DeepSeek’s focus on efficiency and accessibility in AI infrastructure. While DeepGEMM already outperforms many expert-tuned kernels, its open-source release leaves room for further refinement. With DeepSeek actively seeking optimization contributions, developers and researchers have the opportunity to push FP8 GEMM performance even further.

What This Means

DeepGEMM’s release reflects DeepSeek’s commitment to open-source innovation, providing a lightweight yet high-performance FP8 GEMM solution for AI workloads. By simplifying implementation while achieving state-of-the-art performance, DeepGEMM gives developers and researchers a valuable tool for optimizing NVIDIA Hopper-based training and inference. Meanwhile, the new off-peak discounts make DeepSeek’s API platform more cost-effective, encouraging AI practitioners to maximize compute efficiency.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.