FP8 is ~100 tflops faster when the kernel name has "cutlass" in it

#JackDongarra Makes a Stand for Traditional #HPC: "US still doesn’t have a clear, long-term plan for what comes next.... U.S. risks falling behind."
Challenges to high-performance computing threaten #US #innovation
The #AI boom has led chip makers to focus on #FP16 and #FP8 rather than the #FP64 (double precision) that scientific research relies on. If chip companies stop making the parts #scientists need, important research could become harder to do.
https://theconversation.com/challenges-to-high-performance-computing-threaten-us-innovation-255188
Welcome to the thrilling world of "#DeepSeek," where they unleash their groundbreaking #FP8 #GEMM #Kernels, as if these buzzwords mean anything to normal humans.
Now you too can revel in the #excitement of "#fine-grained #scaling," because who doesn't dream of spending their weekends scaling kernels?
#GitHub's #navigation menu is undoubtedly the real star here, stealing the show with its riveting toggle action.
https://github.com/deepseek-ai/DeepGEMM #tech #HackerNews #ngated
DeepSeek Open Sources DeepGEMM: Clean and efficient FP8 GEMM kernels — https://github.com/deepseek-ai/DeepGEMM
#HackerNews #DeepSeek #DeepGEMM #FP8 #AI #Kernels #OpenSource