mastodontech.de is one of many independent Mastodon servers you can use to participate in the fediverse.
Open to everyone (over 16) and provided by Markus'Blog

Server statistics:

1.5K
active profiles

#opencl

1 post · 1 participant · 0 posts today
Khronos Group<p>OpenCL v3.0.19 maintenance update released with bug fixes &amp; clarifications and adds two new extensions: cl_khr_spirv_queries to simplify querying the SPIR-V capabilities of a device, and cl_khr_external_memory_android_hardware_buffer to more efficiently interoperate with other APIs on Android devices. In addition, the cl_khr_kernel_clock extension to sample a clock within a kernel has been finalized and is no longer an experimental extension. </p><p>Khronos <a href="https://fosstodon.org/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a> Registry: <a href="https://registry.khronos.org/OpenCL/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">registry.khronos.org/OpenCL/</span><span class="invisible"></span></a></p>
Rainer<p><span class="h-card" translate="no"><a href="https://federation.network/@GuettisKnippse" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>GuettisKnippse</span></a></span> <br>Enabled under Settings/Processing/<br><a href="https://social.anoxinon.de/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a>?</p>
Gamey :thisisfine: :antifa:<p>I want to get <a href="https://chaos.social/tags/davinci_resolve" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>davinci_resolve</span></a> working on <a href="https://chaos.social/tags/Fedora" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Fedora</span></a> 42 with my now very old AMD rx480 8GB but it uses <a href="https://chaos.social/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a>. The obvious choice would be <a href="https://chaos.social/tags/rocm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>rocm</span></a> but that dropped support for my GPU years ago and from what I found also causes issues with Davinci resolve for even more years. The other obvious choice would be mesas implementation but while <a href="https://chaos.social/tags/Rusticl" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Rusticl</span></a> improved things it's still not a feature complete implementation and rather slow. Is it smart to use the amdgpu-pro ICD with mesa drivers for this?</p>
रञ्जित (Ranjit Mathew)<p>“Blackwell: Nvidia’s Massive GPU”, Chester Lam, Chips And Cheese (<a href="https://chipsandcheese.com/p/blackwell-nvidias-massive-gpu" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">chipsandcheese.com/p/blackwell</span><span class="invisible">-nvidias-massive-gpu</span></a>).</p><p>On HN: <a href="https://news.ycombinator.com/item?id=44409391" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">news.ycombinator.com/item?id=4</span><span class="invisible">4409391</span></a></p><p><a href="https://mastodon.social/tags/Nvidia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Nvidia</span></a> <a href="https://mastodon.social/tags/GPU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPU</span></a> <a href="https://mastodon.social/tags/Blackwell" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Blackwell</span></a> <a href="https://mastodon.social/tags/Hardware" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Hardware</span></a> <a href="https://mastodon.social/tags/HPC" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HPC</span></a> <a href="https://mastodon.social/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a></p>
Dr. Moritz Lehmann<p>Finally I can "SLI" AMD+Intel+Nvidia <a href="https://mast.hpc.social/tags/GPU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPU</span></a>​s at home! I simulated this crow in flight at 680M grid cells in 36GB VRAM, pooled together from<br>- 🟥 <a href="https://mast.hpc.social/tags/AMD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AMD</span></a> Radeon RX 7700 XT 12GB (RDNA3)<br>- 🟦 <a href="https://mast.hpc.social/tags/Intel" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Intel</span></a> Arc B580 12GB (Battlemage)<br>- 🟩 <a href="https://mast.hpc.social/tags/Nvidia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Nvidia</span></a> Titan Xp 12GB (Pascal)<br>My <a href="https://mast.hpc.social/tags/FluidX3D" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FluidX3D</span></a> <a href="https://mast.hpc.social/tags/CFD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CFD</span></a> software can pool the VRAM of any combination of any GPUs together via <a href="https://mast.hpc.social/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a>.<br><a href="https://mast.hpc.social/tags/Kr%C3%A4henliebe" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Krähenliebe</span></a> <a href="https://mast.hpc.social/tags/birds" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>birds</span></a> <a href="https://mast.hpc.social/tags/crow" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>crow</span></a><br><a href="https://www.youtube.com/watch?v=1z5-ddsmAag" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">youtube.com/watch?v=1z5-ddsmAa</span><span class="invisible">g</span></a></p>
karolherbst 🐧 🦀<p>Who is using CL_sRGBA images with <a href="https://chaos.social/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a>, specifically to write to it (cl_khr_srgb_image_writes)?</p><p>There is limited hw support for writing to sRGBA images and I'm now curious what even uses that feature.</p><p>It was apparently important enough to require support for it for OpenCL 2.0, but... that's not telling me much.</p>
Dr. Moritz Lehmann<p>Is it possible to run AMD+Intel+Nvidia <a href="https://mast.hpc.social/tags/GPU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPU</span></a>​s in the same PC? Yes! 🖖😋<br>Got this RDNA3 chonker for free from the 11 bit studios contest! It completes my 36GB VRAM RGB SLI abomination setup: <br>- 🟥 <a href="https://mast.hpc.social/tags/AMD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AMD</span></a> Radeon RX 7700 XT 12GB<br>- 🟦 <a href="https://mast.hpc.social/tags/Intel" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Intel</span></a> Arc B580 12GB<br>- 🟩 <a href="https://mast.hpc.social/tags/Nvidia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Nvidia</span></a> Titan Xp 12GB<br>The drivers all work together in <a href="https://mast.hpc.social/tags/Linux" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Linux</span></a> Ubuntu 24.04.2. Backbone is an ASUS ProArt Z790 with i7-13700K and 64GB, PCIe 4.0 x8/x8 + 3.0 x4 - plenty of interconnect bandwidth.<br>Finally I can develop and test <a href="https://mast.hpc.social/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a> on all major platforms!</p>
txt.file<p>Today’s hate about computers and software</p>
Janne Moren<p>I wish <a href="https://fosstodon.org/tags/pytorch" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>pytorch</span></a> wasn't CUDA/ROCm only :(</p><p>I know I *can* use the nodes at work, but that's not the point. I want to use my own new toy, not somebody else's.</p><p>Any DL framework out there with good support for <a href="https://fosstodon.org/tags/Vulkan" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Vulkan</span></a> or <a href="https://fosstodon.org/tags/Opencl" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Opencl</span></a> ?</p>
Lukas Weidinger<p>I’m thinking of <a href="https://gruene.social/tags/compiling" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>compiling</span></a> <a href="https://gruene.social/tags/darktable" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>darktable</span></a> from source so that it’s better optimized for my processor. <br>Anybody have experience with its potential? <a href="https://gruene.social/tags/question" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>question</span></a> <a href="https://gruene.social/tags/followerpower" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>followerpower</span></a></p><p>I’m generally ok with how fast the flatpak runs on my i7-1255 laptop. However, with such an iterative workflow, I feel there is much to gain from slight improvements via <a href="https://gruene.social/tags/opencl" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>opencl</span></a> and AVX.</p>
Dr. Moritz Lehmann<p>I made this <a href="https://mast.hpc.social/tags/FluidX3D" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FluidX3D</span></a> <a href="https://mast.hpc.social/tags/CFD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CFD</span></a> simulation run on a frankenstein zoo of 🟥AMD + 🟩Nvidia + 🟦Intel <a href="https://mast.hpc.social/tags/GPU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPU</span></a>​s! 🖖🤪<br><a href="https://www.youtube.com/watch?v=_8Ed8ET9gBU" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">youtube.com/watch?v=_8Ed8ET9gB</span><span class="invisible">U</span></a></p><p>The ultimate SLI abomination setup:<br>- 1x Nvidia A100 40GB<br>- 1x Nvidia Tesla P100 16GB<br>- 2x Nvidia A2 15GB<br>- 3x AMD Instinct MI50<br>- 1x Intel Arc A770 16GB</p><p>I split the 2.5B cells in 9 domains of 15GB - A100 takes 2 domains, the other GPUs 1 domain each. The GPUs communicate over PCIe via <a href="https://mast.hpc.social/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a>.</p><p>Huge thanks to Tobias Ribizel from TUM for the hardware!</p>
Giuseppe Bilotta<p>I'm liking the class this year. Students are attentive and participating, and the discussion is always productive.</p><p>We were discussing the rounding up of the launch grid in <a href="https://fediscience.org/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a> to avoid the catastrophic performance drops that come from the inability to divide the “actual” work size by anything smaller than the maximum device local work size, and were discussing how to compute the “rounded up” work size.</p><p>The idea is this: given the work size N and the local size L, we have to round N to the smallest multiple of L that is not smaller than N. This effectively means computing D = ceili(N/L) and then using D*L.</p><p>There are several ways to compute D, but on the computer, working only with integers and knowing that integer division always rounds down, what is the “best way”?</p><p>D = N/L + 1 works well if N is not a multiple of L, but gives us 1 more than the intended result if N *is* a multiple of L. So we want to add the extra 1 only if N is not a multiple. This can be achieved for example with</p><p>D = N/L + !!(N % L)</p><p>which leverages the fact that !! (double logical negation) turns any non-zero value into 1, leaving zero as zero. So we round *down* (which is what the integer division does) and then add 1 if (and only if) there is a remainder from the division.</p><p>This is ugly not so much because of the !!, but because the modulus operation % is slow.</p><p>1/n</p>
HGPU group<p>LLMPerf: GPU Performance Modeling meets Large Language Models</p><p><a href="https://mast.hpc.social/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a> <a href="https://mast.hpc.social/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> <a href="https://mast.hpc.social/tags/Performance" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Performance</span></a> <a href="https://mast.hpc.social/tags/Package" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Package</span></a></p><p><a href="https://hgpu.org/?p=29826" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">hgpu.org/?p=29826</span><span class="invisible"></span></a></p>
Dr. Moritz Lehmann<p>I got access to <span class="h-card" translate="no"><a href="https://mastodon.social/@LRZ_DE" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>LRZ_DE</span></a></span>'s new coma-cluster for <a href="https://mast.hpc.social/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a> benchmarking and experimentation 🖖😋💻🥨🍻<br>I've added a ton of new <a href="https://mast.hpc.social/tags/FluidX3D" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FluidX3D</span></a> <a href="https://mast.hpc.social/tags/CFD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CFD</span></a> <a href="https://mast.hpc.social/tags/GPU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPU</span></a>​/​<a href="https://mast.hpc.social/tags/CPU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CPU</span></a> benchmarks:<br><a href="https://github.com/ProjectPhysX/FluidX3D?tab=readme-ov-file#single-gpucpu-benchmarks" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/ProjectPhysX/FluidX</span><span class="invisible">3D?tab=readme-ov-file#single-gpucpu-benchmarks</span></a></p><p>Notable hardware configurations include:<br>- 4x H100 NVL 94GB<br>- 2x Nvidia L40S 48GB<br>- 2x Nvidia A2 15GB datacenter toaster<br>- 2x Intel Arc A770 16GB<br>- AMD+Nvidia SLI abomination consisting of 3x Instinct MI50 32GB + 1x A100 40GB<br>- AMD Radeon 8060S (chonky Ryzen AI Max+ 395 iGPU with quad-channel RAM) thanks to <span class="h-card" translate="no"><a href="https://mast.hpc.social/@cheese" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>cheese</span></a></span></p>
.:\dGh/:.<p>Is there any difference between computing AI workloads in Vulkan, OpenCL and CUDA?</p><p>I know some people say that NVIDIA doesn't support OpenCL or Vulkan especially well, and that performance is achieved by using CUDA. But what is the story for other vendors (Intel, AMD, Qualcomm, Apple)?</p><p><a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/Programming" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Programming</span></a> <a href="https://mastodon.social/tags/AIProgramming" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIProgramming</span></a> <a href="https://mastodon.social/tags/AIDevelopment" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIDevelopment</span></a> <a href="https://mastodon.social/tags/Software" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Software</span></a> <a href="https://mastodon.social/tags/SoftwareDevelopment" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SoftwareDevelopment</span></a> <a href="https://mastodon.social/tags/Vulkan" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Vulkan</span></a> <a href="https://mastodon.social/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a> <a href="https://mastodon.social/tags/CUDA" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CUDA</span></a> <a href="https://mastodon.social/tags/NVIDIA" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>NVIDIA</span></a> <a href="https://mastodon.social/tags/Intel" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Intel</span></a> <a href="https://mastodon.social/tags/IntelArc" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>IntelArc</span></a> <a 
href="https://mastodon.social/tags/AMD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AMD</span></a> <a href="https://mastodon.social/tags/AMDRadeon" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AMDRadeon</span></a> <a href="https://mastodon.social/tags/Radeon" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Radeon</span></a> <a href="https://mastodon.social/tags/Qualcomm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Qualcomm</span></a> <a href="https://mastodon.social/tags/Apple" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Apple</span></a> <a href="https://mastodon.social/tags/AppleSilicon" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AppleSilicon</span></a> <a href="https://mastodon.social/tags/AppleM4" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AppleM4</span></a> <a href="https://mastodon.social/tags/M4" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>M4</span></a></p>
Microfractal<p>My first good <a href="https://mathstodon.xyz/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a> <a href="https://mathstodon.xyz/tags/Mandelbrot" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Mandelbrot</span></a> <a href="https://mathstodon.xyz/tags/fractal" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fractal</span></a> using <a href="https://mathstodon.xyz/tags/perturbation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>perturbation</span></a>. (Separated from the fragment shader, which does the coloring of the computed iterations.)</p><p>Next step is a formula parser, which generates opencl-code, which can be compiled at runtime.</p><p><a href="https://mathstodon.xyz/tags/fractalfriday" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fractalfriday</span></a></p>
GPUOpen<p>🧐 AMD Radeon GPU Analyzer (RGA) is our performance analysis tool for <a href="https://mastodon.gamedev.place/tags/DirectX" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DirectX</span></a>, <a href="https://mastodon.gamedev.place/tags/Vulkan" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Vulkan</span></a>, SPIR-V, <a href="https://mastodon.gamedev.place/tags/OpenGL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenGL</span></a>, &amp; <a href="https://mastodon.gamedev.place/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a>.<br> <br>✨As well as updates for AMD RDNA 4, there's enhancements to the ISA view UI, using the same updated UI as RGP ✨</p><p>More detail: <a href="https://gpuopen.com/learn/rdna-cdna-architecture-disassembly-radeon-gpu-analyzer-2-12/?utm_source=mastodon&amp;utm_medium=social&amp;utm_campaign=rdts" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">gpuopen.com/learn/rdna-cdna-ar</span><span class="invisible">chitecture-disassembly-radeon-gpu-analyzer-2-12/?utm_source=mastodon&amp;utm_medium=social&amp;utm_campaign=rdts</span></a><br>(🧵5/7)</p>
Dr. Moritz Lehmann<p>Here's my <a href="https://mast.hpc.social/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a> implementation: <a href="https://github.com/ProjectPhysX/FluidX3D/blob/master/src/kernel.cpp#L1924-L1993" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/ProjectPhysX/FluidX</span><span class="invisible">3D/blob/master/src/kernel.cpp#L1924-L1993</span></a></p>
Dr. Moritz Lehmann<p><a href="https://mast.hpc.social/tags/FluidX3D" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FluidX3D</span></a> <a href="https://mast.hpc.social/tags/CFD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CFD</span></a> v3.2 is out! I've implemented the much requested <a href="https://mast.hpc.social/tags/GPU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPU</span></a> summation for object force/torque; it's ~20x faster than <a href="https://mast.hpc.social/tags/CPU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CPU</span></a> <a href="https://mast.hpc.social/tags/multithreading" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>multithreading</span></a>. 🖖😋<br>Horizontal sum in <a href="https://mast.hpc.social/tags/OpenCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenCL</span></a> was a nice exercise - first local memory reduction and then hardware-supported atomic floating-point add in VRAM, in a single-stage kernel. Hammering atomics isn't too bad as each of the ~10-340 workgroups dispatched at a time does only a single atomic add.<br>Also improved volumetric <a href="https://mast.hpc.social/tags/raytracing" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>raytracing</span></a>!<br><a href="https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.2" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/ProjectPhysX/FluidX</span><span class="invisible">3D/releases/tag/v3.2</span></a></p>