Today, I found a great presentation by @okias about running #OpenCL-accelerated #LLMs on @postmarketOS (see: https://fosdem.org/2024/events/attachments/fosdem-2024-3364-why-not-run-opencl-accelerated-llm-on-your-phone-/slides/22383/Why_not_run_OpenCL-accelerated_LLM_on_your_phon_nK2DudB.pdf). @okias showed a benchmark run on a OnePlus 6. I wonder whether even older hardware could run a functional #LLM. Is there maybe even a list of supported devices?
@LukasBrausch @postmarketOS Thanks. The Adreno 630 is a pretty powerful high-end mobile GPU, and I was running only a GPT-2-level LLM. I think older GPUs exposing OpenCL could still be leveraged to accelerate some simpler tasks (e.g. translators), but bigger LLMs would be out of scope.
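[Editor's note: a quick way to check whether an older phone actually exposes a usable OpenCL device is to enumerate platforms before trying to accelerate anything. A minimal sketch using pyopencl; the RUSTICL_ENABLE hint and package availability on a given postmarketOS device are assumptions on my part, not something stated in this thread.]

```python
# Sketch: list the OpenCL platforms/devices visible on the phone.
# Assumes pyopencl is installed; with Mesa's rusticl the driver may
# need enabling first, e.g. RUSTICL_ENABLE=freedreno (assumption,
# check your Mesa version's documentation).
import pyopencl as cl

for platform in cl.get_platforms():
    print("Platform:", platform.name)
    for device in platform.get_devices():
        # Global memory size gives a rough upper bound on what model fits.
        print("  Device:", device.name,
              "|", device.global_mem_size // (1024 * 1024), "MiB global memory")
```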
@okias @postmarketOS I see. Thanks a lot for the clarification. What about the other extreme? What's the most potent #LLM that can currently be run on a phone (preferably using only free solutions, such as @postmarketOS)? To me, the idea of having an #OpenSource LLM on a phone instead of on a large server farm is just very intriguing.
@LukasBrausch @postmarketOS I think someone mentioned some success with a simple, small GPT-3-level model. I didn't continue pursuing it, because I first want to address a hack I had to do in Mesa for Tinygrad to work. We also need to get the driver part finished and reviewed. I see the "future" more in smart translators and simple single-language speech recognition on these low-power devices.
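[Editor's note: for context, Tinygrad can target an OpenCL device directly, which is presumably the path the Mesa work enables on the Adreno 630. A minimal sketch; using "GPU" as the name of tinygrad's OpenCL backend is my assumption based on its documented backends, and names can change between releases.]

```python
# Sketch: push a small matmul through tinygrad's OpenCL backend.
# "GPU" selects the OpenCL device in tinygrad (assumption; verify
# against the tinygrad release you are running).
from tinygrad import Tensor, Device

print("default device:", Device.DEFAULT)

a = Tensor.rand(256, 256, device="GPU")
b = Tensor.rand(256, 256, device="GPU")
c = (a @ b).numpy()  # forces the OpenCL kernels to compile and run
print("result shape:", c.shape)
```

[If this runs on the phone's GPU, the same backend is what a small GPT-2/GPT-3-class model would ride on.]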
@okias @postmarketOS Thanks a lot for your insights and the hard work you're doing here. Much appreciated.