• WhyJiffie@sh.itjust.works · 9 hours ago

    You can’t just do “ollama run” and expect good performance, as the local LLM scene is finicky and highly experimental. You have to compile forks and PRs, learn about sampling and chat formatting, perplexity and KL divergence, about quantization and MoEs and benchmarking. Everything is moving too fast, and is too performance sensitive, to make it that easy, unfortunately.

    how do you find the time to figure all this out and stay up to date? do you do this at work?

    • brucethemoose@lemmy.world · edited · 1 hour ago

      As a hobby mostly, but it's useful for work. I found LLMs fascinating even before the hype, when everyone was trying to get GPT-J finetunes named after Star Trek characters to run.

      Reading my own quote, I was being a bit dramatic. But at the very least it is super important to grasp some basic concepts (like MoE CPU offloading, quantization, and specs of your own hardware), and watch for new releases in LocalLlama or whatever. You kinda do have to follow and test things, yes, as there’s tons of FUD in open weights AI land.
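
      Roughly, the napkin math behind MoE CPU offloading looks like this. A minimal sketch of the arithmetic, where every number is a made-up placeholder rather than any real model's specs:

```python
# Back-of-the-envelope memory math for running a quantized MoE model locally.
# All numbers are illustrative placeholders, not the specs of any real model.

def weight_gib(params_billions, bits_per_weight):
    """Approximate size of the weights in GiB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

total_params_b  = 100   # total parameters (billions) of a hypothetical MoE
dense_params_b  = 10    # attention + shared layers kept resident on the GPU
bits_per_weight = 4.5   # rough average for a ~4-bit quant
vram_gib        = 24    # single consumer GPU
ram_gib         = 128   # system RAM
kv_overhead_gib = 6     # guess for KV cache, context, and runtime buffers

total_weights  = weight_gib(total_params_b, bits_per_weight)
dense_weights  = weight_gib(dense_params_b, bits_per_weight)
expert_weights = total_weights - dense_weights

all_on_gpu   = total_weights + kv_overhead_gib <= vram_gib
offload_plan = (dense_weights + kv_overhead_gib <= vram_gib
                and expert_weights <= ram_gib)

print(f"quantized weights total: {total_weights:6.1f} GiB")
print(f"fits entirely in VRAM?   {all_on_gpu}")
print(f"offload plan: {dense_weights + kv_overhead_gib:5.1f} GiB on GPU, "
      f"{expert_weights:5.1f} GiB of experts in RAM -> fits: {offload_plan}")
```

      That kind of calculation is what tells you whether a given model is even worth downloading for your box, before you ever touch a runtime flag.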


      As an example, stepfun 2.5 seems to be a great model for my hardware (single Nvidia GPU + 128GB CPU RAM), and it could easily have flown under the radar if I weren't following things. I also wouldn't have known to run it with ik_llama.cpp instead of mainline llama.cpp, which gives a considerable speed/quality boost over (say) LM Studio.
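
      You don't have to take speed claims like that on faith, either: llama-server (and, as far as I know, ik_llama.cpp's fork of it, and LM Studio) expose an OpenAI-style local API, so a crude tokens-per-second comparison is just timing a completion against each backend. A minimal sketch, assuming a server on llama-server's default port; the URL and model name are placeholders for whatever your setup reports:

```python
# Crude tokens/sec check against a local OpenAI-compatible endpoint
# (llama-server, LM Studio, etc.). URL and model name are placeholders.
import time
import requests

URL = "http://127.0.0.1:8080/v1/chat/completions"  # llama-server's default port
payload = {
    "model": "local-model",  # most local servers ignore or echo this field
    "messages": [{"role": "user",
                  "content": "Summarize MoE offloading in one paragraph."}],
    "max_tokens": 256,
    "temperature": 0.0,      # keep generation comparable between backends
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=600).json()
elapsed = time.time() - start

generated = resp["usage"]["completion_tokens"]  # OpenAI-style usage block
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s "
      f"(includes prompt processing)")
```

      Same prompt, same quant, same context length on both backends, or the comparison is meaningless.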

      If I were to google all this now, I'd probably still get links from Tech Bro YouTubers for setting up the Deepseek distillations. Those models are now dreadfully slow and long obsolete.