• 1 Post
  • 7 Comments
Joined 3 months ago
cake
Cake day: August 22nd, 2024

help-circle



  • I tried llama.cpp with llama-server and Qwen2.5 Coder 1.5B. Higher parameters just output garbage and I can see an OutOfMemory error in the logs. When trying the 1.5B model, I have an issue where the model will just stop outputting the answer, it will stop mid sentence or in the middle of a class. Is it an issue with my hardware not being performant enough or is it something I can tweak with some parameters?