Releases · arthw/llama.cpp
b4383
b4137
Merge pull request #5 from arthw/cherry-1118: Cherry 1118
b3555
fix error
b3554
ggml-backend : fix async copy from CPU (#8897)

* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy, fix stream used when the devices are the same
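For context, a minimal sketch of the public ggml-backend calls this fix touches; the wrapper function is an assumption for illustration, and only `ggml_backend_tensor_copy_async` and `ggml_backend_synchronize` are actual library API. Backend creation, tensor allocation, and error handling are assumed to happen elsewhere.

```c
// Hedged sketch, not code from this release: copy a tensor from a CPU
// backend to a device backend without blocking the host.
#include "ggml.h"
#include "ggml-backend.h"

static void copy_cpu_to_device_async(
        ggml_backend_t backend_cpu,   // source backend (CPU)
        ggml_backend_t backend_gpu,   // destination backend (e.g. CUDA)
        struct ggml_tensor * src,     // tensor living in a CPU buffer
        struct ggml_tensor * dst) {   // same-shape tensor in a device buffer
    // Enqueue the copy without blocking the host; the CPU -> device
    // direction is the path #8897 makes reliable.
    ggml_backend_tensor_copy_async(backend_cpu, backend_gpu, src, dst);

    // Block until the copy (and any stream work ordered before it) has
    // finished, so dst can be read safely.
    ggml_backend_synchronize(backend_gpu);
}
```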
b3517
[SYCL] Fixing wrong VDR iq4nl value (#8812)
b3482
Merge pull request #2 from arthw/refactor_dev: Refactor device management and usage API
b3475
llama : add support for llama 3.1 rope scaling factors (#8676)

* Add llama 3.1 rope scaling factors to llama conversion and inference

  This commit generates the rope factors on conversion and adds them to the
  resulting model as a tensor. At inference time, these factors are passed to
  the `ggml_rope_ext` rope operation, improving results for context windows
  above 8192.

* Update convert_hf_to_gguf.py

  Co-authored-by: compilade <git@compilade.net>

* address comments

* address comments

* Update src/llama.cpp

  Co-authored-by: compilade <git@compilade.net>

* Update convert_hf_to_gguf.py

  Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
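A hedged sketch of where those loaded factors reach the rope op; the wrapper name and its parameter list are assumptions for illustration, while `ggml_rope_ext` is the actual ggml API this release note refers to. Its third tensor argument carries the optional per-frequency factors; passing NULL there disables per-frequency scaling.

```c
// Illustrative sketch, not the actual llama.cpp graph-build code: apply
// rope to a tensor using the factors tensor loaded from the GGUF model.
#include "ggml.h"

static struct ggml_tensor * apply_rope_with_factors(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,          // queries or keys to rotate
        struct ggml_tensor  * pos,          // token positions (I32)
        struct ggml_tensor  * rope_factors, // factors tensor from the model
        int n_rot, int mode, int n_ctx_orig,
        float freq_base, float freq_scale,
        float ext_factor, float attn_factor,
        float beta_fast, float beta_slow) {
    // The factors tensor goes in the third tensor slot of ggml_rope_ext.
    return ggml_rope_ext(ctx, cur, pos, rope_factors,
                         n_rot, mode, n_ctx_orig,
                         freq_base, freq_scale,
                         ext_factor, attn_factor,
                         beta_fast, beta_slow);
}
```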
b3388
fix unit test of concat
b3387
move softmax to a separate file
b3313
fix for multiple cards