Releases: arthw/llama.cpp

b4383

20 Dec 16:27
258e80f
Merge pull request #6 from arthw/cherry-1220

Cherry 1220

b4137

19 Nov 02:48
8dcc98f
Merge pull request #5 from arthw/cherry-1118

Cherry 1118

b3555

07 Aug 16:34
fix error

b3554

07 Aug 16:26
ggml-backend : fix async copy from CPU (#8897)

* ggml-backend : fix async copy from CPU

* cuda : more reliable async copy, fix stream used when the devices are the same
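
For background, `cudaMemcpyAsync` only overlaps with other work when the host buffer is pinned (page-locked); copies from pageable memory silently fall back to synchronous behavior. The helper below is a minimal illustrative sketch of that staging pattern, not the actual #8897 patch; the name `copy_from_cpu_async` and its signature are hypothetical.

```cpp
#include <cuda_runtime.h>
#include <cstring>

// Hypothetical helper, not the ggml-backend code: stage pageable host
// memory through a pinned buffer so cudaMemcpyAsync can run truly
// asynchronously on the given stream.
static cudaError_t copy_from_cpu_async(void * dst_dev, const void * src_host,
                                       size_t size, cudaStream_t stream) {
    void * staging = nullptr;
    cudaError_t err = cudaMallocHost(&staging, size); // pinned allocation
    if (err != cudaSuccess) {
        // fall back to a plain synchronous copy if pinning fails
        return cudaMemcpy(dst_dev, src_host, size, cudaMemcpyHostToDevice);
    }
    std::memcpy(staging, src_host, size);             // CPU-side staging
    err = cudaMemcpyAsync(dst_dev, staging, size,
                          cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);                    // staging must outlive the copy
    cudaFreeHost(staging);
    return err;
}
```

A production backend would recycle pinned staging buffers and fence with events rather than synchronizing the stream before every free.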

b3517

02 Aug 05:48
[SYCL] Fixing wrong VDR iq4nl value (#8812)

b3482

01 Aug 06:57
c16f01b
Merge pull request #2 from arthw/refactor_dev

Refactor device management and usage api

b3475

27 Jul 14:49
llama : add support for llama 3.1 rope scaling factors (#8676)

* Add llama 3.1 rope scaling factors to llama conversion and inference

This commit generates the rope factors at conversion time and stores them in the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows longer than 8192 tokens (a sketch of the factor computation follows this entry).

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* address comments

* address comments

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
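
For reference, the Llama 3.1 scheme partitions the rotary dimensions by wavelength relative to the original 8192-token training context: high-frequency dimensions are left unscaled, low-frequency ones get the full scaling factor, and a smooth ramp interpolates in between. The sketch below reconstructs that computation in C++ under stated assumptions: the constants are the published Llama 3.1 defaults (factor 8, low_freq_factor 1, high_freq_factor 4, original context 8192), and the function name `llama31_rope_factors` is hypothetical; in the repo the factors are produced by convert_hf_to_gguf.py during conversion.

```cpp
#include <cmath>
#include <vector>

// Hypothetical C++ reconstruction of the Llama 3.1 rope-factor computation.
// Each returned value is the divisor later applied to that dimension's theta.
std::vector<float> llama31_rope_factors(int n_rot, float freq_base) {
    const float scale_factor     = 8.0f;    // rope_scaling.factor
    const float low_freq_factor  = 1.0f;    // rope_scaling.low_freq_factor
    const float high_freq_factor = 4.0f;    // rope_scaling.high_freq_factor
    const float old_ctx_len      = 8192.0f; // original max position embeddings
    const float pi               = 3.14159265358979f;

    const float low_freq_wavelen  = old_ctx_len / low_freq_factor;
    const float high_freq_wavelen = old_ctx_len / high_freq_factor;

    std::vector<float> factors;
    for (int i = 0; i < n_rot; i += 2) {
        const float freq    = 1.0f / std::pow(freq_base, (float) i / (float) n_rot);
        const float wavelen = 2.0f * pi / freq;
        if (wavelen < high_freq_wavelen) {
            factors.push_back(1.0f);          // high-frequency dims: unscaled
        } else if (wavelen > low_freq_wavelen) {
            factors.push_back(scale_factor);  // low-frequency dims: full scaling
        } else {
            // smooth interpolation between the two regimes
            const float smooth = (old_ctx_len / wavelen - low_freq_factor)
                               / (high_freq_factor - low_freq_factor);
            factors.push_back(1.0f / ((1.0f - smooth) / scale_factor + smooth));
        }
    }
    return factors;
}
```

The resulting vector (one entry per dimension pair) is what the release note describes being stored in the model and handed to `ggml_rope_ext` as its frequency-factors tensor.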

b3388

14 Jul 04:09
fix UT of concat

b3387

13 Jul 18:00
mv softmax to separated file

b3313

13 Jul 17:38
fix for multiple cards