Releases · arthw/llama.cpp
b4383
b4137
Merge pull request #5 from arthw/cherry-1118: Cherry 1118
b3555
fix error
b3554
ggml-backend : fix async copy from CPU (#8897)

* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy, fix stream used when the devices are the same
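For context, a minimal sketch of the public ggml-backend calls this fix touches; the wrapper function is an assumption for illustration, and only `ggml_backend_tensor_copy_async` and `ggml_backend_synchronize` are actual library API. Backend creation, tensor allocation, and error handling are assumed to happen elsewhere.

```c
// Hedged sketch, not code from this release: copy a tensor from a CPU
// backend to a device backend without blocking the host.
#include "ggml.h"
#include "ggml-backend.h"

static void copy_cpu_to_device_async(
        ggml_backend_t backend_cpu,   // source backend (CPU)
        ggml_backend_t backend_gpu,   // destination backend (e.g. CUDA)
        struct ggml_tensor * src,     // tensor living in a CPU buffer
        struct ggml_tensor * dst) {   // same-shape tensor in a device buffer
    // Enqueue the copy without blocking the host; the CPU -> device
    // direction is the path #8897 makes reliable.
    ggml_backend_tensor_copy_async(backend_cpu, backend_gpu, src, dst);

    // Block until the copy (and any stream work ordered before it) has
    // finished, so dst can be read safely.
    ggml_backend_synchronize(backend_gpu);
}
```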
b3517
[SYCL] Fixing wrong VDR iq4nl value (#8812)
b3482
Merge pull request #2 from arthw/refactor_dev: Refactor device management and usage API
b3475
llama : add support for llama 3.1 rope scaling factors (#8676)

* Add llama 3.1 rope scaling factors to llama conversion and inference

  This commit generates the rope factors on conversion and adds them to the
  resulting model as a tensor. At inference time, these factors are passed to
  the `ggml_rope_ext` rope operation, improving results for context windows
  above 8192.

* Update convert_hf_to_gguf.py

  Co-authored-by: compilade <git@compilade.net>

* address comments

* address comments

* Update src/llama.cpp

  Co-authored-by: compilade <git@compilade.net>

* Update convert_hf_to_gguf.py

  Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
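A hedged sketch of where those loaded factors reach the rope op; the wrapper name and its parameter list are assumptions for illustration, while `ggml_rope_ext` is the actual ggml API this release note refers to. Its third tensor argument carries the optional per-frequency factors; passing NULL there disables per-frequency scaling.

```c
// Illustrative sketch, not the actual llama.cpp graph-build code: apply
// rope to a tensor using the factors tensor loaded from the GGUF model.
#include "ggml.h"

static struct ggml_tensor * apply_rope_with_factors(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,          // queries or keys to rotate
        struct ggml_tensor  * pos,          // token positions (I32)
        struct ggml_tensor  * rope_factors, // factors tensor from the model
        int n_rot, int mode, int n_ctx_orig,
        float freq_base, float freq_scale,
        float ext_factor, float attn_factor,
        float beta_fast, float beta_slow) {
    // The factors tensor goes in the third tensor slot of ggml_rope_ext.
    return ggml_rope_ext(ctx, cur, pos, rope_factors,
                         n_rot, mode, n_ctx_orig,
                         freq_base, freq_scale,
                         ext_factor, attn_factor,
                         beta_fast, beta_slow);
}
```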
b3388
fix unit test of concat
b3387
move softmax to a separate file
b3313
fix for multiple cards