Releases: arthw/llama.cpp
b3187
metal : fix `ggml_metal_supports_op` for BF16 (#8021)

The Metal backend does not currently support BF16, yet `ggml_metal_supports_op` was returning true in these cases, leading to a crash with models converted with `--leave-output-tensor`. This commit checks whether any of the first few source types are BF16 and returns false if that's the case.
b3163
Add support for sqrt on CUDA (#7953)

* cuda sqrt support
* enable cuda in pca
* fix comments in pca
* add test
* add sqrt to ggml_backend_cuda_supports_op
* fix test
* new line
* Use F32 sqrtf instead of F64 sqrt

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
b3151
ci : fix macos x86 build (#7940)

To keep the environment that the old `macos-latest` label provided, we should pin `macos-12`. Potentially fixes: https://github.com/ggerganov/llama.cpp/issues/6975
b3145
rpc : fix ggml_backend_rpc_supports_buft() (#7918)
b3014
update HIP_UMA #7399 (#7414)

* update HIP_UMA #7399: use `hipMemAdviseSetCoarseGrain` when LLAMA_HIP_UMA is enabled. Gives ~2x on prompt eval and ~1.5x on token gen with ROCm 6.0 on a Ryzen 7940HX iGPU (780M/gfx1103)
* simplify code, more consistent style

Co-authored-by: slaren <slarengh@gmail.com>
b2986
readme : remove trailing space (#7469)
b2953
Tokenizer SPM fixes for phi-3 and llama-spm (#7375)

* Update brute force test: special tokens
* Fix added tokens
  - Try to read 'added_tokens.json'.
  - Try to read 'tokenizer_config.json'.
  - Try to read 'tokenizer.json'.
* Fix special tokens rtrim
* server : fix test regexes

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
update_oneapi_2024.1-b2861-69a0609
update CI with oneapi 2024.1
add_oneapi_runtime-b2866-6cf75b2
fix path
add_oneapi_runtime-b2865-d2ca97b
fix path