llama : add support for llama 3.1 rope scaling factors (#8676)
* Add llama 3.1 rope scaling factors to llama conversion and inference
This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope oepration, improving results for context windows above 8192
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net>
* address comments
* address comments
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net>
---------
Co-authored-by: compilade <git@compilade.net>