Add Proxy and SSL Config Options to Python SDK #3180
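This PR threads HTTP proxy and SSL-verification settings through the Python SDK's network calls. A minimal usage sketch, assuming the constructor exposes the `proxies` and `verify_ssl` keyword arguments that the diff below threads through `list_models` and `retrieve_model` (the model file name and proxy URL are placeholders, and the exact public signature may differ):

```python
from gpt4all import GPT4All

# Hypothetical corporate proxy; both schemes use the same endpoint here.
proxies = {
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
}

# proxies is forwarded to requests.get(..., proxies=...) and verify_ssl to
# requests.get(..., verify=...) when listing and downloading models.
model = GPT4All(
    "Meta-Llama-3-8B-Instruct.Q4_0.gguf",  # placeholder model file
    proxies=proxies,
    verify_ssl=False,                      # only for self-signed certificates
)
print(model.generate("Hello", max_tokens=16))
```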

Open. Wants to merge 112 commits into base: main. Showing changes from 1 commit.

Commits (112)
e8003c8
Add proxy and SSL config options
cgivre Nov 11, 2024
a00b012
Added args to additional method
cgivre Nov 11, 2024
88ef003
Remove binary state from high-level API and use Jinja templates (#3147)
cebtenzzre Nov 25, 2024
c7c99a1
Fixups for Jinja PR (#3215)
cebtenzzre Dec 4, 2024
0ae1ae3
ci: do not run online installer or publish jobs on PR branches (#3217)
cebtenzzre Dec 4, 2024
1ed30da
llamamodel: add missing softmax to fix temperature (#3202)
cebtenzzre Dec 4, 2024
2cad0d7
chat: cut v3.5.0-rc1 release candidate (#3218)
cebtenzzre Dec 4, 2024
87b5127
add changelog entries for Jinja PR (#3223)
cebtenzzre Dec 6, 2024
49363ed
changelog: add more changes from #3147 (#3226)
cebtenzzre Dec 6, 2024
db4d975
Animate the removal of chat items when editing prompts. (#3227)
manyoso Dec 6, 2024
4807e6a
qml: tweaks to new edit/redo buttons (#3228)
cebtenzzre Dec 6, 2024
3b26a65
chat: cut v3.5.0-rc2 release candidate (#3229)
cebtenzzre Dec 6, 2024
a1e38da
chat: run update_translations for v3.5.0 (#3230)
cebtenzzre Dec 6, 2024
7a71600
changelog: fix parenthesis
cebtenzzre Dec 9, 2024
f325cea
Italian localization update (#3236)
Harvester62 Dec 9, 2024
38c1ab2
fixups for GPT4All v3.5.0-rc2 (#3239)
cebtenzzre Dec 9, 2024
3912990
update Romanian translation for v3.5.0 (#3232)
SINAPSA-IC Dec 9, 2024
9a64f52
chat: cut v3.5.0 release (#3240)
cebtenzzre Dec 9, 2024
d11e18c
chat: release v3.5.0 (#3241)
cebtenzzre Dec 9, 2024
f5cee70
Bump version to v3.5.1-dev0 (#3242)
manyoso Dec 9, 2024
ee9dd88
chatmodel: fix incorrect currentResponse argument (#3245)
cebtenzzre Dec 9, 2024
d8f141a
Fix the z-ordering of the home button. (#3246)
manyoso Dec 9, 2024
0107a8c
metadata: fix typos in release notes
cebtenzzre Dec 10, 2024
6077c39
fix several bad chat templates (#3250)
cebtenzzre Dec 10, 2024
e647581
models3: fix Llama 3.2 chat template (#3251)
cebtenzzre Dec 10, 2024
66c9ffe
changelog: add PR #3251
cebtenzzre Dec 10, 2024
b5d67d1
Update changlog and version to make 3.5.1 hotfix release. (#3252)
manyoso Dec 10, 2024
337afa0
Release notes and latestnews for v3.5.1. (#3253)
manyoso Dec 10, 2024
52e8ea4
Bump the version to 3.5.2-dev0. (#3254)
manyoso Dec 10, 2024
167e0de
latestnews: make it more compact
cebtenzzre Dec 12, 2024
cc30175
Fix local server regressions caused by Jinja PR (#3256)
cebtenzzre Dec 13, 2024
e4b0a8d
modellist: fix cloning of chat template and system message (#3262)
cebtenzzre Dec 13, 2024
816158b
StartupDialog: fix two untranslated strings (#3293)
cebtenzzre Dec 13, 2024
b988e82
Break the explore models view into two. (#3269)
manyoso Dec 13, 2024
9eea8b7
chat: cut v3.5.2 release (#3292)
cebtenzzre Dec 13, 2024
383a99b
fix chatmodel.h #includes
cebtenzzre Dec 13, 2024
3218466
ci: attempt to fix Ubuntu build
cebtenzzre Dec 13, 2024
c6f01e0
chat: release version 3.5.2 (#3296)
cebtenzzre Dec 14, 2024
200c5a9
chat: fix localdocs breakage in v3.5.2 (#3302)
cebtenzzre Dec 16, 2024
6fedb79
New v3.5.3 hotfix release. (#3304)
manyoso Dec 16, 2024
e1a9048
ci: downgrade Windows image to fix build (#3306)
cebtenzzre Dec 16, 2024
0b029fa
chat: release version 3.5.3 (#3307)
cebtenzzre Dec 16, 2024
cf342e1
chat: bump version to 3.5.4-dev0
cebtenzzre Dec 16, 2024
cb00613
Update maintainers. (#3322)
manyoso Dec 18, 2024
11c285f
Fix for remote model templates when messages contain xml. (#3318)
manyoso Dec 18, 2024
6b1a140
Fix Jinja2Cpp bug that broke system msg detection in templates (#3325)
cebtenzzre Dec 19, 2024
1ba56b2
chatmodel: fix sources showing as unconsolidated in UI (#3328)
cebtenzzre Dec 19, 2024
33d7166
Code interpreter (#3173)
manyoso Dec 19, 2024
0f95f7c
modellist: automatically replace known chat templates with our versio…
cebtenzzre Dec 19, 2024
0c2c15e
undo unintentional partial revert of #3173
cebtenzzre Dec 19, 2024
b2b9be4
Release of 3.6.0. (#3329)
manyoso Dec 19, 2024
ac08448
qml: fix missing localdocs and prefill progress (#3330)
cebtenzzre Dec 19, 2024
0e5e4ce
Release notes and latestnews for v3.6.0, and bump version. (#3331)
manyoso Dec 19, 2024
d0d857d
ChatView: make "stop" and "copy conversation" work again (#3336)
manyoso Dec 20, 2024
a49c2cf
Release notes for v3.6.1 and bump version (#3339)
manyoso Dec 20, 2024
6499f27
updated settings page (#3368)
mcembalest Jan 7, 2025
5b8bd22
fix: format of language and locale setting (#3370)
mcembalest Jan 7, 2025
68d4ed7
Properly report that the computation was timedout to the model (#3369)
manyoso Jan 7, 2025
1cb34d1
code interpreter: support variadic console.log (#3371)
cebtenzzre Jan 7, 2025
5cd0bd1
chat templates: work around Jinja2Cpp issue with 'not X is defined' (…
cebtenzzre Jan 7, 2025
8ffac8f
jinja2cpp: update submodule for else/endif crash fix (#3373)
cebtenzzre Jan 8, 2025
dd0cef2
Update README.md - brokenlink (#3380)
AndriyMulyar Jan 10, 2025
4c1c026
Save chats on quit, even if window isn't closed first (#3387)
cebtenzzre Jan 16, 2025
ced514c
ci: use the shared 'gpt4all' context for environment variables (#3392)
cebtenzzre Jan 17, 2025
e629721
Add more chat template substitutions (#3393)
cebtenzzre Jan 21, 2025
7336de3
jinja2cpp: update submodule for partial subscript crash fix (#3394)
cebtenzzre Jan 21, 2025
0acdeee
Sign maintenancetool.app on macOS (#3391)
cebtenzzre Jan 21, 2025
e509457
add Windows ARM build (#3385)
cebtenzzre Jan 21, 2025
2ad4aea
ci: add missing context to Windows ARM builds (#3400)
cebtenzzre Jan 21, 2025
db36e30
Italian localization update (#3389)
Harvester62 Jan 21, 2025
f6399a3
jinja2cpp: update submodule for 'not X is defined' fix (#3402)
cebtenzzre Jan 21, 2025
f0ebabd
jinja2cpp: update submodule to fix unused var (#3403)
cebtenzzre Jan 22, 2025
5ee9b97
Bump version for 3.7.0 release. (#3401)
manyoso Jan 21, 2025
58515cf
changelog: add missing link
cebtenzzre Jan 22, 2025
35d9936
changelog: fix reference to wrong macOS version
cebtenzzre Jan 22, 2025
c13b71f
ci: fix macOS codesigning (#3408)
cebtenzzre Jan 23, 2025
8023ba2
chat: release version 3.7.0 (#3407)
cebtenzzre Jan 23, 2025
9809a2a
metadata: fix typo
cebtenzzre Jan 23, 2025
9641b47
chat: bump version to v3.7.1-dev0
cebtenzzre Jan 23, 2025
881ac19
Fix regression while using localdocs with server API. (#3410)
manyoso Jan 24, 2025
2230628
Server view fix (#3411)
manyoso Jan 24, 2025
45a171c
Update to Qt 6.8.1 (#3386)
cebtenzzre Jan 24, 2025
62a5623
cmake: do not modify gpt4all.app after signing it (#3413)
cebtenzzre Jan 24, 2025
722dcb0
Revert "cmake: do not modify gpt4all.app after signing it (#3413)"
cebtenzzre Jan 24, 2025
919b415
cmake: do not modify gpt4all.app after signing it (#3417)
cebtenzzre Jan 24, 2025
a71fed7
codeinterpreter: permit console.log with single string arg (#3426)
cebtenzzre Jan 27, 2025
b5670ae
[Jinja] Fix typo in Phi-3.1-mini-128k-instruct replacement template (…
ThiloteE Jan 28, 2025
e607840
ci: selective signing and automatic release builds (#3430)
cebtenzzre Jan 28, 2025
32badd2
Support DeepSeek-R1 Qwen (#3431)
cebtenzzre Jan 29, 2025
4207680
ci: verify that installers we build function and are signed (#3432)
cebtenzzre Jan 29, 2025
0db1651
Don't block the gui thread for tool calls (#3435)
manyoso Jan 29, 2025
92ada07
ci: build offline installers when pipeline is scheduled (#3436)
cebtenzzre Jan 30, 2025
44b059b
chat: bump version to 3.8.0-dev0
cebtenzzre Jan 30, 2025
0ea95f3
ci: add missing signing holds to Windows ARM builds
cebtenzzre Jan 30, 2025
4e0fda7
chat: replace Jinja2Cpp with minja (#3433)
cebtenzzre Jan 30, 2025
f8b65c5
Display DeepSeek-R1 thinking like Reasoner (#3440)
manyoso Jan 30, 2025
8e055e9
models: add DeepSeek-R1 distillations to official models list (#3437)
cebtenzzre Jan 30, 2025
f8d5224
chat: cut v3.8.0 release (#3441)
cebtenzzre Jan 30, 2025
6b76d6d
ci: remove conflicting pipeline.git.branch requirement
cebtenzzre Jan 30, 2025
436e6de
ci: fix missing job_allow_tags
cebtenzzre Jan 30, 2025
1a11860
ci: allow generate-config to run on tags
cebtenzzre Jan 30, 2025
8c65950
chat: fix emoji corruption (#3443)
cebtenzzre Jan 30, 2025
81f9624
remove ancient README
cebtenzzre Jan 31, 2025
f348050
chat: release version 3.8.0 (#3439)
cebtenzzre Jan 31, 2025
02ee873
ci: update to Qt 6.8.2 (#3442)
cebtenzzre Jan 31, 2025
568c8a1
cmake: remove reference to deleted README
cebtenzzre Jan 31, 2025
3d01855
Fix index used by LocalDocs when tool calling/thinking is active (#3451)
cebtenzzre Feb 3, 2025
7cdc0bc
minja: update submodule to fix `{#` hang (#3446)
cebtenzzre Feb 3, 2025
31753d4
chat: work around Direct3D 11 rendering artifacts on win11 arm (#3450)
cebtenzzre Feb 3, 2025
498e651
Revert "minja: update submodule to fix `{#` hang (#3446)"
cebtenzzre Feb 3, 2025
5bfc071
Update README.md
AndriyMulyar Feb 3, 2025
668214a
Fix rebase conflicts
cgivre Feb 4, 2025
Fix rebase conflicts
cgivre committed Feb 4, 2025

Verified: this commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
commit 668214aeb7146957f8cc2636e94826ecf1a96a6f
295 changes: 162 additions & 133 deletions gpt4all-bindings/python/gpt4all/gpt4all.py
@@ -4,66 +4,37 @@
from __future__ import annotations

import hashlib
import json
import os
import platform
import re
import sys
import warnings
from contextlib import contextmanager
from datetime import datetime
from pathlib import Path
from types import TracebackType
from typing import TYPE_CHECKING, Any, Iterable, Iterator, Literal, NamedTuple, NoReturn, Protocol, TypedDict, overload
from typing import TYPE_CHECKING, Any, Iterable, Literal, Protocol, overload

import jinja2
import requests
from jinja2.sandbox import ImmutableSandboxedEnvironment
from requests.exceptions import ChunkedEncodingError
from tqdm import tqdm
from urllib3.exceptions import IncompleteRead, ProtocolError

from ._pyllmodel import (CancellationError as CancellationError, EmbCancelCallbackType, EmbedResult as EmbedResult,
LLModel, ResponseCallbackType, _operator_call, empty_response_callback)
LLModel, ResponseCallbackType, empty_response_callback)

if TYPE_CHECKING:
from typing_extensions import Self, TypeAlias

if sys.platform == "darwin":
if sys.platform == 'darwin':
import fcntl

# TODO: move to config
DEFAULT_MODEL_DIRECTORY = Path.home() / ".cache" / "gpt4all"

ConfigType: TypeAlias = "dict[str, Any]"
DEFAULT_PROMPT_TEMPLATE = "### Human:\n{0}\n\n### Assistant:\n"

# Environment setup adapted from HF transformers
@_operator_call
def _jinja_env() -> ImmutableSandboxedEnvironment:
def raise_exception(message: str) -> NoReturn:
raise jinja2.exceptions.TemplateError(message)

def tojson(obj: Any, indent: int | None = None) -> str:
return json.dumps(obj, ensure_ascii=False, indent=indent)

def strftime_now(fmt: str) -> str:
return datetime.now().strftime(fmt)

env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
env.filters["tojson" ] = tojson
env.globals["raise_exception"] = raise_exception
env.globals["strftime_now" ] = strftime_now
return env


class MessageType(TypedDict):
role: str
content: str


class ChatSession(NamedTuple):
template: jinja2.Template
history: list[MessageType]
ConfigType: TypeAlias = 'dict[str, Any]'
MessageType: TypeAlias = 'dict[str, str]'


class Embed4All:
@@ -83,7 +54,7 @@ def __init__(self, model_name: str | None = None, *, n_threads: int | None = Non
kwargs: Remaining keyword arguments are passed to the `GPT4All` constructor.
"""
if model_name is None:
model_name = "all-MiniLM-L6-v2.gguf2.f16.gguf"
model_name = 'all-MiniLM-L6-v2.gguf2.f16.gguf'
self.gpt4all = GPT4All(model_name, n_threads=n_threads, device=device, **kwargs)

def __enter__(self) -> Self:
@@ -174,18 +145,18 @@ def embed(
dimensionality = -1
else:
if dimensionality <= 0:
raise ValueError(f"Dimensionality must be None or a positive integer, got {dimensionality}")
raise ValueError(f'Dimensionality must be None or a positive integer, got {dimensionality}')
if dimensionality < self.MIN_DIMENSIONALITY:
warnings.warn(
f"Dimensionality {dimensionality} is less than the suggested minimum of {self.MIN_DIMENSIONALITY}."
" Performance may be degraded."
f'Dimensionality {dimensionality} is less than the suggested minimum of {self.MIN_DIMENSIONALITY}.'
' Performance may be degraded.'
)
try:
do_mean = {"mean": True, "truncate": False}[long_text_mode]
except KeyError:
raise ValueError(f"Long text mode must be one of 'mean' or 'truncate', got {long_text_mode!r}")
result = self.gpt4all.model.generate_embeddings(text, prefix, dimensionality, do_mean, atlas, cancel_cb)
return result if return_dict else result["embeddings"]
return result if return_dict else result['embeddings']
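
Because `Embed4All` forwards extra keyword arguments to the `GPT4All` constructor, the networking options this PR adds would also cover the embedding model download. A sketch under that assumption (the proxy URL is a placeholder):

```python
from gpt4all import Embed4All

embedder = Embed4All(
    # Uses the default 'all-MiniLM-L6-v2.gguf2.f16.gguf' embedding model;
    # proxies/verify_ssl are assumed to reach GPT4All via **kwargs.
    proxies={"https": "http://proxy.example.com:8080"},
    verify_ssl=True,
)
vector = embedder.embed("Proxies route traffic through an intermediary server.")
print(len(vector))  # embedding dimensionality
```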


class GPT4All:
@@ -239,7 +210,6 @@ def __init__(
if proxies is None:
proxies = {}
self.model_type = model_type
self._chat_session: ChatSession | None = None
self._history: list[MessageType] | None = None
self._current_prompt_template: str = "{0}"
self._proxies = proxies
@@ -302,13 +272,7 @@ def device(self) -> str | None:

@property
def current_chat_session(self) -> list[MessageType] | None:
return None if self._chat_session is None else self._chat_session.history

@current_chat_session.setter
def current_chat_session(self, history: list[MessageType]) -> None:
if self._chat_session is None:
raise ValueError("current_chat_session may only be set when there is an active chat session")
self._chat_session.history[:] = history
return None if self._history is None else list(self._history)

@staticmethod
def list_models(
@@ -330,7 +294,7 @@ def list_models(

resp = requests.get("https://gpt4all.io/models/models3.json", proxies=proxies, verify=verify_ssl)
if resp.status_code != 200:
raise ValueError(f"Request failed: HTTP {resp.status_code} {resp.reason}")
raise ValueError(f'Request failed: HTTP {resp.status_code} {resp.reason}')
return resp.json()
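
The catalog fetch above can be exercised directly; a sketch of calling the static method through a proxy, with keyword names taken from the forwarding call in `retrieve_model` below (the proxy URL is a placeholder):

```python
from gpt4all import GPT4All

models = GPT4All.list_models(
    proxies={"https": "http://proxy.example.com:8080"},
    verify_ssl=True,
)
print(models[0]["filename"])
```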

@classmethod
@@ -366,7 +330,7 @@ def retrieve_model(
# get the config for the model
config: ConfigType = {}
if allow_download:
models = cls.list_models()
models = cls.list_models(proxies=proxies, verify_ssl=verify_ssl)
if (model := next((m for m in models if m["filename"] == model_filename), None)) is not None:
config.update(model)

@@ -437,13 +401,13 @@ def make_request(offset=None):
headers = {}
if offset:
print(f"\nDownload interrupted, resuming from byte position {offset}", file=sys.stderr)
headers["Range"] = f"bytes={offset}-" # resume incomplete response
headers['Range'] = f'bytes={offset}-' # resume incomplete response
headers["Accept-Encoding"] = "identity" # Content-Encoding changes meaning of ranges
response = requests.get(url, stream=True, headers=headers, proxies=proxies, verify=verify_ssl)
if response.status_code not in (200, 206):
raise ValueError(f"Request failed: HTTP {response.status_code} {response.reason}")
if offset and (response.status_code != 206 or str(offset) not in response.headers.get("Content-Range", "")):
raise ValueError("Connection was interrupted and server does not support range requests")
raise ValueError(f'Request failed: HTTP {response.status_code} {response.reason}')
if offset and (response.status_code != 206 or str(offset) not in response.headers.get('Content-Range', '')):
raise ValueError('Connection was interrupted and server does not support range requests')
if (enc := response.headers.get("Content-Encoding")) is not None:
raise ValueError(f"Expected identity Content-Encoding, got {enc}")
return response
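
The resume handling above is a standard ranged GET; a self-contained sketch of the same pattern outside the SDK (the URL would be a placeholder when called):

```python
import requests

def ranged_get(url: str, offset: int = 0, proxies=None, verify_ssl=True) -> requests.Response:
    """Start or resume a streamed download from `offset` bytes in."""
    headers = {}
    if offset:
        headers["Range"] = f"bytes={offset}-"    # resume an incomplete response
        headers["Accept-Encoding"] = "identity"  # ranges apply to the raw bytes
    resp = requests.get(url, stream=True, headers=headers,
                        proxies=proxies, verify=verify_ssl)
    if resp.status_code not in (200, 206):
        raise ValueError(f"Request failed: HTTP {resp.status_code} {resp.reason}")
    if offset and (resp.status_code != 206
                   or str(offset) not in resp.headers.get("Content-Range", "")):
        raise ValueError("Connection was interrupted and server does not support range requests")
    return resp
```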
@@ -542,19 +506,19 @@ def generate(

def generate(
self,
prompt : str,
prompt: str,
*,
max_tokens : int = 200,
temp : float = 0.7,
top_k : int = 40,
top_p : float = 0.4,
min_p : float = 0.0,
repeat_penalty : float = 1.18,
repeat_last_n : int = 64,
n_batch : int = 8,
n_predict : int | None = None,
streaming : bool = False,
callback : ResponseCallbackType = empty_response_callback,
max_tokens: int = 200,
temp: float = 0.7,
top_k: int = 40,
top_p: float = 0.4,
min_p: float = 0.0,
repeat_penalty: float = 1.18,
repeat_last_n: int = 64,
n_batch: int = 8,
n_predict: int | None = None,
streaming: bool = False,
callback: ResponseCallbackType = empty_response_callback,
) -> Any:
"""
Generate outputs from any GPT4All model.
@@ -579,94 +543,122 @@ def generate(

# Preparing the model request
generate_kwargs: dict[str, Any] = dict(
temp = temp,
top_k = top_k,
top_p = top_p,
min_p = min_p,
repeat_penalty = repeat_penalty,
repeat_last_n = repeat_last_n,
n_batch = n_batch,
n_predict = n_predict if n_predict is not None else max_tokens,
temp=temp,
top_k=top_k,
top_p=top_p,
min_p=min_p,
repeat_penalty=repeat_penalty,
repeat_last_n=repeat_last_n,
n_batch=n_batch,
n_predict=n_predict if n_predict is not None else max_tokens,
)

# Prepare the callback, process the model response
full_response = ""

def _callback_wrapper(token_id: int, response: str) -> bool:
nonlocal full_response
full_response += response
return callback(token_id, response)

last_msg_rendered = prompt
if self._chat_session is not None:
session = self._chat_session
def render(messages: list[MessageType]) -> str:
return session.template.render(
messages=messages,
add_generation_prompt=True,
**self.model.special_tokens_map,
if self._history is not None:
# check if there is only one message, i.e. system prompt:
reset = len(self._history) == 1
self._history.append({"role": "user", "content": prompt})

fct_func = self._format_chat_prompt_template.__func__ # type: ignore[attr-defined]
if fct_func is GPT4All._format_chat_prompt_template:
if reset:
# ingest system prompt
# use "%1%2" and not "%1" to avoid implicit whitespace
self.model.prompt_model(self._history[0]["content"], "%1%2",
empty_response_callback,
n_batch=n_batch, n_predict=0, reset_context=True, special=True)
prompt_template = self._current_prompt_template.format("%1", "%2")
else:
warnings.warn(
"_format_chat_prompt_template is deprecated. Please use a chat session with a prompt template.",
DeprecationWarning,
)
# special tokens won't be processed
prompt = self._format_chat_prompt_template(
self._history[-1:],
self._history[0]["content"] if reset else "",
)
session.history.append(MessageType(role="user", content=prompt))
prompt = render(session.history)
if len(session.history) > 1:
last_msg_rendered = render(session.history[-1:])
prompt_template = "%1"
generate_kwargs["reset_context"] = reset
else:
prompt_template = "%1"
generate_kwargs["reset_context"] = True

# Prepare the callback, process the model response
output_collector: list[MessageType]
output_collector = [
{"content": ""}
] # placeholder for the self._history if chat session is not activated

if self._history is not None:
self._history.append({"role": "assistant", "content": ""})
output_collector = self._history

def _callback_wrapper(
callback: ResponseCallbackType,
output_collector: list[MessageType],
) -> ResponseCallbackType:
def _callback(token_id: int, response: str) -> bool:
nonlocal callback, output_collector

output_collector[-1]["content"] += response

return callback(token_id, response)

# Check request length
last_msg_len = self.model.count_prompt_tokens(last_msg_rendered)
if last_msg_len > (limit := self.model.n_ctx - 4):
raise ValueError(f"Your message was too long and could not be processed ({last_msg_len} > {limit}).")
return _callback

# Send the request to the model
if streaming:
def stream() -> Iterator[str]:
yield from self.model.prompt_model_streaming(prompt, _callback_wrapper, **generate_kwargs)
if self._chat_session is not None:
self._chat_session.history.append(MessageType(role="assistant", content=full_response))
return stream()
return self.model.prompt_model_streaming(
prompt,
prompt_template,
_callback_wrapper(callback, output_collector),
**generate_kwargs,
)

self.model.prompt_model(
prompt,
prompt_template,
_callback_wrapper(callback, output_collector),
**generate_kwargs,
)

self.model.prompt_model(prompt, _callback_wrapper, **generate_kwargs)
if self._chat_session is not None:
self._chat_session.history.append(MessageType(role="assistant", content=full_response))
return full_response
return output_collector[-1]["content"]
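
For reference, streaming generation with the reworked callback plumbing looks roughly like this from the caller's side (a sketch; the model file name is a placeholder):

```python
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # placeholder model file

# streaming=True returns an iterator of text fragments rather than one string.
for fragment in model.generate("Explain what verify_ssl=False trades away.",
                               max_tokens=120, streaming=True):
    print(fragment, end="", flush=True)
print()
```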

@contextmanager
def chat_session(
self,
system_message: str | Literal[False] | None = None,
chat_template: str | None = None,
system_prompt: str | None = None,
prompt_template: str | None = None,
):
"""
Context manager to hold an inference optimized chat session with a GPT4All model.
Args:
system_message: An initial instruction for the model, None to use the model default, or False to disable. Defaults to None.
chat_template: Jinja template for the conversation, or None to use the model default. Defaults to None.
system_prompt: An initial instruction for the model.
prompt_template: Template for the prompts with {0} being replaced by the user message.
"""

if system_message is None:
system_message = self.config.get("systemMessage", False)

if chat_template is None:
if "name" not in self.config:
raise ValueError("For sideloaded models or with allow_download=False, you must specify a chat template.")
if "chatTemplate" not in self.config:
raise NotImplementedError("This model appears to have a built-in chat template, but loading it is not "
"currently implemented. Please pass a template to chat_session() directly.")
if (tmpl := self.config["chatTemplate"]) is None:
raise ValueError(f"The model {self.config['name']!r} does not support chat.")
chat_template = tmpl

history = []
if system_message is not False:
history.append(MessageType(role="system", content=system_message))
self._chat_session = ChatSession(
template=_jinja_env.from_string(chat_template),
history=history,
)
if system_prompt is None:
system_prompt = self.config.get("systemPrompt", "")

if prompt_template is None:
if (tmpl := self.config.get("promptTemplate")) is None:
warnings.warn("Use of a sideloaded model or allow_download=False without specifying a prompt template "
"is deprecated. Defaulting to Alpaca.", DeprecationWarning)
tmpl = DEFAULT_PROMPT_TEMPLATE
prompt_template = tmpl

if re.search(r"%1(?![0-9])", prompt_template):
raise ValueError("Prompt template containing a literal '%1' is not supported. For a prompt "
"placeholder, please use '{0}' instead.")

self._history = [{"role": "system", "content": system_prompt}]
self._current_prompt_template = prompt_template
try:
yield self
finally:
self._chat_session = None
self._history = None
self._current_prompt_template = "{0}"
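
A short usage sketch of the session API as it stands after this rebase, assuming the `system_prompt`/`prompt_template` signature shown above is the surviving one (the model file name is a placeholder):

```python
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # placeholder model file
with model.chat_session(
    system_prompt="You are a concise assistant.",
    prompt_template="### Human:\n{0}\n\n### Assistant:\n",
) as session:
    reply = session.generate("What does an HTTP proxy do?", max_tokens=100)
    print(reply)
    print(session.current_chat_session)  # history, including the assistant reply
```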

@staticmethod
def list_gpus() -> list[str]:
@@ -678,6 +670,43 @@ def list_gpus() -> list[str]:
"""
return LLModel.list_gpus()

def _format_chat_prompt_template(
self,
messages: list[MessageType],
default_prompt_header: str = "",
default_prompt_footer: str = "",
) -> str:
"""
Helper method for building a prompt from list of messages using the self._current_prompt_template as a template for each message.
Warning:
This function was deprecated in version 2.3.0, and will be removed in a future release.
Args:
messages: List of dictionaries. Each dictionary should have a "role" key
with value of "system", "assistant", or "user" and a "content" key with a
string value. Messages are organized such that "system" messages are at top of prompt,
and "user" and "assistant" messages are displayed in order. Assistant messages get formatted as
"Response: {content}".
Returns:
Formatted prompt.
"""

full_prompt = default_prompt_header + "\n\n" if default_prompt_header != "" else ""

for message in messages:
if message["role"] == "user":
user_message = self._current_prompt_template.format(message["content"])
full_prompt += user_message
if message["role"] == "assistant":
assistant_message = message["content"] + "\n"
full_prompt += assistant_message

full_prompt += "\n\n" + default_prompt_footer if default_prompt_footer != "" else ""

return full_prompt
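
To make the deprecated helper's output concrete, here is a standalone re-implementation of the same loop for illustration (it mirrors the body above with the default Alpaca-style template; it is not part of the PR):

```python
DEFAULT_PROMPT_TEMPLATE = "### Human:\n{0}\n\n### Assistant:\n"

def format_chat_prompt(messages, template=DEFAULT_PROMPT_TEMPLATE):
    # User turns are wrapped in the prompt template; assistant turns are
    # appended verbatim with a trailing newline, as in the method above.
    full_prompt = ""
    for message in messages:
        if message["role"] == "user":
            full_prompt += template.format(message["content"])
        if message["role"] == "assistant":
            full_prompt += message["content"] + "\n"
    return full_prompt

print(format_chat_prompt([
    {"role": "user", "content": "What is a proxy?"},
    {"role": "assistant", "content": "An intermediary server."},
    {"role": "user", "content": "And what does verify_ssl control?"},
]))
```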


def append_extension_if_missing(model_name):
if not model_name.endswith((".bin", ".gguf")):
@@ -690,7 +719,7 @@ def fileno(self) -> int: ...


def _fsync(fd: int | _HasFileno) -> None:
if sys.platform == "darwin":
if sys.platform == 'darwin':
# Apple's fsync does not flush the drive write cache
try:
fcntl.fcntl(fd, fcntl.F_FULLFSYNC)