How to make the model load only once? #12
Hi @lattemj

```
# We use --files_extension txt to translate only files with this extension.
# Use an empty string to translate all files in the directory.
python3 translate.py \
  --sentences_dir sample_text/ \
  --output_path sample_text/translations \
  --files_extension txt \
  --source_lang en \
  --target_lang es \
  --model_name facebook/m2m100_1.2B
```

Is this what you are trying to do?
Any update on this? He is asking to keep the model loaded in memory so that the model does not have to be loaded again for every inference, as loading is time-consuming.
@twicer-is-coder the only solution is to put all your data in a single file (or multiple files) and make a single call to the code. If you want to run the code as an API, you can use libraries built for that purpose, such as vLLM https://github.com/vllm-project/vllm or TGI https://huggingface.co/docs/text-generation-inference/index
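Within a single process, the load-once pattern being asked for can be sketched like this. This is a minimal illustration, not the repository's code: `load_model` is a hypothetical stand-in for the expensive model load (in the real script that would be something like transformers' `M2M100ForConditionalGeneration.from_pretrained(...)`), and `translate` stands in for the generation step.

```python
import functools

LOAD_COUNT = 0  # counts how many times the expensive load actually runs


@functools.lru_cache(maxsize=1)
def load_model(model_name: str):
    """Load the model once; repeated calls return the cached instance.

    Hypothetical placeholder: a real implementation would call
    transformers' from_pretrained(...) here instead.
    """
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": model_name}  # placeholder for the loaded model object


def translate(sentence: str, model_name: str = "facebook/m2m100_1.2B") -> str:
    model = load_model(model_name)  # cached after the first call
    # Placeholder translation step; real code would run model.generate(...).
    return f"[{model['name']}] {sentence}"


# Many translate() calls, but the model is loaded only a single time.
for s in ["Hello", "How are you?"]:
    translate(s)
print(LOAD_COUNT)  # → 1
```

This is the same idea behind running the model as a long-lived server (as with vLLM or TGI): the weights stay resident across requests instead of being reloaded per invocation.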
Can the model be loaded only once instead of waiting for the load to complete each time?