This is the project repo for a Valohai blog post on using Hugging Face Transformers.
It contains the following Python scripts:
- fecth_data.py # fetches the data
- pre_process.py # filters the data
- prepare_text.py # splits the data and prepares it further
- fine_tune.py # uses transformers for tokenizing and for fine-tuning the DistilBERT model
You can create the valohai.yaml file with the valohai-utils SDK. First define each step, for example with vh yaml step fecth_data.py,
and once the steps are defined, create the pipeline with vh yaml pipeline create_pipeline.py.
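
As a rough sketch of the order of those commands, the snippet below builds the vh yaml step call for each script listed above, followed by the vh yaml pipeline call. It assumes that every listed script is meant to be a step and that the Valohai CLI (vh) is installed; the names STEP_SCRIPTS, PIPELINE_SCRIPT and build_commands are hypothetical helpers and not part of this repo. It only prints the commands so you can review them before running anything.

```python
import shlex

# Script names as listed in this README (spelling kept as-is).
# Assumption: each of these scripts is defined as its own step.
STEP_SCRIPTS = ["fecth_data.py", "pre_process.py", "prepare_text.py", "fine_tune.py"]
PIPELINE_SCRIPT = "create_pipeline.py"


def build_commands(step_scripts, pipeline_script):
    """Return the vh commands in order: one per step, then the pipeline."""
    commands = [["vh", "yaml", "step", script] for script in step_scripts]
    commands.append(["vh", "yaml", "pipeline", pipeline_script])
    return commands


if __name__ == "__main__":
    for command in build_commands(STEP_SCRIPTS, PIPELINE_SCRIPT):
        # Print instead of executing, so nothing changes until you run it yourself.
        print(shlex.join(command))
```

To actually run the commands, each list could be passed to subprocess.run(command, check=True) from the repo root.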
The requirements are listed in the requirements.txt file.
Disclaimer: this repo was made by a data scientist used to building POCs, and yes, there are probably much better and more efficient ways of using Python. Feel free to make a pull request!