simCataloguer is a workflow for generating machine-authored text and images from catalogue data. It was created in 2023 as part of the Arts and Humanities Research Council (AHRC) funded project 'Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship.'
This process assumes familiarity with Windows PowerShell.
Please note that you may require a large amount of VRAM to use this library. From the data we have gathered so far, 8 GB is not enough but 24 GB is sufficient.
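As a rough pre-flight check, the sketch below reads total GPU memory from nvidia-smi and compares it with the two data points above. It assumes an NVIDIA GPU with nvidia-smi on the PATH. The function names and the "untested" label for sizes between 8 GB and 24 GB are illustrative choices, not part of simCataloguer.

```python
import subprocess


def parse_total_vram_mib(nvidia_smi_output):
    """Parse nvidia-smi CSV output (one integer MiB value per GPU, one per line)."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def classify_vram(total_mib):
    """Compare a GPU's memory with the two data points reported in this README."""
    # Round to the nearest GiB, since some cards report slightly under their nominal size.
    gib = round(total_mib / 1024)
    if gib <= 8:
        return "insufficient"  # 8 GB is reported as not enough
    if gib >= 24:
        return "sufficient"    # 24 GB is reported as sufficient
    return "untested"          # assumption: no data for sizes in between


def query_total_vram_mib():
    """Ask nvidia-smi for the total memory of each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_total_vram_mib(result.stdout)


if __name__ == "__main__":
    try:
        for index, mib in enumerate(query_total_vram_mib()):
            print(f"GPU {index}: {mib} MiB -> {classify_vram(mib)}")
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is not available; check the graphics driver installation.")
```

A result of "untested" only means that no measurement has been reported for that memory size; it may or may not work.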
git clone --recursive git@github.com:dreamingspires/simCataloguer.git
You may need to install zlib and a graphics driver.
- get NVIDIA GPU Computing Toolkit, CUDA 11.7.0
- get cuDNN.
- install cuDNN.
- install Python 3.9.
- install pip (may not be required, check with pip --version)
- install Poetry.
- add Poetry to your PATH using the on-screen instructions.
- reboot shell.
- check with poetry --version.
- run poetry config virtualenvs.in-project true
- run poetry install (you may need to run this more than once; it fails to parallel process on occasion)
- run poetry run pip install --extra-index-url https://download.pytorch.org/whl/cu117 --no-deps --force-reinstall torch torchvision
- Go into your examples directory within simCataloguer and run poetry run python test_writer.py
- You can adjust the num_cuts and quality settings in examples/test_writer depending on the GPU memory available on your local machine. See the Pixray documentation for details.
- When you first run poetry run python test_writer.py, a language model will be built based on the input file in examples/test_writer. If this process fails, you may need to download a pre-trained model. For MDG.txt, a pre-trained model is available to download here. This should be unzipped and placed at simCataloguer/checkpoint.
- When you first run poetry run python test_writer.py, four large files will be downloaded to simCataloguer/.venv/Lib/site-packages/pixray_module/models. If the downloads fail for any reason, then rerunning poetry run python test_writer.py will fail. If this happens, the four files (yfcc_2.pth, ViT-B-32.pt, ViT-B-16.pt, and RN50.pt) can be downloaded separately and manually placed in this directory. As yfcc_2.pth is the largest and most likely to fail, we include a direct download link here.
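If a first run is interrupted, it can help to see which of the four model files are in place before downloading anything manually. The sketch below is one way to check, not part of simCataloguer. The function name missing_model_files is hypothetical, and the file names are read from the list above as yfcc_2.pth, ViT-B-32.pt, ViT-B-16.pt, and RN50.pt, which is an assumption about the intended spellings.

```python
from pathlib import Path

# File names as read from the list above; the exact spellings are an assumption.
EXPECTED_MODEL_FILES = ("yfcc_2.pth", "ViT-B-32.pt", "ViT-B-16.pt", "RN50.pt")


def missing_model_files(models_dir, expected=EXPECTED_MODEL_FILES):
    """Return the expected file names that are absent or zero bytes in models_dir."""
    models_dir = Path(models_dir)
    missing = []
    for name in expected:
        path = models_dir / name
        # A zero-byte file usually indicates a download that never started properly.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Path as given in this README, relative to the directory containing simCataloguer.
    models_dir = Path("simCataloguer/.venv/Lib/site-packages/pixray_module/models")
    missing = missing_model_files(models_dir)
    if missing:
        print("Missing or empty model files:", ", ".join(missing))
    else:
        print("All four model files are present.")
```

This only detects absent or empty files. A partially downloaded file that is non-empty will pass the check, so a file that still causes errors may need to be downloaded again.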
BM-MDG.zip is derived from a dataset published by the British Museum. The data and derived data are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
For more info on this dataset see Baker, James, & Salway, Andrew. (2019, June 13). Creation of the BMSatire Descriptions corpus (Version v1.0). Zenodo. doi: 10.5281/zenodo.3245037.
simCataloguer was developed by Dreaming Spires Software Development Ltd in 2023.
For more details of the type of projects we develop, please contact contact@dreamingspires.dev or visit dreamingspires.dev.