(IM1102 - 232433M, Open Universiteit, Netherlands)
Code for generating text rendered as 3D bullets over a background image. The created text images are used to build datasets for training and testing a Generative Adversarial Network (GAN) that reconstructs the text.
Authors:
- Dietmar Serbée,
- Bob Cruijsberg,
- Johan van Nispen
Please read the report prior to using this code.
NOTE:
The Python script 3DSentences.py is developed to generate data for use with pix2pix, as presented by Junyanz et al. in the https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix repository. That repository must be installed to run the script presented here. Please read the README.md file in the repository for detailed installation instructions.
The text is produced from a predefined (hard-coded) list of sentences. These sentences are used to create images in which the text appears as "bullets". The bullets are either:
- 2D binary: black (0) / white (255), or
- 3D grayscale (0-255): given a 3D effect by adding a spherical intensity profile to each bullet (see the sketch below).
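A minimal sketch of such a spherical intensity profile; the function name and parameters are illustrative and do not mirror the actual implementation in bullettoolset.py.

```python
import numpy as np

def draw_3d_bullet(img, cx, cy, radius):
    """Stamp a grayscale bullet with a spherical intensity profile.

    The centre of the bullet is brightest (255) and the intensity falls off
    towards the rim following the height of a hemisphere, which gives the
    flat disc a ball-like, 3D appearance.
    """
    h, w = img.shape[:2]
    y0, y1 = max(cy - radius, 0), min(cy + radius + 1, h)
    x0, x1 = max(cx - radius, 0), min(cx + radius + 1, w)
    ys, xs = np.mgrid[y0:y1, x0:x1]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2               # squared distance to centre
    sphere = np.sqrt(np.clip(radius ** 2 - d2, 0, None)) / radius * 255
    patch = img[y0:y1, x0:x1]
    img[y0:y1, x0:x1] = np.where(d2 <= radius ** 2, np.maximum(patch, sphere), patch)
    return img
```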
NOTE: In the text below, the argument options for the script are enclosed in brackets.
Initially, a selected font (--font) is used to render the characters in a window of 256x1280 pixels (height x width). This text is then skeletonized to outline its basic structure. A sampling process follows, in which a list of random points is drawn from the skeleton points, limited to at most 1/d of them, where 'd' is the density parameter (--density). For each point in this sample, a bullet is created at a random position around the point. The position is offset by at most the specified jitter distance (--jitter), and each bullet is assigned a random radius between the minimum (--min_radius) and maximum (--max_radius) radius limits. A sketch of this sampling step is shown below.
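A rough sketch of this sampling step, assuming scikit-image for skeletonization; the function and its exact behaviour are illustrative and differ from the project code in bullettoolset.py and visiontoolset.py.

```python
import numpy as np
from skimage.morphology import skeletonize

def sample_bullet_positions(text_mask, density, jitter, min_radius, max_radius, rng=None):
    """Pick bullet centres and radii from the skeleton of a binary text mask.

    text_mask : 2D binary array (256x1280) with the rendered text.
    density   : keep roughly 1/density of the skeleton points.
    jitter    : maximum random offset (in pixels) applied to each point.
    """
    rng = rng or np.random.default_rng()
    skeleton = skeletonize(text_mask > 0)
    points = np.argwhere(skeleton)                                  # (row, col) skeleton points
    n_samples = max(1, len(points) // density)
    chosen = points[rng.choice(len(points), n_samples, replace=False)]
    offsets = rng.integers(-jitter, jitter + 1, size=chosen.shape)  # jitter each point
    radii = rng.integers(min_radius, max_radius + 1, size=len(chosen))
    return chosen + offsets, radii
```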
The images are saved in the datasets folder, where either the subfolders train and test are created, or trainA, trainB, testA and testB, depending on the --alignment option (see the sketch below). The number of images can be specified with the --iterations flag, e.g. {'train': 100, 'test': 10}.
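A small sketch of how the dataset subfolders could be created depending on the alignment option; the helper itself is hypothetical, only the folder names follow the pix2pix conventions.

```python
import os

def make_dataset_dirs(root, aligned=True):
    """Create the dataset subfolders used for training and testing.

    Aligned datasets (single images containing the A|B pair side by side)
    use train/test; unaligned datasets use trainA, trainB, testA and testB.
    """
    names = ["train", "test"] if aligned else ["trainA", "trainB", "testA", "testB"]
    paths = [os.path.join(root, name) for name in names]
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return paths
```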
A background can be added to the images (--add_background); the background images are taken from the background folder (--bg_folder). Words are always projected onto the colour version of the background. The --monochrome option converts the RGB image to a grayscale image (see the sketch below).
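A simplified sketch of compositing the bullet text onto a background texture, assuming OpenCV; the blending shown here is illustrative and not necessarily the method used in visiontoolset.py.

```python
import cv2
import numpy as np

def composite_on_background(bullet_img, bg_path, monochrome=False):
    """Project a bullet-text image onto a colour background texture.

    bullet_img : 2D uint8 array (256x1280), 0 = empty, >0 = bullet intensity.
    bg_path    : path to a background image (e.g. a DTD 'marbled' texture).
    """
    bg = cv2.imread(bg_path, cv2.IMREAD_COLOR)
    bg = cv2.resize(bg, (bullet_img.shape[1], bullet_img.shape[0]))
    # use the bullet intensity as a per-pixel blend factor over the background
    alpha = (bullet_img.astype(np.float32) / 255.0)[..., None]
    out = (alpha * 255.0 + (1.0 - alpha) * bg.astype(np.float32)).astype(np.uint8)
    if monochrome:
        out = cv2.cvtColor(out, cv2.COLOR_BGR2GRAY)
    return out
```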
Finally, a transformation called the dancing effect (--dancing) can be applied to the text. This effect applies a random rotation and scaling to the individual characters, while the full sentence follows a random part of a sine cycle (see the sketch below).
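A rough sketch of how per-character parameters for such a dancing effect could be drawn; the function name and default limits are illustrative, not the values used in 3DSentences.py.

```python
import numpy as np

def dancing_parameters(n_chars, max_angle=15.0, scale_range=(0.8, 1.2),
                       amplitude=20.0, rng=None):
    """Draw per-character transform parameters for the dancing effect.

    Each character gets a random rotation angle and scale factor, and its
    vertical offset follows a random segment of one sine cycle across the
    sentence.
    """
    rng = rng or np.random.default_rng()
    angles = rng.uniform(-max_angle, max_angle, n_chars)   # degrees
    scales = rng.uniform(*scale_range, n_chars)
    phase = rng.uniform(0, 2 * np.pi)                      # random starting point in the cycle
    span = rng.uniform(0.5 * np.pi, 2 * np.pi)             # random part of the cycle
    offsets = amplitude * np.sin(phase + np.linspace(0, span, n_chars))
    return angles, scales, offsets
```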
- Python 3.10
- packages as described in requirements.txt
- Installation of pytorch-CycleGAN-and-pix2pix
The folder structure follows from pix2pix as much as possible:
./ python scripts developed for the project
./datasets datasets for training and testing
./checkpoints trained models
./results generated images when testing the model
./scripts shell scripts to generate the images
./backgrounds/marbled background images
./pytorch-CycleGAN-and-pix2pix pix2pix repository
The background images used here are the 'marbled' textures from the Describable Textures Dataset (DTD), which can be downloaded from https://www.robots.ox.ac.uk/~vgg/data/dtd/. For convenience, the marbled images used here can also be downloaded here.
The scripts below have been created to produce the images:
- 3DSentences.py
- bullettoolset.py
- visiontoolset.py
- construct_options.py
3DSentences.py is the main file to run the script. If you intend to use other words and sentences, change the lists train_sentences and test_sentences to your needs (see the example below).
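The sentences shown here are placeholders for illustration only; the actual lists used in the paper are defined in 3DSentences.py.

```python
# In 3DSentences.py -- replace the hard-coded lists with your own sentences.
train_sentences = [
    "the quick brown fox jumps over the lazy dog",
    "pack my box with five dozen liquor jugs",
]
test_sentences = [
    "sphinx of black quartz judge my vow",
]
```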
bullettoolset.py contains the functions to generate the images.
visiontoolset.py contains the image processing functions.
construct_options.py contains the options for the script, as described above and in the script.
Below are the scripts to reproduce the tests as described in the paper. Alternatively, use the pretrained models and datasets; see the section Download the datasets and models. Run the scripts from the root of the project in the order described below. To see the progress during training, first run python -m visdom.server and go to http://localhost:8097/
./scripts/3DSentence_color_create_dataset.sh
./scripts/3DSentence_color_train.sh
./scripts/3DSentence_color_test.sh
./scripts/3DSentence_create_dataset.sh
./scripts/3DSentence_train.sh
./scripts/3DSentence_test.sh
./scripts/GAN_architecture_create_dataset.sh
./scripts/GAN_architecture_train.sh
./scripts/GAN_architecture_test.sh
./scripts/density_create_dataset.sh
./scripts/density_test.sh
./scripts/transformed_create_dataset.sh
./scripts/transformed_test.sh
NOTE: Generation of single characters is described in the code of 3DSentences.py and can be modified there.
The datasets and models used in the paper can be downloaded following this link.
- train: 3DSentences_24lines_10iterations_256x1280
- model: 3DSentences_24lines_10iterations_256x1280_pix2pix
- test: 3DSentences_24lines_10iterations_256x1280
- train: 3DSentences_24lines_10iterations_256x1280_mono
- model: 3DSentences_pix2pix
- test: 3DSentences_24lines_10iterations_256x1280_mono
- model:
- test: 3DSentences_metrics_test
- train: none
- model: 3DSentences_mono_resnet_9block_Lambda100_pix2pix
- test: 3DSentences_dens_x_y
- x = [5, 6, 7, 8, 9]
- y = ['', 'dance']
- train: none
- model: 3DSentences_pix2pix
- test: 3DSentences_dancing