added language option and new masks to wordcloud script
hennyu committed Sep 28, 2021
1 parent 9aee722 commit a5e8ac7
Showing 18 changed files with 4,171 additions and 7,835 deletions.
8 changes: 4 additions & 4 deletions wordclouds/README.md
@@ -16,21 +16,21 @@ The script takes TEI files of RIDE reviews as input and generates a word cloud f
* TEI files of the reviews

### Decisions to make (parameters for the script):
-* the script needs 4 positional parameters (they have to be indicated in that order when calling the script from the command line). These are:
+* the script needs 3 positional parameters (they have to be indicated in that order when calling the script from the command line). These are:
* (1) which "mask" to use for the form of the word clouds. Masks are png images with a white background and e.g. a black foreground. The cloud will only be placed inside of the foregrounded form. You can use the default mask file "cloud_mask.png". Mask files are stored in the subfolder "masks". The size of the clouds is determined by the size of the mask image. The script needs the name of the mask file as a parameter.
* (2) which font to use for the words in the clouds. You can use the default font file "MKorsair.ttf" which you find in the subfolder "fonts" or you can place your own font file there. The script needs the name of the font file as a parameter.
-* (3) which stopword list to use. Two lists are prepared: "stopwords_en.txt" and "stopwords_de.txt". Which stopword list you use, depends on the language of the reviews. For each call of the script you can only use one stopword list. The prepared lists are in the subfolder "stopwords". You are free to edit them and remove or add words. Or you can add a new file for a new language. The script needs the name of the stopword list file as a parameter.
-* (4) which colormap to use. Matplotlib colormaps are supported. See for example: https://matplotlib.org/3.1.1/gallery/color/colormap_reference.html Choose the name of the colormap and use it as the fourth parameter for the script. E.g. "summer".
+* (3) which colormap to use. Matplotlib colormaps are supported. See for example: https://matplotlib.org/3.1.1/gallery/color/colormap_reference.html Choose the name of the colormap and use it as the fourth parameter for the script. E.g. "summer".
* more parameters could be changed directly in the python script, if needed.

### Preparation of data:
* put the TEI files into the subfolder "tei"
+* the script uses predefined stopword lists, which are stored in the "stopwords" folder. Currently, two lists are prepared for English and German. The script automatically uses the list which matches the main language of the review. You can adapt the stopword lists by adding or removing words or add new lists for other languages, if needed.
* observe that the script generating the word clouds cannot recognize the language of the reviews, so you will have to generate clouds for the reviews in each language separately so the right stopwords can be used
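
The added line above says the stopword list is chosen from the main language of each review. The code of wordclouds.py is not part of this rendered diff, so the following is only a minimal sketch of how such a lookup could work. The function names `stopword_file_for` and `load_stopwords` are hypothetical, and the assumption that the language arrives as a short code such as "en" or "de" is not confirmed by the source; only the file names "stopwords_en.txt" and "stopwords_de.txt" and the folder "stopwords" come from the README.

```python
from pathlib import Path


def stopword_file_for(language, stopword_dir="stopwords"):
    """Map a language code such as 'en' or 'de' to a stopword file path.

    Hypothetical helper: the naming pattern stopwords_<code>.txt follows
    the two prepared lists mentioned in the README.
    """
    code = language.strip().lower()[:2]
    return Path(stopword_dir) / f"stopwords_{code}.txt"


def load_stopwords(path):
    """Read one stopword per line; return an empty set if the file is missing."""
    path = Path(path)
    if not path.is_file():
        return set()
    with path.open(encoding="utf-8") as handle:
        # Skip blank lines and normalise case so lookups are case-insensitive.
        return {line.strip().lower() for line in handle if line.strip()}
```

With this layout, supporting a further language would only require adding a file that follows the same naming pattern to the "stopwords" folder.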

### How to call the script:
* open a command line
* navigate to this repository (/Git/ride-scripts/wordclouds)
-* type: python3 wordclouds.py "cloud_mask.png" "MKorsair.ttf", "stopwords_en.txt", "summer" (replace the parameter values with your own choices)
+* type: python3 wordclouds.py "cloud_mask.png" "MKorsair.ttf" "summer" (replace the parameter values with your own choices)
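
To illustrate the three-parameter call, here is a hedged sketch of how a script might read the mask, font, and colormap names in that order using `argparse` from the Python standard library. It is not taken from wordclouds.py, whose argument handling is not shown in this diff; the names `build_parser` and `parse_cli` and the argument labels are assumptions made for the example.

```python
import argparse


def build_parser():
    """Build a parser for three positional parameters, in README order."""
    parser = argparse.ArgumentParser(
        description="Generate word clouds from TEI review files (sketch)."
    )
    # (1) name of the mask image in the "masks" subfolder
    parser.add_argument("mask", help='mask file name, e.g. "cloud_mask.png"')
    # (2) name of the font file in the "fonts" subfolder
    parser.add_argument("font", help='font file name, e.g. "MKorsair.ttf"')
    # (3) name of a Matplotlib colormap
    parser.add_argument("colormap", help='colormap name, e.g. "summer"')
    return parser


def parse_cli(argv=None):
    """Parse the given argument list, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)
```

Calling `parse_cli(["cloud_mask.png", "MKorsair.ttf", "summer"])` yields an object whose `mask`, `font`, and `colormap` attributes hold the three values; passing a fourth value, as in the old call, would make `argparse` exit with a usage error.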

### See the results:
* wait for the script to finish
Binary file added wordclouds/masks/mask_eichhoernchen.png
Binary file added wordclouds/masks/mask_eule.png
1,585 changes: 0 additions & 1,585 deletions wordclouds/tei/de/koeppen-jugend-tei.xml

This file was deleted.

2,195 changes: 2,195 additions & 0 deletions wordclouds/tei/done/corema-tei.xml

Large diffs are not rendered by default.

1,762 changes: 0 additions & 1,762 deletions wordclouds/tei/en/ehd-tei.xml

This file was deleted.

1,478 changes: 0 additions & 1,478 deletions wordclouds/tei/en/ldm-tei.xml

This file was deleted.
