In this session we discussed optimal ways of representing and analyzing crowdsourcing results by applying the CrowdTruth metrics. We have prepared a collection of Jupyter Notebooks (also available as Colab notebooks that can be run from a Google Drive account) that illustrate how to run the metrics on the tasks discussed in Session 2:
Closed Tasks: the crowd picks from a set of annotations that is known beforehand (a minimal configuration sketch for this kind of task follows the list below)
- Binary Choice: the crowd picks 1 annotation out of 2 choices (e.g. `True` and `False`)
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
  - Relation extraction in sentences: task template | Jupyter notebook | Colab notebook
- Ternary Choice: the crowd picks 1 annotation out of 3 choices (e.g. `True`, `False` and `None/Other`)
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
- Multiple Choice: the crowd picks multiple annotations out of a fixed list of choices that is the same for every input unit
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
  - Relation extraction in sentences: task template | Jupyter notebook | Colab notebook
- Sparse Multiple Choice: the crowd picks multiple annotations out of a list of choices that differs across input units
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
  - Relation extraction in sentences: task template | Jupyter notebook | Colab notebook
  - Event extraction in sentences: Jupyter notebook | Colab notebook
Open-Ended Tasks: the crowd dynamically creates the list of annotations, or the set of possible annotations is too large to enumerate beforehand
- Sparse Multiple Choice: the crowd picks multiple annotations out of a list of choices that differs across input units
  - Event extraction in sentences: Jupyter notebook | Colab notebook
- Open-ended extraction tasks: the crowd creates different combinations of annotations based on the input unit
  - Person identification by highlighting words in text: task template | Jupyter notebook | Colab notebook
  - Event extraction by highlighting words in text: Jupyter notebook
- Free Choice: the crowd inputs all possible annotations for an input unit
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
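This distinction between task types is what a CrowdTruth pre-processing configuration encodes. Below is a minimal sketch, not taken from any specific notebook, of a configuration for a binary-choice (closed) task; the column names and answer values are hypothetical placeholders.

```python
from crowdtruth.configuration import DefaultConfig

class BinaryChoiceConfig(DefaultConfig):
    # hypothetical column names -- replace with the columns of your own task
    inputColumns = ["unit_id", "sentence"]
    outputColumns = ["selected_answer"]

    # closed task: the full annotation vector is known beforehand
    open_ended_task = False
    annotation_vector = ["true", "false"]

    def processJudgments(self, judgments):
        # normalize crowd answers so they match the annotation vector entries
        for col in self.outputColumns:
            judgments[col] = judgments[col].apply(lambda x: str(x).strip().lower())
        return judgments
```

For an open-ended task, the same class would instead set `open_ended_task = True` and leave the annotation vector to be built from the crowd's own answers.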
- Install the CrowdTruth package & follow the How to run guide to get started (see the install-and-run sketch after this list).
- Explore (some of) the notebooks above that implement CrowdTruth for different annotation tasks.
- Compare the results of the CrowdTruth metrics when the same task is processed with a closed vs. an open-ended annotation vector, considering the trade-off between the degree of expressivity in crowd annotations and the potential for ambiguity and disagreement (see the closed vs. open-ended configuration sketch below). The following notebook can be used as an example:
  - Event extraction from sentences (sparse multiple choice):
- Dimensionality reduction techniques are useful for reducing some of the noise in crowd annotations, particularly for open-ended tasks, which produce very diverse labels. These techniques can be applied both to input units and to annotations (see the label-clustering sketch at the end of this list). Compare the results of the CrowdTruth metrics for an annotation task before & after dimensionality reduction in the following crowd tasks:
  - Person identification by highlighting words in text (open-ended extraction task):
  - Event extraction from sentences (sparse multiple choice):
  - Event extraction by highlighting words in text (open-ended extraction task):
  - Person identification in videos (free input task):
- Implement the annotation vector you designed in Session 2 as a CrowdTruth pre-processing configuration.
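To get started with the first step above, the following is a minimal install-and-run sketch. It assumes the package is installed from PyPI as `crowdtruth`; the file name `judgments.csv`, the column names and the annotation choices are hypothetical placeholders, and the result keys shown follow the package's tutorials (check the How to run guide for the exact workflow of your task).

```python
# In a notebook cell, install the package first, e.g.: !pip install crowdtruth
import crowdtruth
from crowdtruth.configuration import DefaultConfig

class MyTaskConfig(DefaultConfig):
    inputColumns = ["unit_id", "sentence"]       # hypothetical input columns
    outputColumns = ["selected_relations"]       # hypothetical output column
    annotation_separator = ","
    open_ended_task = False
    annotation_vector = ["causes", "treats", "none"]   # hypothetical choices

# "judgments.csv" stands in for the results file exported from the crowdsourcing platform
data, config = crowdtruth.load(file="judgments.csv", config=MyTaskConfig())
results = crowdtruth.run(data, config)

# quality scores for input units, workers and annotations
print(results["units"]["uqs"].head())
print(results["workers"]["wqs"].head())
print(results["annotations"])
```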
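For the closed vs. open-ended comparison, the two runs typically differ only in the `open_ended_task` flag and the `annotation_vector`; everything else stays the same. A hedged sketch, again with hypothetical file and column names and example event labels:

```python
import crowdtruth
from crowdtruth.configuration import DefaultConfig

class ClosedEventConfig(DefaultConfig):
    inputColumns = ["sent_id", "sentence"]       # hypothetical
    outputColumns = ["selected_events"]
    annotation_separator = ","
    open_ended_task = False
    # fixed annotation vector, shared by every input unit
    annotation_vector = ["damage", "flooding", "hurricane", "none"]

class OpenEventConfig(ClosedEventConfig):
    # open-ended variant: the annotation vector is built from the crowd answers
    open_ended_task = True
    annotation_vector = []

for cfg in (ClosedEventConfig(), OpenEventConfig()):
    data, config = crowdtruth.load(file="event_judgments.csv", config=cfg)
    results = crowdtruth.run(data, config)
    # more expressivity usually means more disagreement, i.e. lower unit quality scores
    print(type(cfg).__name__, results["units"]["uqs"].mean())
```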
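For the dimensionality-reduction exercise, one simple reduction over the annotation set is to collapse closely related labels into a single cluster before the metrics are run, for example inside `processJudgments`. The sketch below is only illustrative: the `LABEL_CLUSTERS` mapping and column names are hypothetical, and the notebooks linked above apply their own, task-specific reductions.

```python
from crowdtruth.configuration import DefaultConfig

# hypothetical mapping from fine-grained crowd labels to coarser clusters
LABEL_CLUSTERS = {
    "flooded": "flooding",
    "flood": "flooding",
    "hurricane-force winds": "hurricane",
}

class ReducedEventConfig(DefaultConfig):
    inputColumns = ["sent_id", "sentence"]       # hypothetical
    outputColumns = ["selected_events"]
    annotation_separator = ","
    open_ended_task = True

    def processJudgments(self, judgments):
        # replace each selected label by its cluster representative,
        # shrinking the annotation vector built from the crowd answers
        def reduce_labels(raw):
            labels = str(raw).split(self.annotation_separator)
            reduced = {LABEL_CLUSTERS.get(label.strip().lower(), label.strip().lower())
                       for label in labels}
            return self.annotation_separator.join(sorted(reduced))

        for col in self.outputColumns:
            judgments[col] = judgments[col].apply(reduce_labels)
        return judgments
```

Running the metrics once with this mapping and once with an empty `LABEL_CLUSTERS` dictionary gives the before & after comparison asked for above.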