Arcangelo Corelli - Trio Sonatas (A corpus of annotated scores) (v2.0)
This corpus of annotated MuseScore files has been created within
the DCML corpus initiative and employs
the DCML harmony annotation standard. It was released together with, and as part
of, the "workflow paper":
Hentschel, J., Moss, F. C., Neuwirth, M., & Rohrmeier, M. A. (2021). A semi-automated workflow paradigm for the distributed creation and curation of expert annotations. Proceedings of the 22nd International Society for Music Information Retrieval Conference, ISMIR, 262–269. https://doi.org/10.5281/ZENODO.5624417
The corpus comprises 36 Sonate a tre, divided into 149 separate movements. Together they make up
three of the four famous cycles of 12 trio sonatas each:
| Opus | Cycle | Publication | Included |
|---|---|---|---|
| 1 | 12 sonate da chiesa | Rome 1681 | Yes |
| 2 | 12 sonate da camera | Rome 1685 | No |
| 3 | 12 sonate da chiesa | Rome 1689 | Yes |
| 4 | 12 sonate da camera | Rome 1694 | Yes |
Versions
Version 2.0
- TSV files now come with the column `quarterbeats`, which measures each event's position in quarter notes as its
  distance from the beginning.
- Extracted notes now come with the columns `name` and `octave`.
- Column `volta` (containing first and second endings) removed from pieces that don't have any.
- `metadata.tsv` has been enriched with further columns, in particular information about each movement's dimensions,
  including dimensions upon unfolding repeats (for instance, `last_mn` has the number of
  measures, `last_mn_unfolded` the number of measures when playing all repeats).
- The folder `reviewed` contains two files per movement:
  - A copy of the score where all out-of-label notes have been colored in red; changes
    are shown in these files in a diff-like manner (removed in red, added in green).
  - A copy of the harmonies TSV with six added columns that reflect the coloring of out-of-label notes ("coloring
    reports").
- As long as `ms3 review` has any complaints, it stores them in the file `warnings.log`. Currently, it is
  showing those labels where over 60% of the notes in the segment have been colored in red and probably need
  revisiting (Pull Requests welcome).
- TSV files are automatically kept up to date using the new GitHub action
  dcml_corpus_workflow, which is the successor of the implementation
  used in the creation of this dataset.
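
To illustrate how the `quarterbeats` column mentioned above might be used, here is a minimal sketch that relies only on the Python standard library. It assumes that `quarterbeats` values are written as integers or as fraction strings such as `3/2`; the sample rows are made up for illustration and are not taken from the corpus, and the helper names `read_events` and `gaps_between` are hypothetical.

```python
import csv
import io
from fractions import Fraction

# Made-up excerpt in the shape of a notes TSV. Only the column names
# `quarterbeats`, `name` and `octave` come from this README; the rows
# themselves are invented for illustration.
SAMPLE_TSV = (
    "quarterbeats\tname\toctave\n"
    "0\tF\t4\n"
    "3/2\tA\t4\n"
    "2\tC\t5\n"
)


def read_events(tsv_text):
    """Parse TSV text into dicts, converting `quarterbeats` to Fraction."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    events = []
    for row in reader:
        # Assumption: positions are integers or fraction strings like "3/2".
        row["quarterbeats"] = Fraction(row["quarterbeats"])
        row["octave"] = int(row["octave"])
        events.append(row)
    return events


def gaps_between(events):
    """Distance in quarter notes from each event to the next one."""
    positions = [event["quarterbeats"] for event in events]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


events = read_events(SAMPLE_TSV)
gaps = gaps_between(events)
```

Because every position is a distance from the beginning of the movement, events from different TSV files of the same movement can be compared or sorted on this one column without reasoning about measures and time signatures. To work with a real file, the text of that file would be passed to `read_events` instead of `SAMPLE_TSV`.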
Version 1.1
This release marks the moment where all 149 movements include a reviewed set of annotations that adhere to version
2.3.0 of the DCML harmony annotation standard. The metadata have not been
completed yet, and the data were extracted one last time with the now-deprecated version 0.4.11 of the
MuseScore parser ms3 for the sake of completeness and homogeneity. The purpose is
mainly to substantiate the claim that the "semi-automated workflow paradigm", as it had been implemented at publication
time (see the ISMIR paper cited above), can indeed be put to effective use in the creation of a large dataset. This
version is, however, to be followed by a version with upgraded tabular data based on the more mature ms3 > 1.0.0.
Version 1.0
The first release reflects the state of the dataset when finalizing chapter 4 of the workflow paper cited above.