Home
Fabian C. Moss edited this page Apr 20, 2018 · 54 revisions
The purpose of this wiki is to collect links to published musical corpora, with short explanations of each. Hopefully, it is useful to students and researchers who study music. The corpora are not yet listed in any particular order. Everybody is welcome to contribute!
- http://rockcorpus.midside.com
- by Trevor deClercq and David Temperley
- first published in 2011
- corpus of harmonic labels for Pop / Rock songs in standard roman numeral notation
- planned to expand to all 500 songs on Rolling Stone magazine's "500 Greatest Songs of All Time" list
- http://musiccog.ohio-state.edu/home/index.php/iRb_Jazz_Corpus
- by Yuri Broze and Daniel Shanahan
- in humdrum format
- corpus of chord sequences of Jazz standards from Realbooks
- community-based data set
- http://ddmal.music.mcgill.ca/research/billboard
- by John Ashley Burgoyne, Jonathan Wild, and Ichiro Fujinaga
- An Expert Ground Truth Set for Audio Chord Recognition and Music Analysis
- http://u.osu.edu/tavern/
- by Johanna Devaney, Claire Arthur, Nathaniel Condit-Schultz, and Kirsten Nisula
- theme and variation encodings with roman numerals
- themes and variations for piano by Mozart and Beethoven, divided into 1060 phrases
- annotated with roman numerals
- http://isophonics.net/content/reference-annotations-beatles
- by Chris Harte
- annotations of Beatles songs
- annotated features: beats, chords, keys, and form
- http://ycac.yale.edu/downloads
- by Christopher White and Ian Quinn
- poster from ISMIR 2014
- pitch-class and time data from MIDI files contributed by users of http://classicalarchives.com
- data is presented using salami slices
- https://elvisproject.ca
- part of SIMSSA, the Single Interface for Music Score Searching and Analysis project
- 2852 Pieces and 3358 Movements by 164 Composers
- symbolic data in formats such as MEI, MusicXML, MIDI, and others
- https://github.com/kroger/rameau
- by Pedro Kröger, Alexandre Passos, Marcos Sampaio, and Givaldo de Cidra
- the paper that describes the data set
- Band-in-a-Box files available at http://bhs.minor9.com/
- converted by Keunwoo Choi, George Fazekas, and Mark Sandler into one .txt-file for the research presented in this article
- chords of Jazz standards with time information in beats
- http://jazzomat.hfm-weimar.de/dbformat/dbcontent.html
- part of the Jazzomat Research Project
- time-annotated MIDI melodies from monophonic Jazz solos
- chords and transcriptions in staff notation included
- http://gttm.jp/gttm/database/
- by Masatoshi Hamanaka, Keiji Hirata, and Satoshi Tojo
- 300 8-bar phrases of monophonic melodies from western classical music
- XML format
- http://davidtemperley.com/kp-stats/
- by David Temperley
- corpus consisting of 46 chord-analyzed excerpts in the workbook accompanying the theory textbook Tonal Harmony by Stefan Kostka and Dorothy Payne
- http://jazzparser.granroth-wilding.co.uk/ParserPaper.html
- by Mark Granroth-Wilding and Mark Steedman
- http://esavelmat.jyu.fi/collection_download.html
- by Tuomas Eerola and Petri Toiviainen
- http://doc.verovio.humdrum.org/repertory/
- scores in humdrum format, directly accessible using the Verovio Humdrum viewer
- http://kern.humdrum.org/cgi-bin/browse?l=/
- A library of virtual musical scores in the Humdrum **kern data format.
- http://www.norbeck.nu/abc/
- by Henrik Norbeck, Stockholm, Sweden.
- A free online tune book of mostly Irish and Swedish traditional music
- Sheet music and lyrics for more than 2800 tunes in ABC format
- http://compmusic.upf.edu/corpora
- Carnatic, Hindustani, Turkish-Maqam, Beijing Opera, and Arab-Andalusian
- mix of audio and symbolic formats
- https://github.com/napulen/haydn_op20_harm
- 6 Classical string quartets analyzed
- 5000+ chord annotations in the **harm syntax
- annotated by Nestor Napoles and Rafael Caro
- Jesse Rodin, Craig Sapp, Clare Bokulich
- http://josquin.stanford.edu, github
- ca. 750 movements from ca. 1420 - 1520
- collected in Humdrum (on GH), available in many other formats
- CC-BY-SA 4.0 (derivatives must be published under a similar license)
- web interface for analytic queries
- 19,300 MIDI files in total, 17,500 in the "XL Zip Archive"
- requires "academic subscription"
- website info
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.
- The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Note, however, that sample audio can be fetched from services like 7digital, using code we provide.
- The Million Song Dataset is also a cluster of complementary datasets contributed by the community:
- SecondHandSongs dataset -> cover songs
- musiXmatch dataset -> lyrics
- Last.fm dataset -> song-level tags and similarity
- Taste Profile subset -> user data
- thisismyjam-to-MSD mapping -> more user data
- tagtraum genre annotations -> genre labels
- Top MAGD dataset -> more genre labels
- Link
- List of musical corpora (including audio): http://musicalmetacreation.org/links/corpora/. The individual links listed there will also be incorporated into this list in the future.
- List of data sets by David Meredith: http://www.titanmusic.com/data.php
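
The YCAC entry above notes that its data is presented using salami slices. For readers unfamiliar with the term, the following is a minimal sketch of the general idea: a new slice starts whenever any note begins or ends, and each slice records the pitch classes sounding during it. It is not the code or file format used by YCAC; the `Note` type, the `salami_slices` function, and the example notes are hypothetical and serve only as illustration.

```python
from typing import List, NamedTuple, Tuple


class Note(NamedTuple):
    """Hypothetical note record, not a format used by any corpus listed here."""
    onset: float   # start time in quarter notes
    offset: float  # end time in quarter notes
    pitch: int     # MIDI pitch number


def salami_slices(notes: List[Note]) -> List[Tuple[float, float, Tuple[int, ...]]]:
    """Return (start, end, pitch classes) for every span between note events."""
    # Every onset or offset is a slice boundary.
    boundaries = sorted({t for n in notes for t in (n.onset, n.offset)})
    slices = []
    for start, end in zip(boundaries, boundaries[1:]):
        # A note belongs to the slice if it sounds throughout [start, end).
        sounding = {n.pitch % 12 for n in notes
                    if n.onset <= start and n.offset >= end}
        slices.append((start, end, tuple(sorted(sounding))))
    return slices


example = [
    Note(0.0, 2.0, 60),  # C4 held for two beats
    Note(0.0, 1.0, 64),  # E4 for one beat
    Note(1.0, 2.0, 67),  # G4 enters when E4 stops
]

for start, end, pitch_classes in salami_slices(example):
    print(start, end, pitch_classes)
```

In this example the held C4 appears in both slices, first together with E (pitch classes 0 and 4) and then together with G (pitch classes 0 and 7). A span in which nothing sounds would be reported as a slice with an empty tuple of pitch classes.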