
Jyut Dictionary Database Schema

Aaron Tan edited this page Sep 8, 2023 · 1 revision

The Jyut Dictionary database, currently on version 3, consists of seven (7) tables.

Code reference

If you prefer code, see the database creation script.



The entries table consists of six (6) columns. They are: entry_id, traditional, simplified, pinyin, jyutping, and frequency. The schema enforces uniqueness on the set of (traditional, simplified, pinyin, jyutping).

  • The entry_id for the set of (traditional, simplified, pinyin, jyutping) is not constant between different database files! I made this decision to allow arbitrary entry additions from a variety of sources without needing a centralized index of pre-existing entries in all the databases.
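To make the uniqueness rule and the file-local nature of entry_id concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names come from the description above. The column types, the exact constraint syntax, the find_entry_id helper, and the sample word are assumptions for illustration, not the project's actual creation script.

```python
import sqlite3

# Column names follow the description above; the types and the constraint
# syntax are assumptions made for this sketch.
ENTRIES_DDL = """
CREATE TABLE entries (
    entry_id    INTEGER PRIMARY KEY,
    traditional TEXT,
    simplified  TEXT,
    pinyin      TEXT,
    jyutping    TEXT,
    frequency   REAL,
    UNIQUE (traditional, simplified, pinyin, jyutping)
)
"""

INSERT_ENTRY = (
    "INSERT INTO entries (traditional, simplified, pinyin, jyutping, frequency) "
    "VALUES (?, ?, ?, ?, ?)"
)


def find_entry_id(conn, traditional, simplified, pinyin, jyutping):
    """Hypothetical helper: resolve an entry by its four-column key.

    The returned entry_id is only meaningful inside this database file.
    """
    row = conn.execute(
        "SELECT entry_id FROM entries "
        "WHERE traditional = ? AND simplified = ? AND pinyin = ? AND jyutping = ?",
        (traditional, simplified, pinyin, jyutping),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(ENTRIES_DDL)

# Sample data, chosen only for illustration.
word = ("廣東話", "广东话", "guang3 dong1 hua4", "gwong2 dung1 waa2")
conn.execute(INSERT_ENTRY, word + (0.0,))

# A second row with the same four-column key violates the UNIQUE constraint.
try:
    conn.execute(INSERT_ENTRY, word + (1.0,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Because entry_id differs from file to file, code that merges or compares databases would match rows on (traditional, simplified, pinyin, jyutping), as find_entry_id does, rather than on the numeric ID.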


The sources table consists of nine (9) columns. They are: source_id, sourcename, sourceshortname, version, description, legal, link, update_url, and other. The sourcename must be unique for each row.

  • Like entry_id, source_id to source mapping is not consistent between different database files.
  • The link column should contain a link to the original location where the source can be found.
  • The update_url is currently unused. I added it originally intending for Jyut Dictionary to discover updates for dictionaries that were already downloaded, but have not (yet) built this feature.
  • The other column contains a comma-separated list whose items are "words" and/or "sentences". This tells Jyut Dictionary whether to copy only rows from the entries and definitions tables ("words"), or to also copy rows from the chinese_sentences, definitions_chinese_sentence_links, nonchinese_sentences, and sentences tables ("sentences").
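The other column could be interpreted along the following lines. This is a sketch, not Jyut Dictionary's actual import code: the tables_to_copy function is hypothetical, and tolerating whitespace around the commas and treating a lone "sentences" flag as selecting only the sentence tables are both assumptions.

```python
# Table names are taken from the description of the `other` column above.
WORD_TABLES = ["entries", "definitions"]
SENTENCE_TABLES = [
    "chinese_sentences",
    "definitions_chinese_sentence_links",
    "nonchinese_sentences",
    "sentences",
]


def tables_to_copy(other):
    """Hypothetical helper: map a source's `other` value to the tables to copy.

    Assumes items are separated by commas and may carry stray whitespace.
    """
    flags = {flag.strip() for flag in (other or "").split(",")}
    tables = []
    if "words" in flags:
        tables.extend(WORD_TABLES)
    if "sentences" in flags:
        tables.extend(SENTENCE_TABLES)
    return tables


# A source that provides both definitions and example sentences.
all_tables = tables_to_copy("words,sentences")
```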


The definitions table contains five (5) columns. They are: definition_id, definition, label, fk_entry_id, and fk_source_id. The set of (definition, label, fk_entry_id, fk_source_id) is enforced unique by the schema.

  • Like entry_id, definition_id <-> definition mapping is not constant between database files or database versions.
  • The label contains any label that should be displayed with a definition (generally a part-of-speech/POS indicator).
  • fk_entry_id references the entry that this definition is for.
  • fk_source_id references the source that provides this definition. Notice that definitions reference a source (through fk_source_id), but entries do not! I made this decision because multiple sources may provide definitions for one entry, so an entry doesn't belong to a single source and there would be no point in linking it to one.
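The relationship between entries, sources, and definitions can be sketched as follows. Column names follow the descriptions above, but the types, the foreign-key syntax, the trimmed-down sources table (only the columns this example needs), and the sample rows ("Source A", "Source B") are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE entries (
    entry_id    INTEGER PRIMARY KEY,
    traditional TEXT, simplified TEXT, pinyin TEXT, jyutping TEXT,
    frequency   REAL,
    UNIQUE (traditional, simplified, pinyin, jyutping)
);
-- Only a subset of the sources columns is needed for this sketch.
CREATE TABLE sources (
    source_id       INTEGER PRIMARY KEY,
    sourcename      TEXT UNIQUE,
    sourceshortname TEXT
);
CREATE TABLE definitions (
    definition_id INTEGER PRIMARY KEY,
    definition    TEXT,
    label         TEXT,
    fk_entry_id   INTEGER REFERENCES entries(entry_id),
    fk_source_id  INTEGER REFERENCES sources(source_id),
    UNIQUE (definition, label, fk_entry_id, fk_source_id)
);
""")

# One entry, defined independently by two hypothetical sources.
entry_id = conn.execute(
    "INSERT INTO entries (traditional, simplified, pinyin, jyutping, frequency) "
    "VALUES (?, ?, ?, ?, ?)",
    ("廣東話", "广东话", "guang3 dong1 hua4", "gwong2 dung1 waa2", 0.0),
).lastrowid
source_a = conn.execute(
    "INSERT INTO sources (sourcename, sourceshortname) VALUES (?, ?)",
    ("Source A", "A"),
).lastrowid
source_b = conn.execute(
    "INSERT INTO sources (sourcename, sourceshortname) VALUES (?, ?)",
    ("Source B", "B"),
).lastrowid
conn.executemany(
    "INSERT INTO definitions (definition, label, fk_entry_id, fk_source_id) "
    "VALUES (?, ?, ?, ?)",
    [
        ("Cantonese", "noun", entry_id, source_a),
        ("Cantonese language", "", entry_id, source_b),
    ],
)

# All definitions for the entry, each tagged with the source that provides it.
rows = conn.execute(
    "SELECT s.sourcename, d.label, d.definition "
    "FROM definitions AS d "
    "JOIN sources AS s ON s.source_id = d.fk_source_id "
    "WHERE d.fk_entry_id = ? "
    "ORDER BY s.sourcename",
    (entry_id,),
).fetchall()
```

The single entries row is shared by both sources, which is why the source reference lives on each definition rather than on the entry.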


TODO: Fill this out


TODO: Fill this out


TODO: Fill this out


TODO: Fill this out