Jyut Dictionary Database Schema
The Jyut Dictionary database, currently on version 3, consists of seven (7) tables.
If you prefer code, see https://github.com/aaronhktan/jyut-dict/blob/main/src/dictionaries/database/database.py for the database creation script.
The entries table consists of six (6) columns. They are: `entry_id`, `traditional`, `simplified`, `pinyin`, `jyutping`, and `frequency`. The set of (traditional, simplified, pinyin, jyutping) is enforced unique by the schema.
- The entry_id for the set of (traditional, simplified, pinyin, jyutping) is not constant between different database files! I made this decision to allow arbitrary entry additions from a variety of sources without needing a centralized index of pre-existing entries in all the databases.
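As a rough illustration of that uniqueness rule, here is a minimal sketch using Python's built-in `sqlite3` module. Only the column names and the unique set come from the description above; the column types, the constraint syntax, and the `insert_entry` helper are assumptions made for the example and may differ from the real creation script.

```python
import sqlite3

# Assumed types and constraint syntax; only the column names and the
# uniqueness rule are taken from the schema description.
ENTRIES_DDL = """
CREATE TABLE entries (
    entry_id    INTEGER PRIMARY KEY,
    traditional TEXT,
    simplified  TEXT,
    pinyin      TEXT,
    jyutping    TEXT,
    frequency   REAL,
    UNIQUE (traditional, simplified, pinyin, jyutping)
)
"""


def insert_entry(conn, traditional, simplified, pinyin, jyutping, frequency=0.0):
    """Hypothetical helper: insert an entry, returning False if it already exists."""
    try:
        conn.execute(
            "INSERT INTO entries (traditional, simplified, pinyin, jyutping, frequency) "
            "VALUES (?, ?, ?, ?, ?)",
            (traditional, simplified, pinyin, jyutping, frequency),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised when the UNIQUE constraint rejects a duplicate entry.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(ENTRIES_DDL)
first = insert_entry(conn, "你好", "你好", "ni3 hao3", "nei5 hou2")
duplicate = insert_entry(conn, "你好", "你好", "ni3 hao3", "nei5 hou2")
row_count = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
```

Because `entry_id` is assigned per database file, code that merges databases should match entries on the unique set rather than on `entry_id`.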
The sources table consists of nine (9) columns. They are: `source_id`, `sourcename`, `sourceshortname`, `version`, `description`, `legal`, `link`, `update_url`, and `other`. The `sourcename` must be unique for each row.
- Like `entry_id`, the `source_id` to source mapping is not consistent between different database files.
- The `link` column should contain a link to the original location where the source can be found.
- The `update_url` column is currently unused. I added it originally intending for Jyut Dictionary to discover updates for dictionaries that were already downloaded, but have not (yet) built this feature.
- The `other` column contains a comma-separated list of ["words", "sentences"]. This indicates to Jyut Dictionary whether to copy only rows from the `entries` and `definitions` tables ("words"), or to also copy the `chinese_sentences`, `definitions_chinese_sentence_links`, `nonchinese_sentences`, and `sentences` tables ("sentences").
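To make the `other` column concrete, the following sketch parses its value and returns the tables to copy. The function name `tables_to_copy` and the two constants are hypothetical, and the table names are taken as written above. The sketch also assumes that each flag enables only its own group of tables; whether "sentences" on its own implies the word tables is not stated here.

```python
# Table groups as named in the description of the `other` column.
WORD_TABLES = ("entries", "definitions")
SENTENCE_TABLES = (
    "chinese_sentences",
    "definitions_chinese_sentence_links",
    "nonchinese_sentences",
    "sentences",
)


def tables_to_copy(other):
    """Hypothetical helper: map a comma-separated `other` value to table names."""
    # Split on commas and ignore surrounding whitespace and empty items.
    flags = {flag.strip() for flag in other.split(",") if flag.strip()}
    tables = []
    if "words" in flags:
        tables.extend(WORD_TABLES)
    if "sentences" in flags:
        # Assumption: this flag adds the sentence tables and nothing else.
        tables.extend(SENTENCE_TABLES)
    return tables
```

For example, `tables_to_copy("words")` yields only the two word tables, while `tables_to_copy("words,sentences")` yields all six.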
The definitions table contains five (5) columns. They are: `definition_id`, `definition`, `label`, `fk_entry_id`, and `fk_source_id`. The set of (definition, label, fk_entry_id, fk_source_id) is enforced unique by the schema.
- Like `entry_id`, the `definition_id` <-> definition mapping is not constant between database files or database versions.
- The `label` column contains any label that should be displayed with a definition (generally a part-of-speech/POS indicator).
- `fk_entry_id` references the entry that this definition is for.
- `fk_source_id` references the source that provides this definition. Notice that definitions contain a source_id, but entries do not! I made this decision because multiple sources may provide definitions for one entry, so an entry doesn't belong to a single source and there would be no point in linking it to any particular one.
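The relationship between the three tables can be sketched as follows. The `entries` and `sources` tables are trimmed to the columns needed for the join, and the column types, the `REFERENCES` clauses, and the sample rows ("Source A", "Source B") are assumptions for illustration rather than the real creation script or real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript("""
-- Trimmed, assumed versions of the entries and sources tables.
CREATE TABLE entries (entry_id INTEGER PRIMARY KEY, traditional TEXT);
CREATE TABLE sources (source_id INTEGER PRIMARY KEY, sourcename TEXT UNIQUE);
CREATE TABLE definitions (
    definition_id INTEGER PRIMARY KEY,
    definition    TEXT,
    label         TEXT,
    fk_entry_id   INTEGER REFERENCES entries(entry_id),
    fk_source_id  INTEGER REFERENCES sources(source_id),
    UNIQUE (definition, label, fk_entry_id, fk_source_id)
);
-- Made-up sample data: one entry defined by two different sources.
INSERT INTO entries VALUES (1, '你好');
INSERT INTO sources VALUES (1, 'Source A'), (2, 'Source B');
INSERT INTO definitions (definition, label, fk_entry_id, fk_source_id) VALUES
    ('hello', 'interjection', 1, 1),
    ('hi', 'interjection', 1, 2);
""")

# List every definition of one entry together with the source that provides it.
rows = conn.execute(
    """
    SELECT s.sourcename, d.definition
    FROM definitions AS d
    JOIN sources AS s ON s.source_id = d.fk_source_id
    WHERE d.fk_entry_id = ?
    ORDER BY s.sourcename
    """,
    (1,),
).fetchall()
```

Here a single row in `entries` ends up with definitions from two sources, which is why the source reference lives on the definition rather than on the entry.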
TODO: Fill this out
TODO: Fill this out
TODO: Fill this out
TODO: Fill this out