-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maintain infrequent characters' order #583
Conversation
Code Review Agent Run #adee72Actionable Suggestions - 1
Review Details
|
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
This pre-populates the `phrases` dictionary from the entries in `BPMFBase.txt` such that the dictionary will maintain the original insertion order (guaranteed since Python 3.7). The dictionary entries will then be overwritten with the actual data from `PhraseFreq.txt`. This fixes #582.
`bpmf_phrases` is guaranteed to have entries from `bpmf_chars`, and so we only need to validate data when reading from `PhraseFreq.txt`.
499b363
to
d776acb
Compare
Code Review Agent Run #b72fb0Actionable Suggestions - 0Review Details
|
Also removes a section of code that is now redundant, since the `phrases` dictionary is now pre-populated with all entries in `bpmf_chars`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks!
for key in bpmf_chars: | ||
phrases[key] = UNK_LOG_FREQ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lukhnos sorry for being late to the party. +CC @mjhsieh
I think this PR nicely demonstrated the limitation of non-AST LLM-based code review.
and faster. I will make a new PR to emphasize improvements that should be there in my opinions.
For example, this initialization has a famous Pythonic pattern that is slightly safer
for key in bpmf_chars: | |
phrases[key] = UNK_LOG_FREQ | |
phrases = {key: UNK_LOG_FREQ for key in bpmf_chars} |
User description
This pre-populates the
phrases
dictionary from the entries inBPMFBase.txt
such that the dictionary will maintain the original insertion order (guaranteed since Python 3.7). The dictionary entries will then be overwritten with the actual data fromPhraseFreq.txt
.This fixes #582.
How Tested
A diff of the resulting
data.txt
shows that only infrequently used characters are affected: https://gist.github.com/lukhnos/a93d72da8c77671b2b978f8ac4a3da42PR Type
Bug fix, Enhancement
Description
Pre-populates
phrases
dictionary withBPMFBase.txt
entries to maintain order.Ensures infrequent characters' order matches traditional phonetic order.
Fixes key mismatch errors during dictionary population.
Updates
cook.py
to handlephrases
dictionary more robustly.Changes walkthrough 📝
cook.py
Pre-populate and maintain order in `phrases` dictionary
Source/Data/bin/cook.py
phrases
dictionary withBPMFBase.txt
entries.characters.
PhraseFreq.txt
data.Summary by Bito
This pull request fixes a bug in the McBopomofo input method by ensuring the correct ordering of infrequent characters. It modifies the `cook.py` script to pre-populate the `phrases` dictionary from `BPMFBase.txt` and resolves key mismatch errors, enhancing the script's robustness. Additionally, it preserves the traditional phonetic order for rarely used characters, utilizing Python 3.7+'s dictionary insertion order guarantee.Unit tests added: False
Estimated effort to review (1-5, lower is better): 2