Merge pull request #399 from zonble/master
Introduces input macros
lukhnos authored Dec 11, 2023
2 parents 9e47fc3 + 7e19809 commit 2ca3ab7
Showing 9 changed files with 404 additions and 5 deletions.
4 changes: 4 additions & 0 deletions McBopomofo.xcodeproj/project.pbxproj
@@ -46,6 +46,7 @@
D427F7B4279086DC004A2160 /* InputSourceHelper in Frameworks */ = {isa = PBXBuildFile; productRef = D427F7B3279086DC004A2160 /* InputSourceHelper */; };
D427F7B6279086F6004A2160 /* InputSourceHelper in Frameworks */ = {isa = PBXBuildFile; productRef = D427F7B5279086F6004A2160 /* InputSourceHelper */; };
D427F7C127908EFC004A2160 /* OpenCCBridge in Frameworks */ = {isa = PBXBuildFile; productRef = D427F7C027908EFC004A2160 /* OpenCCBridge */; };
+D43FC40B2B23788400ED5A1C /* InputMacro.swift in Sources */ = {isa = PBXBuildFile; fileRef = D43FC40A2B23788400ED5A1C /* InputMacro.swift */; };
D44FB74527915565003C80A6 /* Preferences.swift in Sources */ = {isa = PBXBuildFile; fileRef = D44FB74427915555003C80A6 /* Preferences.swift */; };
D44FB74A2791B829003C80A6 /* VXHanConvert in Frameworks */ = {isa = PBXBuildFile; productRef = D44FB7492791B829003C80A6 /* VXHanConvert */; };
D44FB74D2792189A003C80A6 /* PhraseReplacementMap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = D44FB74B2792189A003C80A6 /* PhraseReplacementMap.cpp */; };
@@ -167,6 +168,7 @@
D427F7AC27907B7E004A2160 /* NotifierUI */ = {isa = PBXFileReference; lastKnownFileType = wrapper; name = NotifierUI; path = Packages/NotifierUI; sourceTree = "<group>"; };
D427F7B2279086B5004A2160 /* InputSourceHelper */ = {isa = PBXFileReference; lastKnownFileType = wrapper; name = InputSourceHelper; path = Packages/InputSourceHelper; sourceTree = "<group>"; };
D427F7BF27908EAC004A2160 /* OpenCCBridge */ = {isa = PBXFileReference; lastKnownFileType = wrapper; name = OpenCCBridge; path = Packages/OpenCCBridge; sourceTree = "<group>"; };
+D43FC40A2B23788400ED5A1C /* InputMacro.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = InputMacro.swift; sourceTree = "<group>"; };
D44FB74427915555003C80A6 /* Preferences.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = Preferences.swift; sourceTree = "<group>"; };
D44FB7482791B346003C80A6 /* VXHanConvert */ = {isa = PBXFileReference; lastKnownFileType = wrapper; name = VXHanConvert; path = Packages/VXHanConvert; sourceTree = "<group>"; };
D44FB74B2792189A003C80A6 /* PhraseReplacementMap.cpp */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.cpp.cpp; path = PhraseReplacementMap.cpp; sourceTree = "<group>"; };
@@ -278,6 +280,7 @@
D4E569DB27A34CC100AC2CEF /* KeyHandler.mm */,
6AE215102A2849BB005A6A02 /* UTF8Helper.cpp */,
6AE2150F2A2849BB005A6A02 /* UTF8Helper.h */,
+D43FC40A2B23788400ED5A1C /* InputMacro.swift */,
D456576D279E4F7B00DF6BC9 /* KeyHandlerInput.swift */,
D461B791279DAC010070E734 /* InputState.swift */,
D427F76B278CA1BA004A2160 /* AppDelegate.swift */,
@@ -671,6 +674,7 @@
6A0D4F4515FC0EB100ABF4B3 /* Mandarin.cpp in Sources */,
6ACC3D452793701600F1B140 /* ParselessLM.cpp in Sources */,
D41355DE278EA3ED005E5CBD /* UserPhrasesLM.cpp in Sources */,
+D43FC40B2B23788400ED5A1C /* InputMacro.swift in Sources */,
6AE215112A2849BB005A6A02 /* UTF8Helper.cpp in Sources */,
6ACC3D3F27914F2400F1B140 /* KeyValueBlobReader.cpp in Sources */,
B058C5272AC9DF51002EDD66 /* ServiceProvider.swift in Sources */,
39 changes: 39 additions & 0 deletions Source/Data/Macros.txt
@@ -0,0 +1,39 @@
今天日期 ㄐㄧㄣ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_TODAY_SHORT ㄐㄧㄣ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_TODAY_MEDIUM ㄐㄧㄣ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_TODAY_MEDIUM_ROC ㄐㄧㄣ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_TODAY_MEDIUM_CHINESE ㄐㄧㄣ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
昨天日期 ㄗㄨㄛˊ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_YESTERDAY_SHORT ㄗㄨㄛˊ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_YESTERDAY_MEDIUM ㄗㄨㄛˊ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_YESTERDAY_MEDIUM_ROC ㄗㄨㄛˊ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_YESTERDAY_MEDIUM_CHINESE ㄗㄨㄛˊ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
明天日期 ㄇㄧㄥˊ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_TOMORROW_SHORT ㄇㄧㄥˊ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_TOMORROW_MEDIUM ㄇㄧㄥˊ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_TOMORROW_MEDIUM_ROC ㄇㄧㄥˊ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
MACRO@DATE_TOMORROW_MEDIUM_CHINESE ㄇㄧㄥˊ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8
現在時刻 ㄒㄧㄢˋ-ㄗㄞˋ-ㄕˊ-ㄎㄜˋ -8
MACRO@TIME_NOW_SHORT ㄒㄧㄢˋ-ㄗㄞˋ-ㄕˊ-ㄎㄜˋ -8
MACRO@TIME_NOW_MEDIUM ㄒㄧㄢˋ-ㄗㄞˋ-ㄕˊ-ㄎㄜˋ -8
現在時間 ㄒㄧㄢˋ-ㄗㄞˋ-ㄕˊ-ㄐㄧㄢ -8
MACRO@TIME_NOW_SHORT ㄒㄧㄢˋ-ㄗㄞˋ-ㄕˊ-ㄐㄧㄢ -8
MACRO@TIME_NOW_MEDIUM ㄒㄧㄢˋ-ㄗㄞˋ-ㄕˊ-ㄐㄧㄢ -8
目前時區 ㄇㄨˋ-ㄑㄧㄢˊ-ㄕˊ-ㄑㄩ -8
MACRO@TIMEZONE_STANDARD ㄇㄨˋ-ㄑㄧㄢˊ-ㄕˊ-ㄑㄩ -8
MACRO@TIMEZONE_GENERIC_SHORT ㄇㄨˋ-ㄑㄧㄢˊ-ㄕˊ-ㄑㄩ -8
所在時區 ㄙㄨㄛˇ-ㄗㄞˋ-ㄕˊ-ㄑㄩ -8
MACRO@TIMEZONE_STANDARD ㄙㄨㄛˇ-ㄗㄞˋ-ㄕˊ-ㄑㄩ -8
MACRO@TIMEZONE_GENERIC_SHORT ㄙㄨㄛˇ-ㄗㄞˋ-ㄕˊ-ㄑㄩ -8
今年干支 ㄐㄧㄣ-ㄋㄧㄢˊ-ㄍㄢ-ㄓ -8
MACRO@THIS_YEAR_GANZHI ㄐㄧㄣ-ㄋㄧㄢˊ-ㄍㄢ-ㄓ -8
去年干支 ㄑㄩˋ-ㄋㄧㄢˊ-ㄍㄢ-ㄓ -8
MACRO@LAST_YEAR_GANZHI ㄑㄩˋ-ㄋㄧㄢˊ-ㄍㄢ-ㄓ -8
明年干支 ㄇㄧㄥˊ-ㄋㄧㄢˊ-ㄍㄢ-ㄓ -8
MACRO@NEXT_YEAR_GANZHI ㄇㄧㄥˊ-ㄋㄧㄢˊ-ㄍㄢ-ㄓ -8
今年生肖 ㄐㄧㄣ-ㄋㄧㄢˊ-ㄕㄥ-ㄒㄧㄠˋ -8
MACRO@THIS_YEAR_CHINESE_ZODIAC ㄐㄧㄣ-ㄋㄧㄢˊ-ㄕㄥ-ㄒㄧㄠˋ -8
去年生肖 ㄑㄩˋ-ㄋㄧㄢˊ-ㄕㄥ-ㄒㄧㄠˋ -8
MACRO@LAST_YEAR_CHINESE_ZODIAC ㄑㄩˋ-ㄋㄧㄢˊ-ㄕㄥ-ㄒㄧㄠˋ -8
明年生肖 ㄇㄧㄥˊ-ㄋㄧㄢˊ-ㄕㄥ-ㄒㄧㄠˋ -8
MACRO@NEXT_YEAR_CHINESE_ZODIAC ㄇㄧㄥˊ-ㄋㄧㄢˊ-ㄕㄥ-ㄒㄧㄠˋ -8
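
Each row in Macros.txt appears to follow the three-column layout that cook.py asserts on: a value, a Bopomofo reading, and a score, separated by single spaces. Rows whose value starts with `MACRO@` are placeholders rather than literal phrases. The following is a minimal sketch of reading such rows, not code from the commit; `MacroRow`, `parse_row`, and `MACRO_PREFIX` are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple

MACRO_PREFIX = "MACRO@"  # prefix seen in Macros.txt; the constant name is an assumption


class MacroRow(NamedTuple):
    value: str    # literal phrase, or a MACRO@NAME placeholder
    reading: str  # Bopomofo syllables joined by "-"
    score: float  # score column, e.g. -8

    @property
    def is_macro(self) -> bool:
        return self.value.startswith(MACRO_PREFIX)


def parse_row(line: str) -> MacroRow:
    """Parse one 'value reading score' row, mirroring cook.py's three-field check."""
    fields = line.rstrip().split(" ")
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {fields!r}")
    value, reading, score = fields
    return MacroRow(value, reading, float(score))


rows = [
    parse_row("今天日期 ㄐㄧㄣ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8"),
    parse_row("MACRO@DATE_TODAY_SHORT ㄐㄧㄣ-ㄊㄧㄢ-ㄖˋ-ㄑㄧˊ -8"),
]
```

Both rows share one reading, so typing that reading would offer the literal phrase and the macro placeholder as separate candidates.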
6 changes: 3 additions & 3 deletions Source/Data/Makefile
@@ -8,9 +8,9 @@ data-plain-bpmf.txt: bin/cook-plain-bpmf.py BPMFBase.txt BPMFPunctuations.txt
bin/cook-plain-bpmf.py BPMFBase.txt BPMFPunctuations.txt data-plain-bpmf.txt

data.txt: bin/cook.py BPMFBase.txt BPMFMappings.txt BPMFPunctuations.txt \
-PhraseFreq.txt phrase.occ Symbols.txt \
-heterophony1.list heterophony2.list heterophony3.list
-bin/cook.py PhraseFreq.txt BPMFMappings.txt BPMFBase.txt BPMFPunctuations.txt Symbols.txt $@
+PhraseFreq.txt phrase.occ Symbols.txt Macros.txt\
+heterophony1.list heterophony2.list heterophony3.list
+bin/cook.py PhraseFreq.txt BPMFMappings.txt BPMFBase.txt BPMFPunctuations.txt Symbols.txt Macros.txt $@

PhraseFreq.txt: bin/buildFreq.py phrase.occ exclusion.txt
bin/buildFreq.py
2 changes: 1 addition & 1 deletion Source/Data/Symbols.txt
@@ -18,8 +18,8 @@
☢ ㄈㄨˊ-ㄕㄜˋ-ㄒㄧㄥˋ -8
≥ ㄉㄚˋ-ㄩˊ-ㄉㄥˇ-ㄩˊ -8
🉐 ㄉㄜˊ -8
-▼ ㄉㄠˋ-ㄙㄢ-ㄐㄧㄠˇ -8
 ℡ ㄉㄧㄢˋ-ㄏㄨㄚˋ -8
+▼ ㄉㄠˋ-ㄙㄢ-ㄐㄧㄠˇ -8
㏒ ㄉㄨㄟˋ-ㄕㄨˋ -8
♏ ㄊㄧㄢ-ㄒㄧㄝ -8
♎ ㄊㄧㄢ-ㄔㄥˋ -8
6 changes: 6 additions & 0 deletions Source/Data/bin/cook.py
@@ -200,6 +200,12 @@
         assert len(row) == 3, row
         output.append(tuple(row))

+with open(sys.argv[6]) as macro_file:
+    for line in macro_file:
+        row = line.rstrip().split(" ")
+        assert len(row) == 3, row
+        output.append(tuple(row))
+
 output = convert_vks_rows_to_sorted_kvs_rows(output)
 with open(sys.argv[-1], "w") as fout:
     fout.write(HEADER)
11 changes: 10 additions & 1 deletion Source/Engine/McBopomofoLM.cpp
@@ -176,6 +176,10 @@ void McBopomofoLM::setExternalConverter(std::function<std::string(std::string)>
     m_externalConverter = externalConverter;
 }

+void McBopomofoLM::setMacroConverter(std::function<std::string(std::string)> macroConverter) {
+    m_macroConverter = macroConverter;
+}
+
 std::vector<Formosa::Gramambular2::LanguageModel::Unigram> McBopomofoLM::filterAndTransformUnigrams(const std::vector<Formosa::Gramambular2::LanguageModel::Unigram> unigrams, const std::unordered_set<std::string>& excludedValues, std::unordered_set<std::string>& insertedValues)
 {
     std::vector<Formosa::Gramambular2::LanguageModel::Unigram> results;
@@ -189,17 +193,22 @@ std::vector<Formosa::Gramambular2::LanguageModel::Unigram> McBopomofoLM::filterA
         }

         std::string value = originalValue;
+
         if (m_phraseReplacementEnabled) {
             std::string replacement = m_phraseReplacement.valueForKey(value);
             if (!replacement.empty()) {
                 value = replacement;
             }
         }
+        if (m_macroConverter) {
+            std::string replacement = m_macroConverter(value);
+            value = replacement;
+        }
         if (m_externalConverterEnabled && m_externalConverter) {
             std::string replacement = m_externalConverter(value);
             value = replacement;
         }
-        if (insertedValues.find(value) == insertedValues.end()) {
+        if (!value.empty() && insertedValues.find(value) == insertedValues.end()) {
             results.emplace_back(value, unigram.score());
             insertedValues.insert(value);
         }
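
The order of transformations in this hunk (phrase replacement, then the macro converter, then the external converter, followed by the new empty-value check) can be sketched in Python as below. This is an illustrative re-statement, not the project's code: `filter_and_transform` and `demo_macro_converter` are hypothetical names, the excluded-value check is assumed from the `excludedValues` parameter because it lies outside the hunk, and returning an empty string for an unrecognized macro is an assumption about how a converter might behave.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Converter = Callable[[str], str]
Unigram = Tuple[str, float]  # (value, score)


def filter_and_transform(
    unigrams: Iterable[Unigram],
    excluded: Set[str],
    inserted: Set[str],
    phrase_replacement: Optional[Dict[str, str]] = None,
    macro_converter: Optional[Converter] = None,
    external_converter: Optional[Converter] = None,
) -> List[Unigram]:
    results: List[Unigram] = []
    for original, score in unigrams:
        # Assumed from the excludedValues parameter; not visible in the hunk.
        if original in excluded:
            continue
        value = original
        if phrase_replacement is not None:
            replacement = phrase_replacement.get(value, "")
            if replacement:
                value = replacement
        if macro_converter is not None:
            value = macro_converter(value)
        if external_converter is not None:
            value = external_converter(value)
        # Mirrors the changed condition: empty values are dropped, duplicates skipped.
        if value and value not in inserted:
            results.append((value, score))
            inserted.add(value)
    return results


def demo_macro_converter(value: str) -> str:
    """Hypothetical converter: expands one macro, blanks unknown ones."""
    table = {"MACRO@DATE_TODAY_SHORT": "2023/12/11"}  # fixed string for illustration
    if value.startswith("MACRO@"):
        return table.get(value, "")
    return value


candidates = filter_and_transform(
    [("今天日期", -8.0), ("MACRO@DATE_TODAY_SHORT", -8.0), ("MACRO@NOT_DEFINED", -8.0)],
    excluded=set(),
    inserted=set(),
    macro_converter=demo_macro_converter,
)
```

Under these assumptions the unknown macro yields an empty string and is filtered out, which is the case the added `!value.empty()` condition would cover.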
3 changes: 3 additions & 0 deletions Source/Engine/McBopomofoLM.h
@@ -101,6 +101,8 @@ class McBopomofoLM : public Formosa::Gramambular2::LanguageModel {
     bool externalConverterEnabled() const;
     /// Sets a lambda to let the values of unigrams could be converted by it.
     void setExternalConverter(std::function<std::string(std::string)> externalConverter);
+    /// Sets a lambda to convert the macros.
+    void setMacroConverter(std::function<std::string(std::string)> macroConverter);

     const std::vector<std::string> associatedPhrasesForKey(const std::string& key);
     bool hasAssociatedPhrasesForKey(const std::string& key);
@@ -128,6 +130,7 @@ class McBopomofoLM : public Formosa::Gramambular2::LanguageModel {
     bool m_phraseReplacementEnabled;
     bool m_externalConverterEnabled;
     std::function<std::string(std::string)> m_externalConverter;
+    std::function<std::string(std::string)> m_macroConverter;
 };
 };
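
The macro converter is a plain string-to-string callback, so whatever is installed through `setMacroConverter` decides how a `MACRO@...` value expands. The Swift side (InputMacro.swift) is not shown in this part of the diff, so the following Python sketch is only a guess at the general shape: `HANDLERS`, `make_macro_converter`, and the `%Y/%m/%d` output format are all assumptions, and only the macro names are taken from Macros.txt.

```python
import datetime
from typing import Callable, Dict

MacroHandler = Callable[[datetime.date], str]

ONE_DAY = datetime.timedelta(days=1)

# Macro names come from Macros.txt; the output format is an assumption.
HANDLERS: Dict[str, MacroHandler] = {
    "MACRO@DATE_TODAY_SHORT": lambda today: today.strftime("%Y/%m/%d"),
    "MACRO@DATE_YESTERDAY_SHORT": lambda today: (today - ONE_DAY).strftime("%Y/%m/%d"),
    "MACRO@DATE_TOMORROW_SHORT": lambda today: (today + ONE_DAY).strftime("%Y/%m/%d"),
}


def make_macro_converter(
    today_provider: Callable[[], datetime.date] = datetime.date.today,
) -> Callable[[str], str]:
    """Build a str -> str callback with the same shape as the macro converter."""

    def convert(value: str) -> str:
        if not value.startswith("MACRO@"):
            return value  # ordinary phrases pass through unchanged
        handler = HANDLERS.get(value)
        # Assumption: unknown macros expand to "" so the caller can drop them.
        return handler(today_provider()) if handler else ""

    return convert


# A fixed date keeps the example deterministic.
convert = make_macro_converter(lambda: datetime.date(2023, 12, 11))
today_text = convert("MACRO@DATE_TODAY_SHORT")
```

Injecting the date provider is a design choice for testability in this sketch; it says nothing about how the actual converter obtains the current date.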
