Being aware of some hanjas' phonetic changes #7

Open
dahlia opened this issue Aug 20, 2016 · 3 comments

Comments

dahlia (Collaborator) commented Aug 20, 2016

Some hanja, such as 金, 讀, and 畵, have more than one reading depending on the word they appear in. The current behavior can produce incorrect results in such cases, e.g.:

  • Input: 金日成綜合大學은 平壤에 있는 朝鮮民主主義人民共和國의 國立大學이다.
  • Expected output: 김일성종합대학은 평양에 있는 조선민주주의인민공화국의 국립대학이다.
  • Actual output: 금일성종합대학은 평양에 있는 조선민주주의인민공화국의 국립대학이다.

See also the following table:

| Hanja | Word 1 | Word 2 |
|---|---|---|
| 金 | 金剛經 (금강경) | 金浦國際空港 (김포국제공항) |
| 讀 | 讀書 (독서) | 句讀點 (구두점) |
| 畵 | 畵龍點睛 (화룡점정) | 企畵 (기획) |

suminb (Owner) commented Aug 21, 2016

Thanks for your report. I'm unable to investigate this issue at the moment, but I'll try to re-visit this sometime this week.

suminb (Owner) commented Jun 14, 2017

Sorry for the late response. It's almost been a year 😆

I looked into this briefly, and it looks like there is no easy way to deal with this issue other than making a huge rule table. Or maybe I'm missing something... If anyone could suggest a solution for this, it would be much appreciated.

chaaklau commented Feb 22, 2020

A huge rule table would be the easiest solution. Since you are using a mapping file, if you have a list of phrases that do not use the most common reading, you can place those phrases at the top of your file.

For example, the hanja 金 has two readings, 금 and 김. 김 is everywhere, and you can't possibly list every name that uses the reading 김. If you have a list of all the '金' words read as 금, e.g. 대금 (代金), 금고 (金庫), 금요일 (金曜日), put them at the top of your list. If no phrase matches, fall back to the default pronunciation. The table will grow about 10 times bigger, but it should not affect run time much.
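
To make the idea concrete, here is a minimal sketch of phrase-first lookup with a per-character fallback. It does not use the library's mapping file or API. `DEFAULT_READINGS`, `PHRASE_READINGS`, and `translate` are hypothetical names, and the tables hold only a few illustrative entries. It also tries the longest phrase first with a dictionary lookup rather than relying on the order of entries in a file, which is a slightly different technique from "put them at the top" but has the same effect.

```python
# Hypothetical tables for illustration only; not taken from the library's data.
DEFAULT_READINGS = {
    "金": "김",  # assumed default reading, used when no phrase matches
    "代": "대",
    "庫": "고",
    "曜": "요",
    "日": "일",
    "成": "성",
}

# Phrases whose reading of 金 differs from the assumed default.
PHRASE_READINGS = {
    "代金": "대금",
    "金庫": "금고",
    "金曜日": "금요일",
}

MAX_PHRASE_LEN = max(len(phrase) for phrase in PHRASE_READINGS)


def translate(text: str) -> str:
    """Replace hanja with hangul, preferring the longest known phrase."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase starting at position i first.
        for length in range(min(MAX_PHRASE_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in PHRASE_READINGS:
                out.append(PHRASE_READINGS[chunk])
                i += length
                break
        else:
            # No phrase matched: use the default reading, or keep the
            # character unchanged if it is not in the table at all.
            ch = text[i]
            out.append(DEFAULT_READINGS.get(ch, ch))
            i += 1
    return "".join(out)
```

With these tables, `translate("金曜日")` goes through the phrase table, while `translate("金日成")` matches no phrase and falls back to the per-character defaults. Each position costs at most `MAX_PHRASE_LEN` dictionary lookups, so a larger phrase table mainly costs memory rather than time.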
