Tossi

"Tossi(토씨)" is a pure-Korean name for grammatical particles. Some Korean particles have allomorphic variants that depend on the leading word. The Tossi library determines the most natural form.

Installation

$ pip install tossi

Usage

>>> import tossi
>>> tossi.postfix_particle(u'집', u'(으)로')
집으로
>>> tossi.postfix_particle(u'말', u'으로는')
말로는
>>> tossi.postfix_particle(u'대한민국', u'은(는)')
대한민국은
>>> tossi.postfix_particle(u'민주공화국', u'다')
민주공화국이다

Natural Form for Particles

These particles have no allomorphic variants. They always appear in the same form: 의, 도, 만~, 에~, 께~, 뿐~, 하~, 보다~, 밖에~, 같이~, 부터~, 까지~, 마저~, 조차~, 마냥~, 처럼~, and 커녕~:

나오의, 모리안의, 키홀의, 나오도, 모리안도, 키홀도

Meanwhile, these particles appear in different forms depending on whether the leading word has a final consonant or not: 은(는), 이(가), 을(를), and 과(와)~:

나오는, 모리안은, 키홀은

(으)로~ follows a similar rule, but if the final consonant is ㄹ, it takes the same form as after a word with no final consonant:

나오로, 모리안으로, 키홀로
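
The two rules above can be illustrated with a minimal sketch. It is not Tossi's actual implementation; `final_consonant_index` and `pick_form` are hypothetical helpers. It relies only on the layout of precomposed Hangul syllables in Unicode, where the final consonant is the syllable's offset from 가 modulo 28:

```python
# Illustrative sketch only; these helpers are hypothetical, not Tossi's API.
HANGUL_BASE = 0xAC00   # code point of '가', the first precomposed syllable
HANGUL_LAST = 0xD7A3   # code point of '힣', the last precomposed syllable
FINAL_COUNT = 28       # 27 final consonants plus "no final consonant"
RIEUL_INDEX = 8        # index of the final consonant ㄹ


def final_consonant_index(syllable):
    """Return the final-consonant index of a Hangul syllable (0 means none).

    Returns None when the character is not a precomposed Hangul syllable.
    """
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return None
    return (code - HANGUL_BASE) % FINAL_COUNT


def pick_form(word, with_final, without_final, rieul_as_vowel=False):
    """Attach the form of a particle that matches the last syllable of word.

    Set rieul_as_vowel=True for (으)로-like particles, which treat a final ㄹ
    like a missing final consonant.
    """
    index = final_consonant_index(word[-1])
    if index is None:
        raise ValueError('last character is not a Hangul syllable')
    if index == 0 or (rieul_as_vowel and index == RIEUL_INDEX):
        return word + without_final
    return word + with_final


for name in ['나오', '모리안', '키홀']:
    print(pick_form(name, '은', '는'),
          pick_form(name, '으로', '로', rieul_as_vowel=True))
```

The sketch raises an error for non-Hangul input, whereas Tossi itself falls back to a tolerant form in that case.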

(이)다, a predicative particle, has more diverse forms, since its ending can generally be inflected:

나오지만, 모리안이지만, 키홀이에요, 나오예요

Tossi tries to determine the most natural form for particles. If it fails, it returns both forms, like 은(는) or (으)로, for tolerance:

>>> tossi.postfix_particle(u'벽돌', u'으로')
벽돌로
>>> tossi.postfix_particle(u'짚', u'으로')
짚으로
>>> tossi.postfix_particle(u'黃金', u'으로')
黃金(으)로

If the leading word ends with a number, the natural form can still be determined:

>>> tossi.postfix_particle(u'레벨 10', u'이')
레벨 10이
>>> tossi.postfix_particle(u'레벨 999', u'이')
레벨 999가
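
A trailing number can be resolved because its Sino-Korean reading is predictable from its digits. The following is a rough sketch of that idea, not Tossi's actual code; the helper names are hypothetical, and it assumes the number is read in Sino-Korean and has fewer than 12 trailing zeros (so it never ends in a unit such as 조, which has no final consonant):

```python
import re

# Illustrative sketch only; these helpers are hypothetical, not Tossi's API.
# Last digits whose Sino-Korean reading ends in a final consonant:
# 1 일, 3 삼, 6 육, 7 칠, 8 팔. A trailing 0 means the reading ends in
# 영, 십, 백, 천, 만, or 억, which all have a final consonant too.
# The remaining digits (2 이, 4 사, 5 오, 9 구) end in a vowel.
DIGITS_WITH_FINAL = set('013678')

TRAILING_NUMBER = re.compile(r'([0-9]+)$')


def number_has_final_consonant(text):
    """Tell whether the number at the end of text is read with a final consonant."""
    match = TRAILING_NUMBER.search(text)
    if match is None:
        raise ValueError('text does not end with a number')
    digits = match.group(1)
    trailing_zeros = len(digits) - len(digits.rstrip('0'))
    if trailing_zeros >= 12:
        raise ValueError('numbers this round are outside the scope of this sketch')
    return digits[-1] in DIGITS_WITH_FINAL


def attach_subject_particle(text):
    """Append 이 or 가 to a text that ends with a number."""
    return text + ('이' if number_has_final_consonant(text) else '가')


print(attach_subject_particle('레벨 10'))
print(attach_subject_particle('레벨 999'))
```

Here 10 is read as 십, which ends in ㅂ, and 999 ends in 구, which has no final consonant.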

Words in parentheses are ignored:

>>> tossi.postfix_particle(u'나뭇가지(만렙)', u'을')
나뭇가지(만렙)를
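
As a minimal sketch of the idea (again, not Tossi's own code), a hypothetical helper can drop trailing parenthesized text before the last syllable is inspected, so that the particle is chosen from 나뭇가지 rather than from 만렙:

```python
import re

# Illustrative sketch only; this helper is hypothetical, not Tossi's API.
# Matches one parenthesized group (without nesting) at the end of a string.
TRAILING_PARENS = re.compile(r'\s*\([^()]*\)\s*$')


def strip_trailing_parentheses(word):
    """Remove every parenthesized group that trails the word."""
    previous = None
    while previous != word:
        previous = word
        word = TRAILING_PARENS.sub('', word)
    return word


print(strip_trailing_parentheses('나뭇가지(만렙)'))
```

The particle is still appended to the original text, parentheses included; only the choice of form uses the stripped word.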

Tolerance Styles

When Tossi can't determine the natural form, the result includes both forms. In this case, you can choose the order of the forms. For example, if most of the words are Japanese, they probably will not end with a final consonant. Therefore 는(은) is a better choice than 은(는), which is the default style:

>>> tolerance_style = tossi.parse_tolerance_style(u'는(은)')
>>> tossi.postfix_particle(u'さくら', u'이', tolerance_style=tolerance_style)
さくら가(이)

Choose one of 은(는), (은)는, 는(은), (는)은 for your project.
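
To make the four styles concrete, here is a small sketch of how a pair of forms could be laid out under each style. `render_tolerant` is a hypothetical helper, not part of Tossi, and it only handles simple two-form pairs such as 이/가; it does not produce merged forms like (으)로:

```python
# Illustrative sketch only; this helper is hypothetical, not Tossi's API.
# 'a' is the form used after a final consonant, 'b' the form used otherwise.
STYLE_TEMPLATES = {
    '은(는)': '{a}({b})',
    '(은)는': '({a}){b}',
    '는(은)': '{b}({a})',
    '(는)은': '({b}){a}',
}


def render_tolerant(with_final, without_final, style='은(는)'):
    """Lay out both forms of a particle in the order the style names."""
    template = STYLE_TEMPLATES[style]
    return template.format(a=with_final, b=without_final)


for style in STYLE_TEMPLATES:
    print(style, '->', render_tolerant('이', '가', style))
```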

Licensing

Written by Heungsub Lee and Chanwoong Kim at What! Studio in Nexon, and distributed under the BSD 3-Clause license.