Update: add ability to install via pip
NoerNova committed Jun 28, 2024
1 parent 160a505 commit 116ac6a
Showing 2 changed files with 29 additions and 2 deletions.
1 change: 1 addition & 0 deletions .gitignore
venv/
**/__pycache__

test.py
labs.ipynb
30 changes: 28 additions & 2 deletions README.md
# ShanNLP: Shan Natural Language Processing

**An experimental, self-research project inspired by [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp)**

## Current State

- [ ] Corpus dictionary: 19904 words (60% covered; more words still need to be collected)

## Word Tokenization Methods

- [x] maximal_matching
- [x] pythainlp (newmm)

## TODO

- [ ] mine more Shan words and poems
- [ ] experiment with more tokenization methods
- [ ] word tokenize
- [ ] word_vector

## USAGE

### Install

Clone this repo, then install the dependencies:

```bash
# this project uses pythainlp as a dependency
# - Trie data structure
pip install -r requirements.txt
# or pip install pythainlp
```

Or install with pip:

```bash
pip install git+https://github.com/NoerNova/ShanNLP
```

### Tokenization

#### maximal_matching (brute-force)

```python
from shannlp import word_tokenize

print(word_tokenize(input_text))
```
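Maximal matching chooses the split with the fewest dictionary words out of all possible ways to split the text. A brute-force version enumerates every split. The sketch below finds a split with the same minimum using dynamic programming instead. It is a standalone illustration and not shannlp's `word_tokenize` code. The `maximal_matching` name, the `max_len` word-length cap, and the rule that uncovered characters become single-character tokens are all assumptions.

```python
from typing import List, Set, Tuple


def maximal_matching(text: str, dictionary: Set[str], max_len: int = 20) -> List[str]:
    """Split text into the fewest dictionary words (maximal matching).

    Characters left uncovered by dictionary words become single-character
    tokens. Splits with fewer such unknown tokens win first, then splits
    with fewer tokens overall. Words longer than max_len are never matched.
    """
    n = len(text)
    worst = (n + 1, n + 1)
    # cost[i] = (unknown_tokens, total_tokens) of the best split of text[:i]
    cost: List[Tuple[int, int]] = [worst] * (n + 1)
    back = [0] * (n + 1)  # back[i] = start index of the last token ending at i
    cost[0] = (0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            unknown, total = cost[start]
            piece = text[start:end]
            if piece in dictionary:
                candidate = (unknown, total + 1)
            elif end - start == 1:
                candidate = (unknown + 1, total + 1)  # unknown single character
            else:
                continue
            if candidate < cost[end]:
                cost[end] = candidate
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the tokens.
    tokens: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]
```

For example, with the dictionary `{"a", "ab", "bcd"}`, `maximal_matching("abcd", ...)` returns `["a", "bcd"]`. Greedy longest matching would take `"ab"` first and leave `"c"` and `"d"` unmatched. In real Shan text, a safer unknown unit would be a whole syllable or grapheme cluster rather than a single code point, since a code point may be just a vowel sign or tone mark.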

#### pythainlp newmm

```python
from shannlp import word_tokenize
import time
print(end - start)
```
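The install notes list pythainlp as a dependency for its Trie data structure. A dictionary-based tokenizer such as newmm needs every dictionary word that starts at each position in the text. A prefix tree answers that question in a single walk forward from that position. The sketch below is an independent illustration and not pythainlp's implementation. The `Trie` class, `TrieNode`, and the `prefixes` method here are hypothetical.

```python
from typing import Dict, Iterable, List


class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if a dictionary word ends at this node


class Trie:
    """Minimal prefix tree over a word list (illustrative only)."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def prefixes(self, text: str, start: int = 0) -> List[str]:
        """Return every stored word that begins at text[start], shortest first."""
        found: List[str] = []
        node = self.root
        for i in range(start, len(text)):
            child = node.children.get(text[i])
            if child is None:
                break  # no stored word continues with this character
            node = child
            if node.is_word:
                found.append(text[start : i + 1])
        return found
```

For example, `Trie(["a", "ab", "bcd"]).prefixes("abcd")` returns `["a", "ab"]`. A tokenizer can call `prefixes(text, i)` at each position `i` instead of testing every substring against a set.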

### Digit convert

```python
from shannlp.util import digit_to_text

```
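Shan text may contain Shan digits. These are the Unicode characters U+1090 (zero) through U+1099 (nine), in order, so converting between them and ASCII digits is a fixed one-to-one mapping. The helpers below are hypothetical (`shan_digits_to_ascii` and `ascii_digits_to_shan` are not shannlp functions) and do not reproduce `digit_to_text`, whose output is not shown here. They only show the digit mapping that such a converter could rely on.

```python
# Shan digits: U+1090 (zero) through U+1099 (nine), in order.
SHAN_DIGITS = "".join(chr(0x1090 + d) for d in range(10))
ASCII_DIGITS = "0123456789"

# str.maketrans builds one-to-one character tables for str.translate.
_SHAN_TO_ASCII = str.maketrans(SHAN_DIGITS, ASCII_DIGITS)
_ASCII_TO_SHAN = str.maketrans(ASCII_DIGITS, SHAN_DIGITS)


def shan_digits_to_ascii(text: str) -> str:
    """Replace Shan digits with ASCII digits; all other characters are kept."""
    return text.translate(_SHAN_TO_ASCII)


def ascii_digits_to_shan(text: str) -> str:
    """Replace ASCII digits with Shan digits; all other characters are kept."""
    return text.translate(_ASCII_TO_SHAN)
```

For example, `shan_digits_to_ascii("႒႑")` returns `"21"`. Python's `int()` already accepts Unicode decimal digits such as these, so the table is mainly useful for rewriting text rather than parsing numbers.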

#### num_to_shanword

```python
from shannlp.util import num_to_shanword

print(num_to_shanword(2117))
```

#### shanword_to_num

```python
from shannlp.util import shanword_to_num

```

#### text_to_num

```python
from shannlp.util import text_to_num

```

### Date converter

#### ***More references are needed for the years converter***

```md
current reference
# https://shn.wikipedia.org/wiki/ဝၼ်းၸဵတ်းဝၼ်း_ၽၢႆႇတႆး
```

#### years convert

```python
from shannlp.util import convert_years

print(convert_years(2566, "be", "ga"))
```
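`convert_years` appears to take a year plus source and target era codes. For the Buddhist Era (`"be"`), converting to or from the Common Era is a fixed offset. The sketch below covers only BE to CE and back; the `"ga"` code used above is not covered. The function names and the `offset` parameter are hypothetical. The default of 543 is the Thai convention. Myanmar reckoning commonly uses 544, so the right offset for Shan sources should be checked against a reference.

```python
THAI_BE_OFFSET = 543  # Thai convention: BE year = CE year + 543
MYANMAR_BE_OFFSET = 544  # commonly used in Myanmar; verify against a reference


def be_to_ce(be_year: int, offset: int = THAI_BE_OFFSET) -> int:
    """Convert a Buddhist Era year to a Common Era year.

    The BE year may begin on a different day than 1 January, so for dates
    near the new year the result can be off by one.
    """
    return be_year - offset


def ce_to_be(ce_year: int, offset: int = THAI_BE_OFFSET) -> int:
    """Convert a Common Era year to a Buddhist Era year."""
    return ce_year + offset
```

With the Thai offset, `be_to_ce(2566)` gives `2023`.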

### Keyboard

```python
from shannlp.util import eng_to_shn, shn_to_eng

print(shn_to_eng("ေၺၺူၼ"))
```
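By analogy with pythainlp's `eng_to_thai` and `thai_to_eng`, `eng_to_shn` and `shn_to_eng` presumably fix text that was typed while the wrong keyboard layout was active. At its core, that is a per-key character substitution. The sketch below builds both directions from one mapping using `str.translate`. `make_layout_converters` and the two-key map are hypothetical placeholders; the real Shan layout table is not reproduced here. The sketch also assumes each key produces exactly one character and no two keys produce the same character, so that the mapping can be inverted.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[str], str]


def make_layout_converters(key_map: Dict[str, str]) -> Tuple[Converter, Converter]:
    """Return (a_to_b, b_to_a) converters for two keyboard layouts.

    key_map maps the character a key produces on layout A to the character
    the same physical key produces on layout B. Keys and values must be
    single characters, and values must be unique so the map can be inverted.
    Characters missing from the map pass through unchanged.
    """
    reverse = {b: a for a, b in key_map.items()}
    if len(reverse) != len(key_map):
        raise ValueError("key_map is not one-to-one")
    a_to_b = str.maketrans(key_map)
    b_to_a = str.maketrans(reverse)

    def convert_a_to_b(text: str) -> str:
        return text.translate(a_to_b)

    def convert_b_to_a(text: str) -> str:
        return text.translate(b_to_a)

    return convert_a_to_b, convert_b_to_a


# Hypothetical two-key map for illustration only; not the real Shan layout.
to_b, to_a = make_layout_converters({"q": "x", "w": "y"})
```

With the placeholder map, `to_b("qwe")` returns `"xye"` and `to_a("xye")` returns `"qwe"`. A real converter would load the actual English and Shan layout pairs in place of the placeholder.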

## Citations

```txt
Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, & Pattarawat Chormai. (2016, Jun 27). PyThaiNLP: Thai Natural Language Processing in Python. Zenodo. http://doi.org/10.5281/zenodo.3519354
```

BibTeX entry:

```txt
@misc{pythainlp,
author = {Wannaphong Phatthiyaphaibun and Korakot Chaovavanich and Charin Polpanumas and Arthit Suriyawongkul and Lalita Lowphansirikul and Pattarawat Chormai},
title = {{PyThaiNLP: Thai Natural Language Processing in Python}},
```
