forked from linuxscout/mishkal

Mishkal is Arabic text vocalization software.


chrys87/mishkal

Mishkal

Mishkal: Arabic text vocalization software (مشكال لتشكيل النصوص العربية, "Mishkal for vocalizing Arabic texts")


Developers: Taha Zerrouki, http://tahadz.com, taha dot zerrouki at gmail dot com

Feature       Value
Authors       Authors.md
Release       1.10 Bouira
License       GPL
Tracker       linuxscout/mishkal/Issues
Mailing list  mishkal@googlegroups.com
Website       tahadz.com/mishkal
Source        GitHub
Download      SourceForge
Feedback      Comments
Accounts      @Facebook @Twitter @Sourceforge

Setup

Debian/Ubuntu Linux

  1. Install the necessary packages:
     sudo apt install git python-pip
     python -m pip install pyarabic arramooz-pysqlite qalsadi tashaphyne mysam-tagmanager
  2. Clone the Mishkal project from GitHub:
     git clone https://github.com/linuxscout/mishkal.git

Usage

Windows:

  • Run MishkalGui.exe

GUI (Linux):

  • Run interfaces/gui/mishkal-gui.py

Web server (Linux, Windows)

Console (Linux, Windows)

Usage: bin/mishkal-console -f filename [OPTIONS]
       mishkal-console 'السلام عليكم' [OPTIONS]

[-f | --file= filename] input file for mishkal-console
[-i | --ignore]   ignore the last mark (final diacritic) on output words
[-r | --reduced]  reduced tashkeel
[-s | --strip]    strip tashkeel (remove harakat)
[-h | --help]     output this usage message
[-v | --version]  program version
[-l | --limit]    vocalize only a limited number of lines
[-x | --syntax]   disable syntactic analysis
[-m | --semantic] disable semantic analysis
[-c | --compare]  compare the vocalized text with the program output
[-t | --stat]     disable statistical tashkeel

This program is licensed under the GPL License
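To call the console from another Python program, the options above can be mapped to an argument list. The sketch below is a hypothetical helper (build_console_args and run_console are not part of Mishkal). It assumes the script lives at bin/mishkal-console.py, that -f takes the filename as a separate argument as in the usage line, and that text can be passed as a trailing argument. -l is left out because the usage message does not say what argument it expects.

```python
import subprocess
import sys
from typing import List, Optional

# Assumed script location, relative to the repository root.
CONSOLE_SCRIPT = "bin/mishkal-console.py"


def build_console_args(text: Optional[str] = None, *, file: Optional[str] = None,
                       ignore_last_mark: bool = False, reduced: bool = False,
                       strip: bool = False, no_syntax: bool = False,
                       no_semantic: bool = False, no_stat: bool = False) -> List[str]:
    """Hypothetical helper: translate keyword options into mishkal-console flags."""
    if text is None and file is None:
        raise ValueError("give either text or file")
    args = [sys.executable, CONSOLE_SCRIPT]
    if file is not None:
        args += ["-f", file]  # input file
    flags = [(ignore_last_mark, "-i"), (reduced, "-r"), (strip, "-s"),
             (no_syntax, "-x"), (no_semantic, "-m"), (no_stat, "-t")]
    args += [flag for enabled, flag in flags if enabled]
    if text is not None:
        args.append(text)  # text passed directly on the command line
    return args


def run_console(args: List[str]) -> str:
    """Run the console script and return its standard output (raises on failure)."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, run_console(build_console_args("السلام عليكم", reduced=True)) would run the script with -r on the given text, assuming the path above is correct for your checkout.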

Files

  Each entry gives the file or directory, its category (in parentheses), and a short description.

  • [bin]

    • mishkal-console.py (program): Mishkal script for use from the shell command line
    • mishkal-gui.py (program): launches the Mishkal GUI (Qt)
    • mishkal-webserver.py (web): launches the Mishkal web server
  • [docs]

    • docs/ (docs): documentation
  • [setup]

    • exe_setup.py (setup): prepares the Windows setup using py2exe
    • setup.py (setup): setup for the library and the Linux package
  • [mishkal]

    • tashkeel/: Tashkeel module source
    • core/: basic API joining most of the tools
  • [support]

    • aranasyn: syntactic analyzer
    • arramooz: Arabic morphological dictionary
    • asmai: semantic analyzer
    • CodernityDB: pure-Python, fast NoSQL database, used as a cache to reduce the load on the morphological analyzer
    • collocations: collocation library (deprecated)
    • libqutrub: verb conjugation library used by the morphological analyzer
    • maskouk: collocation library
    • naftawayh: word tagging library
    • pyarabic: basic Arabic library
    • qalsadi: morphological analyzer
    • spellcheck: spellchecking
    • tashaphyne: light stemmer used by the morphological analyzer
  • [interfaces]

    • [web]
      • lib/ (lib): libraries for the web interface
      • lib/okasha: trivial web framework
      • lib/paste: web framework
      • lib/simplejson: simple JSON library
      • files/ (web): files used by the web service
      • templates/ (web): templates used by the web service
      • adawaty.py (web): script for the web service
      • cgirunner.py (web): script for running the web service through CGI
      • crossdomain.xml (web): configuration file that allows the cross-domain JSON API
      • index.html (web): index file that prevents directory listing
      • mishkal (web): CGI script used by the web service
      • mishkal-webserver.py (web): launches the Mishkal web server
    • [gui]
      • ar/ (resources): Arabic resources for the GUI
  • [data]

    • data/ (data): database files
  • [log]

    • tmp/ (log): temporary directory for the web service
  • [tools]

    • cleanpyc (setup): shell script that removes .pyc files
  • [test]

    • output/ (test): test output
    • samples/ (test): sample files
    • tools/ (test): scripts for using Mishkal
  • [apps]

    • mintiq (TTS): shell script that joins Mishkal with the eSpeak text-to-speech engine

How Mishkal works

Mishkal uses a rule-based method to detect relations and diacritics. First, it analyzes all morphological cases: it generates every possible diacritized form of each word by detecting all affixes and checking the remaining stem in a dictionary. Second, it attaches a word frequency to each word. These first two steps are performed by support/Qalsadi (Arabic morphological analyzer); the dictionary is a separate project named "Arramooz: Arabic dictionary for morphology". Third, a syntax analyzer detects all possible relations between words. The syntax library is support/ArAnaSyn. This analyzer is basic for now: it uses only linear relations between adjacent words.
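The first three steps can be illustrated with a toy model. This is not Qalsadi's or ArAnaSyn's API: the dictionary, the prefix table, the tags, and the names analyze and adjacent_relations are all hypothetical, suffixes are omitted, and words are written in a Latin transliteration so the example stays ASCII-only. Each word yields candidates from prefix stripping plus dictionary lookup, ranked by frequency; the syntax step then tests a relation rule only between neighbouring words.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Toy dictionary (hypothetical): unvocalized stem -> (vocalized form, tag, frequency).
DICTIONARY: Dict[str, List[Tuple[str, str, int]]] = {
    "ktb": [("kataba", "verb", 120), ("kutub", "noun", 300), ("kutiba", "verb", 40)],
    "byt": [("bayt", "noun", 500)],
}
# Unvocalized prefix -> vocalized prefix; "" means "no prefix". Suffixes omitted.
PREFIXES: Dict[str, str] = {"": "", "w": "wa", "b": "bi"}


@dataclass
class Candidate:
    form: str       # fully vocalized word form
    tag: str        # part of speech
    frequency: int  # word frequency attached in step 2


def analyze(word: str) -> List[Candidate]:
    """Steps 1-2: strip each possible prefix, look the stem up, attach its frequency."""
    candidates = []
    for prefix, vocalized_prefix in PREFIXES.items():
        if not word.startswith(prefix):
            continue
        stem = word[len(prefix):]
        for form, tag, frequency in DICTIONARY.get(stem, []):
            candidates.append(Candidate(vocalized_prefix + form, tag, frequency))
    # Most frequent readings first.
    return sorted(candidates, key=lambda c: c.frequency, reverse=True)


def adjacent_relations(sentence: List[List[Candidate]],
                       related: Callable[[Candidate, Candidate], bool]) -> List[Tuple[int, int, int]]:
    """Step 3: test a relation rule only between candidates of neighbouring words.

    Returns (word index, left candidate index, right candidate index) triples."""
    links = []
    for i in range(len(sentence) - 1):
        for a, left in enumerate(sentence[i]):
            for b, right in enumerate(sentence[i + 1]):
                if related(left, right):
                    links.append((i, a, b))
    return links
```

For instance, analyze("wktb") strips the prefix w and returns the readings wakutub, wakataba, and wakutiba, most frequent first.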

Fourth, all generated data and relations are analyzed semantically to detect semantic relations and so reduce ambiguity. The library used is support/asmai (Arabic semantic analysis). Semantic relation extraction is corpus-based; the corpus used is named "Tashkeela: Arabic vocalized texts corpus".
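A corpus-based filter of this kind can be sketched as follows. It is not asmai's algorithm: as a stand-in, it counts adjacent vocalized word pairs in a small hypothetical vocalized corpus (playing the role Tashkeela plays for Mishkal) and keeps only the candidate pairs that the corpus attests, ranked by count. The names learn_pairs and semantic_filter are illustrative.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def learn_pairs(corpus: List[List[str]]) -> Counter:
    """Count adjacent vocalized word pairs in a vocalized corpus."""
    pairs: Counter = Counter()
    for sentence in corpus:
        pairs.update(zip(sentence, sentence[1:]))
    return pairs


def semantic_filter(left: List[str], right: List[str], pairs: Counter) -> List[Pair]:
    """Keep candidate pairs attested in the corpus, most frequent first.

    If no pair is attested, return every pair so later stages can still choose."""
    all_pairs = [(a, b) for a in left for b in right]
    attested = sorted((p for p in all_pairs if pairs[p] > 0),
                      key=lambda p: pairs[p], reverse=True)
    return attested or all_pairs
```

Falling back to all pairs when nothing is attested keeps the filter from discarding every reading and leaves the final decision to the selection stage.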

In the final stage, the mishkal/tashkeel module tries to select the suitable word form in context. It first looks for evident cases or for the most strongly related cases; otherwise, it selects the most probable case, using default rules such as selecting a stop word or selecting the Mansoub (accusative) case.
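The selection order described above can be sketched as a small function. The Candidate fields, the case labels "marfou" and "mansoub", and the order in which the default rules are tried are assumptions for illustration, not mishkal/tashkeel's actual logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Candidate:
    form: str                   # vocalized word (Latin transliteration)
    frequency: int              # frequency from the analysis stage
    is_stop_word: bool = False
    case: Optional[str] = None  # hypothetical labels, e.g. "marfou", "mansoub"


def select(candidates: List[Candidate], related: Set[str]) -> Candidate:
    """Pick one candidate: evident case, then related cases, then default rules."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    # 1. Evident case: only one reading is possible.
    if len(candidates) == 1:
        return candidates[0]
    # 2. Prefer readings supported by syntactic/semantic relations, if any.
    pool = [c for c in candidates if c.form in related] or candidates
    if len(pool) == 1:
        return pool[0]
    # 3. Default rules, tried in order: a stop word, then the Mansoub case.
    for rule in (lambda c: c.is_stop_word, lambda c: c.case == "mansoub"):
        preferred = [c for c in pool if rule(c)]
        if preferred:
            pool = preferred
            break
    # 4. Among what is left, take the most probable (most frequent) reading.
    return max(pool, key=lambda c: c.frequency)
```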

The rest of the program provides functions that handle the interfaces and APIs for the web, desktop, and command line.

JSON connection API:

Remote vocalization

The website's service can be called with JSON and AJAX from any website, so you can use it on your own site. Calling method 1: using JSON with the jQuery library.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <script type="text/javascript" src="http://code.jquery.com/jquery-latest.js"></script>
</head>
<body>
  <div id="result"></div>
  <script type="text/javascript">
    // When the page is ready, send the text to the Mishkal service and display the result
    $(document).ready(function () {
      $.getJSON("http://tahadz.com/mishkal/ajaxGet",
        {text: "السلام عليكم\nاهلا بكم\nكيف حالكم", action: "TashkeelText"},
        function (data) {
          $("#result").text(data.result);
        });
    });
  </script>
</body>
</html>


The call is made as follows:
$.getJSON("http://tahadz.com/mishkal/ajax...", {text:"السلام عليكم\nاهلا بكم\nكيف حالكم", action:"TashkeelText"},

where:

  • text: the text to be vocalized.
  • action: the requested operation, here TashkeelText.
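The same request can be sent from Python with only the standard library. In this sketch, build_request_url and vocalize are hypothetical helper names; the endpoint is the one from the jQuery example, and whether the service is still online is not guaranteed. urlencode produces the same kind of query string (spaces as "+", UTF-8 percent-encoding) that $.getJSON builds from its data object.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the jQuery example; availability is not guaranteed.
ENDPOINT = "http://tahadz.com/mishkal/ajaxGet"


def build_request_url(text: str, action: str = "TashkeelText") -> str:
    """Encode the two documented parameters as a GET query string."""
    return ENDPOINT + "?" + urlencode({"text": text, "action": action})


def vocalize(text: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (requires network access)."""
    with urlopen(build_request_url(text), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```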
The result has the following form:
{"result": " السّلامُ عَلَيكُمْ اهلا بِكُمْ كَيْفَ حالُكُمْ", "order": "0"}
where:
  • result: the resulting vocalized text.
  • order: the line number in the original text. If the original text is large, Mishkal splits it into several lines, which may not be returned in the same order, so the order number is included.
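Because replies can arrive out of order, a client that receives several of them can sort by order before joining. This sketch assumes each reply is a raw JSON string shaped like the example above and that the pieces should be joined with newlines; reassemble is a hypothetical name. Since order is sent as a string, it is converted to an integer first: sorted as text, "10" would come before "2".

```python
import json
from typing import Iterable


def reassemble(replies: Iterable[str]) -> str:
    """Decode raw JSON replies and join their "result" fields in "order" sequence."""
    parsed = [json.loads(reply) for reply in replies]
    # "order" is sent as a string such as "0"; compare numerically, not as text.
    parsed.sort(key=lambda reply: int(reply["order"]))
    return "\n".join(reply["result"] for reply in parsed)
```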

Featured Posts

  • “Mishkal” for professional vocalization of Arabic texts (كمال فودة)
  • How to vocalize letters, words, or even whole Arabic texts in seconds from your browser (رضا بوربعة)
  • A new Arabic service: vocalizing Arabic texts (Sam Hamou)
  • Launch of the beta version of Mishkal, the Arabic text vocalization program (Zaid AlSaadi)
  • Mishkal: the road to vocalization (blog: مدونة اليراع)
  • Mishkal for vocalizing Arabic texts: desktop interface launched (blog: مدونة اليراع)
  • Get to know the “تحدّث” projects: projects for a great language (محمد هاني صباغ)
