
Paninian Generator #144

Open
kmadathil opened this issue Oct 3, 2020 · 9 comments

Comments

@kmadathil
Owner

FYI: I have begun coding a Paninian generator. The goal is to implement the Ashtadhyayi, plus vartikas as needed.
As of now, a basic skeleton that handles some pada-sandhi rules has been committed. Over time, I hope to add more rules and extend the process backward toward the earlier stages, eventually covering the following steps:

  1. Semantic tag input
  2. Prakriti + Pratyaya selection
  3. Prakriti + Pratyaya transformations
  4. Anga Transformation
  5. Samhita - intra pada
  6. Samhita - inter pada
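One way to picture "moving the process backward" is as an ordered list of stages, where the generator starts out covering only the last ones and grows toward the first. This is an illustrative sketch only: the Stage enum and the stages_to_run helper are hypothetical names, not part of the sanskrit_parser code.

```python
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    """Hypothetical labels for the six planned stages, in processing order."""
    SEMANTIC_TAG_INPUT = 1
    PRAKRITI_PRATYAYA_SELECTION = 2
    PRAKRITI_PRATYAYA_TRANSFORMATIONS = 3
    ANGA_TRANSFORMATION = 4
    SAMHITA_INTRA_PADA = 5
    SAMHITA_INTER_PADA = 6


def stages_to_run(earliest_implemented: Stage) -> List[Stage]:
    """Return the stages executed when only `earliest_implemented` and later exist.

    Extending the generator backward means lowering `earliest_implemented`.
    """
    return [stage for stage in Stage if stage >= earliest_implemented]


# A skeleton limited to sandhi runs only the last stage(s).
print([stage.name for stage in stages_to_run(Stage.SAMHITA_INTRA_PADA)])
# ['SAMHITA_INTRA_PADA', 'SAMHITA_INTER_PADA']
```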

Take a look at the generator branch - the sandhi.yaml file encodes the sutras I have so far, and process_yaml.py turns them into executable code. prakriya.py is the skeleton execution engine.
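The actual schema of sandhi.yaml is not reproduced in this thread, so the following is only a sketch of the general idea under an assumed, hypothetical schema: a sutra is described declaratively (here as the dict a YAML loader might return, to avoid a third-party dependency) and then compiled into a Python callable. The field names and compile_sutra are made up for illustration; 6.1.77 iko yaR aci, in SLP1, serves as the example.

```python
from typing import Callable, Dict, Optional, Tuple

Pair = Tuple[str, str]

# Hypothetical declarative entry, shaped like what a YAML loader might return.
IKO_YAN_ACI = {
    "id": "6.1.77",
    "name": "iko yaR aci",
    "left_final": "iIuUfFxX",           # ik
    "right_initial": "aAiIuUfFxXeEoO",  # ac
    "substitute": {"i": "y", "I": "y", "u": "v", "U": "v",
                   "f": "r", "F": "r", "x": "l", "X": "l"},  # yaR
}


def compile_sutra(spec: Dict) -> Callable[[str, str], Optional[Pair]]:
    """Turn a declarative sutra entry into a function on (left, right) padas.

    The function returns the transformed pair, or None if the sutra does not
    apply. Choosing between competing sutras is left to the engine.
    """
    def apply(left: str, right: str) -> Optional[Pair]:
        if (left and right and left[-1] in spec["left_final"]
                and right[0] in spec["right_initial"]):
            return left[:-1] + spec["substitute"][left[-1]], right
        return None

    apply.__name__ = "sutra_" + spec["id"].replace(".", "_")
    return apply


rule = compile_sutra(IKO_YAN_ACI)
print(rule("dadhi", "atra"))  # ('dadhy', 'atra')
print(rule("rAma", "atra"))   # None
```

Note that this toy rule would also fire for dadhi + iha, where 6.1.101 akaH savarRe dIrGaH should take precedence; resolving such conflicts is the job of the execution engine, not of the individual rule.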

To try it out, run:
  cd sanskrit_parser/generator ; python test.py

@avinashvarna
Collaborator

I think @drdhaval2785 has implemented similar generators. See https://github.com/drdhaval2785/SanskritVerb, which I believe now has the older Subanta generation repo merged in. It includes a sandhi generator as well. Should we look at leveraging it before reimplementing?

@drdhaval2785

Would be happy to help.

@kmadathil
Copy link
Owner Author

Sure, we should.
@drdhaval2785 - I have looked at this before, and I remember we discussed it briefly as well. Is it entirely in PHP, or is a Python version available? I remember you mentioning that it applies the sutras linearly, in SK (Siddhanta Kaumudi) order; do I remember that correctly?
What would be the best way to leverage this?

@drdhaval2785

It is purely in PHP; no Python version is available, and I do not have the time to convert it to Python. I will go through your code and let you know which bottlenecks I ran into, so that you can make better design decisions. I came to regret some of my choices, but by then it was too late.

@kmadathil
Owner Author

kmadathil commented Oct 6, 2020 via email

@kmadathil
Owner Author

kmadathil commented Jan 3, 2021

Current status

  • YAML format for sutras defined and a parser for it implemented. This allows sutras to be coded easily, which is much better than coding them directly in Python, but I'm not 100% happy with the format yet.
  • Implemented ~300 sutras.
  • Paninian Prakriya Engine implemented (with some current limitations, such as nitya/anitya tests)
  • Can generate prakriya for ajanta pum/strI/napum prAtipadikas.
  • Basic test suite added, with manual and pytest versions
    • The pytest suite takes too much memory, while the manual version (same underlying code) takes very little.
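The engine's log (as in the rAma + jas example later in this thread) shows, at each step, one winning sutra plus the "Sutras that were triggered but did not win". Below is a minimal sketch of such a loop, assuming a deliberately simplified conflict rule (the later sutra wins, after 1.4.2 vipratizeDe paraM kAryam) and that each sutra fires at most once. A real engine also needs tests such as nitya/anitya, antaranga and apavAda; run_prakriya and its signature are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[str, ...]
SutraFn = Callable[[State], Optional[State]]
LogEntry = Tuple[str, State, State, List[str]]  # winner, before, after, losers


def sutra_key(number: str) -> Tuple[int, ...]:
    """'6.1.105.1' -> (6, 1, 105, 1), so sutras compare in Ashtadhyayi order."""
    return tuple(int(part) for part in number.split("."))


def run_prakriya(state: State, sutras: List[Tuple[str, SutraFn]],
                 max_steps: int = 100) -> Tuple[State, List[LogEntry]]:
    """Repeatedly apply the winning sutra until none is triggered."""
    log: List[LogEntry] = []
    applied = set()  # simplification: each sutra fires at most once
    for _ in range(max_steps):
        triggered = []
        for number, fn in sutras:
            if number in applied:
                continue
            result = fn(state)
            if result is not None:
                triggered.append((number, result))
        if not triggered:
            break
        # Simplified conflict resolution: the later sutra wins.
        winner, new_state = max(triggered, key=lambda t: sutra_key(t[0]))
        losers = [number for number, _ in triggered if number != winner]
        log.append((winner, state, new_state, losers))
        applied.add(winner)
        state = new_state
    return state, log


# Toy sutras with fictitious numbers (the Ashtadhyayi has only eight adhyayas).
toy = [
    ("9.1.1", lambda s: ("x",) if s == ("a",) else None),
    ("9.1.2", lambda s: ("y",) if s == ("a",) else None),
]
final, log = run_prakriya(("a",), toy)
print(final)                 # ('y',)
print(log[0][0], log[0][3])  # 9.1.2 ['9.1.1']
```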

Eventually, this will allow us to replace the INRIA/Sanskrit_data databases with our own pada generator. Also, it will allow us to solve the overgeneration problem in the sandhi splitter by validating output splits with this generator.
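The validation idea can be sketched as a filter: a candidate split survives only if generating from it reproduces the input string. Here generate stands in for the future pada generator; its signature (a tuple of padas in, a set of possible joined strings out, since some sandhi is optional) is an assumption, and toy_generate is a hypothetical stand-in that knows a single rule.

```python
from typing import Callable, Iterable, List, Set, Tuple

Split = Tuple[str, ...]


def validate_splits(surface: str, candidates: Iterable[Split],
                    generate: Callable[[Split], Set[str]]) -> List[Split]:
    """Keep only the splits from which `generate` can reproduce `surface`."""
    return [split for split in candidates if surface in generate(split)]


def toy_generate(split: Split) -> Set[str]:
    """Hypothetical two-pada generator: a + a -> A, otherwise concatenate."""
    left, right = split
    if left.endswith("a") and right.startswith("a"):
        return {left[:-1] + "A" + right[1:]}
    return {left + right}


candidates = [("rAma", "atra"), ("rAm", "atra"), ("rA", "matra")]
print(validate_splits("rAmAtra", candidates, toy_generate))  # [('rAma', 'atra')]
```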

@kmadathil
Owner Author

$ time python ../../scripts/sanskrit_generator -t rAma -p jas --verbose
unable to import 'smart_open.gcs', disabling that module
INFO     Inputs [rAma, as]
INFO     rAma ['prAtipadika', 'pum']
INFO     as ['pratyaya', 'svAdi', 'sup', 'jas', 'suw', 'bahuvacana', 'praTamA', 'viBakti']
INFO     End Inputs

Prakriya
Input ['rAma', 'as']
Root
Prakriya Node
0 Prakriya Start ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
1 1.1.43 : suqanapuMsakasya  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
1.4.17 : svAdizvasarvanAmasTAne 
1.4.18 : yaci Bam 
1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam 
End
Child
Prakriya Node
2 1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam  ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
3 7.3.109: jasi ca  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe 
6.1.102: praTamayoH pUrvasavarRaH 
6.1.101: akaH savarRe dIrGaH 
End
Child
Prakriya Node
4 6.1.102: praTamayoH pUrvasavarRaH  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe 
6.1.101: akaH savarRe dIrGaH 
End
Child
Prakriya Node
5 6.1.101: akaH savarRe dIrGaH  ['rAma', 'as'] 0-> ['rAmA', 's']
End
Child
Prakriya Node
6 6.1.105.1: dIrGAjjasi ca  ['rAmA', 's'] 0-> ['rAmA', 's']
End
Leaf Node
Final Output [['rAmA', 's']] = ['rAmAs']


Output: ['rAmAs']

real    0m10.504s
user    0m10.268s
sys     0m0.232s
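Step 5 of this prakriya is where the form actually changes: 6.1.101 akaH savarRe dIrGaH turns ['rAma', 'as'] into ['rAmA', 's']. A sketch of just that operation on SLP1 strings follows; savarna_dirgha is a hypothetical name, and the sketch ignores refinements such as the vArttika that treats f and x as savarRa.

```python
from typing import Optional, Tuple

# The ak vowels in SLP1 (short form) and their long counterparts.
LONG = {"a": "A", "i": "I", "u": "U", "f": "F", "x": "X"}
# Map every short or long ak vowel to its short form, for the savarRa test.
BASE = {**{s: s for s in LONG}, **{l: s for s, l in LONG.items()}}


def savarna_dirgha(left: str, right: str) -> Optional[Tuple[str, str]]:
    """Sketch of 6.1.101: ak + savarRa vowel -> single long vowel.

    Returns None when the sutra does not apply. As in the log above, the
    long vowel is kept on the left element and removed from the right one.
    """
    if not left or not right:
        return None
    last, first = left[-1], right[0]
    if last in BASE and first in BASE and BASE[last] == BASE[first]:
        return left[:-1] + LONG[BASE[last]], right[1:]
    return None


result = savarna_dirgha("rAma", "as")
print(result, "".join(result))  # ('rAmA', 's') rAmAs
```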

@gasyoun

gasyoun commented Apr 1, 2021

replace the INRIA/Sanskrit_data databases with our own pada generator

Have you seen P. Scharf's code? Based on it, such a picture can be generated:

[image not preserved]

@VedantMadane

Have you seen P. Scharf's code? Based on it, such a picture can be generated:

Could you provide a link to the repo?


5 participants