Create tests for Vakya Analyzer #155

Open
kmadathil opened this issue Feb 3, 2021 · 17 comments
@kmadathil
Owner

kmadathil commented Feb 3, 2021

Locate the DCS10K and DCS4K datasets mentioned in this paper. Also, look at the larger dataset mentioned in this later paper.

From these, create a set of testcases for the vakya analyzer.

Also, figure out their actual definitions of Precision, Recall and F-Score.
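
As a starting point until the papers' exact definitions are confirmed, here is a minimal sketch of the standard token-level definitions. It assumes each sentence is scored by comparing the predicted word list against the gold word list as multisets; the function name `precision_recall_f1` is made up for illustration, and the papers may define these metrics differently.

```python
from collections import Counter


def precision_recall_f1(predicted, gold):
    """Score one sentence; `predicted` and `gold` are lists of word tokens.

    Assumption: standard multiset-overlap definitions, not necessarily
    the ones used in the DCS papers.
    """
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # Multiset intersection: a token is correct at most as often as it occurs in gold.
    true_pos = sum((pred_counts & gold_counts).values())
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Placeholder tokens, not real analyzer output.
p, r, f = precision_recall_f1(["a", "b", "x"], ["a", "b", "c", "d"])
print(p, r, f)
```

Here two of the three predicted tokens are correct and two of the four gold tokens are recovered, so precision is 2/3, recall is 1/2, and the F-score is their harmonic mean.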

@kmadathil
Owner Author

kmadathil commented Feb 3, 2021

An even better option may be the smaller 1300-sentence test set found in this even later paper. Its advantage is that it is available on GitHub.

The paper's author has provided another publication with a better description of this work.

@avinashvarna
Collaborator

Thanks for the investigation. I've been a bit busy, but I will try to do some catching up this weekend by reading the papers.

The KISS paper has some superficial similarities. I found the supplementary material helpful in understanding the methodology, but probably need to read it a few more times to completely understand it.

We should probably plan out a proper sequence of next steps (based on priorities). Once we are ready to discuss this step, it may be helpful to have a call.

@kmadathil
Owner Author

I have received the DCS10K and KISS datasets from Amrith Krishna. KISS has been committed into the DB. DCS10K will be added after I figure out how to do so (it has too many directories).

I have added basic test infrastructure and added a test_parser.py. I will close this after I get KISS tests working.

@gasyoun

gasyoun commented Mar 9, 2021

https://zenodo.org/record/803508# is from 2017, so there has been a DCS update after it.

smaller 1300 sentence testset

There is this set of sentences that J. Huet trained on as well.

apte-verified.txt

@gasyoun

gasyoun commented Mar 16, 2021

@drdhaval2785

The web service has been shut down because it had few users. As far as I remember, the README was also updated recently to say so explicitly, but I am not able to locate it now.

@avinashvarna
Collaborator

@drdhaval2785 Actually, we created a different web service on Google App Engine which is always enabled.

@gasyoun Thanks for reporting this issue. Looking at the logs, it does seem to be related to the parsing. I see logs of the form:

ERROR:sanskrit_parser.parser.datastructures:Partition 4: eva went to zero length!

@kmadathil can you please take a look to see if this works from the command line? I can also take a look, but probably over the weekend.

@kmadathil
Owner Author

@gasyoun Please try a different input. This is an error condition that is somehow hanging the API.

@avinashvarna
Collaborator

Actually, sorry. The log I was looking at was for a slightly shorter input than what was in the reported issue. It appears that this input is causing the parse to take > 30s (which is the time limit on App Engine), and the process gets killed. GAE instances are not super-high performance, so we may need further optimizations.
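
One way to catch such inputs before deployment is to time each test case locally and compare it against the request limit. The sketch below is only an illustration: `timed_call`, `fits_budget`, and the `slowdown` factor (how much slower the hosted instance is assumed to be than the local machine) are hypothetical, and the right slowdown value would have to be measured.

```python
import time

GAE_TIME_LIMIT_S = 30.0  # request time limit quoted in this thread


def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fits_budget(local_elapsed_s, limit=GAE_TIME_LIMIT_S, slowdown=2.0):
    """Guess whether a locally timed run would finish within the hosted limit.

    `slowdown` is an assumed ratio of hosted to local run time, not a measured value.
    """
    return local_elapsed_s * slowdown < limit


# Stand-in workload; a real test would call the analyzer here instead.
result, elapsed = timed_call(sum, range(1000))
print(result, fits_budget(elapsed))
```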

@gasyoun

gasyoun commented Mar 16, 2021

It appears that this input is causing the parse to take > 30s

How many words can I input?

@kmadathil
Owner Author

I've sped this case up using on_the_fly constraint checking (explained in the Sphinx document). This case now takes about 8 seconds on my computer:

time python scripts/sanskrit_parser vakya "sA tu mahASvetAyA eva muKam avalokitavatI" --input SLP1  --min-cost --max-paths 10
...
real    0m8.508s
user    0m8.256s
sys     0m0.248s
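
For readers unfamiliar with the general idea, here is a toy sketch of why checking constraints while paths are being built can be faster than filtering complete paths afterwards. This is not the sanskrit_parser implementation: the tag names, the `violates` constraint, and both function names are assumptions made up for illustration. The approach relies on the constraint being monotone, meaning a partial path that already violates it can never become valid by being extended.

```python
from itertools import product


def violates(path):
    """Toy constraint (an assumption): at most one 'S' tag per path."""
    return path.count("S") > 1


def filter_after(candidates):
    """Build every complete path first, then discard the invalid ones."""
    return [p for p in product(*candidates) if not violates(p)]


def check_on_the_fly(candidates):
    """Extend paths one position at a time, dropping a partial path as soon as it violates."""
    paths = [()]
    for options in candidates:
        paths = [
            p + (tag,)
            for p in paths
            for tag in options
            if not violates(p + (tag,))
        ]
    return paths


# Each inner list holds the candidate tags for one word (made-up tags).
candidates = [["S", "O"], ["S", "V"], ["S", "O"]]
print(check_on_the_fly(candidates))
```

Both functions return the same valid paths, but the second never extends a partial path that is already invalid, so the number of paths it carries forward stays smaller as the input grows.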

@avinashvarna - thanks for the idea! Please update appspot to v0.2.3

@avinashvarna
Collaborator

I updated, but the online version still times out for this input (runs in a container after all).

@gasyoun

gasyoun commented Mar 23, 2021

I updated, but the online version still times out for this input (runs in a container after all).

So no way to test the scripts on the web, only locally?

@kmadathil
Owner Author

Please hold on while we update the web service. We are working through some deployment issues with the sped-up code. It should work for you after that.

@gasyoun

gasyoun commented Mar 23, 2021

It should work for you after that.

Oh, ok, I can wait for a few hours anyway ))

@avinashvarna
Collaborator

So no way to test the scripts on the web, only locally?

If you are comfortable with Python notebooks, you can use Binder and modify this notebook with your own input to test it online.

@gasyoun

gasyoun commented Apr 1, 2021

python notebooks, you can use Binder and modify this notebook

I would like to ask for a video intro, if possible, please.
