Metrics for evaluating performance of lexical/morphological analyzer #84
First results from a "quick and dirty" script I wrote to evaluate word lookup accuracy (recall, if you will):
At a first pass, it looks like the sanskrit data based lookup recognized about 300k more words. I think it is definitely worthwhile to move to it; as we incorporate more and more of the Inria db into it, it will always be the better choice from a recall perspective. The overall accuracy may look quite low, but there are two mitigating factors:
Next steps:
I will clean up my "quick and dirty" script to make it more amenable to the next steps and check it in by the weekend.
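For reference, a minimal sketch of the kind of recall computation described above might look like the following. The word list and the known-forms set are toy placeholders, not actual sanskrit_parser data structures; in practice the reference list would come from a tagged corpus and the known forms from whichever lookup backend is being evaluated.

```python
def evaluate_recall(reference_words, known_forms_db):
    """Return (recognized, total, recall) for a reference word list."""
    recognized = sum(1 for w in reference_words if w in known_forms_db)
    total = len(reference_words)
    return recognized, total, (recognized / total if total else 0.0)

# Toy stand-ins: in practice the reference list would come from a corpus
# and the known-forms set from the Inria- or sanskrit-data-based lookup.
reference_words = ["rāmaḥ", "gacchati", "vanam", "phalāni"]
known_forms_db = {"rāmaḥ", "gacchati", "vanam"}

recognized, total, recall = evaluate_recall(reference_words, known_forms_db)
print(f"{recognized}/{total} words recognized, recall = {recall:.2f}")
```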
I have added some metrics for word level accuracy on the sanskrit_util branch here - https://github.com/kmadathil/sanskrit_parser/tree/sanskrit_util/metrics. I have also started working on evaluating lexical split accuracy using the dataset from the project referred to in #85. Currently planning to use the BLEU score or chrF score (from the machine translation literature) to evaluate the accuracy of these splits. Please let me know if there are any other ideas for evaluating accuracy.
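As a rough illustration of the BLEU idea, assuming NLTK is available and using made-up splits rather than real parser output:

```python
# Score a predicted lexical split against a reference split with BLEU.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference_split = ["tava", "eva", "asti"]   # gold split from the dataset (illustrative)
predicted_split = ["tava", "evāsti"]        # split produced by the parser (illustrative)

# Unigram+bigram BLEU with smoothing, since splits are short sequences.
score = sentence_bleu([reference_split], predicted_split,
                      weights=(0.5, 0.5),
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```

chrF could be computed analogously at the character level (implementations exist in, e.g., sacrebleu), which may be more forgiving of near-miss splits that differ only at sandhi boundaries.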
I concur
Scripts for evaluating lexical split accuracy have been added to the scoring branch here - https://github.com/kmadathil/sanskrit_parser/blob/scoring/metrics/lexical_split_scores.py
Adding a use case below where scoring may help resolve the best split. Can the tool choose
I worked a lot on this problem, and can vouch that https://stackoverflow.com/questions/8870261/how-to-split-text-without-spaces-into-list-of-words/11642687 is the best solution around. All we need is a frequency count for lexemes. drdhaval2785/samasasplitter#3 (comment) gives some idea of where the frequencies can be obtained.
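For context, here is a condensed sketch of the dynamic-programming idea behind that answer, with a toy frequency table standing in for real lexeme counts:

```python
from math import log

# Toy frequency counts; in practice these would come from a corpus.
freq = {"rāma": 500, "rāmaḥ": 300, "asti": 800, "iti": 1000}
total = sum(freq.values())
cost = {w: -log(f / total) for w, f in freq.items()}   # rarer word => higher cost

def best_split(text, max_word_len=20):
    """Return the minimum-cost segmentation of `text` into known lexemes."""
    # best[i] = (cost of best split of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(float("inf"), 0)] * len(text)
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - max_word_len), i):
            word = text[j:i]
            if word in cost and best[j][0] + cost[word] < best[i][0]:
                best[i] = (best[j][0] + cost[word], j)
    if best[-1][0] == float("inf"):
        return None                      # no segmentation into known lexemes
    words, i = [], len(text)
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return list(reversed(words))

print(best_split("astiiti"))             # expected: ['asti', 'iti']
```

Note that this ignores sandhi entirely, so at best it suggests how frequency-based costs could be used to rank candidate splits, not replace the existing splitter.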
@codito - Not sure how the whitespace problem and this issue are related? This issue is about evaluating accuracy, is it not? Yours is about picking one split over another.
I thought this issue also tracks using a score to ensure the most likely split gets higher priority in the output. Please ignore if I have confused two different things.
"An Automatic Sanskrit Compound Processing" - how would you classify the approach?
Need to develop metrics for evaluating the performance of the analyzers. This would be useful if we were trying to choose between databases for looking up tags, or between different approaches to lexical/morphological analysis.
From #82 (comment)
This would be a good start. Currently we do not pay much attention to the number of passes/failures, etc., in the test suite. My concern is that the UoHD dataset entries are not broken down into simple roots, and we are using the splitter to split them until we get words that are in the db (as discussed before - #19 (comment)). I am not sure that this will give us an accurate representation of the performance.
We should start looking into the DCS database to see if it is more appropriate. E.g., for the Level 1 database/tag lookups, we could perhaps just start with the roots provided in the DCS database and see how many are identifiable using the Level 1/tag lookup db. We can then start building the tests up to the lexical/morphological levels.
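A hypothetical sketch of that DCS coverage check; the root list and the `in_level1_db` predicate are placeholders for however the DCS roots and the Level 1/tag lookup db are actually exposed:

```python
def coverage_report(roots, in_level1_db, miss_file="unidentified_roots.txt"):
    """Count how many roots the Level 1 lookup identifies; dump the misses."""
    misses = [r for r in roots if not in_level1_db(r)]
    with open(miss_file, "w", encoding="utf-8") as f:
        f.write("\n".join(misses))
    found = len(roots) - len(misses)
    print(f"{found}/{len(roots)} roots identified "
          f"({100.0 * found / len(roots):.1f}%); misses written to {miss_file}")

# Toy stand-ins for demonstration only.
dcs_roots = ["gam", "bhū", "kṛ", "xyz"]
coverage_report(dcs_roots, in_level1_db=lambda r: r != "xyz")
```

Dumping the unidentified roots to a file would also give us a starting point for error analysis at the higher lexical/morphological levels.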