Quote attributions to Character Ids #3
Open
NikhilPr95 wants to merge 27 commits into dbamman:master from NikhilPr95:quotes
New function setCharacterIDs contains code moved (cut and pasted) from PrintUtils.PrintTokens. It sets the characterId of each token up front, which makes the IDs easier to access and results in more accurate tokens during processing. Previously, the characterIds were set only during printing, leaving them inaccessible until that point.
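To illustrate the idea of assigning IDs in a pre-pass instead of at print time, here is a minimal sketch. It is not the code from this PR: the `Token` and `Mention` classes, the inclusive token spans, and the use of `-1` for "no character" are all assumptions made for the example.

```java
import java.util.List;

// Hypothetical sketch: assign character IDs to tokens before any printing happens.
final class CharacterIdPrePass {

    // Hypothetical token type; -1 means "not linked to any character".
    static final class Token {
        final String text;
        int characterId = -1;

        Token(String text) {
            this.text = text;
        }
    }

    // Hypothetical coreference mention covering tokens start..end (inclusive).
    static final class Mention {
        final int start;
        final int end;
        final int characterId;

        Mention(int start, int end, int characterId) {
            this.start = start;
            this.end = end;
            this.characterId = characterId;
        }
    }

    // Copies each mention's character ID onto the tokens it covers,
    // so later stages can read token.characterId directly.
    static void setCharacterIds(List<Token> tokens, List<Mention> mentions) {
        for (Mention m : mentions) {
            for (int i = Math.max(0, m.start); i <= m.end && i < tokens.size(); i++) {
                tokens.get(i).characterId = m.characterId;
            }
        }
    }
}
```

With a pass like this run during processing, quote attribution and printing can both read the same IDs rather than recomputing them.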
Added the sentenceID as well as the attributed character ID for each quote that is printed. This makes it convenient to attribute quotes to characters even when the speakers are referred to only by pronouns such as 'he' and 'she'.
Calls the new function setCharacterIds, which sets character IDs beforehand rather than only during printing.
In each 'para' containing multiple quotes, all of the quotes are taken to be spoken by the first speaker, except when the parser itself makes an error. The added code reflects this change.
Switched the order of the calls to printWithLinksAndCorefAndQuotes and dumpForAnnotation, as the former uses information from the latter.
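The paragraph rule can be sketched roughly as follows. This is an illustrative sketch, not the PR's implementation: the `Quote` class, its fields, and the choice to leave later quotes untouched when the first quote in a paragraph has no speaker are assumptions.

```java
import java.util.List;

// Hypothetical quote record: paragraph index plus an attributed character ID
// (null when the quote has no attribution yet).
final class Quote {
    final int paragraphId;
    Integer characterId;

    Quote(int paragraphId, Integer characterId) {
        this.paragraphId = paragraphId;
        this.characterId = characterId;
    }
}

final class ParagraphAttribution {

    // Assumes quotes are in document order. Within each paragraph, every quote
    // after the first receives the first quote's speaker, if that speaker is known.
    static void propagateFirstSpeaker(List<Quote> quotes) {
        int currentParagraph = -1;
        Integer firstSpeaker = null;
        for (Quote q : quotes) {
            if (q.paragraphId != currentParagraph) {
                // First quote of a new paragraph: remember its speaker.
                currentParagraph = q.paragraphId;
                firstSpeaker = q.characterId;
            } else if (firstSpeaker != null) {
                // Later quote in the same paragraph: reuse the first speaker.
                q.characterId = firstSpeaker;
            }
        }
    }
}
```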
Used an alternate method for extracting phrase names that does not rely on book.animateEntities. This was mostly due to the changes added in the quote attribution method, which required a quote-attributed name to be valid only if it came from a phrase put into animateEntities. The improvements I made to the quote-attribution program contradicted this, as the name I extracted for quote attribution did not always stay in animateEntities. I could have added the particular 'phrase' containing the name to animateEntities instead, but I decided to bypass the requirement itself, as I did not want to meddle unnecessarily with the code for extracting phrases: my code would have added phrases that the phrase-generating code did not deem legitimate to add. My program does not need this requirement anyway. (I don't know if 'requirement' is the correct word. I saw the code a few weeks ago, and all I can say is that all the quote-attributed names happen to come from phrases in animateEntities; I don't remember whether the name dictates whether the phrase is added or the other way round.) So I skipped it and used an alternate method that requires the printHTML option to be processed first, as can be seen in my changes to BookNLP.java.
Added a condition for ner being 'PERSON', as well as a new feature for the same, isPerson.
Includes checking whether the quote is in the same para as the last quote and, if so, assigning the last quote's attribution to it.
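As a rough illustration of what a binary feature like isPerson could look like, here is a small sketch. The class name, the feature encoding (1.0 or 0.0), and the linear scoring helper are assumptions for the example, not the feature code from this PR.

```java
// Hypothetical sketch of a binary NER-based feature and a linear score.
final class QuoteFeatures {

    // Returns 1.0 when the candidate's NER tag is PERSON, otherwise 0.0.
    // A null tag is treated as "not a person".
    static double isPerson(String nerTag) {
        return "PERSON".equals(nerTag) ? 1.0 : 0.0;
    }

    // Dot product of weights and feature values. Adding a feature means the
    // weight vector needs one more entry, hence the need for new weights.
    static double score(double[] weights, double[] features) {
        if (weights.length != features.length) {
            throw new IllegalArgumentException("weights and features differ in length");
        }
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * features[i];
        }
        return total;
    }
}
```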
Now using Stanford CoreNLP (latest version) for parsing, including dependency parsing (formerly done by MaltParser), with the 'depparse' option for faster parsing using neural networks. This uses Universal Dependencies rather than Stanford Dependencies, as CoreNLP itself does. Universal Dependencies, however, sometimes produce dependency structures that contain loops, because multiple relations are found for a token. This is dealt with here by choosing the best link and extracting a tree from the graph.
New weights generated as a result of adding the new feature 'isPerson'.
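One way to reduce a dependency graph with loops to a tree is sketched below. It is a greedy approximation written for illustration, not the method used in this PR and not a full Chu-Liu/Edmonds implementation. The `Edge` class and the per-edge `score` are assumptions; how the "best link" is actually ranked is not stated in the commit message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: keep at most one head per token and never close a cycle.
final class DependencyTreeSketch {

    // Candidate link from head to dependent with a hypothetical quality score.
    static final class Edge {
        final int head;
        final int dependent;
        final double score;

        Edge(int head, int dependent, double score) {
            this.head = head;
            this.dependent = dependent;
            this.score = score;
        }
    }

    // Returns heads[i] = chosen head of token i, or -1 if token i is left as a root.
    static int[] chooseHeads(int tokenCount, List<Edge> candidates) {
        int[] heads = new int[tokenCount];
        Arrays.fill(heads, -1);
        List<Edge> sorted = new ArrayList<>(candidates);
        // Consider the highest-scoring links first.
        sorted.sort(Comparator.comparingDouble((Edge e) -> e.score).reversed());
        for (Edge e : sorted) {
            if (heads[e.dependent] != -1) {
                continue; // this token already has a head
            }
            if (reaches(heads, e.head, e.dependent)) {
                continue; // accepting this link would create a loop
            }
            heads[e.dependent] = e.head;
        }
        return heads;
    }

    // True if following head pointers upward from start arrives at target.
    // Terminates because the accepted links never contain a cycle.
    private static boolean reaches(int[] heads, int start, int target) {
        for (int node = start; node != -1; node = heads[node]) {
            if (node == target) {
                return true;
            }
        }
        return false;
    }
}
```

Because the selection is greedy, the result can be a forest with more than one root; a caller that needs a single tree would have to attach the extra roots in a further step.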
New feature isPerson added
Created a function that assigns character Ids beforehand in the BookNLP process() function, rather than at the end during printing.
For the option "d", in printQuotes, added two extra attributes, sentenceID and characterId, to be attributed to each quote and printed to file. This greatly widens the scope of quote attribution: every 'he' and 'she' that a quote is attributed to is mapped to a character ID, making it possible to see which character actually said a quote in many more cases.