Documentation
Documentation of ppaxe.core and ppaxe.report classes.
List of PubMed identifiers to query.
Database to download the articles or abstracts from. PMC or PUBMED.
List of downloaded Article objects.
PubMed identifiers of the articles found in database.
PubMed identifiers of the articles not found in database.
Retrieves the full text or the abstracts of the specified articles.
PubMed identifier of the article.
PubMedCentral identifier of the article.
Journal id of the article.
Whole text of the article.
Abstract of the article.
List of Sentence objects in article (fulltext or abstract).
Writes tokenized sentences as HTML
Returns how many times each gene appears, as a dictionary with gene objects as keys and counts as values.
Finds sentence boundaries and stores the resulting list of Sentence objects in the attribute "sentences".
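As a rough illustration of the shape of that return value, the sketch below counts symbols with `collections.Counter`. It uses plain strings in place of gene objects, and the function name `count_symbols` is hypothetical, not part of ppaxe.

```python
from collections import Counter


def count_symbols(symbols):
    """Return a dict mapping each symbol to the number of times it appears."""
    return dict(Counter(symbols))


# Made-up symbols standing in for the genes found in an article.
counts = count_symbols(["MAPK1", "TP53", "MAPK1"])
```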
- mode (str, optional, default = "split"): Split the sentences ("split") or use the whole "source" as a single sentence ("no-split"). Useful for developing and debugging.
- source (str, optional, default = "fulltext"): Use the "fulltext" or the "abstract" to extract sentences.
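The difference between the two modes can be sketched with a naive regular-expression splitter. This is only a stand-in: the function name `extract_sentences` and the boundary rule are assumptions made for illustration, and the sentence splitter actually used by ppaxe is not described here.

```python
import re


def extract_sentences(text, mode="split"):
    """Split text into sentences, or keep it whole when mode is "no-split"."""
    if mode == "no-split":
        return [text]
    if mode != "split":
        raise ValueError('mode must be "split" or "no-split"')
    # Naive boundary rule: break after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


text = "MAPK1 binds TP53. The complex is stable."
split_result = extract_sentences(text)
whole_result = extract_sentences(text, mode="no-split")
```

With "no-split" the whole source comes back as one sentence, which is convenient when debugging a single hand-written input.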
Creates a wordcloud image
Simple wrapper method to avoid calls to multiple methods.
- source (str, optional, default = "fulltext"): Retrieve the interactions in the article from the source (fulltext or abstract).
Original text string of the sentence.
List of tokens retrieved from StanfordCoreNLP. Each element is a dictionary with keys:
"index" : Position of token (1-Indexed).
"word" : Word of the token.
"lemma" : Lemma of the token.
"ner" : Protein ("P") or Other ("O").
"pos" : Part-of-Speech tag.
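For orientation, a token list with the keys above might look like the following. The words, lemmas, and part-of-speech tags are made-up values, not real tagger output.

```python
# Hypothetical tokens for the sentence "MAPK1 binds TP53".
tokens = [
    {"index": 1, "word": "MAPK1", "lemma": "MAPK1", "ner": "P", "pos": "NN"},
    {"index": 2, "word": "binds", "lemma": "bind", "ner": "O", "pos": "VBZ"},
    {"index": 3, "word": "TP53", "lemma": "TP53", "ner": "P", "pos": "NN"},
]

# Collect the (1-indexed position, word) of every token tagged as a protein.
protein_tokens = [(t["index"], t["word"]) for t in tokens if t["ner"] == "P"]
```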
List of Candidate objects in sentence.
List of Protein objects found in sentence.
Annotates the genes/proteins in the sentence using the trained StanfordCoreNLP NER tagger and adds a list of tokens to the attribute "tokens".
Gets the interaction candidates for the sentence (attribute: candidates) and all the proteins (attribute: proteins).
Converts the sentence to an HTML string with the proteins and the verbs wrapped in tags.
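One plausible way to enumerate candidates is to pair every two proteins found in the sentence. The sketch below assumes that rule, which this documentation does not confirm, and uses plain symbol strings instead of Protein objects.

```python
from itertools import combinations

# Made-up symbols, in the order they appear in a sentence.
proteins = ["MAPK1", "TP53", "AKT1"]

# Assumption: each unordered pair of proteins is one interaction candidate.
candidate_pairs = list(combinations(proteins, 2))
```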
Symbol of the protein or the gene.
List of the positions of the protein's tokens in the tokenized sentence (1-Indexed).
Sentence object to which the protein belongs.
Synonymous symbol of the protein/gene.
Length of position list.
Method for disambiguating the gene (convert it to the approved symbol if possible).
Protein object of the first protein involved in the possible interaction.
Protein object of the second protein involved in the possible interaction.
Indexes of end of Protein_1 and start of Protein_2 (1-Indexed).
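Because these indexes are 1-indexed, they need a small adjustment before being used as a Python slice. The sketch below assumes the first protein precedes the second in the sentence; `between_indexes` is a hypothetical helper, not a ppaxe function.

```python
def between_indexes(positions_1, positions_2):
    """Return (end of protein 1, start of protein 2), both 1-indexed."""
    return (max(positions_1), min(positions_2))


words = ["MAPK1", "directly", "binds", "TP", "53"]
# Protein 1 covers token 1; protein 2 covers tokens 4 and 5.
end_1, start_2 = between_indexes([1], [4, 5])

# Convert the 1-indexed positions to a 0-indexed slice of the words in between.
between_words = words[end_1:start_2 - 1]
```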
Label of Candidate when prediction is performed. True for interacting proteins and False for non-interacting proteins. True if votes >= 0.55.
Proportion of votes of the Random Forest classifier.
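The relationship between votes and label stated above (True if votes >= 0.55) amounts to a simple threshold. The names `VOTE_THRESHOLD` and `label_from_votes` are illustrative only.

```python
VOTE_THRESHOLD = 0.55  # threshold stated in the label description


def label_from_votes(votes, threshold=VOTE_THRESHOLD):
    """Return True (interacting) when votes reach the threshold, else False."""
    return votes >= threshold


labels = [label_from_votes(v) for v in (0.30, 0.55, 0.80)]
```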
Feature column indexes of the non-zero features computed for Candidate.
Store the current feature column index that has been computed.
Values of the non-zero features.
SciPy sparse COO matrix with the features for Candidate.
Computes all the necessary features to predict if this InteractionCandidate is a real interaction. Fills attribute features_sparse, which is a SciPy sparse matrix.
Returns features as a plain Python list. Used for testing.
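The non-zero column indexes and their values are enough to rebuild a dense feature row. The sketch below shows that expansion in plain Python, without SciPy; the function name `to_dense`, the 0-indexed columns, and the total feature count are assumptions for illustration.

```python
def to_dense(indexes, values, n_features):
    """Expand parallel lists of column indexes and values into a dense list."""
    dense = [0] * n_features
    for column, value in zip(indexes, values):
        dense[column] = value
    return dense


# Two non-zero features (columns 0 and 3) out of a hypothetical total of 5.
dense = to_dense([0, 3], [1, 2], 5)
```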
Computes the votes (prediction) of the candidate by using the Random Forest classifier trained with scikit-learn.
Converts the candidate to HTML, tagging only the proteins involved and only the verbs between them.
List of Article objects or PMQuery with Article objects in attribute "articles".
ProteinSummary object of the analysis.
GraphSummary object of the analysis.
Creates a PDF from an HTML file.
- outfile (str, required, no default): Output filename of the PDF report. Will append ".pdf" to it.
Performs all the steps necessary to make the report.
- outfile (str, optional, default = "report"): Filename of the output file. Will append ".html" or ".pdf".
Writes an HTML file with the report to outfile.
- outfile (str, required, no default): Output filename of the HTML report. Will append ".html" to it.
List of Article objects or PMQuery with Article objects in attribute "articles".
Dictionary of dictionaries with information about protein counts in articles. Keys:
symbol: symbol of the protein
'totalcount' : total number of occurrences of protein.
'int_count'
'left' : Occurrences of protein on left-hand side of interaction.
'right' : Occurrences of protein on right-hand side of interaction.
Makes the summary of the proteins found by the NER.
Returns an HTML string with the desired protein count table.
- sorted_by (str, optional, default = "totalcount"): Sort table by total number of occurrences of protein in sentences (sorted_by="totalcount"), by total number of occurrences in interactions (sorted_by="int_count"), by occurrences in left-hand side of interaction (sorted_by="left") or right-hand side (sorted_by="right").
- reverse (bool, optional, default = True): Sort proteins in descending order (from bigger to smaller) according to the sorted_by rule if True, or in ascending order (smaller to bigger) if False.
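The effect of sorted_by and reverse can be sketched on a small table. The sketch rests on two assumptions that the documentation does not confirm: that 'left' and 'right' are nested under 'int_count', and that sorting by "int_count" uses their sum. The function name `sort_protein_table` is hypothetical.

```python
def sort_protein_table(prot_table, sorted_by="totalcount", reverse=True):
    """Return (symbol, counts) pairs ordered by the chosen count."""
    def key(item):
        counts = item[1]
        if sorted_by == "totalcount":
            return counts["totalcount"]
        if sorted_by == "int_count":
            # Assumption: interaction count is left plus right occurrences.
            return counts["int_count"]["left"] + counts["int_count"]["right"]
        if sorted_by in ("left", "right"):
            return counts["int_count"][sorted_by]
        raise ValueError(f"unknown sorted_by: {sorted_by!r}")
    return sorted(prot_table.items(), key=key, reverse=reverse)


# Made-up counts with the assumed nested layout.
table = {
    "MAPK1": {"totalcount": 5, "int_count": {"left": 2, "right": 0}},
    "TP53": {"totalcount": 3, "int_count": {"left": 0, "right": 3}},
}
by_total = [s for s, _ in sort_protein_table(table)]
by_right = [s for s, _ in sort_protein_table(table, sorted_by="right")]
by_total_ascending = [s for s, _ in sort_protein_table(table, reverse=False)]
```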
List of Article objects or PMQuery with Article objects in attribute "articles".
List of lists with interactions in articles. Elements:
[
[
votes,
prot1.symbol,
prot1.disambiguate(),
prot2.symbol,
prot2.disambiguate(),
candidate.to_html(),
article.pmid
],
...
]
Number of interactions in articles.
Set with the protein symbols of each interaction, used to remove redundant interactions.
Number of unique interactions in articles.
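Counting unique interactions with a set can be sketched as follows. The rows follow the seven-element layout listed above with made-up values, and treating (A, B) and (B, A) as the same interaction is an assumption of this sketch, not something the documentation states.

```python
# Rows: [votes, symbol 1, disambiguated 1, symbol 2, disambiguated 2, html, pmid]
rows = [
    [0.91, "MAPK1", "MAPK1", "TP53", "TP53", "<p>...</p>", "111"],
    [0.72, "TP53", "TP53", "MAPK1", "MAPK1", "<p>...</p>", "222"],
    [0.60, "AKT1", "AKT1", "TP53", "TP53", "<p>...</p>", "111"],
]


def unique_interactions(rows):
    """Return the set of symbol pairs, ignoring the order within each pair."""
    return {tuple(sorted((row[1], row[3]))) for row in rows}


unique = unique_interactions(rows)
n_interactions = len(rows)
n_unique = len(unique)
```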
Returns a JSON string with the graph prepared for Cytoscape.
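The exact layout of the returned string is not documented here. As a hedged illustration, the sketch below serializes symbol pairs into the nodes/edges "elements" layout that Cytoscape.js accepts; the function name `to_cytoscape_json` and the edge id scheme are assumptions.

```python
import json


def to_cytoscape_json(pairs):
    """Serialize (source, target) symbol pairs as Cytoscape.js-style elements."""
    symbols = sorted({symbol for pair in pairs for symbol in pair})
    elements = {
        "nodes": [{"data": {"id": symbol}} for symbol in symbols],
        "edges": [
            {"data": {"id": f"{a}-{b}", "source": a, "target": b}}
            for a, b in pairs
        ],
    }
    return json.dumps(elements)


graph_json = to_cytoscape_json([("MAPK1", "TP53"), ("TP53", "AKT1")])
graph = json.loads(graph_json)
```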
Makes the summary of the interactions retrieved.
Returns an HTML string with the interactions sorted by votes/confidence.
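Sorting by votes can be sketched with `sorted` on the first element of each interaction row. The rows below are shortened, made-up examples, and descending order (highest confidence first) is an assumption.

```python
# Shortened rows: [votes, symbol 1, symbol 2]
rows = [
    [0.60, "AKT1", "TP53"],
    [0.91, "MAPK1", "TP53"],
    [0.72, "TP53", "MDM2"],
]

# Highest votes first.
ranked = sorted(rows, key=lambda row: row[0], reverse=True)
top_votes = [row[0] for row in ranked]
```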