mskspi/NLP

Improved prediction of drug-induced liver injury literature using natural language processing and machine learning methods

Challenge: The Critical Assessment of Massive Data Analysis (CAMDA) 2022, in collaboration with Intelligent Systems for Molecular Biology (ISMB), hosted the Literature AI for Drug-Induced Liver Injury (DILI) challenge.

A pipeline of data analysis using natural language processing in conjunction with machine learning methods

Internal and external validation strategy

The top 10 most common words in (A) DILI-related and (B) unrelated literature
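A word ranking of this kind can be sketched with the Python standard library alone. The snippet below is a minimal illustration, not the code from the notebook: the tokenizer, the small stop-word set, and the sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word set for illustration only; the notebook's actual list is not shown here.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "with", "was", "for", "is"}

def top_words(texts, n=10):
    """Return the n most common non-stop-word tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic runs only (a deliberately simple tokenizer).
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Made-up sentences, not taken from the challenge data.
sample = [
    "Hepatotoxicity in patients treated with the drug",
    "Liver injury and hepatotoxicity of the drug",
]
print(top_words(sample, n=3))
```

Running the same function separately on the DILI-related and unrelated texts would give the two rankings compared in panels (A) and (B).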

The t-SNE visualization of the TF-IDF vectors obtained using (A) the title and abstract and (B) only the title of each publication
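For readers unfamiliar with TF-IDF, the weighting behind those vectors can be sketched in a few lines. This is the textbook formulation (term frequency times the log of inverse document frequency), written without external libraries; it is an assumption that it matches the variant used in the notebooks, since libraries often apply smoothing or normalization on top. The t-SNE projection itself is not shown. The function name `tfidf` and the toy token lists are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute textbook TF-IDF weights for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # tf = relative frequency in this document; idf = log(N / df).
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Made-up token lists standing in for titles; not taken from the challenge data.
docs = [
    ["drug", "induced", "liver", "injury"],
    ["drug", "response", "in", "cancer"],
]
vectors = tfidf(docs)
print(vectors[0])
```

A term that appears in every document, such as "drug" here, gets a weight of zero, which is why TF-IDF emphasizes words that distinguish one group of publications from another.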

  1. Data for modeling
    • DILIPositive.tsv: DILI-related literature (title + abstract)
    • DILINegative.tsv: DILI-unrelated literature (title + abstract)
  2. External validation data
  3. Code
    • CAMDA_word_frequency.ipynb: Generates the word-frequency and t-SNE figures
    • CAMDA_word2vec+TFIDF.ipynb: Modeling and testing using DILIPositive.tsv and DILINegative.tsv
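
As a starting point for working with the two data files, the sketch below shows one way to combine tab-separated positive and negative records into a single labeled list. The column layout of the files is not documented here, so the function simply joins every field in a row into one text string; the helper name `load_labeled`, the `skip_header` flag, and the in-memory sample rows are assumptions for illustration.

```python
import csv
import io

def load_labeled(handle, label, skip_header=False):
    """Read tab-separated rows and return (text, label) pairs."""
    # QUOTE_NONE keeps quotation marks inside abstracts as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if skip_header:
        next(reader, None)
    # Join all fields (e.g. title and abstract) into one string per record.
    return [(" ".join(row), label) for row in reader if row]

# In-memory stand-ins for the real files; the rows are made up.
positive = io.StringIO("Title A\tAbstract A\n")
negative = io.StringIO("Title B\tAbstract B\n")

dataset = load_labeled(positive, 1) + load_labeled(negative, 0)
print(dataset)
```

With the real data, the `io.StringIO` objects would be replaced by files opened with `open("DILIPositive.tsv", newline="", encoding="utf-8")` and its DILINegative.tsv counterpart, assuming the files are UTF-8 encoded.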
