Assessing the Diagnostic Accuracy of Machine Learning Algorithms for Identification of Asthma in United States Adults based on NHANES Dataset
Authors:
- O. Kohandel Gargari
- M. Fathi
- S. Rajai Firouzabadi
- I. Mohammadi
- M. H. Mahmoudi
- M. Sarmadi
- A. Shafiee
2024
Background and Aim
Asthma is difficult to diagnose because symptoms are underreported, misdiagnosis is common, and existing diagnostic tests have limitations. Machine learning (ML) offers a promising way to address these challenges by leveraging demographic and clinical data. In this study, we compare several ML diagnostic models and identify the most informative features for asthma diagnosis using data from the National Health and Nutrition Examination Survey (NHANES).
Materials and Methods
A total of 8,888 participants with available asthma diagnosis data from the 2017-2018 NHANES cycle were included. After selecting variables related to asthma, we evaluated seven ML algorithms: Support Vector Machine (SVM), Random Forest (RF), AdaBoost (ADA), XGBoost (XGB), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Multi-Layer Perceptron (MLP).
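To make one of the listed algorithms concrete, the following is a minimal pure-Python sketch of a K-Nearest Neighbors classifier evaluated on held-out data. It is not the study's pipeline: the abstract does not specify features, preprocessing, hyperparameters, or the validation scheme, so the synthetic two-feature data, the helper names (knn_predict, make_toy_data), and the choice of k=5 are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to x and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def make_toy_data(n, seed):
    """Hypothetical synthetic data: label 1 is centred at (2, 2), label 0 at (0, 0)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 * label
        X.append((rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)))
        y.append(label)
    return X, y


# Separate seeds give independent training and held-out samples.
train_X, train_y = make_toy_data(200, seed=0)
test_X, test_y = make_toy_data(100, seed=1)

predictions = [knn_predict(train_X, train_y, x, k=5) for x in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

Comparing several algorithms, as the study does, would amount to running each model through the same held-out evaluation so that their scores are directly comparable.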
Results
SVM and ADA were the top performers, with the highest area under the curve (AUC) scores of 0.72 and 0.71, respectively. RF exhibited high accuracy but low precision. Feature interpretation using SHapley Additive exPlanations (SHAP) values identified key predictors, including asthma history in a close relative, dietary fat intake, and chronic bronchitis. Reducing the number of features did not cause a significant loss in predictive performance.
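The metrics reported here can be illustrated with a short pure-Python sketch. It shows how a classifier can reach high accuracy with low precision when the positive class is rare, and how AUC can be computed as the fraction of positive-negative pairs that the model ranks correctly. All labels, predictions, and scores below are invented toy values, not NHANES data or study results, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if (tp + fp) else 0.0


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced sample: 10 positives and 90 negatives.
y_true = [1] * 10 + [0] * 90
# Hypothetical classifier: 2 true positives, 8 misses, 4 false alarms, 86 true negatives.
y_pred = [1] * 2 + [0] * 8 + [1] * 4 + [0] * 86

acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred)
print(f"accuracy={acc:.2f}, precision={prec:.2f}")

# Small ranking example: 5 of the 6 positive-negative pairs are ordered correctly.
toy_auc = auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.3, 0.1])
print(f"AUC={toy_auc:.3f}")
```

In the toy sample, most of the accuracy comes from correctly labelled negatives, which is why accuracy alone can be misleading for a condition that affects a minority of participants and why a ranking metric such as AUC is often reported alongside it.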
Conclusion
Our findings demonstrate the potential of ML algorithms, particularly SVM and ADA, for diagnosing asthma from diverse clinical and demographic factors. In addition, asthma history in a close relative, dietary fat intake, and chronic bronchitis emerged as valuable diagnostic features. These findings may support earlier diagnosis of asthma.