An explainable artificial intelligence framework for risk prediction of COPD in smokers


