Application and performance of machine learning models integrating lymphocyte subsets and clinical features in discriminating NTM-PD, pulmonary tuberculosis and other lung diseases

Affiliation:

(1. Clinical and Research Center for Tuberculosis, Shanghai Key Laboratory of Tuberculosis, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China; 2. Department of Clinical Laboratory, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China)

About the author:

Wang Lei (王蕾) (born 1989), female, attending physician, master's degree, E-mail: 17602155688@163.com

Corresponding authors:

Sun Qin (孙勤), E-mail: sunqinbongjour@163.com; Sha Wei (沙巍), E-mail: shfksw@126.com

CLC number:

R52

Fund Project:

Youth Project of the Shanghai Municipal Health Commission (20204Y0325)
    Abstract:

    Objective: To construct diagnostic models from lymphocyte subset counts using several machine learning methods, in order to distinguish nontuberculous mycobacterial pulmonary disease (NTM-PD), pulmonary tuberculosis (PTB), and other commonly confused pulmonary diseases, and thereby provide a scientific basis for the early identification of infectious pulmonary diseases.

    Methods: Patients admitted to the Department of Tuberculosis, Shanghai Pulmonary Hospital, Tongji University, from January to December 2023 with a confirmed diagnosis of active tuberculosis (ATB), NTM-PD, or another pulmonary disease (inflammatory or neoplastic) were enrolled. Lymphocyte subset counts were measured by flow cytometry. Variables with P<0.1 in the univariate difference analysis on the development set were retained and then screened further by correlation analysis and LASSO regression before entering the models. Four algorithms were used for model development: multinomial logistic regression, naive Bayes, random forest, and XGBoost. Hyperparameters were tuned with Bayesian optimization and cross-validation. Model performance on the test set was evaluated with the area under the receiver operating characteristic curve (AU-ROC), the average precision of the precision-recall curve (AP-PR), and decision curve analysis (DCA).

    Results: A total of 1,383 patients were included: 836 in the ATB group, 254 in the NTM-PD group, and 293 in the other-disease (OTHER) group. With the selected demographic variables, comorbidities, and lymphocyte subset indices as input variables and disease category as the outcome variable, all four machine learning models were constructed, and the random forest model showed the best predictive performance. Variable importance in the models, in descending order, was: body mass index (BMI), CD3+ T cells, CD16+56+ NK cells, CD8+ T cells (cytotoxic T cells), age, %CD3+ T cells, CD19+ B cells, CD4+ T cells (helper T cells), sex, anemia, diabetes, leukopenia, hypoproteinemia, and autoimmune disease. BMI, CD3+ T cells, CD16+56+ NK cells, and CD8+ T cells (cytotoxic T cells) contributed most.

    Conclusion: By integrating lymphocyte subset profiles with clinical features, the machine learning models developed in this study differentiated ATB, NTM-PD, and other pulmonary diseases. They offer a new approach to the early diagnosis and personalized management of pulmonary diseases.
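The three test-set metrics named in the Methods (AU-ROC, average precision, and DCA) can be illustrated with a small one-vs-rest calculation. The sketch below is not the authors' code and uses no study data: the labels, the predicted probabilities, the function names, and the threshold of 0.5 are all made-up assumptions, chosen only to show how a three-class prediction can be scored one class at a time. The net benefit follows the standard DCA definition, TP/n minus FP/n weighted by pt/(1 - pt).

```python
"""Toy one-vs-rest evaluation: AUROC, average precision, DCA net benefit.

All values below are hypothetical illustration data, not study results.
"""
from typing import Dict, List, Sequence, Tuple

# Hypothetical true classes and predicted class probabilities for six patients.
LABELS: List[str] = ["PTB", "PTB", "NTM", "OTHER", "NTM", "PTB"]
PROBS: List[Dict[str, float]] = [
    {"PTB": 0.70, "NTM": 0.20, "OTHER": 0.10},
    {"PTB": 0.50, "NTM": 0.30, "OTHER": 0.20},
    {"PTB": 0.20, "NTM": 0.60, "OTHER": 0.20},
    {"PTB": 0.30, "NTM": 0.20, "OTHER": 0.50},
    {"PTB": 0.55, "NTM": 0.30, "OTHER": 0.15},
    {"PTB": 0.60, "NTM": 0.10, "OTHER": 0.30},
]


def one_vs_rest(labels: Sequence[str], probs: Sequence[Dict[str, float]],
                target: str) -> Tuple[List[int], List[float]]:
    """Binarize a multiclass problem: target class (1) versus all others (0)."""
    y = [1 if lab == target else 0 for lab in labels]
    s = [p[target] for p in probs]
    return y, s


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Binary AUROC as the fraction of (positive, negative) pairs ranked
    correctly, counting tied scores as half a win (Mann-Whitney form)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at each rank where a positive appears.
    Tied scores are not grouped; they keep their input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("no positive samples")
    return total / hits


def net_benefit(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """DCA net benefit at probability threshold pt:
    TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


if __name__ == "__main__":
    for cls in ("PTB", "NTM", "OTHER"):
        y, s = one_vs_rest(LABELS, PROBS, cls)
        print(cls, round(auroc(y, s), 3), round(average_precision(y, s), 3),
              round(net_benefit(y, s, 0.5), 3))
```

Scoring each class against the rest is only one possible convention for a three-class problem; the abstract does not state which averaging scheme was used, so the per-class values here should not be read as the study's reporting format.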

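Cite this article

Wang Lei, Cao Jie, Liu Zhibin, et al. Application and performance of machine learning models integrating lymphocyte subsets and clinical features in discriminating NTM-PD, pulmonary tuberculosis and other lung diseases[J]. Journal of Tongji University (Medical Science), 2025, 46(6): 848-856.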

History
  • Received: 2025-08-15
  • Accepted: 2025-09-14
  • Published online: 2026-01-07