Construction of a machine learning-based prediction model for bone metastasis of breast cancer

Ouyang Fei, Wang Yang, Chen Yu, Pei Guoqing, Wang Ling, Zhang Yang, Shi Lei

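  1. Department of Orthopedics, The First Affiliated Hospital of Air Force Medical University, Xi'an 710032, Shaanxi
  2. Department of Neurology, The First Affiliated Hospital of Air Force Medical University, Xi'an 710032, Shaanxi
  3. Department of Health Statistics, Faculty of Military Preventive Medicine, Air Force Medical University, Xi'an 710032, Shaanxi
  • Received: 2024-06-13; Revised: 2024-09-05; Published: 2024-10-30; Online: 2024-11-20
  • Corresponding author: Shi Lei
  • About the author: First author: Ouyang Fei (ORCID: 0009-0007-6226-1440), master's degree candidate, resident physician.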

Abstract:

Background and purpose: Breast cancer is a major global public health problem. Bone is the most common site of distant metastasis in breast cancer, accounting for about 70% of all metastatic cases. Bone metastasis of breast cancer can cause a series of complications, including severe pain, pathological fracture, hypercalcemia and spinal cord compression, which greatly limit patients' physical activity and impair their quality of life. Metastatic recurrence is the leading cause of death in breast cancer patients. There is therefore an urgent need for a prediction model of bone metastasis in breast cancer that identifies patients at high risk. This study aimed to develop machine learning-based models to predict the probability of bone metastasis in breast cancer.

Methods: Data on breast cancer patients diagnosed between 2010 and 2015 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database. Variables were screened by least absolute shrinkage and selection operator (LASSO) regression and by univariate and multivariate logistic regression analysis, and statistically significant risk factors were included in the prediction models. Nine machine learning algorithms were used to build the models: decision tree, elastic net, K-nearest neighbor, light gradient boosting machine, logistic regression, neural network, random forest, support vector machine and extreme gradient boosting. Model hyperparameters were tuned by random search with 5-fold cross-validation. The models were evaluated by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, calibration curves and decision curves to identify the optimal model, and variable importance was analyzed based on the optimal model. Finally, the optimal model was used to build a web calculator for predicting the risk of bone metastasis in breast cancer. This cohort study strictly followed the items of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.

Results: A total of 10 106 patients with breast cancer were included, with 7 073 in the training set and 3 033 in the validation set; 4 494 (63.5%) and 1 927 (63.5%) patients in these two cohorts, respectively, developed bone metastasis. Race, pathological grade, estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER2) status, N stage, lung metastasis, radiotherapy, chemotherapy and surgery were independent predictors of bone metastasis. The models were validated on both the training set and the validation set. Taking the AUC of the ROC curve, the calibration curves and the decision curves together, the extreme gradient boosting algorithm outperformed the other machine learning algorithms. Finally, the extreme gradient boosting algorithm was used to build a web calculator for predicting bone metastasis of breast cancer, available at https://bcbm.shinyapps.io/DynNomapp/.

Conclusion: This study developed machine learning-based prediction models for the probability of bone metastasis in breast cancer patients, which may help clinicians make more rational treatment decisions.
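Two of the evaluation criteria named in the Methods, the area under the ROC curve and the decision curve, reduce to short formulas. The following is a minimal illustrative sketch in plain Python, not the study's own code: the function names `roc_auc` and `net_benefit` and the toy labels and scores are hypothetical. It assumes the AUC is computed as the probability that a randomly chosen patient with bone metastasis receives a higher predicted risk than a randomly chosen patient without it, and that a decision curve plots the standard net benefit, TP/n − (FP/n) × pt/(1 − pt), against the threshold probability pt.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Net benefit at one threshold probability (one point on a decision curve)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Patients with predicted risk at or above the threshold are treated as high risk.
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = bone metastasis, 0 = no bone metastasis.
    labels = [1, 0, 1, 1, 0, 1, 0, 1]
    risks = [0.9, 0.4, 0.8, 0.35, 0.3, 0.7, 0.6, 0.65]
    print(f"AUC: {roc_auc(labels, risks):.3f}")
    print(f"Net benefit at pt=0.5: {net_benefit(labels, risks, 0.5):.3f}")
    # "Treat all" reference strategy: every patient is flagged as high risk.
    print(f"Treat-all net benefit: {net_benefit(labels, [1.0] * 8, 0.5):.3f}")
```

A model is useful at a given threshold only if its net benefit exceeds both the treat-all value and zero (the treat-none strategy). The pairwise AUC loop is quadratic in sample size, which is fine for illustration; a rank-based implementation would be preferable for a cohort of about 10 000 patients.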

Key words: Breast cancer, Bone metastasis, Prediction model, Machine learning, Web calculator

