NPJ_Digital_Medicine_2024(142-260)
NPJ Digital Medicine 2024 article roundup (2)
Articles collected: 358
142. MyThisYourThat for interpretable identification of systematic bias in federated learning for biomedical images.
PMID: 39242810 | DOI: 10.1038/s41746-024-01226-1 | Date: 2024-09-07
Abstract: Distributed collaborative learning is a promising approach for building predictive models for privacy-sensitive biomedical images. Here, several data owners (clients) train a joint model without sharing their original data. However, concealed systematic biases can compromise model performance and fairness. This study presents the MyThisYourThat (MyTH) approach, which adapts an interpretable prototypical part learning network to a distributed setting, enabling each client to visualize feature differences learned by others on their own image: comparing one client’s ‘This’ with others’ ‘That’. Our setting demonstrates four clients collaboratively training two diagnostic classifiers on a benchmark X-ray dataset. Without data bias, the global model reaches 74.14% balanced accuracy for cardiomegaly and 74.08% for pleural effusion. We show that with systematic visual bias in one client, the performance of global models drops to near-random. We demonstrate how differences between local and global prototypes reveal biases and allow their visualization on each client’s data without compromising privacy.
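The abstract for entry 142 reports balanced accuracy and says a biased client pushes the global model to "near-random". As a minimal sketch of what that metric measures for a binary classifier (the counts below are hypothetical and are not taken from the paper), balanced accuracy is the mean of sensitivity and specificity, so a model that always predicts the majority class scores 0.5 regardless of class imbalance:

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Plain accuracy: fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical confusion-matrix counts for an imbalanced test set
# (40 positives, 960 negatives); illustrative only.
useful_model = dict(tp=20, fn=20, tn=900, fp=60)
always_negative = dict(tp=0, fn=40, tn=960, fp=0)

for name, counts in [("useful", useful_model), ("always-negative", always_negative)]:
    print(f"{name}: accuracy={accuracy(**counts):.3f}, "
          f"balanced accuracy={balanced_accuracy(**counts):.3f}")
```

In this toy case the always-negative model has higher plain accuracy than the useful one, while its balanced accuracy is exactly 0.5. That gap is why balanced accuracy suits imbalanced diagnostic labels.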
143. Integrating digital gait data with metabolomics and clinical data to predict outcomes in Parkinson’s disease.
PMID: 39242660 | DOI: 10.1038/s41746-024-01236-z | Date: 2024-09-06
Abstract: Parkinson’s disease (PD) presents diverse symptoms and comorbidities, complicating its diagnosis and management. The primary objective of this cross-sectional, monocentric study was to assess digital gait sensor data’s utility for monitoring and diagnosis of motor and gait impairment in PD. As a secondary objective, for the more challenging tasks of detecting comorbidities, non-motor outcomes, and disease progression subgroups, we evaluated for the first time the integration of digital markers with metabolomics and clinical data. Using shoe-attached digital sensors, we collected gait measurements from 162 patients and 129 controls in a single visit. Machine learning models showed significant diagnostic power, with AUC scores of 83-92% for PD vs. control and up to 75% for motor severity classification. Integrating gait data with metabolomics and clinical data improved predictions for challenging-to-detect comorbidities such as hallucinations. Overall, this approach using digital biomarkers and multimodal data integration can assist in objective disease monitoring, diagnosis, and comorbidity detection.
144. Derivation, external and clinical validation of a deep learning approach for detecting intracranial hypertension.
PMID: 39237755 | DOI: 10.1038/s41746-024-01227-0 | Date: 2024-09-05
Abstract: Increased intracranial pressure (ICP) ≥15 mmHg is associated with adverse neurological outcomes, but measuring it requires invasive intracranial monitoring. Using the publicly available MIMIC-III Waveform Database (2000-2013) from Boston, we developed an artificial intelligence-derived biomarker for elevated ICP (aICP) for adult patients. aICP uses routinely collected extracranial waveform data as input, reducing the need for invasive monitoring. We externally validated aICP with an independent dataset from the Mount Sinai Hospital (2020-2022) in New York City. The AUROC, accuracy, sensitivity, and specificity on the external validation dataset were 0.80 (95% CI, 0.80-0.80), 73.8% (95% CI, 72.0-75.6%), 73.5% (95% CI, 72.5-74.5%), and 73.0% (95% CI, 72.0-74.0%), respectively. We also present an exploratory analysis showing aICP predictions are associated with clinical phenotypes. A ten-percentile increment was associated with brain malignancy (OR = 1.68; 95% CI, 1.09-2.60), intracerebral hemorrhage (OR = 1.18; 95% CI, 1.07-1.32), and craniotomy (OR = 1.43; 95% CI, 1.12-1.84; P < 0.05 for all).
145. Artificial intelligence estimated electrocardiographic age as a recurrence predictor after atrial fibrillation catheter ablation.
PMID: 39237703 | DOI: 10.1038/s41746-024-01234-1 | Date: 2024-09-05
Abstract: The application of artificial intelligence (AI) algorithms to 12-lead electrocardiogram (ECG) provides promising age prediction models. We explored whether the gap between the pre-procedural AI-ECG age and chronological age can predict atrial fibrillation (AF) recurrence after catheter ablation. We validated a pre-trained residual network-based model for age prediction on four multinational datasets. Then we estimated AI-ECG age using a pre-procedural sinus rhythm ECG among individuals on anti-arrhythmic drugs who underwent de-novo AF catheter ablation from two independent AF ablation cohorts. We categorized the AI-ECG age gap based on the mean absolute error of the AI-ECG age gap obtained from four model validation datasets: aged-ECG (≥10 years) and normal ECG age (<10 years) groups. In the two AF ablation cohorts, aged-ECG was associated with a significantly increased risk of AF recurrence compared to the normal ECG age group. These associations were independent of chronological age or left atrial diameter. In summary, a pre-procedural AI-ECG age has a prognostic value for AF recurrence after catheter ablation.
146. Regulatory considerations for developing remote measurement technologies for Alzheimer’s disease research.
PMID: 39232033 | DOI: 10.1038/s41746-024-01211-8 | Date: 2024-09-04
Abstract: The Remote Assessment of Disease and Relapse – Alzheimer’s Disease (RADAR-AD) consortium evaluated remote measurement technologies (RMTs) for assessing functional status in AD. The consortium engaged with the European Medicines Agency (EMA) to obtain feedback on identification of meaningful functional domains, selection of RMTs and clinical study design to assess the feasibility of using RMTs in AD clinical studies. We summarized the feedback and the lessons learned to guide future projects.
147. Development, deployment and scaling of operating room-ready artificial intelligence for real-time surgical decision support.
PMID: 39227660 | DOI: 10.1038/s41746-024-01225-2 | Date: 2024-09-03
Abstract: Deep learning for computer vision can be leveraged for interpreting surgical scenes and providing surgeons with real-time guidance to avoid complications. However, neither generalizability nor scalability of computer-vision-based surgical guidance systems has been demonstrated, especially to geographic locations that lack hardware and infrastructure necessary for real-time inference. We propose a new equipment-agnostic framework for real-time use in operating suites. Using laparoscopic cholecystectomy and semantic segmentation models for predicting safe/dangerous (“Go”/“No-Go”) zones of dissection as an example use case, this study aimed to develop and test the performance of a novel data pipeline linked to a web-platform that enables real-time deployment from any edge device. To test this infrastructure and demonstrate its scalability and generalizability, lightweight U-Net and SegFormer models were trained on annotated frames from a large and diverse multicenter dataset from 136 institutions, and then tested on a separate prospectively collected dataset. A web-platform was created to enable real-time inference on any surgical video stream, and performance was tested on and optimized for a range of network speeds. The U-Net and SegFormer models respectively achieved mean Dice scores of 57% and 60%, precision 45% and 53%, and recall 82% and 75% for predicting the Go zone, and mean Dice scores of 76% and 76%, precision 68% and 68%, and recall 92% and 92% for predicting the No-Go zone. After optimization of the client-server interaction over the network, we deliver a prediction stream of at least 60 fps and with a maximum round-trip delay of 70 ms for speeds above 8 Mbps. Clinical deployment of machine learning models for surgical guidance is feasible and cost-effective using a generalizable, scalable and equipment-agnostic framework that lacks dependency on hardware with high computing performance or ultra-fast internet connection speed.
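Entry 147 reports Dice, precision, and recall for segmentation of the Go and No-Go zones. The sketch below shows how these three overlap metrics are commonly computed for one frame from a predicted and an annotated binary mask. The function name and the toy masks are hypothetical, and the paper's own evaluation code may differ (for example in how per-frame scores are averaged).

```python
from typing import Dict, Sequence


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Dice, precision and recall for two flattened binary masks (1 = zone pixel)."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Dice is 2*TP / (2*TP + FP + FN); two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return {"dice": dice, "precision": precision, "recall": recall}


# Toy 8-pixel "frame" (hypothetical): the prediction covers the whole
# annotated zone plus two extra pixels.
truth = [0, 1, 1, 1, 0, 0, 0, 0]
pred = [0, 1, 1, 1, 1, 1, 0, 0]
metrics = overlap_metrics(pred, truth)
print(metrics)
```

A prediction that over-covers the annotated zone, as in this toy mask, gives high recall and lower precision. The figures in the abstract have the same shape, with recall above precision for both zones.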
148. Mapping the regulatory landscape for artificial intelligence in health within the European Union.
PMID: 39191937 | DOI: 10.1038/s41746-024-01221-6 | Date: 2024-08-27
Abstract: Regulatory frameworks for artificial intelligence (AI) are needed to mitigate risks while ensuring the ethical, secure, and effective implementation of AI technology in healthcare and population health. In this article, we present a synthesis of 141 binding policies applicable to AI in healthcare and population health in the EU and 10 European countries. The EU AI Act sets the overall regulatory framework for AI, while other legislation sets social, health, and human rights standards, addresses the safety of technologies and the implementation of innovation, and ensures the protection and safe use of data. Regulation specifically pertaining to AI is still nascent and scarce, though a combination of data, technology, innovation, and health and human rights policy has already formed a baseline regulatory framework for AI in health. Future work should explore specific regulatory challenges, especially with respect to AI medical devices, data protection, and data enablement.
149. A trust based framework for the envelopment of medical AI.
PMID: 39191927 | DOI: 10.1038/s41746-024-01224-3 | Date: 2024-08-27
Abstract: A trust-based relationship between patients and medical professionals has been recognized as one of the most important predictors of treatment success and patients’ satisfaction. We have developed a novel legal, social and regulatory envelopment of medical AI that is explicitly based on the preservation of trust between patients and medical professionals. We require that the envelopment fosters reliance on the medical AI by both patients and medical professionals. Focusing on this triangle of desirable attitudes allows us to develop eight envelopment components that will support, strengthen and preserve these attitudes. We then demonstrate how each envelopment component can be enacted during different stages of the systems development life cycle and demonstrate that this requires the involvement of medical professionals and patients at the earliest stages of the life cycle. Therefore, this framework requires medical AI start-ups to cooperate with medical professionals and patients throughout.
150. Personalized dose selection for the first Waldenström macroglobulinemia patient on the PRECISE CURATE.AI trial.
PMID: 39191913 | DOI: 10.1038/s41746-024-01195-5 | Date: 2024-08-27
Abstract: The digital revolution in healthcare, amplified by the COVID-19 pandemic and artificial intelligence (AI) advances, has led to a surge in the development of digital technologies. However, integrating digital health solutions, especially AI-based ones, in rare diseases like Waldenström macroglobulinemia (WM) remains challenging due to limited data, among other factors. CURATE.AI, a clinical decision support system, offers an alternative to big data approaches by calibrating individual treatment profiles based on that individual’s data alone. We present a case study from the PRECISE CURATE.AI trial with a WM patient, where, over two years, CURATE.AI provided dynamic Ibrutinib dose recommendations to clinicians (users) aimed at achieving optimal IgM levels. An 80-year-old male with newly diagnosed WM requiring treatment due to anemia was recruited to the trial for CURATE.AI-based dosing of the Bruton tyrosine kinase inhibitor Ibrutinib. The primary and secondary outcome measures were focused on scientific and logistical feasibility. Preliminary results underscore the platform’s potential in enhancing user and patient engagement, in addition to clinical efficacy. Based on a two-year-long patient enrollment into the CURATE.AI-augmented treatment, this study showcases how AI-enabled tools can support the management of rare diseases, emphasizing the integration of AI to enhance personalized therapy.
151. Digital twins in dermatology, current status, and the road ahead.
PMID: 39187568 | DOI: 10.1038/s41746-024-01220-7 | Date: 2024-08-26
Abstract: Digital twins, innovative virtual models synthesizing real-time biological, environmental, and lifestyle data, herald a new era in personalized medicine, particularly dermatology. These models, integrating medical-purpose Internet of Things (IoT) devices, deep and digital phenotyping, and advanced artificial intelligence (AI), offer unprecedented precision in simulating real-world physical conditions and health outcomes. Originating in aerospace and manufacturing for system behavior prediction, their application in healthcare signifies a paradigm shift towards patient-specific care pathways. In dermatology, digital twins promise enhanced diagnostic accuracy, optimized treatment plans, and improved patient monitoring by accommodating the unique complexities of skin conditions. However, a comprehensive review across PubMed, Embase, Web of Science, Cochrane, and Scopus until February 5th, 2024, underscores a significant research gap; no direct studies on digital twins’ application in dermatology were identified. This gap signals challenges, including the intricate nature of skin diseases, ethical and privacy concerns, and the necessity for specialized algorithms. Overcoming these barriers through interdisciplinary efforts and focused research is essential for realizing digital twins’ potential in dermatology. This study advocates for a proactive exploration of digital twins, emphasizing the need for a tailored approach to dermatological care that is as personalized as the patients themselves.
152. A scoping review of large language model based approaches for information extraction from radiology reports.
PMID: 39182008 | DOI: 10.1038/s41746-024-01219-0 | Date: 2024-08-24
Abstract: Radiological imaging is a globally prevalent diagnostic method, yet the free text contained in radiology reports is not frequently used for secondary purposes. Natural Language Processing can provide structured data retrieved from these reports. This paper provides a summary of the current state of research on Large Language Model (LLM) based approaches for information extraction (IE) from radiology reports. We conduct a scoping review that follows the PRISMA-ScR guideline. Queries of five databases were conducted on August 1st, 2023. Among the 34 studies that met inclusion criteria, only pre-transformer and encoder-based models are described. External validation shows a general performance decrease, although LLMs might improve generalizability of IE approaches. Reports related to CT and MRI examinations, as well as thoracic reports, prevail. The most common challenges reported are missing validation on external data and augmentation of the described methods. Different reporting granularities affect the comparability and transparency of approaches.
153. Multimodal fusion learning for long QT syndrome pathogenic genotypes in a racially diverse population.
PMID: 39181999 | DOI: 10.1038/s41746-024-01218-1 | Date: 2024-08-24
Abstract: Congenital long QT syndrome (LQTS) diagnosis is complicated by limited genetic testing at scale, low prevalence, and normal QT corrected interval in patients with high-risk genotypes. We developed a deep learning approach combining electrocardiogram (ECG) waveform and electronic health record data to assess whether patients had pathogenic variants causing LQTS. We defined patients with high-risk genotypes as having ≥1 pathogenic variant in one of the LQTS-susceptibility genes. We trained the model using data from United Kingdom Biobank (UKBB) and then fine-tuned in a racially/ethnically diverse cohort using Mount Sinai BioMe Biobank. Following group-stratified 5-fold splitting, the fine-tuned model achieved area under the precision-recall curve of 0.29 (95% confidence interval [CI] 0.28-0.29) and area under the receiver operating characteristic curve of 0.83 (0.82-0.83) on independent testing data from BioMe. Multimodal fusion learning shows promise for identifying individuals with pathogenic genetic mutations to enable patient prioritization for further work up.
154. Deep learning-based prediction of Clostridioides difficile infection caused by antibiotics using longitudinal electronic health records.
PMID: 39181992 | DOI: 10.1038/s41746-024-01215-4 | Date: 2024-08-24
Abstract: Clostridioides difficile infection (CDI) is a major cause of antibiotic-associated diarrhea and colitis. It is recognized as one of the most significant hospital-acquired infections. Although CDI can develop severe complications and spores of Clostridioides difficile can be transmitted by the fecal-oral route, CDI is occasionally overlooked in clinical settings. Thus, it is necessary to monitor high CDI risk groups, particularly those undergoing antibiotic treatment, to prevent complications and spread. We developed and validated a deep learning-based model to predict the occurrence of CDI within 28 days after starting antibiotic treatment using longitudinal electronic health records. For each patient, timelines of vital signs and laboratory tests with a 35-day monitoring period and a patient information vector consisting of age, sex, comorbidities, and medications were constructed. Our model achieved an area under the receiver operating characteristic curve of 0.952 (95% CI: 0.932-0.973) in internal validation and 0.972 (95% CI: 0.968-0.975) in external validation. Platelet count and body temperature emerged as the most important features. The risk score, the output value of the model, exhibited a consistent increase in the CDI group, while the risk score in the non-CDI group either maintained its initial value or decreased. Using our CDI prediction model, high-risk patients requiring symptom monitoring can be identified. This could help reduce the underdiagnosis of CDI, thereby decreasing transmission and preventing complications.
155. Cost-effectiveness of incorporating self-imaging optical coherence tomography into fundus photography-based diabetic retinopathy screening.
PMID: 39181938 | DOI: 10.1038/s41746-024-01222-5 | Date: 2024-08-24
Abstract: Diabetic macular edema (DME) has emerged as the foremost cause of vision loss in the population with diabetes. Early detection of DME is paramount, yet the prevailing screening, relying on two-dimensional and labor-intensive fundus photography (FP), results in frequent unwarranted referrals and overlooked diagnoses. Self-imaging optical coherence tomography (SI-OCT), offering fully automated, three-dimensional macular imaging, holds the potential to enhance diabetic retinopathy (DR) screening. We conducted an observational study within a cohort of 1822 participants with diabetes, who received comprehensive assessments, including visual acuity testing, FP, and SI-OCT examinations. We compared the performance of three screening strategies: the conventional FP-based strategy, a combination strategy of FP and SI-OCT, and a simulated combination strategy of FP and manual SD-OCT. Additionally, we undertook a cost-effectiveness analysis utilizing Markov models to evaluate the costs and benefits of the three strategies for referable DR. We found that the FP + SI-OCT strategy demonstrated superior sensitivity (87.69% vs 61.53%) and specificity (98.29% vs 92.47%) in detecting DME when compared to the FP-based strategy. Importantly, the FP + SI-OCT strategy outperformed the FP-based strategy, with an incremental cost-effectiveness ratio (ICER) of $8016 per quality-adjusted life year (QALY), while the FP + SD-OCT strategy was less cost-effective, with an ICER of $45,754/QALY. Our results were robust to extensive sensitivity analyses, with the FP + SI-OCT strategy standing as the dominant choice in 69.36% of simulations conducted at the current willingness-to-pay threshold. In summary, incorporating SI-OCT into FP-based screening offers substantial enhancements in sensitivity, specificity for detecting DME, and most notably, cost-effectiveness for DR screening.
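Entry 155 compares screening strategies by incremental cost-effectiveness ratio (ICER). As a minimal sketch of the arithmetic behind such a figure, the ICER is the extra cost of a new strategy divided by the extra QALYs it yields relative to a reference strategy. All costs, QALYs, and the willingness-to-pay threshold below are hypothetical placeholders, not values from the study, whose estimates come from Markov models.

```python
def icer(cost_new: float, cost_ref: float, qaly_new: float, qaly_ref: float) -> float:
    """Incremental cost per QALY gained by the new strategy over the reference."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ValueError("strategies yield equal QALYs; the ICER is undefined")
    return (cost_new - cost_ref) / delta_qaly


# Hypothetical per-person lifetime totals (illustrative only).
ref_cost, ref_qaly = 500.0, 8.00   # reference screening strategy
new_cost, new_qaly = 650.0, 8.02   # strategy with an added imaging step
willingness_to_pay = 30_000.0      # hypothetical threshold per QALY

ratio = icer(new_cost, ref_cost, new_qaly, ref_qaly)
print(f"ICER = {ratio:,.0f} per QALY gained")
print("cost-effective at threshold:", ratio <= willingness_to_pay)
```

A strategy is usually called cost-effective when its ICER falls below the chosen willingness-to-pay threshold. That is the comparison the abstract's sensitivity analysis repeats across simulations.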
156. A portable and efficient dementia screening tool using eye tracking machine learning and virtual reality.
PMID: 39174736 | DOI: 10.1038/s41746-024-01206-5 | Date: 2024-08-22
Abstract: Dementia represents a significant global health challenge, with early screening during the preclinical stage being crucial for effective management. Traditional diagnostic biomarkers for Alzheimer’s Disease, the most common form of dementia, are limited by cost and invasiveness. Mild cognitive impairment (MCI), a precursor to dementia, is currently identified through neuropsychological tests like the Montreal Cognitive Assessment (MoCA), which are not suitable for large-scale screening. Eye-tracking technology, capturing and quantifying eye movements related to cognitive behavior, has emerged as a promising tool for cognitive assessment. Subtle changes in eye movements could serve as early indicators of MCI. However, the interpretation of eye-tracking data is challenging. This study introduced a dementia screening tool, VR Eye-tracking Cognitive Assessment (VECA), using eye-tracking technology, machine learning, and virtual reality (VR) to offer a non-invasive, efficient alternative capable of large-scale deployment. VECA was conducted with 201 participants from Shenzhen Baoan Chronic Hospital, utilizing eye-tracking data captured via VR headsets to predict MoCA scores and classify cognitive impairment across different educational backgrounds. The support vector regression model employed demonstrated a high correlation (0.9) with MoCA scores, significantly outperforming baseline models. Furthermore, it established optimal cut-off scores for identifying cognitive impairment with notable sensitivity (88.5%) and specificity (83%). This study underscores VECA’s potential as a portable, efficient tool for early dementia screening, highlighting the benefits of integrating eye-tracking technology, machine learning, and VR in cognitive health assessments.
157. Digital biomarkers for precision diagnosis and monitoring in Parkinson’s disease.
PMID: 39169258 | DOI: 10.1038/s41746-024-01217-2 | Date: 2024-08-21
Abstract: Parkinson’s disease (PD) is a multifactorial neurodegenerative disorder with high prevalence among the elderly, primarily manifested by progressive decline in motor function. The aging global demographic and increased life expectancy have led to a rapid surge in PD cases, imposing a significant societal burden. PD along with other neurodegenerative diseases has garnered increasing attention from the scientific community. In PD, motor symptoms are recognized when approximately 60% of dopaminergic neurons have been damaged. The irreversible feature of PD and benefits of early intervention underscore the importance of disease onset prediction and prompt diagnosis. The advent of digital health technology in recent years has elevated the role of digital biomarkers in precisely and sensitively detecting early PD clinical symptoms, evaluating treatment effectiveness, and guiding clinical medication, focusing especially on motor function, responsiveness and sleep quality assessments. This review examines prevalent digital biomarkers for PD and highlights the latest advancements.
158. FDA launches health care at home initiative to drive equity in digital medical care.
PMID: 39169161 | DOI: 10.1038/s41746-024-01198-2 | Date: 2024-08-21
Abstract: A highly ambitious FDA initiative will explore, through a hub and ideas lab, how equitable healthcare at home can be delivered, recognizing that this is unlikely to come about without intervention. Market forces, as shaped by current regulations, are leading to digital health tools developed and operating in islands rather than enabling integrated digital care. Can the initiative, which adopts system-level regulatory thinking, solve this issue?
159. Closing the gap: addressing telehealth disparities across specialties in the sustained pandemic era.
PMID: 39164391 | DOI: 10.1038/s41746-024-01201-w | Date: 2024-08-21
Abstract: Missed appointments, or no-shows, disrupt healthcare delivery, exacerbating chronic disease management and leading to worse health outcomes. Telehealth has surged as a viable solution to reduce no-shows and improve healthcare accessibility, especially during the COVID-19 pandemic. However, telehealth disparities and its long-term efficacy across various medical specialties remain understudied. To address this, we performed a retrospective analysis of electronic health records from a heterogeneous network of hospitals in Illinois, examining telehealth use and no-shows among 444,752 adult patients with 1,973,098 outpatient encounters across nine specialties during the sustained pandemic phase (i.e., January 1, 2021 to July 1, 2022). Among them, 84,290 (4.27%) were no-shows, and telehealth constituted 202,933 (10.3%) of the total encounters. Telehealth use during the sustained phase varied significantly by specialty type. Overall, telehealth encounters were associated with reduced no-show odds compared to in-person encounters (OR, 0.28; 95% CI, 0.26-0.29). Black and Hispanic patients, as well as those with Medicaid, had higher no-show odds relative to their counterparts, even when using telehealth. Mental health specialty had the highest telehealth usage rate and the highest no-show odds (OR, 2.99; 95% CI, 2.84-3.14) relative to other specialties included in the study. Moreover, specialty type had differential effects on no-shows for telehealth. These results underscore the variability in telehealth use by specialty type and pervasive disparities in telehealth use and no-shows. As we move beyond the pandemic, our findings can inform policymakers to tailor policies and incentives to reach different patient groups as well as specialties, with varying needs, to promote equitable telehealth utilization.
中文摘要: 错过预约或缺席会扰乱医疗保健服务,加剧慢性病管理并导致更糟糕的健康结果。远程医疗已成为减少缺席和改善医疗保健可及性的可行解决方案,尤其是在 COVID-19 大流行期间。然而,远程医疗的差异及其跨不同医学专业的长期疗效仍未得到充分研究。为了解决这个问题,我们对伊利诺伊州异构医院网络的电子健康记录进行了回顾性分析,检查了在持续大流行阶段(即 2021 年 1 月 1 日至 2022 年 7 月 1 日)9 个专科的 444,752 名成年患者的远程医疗使用情况和缺席情况,涉及 9 个专科的 1,973,098 次门诊就诊。其中,84,290 人次(4.27%)未出现,远程医疗占总就诊次数的 202,933 人次(10.3%)。持续阶段远程医疗的使用因专业类型而异。总体而言,与面对面就诊相比,远程医疗就诊与缺席几率降低相关(OR,0.28;95% CI,0.26-0.29)。即使使用远程医疗,黑人和西班牙裔患者以及享受医疗补助的患者相对于其他患者的缺席几率也更高。相对于研究中包含的其他专业,心理健康专业的远程医疗使用率最高,缺席几率最高(OR,2.99;95% CI,2.84-3.14)。此外,专业类型对远程医疗缺席的影响也不同。这些结果强调了不同专业类型远程医疗使用的差异性以及远程医疗使用和缺席的普遍差异。当我们摆脱大流行的影响时,我们的研究结果可以帮助政策制定者制定政策和激励措施,以覆盖具有不同需求的不同患者群体和专业,以促进公平的远程医疗利用。
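As a reading aid for entry 159, the following minimal sketch recomputes the two percentages from the encounter counts given in the abstract and illustrates how an odds ratio of 0.28 translates into a probability. The helper names and the 5% baseline no-show probability are assumptions made purely for illustration; they do not come from the study, and the reported OR is presumably model-adjusted, so this is not a reproduction of the authors' analysis.

```python
# Counts reported in the abstract of entry 159.
TOTAL_ENCOUNTERS = 1_973_098
NO_SHOWS = 84_290
TELEHEALTH = 202_933


def percent(part: int, whole: int, ndigits: int) -> float:
    """Return part/whole as a percentage rounded to ndigits decimals."""
    return round(100 * part / whole, ndigits)


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Convert a baseline probability into the probability implied by an odds ratio."""
    baseline_odds = baseline_prob / (1 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


no_show_pct = percent(NO_SHOWS, TOTAL_ENCOUNTERS, 2)
telehealth_pct = percent(TELEHEALTH, TOTAL_ENCOUNTERS, 1)

# ASSUMPTION: a 5% in-person no-show probability, chosen only for illustration.
illustrative_telehealth_prob = apply_odds_ratio(0.05, 0.28)

print(no_show_pct, telehealth_pct, round(illustrative_telehealth_prob, 4))
```

The conversion goes through odds rather than multiplying the probability directly, because an odds ratio scales odds, not probabilities; the two only coincide approximately when the baseline probability is small.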
160. EU-US data transfers: an enduring challenge for health research collaborations.
欧盟-美国数据传输:健康研究合作的持久挑战。
PMID: 39152232 | DOI: 10.1038/s41746-024-01205-6 | 日期: 2024-08-16
摘要: EU-US data transfers for health research remain a particularly thorny issue in view of the stringent rules of the EU General Data Protection Regulation (GDPR) and the challenges related to US mass surveillance programs, particularly the manner in which US law enforcement and national security agencies can access personal data originating from the EU. Since the entry into force of the GDPR, evidence of impeded collaborations is increasing, particularly in the case of sharing data with US public institutions. The adoption of a new EU-US adequacy decision in July 2023 does not hold the promise for a long-lasting solution due to the risks of being challenged and invalidated - yet again - at the Court of Justice of the EU. As the research community is calling for answers, the new proposal for a European Health Data Space regulation may hold a key to solving some of the existing issues. In this paper, we critically discuss the current rules and outline a possible way forward for transfers between public bodies.
中文摘要: 鉴于欧盟通用数据保护条例 (GDPR) 的严格规则以及与美国大规模监控计划相关的挑战,特别是美国执法和国家安全机构访问源自欧盟的个人数据的方式,用于健康研究的欧盟-美国数据传输仍然是一个特别棘手的问题。自 GDPR 生效以来,合作受阻的证据越来越多,特别是在与美国公共机构共享数据的情况下。由于存在再次在欧盟法院受到质疑和无效的风险,2023 年 7 月通过的新的欧盟-美国充分性决定并不能带来长期解决方案的承诺。随着研究界寻求答案,欧洲健康数据空间监管的新提案可能是解决一些现有问题的关键。在本文中,我们批判性地讨论了现行规则,并概述了公共机构之间转移的可能的前进方向。
161. Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling.
使用基于 Transformer 的序列建模,利用纵向医学成像的力量进行眼病预后。
PMID: 39152209 | DOI: 10.1038/s41746-024-01207-4 | 日期: 2024-08-16
摘要: Deep learning has enabled breakthroughs in automated diagnosis from medical imaging, with many successful applications in ophthalmology. However, standard medical image classification approaches only assess disease presence at the time of acquisition, neglecting the common clinical setting of longitudinal imaging. For slow, progressive eye diseases like age-related macular degeneration (AMD) and primary open-angle glaucoma (POAG), patients undergo repeated imaging over time to track disease progression, and forecasting the future risk of developing a disease is critical to properly plan treatment. Our proposed Longitudinal Transformer for Survival Analysis (LTSA) enables dynamic disease prognosis from longitudinal medical imaging, modeling the time to disease from sequences of fundus photography images captured over long, irregular time periods. Using longitudinal imaging data from the Age-Related Eye Disease Study (AREDS) and Ocular Hypertension Treatment Study (OHTS), LTSA significantly outperformed a single-image baseline in 19/20 head-to-head comparisons on late AMD prognosis and 18/20 comparisons on POAG prognosis. A temporal attention analysis also suggested that, while the most recent image is typically the most influential, prior imaging still provides additional prognostic value.
中文摘要: 深度学习在医学成像自动诊断方面取得了突破,并在眼科领域取得了许多成功的应用。然而,标准医学图像分类方法仅评估采集时的疾病存在,忽略了纵向成像的常见临床设置。对于年龄相关性黄斑变性 (AMD) 和原发性开角型青光眼 (POAG) 等缓慢进展的眼部疾病,患者会随着时间的推移进行重复成像以跟踪疾病进展,并预测未来发生疾病的风险对于正确计划治疗至关重要。我们提出的用于生存分析的纵向变压器(LTSA)能够通过纵向医学成像进行动态疾病预测,根据在长时间不规则时间段内捕获的眼底摄影图像序列对疾病发生时间进行建模。使用年龄相关眼病研究 (AREDS) 和高眼压治疗研究 (OHTS) 的纵向成像数据,在晚期 AMD 预后的 19/20 头对头比较和 POAG 预后的 18/20 比较中,LTSA 显着优于单图像基线。时间注意力分析还表明,虽然最近的图像通常最具影响力,但先前的成像仍然提供额外的预后价值。
162. Dose-response relationship between computerized cognitive training and cognitive improvement.
计算机认知训练与认知改善之间的剂量-反应关系。
PMID: 39147783 | DOI: 10.1038/s41746-024-01210-9 | 日期: 2024-08-15
摘要: Although computerized cognitive training (CCT) is an effective digital intervention for cognitive impairment, its dose-response relationship is understudied. This retrospective cohort study explores the association between training dose and cognitive improvement to find the optimal CCT dose. From 2017 to 2022, 8,709 participants with subjective cognitive decline, mild cognitive impairment, and mild dementia were analyzed. CCT exposure varied in daily dose and frequency, with cognitive improvement measured weekly using Cognitive Index. A mixed-effects model revealed significant Cognitive Index increases across most dose groups before reaching the optimal dose. For participants under 60 years, the optimal dose was 25 to <30 min per day for 6 days a week. For those 60 years or older, it was 50 to <55 min per day for 6 days a week. These findings highlight a dose-dependent effect in CCT, suggesting age-specific optimal dosing for cognitive improvement.
中文摘要: 尽管计算机认知训练(CCT)是针对认知障碍的有效数字干预措施,但其剂量-反应关系尚未得到充分研究。这项回顾性队列研究探讨了训练剂量与认知改善之间的关联,以找到最佳 CCT 剂量。从2017年到2022年,对8,709名主观认知能力下降、轻度认知障碍和轻度痴呆的参与者进行了分析。 CCT 暴露的每日剂量和频率各不相同,每周使用认知指数测量认知改善情况。混合效应模型显示,在达到最佳剂量之前,大多数剂量组的认知指数显着增加。对于 60 岁以下的参与者,最佳剂量是每天 25 至 <30 分钟,每周 6 天。对于 60 岁或以上的人,每周 6 天,每天 50 至 <55 分钟。这些发现强调了 CCT 的剂量依赖性效应,表明针对认知改善的特定年龄的最佳剂量。
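To make the dosing figures in entry 162 easier to compare, this small sketch converts the reported optimal daily ranges into weekly training minutes. The dictionary keys and function name are hypothetical labels introduced here; only the daily ranges and the 6-days-per-week frequency come from the abstract.

```python
DAYS_PER_WEEK = 6  # training frequency reported for both age groups

# Daily dose ranges in minutes (lower bound inclusive, upper bound exclusive).
OPTIMAL_DAILY_DOSE = {
    "under_60": (25, 30),
    "60_or_older": (50, 55),
}


def weekly_range(daily_range, days=DAYS_PER_WEEK):
    """Scale a (low, high) daily range in minutes to a weekly range."""
    low, high = daily_range
    return (low * days, high * days)


weekly = {group: weekly_range(r) for group, r in OPTIMAL_DAILY_DOSE.items()}

for group, (low, high) in weekly.items():
    print(f"{group}: {low} to <{high} min per week")
```

Under these figures the older group's optimal weekly dose is twice that of the younger group.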
163. Robust automated calcification meshing for personalized cardiovascular biomechanics.
强大的自动钙化网格划分,可实现个性化心血管生物力学。
PMID: 39143242 | DOI: 10.1038/s41746-024-01202-9 | 日期: 2024-08-15
摘要: Calcification has significant influence over cardiovascular diseases and interventions. Detailed characterization of calcification is thus desired for predictive modeling, but calcium deposits on cardiovascular structures are still often manually reconstructed for physics-driven simulations. This poses a major bottleneck for large-scale adoption of computational simulations for research or clinical use. To address this, we propose an end-to-end automated image-to-mesh algorithm that enables robust incorporation of patient-specific calcification onto a given cardiovascular tissue mesh. The algorithm provides a substantial speed-up from several hours of manual meshing to ~1 min of automated computation, and it solves an important problem that cannot be addressed with recent template-based meshing techniques. We validated our final calcified tissue meshes with extensive simulations, demonstrating our ability to accurately model patient-specific aortic stenosis and Transcatheter Aortic Valve Replacement. Our method may serve as an important tool for accelerating the development and usage of personalized cardiovascular biomechanics.
中文摘要: 钙化对心血管疾病和干预措施具有重大影响。因此,预测模型需要钙化的详细表征,但心血管结构上的钙沉积物仍然经常手动重建以进行物理驱动的模拟。这对大规模采用计算模拟进行研究或临床使用构成了主要瓶颈。为了解决这个问题,我们提出了一种端到端自动图像到网格算法,该算法能够将患者特异性钙化稳健地结合到给定的心血管组织网格上。该算法将数小时的手动网格划分大幅加速到约 1 分钟的自动计算,并且解决了最新基于模板的网格划分技术无法解决的重要问题。我们通过广泛的模拟验证了最终的钙化组织网,证明了我们准确模拟患者特定主动脉瓣狭窄和经导管主动脉瓣置换术的能力。我们的方法可以作为加速个性化心血管生物力学的开发和使用的重要工具。
164. Lessons learned from an unsuccessful decentralized clinical trial in Oncology.
从肿瘤学不成功的分散临床试验中吸取的经验教训。
PMID: 39138304 | DOI: 10.1038/s41746-024-01214-5 | 日期: 2024-08-13
摘要: Decentralized clinical trials have gained popularity in recent years due to their advantages in broadening recruitment strategies and saving resources. As more clinical trials adopt decentralized strategies, it is essential to share knowledge about both successful and unsuccessful efforts in the research community. In the present commentary, we explore potential reasons that led to early termination of a decentralized clinical trial in Oncology.
中文摘要: 分散式临床试验在过去几年中因其在扩大招募策略和节省资源可能性方面的优势而受到欢迎。随着越来越多的临床试验采用分散策略,有必要分享研究界成功和不成功的努力的知识。在本评论中,我们探讨了导致肿瘤学分散临床试验提前终止的潜在原因。
165. Core elements of national policy for digital health technology evidence and access.
数字医疗技术证据和获取国家政策的核心要素。
PMID: 39138261 | DOI: 10.1038/s41746-024-01209-2 | 日期: 2024-08-13
摘要: Digital health technologies (DHT) offer the ability to deliver personalized care, lower barriers to access, and positively impact health outcomes. However, DHT utilization is impacted by insufficient market access pathways. A policy “full-stack”—including regulatory authorization, product value assessment, pricing and reimbursement, and patient access infrastructure—offers a framework for DHT integration into national healthcare ecosystems. Consistent clinical evidence requirements across national jurisdictions will further increase DHT scalability.
中文摘要: 数字健康技术 (DHT) 能够提供个性化护理、降低获取障碍并对健康结果产生积极影响。然而,DHT 的利用受到市场准入途径不足的影响。政策“全栈”——包括监管授权、产品价值评估、定价和报销以及患者访问基础设施——为 DHT 融入国家医疗保健生态系统提供了框架。各国司法管辖区一致的临床证据要求将进一步提高 DHT 的可扩展性。
166. Biometrics of complete human pregnancy recorded by wearable devices.
可穿戴设备记录的完整人类妊娠的生物识别。
PMID: 39134787 | DOI: 10.1038/s41746-024-01183-9 | 日期: 2024-08-12
摘要: In the United States, normal-risk pregnancies are monitored with the recommended average of 14 prenatal visits. Check-ins every few weeks are the standard of care. This low time resolution and reliance on subjective feedback instead of direct physiological measurement could be augmented by remote monitoring. To date, continuous physiological measurements have not been characterized across all of pregnancy, so there is little basis of comparison to support the development of specific monitoring capabilities. Wearables have been shown to enable the detection and prediction of acute illness, often faster than subjective symptom reporting. Wearables have also been used for years to monitor chronic conditions, such as continuous glucose monitors. Here we perform a retrospective analysis on multimodal wearable device data (Oura Ring) generated across pregnancy within 120 individuals. These data reveal clear trajectories of pregnancy from cycling to conception through postpartum recovery. We assessed individuals in whom pregnancy did not progress past the first trimester, and found associated deviations, corroborating that continuous monitoring adds new information that could support decision-making even in the early stages of pregnancy. By contrast, we did not find significant deviations between full-term pregnancies of people younger than 35 and of people with “advanced maternal age”, suggesting that analysis of continuous data within individuals can augment risk assessment beyond standard population comparisons. Our findings demonstrate that low-cost, high-resolution monitoring at all stages of pregnancy in real-world settings is feasible and that many studies into specific demographics, risks, etc., could be carried out using this newer technology.
中文摘要: 在美国,建议平均进行 14 次产前检查来监测正常风险妊娠。每隔几周检查一次是护理的标准。这种低时间分辨率和对主观反馈而不是直接生理测量的依赖可以通过远程监控来增强。迄今为止,连续的生理测量尚未在整个怀孕期间进行表征,因此几乎没有比较基础来支持特定监测能力的发展。可穿戴设备已被证明能够检测和预测急性疾病,通常比主观症状报告更快。可穿戴设备多年来也被用于监测慢性病,例如连续血糖监测仪。在这里,我们对 120 名个体在怀孕期间生成的多模式可穿戴设备数据 (Oura Ring) 进行回顾性分析。这些数据揭示了从周期到受孕再到产后恢复的清晰妊娠轨迹。我们评估了妊娠早期未进展的个体,并发现了相关偏差,证实持续监测增加了新信息,即使在怀孕早期也可以支持决策。相比之下,我们没有发现 35 岁以下人群和“高龄产妇”的足月妊娠之间存在显着偏差,这表明对个体内连续数据的分析可以增强风险评估,超越标准人群比较。我们的研究结果表明,在现实环境中对怀孕各个阶段进行低成本、高分辨率监测是可行的,并且可以使用这种新技术进行针对特定人口统计、风险等的许多研究。
167. Navigating the European Union Artificial Intelligence Act for Healthcare.
浏览欧盟医疗保健人工智能法案。
PMID: 39134637 | DOI: 10.1038/s41746-024-01213-6 | 日期: 2024-08-12
摘要: The European Union’s recently adopted Artificial Intelligence (AI) Act is the first comprehensive legal framework specifically on AI. This is particularly important for the healthcare domain, as other existing harmonisation legislation, such as the Medical Device Regulation, does not explicitly cover medical AI applications. Given the far-reaching impact of this regulation on the medical AI sector, this commentary provides an overview of the key elements of the AI Act, with easy-to-follow references to the relevant chapters.
中文摘要: 欧盟最近通过的人工智能(AI)法案是第一个专门针对人工智能的综合法律框架。这对于医疗保健领域尤其重要,因为其他现有的协调立法(例如医疗器械法规)并未明确涵盖医疗人工智能应用。鉴于该法规对医疗人工智能领域的深远影响,本评论概述了人工智能法案的关键要素,并提供了易于理解的相关章节参考。
168. Disparities in clinical studies of AI enabled applications from a global perspective.
从全球角度来看,人工智能应用的临床研究存在差异。
PMID: 39127820 | DOI: 10.1038/s41746-024-01212-7 | 日期: 2024-08-10
摘要: Artificial intelligence (AI) has been extensively researched in medicine, but its practical application remains limited. Meanwhile, there are various disparities in existing AI-enabled clinical studies, which pose a challenge to global health equity. In this study, we conducted an in-depth analysis of the geo-economic distribution of 159 AI-enabled clinical studies, as well as the gender disparities among these studies. We aim to reveal these disparities from a global literature perspective, thus highlighting the need for equitable access to medical AI technologies.
中文摘要: 人工智能(AI)在医学领域已得到广泛研究,但其实际应用仍然有限。与此同时,现有的人工智能临床研究存在各种差异,这对全球健康公平构成了挑战。在这项研究中,我们对 159 项人工智能临床研究的地缘经济分布以及这些研究之间的性别差异进行了深入分析。我们的目标是从全球文献的角度揭示这些差异,从而强调公平获取医疗人工智能技术的必要性。
169. Responsible development of clinical speech AI: Bridging the gap between clinical research and technology.
临床语音人工智能的负责任发展:弥合临床研究与技术之间的差距。
PMID: 39122889 | DOI: 10.1038/s41746-024-01199-1 | 日期: 2024-08-09
摘要: This perspective article explores the challenges and potential of using speech as a biomarker in clinical settings, particularly when constrained by the small clinical datasets typically available in such contexts. We contend that by integrating insights from speech science and clinical research, we can reduce sample complexity in clinical speech AI models with the potential to decrease timelines to translation. Most existing models are based on high-dimensional feature representations trained with limited sample sizes and often do not leverage insights from speech science and clinical research. This approach can lead to overfitting, where the models perform exceptionally well on training data but fail to generalize to new, unseen data. Additionally, without incorporating theoretical knowledge, these models may lack interpretability and robustness, making them challenging to troubleshoot or improve post-deployment. We propose a framework for organizing health conditions based on their impact on speech and promote the use of speech analytics in diverse clinical contexts beyond cross-sectional classification. For high-stakes clinical use cases, we advocate for a focus on explainable and individually-validated measures and stress the importance of rigorous validation frameworks and ethical considerations for responsible deployment. Bridging the gap between AI research and clinical speech research presents new opportunities for more efficient translation of speech-based AI tools and advancement of scientific discoveries in this interdisciplinary space, particularly if limited to small or retrospective datasets.
中文摘要: 这篇透视文章探讨了在临床环境中使用语音作为生物标志物的挑战和潜力,特别是当受到此类背景下通常可用的小型临床数据集的限制时。我们认为,通过整合语音科学和临床研究的见解,我们可以降低临床语音 AI 模型中的样本复杂性,并有可能缩短翻译时间。大多数现有模型都基于用有限样本量训练的高维特征表示,并且通常不利用语音科学和临床研究的见解。这种方法可能会导致过度拟合,即模型在训练数据上表现得非常好,但无法推广到新的、看不见的数据。此外,如果不结合理论知识,这些模型可能缺乏可解释性和稳健性,从而难以排除故障或改进部署后的性能。我们提出了一个根据健康状况对言语的影响来组织健康状况的框架,并促进言语分析在跨部门分类之外的不同临床环境中的使用。对于高风险的临床用例,我们主张重点关注可解释和单独验证的措施,并强调严格的验证框架和负责任部署的道德考虑的重要性。弥合人工智能研究和临床语音研究之间的差距为更有效地翻译基于语音的人工智能工具和推进这一跨学科领域的科学发现提供了新的机会,特别是在仅限于小型或回顾性数据集的情况下。
170. Evaluating multimodal AI in medical diagnostics.
评估医疗诊断中的多模式人工智能。
PMID: 39112822 | DOI: 10.1038/s41746-024-01208-3 | 日期: 2024-08-07
摘要: This study evaluates multimodal AI models’ accuracy and responsiveness in answering NEJM Image Challenge questions, juxtaposed with human collective intelligence, underscoring AI’s potential and current limitations in clinical diagnostics. Anthropic’s Claude 3 family demonstrated the highest accuracy among the evaluated AI models, surpassing the average human accuracy, while collective human decision-making outperformed all AI models. GPT-4 Vision Preview exhibited selectivity, responding more to easier questions with smaller images and longer questions.
中文摘要: 这项研究评估了多模态人工智能模型在回答 NEJM 图像挑战问题时的准确性和响应能力,并与人类集体智慧并列,强调了人工智能在临床诊断中的潜力和当前局限性。 Anthropic 的 Claude 3 系列在评估的 AI 模型中表现出了最高的准确度,超过了人类的平均准确度,而人类集体决策的表现也优于所有 AI 模型。 GPT-4 Vision Preview 表现出选择性,可以用较小的图像和较长的问题更多地回答更简单的问题。
171. A deep learning system for myopia onset prediction and intervention effectiveness evaluation in children.
用于儿童近视发病预测和干预效果评估的深度学习系统。
PMID: 39112566 | DOI: 10.1038/s41746-024-01204-7 | 日期: 2024-08-07
摘要: The increasing prevalence of myopia worldwide presents a significant public health challenge. A key strategy to combat myopia is early detection and prediction in children, as such examination allows for effective intervention using readily accessible imaging techniques. To this end, we introduced DeepMyopia, an artificial intelligence (AI)-enabled decision support system to detect and predict myopia onset and facilitate targeted interventions for children at risk using routine retinal fundus images. Based on a deep learning architecture, DeepMyopia was trained and internally validated on a large cohort of retinal fundus images (n = 1,638,315) and then externally tested on datasets from seven sites in China (n = 22,060). Our results demonstrated the robustness of DeepMyopia, with AUCs of 0.908, 0.813, and 0.810 for 1-, 2-, and 3-year myopia onset prediction with the internal test set, and AUCs of 0.796, 0.808, and 0.767 with the external test set. DeepMyopia also effectively stratified children into low- and high-risk groups (p < 0.001) in both test sets. In an emulated randomized controlled trial (eRCT) on the Shanghai outdoor cohort (n = 3303), DeepMyopia showed effectiveness in myopia prevention compared to a NonCyc-based model, with an adjusted relative reduction (ARR) of -17.8% (95% CI: -29.4%, -6.4%). DeepMyopia-assisted interventions attained quality-adjusted life years (QALYs) of 0.75 (95% CI: 0.53, 1.04) per person and avoided blindness years of 13.54 (95% CI: 9.57, 18.83) per 1 million persons compared to natural lifestyle with no active intervention. Our findings demonstrated DeepMyopia as a reliable and efficient AI-based decision support system for intervention guidance for children.
中文摘要: 全球近视患病率的不断上升对公共卫生提出了重大挑战。对抗近视的一个关键策略是对儿童进行早期发现和预测,因为此类检查可以使用易于获取的成像技术进行有效干预。为此,我们推出了 DeepMyopia,这是一种支持人工智能 (AI) 的决策支持系统,可使用常规视网膜眼底图像检测和预测近视发病情况,并促进对高危儿童进行有针对性的干预。 DeepMyopia 基于深度学习架构,在大量视网膜眼底图像 (n = 1,638,315) 上进行了训练和内部验证,然后在中国七个站点的数据集 (n = 22,060) 上进行了外部测试。我们的结果证明了 DeepMyopia 的稳健性,使用内部测试集进行 1 年、2 年和 3 年近视发病预测的 AUC 分别为 0.908、0.813 和 0.810,使用外部测试集进行预测的 AUC 分别为 0.796、0.808 和 0.767。 DeepMyopia 还在两个测试集中有效地将儿童分为低风险组和高风险组 (p<0.001)。在一项针对上海户外队列 (n = 3303) 的模拟随机对照试验 (eRCT) 中,与基于 NonCyc 的模型相比,DeepMyopia 在预防近视方面显示出有效性,调整后相对降低 (ARR) 为 -17.8%,95% CI:-29.4%、-6.4%。与没有积极干预的自然生活方式相比,深度近视辅助干预措施每 100 万人的质量调整生命年 (QALY) 为 0.75 (95% CI: 0.53, 1.04),避免失明年为 13.54 (95% CI: 9.57, 18.83)。我们的研究结果表明 DeepMyopia 是一种可靠且高效的基于人工智能的决策支持系统,用于儿童干预指导。
172. The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review.
用于评估医学中值得信赖的人工智能的数据质量的 METRIC 框架:系统评价。
PMID: 39097662 | DOI: 10.1038/s41746-024-01196-4 | 日期: 2024-08-03
摘要: The adoption of machine learning (ML) and, more specifically, deep learning (DL) applications into all major areas of our lives is underway. The development of trustworthy AI is especially important in medicine due to the large implications for patients’ lives. While trustworthiness concerns various aspects including ethical, transparency and safety requirements, we focus on the importance of data quality (training/test) in DL. Since data quality dictates the behaviour of ML products, evaluating data quality will play a key part in the regulatory approval of medical ML products. We perform a systematic review following PRISMA guidelines using the databases Web of Science, PubMed and ACM Digital Library. We identify 5408 studies, out of which 120 records fulfil our eligibility criteria. From this literature, we synthesise the existing knowledge on data quality frameworks and combine it with the perspective of ML applications in medicine. As a result, we propose the METRIC-framework, a specialised data quality framework for medical training data comprising 15 awareness dimensions, along which developers of medical ML applications should investigate the content of a dataset. This knowledge helps to reduce biases as a major source of unfairness, increase robustness, and facilitate interpretability, and thus lays the foundation for trustworthy AI in medicine. The METRIC-framework may serve as a base for systematically assessing training datasets, establishing reference datasets, and designing test datasets, which has the potential to accelerate the approval of medical ML products.
中文摘要: 机器学习 (ML),更具体地说,深度学习 (DL) 应用程序正在进入我们生活的所有主要领域。由于对患者生活的重大影响,值得信赖的人工智能的发展在医学领域尤其重要。虽然可信度涉及道德、透明度和安全要求等各个方面,但我们重点关注深度学习中数据质量(训练/测试)的重要性。由于数据质量决定了机器学习产品的行为,因此评估数据质量将在医疗机器学习产品的监管审批中发挥关键作用。我们遵循 PRISMA 指南,使用数据库 Web of Science、PubMed 和 ACM Digital Library 进行系统评价。我们确定了 5408 项研究,其中 120 项记录符合我们的资格标准。从这些文献中,我们综合了有关数据质量框架的现有知识,并将其与机器学习在医学中应用的角度相结合。因此,我们提出了 METRIC 框架,这是一个专门用于医疗培训数据的数据质量框架,包含 15 个认知维度,医疗 ML 应用程序的开发人员应沿着该框架调查数据集的内容。这些知识有助于减少作为不公平主要来源的偏见,提高鲁棒性,促进可解释性,从而为医学领域值得信赖的人工智能奠定基础。 METRIC框架可以作为系统评估训练数据集、建立参考数据集和设计测试数据集的基础,这有可能加速医疗机器学习产品的批准。
173. Eye tracking insights into physician behaviour with safe and unsafe explainable AI recommendations.
眼动追踪通过安全和不安全的可解释人工智能建议来洞察医生行为。
PMID: 39095449 | DOI: 10.1038/s41746-024-01200-x | 日期: 2024-08-02
摘要: We studied clinical AI-supported decision-making as an example of a high-stakes setting in which explainable AI (XAI) has been proposed as useful (by theoretically providing physicians with context for the AI suggestion and thereby helping them to reject unsafe AI recommendations). Here, we used objective neurobehavioural measures (eye-tracking) to see how physicians respond to XAI with N = 19 ICU physicians in a hospital’s clinical simulation suite. Prescription decisions were made both pre- and post-reveal of either a safe or unsafe AI recommendation and four different types of simultaneously presented XAI. We used overt visual attention as a marker for where physician mental attention was directed during the simulations. Unsafe AI recommendations attracted significantly greater attention than safe AI recommendations. However, there was no appreciably higher level of attention placed onto any of the four types of explanation during unsafe AI scenarios (i.e. XAI did not appear to ‘rescue’ decision-makers). Furthermore, self-reported usefulness of explanations by physicians did not correlate with the level of attention they devoted to the explanations, reinforcing the notion that using self-reports alone to evaluate XAI tools misses key aspects of the interaction behaviour between human and machine.
中文摘要: 我们研究了人工智能支持的临床决策,作为高风险环境的一个例子,其中可解释的人工智能(XAI)被认为是有用的(理论上为医生提供人工智能建议的背景,从而帮助他们拒绝不安全的人工智能建议)。在这里,我们使用客观的神经行为测量(眼动追踪)来观察医生在医院临床模拟套件中对 N = 19 名 ICU 医生的 XAI 的反应。处方决策是在安全或不安全的 AI 建议以及四种不同类型的同时呈现的 XAI 披露之前和之后做出的。我们使用明显的视觉注意力作为模拟过程中医生精神注意力定向的标记。不安全的人工智能建议比安全的人工智能建议吸引了更多的关注。然而,在不安全的人工智能场景中,这四种解释中的任何一种都没有得到明显更高的关注(即 XAI 似乎没有“拯救”决策者)。此外,医生自我报告的解释的有用性与他们对解释的关注程度并不相关,这强化了这样一种观念:仅使用自我报告来评估 XAI 工具会忽略人与机器之间交互行为的关键方面。
174. AI-enhanced reconstruction of the 12-lead electrocardiogram via 3-leads with accurate clinical assessment.
通过 3 导联人工智能增强重建 12 导联心电图,并进行准确的临床评估。
PMID: 39090394 | DOI: 10.1038/s41746-024-01193-7 | 日期: 2024-08-01
摘要: The 12-lead electrocardiogram (ECG) is an integral component of the diagnosis of a multitude of cardiovascular conditions. It is performed using a complex set of skin surface electrodes, limiting its use outside traditional clinical settings. We developed an artificial intelligence algorithm, trained on over 600,000 clinically acquired ECGs, to explore whether fewer leads as input are sufficient to reconstruct a 12-lead ECG. Two limb leads (I and II) and one precordial lead (V3) were required to generate a reconstructed 12-lead ECG highly correlated with the original ECG. An automatic algorithm for detection of ECG features consistent with acute myocardial infarction (MI) performed similarly for original and reconstructed ECGs (AUC = 0.95). When interpreted by cardiologists, reconstructed ECGs achieved an accuracy of 81.4 ± 5.0% in identifying ECG features of ST-segment elevation MI, comparable with the original 12-lead ECGs (accuracy 84.6 ± 4.6%). These results will impact development efforts to innovate ECG acquisition methods with simplified tools in non-specialized settings.
中文摘要: 12 导联心电图 (ECG) 是诊断多种心血管疾病的重要组成部分。它使用一组复杂的皮肤表面电极进行,限制了其在传统临床环境之外的使用。我们开发了一种人工智能算法,对超过 600,000 个临床采集的心电图进行了训练,以探索更少的导联输入是否足以重建 12 导联心电图。需要两个肢体导联(I 和 II)和一个心前导联(V3)来生成与原始心电图高度相关的重建 12 导联心电图。用于检测与急性心肌梗死 (MI) 一致的心电图特征的自动算法对于原始心电图和重建心电图表现相似 (AUC = 0.95)。当心脏病专家解读时,重建心电图在识别 ST 段抬高 MI 心电图特征方面的准确度为 81.4±5.0%,与原始 12 导联心电图(准确度 84.6±4.6%)相当。这些结果将影响在非专业环境中使用简化工具创新心电图采集方法的开发工作。
175. Joint AI-driven event prediction and longitudinal modeling in newly diagnosed and relapsed multiple myeloma.
新诊断和复发的多发性骨髓瘤的联合人工智能驱动事件预测和纵向建模。
PMID: 39075240 | DOI: 10.1038/s41746-024-01189-3 | 日期: 2024-07-29
摘要: Multiple myeloma management requires a balance between maximizing survival, minimizing adverse events to therapy, and monitoring disease progression. While previous work has proposed data-driven models for individual tasks, these approaches fail to provide a holistic view of a patient’s disease state, limiting their utility to assist physician decision-making. To address this limitation, we developed a transformer-based machine learning model that jointly (1) predicts progression-free survival (PFS), overall survival (OS), and adverse events (AE), (2) forecasts key disease biomarkers, and (3) assesses the effect of different treatment strategies, e.g., ixazomib, lenalidomide, dexamethasone (IRd) vs lenalidomide, dexamethasone (Rd). Using TOURMALINE trial data, we trained and internally validated our model on newly diagnosed myeloma patients (N = 703) and externally validated it on relapsed and refractory myeloma patients (N = 720). Our model achieved superior performance to a risk model based on the multiple myeloma international staging system (ISS) (p < 0.001, Bonferroni corrected) and comparable performance to survival models trained separately on each task, but unable to forecast biomarkers. Our approach outperformed state-of-the-art deep learning models, tailored towards forecasting, on predicting key disease biomarkers (p < 0.001, Bonferroni corrected). Finally, leveraging our model’s capacity to estimate individual-level treatment effects, we found that patients with IgA kappa myeloma appear to benefit the most from IRd. Our study suggests that a holistic assessment of a patient’s myeloma course is possible, potentially serving as the foundation for a personalized decision support system.
中文摘要: 多发性骨髓瘤的治疗需要在最大化生存、最小化治疗不良事件和监测疾病进展之间取得平衡。虽然之前的工作提出了针对个体任务的数据驱动模型,但这些方法无法提供患者疾病状态的整体视图,限制了它们协助医生决策的效用。为了解决这一局限性,我们开发了一种基于变压器的机器学习模型,该模型联合(1)预测无进展生存期(PFS)、总生存期(OS)和不良事件(AE),(2)预测关键疾病生物标志物,以及(3)评估不同治疗策略的效果,例如伊沙佐米、来那度胺、地塞米松(IRd)与来那度胺、地塞米松(Rd)。使用 TOURMALINE 试验数据,我们在新诊断的骨髓瘤患者 (N = 703) 上训练和内部验证我们的模型,并在复发和难治性骨髓瘤患者 (N = 720) 上进行外部验证。我们的模型取得了优于基于多发性骨髓瘤国际分期系统 (ISS) 的风险模型的性能(p<0.001,Bonferroni 校正),并且与针对每项任务单独训练的生存模型具有相当的性能,但无法预测生物标志物。在预测关键疾病生物标志物方面,我们的方法优于最先进的深度学习模型(p<0.001,Bonferroni 纠正)。最后,利用我们的模型估计个体水平治疗效果的能力,我们发现 IgA kappa 骨髓瘤患者似乎从 IRd 中受益最多。我们的研究表明,对患者的骨髓瘤病程进行整体评估是可能的,有可能作为个性化决策支持系统的基础。
176. Expert gaze as a usability indicator of medical AI decision support systems: a preliminary study.
专家关注作为医疗人工智能决策支持系统可用性指标:初步研究。
PMID: 39068241 | DOI: 10.1038/s41746-024-01192-8 | 日期: 2024-07-27
摘要: Given the current state of medical artificial intelligence (AI) and perceptions towards it, collaborative systems are becoming the preferred choice for clinical workflows. This work aims to address expert interaction with medical AI support systems to gain insight towards how these systems can be better designed with the user in mind. As eye tracking metrics have been shown to be robust indicators of usability, we employ them for evaluating the usability and user interaction with medical AI support systems. We use expert gaze to assess experts’ interaction with an AI software for caries detection in bitewing x-ray images. We compared standard viewing of bitewing images without AI support versus viewing where AI support could be freely toggled on and off. We found that experts turned the AI on for roughly 25% of the total inspection task, and generally turned it on halfway through the course of the inspection. Gaze behavior showed that when supported by AI, more attention was dedicated to user interface elements related to the AI support, with more frequent transitions from the image itself to these elements. When considering that expert visual strategy is already optimized for fast and effective image inspection, such interruptions in attention can lead to increased time needed for the overall assessment. Gaze analysis provided valuable insights into an AI’s usability for medical image inspection. Further analyses of these tools and how to delineate metrical measures of usability should be developed.
中文摘要: 鉴于医疗人工智能 (AI) 的现状及其认知,协作系统正在成为临床工作流程的首选。这项工作旨在解决专家与医疗人工智能支持系统的交互问题,以深入了解如何更好地考虑用户的需求来设计这些系统。由于眼动追踪指标已被证明是稳健的可用性指标,因此我们利用它们来评估可用性以及用户与医疗人工智能支持系统的交互。我们使用专家凝视来评估专家与人工智能软件的交互,以在咬翼 X 射线图像中进行龋齿检测。我们比较了没有人工智能支持的咬翼图像的标准观看与可以自由打开和关闭人工智能支持的观看。我们发现,专家在整个检查任务中开启人工智能的时间大约为 25%,并且通常在检查过程中途开启。凝视行为表明,当人工智能支持时,更多的注意力集中在与人工智能支持相关的用户界面元素上,从图像本身到这些元素的转换更加频繁。考虑到专家视觉策略已经针对快速有效的图像检查进行了优化,这种注意力中断可能会导致整体评估所需的时间增加。视线分析为人工智能在医学图像检查中的可用性提供了宝贵的见解。应该对这些工具以及如何描述可用性的度量进行进一步分析。
177. Estimation of ventilatory thresholds during exercise using respiratory wearable sensors.
使用呼吸可穿戴传感器估计运动期间的通气阈值。
PMID: 39060511 | DOI: 10.1038/s41746-024-01191-9 | 日期: 2024-07-26
摘要: Ventilatory thresholds (VTs) are key physiological parameters used to evaluate physical performance and determine aerobic and anaerobic transitions during exercise. Current assessment of these parameters requires ergospirometry, limiting evaluation to laboratory or clinical settings. In this work, we introduce a wearable respiratory system that continuously tracks breathing during exercise and estimates VTs during ramp tests. We validate the respiratory rate and VT predictions in 17 healthy adults using ergospirometry analysis. In addition, we use the wearable system to evaluate VTs in 107 recreational athletes during ramp tests outside the laboratory and show that the mean population values agree with physiological variables traditionally used for exercise prescription. We envision that respiratory wearables can be useful in determining aerobic and anaerobic parameters with promising applications in health telemonitoring and human performance.
中文摘要: 通气阈值 (VT) 是用于评估身体表现并确定运动期间有氧和无氧转换的关键生理参数。目前对这些参数的评估需要进行运动肺量测定,从而将评估限制在实验室或临床环境中。在这项工作中,我们引入了一种可穿戴呼吸系统,该系统可以在运动期间持续跟踪呼吸并在斜坡测试期间估计VT。我们使用运动肺量测定法分析验证了 17 名健康成年人的呼吸频率和 VT 预测。此外,我们使用可穿戴系统在实验室外的坡道测试期间评估 107 名休闲运动员的 VT,结果表明平均群体值与传统上用于运动处方的生理变量一致。我们预计,呼吸可穿戴设备可用于确定有氧和无氧参数,并在健康远程监测和人体表现方面具有广阔的应用前景。
178. Clinical phenotypes and short-term outcomes based on prehospital point-of-care testing and on-scene vital signs.
基于院前护理点测试和现场生命体征的临床表型和短期结果。
PMID: 39048671 | DOI: 10.1038/s41746-024-01194-6 | 日期: 2024-07-24
摘要: Emergency medical services (EMSs) face critical situations that require patient risk classification based on analytical and vital signs. We aimed to establish clustering-derived phenotypes based on prehospital analytical and vital signs that allow risk stratification. This was a prospective, multicenter, EMS-delivered, ambulance-based cohort study considering six advanced life support units, 38 basic life support units, and four tertiary hospitals in Spain. Adults with unselected acute diseases managed by the EMS and evacuated with discharge priority to emergency departments were considered between January 1, 2020, and June 30, 2023. Prehospital point-of-care testing and on-scene vital signs were used for the unsupervised machine learning method (clustering) to determine the phenotypes. Then phenotypes were compared with the primary outcome (cumulative mortality (all-cause) at 2, 7, and 30 days). A total of 7909 patients were included. The median (IQR) age was 64 (51-80) years, 41% were women, and 26% were living in rural areas. Three clusters were identified: alpha 16.2% (1281 patients), beta 28.8% (2279), and gamma 55% (4349). The mortality rates for alpha, beta and gamma at 2 days were 18.6%, 4.1%, and 0.8%, respectively; at 7 days, were 24.7%, 6.2%, and 1.7%; and at 30 days, were 33%, 10.2%, and 3.2%, respectively. Based on standard vital signs and blood test biomarkers in the prehospital scenario, three clusters were identified: alpha (high-risk), beta and gamma (medium- and low-risk, respectively). This permits the EMS system to quickly identify patients who are potentially compromised and to proactively implement the necessary interventions.
中文摘要: 紧急医疗服务 (EMS) 面临危急情况,需要根据分析和生命体征对患者进行风险分类。我们的目的是根据院前分析和生命体征建立聚类衍生的表型,从而进行风险分层。这是一项前瞻性、多中心、EMS 交付、基于救护车的队列研究,考虑了西班牙的 6 个高级生命支持单位、38 个基本生命支持单位和 4 家三级医院。 2020 年 1 月 1 日至 2023 年 6 月 30 日期间,考虑由 EMS 管理的患有未选择的急性疾病并优先送往急诊科的成人。使用院前护理点检测和现场生命体征进行无监督机器学习方法(聚类)来确定表型。然后将表型与主要结局(2、7 和 30 天的累积死亡率(全因))进行比较。总共纳入了 7909 名患者。中位年龄 (IQR) 为 64 (51-80) 岁,41% 为女性,26% 生活在农村地区。确定了三个簇:α 16.2%(1281 名患者)、β 28.8%(2279 名患者)和 γ 55%(4349 名患者)。 α、β和γ的2天死亡率分别为18.6%、4.1%和0.8%; 7 天时,分别为 24.7%、6.2% 和 1.7%; 30 天时,分别为 33%、10.2% 和 3.2%。根据院前场景中的标准生命体征和血液测试生物标志物,确定了三个集群:α(高风险)、β和γ(分别为中风险和低风险)。这使得 EMS 系统能够快速识别可能受到损害的患者,并主动实施必要的干预措施。
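The cluster shares reported in entry 178 can be cross-checked against the patient counts given in the same abstract. The short Python sketch below recomputes each cluster's percentage from those counts. The variable and function names are illustrative only and do not come from the study.

```python
# Patient counts per cluster, as reported in the abstract of entry 178.
cluster_counts = {"alpha": 1281, "beta": 2279, "gamma": 4349}
total_patients = 7909  # total cohort size reported in the abstract


def cluster_shares(counts, total):
    """Return each cluster's share of the cohort as a percentage (1 decimal)."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = cluster_shares(cluster_counts, total_patients)

# The three clusters should account for every included patient.
print(sum(cluster_counts.values()) == total_patients)
print(shares)
```

The counts sum to the reported cohort size, and the recomputed shares match the rounded percentages quoted in the abstract (16.2%, 28.8%, and 55%).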
179. Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine.
医学领域多模态 GPT-4 视觉专家级准确性背后隐藏的缺陷。
PMID: 39043988 | DOI: 10.1038/s41746-024-01185-7 | 日期: 2024-07-23
摘要: Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multi-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V’s rationales of image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges, an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparably to human physicians regarding multi-choice accuracy (81.6% vs. 77.8%). GPT-4V also performs well in cases where physicians answer incorrectly, with over 78% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choices (35.5%), most prominently in image comprehension (27.2%). Regardless of GPT-4V’s high accuracy in multi-choice questions, our findings emphasize the necessity for further in-depth evaluations of its rationales before integrating such multimodal AI models into clinical workflows.
中文摘要: 最近的研究表明,具有视觉功能的生成式预训练 Transformer 4 (GPT-4V) 在医疗挑战任务中的表现优于人类医生。然而,这些评估主要只关注多项选择题的准确性。我们的研究通过对 GPT-4V 在解决新英格兰医学杂志 (NEJM) 图像挑战(旨在测试医疗专业人员的知识和诊断能力的成像测验)时的图像理解、医学知识回忆和逐步多模态推理的基本原理进行全面分析,扩展了当前的范围。评估结果证实,GPT-4V 在多项选择准确性方面的表现与人类医生相当(81.6% vs. 77.8%)。 GPT-4V 在医生回答错误的情况下也表现良好,准确率超过 78%。然而,我们发现 GPT-4V 在做出正确的最终选择 (35.5%) 的情况下经常呈现出有缺陷的基本原理,最突出的是图像理解 (27.2%)。尽管 GPT-4V 在多项选择题中具有很高的准确性,但我们的研究结果强调,在将此类多模态 AI 模型集成到临床工作流程之前,有必要对其基本原理进行进一步深入的评估。
180. Orchestrating explainable artificial intelligence for multimodal and longitudinal data in medical imaging.
为医学成像中的多模式和纵向数据编排可解释的人工智能。
PMID: 39039248 | DOI: 10.1038/s41746-024-01190-w | 日期: 2024-07-22
摘要: Explainable artificial intelligence (XAI) has experienced a vast increase in recognition over the last few years. While the technical developments are manifold, less focus has been placed on the clinical applicability and usability of systems. Moreover, not much attention has been given to XAI systems that can handle multimodal and longitudinal data, which we postulate are important features in many clinical workflows. In this study, we review, from a clinical perspective, the current state of XAI for multimodal and longitudinal datasets and highlight the challenges thereof. Additionally, we propose the XAI orchestrator, an instance that aims to help clinicians with the synopsis of multimodal and longitudinal data, the resulting AI predictions, and the corresponding explainability output. We propose several desirable properties of the XAI orchestrator, such as being adaptive, hierarchical, interactive, and uncertainty-aware.
中文摘要: 在过去几年中,可解释人工智能(XAI)的认知度大幅提高。尽管技术发展是多方面的,但对系统的临床适用性和可用性的关注较少。此外,对于可以处理多模态和纵向数据的 XAI 系统并没有给予太多关注,我们假设这是许多临床工作流程中的重要特征。在这项研究中,我们从临床角度回顾了多模态和纵向数据集的 XAI 现状,并强调了其面临的挑战。此外,我们还提出了 XAI 协调器,该实例旨在帮助临床医生汇总多模态和纵向数据、由此产生的 AI 预测以及相应的可解释性输出。我们提出了 XAI 协调器的几个理想属性,例如自适应、分层、交互和不确定性感知。
181. Autonomous artificial intelligence for diabetic eye disease increases access and health equity in underserved populations.
用于治疗糖尿病眼病的自主人工智能可以增加服务不足人群的获取机会和健康公平性。
PMID: 39039218 | DOI: 10.1038/s41746-024-01197-3 | 日期: 2024-07-22
摘要: Diabetic eye disease (DED) is a leading cause of blindness in the world. Annual DED testing is recommended for adults with diabetes, but adherence to this guideline has historically been low. In 2020, Johns Hopkins Medicine (JHM) began deploying autonomous AI for DED testing. In this study, we aimed to determine whether autonomous AI implementation was associated with increased adherence to annual DED testing, and how this differed across patient populations. JHM primary care sites were categorized as “non-AI” (no autonomous AI deployment) or “AI-switched” (autonomous AI deployment by 2021). We conducted a propensity score weighting analysis to compare change in adherence rates from 2019 to 2021 between non-AI and AI-switched sites. Our study included all adult patients with diabetes (>17,000) managed within JHM and has three major findings. First, AI-switched sites experienced a 7.6 percentage point greater increase in DED testing than non-AI sites from 2019 to 2021 (p < 0.001). Second, the adherence rate for Black/African Americans increased by 12.2 percentage points within AI-switched sites but decreased by 0.6 percentage points within non-AI sites (p < 0.001), suggesting that autonomous AI deployment improved access to retinal evaluation for historically disadvantaged populations. Third, autonomous AI is associated with improved health equity, e.g. the adherence rate gap between Asian Americans and Black/African Americans shrank from 15.6% in 2019 to 3.5% in 2021. In summary, our results from real-world deployment in a large integrated healthcare system suggest that autonomous AI is associated with improvement in overall DED testing adherence, patient access, and health equity.
中文摘要: 糖尿病眼病(DED)是世界上导致失明的主要原因。建议成人糖尿病患者每年进行一次 DED 检测,但历史上对该指南的遵守率很低。 2020 年,约翰霍普金斯医学院 (JHM) 开始部署自主人工智能进行 DED 测试。在这项研究中,我们的目的是确定自主人工智能的实施是否与增加每年 DED 测试的依从性相关,以及这在不同患者群体中有何不同。 JHM 初级保健站点被归类为“非 AI”(无自主 AI 部署)或“AI 切换”(到 2021 年实现自主 AI 部署)。我们进行了倾向得分加权分析,以比较 2019 年至 2021 年非人工智能和人工智能转换网站之间的遵守率变化。我们的研究涵盖了 JHM 管理的所有成年糖尿病患者(>17,000 名),并得出了三项主要发现。首先,从 2019 年到 2021 年,AI 切换站点的 DED 测试增幅比非 AI 站点高 7.6 个百分点 (p< 0.001)。其次,黑人/非裔美国人的遵守率在 AI 切换网站中增加了 12.2 个百分点,但在非 AI 网站中下降了 0.6 个百分点 (p< 0.001),这表明自主 AI 部署改善了历史上弱势群体获得视网膜评估的机会。第三,自主人工智能与改善健康公平相关,例如亚裔美国人和黑人/非裔美国人之间的依从率差距从 2019 年的 15.6% 缩小到 2021 年的 3.5%。总而言之,我们在大型综合医疗保健系统中的实际部署结果表明,自主人工智能与整体 DED 测试依从性、患者可及性和健康公平性的改善相关。
182. Estimating the household secondary attack rate and serial interval of COVID-19 using social media.
使用社交媒体估计 COVID-19 的家庭二次攻击率和序列间隔。
PMID: 39033238 | DOI: 10.1038/s41746-024-01160-2 | 日期: 2024-07-20
摘要: We propose a method to estimate the household secondary attack rate (hSAR) of COVID-19 in the United Kingdom based on activity on the social media platform X, formerly known as Twitter. Conventional methods of hSAR estimation are resource intensive, requiring regular contact tracing of COVID-19 cases. Our proposed framework provides a complementary method that does not rely on conventional contact tracing or laboratory involvement, including the collection, processing, and analysis of biological samples. We use a text classifier to identify reports of people tweeting about themselves and/or members of their household having COVID-19 infections. A probabilistic analysis is then performed to estimate the hSAR based on the number of self or household, and self and household tweets of COVID-19 infection. The analysis includes adjustments for a reluctance of Twitter users to tweet about household members, and the possibility that the secondary infection was not acquired within the household. Experimental results for the UK, both monthly and weekly, are reported for the period from January 2020 to February 2022. Our results agree with previously reported hSAR estimates, varying with the primary variants of concern, e.g. delta and omicron. The serial interval (SI) is based on the time between the two tweets that indicate a primary and secondary infection. Experimental results, though larger than the consensus, are qualitatively similar. The estimation of hSAR and SI using social media data constitutes a new tool that may help in characterizing, forecasting and managing outbreaks and pandemics in a faster, affordable, and more efficient manner.
中文摘要: 我们提出了一种根据社交媒体平台 X(以前称为 Twitter)上的活动来估计英国 COVID-19 家庭二次攻击率 (hSAR) 的方法。传统的 hSAR 估计方法需要大量资源,需要定期追踪 COVID-19 病例的接触者。我们提出的框架提供了一种补充方法,不依赖于传统的接触者追踪或实验室参与,包括生物样本的收集、处理和分析。我们使用文本分类器来识别人们在推特上发布有关自己和/或其家庭成员感染了 COVID-19 感染的信息的报告。然后进行概率分析,根据自我或家庭数量以及自我和家庭感染 COVID-19 的推文来估计 hSAR。该分析包括对推特用户不愿发布有关家庭成员的推文以及二次感染并非在家庭内获得的可能性的调整。英国的实验结果(每月和每周)报告的时间为 2020 年 1 月至 2022 年 2 月。我们的结果与之前报告的 hSAR 估计值一致,但因关注的主要变量而异,例如Delta 和 omicron。序列间隔 (SI) 基于指示原发和继发感染的两条推文之间的时间。实验结果虽然大于共识,但在质量上是相似的。使用社交媒体数据估计 hSAR 和 SI 构成了一种新工具,可能有助于以更快、更经济、更有效的方式描述、预测和管理疫情和流行病。
183. Large language models outperform mental and medical health care professionals in identifying obsessive-compulsive disorder.
在识别强迫症方面,大型语言模型的表现优于心理和医疗保健专业人员。
PMID: 39030292 | DOI: 10.1038/s41746-024-01181-x | 日期: 2024-07-19
摘要: Despite the promising capacity of large language model (LLM)-powered chatbots to diagnose diseases, they have not been tested for obsessive-compulsive disorder (OCD). We assessed the diagnostic accuracy of LLMs in OCD using vignettes and found that LLMs outperformed medical and mental health professionals. This highlights the potential benefit of LLMs in assisting in the timely and accurate diagnosis of OCD, which usually entails a long delay in diagnosis and treatment.
中文摘要: 尽管大型语言模型(LLM)驱动的聊天机器人在诊断疾病方面展现出良好的潜力,但尚未针对强迫症(OCD)进行测试。我们使用病例短文(vignettes)评估了 LLM 对强迫症的诊断准确性,发现 LLM 的表现优于医疗和心理健康专业人员。这凸显了 LLM 在协助及时、准确诊断强迫症方面的潜在益处,而强迫症的诊断和治疗通常存在长期延误。
184. A systematic review of the impacts of remote patient monitoring (RPM) interventions on safety, adherence, quality-of-life and cost-related outcomes.
系统回顾远程患者监测 (RPM) 干预措施对安全性、依从性、生活质量和成本相关结果的影响。
PMID: 39025937 | DOI: 10.1038/s41746-024-01182-w | 日期: 2024-07-18
摘要: Due to rapid technological advancements, remote patient monitoring (RPM) technology has gained traction in recent years. While the effects of specific RPM interventions are known, few published reviews examine RPM in the context of care transitions from an inpatient hospital setting to a home environment. In this systematic review, we addressed this gap by examining the impacts of RPM interventions on patient safety, adherence, clinical and quality of life outcomes and cost-related outcomes during care transition from inpatient care to a home setting. We searched five academic databases (PubMed, CINAHL, PsycINFO, Embase and SCOPUS), screened 2606 articles, and included 29 studies from 16 countries. These studies examined seven types of RPM interventions (communication tools, computer-based systems, smartphone applications, web portals, augmented clinical devices with monitoring capabilities, wearables and standard clinical tools for intermittent monitoring). RPM interventions demonstrated positive outcomes in patient safety and adherence. RPM interventions also improved patients’ mobility and functional statuses, but the impact on other clinical and quality-of-life measures, such as physical and mental health symptoms, remains inconclusive. In terms of cost-related outcomes, there was a clear downward trend in the risks of hospital admission/readmission, length of stay, number of outpatient visits and non-hospitalisation costs. Future research should explore whether incorporating intervention components with a strong human element alongside the deployment of technology enhances the effectiveness of RPM. The review highlights the need for more economic evaluations and implementation studies that shed light on the facilitators and barriers to adopting RPM interventions in different care settings.
中文摘要: 由于技术的快速进步,远程患者监护 (RPM) 技术近年来受到关注。虽然特定 RPM 干预措施的效果是已知的,但很少有发表的评论在从住院医院环境到家庭环境的护理过渡的背景下研究 RPM。在本次系统综述中,我们通过研究 RPM 干预措施对患者安全、依从性、临床和生活质量结果以及从住院护理过渡到家庭环境期间与成本相关的结果的影响来解决这一差距。我们检索了五个学术数据库(PubMed、CINAHL、PsycINFO、Embase 和 SCOPUS),筛选了 2606 篇文章,纳入了来自 16 个国家的 29 项研究。这些研究检查了七种类型的 RPM 干预措施(通信工具、计算机系统、智能手机应用程序、门户网站、具有监测功能的增强临床设备、可穿戴设备和用于间歇监测的标准临床工具)。 RPM 干预措施在患者安全和依从性方面表现出积极的成果。 RPM 干预措施还改善了患者的活动能力和功能状态,但对其他临床和生活质量指标(例如身心健康症状)的影响仍不确定。在费用相关结果方面,入院/再入院风险、住院时间、门诊次数和非住院费用均呈明显下降趋势。未来的研究应该探索将具有强大人为因素的干预措施与技术部署结合起来是否可以提高 RPM 的有效性。该审查强调需要进行更多的经济评估和实施研究,以阐明在不同护理环境中采用 RPM 干预措施的促进因素和障碍。
185. A survey of skin tone assessment in prospective research.
前瞻性研究中肤色评估的调查。
PMID: 39014060 | DOI: 10.1038/s41746-024-01176-8 | 日期: 2024-07-17
摘要: Increasing evidence supports reduced accuracy of noninvasive assessment tools, such as pulse oximetry, temperature probes, and AI skin diagnosis benchmarks, in patients with darker skin tones. The FDA is exploring potential strategies for device regulation to improve performance across diverse skin tones by including skin tone criteria. However, there is no consensus about how prospective studies should perform skin tone assessment in order to take this bias into account. There are several tools available to conduct skin tone assessments including administered visual scales (e.g., Fitzpatrick Skin Type, Pantone, Monk Skin Tone) and color measurement tools (e.g., reflectance colorimeters, reflectance spectrophotometers, cameras), although none are consistently used or validated across multiple medical domains. Accurate and consistent skin tone measurement depends on many factors including standardized environments, lighting, body parts assessed, patient conditions, and choice of skin tone assessment tool(s). As race and ethnicity are inadequate proxies for skin tone, these considerations can be helpful in standardizing the effect of skin tone on studies such as AI dermatology diagnoses, pulse oximetry, and temporal thermometers. Skin tone bias in medical devices is likely due to systemic factors that lead to inadequate validation across diverse skin tones. There is an opportunity for researchers to use skin tone assessment methods with standardized considerations in prospective studies of noninvasive tools that may be affected by skin tone. We propose considerations that researchers must take in order to improve device robustness to skin tone bias.
中文摘要: 越来越多的证据表明,对于肤色较深的患者,脉搏血氧仪、温度探头和人工智能皮肤诊断基准等无创评估工具的准确性会降低。 FDA 正在探索潜在的设备监管策略,通过纳入肤色标准来提高不同肤色的性能。然而,关于前瞻性研究应如何进行肤色评估以考虑到这种偏差,目前尚未达成共识。有多种工具可用于进行肤色评估,包括管理视觉量表(例如菲茨帕特里克皮肤类型、潘通色、僧侣肤色)和颜色测量工具(例如反射比色计、反射分光光度计、相机),尽管没有一个工具在多个医学领域得到一致使用或验证。准确且一致的肤色测量取决于许多因素,包括标准化环境、照明、评估的身体部位、患者状况以及肤色评估工具的选择。由于种族和民族不足以代表肤色,因此这些考虑因素有助于标准化肤色对人工智能皮肤病诊断、脉搏血氧饱和度和时间温度计等研究的影响。医疗设备中的肤色偏差可能是由于系统因素导致对不同肤色的验证不充分。研究人员有机会在可能受肤色影响的非侵入性工具的前瞻性研究中使用标准化考虑因素的肤色评估方法。我们提出了研究人员必须考虑的注意事项,以提高设备对肤色偏差的鲁棒性。
186. From virtual patients to digital twins in immuno-oncology: lessons learned from mechanistic quantitative systems pharmacology modeling.
从虚拟患者到免疫肿瘤学中的数字双胞胎:从机械定量系统药理学模型中吸取的教训。
PMID: 39014005 | DOI: 10.1038/s41746-024-01188-4 | 日期: 2024-07-16
摘要: Virtual patients and digital patients/twins are two similar concepts gaining increasing attention in health care with goals to accelerate drug development and improve patients’ survival, but with their own limitations. Although methods have been proposed to generate virtual patient populations using mechanistic models, there is a limited number of applications in immuno-oncology research. Furthermore, due to the stricter requirements of digital twins, they are often generated in a study-specific manner with models customized to particular clinical settings (e.g., treatment, cancer, and data types). Here, we discuss the challenges for virtual patient generation in immuno-oncology with our most recent experiences, initiatives to develop digital twins, and how research on these two concepts can inform each other.
中文摘要: 虚拟患者和数字患者/双胞胎是两个相似的概念,在医疗保健领域受到越来越多的关注,其目标是加速药物开发和提高患者的生存率,但也有其自身的局限性。尽管已经提出了使用机械模型生成虚拟患者群体的方法,但在免疫肿瘤学研究中的应用数量有限。此外,由于数字双胞胎的要求更严格,它们通常以特定于研究的方式生成,并使用针对特定临床环境(例如治疗、癌症和数据类型)定制的模型。在这里,我们通过最新的经验讨论免疫肿瘤学中虚拟患者生成的挑战、开发数字双胞胎的举措,以及对这两个概念的研究如何相互告知。
187. Digital biomarkers for non-motor symptoms in Parkinson’s disease: the state of the art.
帕金森病非运动症状的数字生物标志物:最先进的技术。
PMID: 38992186 | DOI: 10.1038/s41746-024-01144-2 | 日期: 2024-07-11
摘要: Digital biomarkers that remotely monitor symptoms have the potential to revolutionize outcome assessments in future disease-modifying trials in Parkinson’s disease (PD), by allowing objective and recurrent measurement of symptoms and signs collected in the participant’s own living environment. This biomarker field is developing rapidly for assessing the motor features of PD, but the non-motor domain lags behind. Here, we systematically review and assess digital biomarkers under development for measuring non-motor symptoms of PD. We also consider relevant developments outside the PD field. We focus on technological readiness level and evaluate whether the identified digital non-motor biomarkers have potential for measuring disease progression, covering the spectrum from prodromal to advanced disease stages. Furthermore, we provide perspectives for future deployment of these biomarkers in trials. We found that various wearables show high promise for measuring autonomic function, constipation and sleep characteristics, including REM sleep behavior disorder. Biomarkers for neuropsychiatric symptoms are less well-developed, but show increasing accuracy in non-PD populations. Most biomarkers have not been validated for specific use in PD, and their sensitivity to capture disease progression remains untested for prodromal PD where the need for digital progression biomarkers is greatest. External validation in real-world environments and large longitudinal cohorts remains necessary for integrating non-motor biomarkers into research, and ultimately also into daily clinical practice.
中文摘要: 通过允许对参与者自己的生活环境中收集的症状和体征进行客观和反复的测量,远程监测症状的数字生物标记有可能彻底改变未来帕金森病 (PD) 疾病修饰试验的结果评估。用于评估帕金森病运动特征的生物标志物领域正在迅速发展,但非运动领域却滞后。在这里,我们系统地回顾和评估了正在开发的用于测量帕金森病非运动症状的数字生物标志物。我们还考虑 PD 领域之外的相关发展。我们专注于技术准备水平,并评估已识别的数字非运动生物标志物是否具有测量疾病进展的潜力,涵盖从前驱期到晚期疾病阶段的范围。此外,我们还为未来在试验中部署这些生物标志物提供了前景。我们发现各种可穿戴设备在测量自主功能、便秘和睡眠特征(包括快速眼动睡眠行为障碍)方面显示出良好的前景。神经精神症状的生物标志物尚不成熟,但在非帕金森病人群中显示出越来越高的准确性。大多数生物标志物尚未经过验证可用于帕金森病,并且它们捕获疾病进展的敏感性尚未针对前驱帕金森病进行测试,而在帕金森病前驱期,对数字进展生物标志物的需求最大。将非运动生物标志物纳入研究并最终纳入日常临床实践仍然需要在现实环境和大型纵向队列中进行外部验证。
188. UCHealth’s virtual health center: How Colorado’s largest health system creates and integrates technology into patient care.
UCHealth 的虚拟健康中心:科罗拉多州最大的医疗系统如何创建技术并将其集成到患者护理中。
PMID: 38992097 | DOI: 10.1038/s41746-024-01184-8 | 日期: 2024-07-11
摘要: In the face of formidable healthcare challenges, such as staffing shortages and rising costs, technology has emerged as a crucial ally in enhancing patient care. UCHealth, Colorado’s largest health system, has pioneered the integration of technology into patient care through its Virtual Health Center (VHC). In this Comment, we explore UCHealth’s journey in creating a centralized hub that harnesses innovative digital health solutions to address patient care needs across its 12 hospitals, spanning over 600,000 emergency department visits and nearly 150,000 inpatient and observation encounters annually. The VHC has proven to be a transformative force, providing high-quality care at scale, reducing staff burden, and establishing new career pathways in virtual health. The transformation process involved multiple steps: (a) identifying a need, (b) vetting within health system solutions, (c) searching for industry solutions, and scrutinizing these through meetings with our innovations center, (d) piloting the solution, and (e) sustaining the solution by integrating them within the electronic health record (EHR).
中文摘要: 面对人员短缺和成本上升等严峻的医疗保健挑战,技术已成为加强患者护理的重要盟友。科罗拉多州最大的医疗系统 UCHealth 率先通过其虚拟医疗中心 (VHC) 将技术整合到患者护理中。在这篇评论中,我们探讨了 UCHealth 创建一个集中中心的历程,该中心利用创新的数字健康解决方案来满足其 12 家医院的患者护理需求,每年接待超过 600,000 次急诊科就诊和近 150,000 次住院和观察。 VHC 已被证明是一股变革力量,可大规模提供高质量护理、减轻员工负担并在虚拟健康领域建立新的职业道路。转型过程涉及多个步骤:(a) 确定需求,(b) 审查卫生系统解决方案,(c) 寻找行业解决方案,并通过与我们的创新中心举行会议仔细审查这些解决方案,(d) 试点解决方案,以及 (e) 通过将解决方案集成到电子健康记录 (EHR) 中来维持解决方案。
189. Identification of Parkinson’s disease PACE subtypes and repurposing treatments through integrative analyses of multimodal data.
通过多模态数据的综合分析识别帕金森病 PACE 亚型并重新调整治疗方案。
PMID: 38982243 | DOI: 10.1038/s41746-024-01175-9 | 日期: 2024-07-09
摘要: Parkinson’s disease (PD) is a serious neurodegenerative disorder marked by significant clinical and progression heterogeneity. This study aimed at addressing heterogeneity of PD through integrative analysis of various data modalities. We analyzed clinical progression data (≥5 years) of individuals with de novo PD using machine learning and deep learning, to characterize individuals’ phenotypic progression trajectories for PD subtyping. We discovered three pace subtypes of PD exhibiting distinct progression patterns: the Inching Pace subtype (PD-I) with mild baseline severity and mild progression speed; the Moderate Pace subtype (PD-M) with mild baseline severity but advancing at a moderate progression rate; and the Rapid Pace subtype (PD-R) with the most rapid symptom progression rate. We found cerebrospinal fluid P-tau/α-synuclein ratio and atrophy in certain brain regions as potential markers of these subtypes. Analyses of genetic and transcriptomic profiles with network-based approaches identified molecular modules associated with each subtype. For instance, the PD-R-specific module suggested STAT3, FYN, BECN1, APOA1, NEDD4, and GATA2 as potential driver genes of PD-R. It also suggested neuroinflammation, oxidative stress, metabolism, PI3K/AKT, and angiogenesis pathways as potential drivers for rapid PD progression (i.e., PD-R). Moreover, we identified repurposable drug candidates by targeting these subtype-specific molecular modules using network-based approach and cell line drug-gene signature data. We further estimated their treatment effects using two large-scale real-world patient databases; the real-world evidence we gained highlighted the potential of metformin in ameliorating PD progression. In conclusion, this work helps better understand clinical and pathophysiological complexity of PD progression and accelerate precision medicine.
中文摘要: 帕金森病 (PD) 是一种严重的神经退行性疾病,具有显着的临床和进展异质性。本研究旨在通过对各种数据模式的综合分析来解决PD的异质性。我们使用机器学习和深度学习分析了新发帕金森病患者的临床进展数据(≥5年),以表征帕金森病亚型的个体表型进展轨迹。我们发现 PD 的三种起搏亚型表现出不同的进展模式:渐进起搏亚型 (PD-I) 具有轻度基线严重性和轻度进展速度;中度起搏亚型 (PD-M),基线严重程度为轻度,但进展速度为中等;以及症状进展速度最快的快速亚型(PD-R)。我们发现脑脊液 P-tau/α-突触核蛋白比率和某些大脑区域的萎缩是这些亚型的潜在标志。使用基于网络的方法对遗传和转录组谱进行分析,确定了与每个亚型相关的分子模块。例如,PD-R 特异性模块建议 STAT3、FYN、BECN1、APOA1、NEDD4 和 GATA2 作为 PD-R 的潜在驱动基因。它还表明神经炎症、氧化应激、代谢、PI3K/AKT 和血管生成途径是 PD 快速进展(即 PD-R)的潜在驱动因素。此外,我们通过使用基于网络的方法和细胞系药物基因特征数据针对这些亚型特异性分子模块,确定了可重新利用的候选药物。我们使用两个大型真实世界患者数据库进一步估计了他们的治疗效果;我们获得的现实世界证据强调了二甲双胍在改善帕金森病进展方面的潜力。总之,这项工作有助于更好地了解 PD 进展的临床和病理生理学复杂性,并加速精准医疗。
190. From silos to synergy: integrating academic health informatics with operational IT for healthcare transformation.
从孤岛到协同:将学术健康信息学与运营 IT 相结合,实现医疗保健转型。
PMID: 38982211 | DOI: 10.1038/s41746-024-01179-5 | 日期: 2024-07-09
摘要: We have entered a new age of health informatics—applied health informatics—where digital health innovation cannot be pursued without considering operational needs. In this new digital health era, creating an integrated applied health informatics system will be essential for health systems to achieve informatics healthcare goals. Integration of information technology (IT) and health informatics does not naturally occur without a deliberate and intentional shift towards unification. Recognizing this, NYU Langone Health’s (NYULH) Medical Center IT (MCIT) has taken proactive measures to vertically integrate academic informatics and operational IT through the establishment of the MCIT Department of Health Informatics (DHI). The creation of the NYULH DHI showcases the drivers, challenges, and ultimate successes of our enterprise effort to align academic health informatics with IT; providing a model for the creation of the applied health informatics programs required for academic health systems to thrive in the increasingly digitized healthcare landscape.
中文摘要: 我们已经进入了健康信息学的新时代——应用健康信息学——如果不考虑运营需求,就无法追求数字健康创新。在这个新的数字健康时代,创建集成的应用健康信息学系统对于卫生系统实现信息学医疗保健目标至关重要。如果没有经过深思熟虑和有意向统一的转变,信息技术 (IT) 和健康信息学的整合就不会自然发生。认识到这一点,纽约大学朗格健康中心 (NYULH) 医疗中心 IT (MCIT) 已采取积极措施,通过建立 MCIT 健康信息学系 (DHI) 来垂直整合学术信息学和运营 IT。 NYULH DHI 的创建展示了我们企业努力将学术健康信息学与 IT 结合起来的驱动力、挑战和最终成功;为创建学术卫生系统在日益数字化的医疗保健领域蓬勃发展所需的应用健康信息学项目提供了一个模型。
191. The ethics of ChatGPT in medicine and healthcare: a systematic review on Large Language Models (LLMs).
ChatGPT 在医学和医疗保健领域的伦理:大型语言模型 (LLM) 的系统评价。
PMID: 38977771 | DOI: 10.1038/s41746-024-01157-x | 日期: 2024-07-08
摘要: With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare. Despite potential benefits, researchers have underscored various ethical implications. While individual instances have garnered attention, a systematic and comprehensive overview of practical applications currently researched and ethical issues connected to them is lacking. Against this background, this work maps the ethical landscape surrounding the current deployment of LLMs in medicine and healthcare through a systematic review. Electronic databases and preprint servers were queried using a comprehensive search strategy which generated 796 records. Studies were screened and extracted following a modified rapid review approach. Methodological quality was assessed using a hybrid approach. For 53 records, a meta-aggregative synthesis was performed. Four general fields of applications emerged showcasing a dynamic exploration phase. Advantages of using LLMs are attributed to their capacity in data analysis, information provisioning, support in decision-making or mitigating information loss and enhancing information accessibility. However, our study also identifies recurrent ethical concerns connected to fairness, bias, non-maleficence, transparency, and privacy. A distinctive concern is the tendency to produce harmful or convincing but inaccurate content. Calls for ethical guidance and human oversight are recurrent. We suggest that the ethical guidance debate should be reframed to focus on defining what constitutes acceptable human oversight across the spectrum of applications. This involves considering the diversity of settings, varying potentials for harm, and different acceptable thresholds for performance and certainty in healthcare. Additionally, critical inquiry is needed to evaluate the necessity and justification of LLMs’ current experimental use.
中文摘要: 随着 ChatGPT 的引入,大型语言模型 (LLM) 在医疗保健领域受到了极大的关注。尽管有潜在的好处,研究人员还是强调了各种伦理影响。虽然个别实例引起了人们的关注,但缺乏对当前研究的实际应用以及与之相关的伦理问题的系统和全面的概述。在此背景下,这项工作通过系统回顾,描绘了当前医学和医疗保健领域 LLM 部署的伦理格局。使用综合搜索策略对电子数据库和预印本服务器进行查询,生成了 796 条记录。按照修改后的快速审查方法筛选和提取研究。使用混合方法评估方法学质量。对 53 条记录进行了元聚合综合。出现了四个一般应用领域,展示了动态探索阶段。使用 LLM 的优势在于其数据分析、信息提供、决策支持或减少信息丢失以及增强信息可访问性的能力。然而,我们的研究还发现了与公平、偏见、不伤害、透明度和隐私相关的反复出现的伦理问题。一个独特的担忧是产生有害或令人信服但不准确的内容的倾向。对伦理指导和人类监督的呼声一再出现。我们建议应该重新构建伦理指导辩论,重点关注定义跨应用范围可接受的人类监督的内容。这涉及到考虑环境的多样性、不同的潜在危害以及医疗保健绩效和确定性的不同可接受阈值。此外,需要进行批判性调查来评估 LLM 当前实验性使用的必要性和合理性。
192. Accuracy of dental implant placement using different dynamic navigation and robotic systems: an in vitro study.
使用不同的动态导航和机器人系统进行牙种植体植入的准确性:一项体外研究。
PMID: 38971937 | DOI: 10.1038/s41746-024-01178-6 | 日期: 2024-07-06
摘要: Computer-aided implant surgery has undergone continuous development in recent years. In this study, active and passive systems of dynamic navigation were divided into active dynamic navigation system group and passive dynamic navigation system group (ADG and PDG), respectively. Active, passive and semi-active implant robots were divided into active robot group, passive robot group and semi-active robot group (ARG, PRG and SRG), respectively. Each group placed two implants (FDI tooth positions 31 and 36) in a model 12 times. The accuracy of 216 implants in 108 models was analysed. The coronal deviations of ADG, PDG, ARG, PRG and SRG were 0.85 ± 0.17 mm, 1.05 ± 0.42 mm, 0.29 ± 0.15 mm, 0.40 ± 0.16 mm and 0.33 ± 0.14 mm, respectively. The apical deviations of the five groups were 1.11 ± 0.23 mm, 1.07 ± 0.38 mm, 0.29 ± 0.15 mm, 0.50 ± 0.19 mm and 0.36 ± 0.16 mm, respectively. The axial deviations of the five groups were 1.78 ± 0.73°, 1.99 ± 1.20°, 0.61 ± 0.25°, 1.04 ± 0.37° and 0.42 ± 0.18°, respectively. The coronal, apical and axial deviations of ADG were higher than those of ARG, PRG and SRG (all P < 0.001). Similarly, the coronal, apical and axial deviations of PDG were higher than those of ARG, PRG, and SRG (all P < 0.001). Dynamic and robotic computer-aided implant surgery may show good implant accuracy in vitro. However, the accuracy and stability of implant robots are higher than those of dynamic navigation systems.
中文摘要: 近年来计算机辅助种植手术不断发展。本研究将主动和被动动态导航系统分别分为主动动态导航系统组和被动动态导航系统组(ADG 和 PDG)。主动、被动和半主动种植机器人分别分为主动机器人组、被动机器人组和半主动机器人组(ARG、PRG 和 SRG)。每组在模型中放置两颗种植体(FDI 牙位 31 和 36)12 次。对 108 个模型中 216 个种植体的准确性进行了分析。ADG、PDG、ARG、PRG 和 SRG 的冠部偏差分别为 0.85 ± 0.17 mm、1.05 ± 0.42 mm、0.29 ± 0.15 mm、0.40 ± 0.16 mm 和 0.33 ± 0.14 mm。五组的根尖偏差分别为 1.11 ± 0.23 mm、1.07 ± 0.38 mm、0.29 ± 0.15 mm、0.50 ± 0.19 mm 和 0.36 ± 0.16 mm。五组的轴向偏差分别为 1.78 ± 0.73°、1.99 ± 1.20°、0.61 ± 0.25°、1.04 ± 0.37° 和 0.42 ± 0.18°。ADG 的冠部、根尖和轴向偏差均高于 ARG、PRG 和 SRG(均 P < 0.001)。同样,PDG 的冠部、根尖和轴向偏差均高于 ARG、PRG 和 SRG(均 P < 0.001)。动态导航和机器人计算机辅助种植手术在体外均可能表现出良好的种植精度。然而,种植机器人的精度和稳定性高于动态导航系统。
193. Deep learning for multi-type infectious keratitis diagnosis: A nationwide, cross-sectional, multicenter study.
多型感染性角膜炎诊断的深度学习:一项全国性、横断面、多中心研究。
PMID: 38971902 | DOI: 10.1038/s41746-024-01174-w | 日期: 2024-07-06
摘要: The main cause of corneal blindness worldwide is keratitis, especially the infectious form caused by bacteria, fungi, viruses, and Acanthamoeba. The key to effective management of infectious keratitis hinges on prompt and precise diagnosis. Nevertheless, the current gold standard, such as cultures of corneal scrapings, remains time-consuming and frequently yields false-negative results. Here, using 23,055 slit-lamp images collected from 12 clinical centers nationwide, this study constructed a clinically feasible deep learning system, DeepIK, that could emulate the diagnostic process of a human expert to identify and differentiate bacterial, fungal, viral, amebic, and noninfectious keratitis. DeepIK exhibited remarkable performance in internal, external, and prospective datasets (all areas under the receiver operating characteristic curves > 0.96) and outperformed three other state-of-the-art algorithms (DenseNet121, InceptionResNetV2, and Swin-Transformer). Our study indicates that DeepIK possesses the capability to assist ophthalmologists in accurately and swiftly identifying various infectious keratitis types from slit-lamp images, thereby facilitating timely and targeted treatment.
中文摘要: 全世界角膜失明的主要原因是角膜炎,尤其是由细菌、真菌、病毒和棘阿米巴引起的感染形式。有效治疗感染性角膜炎的关键在于及时、准确的诊断。然而,当前的黄金标准,例如角膜刮片培养,仍然耗时且经常产生假阴性结果。本研究利用从全国 12 个临床中心收集的 23,055 张裂隙灯图像,构建了一个临床上可行的深度学习系统 DeepIK,该系统可以模拟人类专家的诊断过程,以识别和区分细菌、真菌、病毒、阿米巴和非感染性角膜炎。 DeepIK 在内部、外部和前瞻性数据集(接收器操作特性曲线下的所有区域 > 0.96)中表现出卓越的性能,并且优于其他三种最先进的算法(DenseNet121、InceptionResNetV2 和 Swin-Transformer)。我们的研究表明,DeepIK能够帮助眼科医生从裂隙灯图像中准确、快速地识别各种感染性角膜炎类型,从而有助于及时、有针对性的治疗。
194. Quantifying impairment and disease severity using AI models trained on healthy subjects.
使用在健康受试者上训练的人工智能模型来量化损伤和疾病严重程度。
PMID: 38969786 | DOI: 10.1038/s41746-024-01173-x | 日期: 2024-07-06
摘要: Automatic assessment of impairment and disease severity is a key challenge in data-driven medicine. We propose a framework to address this challenge, which leverages AI models trained exclusively on healthy individuals. The COnfidence-Based chaRacterization of Anomalies (COBRA) score exploits the decrease in confidence of these models when presented with impaired or diseased patients to quantify their deviation from the healthy population. We applied the COBRA score to address a key limitation of current clinical evaluation of upper-body impairment in stroke patients. The gold-standard Fugl-Meyer Assessment (FMA) requires in-person administration by a trained assessor for 30-45 minutes, which restricts monitoring frequency and precludes physicians from adapting rehabilitation protocols to the progress of each patient. The COBRA score, computed automatically in under one minute, is shown to be strongly correlated with the FMA on an independent test cohort for two different data modalities: wearable sensors (ρ = 0.814, 95% CI [0.700, 0.888]) and video (ρ = 0.736, 95% CI [0.584, 0.838]). To demonstrate the generalizability of the approach to other conditions, the COBRA score was also applied to quantify severity of knee osteoarthritis from magnetic-resonance imaging scans, again achieving significant correlation with an independent clinical assessment (ρ = 0.644, 95% CI [0.585, 0.696]).
中文摘要: 自动评估损伤和疾病严重程度是数据驱动医学的一个关键挑战。我们提出了一个框架来应对这一挑战,该框架利用专门针对健康个体训练的人工智能模型。基于置信度的异常特征 (COBRA) 评分利用这些模型在出现受损或患病患者时置信度的下降来量化他们与健康人群的偏差。我们应用 COBRA 评分来解决当前中风患者上身损伤临床评估的一个关键限制。黄金标准 Fugl-Meyer 评估 (FMA) 需要训练有素的评估员亲自进行 30-45 分钟的管理,这限制了监测频率,并妨碍医生根据每位患者的进展情况调整康复方案。 COBRA 分数在一分钟内自动计算,结果显示与独立测试队列中两种不同数据模式的 FMA 密切相关:可穿戴传感器(ρ = 0.814,95% CI [0.700,0.888])和视频(ρ = 0.736,95% CI [0.584, 0.838])。为了证明该方法对其他疾病的普适性,还应用 COBRA 评分来量化磁共振成像扫描中膝骨关节炎的严重程度,再次与独立临床评估取得显着相关性(ρ = 0.644,95% C.I [0.585,0.696])。
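The idea of scoring impairment by the drop in confidence of a healthy-trained model can be illustrated with a minimal sketch. The abstract does not give the exact COBRA formula, so the averaging rule below (mean probability assigned to the known class) is an assumption, and the class names and probabilities are hypothetical.

```python
from statistics import mean

def confidence_score(probabilities, labels):
    """Mean probability that a healthy-trained classifier assigns to the known label.

    probabilities: list of dicts mapping class name -> predicted probability
    labels: the known class of each sample (hypothetical class names)
    Assumption: a plain average; the published score may be defined differently.
    """
    return mean(p[y] for p, y in zip(probabilities, labels))

# Hypothetical outputs of a motion classifier trained only on healthy subjects.
labels = ["reach", "rest"]
healthy_like = [{"reach": 0.92, "rest": 0.08}, {"reach": 0.10, "rest": 0.90}]
impaired_like = [{"reach": 0.55, "rest": 0.45}, {"reach": 0.40, "rest": 0.60}]

score_healthy = confidence_score(healthy_like, labels)
score_impaired = confidence_score(impaired_like, labels)

# A lower score means the model is less sure, i.e. the subject looks less "healthy".
print(score_healthy, score_impaired)
```

In this toy case the impaired-like subject receives the lower score, which is the direction of effect the abstract describes.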
195. A systematic umbrella review and meta-meta-analysis of eHealth and mHealth interventions for improving lifestyle behaviours.
对改善生活方式行为的电子医疗和移动医疗干预措施进行系统性总体审查和荟萃分析。
PMID: 38969775 | DOI: 10.1038/s41746-024-01172-y | 日期: 2024-07-05
摘要: The aim of this meta-meta-analysis was to systematically review randomised controlled trial (RCT) evidence examining the effectiveness of e- and m-Health interventions designed to improve physical activity, sedentary behaviour, healthy eating and sleep. Nine electronic databases were searched for eligible studies published from inception to 1 June 2023. Systematic reviews with meta-analyses of RCTs that evaluate e- and m-Health interventions designed to improve physical activity, sedentary behaviour, sleep and healthy eating in any adult population were included. Forty-seven meta-analyses were included, comprising 507 RCTs and 206,873 participants. Interventions involved mobile apps, web-based and SMS interventions, with 14 focused on physical activity, 3 on diet, 4 on sleep and 26 evaluating multiple behaviours. Meta-meta-analyses showed that e- and m-Health interventions resulted in improvements in steps/day (mean difference, MD = 1329 [95% CI = 593.9, 2065.7] steps/day), moderate-to-vigorous physical activity (MD = 55.1 [95% CI = 13.8, 96.4] min/week), total physical activity (MD = 44.8 [95% CI = 21.6, 67.9] min/week), sedentary behaviour (MD = -426.3 [95% CI = -850.2, -2.3] min/week), fruit and vegetable consumption (MD = 0.57 [95% CI = 0.11, 1.02] servings/day), energy intake (MD = -102.9 kcal/day), saturated fat consumption (MD = -5.5 grams/day), and bodyweight (MD = -1.89 [95% CI = -2.42, -1.36] kg). Analyses based on standardised mean differences (SMD) showed improvements in sleep quality (SMD = 0.56, 95% CI = 0.40, 0.72) and insomnia severity (SMD = -0.90, 95% CI = -1.14, -0.65). Most subgroup analyses were not significant, suggesting that a variety of e- and m-Health interventions are effective across diverse age and health populations. These interventions offer scalable and accessible approaches to help individuals adopt and sustain healthier behaviours, with implications for broader public health and healthcare challenges.
中文摘要: 这项荟萃分析的目的是系统地回顾随机对照试验(RCT)证据,以检验旨在改善身体活动、久坐行为、健康饮食和睡眠的电子和移动健康干预措施的有效性。检索了九个电子数据库,查找从开始到 2023 年 6 月 1 日发表的合格研究。其中包括对旨在改善任何成年人群的身体活动、久坐行为、睡眠和健康饮食的电子和移动健康干预措施进行评估和随机对照试验荟萃分析的系统评价。纳入了 47 项荟萃分析,其中包括 507 项随机对照试验和 206,873 名参与者。干预措施涉及移动应用程序、基于网络的干预措施和短信干预措施,其中 14 项重点关注身体活动,3 项重点关注饮食,4 项重点关注睡眠,26 项重点评估多种行为。荟萃分析显示,电子和移动健康干预措施改善了每日步数(平均差,MD = 1329 [95% CI = 593.9, 2065.7]步/天)、中度至剧烈体力活动(MD = 55.1 [95% CI = 13.8, 96.4])分钟/周)、总体力活动(MD = 44.8 [95% CI = 21.6, 67.9] 分钟/周)、久坐行为(MD = -426.3 [95% CI = -850.2, -2.3] 分钟/周)、水果和蔬菜摄入量(MD = 0.57 [95% CI = 0.11, 1.02] 份/天)、能量摄入量(MD = -102.9 kcals/天)、饱和脂肪消耗量(MD = -5.5 克/天)和体重(MD = -1.89 [95% CI = -2.42, -1.36] kg)。基于标准化平均差(SMD)的分析显示,睡眠质量(SMD = 0.56,95% CI = 0.40,0.72)和失眠严重程度(SMD = -0.90,95% CI = -1.14,-0.65)有所改善。大多数亚组分析并不显着,这表明各种电子和移动医疗干预措施对不同年龄和健康人群都有效。这些干预措施提供了可扩展且易于使用的方法,帮助个人采取和维持更健康的行为,对更广泛的公共卫生和医疗保健挑战产生影响。
196. Competing interests: digital health and indigenous data sovereignty.
利益竞争:数字健康和本土数据主权。
PMID: 38965365 | DOI: 10.1038/s41746-024-01171-z | 日期: 2024-07-04
摘要: Digital health is increasingly promoting open health data. Although this open approach promises a number of benefits, it also leads to tensions with Indigenous data sovereignty movements led by Indigenous peoples around the world who are asserting control over the use of health data as a part of self-determination. Digital health has a role in improving access to services and delivering improved health outcomes for Indigenous communities. However, we argue that in order to be effective and ethical, it is essential that the field engages more with Indigenous peoples' rights and interests. We discuss challenges and possible improvements for data acquisition, management, analysis, and integration as they pertain to the health of Indigenous communities around the world.
中文摘要: 数字健康正在日益推动开放健康数据。尽管这种开放方法带来了许多好处,但它也导致了与世界各地土著人民领导的土著数据主权运动的紧张关系,这些运动主张对健康数据的使用进行控制,作为自决的一部分。数字健康在改善原住民社区获得服务的机会和改善健康结果方面发挥着作用。然而,我们认为,为了有效和道德,该领域必须更多地关注土著人民的权利和利益。我们讨论数据采集、管理、分析和集成方面的挑战和可能的改进,因为它们与世界各地原住民社区的健康有关。
197. Pediatric sex estimation using AI-enabled ECG analysis: influence of pubertal development.
使用人工智能心电图分析进行儿科性别估计:青春期发育的影响。
PMID: 38956410 | DOI: 10.1038/s41746-024-01165-x | 日期: 2024-07-02
摘要: AI-enabled ECGs have previously been shown to accurately predict patient sex in adults and correlate with sex hormone levels. We aimed to test the ability of AI-enabled ECGs to predict sex in the pediatric population and study the influence of pubertal development. AI-enabled ECG models were created using a convolutional neural network trained on pediatric 10-second, 12-lead ECGs. The first model was trained de novo using pediatric data. The second model used transfer learning from a previously validated adult data-derived algorithm. We analyzed the first ECG from 90,133 unique pediatric patients (aged ≤18 years) recorded between 1987 and 2022, and divided the cohort into training, validation, and testing datasets. Subgroup analysis was performed on prepubertal (0-7 years), peripubertal (8-14 years), and postpubertal (15-18 years) patients. The cohort was 46.7% male, with 21,678 prepubertal, 26,740 peripubertal, and 41,715 postpubertal children. The de novo pediatric model demonstrated 81% accuracy and an area under the curve (AUC) of 0.91. For the entire test cohort, model sensitivity was 0.79, specificity was 0.83, positive predictive value was 0.84, and negative predictive value was 0.78. The model’s discriminatory ability was highest in the postpubertal age group (AUC = 0.98), lower in the peripubertal age group (AUC = 0.91), and poor in the prepubertal age group (AUC = 0.67). There was no significant performance difference observed between the transfer learning and de novo models. AI-enabled interpretation of ECG can estimate sex in peripubertal and postpubertal children with high accuracy.
中文摘要: 此前,人工智能心电图已被证明可以准确预测成人患者性别,并与性激素水平相关。我们的目的是测试人工智能心电图预测儿科人群性别的能力,并研究青春期发育的影响。支持 AI 的心电图模型是使用在儿科 10 秒 12 导联心电图上训练的卷积神经网络创建的。第一个模型是使用儿科数据从头训练的。第二个模型使用了先前验证的成人数据派生算法的迁移学习。我们分析了 1987 年至 2022 年间记录的 90,133 名独特儿科患者(年龄≤18 岁)的第一份心电图,并将该队列分为训练、验证和测试数据集。对青春期前(0-7岁)、青春期前后(8-14岁)和青春期后(15-18岁)患者进行亚组分析。该队列中男性占 46.7%,其中包括 21,678 名青春期前儿童、26,740 名青春期前后儿童和 41,715 名青春期后儿童。 de novo 儿科模型的准确度为 81%,曲线下面积 (AUC) 为 0.91。对于整个测试队列,模型敏感性为 0.79,特异性为 0.83,阳性预测值为 0.84,阴性预测值为 0.78。该模型的辨别能力在青春期后组最高(AUC = 0.98),在青春期周围年龄组较低(AUC = 0.91),在青春期前年龄组较差(AUC = 0.67)。迁移学习模型和从头模型之间没有观察到显着的性能差异。基于人工智能的心电图解读可以高精度地估计青春期前后和青春期后儿童的性别。
198. How can regulation and reimbursement better accommodate flexible suites of digital health technologies?
监管和报销如何更好地适应灵活的数字医疗技术套件?
PMID: 38956279 | DOI: 10.1038/s41746-024-01156-y | 日期: 2024-07-02
摘要: Individual digital health devices are increasingly being bundled together as interacting, multicomponent suites, to deliver clinical services (e.g., teleconsultation and ‘hospital-at-home services’). In the first article of this two-article series we described the challenges in implementation and the current limitations in frameworks for the regulation, health technology assessment, and reimbursement of these device suites and linked novel care pathways. A flexible and fit-for-purpose evaluation framework that can analyze the strengths and weaknesses of digital technology suites is needed. In this second article we describe adaptations that could enable this new technological paradigm while maintaining patient safety and fair value.
中文摘要: 个人数字健康设备越来越多地捆绑在一起作为交互的多组件套件,以提供临床服务(例如远程会诊和“家庭医院服务”)。在这个两篇文章系列的第一篇文章中,我们描述了实施中的挑战以及这些设备套件的监管、卫生技术评估和报销框架的当前限制以及相关的新颖护理途径。需要一个灵活且适合目的的评估框架来分析数字技术套件的优势和劣势。在第二篇文章中,我们描述了可以在保持患者安全和公平价值的同时实现这种新技术范例的调整。
199. Recommendations to advance digital health equity: a systematic review of qualitative studies.
促进数字健康公平的建议:定性研究的系统回顾。
PMID: 38951666 | DOI: 10.1038/s41746-024-01177-7 | 日期: 2024-06-29
摘要: The World Health Organisation advocates Digital Health Technologies (DHTs) for advancing population health, yet concerns about inequitable outcomes persist. Differences in access and use of DHTs across different demographic groups can contribute to inequities. Academics and policy makers have acknowledged this issue and called for inclusive digital health strategies. This systematic review synthesizes literature on these strategies and assesses facilitators and barriers to their implementation. We searched four large databases for qualitative studies using terms relevant to digital technology, health inequities, and socio-demographic factors associated with digital exclusion summarised by the CLEARS framework (Culture, Limiting conditions, Education, Age, Residence, Socioeconomic status). Following the PRISMA guidelines, 10,401 articles were screened independently by two reviewers, with ten articles meeting our inclusion criteria. Strategies were grouped into either outreach programmes or co-design approaches. Narrative synthesis of these strategies highlighted three key themes: firstly, using user-friendly designs, which included software and website interfaces that were easy to navigate and compatible with existing devices, culturally appropriate content, and engaging features. Secondly, providing supportive infrastructure to users, which included devices, free connectivity, and non-digital options to help access healthcare. Thirdly, providing educational support from family, friends, or professionals to help individuals develop their digital literacy skills to support the use of DHTs. Recommendations for advancing digital health equity include adopting a collaborative working approach to meet users’ needs, and using effective advertising to raise awareness of the available support. Further research is needed to assess the feasibility and impact of these recommendations in practice.
中文摘要: 世界卫生组织提倡数字健康技术 (DHT) 来促进人口健康,但对不公平结果的担忧依然存在。不同人口群体在获取和使用 DHT 方面的差异可能会导致不平等。学术界和政策制定者已经认识到这个问题,并呼吁制定包容性的数字健康战略。本系统综述综合了有关这些策略的文献,并评估了其实施的促进因素和障碍。我们使用与数字技术、健康不平等以及与 CLEARS 框架(文化、限制条件、教育、年龄、居住地、社会经济地位)总结的数字排斥相关的社会人口因素相关的术语,检索了四个大型数据库进行定性研究。遵循 PRISMA 指南,两位审稿人独立筛选了 10,401 篇文章,其中 10 篇文章符合我们的纳入标准。策略分为外展计划或共同设计方法。这些策略的叙述综合突出了三个关键主题:首先,使用用户友好的设计,其中包括易于导航并与现有设备兼容的软件和网站界面、适合文化的内容和引人入胜的功能。其次,为用户提供支持性基础设施,其中包括设备、免费连接和非数字选项,以帮助获得医疗保健。第三,提供家人、朋友或专业人士的教育支持,帮助个人发展数字素养技能,以支持 DHT 的使用。促进数字健康公平的建议包括采用协作工作方法来满足用户的需求,并使用有效的广告来提高对可用支持的认识。需要进一步研究来评估这些建议在实践中的可行性和影响。
200. Do engagement and behavioural mechanisms underpin the effectiveness of the Drink Less app?
参与和行为机制是否支撑“少喝”应用程序的有效性?
PMID: 38951560 | DOI: 10.1038/s41746-024-01169-7 | 日期: 2024-06-29
摘要: This is a process evaluation of a large UK-based randomised controlled trial (RCT) (n = 5602) evaluating the effectiveness of recommending an alcohol reduction app, Drink Less, compared with usual digital care in reducing alcohol consumption in increasing and higher risk drinkers. The aim was to understand whether participants’ engagement (‘self-reported adherence’) and behavioural characteristics were mechanisms of action underpinning the effectiveness of Drink Less. Self-reported adherence with both digital tools was over 70% (Drink Less: 78.0%, 95% CI = 77.6-78.4; usual digital care: 71.5%, 95% CI = 71.0-71.9). Self-reported adherence to the intervention (average causal mediation effect [ACME] = -0.250, 95% CI = -0.42, -0.11) and self-monitoring behaviour (ACME = -0.235, 95% CI = -0.44, -0.03) both partially mediated the effect of the intervention (versus comparator) on alcohol reduction. Following the recommendation (self-reported adherence) and the tracking (self-monitoring behaviour) feature of the Drink Less app appear to be important mechanisms of action for alcohol reduction among increasing and higher risk drinkers.
中文摘要: 这是对英国一项大型随机对照试验 (RCT) (n = 5602) 的过程评估,该试验评估了推荐戒酒应用程序 Drink Less 与常规数字护理在减少饮酒风险增加和较高风险者的饮酒量方面的有效性。目的是了解参与者的参与(“自我报告的遵守情况”)和行为特征是否是支撑“少饮”有效性的行动机制。自我报告的两种数字工具的依从性超过 70%(少喝:78.0%,95% CI = 77.6-78.4;常规数字护理:71.5%,95% CI = 71.0-71.9)。自我报告的对干预的依从性(平均因果中介效应[ACME] = -0.250,95% CI = -0.42,-0.11)和自我监控行为(ACME = -0.235,95% CI = -0.44,-0.03)都部分介导了干预的效果减少饮酒的干预措施(与对照相比)。遵循“少喝”应用程序的建议(自我报告的遵守情况)和跟踪(自我监控行为)功能似乎是减少饮酒风险增加和较高风险的重要行动机制。
201. Deep learning to quantify care manipulation activities in neonatal intensive care units.
深度学习量化新生儿重症监护病房的护理操作活动。
PMID: 38937643 | DOI: 10.1038/s41746-024-01164-y | 日期: 2024-06-27
摘要: Early-life exposure to stress results in significantly increased risk of neurodevelopmental impairments with potential long-term effects into childhood and even adulthood. As a crucial step towards monitoring neonatal stress in neonatal intensive care units (NICUs), our study aims to quantify the duration, frequency, and physiological responses of care manipulation activities, based on bedside videos and physiological signals. Leveraging 289 h of video recordings and physiological data within 330 sessions collected from 27 neonates in 2 NICUs, we develop and evaluate a deep learning method to detect manipulation activities from the video, to estimate their duration and frequency, and to further integrate physiological signals for assessing their responses. With a 13.8% relative error tolerance for activity duration and frequency, our results were statistically equivalent to human annotations. Further, our method proved effective for estimating short-term physiological responses, for detecting activities with marked physiological deviations, and for quantifying the neonatal infant stressor scale scores.
中文摘要: 生命早期暴露于压力会导致神经发育障碍的风险显着增加,并可能对儿童甚至成年产生长期影响。作为监测新生儿重症监护病房 (NICU) 新生儿应激的关键一步,我们的研究旨在根据床边视频和生理信号量化护理操作活动的持续时间、频率和生理反应。利用从 2 个 NICU 的 27 名新生儿收集的 330 次会话中的 289 个小时的视频记录和生理数据,我们开发和评估了一种深度学习方法,以检测视频中的操作活动,估计其持续时间和频率,并进一步整合生理信号以评估他们的反应。活动持续时间和频率的相对误差容限为 13.8%,我们的结果在统计上与人工注释相当。此外,我们的方法被证明可以有效地估计短期生理反应、检测具有明显生理偏差的活动以及量化新生儿应激源量表分数。
202. A multi-center study on the adaptability of a shared foundation model for electronic health records.
电子健康记录共享基础模型适应性的多中心研究。
PMID: 38937550 | DOI: 10.1038/s41746-024-01166-w | 日期: 2024-06-27
摘要: Foundation models are transforming artificial intelligence (AI) in healthcare by providing modular components adaptable for various downstream tasks, making AI development more scalable and cost-effective. Foundation models for structured electronic health records (EHR), trained on coded medical records from millions of patients, demonstrated benefits including increased performance with fewer training labels, and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across hospitals and their performance in local tasks. This multi-center study examined the adaptability of a publicly accessible structured EHR foundation model (FMSM), trained on 2.57 M patient records from Stanford Medicine. Experiments used EHR data from The Hospital for Sick Children (SickKids) and Medical Information Mart for Intensive Care (MIMIC-IV). We assessed both adaptability via continued pretraining on local data, and task adaptability compared to baselines of locally training models from scratch, including a local foundation model. Evaluations on 8 clinical prediction tasks showed that adapting the off-the-shelf FMSM matched the performance of gradient boosting machines (GBM) locally trained on all data while providing a 13% improvement in settings with few task-specific training labels. Continued pretraining on local data showed FMSM required fewer than 1% of training examples to match the fully trained GBM’s performance, and was 60 to 90% more sample-efficient than training local foundation models from scratch. Our findings demonstrate that adapting EHR foundation models across hospitals provides improved prediction performance at less cost, underscoring the utility of base foundation models as modular components to streamline the development of healthcare AI.
中文摘要: 基础模型通过提供适用于各种下游任务的模块化组件,正在改变医疗保健领域的人工智能 (AI),使 AI 开发更具可扩展性和成本效益。结构化电子健康记录 (EHR) 的基础模型经过数百万患者的编码医疗记录的训练,证明了其好处,包括通过更少的训练标签提高性能,以及提高对分布变化的鲁棒性。然而,在医院之间共享这些模型的可行性及其在本地任务中的表现仍然存在疑问。这项多中心研究检验了可公开访问的结构化 EHR 基础模型 (FMSM) 的适应性,该模型基于斯坦福大学医学院的 2.57 M 患者记录进行了训练。实验使用来自病童医院 (SickKids) 和重症监护医疗信息集市 (MIMIC-IV) 的 EHR 数据。我们通过对本地数据的持续预训练来评估适应性,并与从头开始的本地训练模型的基线(包括本地基础模型)相比来评估任务适应性。对 8 项临床预测任务的评估表明,采用现成的 FMSM 与在所有数据上进行本地训练的梯度增强机 (GBM) 的性能相匹配,同时在几乎没有特定于任务的训练标签的设置中提供了 13% 的改进。对本地数据的持续预训练表明,FMSM 只需不到 1% 的训练样本即可匹配经过充分训练的 GBM 的性能,并且样本效率比从头开始训练本地基础模型高 60% 到 90%。我们的研究结果表明,跨医院调整 EHR 基础模型可以以更低的成本提高预测性能,强调基础基础模型作为模块化组件的实用性,可以简化医疗保健人工智能的开发。
203. Wearable sensor-based quantitative gait analysis in Parkinson’s disease patients with different motor subtypes.
基于可穿戴传感器的不同运动亚型帕金森病患者的定量步态分析。
PMID: 38926552 | DOI: 10.1038/s41746-024-01163-z | 日期: 2024-06-26
摘要: Gait impairments are among the most common and disabling symptoms of Parkinson’s disease (PD) and worsen as the disease progresses. Early detection and diagnosis of subtype-specific gait deficits, as well as progression monitoring, can help to implement effective and preventive personalized treatment for PD patients. Yet, the gait features have not been fully studied in PD and its motor subtypes. This study aimed to characterize comprehensive and objective gait alterations and to identify potential gait biomarkers for early diagnosis, subtype differentiation, and disease severity monitoring. Using wearable sensors during the Timed Up and Go test, we analyzed gait parameters related to upper/lower limbs, trunk and lumbar, and postural transitions from 24 tremor-dominant (TD) and 20 postural instability gait difficulty (PIGD) dominant early-stage PD patients and 39 matched healthy controls (HC). Results show: (1) Both TD and PIGD groups showed restricted backswing range in bilateral lower extremities and the more affected side (MAS) arm, reduced trunk and lumbar rotation range in the coronal plane, and low turning efficiency. The receiver operating characteristic (ROC) analysis revealed these objective gait features had high discriminative value in distinguishing both PD subtypes from the HC, with area under the curve (AUC) values of 0.7-0.9 (p < 0.01). (2) Subtle but measurable gait differences existed between TD and PIGD patients before the onset of clinically apparent gait impairment. (3) Specific gait parameters were significantly associated with disease severity in TD and PIGD subtypes. Objective gait biomarkers based on wearable sensors may facilitate timely and personalized gait treatments in PD subtypes through early diagnosis, subtype differentiation, and disease severity monitoring.
中文摘要: 步态障碍是帕金森病最常见和致残的症状之一,并且随着疾病的进展而恶化。早期检测和诊断亚型特异性步态缺陷以及进展监测,有助于对帕金森病患者实施有效的预防性个性化治疗。然而,帕金森病及其运动亚型的步态特征尚未得到充分研究。表征全面、客观的步态改变,并确定用于早期诊断、亚型区分和疾病严重程度监测的潜在步态生物标志物。我们使用可穿戴传感器在 Timed Up and Go 测试中分析了与上/下肢、躯干和腰椎相关的步态参数,以及 24 名以震颤为主 (TD) 和 20 名以姿势不稳定步态困难 (PIGD) 为主的早期 PD 患者和 39 名匹配的健康对照 (HC) 的姿势转变。结果显示:(1)TD组和PIGD组均表现出双侧下肢和受影响侧(MAS)臂的后摆幅度受限,躯干和腰椎在冠状面的旋转幅度减少,转身效率低。受试者工作特征(ROC)分析显示,这些客观步态特征在区分PD亚型和HC亚型方面具有较高的判别价值,曲线下面积(AUC)值为0.7~0.9(p<0.01)。 (2) 在出现临床明显步态障碍之前,TD 和 PIGD 患者之间存在细微但可测量的步态差异。 (3) TD 和 PIGD 亚型的特定步态参数与疾病严重程度显着相关。基于可穿戴传感器的客观步态生物标志物可以通过早期诊断、亚型区分和疾病严重程度监测,促进及时、个性化的PD亚型步态治疗。
204. Artificial intelligence-enhanced electrocardiography derived body mass index as a predictor of future cardiometabolic disease.
人工智能增强心电图得出的体重指数作为未来心脏代谢疾病的预测因子。
PMID: 38918595 | DOI: 10.1038/s41746-024-01170-0 | 日期: 2024-06-25
摘要: The electrocardiogram (ECG) can capture obesity-related cardiac changes. Artificial intelligence-enhanced ECG (AI-ECG) can identify subclinical disease. We trained an AI-ECG model to predict body mass index (BMI) from the ECG alone. Developed from 512,950 12-lead ECGs from the Beth Israel Deaconess Medical Center (BIDMC), a secondary care cohort, and validated on UK Biobank (UKB) (n = 42,386), the model achieved a Pearson correlation coefficient (r) of 0.65 and 0.62, and an R² of 0.43 and 0.39, in the BIDMC cohort and UK Biobank, respectively, for AI-ECG BMI vs. measured BMI. We found delta-BMI, the difference between measured BMI and AI-ECG-predicted BMI (AI-ECG-BMI), to be a biomarker of cardiometabolic health. The top tertile of delta-BMI showed increased risk of future cardiometabolic disease (BIDMC: HR 1.15, p < 0.001; UKB: HR 1.58, p < 0.001) and diabetes mellitus (BIDMC: HR 1.25, p < 0.001; UKB: HR 2.28, p < 0.001) after adjusting for covariates including measured BMI. Significant enhancements in model fit, reclassification and improvements in discriminatory power were observed with the inclusion of delta-BMI in both cohorts. Phenotypic profiling highlighted associations between delta-BMI and cardiometabolic diseases, anthropometric measures of truncal obesity, and pericardial fat mass. Metabolic and proteomic profiling associates delta-BMI positively with valine, lipids in small HDL, syntaxin-3, and carnosine dipeptidase 1, and inversely with glutamine, glycine, colipase, and adiponectin. A genome-wide association study revealed associations with regulators of cardiovascular/metabolic traits, including SCN10A, SCN5A, EXOG and RXRG. In summary, our AI-ECG-BMI model accurately predicts BMI and introduces delta-BMI as a non-invasive biomarker for cardiometabolic risk stratification.
中文摘要: 心电图(ECG)可以捕捉与肥胖相关的心脏变化。人工智能增强心电图(AI-ECG)可以识别亚临床疾病。我们训练了一个 AI-ECG 模型,仅根据 ECG 来预测体重指数 (BMI)。该模型由二级护理队列贝斯以色列女执事医疗中心 (BIDMC) 的 512,950 份 12 导联心电图开发而成,并在英国生物银行 (UKB) 上进行了验证(n = 42,386)。在 BIDMC 队列和英国生物银行中,AI-ECG BMI 与测量的 BMI 之间的 Pearson 相关系数 (r) 分别为 0.65 和 0.62,R² 分别为 0.43 和 0.39。我们发现 delta-BMI,即测量的 BMI 和 AI-ECG 预测的 BMI (AI-ECG-BMI) 之间的差异,是心脏代谢健康的生物标志物。在调整包括测量的 BMI 在内的协变量后,delta-BMI 最高三分位组未来发生心脏代谢疾病(BIDMC:HR 1.15,p < 0.001;UKB:HR 1.58,p < 0.001)和糖尿病(BIDMC:HR 1.25,p < 0.001;UKB:HR 2.28, p < 0.001)的风险增加。在两个队列中纳入 delta-BMI 后,我们观察到模型拟合、重新分类和辨别能力显着增强。表型分析强调了 delta-BMI 与心脏代谢疾病、躯干肥胖的人体测量指标和心包脂肪量之间的关联。代谢和蛋白质组学分析将 delta-BMI 与缬氨酸、小 HDL 中的脂质、突触蛋白 3 和肌肽酶 1 呈正相关,与谷氨酰胺、甘氨酸、辅脂酶和脂联素呈负相关。一项全基因组关联研究揭示了与心血管/代谢特征调节因子的关联,包括 SCN10A、SCN5A、EXOG 和 RXRG。总之,我们的 AI-ECG-BMI 模型可以准确预测 BMI,并引入 delta-BMI 作为心脏代谢风险分层的非侵入性生物标志物。
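The delta-BMI construction and the top-tertile grouping can be sketched in a few lines. The abstract does not fix the sign convention, so taking predicted minus measured is an assumption here, and all BMI values are made up for illustration.

```python
from statistics import quantiles

def delta_bmi(measured, predicted):
    """Per-person gap between AI-ECG BMI and measured BMI.

    Assumption: predicted minus measured; the abstract only says "difference".
    """
    return [p - m for m, p in zip(measured, predicted)]

def top_tertile_flags(values):
    """Mark values that fall above the upper tertile cut point."""
    # quantiles(n=3) returns the two cut points that split the data into thirds.
    _, upper_cut = quantiles(values, n=3)
    return [v > upper_cut for v in values]

# Hypothetical measured and AI-ECG-predicted BMI values (kg/m^2).
measured = [22.0, 31.5, 27.0, 35.0, 24.5, 29.0]
predicted = [24.0, 28.0, 27.5, 30.0, 25.0, 30.5]

deltas = delta_bmi(measured, predicted)
flags = top_tertile_flags(deltas)
print(list(zip(deltas, flags)))
```

In a real analysis the flag would enter a survival model alongside measured BMI and other covariates; that step is outside this sketch.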
205. A scoping review assessing the usability of digital health technologies targeting people with multiple sclerosis.
一项范围审查,评估针对多发性硬化症患者的数字健康技术的可用性。
PMID: 38918483 | DOI: 10.1038/s41746-024-01162-0 | 日期: 2024-06-25
摘要: Digital health technologies (DHTs) have become progressively more integrated into the healthcare of people with multiple sclerosis (MS). To ensure that DHTs meet end-users’ needs, it is essential to assess their usability. The objective of this study was to determine how DHTs targeting people with MS incorporate usability characteristics into their design and/or evaluation. We conducted a scoping review of DHT studies in MS published from 2010 to the present using PubMed, Web of Science, OVID Medline, CINAHL, Embase, and medRxiv. Covidence was used to facilitate the review. We included articles that focused on people with MS and/or their caregivers, studied DHTs (including mhealth, telehealth, and wearables), and employed quantitative, qualitative, or mixed methods designs. Thirty-two studies that assessed usability were included, which represents a minority of studies (26%) that assessed DHTs in MS. The most common DHT was mobile applications (n = 23, 70%). Overall, studies were highly heterogeneous with respect to what usability principles were considered and how usability was assessed. These findings suggest that there is a major gap in the application of standardized usability assessments to DHTs in MS. Improvements in the standardization of usability assessments will have implications for the future of digital health care for people with MS.
中文摘要: 数字健康技术 (DHT) 已逐渐融入多发性硬化症 (MS) 患者的医疗保健中。为了确保 DHT 满足最终用户的需求,评估其可用性至关重要。本研究的目的是确定针对多发性硬化症患者的 DHT 如何将可用性特征纳入其设计和/或评估中。我们对 2010 年至今使用 PubMed、Web of Science、OVID Medline、CINAHL、Embase 和 medRxiv 发表的 MS 中的 DHT 研究进行了范围界定审查。 Covidence 用于促进审查。我们纳入了关注多发性硬化症患者和/或其护理人员的文章,研究了 DHT(包括移动医疗、远程医疗和可穿戴设备),并采用了定量、定性或混合方法设计。其中包括 32 项评估可用性的研究,这代表了评估 DHT 在 MS 中的研究的少数(26%)。最常见的 DHT 是移动应用程序(n = 23,70%)。总体而言,在考虑哪些可用性原则以及如何评估可用性方面,研究存在很大差异。这些发现表明,在 MS 中 DHT 标准化可用性评估的应用方面存在重大差距。可用性评估标准化的改进将对多发性硬化症患者的数字医疗保健的未来产生影响。
206. Can artificial intelligence improve medicine’s uncomfortable relationship with Maths?
人工智能能否改善医学与数学之间不舒服的关系?
PMID: 38909083 | DOI: 10.1038/s41746-024-01168-8 | 日期: 2024-06-22
207. Validation and application of computer vision algorithms for video-based tremor analysis.
基于视频的震颤分析的计算机视觉算法的验证和应用。
PMID: 38906946 | DOI: 10.1038/s41746-024-01153-1 | 日期: 2024-06-21
摘要: Tremor is one of the most common neurological symptoms. Its clinical and neurobiological complexity necessitates novel approaches for granular phenotyping. Instrumented neurophysiological analyses have proven useful, but are highly resource-intensive and lack broad accessibility. In contrast, bedside scores are simple to administer, but lack the granularity to capture subtle but relevant tremor features. We utilise the open-source computer vision pose tracking algorithm Mediapipe to track hands in clinical video recordings and use the resulting time series to compute canonical tremor features. This approach is compared to marker-based 3D motion capture, wrist-worn accelerometry, clinical scoring and a second, specifically trained tremor-specific algorithm in two independent clinical cohorts. These cohorts consisted of 66 patients diagnosed with essential tremor, assessed in different task conditions and states of deep brain stimulation therapy. We find that Mediapipe-derived tremor metrics exhibit high convergent clinical validity to scores (Spearman’s ρ = 0.55-0.86, p ≤ 0.01) as well as an accuracy of up to 2.60 mm (95% CI [-3.13, 8.23]) and ≤0.21 Hz (95% CI [-0.05, 0.46]) for tremor amplitude and frequency measurements, matching gold-standard equipment. Mediapipe, but not the disease-specific algorithm, was capable of analysing videos involving complex configurational changes of the hands. Moreover, it enabled the extraction of tremor features with diagnostic and prognostic relevance, a dimension which conventional tremor scores were unable to provide. Collectively, this demonstrates that current computer vision algorithms can be transformed into an accurate and highly accessible tool for video-based tremor analysis, yielding comparable results to gold standard tremor recordings.
中文摘要: 震颤是最常见的神经系统症状之一。其临床和神经生物学的复杂性需要新的颗粒表型分析方法。仪器化神经生理学分析已被证明是有用的,但资源高度密集且缺乏广泛的可及性。相比之下,床边评分易于管理,但缺乏捕捉细微但相关震颤特征的粒度。我们利用开源计算机视觉姿势跟踪算法 Mediapipe 来跟踪临床视频记录中的手部,并使用生成的时间序列来计算典型震颤特征。在两个独立的临床队列中,将该方法与基于标记的 3D 运动捕捉、腕戴式加速度测量、临床评分以及第二个经过专门训练的震颤特异性算法进行比较。这些队列由 66 名被诊断患有原发性震颤的患者组成,在不同的任务条件和深部脑刺激治疗状态下进行评估。我们发现 Mediapipe 衍生的震颤指标对评分表现出较高的收敛临床有效性 (Spearman’s ρ = 0.55-0.86, p≤ .01),准确度高达 2.60 mm (95% CI [-3.13, 8.23]) 和 ≤0.21 Hz (95% CI [-0.05, 0.46])用于震颤幅度和频率测量,匹配金标准设备。 Mediapipe(但不是针对特定疾病的算法)能够分析涉及复杂的手部配置变化的视频。此外,它还能够提取具有诊断和预后相关性的震颤特征,这是传统震颤评分无法提供的维度。总的来说,这表明当前的计算机视觉算法可以转变为一种准确且易于访问的工具,用于基于视频的震颤分析,产生与金标准震颤记录相当的结果。
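To make the step from a tracked hand position to a tremor frequency concrete, here is a minimal sketch that picks the dominant frequency of a one-dimensional time series with a plain discrete Fourier transform. It is not the authors' pipeline: the hand-tracking stage is omitted, the trace is synthetic, and the 30 frames-per-second rate is an assumed value.

```python
import cmath
import math

def dominant_frequency(signal, fs):
    """Return the frequency in Hz of the strongest non-DC DFT component."""
    n = len(signal)
    mean_value = sum(signal) / n
    centered = [x - mean_value for x in signal]  # remove the constant offset
    best_k, best_power = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fs / n

fs = 30.0   # assumed video frame rate (frames per second)
n = 150     # 5 seconds of samples, giving 0.2 Hz frequency resolution
# Synthetic 5 Hz oscillation with 2 mm amplitude, standing in for a tracked coordinate.
trace = [2.0 * math.sin(2 * math.pi * 5.0 * t / fs) for t in range(n)]

freq = dominant_frequency(trace, fs)
print(freq)
```

The frequency resolution is fs / n, so longer recordings give finer estimates; a production implementation would normally use an FFT library and a windowed spectral estimate rather than this quadratic-time loop.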
208. Development of an effective predictive screening tool for prostate cancer using the ClarityDX machine learning platform.
使用 ClarityDX 机器学习平台开发有效的前列腺癌预测筛查工具。
PMID: 38902526 | DOI: 10.1038/s41746-024-01167-9 | 日期: 2024-06-20
摘要: The current prostate cancer (PCa) screening test, prostate-specific antigen (PSA), has a high sensitivity for PCa but low specificity for high-risk, clinically significant PCa (csPCa), resulting in overdiagnosis and overtreatment of non-csPCa. Early identification of csPCa while avoiding unnecessary biopsies in men with non-csPCa is challenging. We built an optimized machine learning platform (ClarityDX) and showed its utility in generating models predicting csPCa. Integrating the ClarityDX platform with blood-based biomarkers for clinically significant PCa and clinical biomarker data from a 3448-patient cohort, we developed a test, called ClarityDX Prostate, to stratify patients’ risk of csPCa. When predicting high-risk cancer in the validation cohort, ClarityDX Prostate showed 95% sensitivity, 35% specificity, 54% positive predictive value, and 91% negative predictive value, at a ≥ 25% threshold. Using ClarityDX Prostate at this threshold could avoid up to 35% of unnecessary prostate biopsies. ClarityDX Prostate showed higher accuracy for predicting the risk of csPCa than PSA alone and the tested model-based risk calculators. Using this test as a reflex test in men with elevated PSA levels may help patients and their healthcare providers decide if a prostate biopsy is necessary.
中文摘要: 目前的前列腺癌(PCa)筛查测试前列腺特异性抗原(PSA)对PCa具有较高的敏感性,但对高风险、有临床意义的PCa(csPCa)特异性较低,导致对非csPCa的过度诊断和过度治疗。早期识别 csPCa 同时避免对非 csPCa 男性进行不必要的活检具有挑战性。我们构建了一个优化的机器学习平台 (ClarityDX),并展示了其在生成预测 csPCa 的模型方面的实用性。将 ClarityDX 平台与具有临床意义的 PCa 血液生物标志物以及来自 3448 名患者队列的临床生物标志物数据相结合,我们开发了一项测试来对患者的 csPCa 风险进行分层;称为 ClarityDX 前列腺。在预测验证队列中的高风险癌症时,ClarityDX Prostate 在 25% 阈值下显示出 95% 的敏感性、35% 的特异性、54% 的阳性预测值和 91% 的阴性预测值。在此阈值下使用 ClarityDX Prostate 可以避免高达 35% 的不必要的前列腺活检。 ClarityDX Prostate 在预测 csPCa 风险方面表现出比单独使用 PSA 和经过测试的基于模型的风险计算器更高的准确性。使用此测试作为 PSA 水平升高的男性的反射测试可以帮助患者及其医疗保健提供者决定是否需要进行前列腺活检。
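The relationship between the four reported figures can be checked with Bayes' rule: given sensitivity, specificity and disease prevalence, the predictive values follow directly. The abstract does not state the prevalence of csPCa in the validation cohort, so the 45% used below is purely an assumed value.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence                # true positive fraction
    fp = (1 - specificity) * (1 - prevalence)    # false positive fraction
    tn = specificity * (1 - prevalence)          # true negative fraction
    fn = (1 - sensitivity) * prevalence          # false negative fraction
    return tp / (tp + fp), tn / (tn + fn)

# Sensitivity and specificity are from the abstract; prevalence is an assumption.
ppv, npv = predictive_values(sensitivity=0.95, specificity=0.35, prevalence=0.45)
print(round(ppv, 3), round(npv, 3))
```

Under this assumed prevalence the computed values land close to the reported 54% and 91%. Changing the prevalence shifts both predictive values, which is why they do not transfer directly to populations with a different disease rate.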
209. Development and validation of a smartphone-based deep-learning-enabled system to detect middle-ear conditions in otoscopic images.
开发和验证基于智能手机的深度学习系统,用于检测耳镜图像中的中耳状况。
PMID: 38902477 | DOI: 10.1038/s41746-024-01159-9 | 日期: 2024-06-20
摘要: Middle-ear conditions are common causes of primary care visits, hearing impairment, and inappropriate antibiotic use. Deep learning (DL) may assist clinicians in interpreting otoscopic images. This study included patients over 5 years old from an ambulatory ENT practice in Strasbourg, France, between 2013 and 2020. Digital otoscopic images were obtained using a smartphone-attached otoscope (Smart Scope, Karl Storz, Germany) and labeled by a senior ENT specialist across 11 diagnostic classes (reference standard). An Inception-v2 DL model was trained using 41,664 otoscopic images, and its diagnostic accuracy was evaluated by calculating class-specific estimates of sensitivity and specificity. The model was then incorporated into a smartphone app called i-Nside. The DL model was evaluated on a validation set of 3,962 images and a held-out test set comprising 326 images. On the validation set, all class-specific estimates of sensitivity and specificity exceeded 98%. On the test set, the DL model achieved a sensitivity of 99.0% (95% confidence interval: 94.5-100) and a specificity of 95.2% (91.5-97.6) for the binary classification of normal vs. abnormal images; wax plugs were detected with a sensitivity of 100% (94.6-100) and specificity of 97.7% (95.0-99.1); other class-specific estimates of sensitivity and specificity ranged from 33.3% to 92.3% and 96.0% to 100%, respectively. We present an end-to-end DL-enabled system able to achieve expert-level diagnostic accuracy for identifying normal tympanic aspects and wax plugs within digital otoscopic images. However, the system’s performance varied for other middle-ear conditions. Further prospective validation is necessary before wider clinical deployment.
中文摘要: 中耳疾病是初级保健就诊、听力障碍和抗生素使用不当的常见原因。深度学习(DL)可以帮助临床医生解释耳镜图像。这项研究包括 2013 年至 2020 年间在法国斯特拉斯堡门诊耳鼻喉诊所就诊的 5 岁以上患者。数字耳镜图像是使用智能手机连接的耳镜(Smart Scope,Karl Storz,德国)获得的,并由高级耳鼻喉科专家在 11 个诊断类别(参考标准)中进行标记。使用 41,664 个耳镜图像训练 Inception-v2 DL 模型,并通过计算特定类别的敏感性和特异性估计来评估其诊断准确性。该模型随后被整合到名为 i-Nside 的智能手机应用程序中。 DL 模型在包含 3,962 个图像的验证集和包含 326 个图像的保留测试集上进行评估。在验证集上,所有特定类别的敏感性和特异性估计值均超过 98%。在测试集上,DL 模型对正常与异常图像的二元分类实现了 99.0% 的敏感性(95% 置信区间:94.5-100)和 95.2%(91.5-97.6)的特异性;检测蜡塞的灵敏度为 100%(94.6-100),特异性为 97.7%(95.0-99.1);其他特定类别的敏感性和特异性估计值分别为 33.3% 至 92.3% 和 96.0% 至 100%。我们提出了一种支持深度学习的端到端系统,能够实现专家级的诊断准确性,用于识别数字耳镜图像中的正常鼓室方面和蜡塞。然而,该系统的性能因其他中耳状况而异。在更广泛的临床部署之前,需要进一步的前瞻性验证。
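The confidence intervals quoted next to each sensitivity and specificity are intervals for a binomial proportion. As an illustration, the Wilson score interval below is one common choice; the abstract does not say which method the authors used, and the counts (99 correct out of 100) are hypothetical.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / total + z * z / (4 * total * total)
    ) / denom
    return center - half_width, center + half_width

# Hypothetical counts: 99 of 100 abnormal images flagged as abnormal.
low, high = wilson_interval(99, 100)
print(round(low, 3), round(high, 3))
```

Unlike the simple normal approximation, this interval stays inside the 0 to 1 range even when the observed proportion is close to 100%, which matters for the near-perfect class-specific estimates reported here.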
210. Five million nights: temporal dynamics in human sleep phenotypes.
五百万个夜晚:人类睡眠表型的时间动态。
PMID: 38902390 | DOI: 10.1038/s41746-024-01125-5 | 日期: 2024-06-20
摘要: Sleep monitoring has become widespread with the rise of affordable wearable devices. However, converting sleep data into actionable change remains challenging as diverse factors can cause combinations of sleep parameters to differ both between people and within people over time. Researchers have attempted to combine sleep parameters to improve the detection of similarities between nights of sleep. The cluster of similar combinations of sleep parameters from a night of sleep defines that night’s sleep phenotype. To date, quantitative models of sleep phenotype made from data collected from large populations have used cross-sectional data, which preclude longitudinal analyses that could better quantify differences within individuals over time. In analyses reported here, we used five million nights of wearable sleep data to test (a) whether an individual’s sleep phenotype changes over time and (b) whether these changes elucidate new information about acute periods of illness (e.g., flu, fever, COVID-19). We found evidence for 13 sleep phenotypes associated with sleep quality and that individuals transition between these phenotypes over time. Patterns of transitions significantly differ (i) between individuals (with vs. without a chronic health condition; chi-square test; p-value < 1e-100) and (ii) within individuals over time (before vs. during an acute condition; chi-square test; p-value < 1e-100). Finally, we found that the patterns of transitions carried more information about chronic and acute health conditions than did phenotype membership alone (longitudinal analyses yielded 2-10× as much information as cross-sectional analyses). These results support the use of temporal dynamics in the future development of longitudinal sleep analyses.
中文摘要: 随着经济实惠的可穿戴设备的兴起,睡眠监测已变得普遍。然而,将睡眠数据转化为可操作的改变仍然具有挑战性,因为随着时间的推移,不同的因素可能会导致睡眠参数的组合在人与人之间以及人内部产生差异。研究人员试图结合睡眠参数来改进检测夜间睡眠之间的相似性。一晚睡眠中的一系列相似的睡眠参数组合定义了该晚的睡眠表型。迄今为止,根据从大量人群收集的数据建立的睡眠表型定量模型使用的是横截面数据,这排除了可以更好地量化个体随时间变化的差异的纵向分析。在此报告的分析中,我们使用了 500 万个夜晚的可穿戴设备睡眠数据来测试 (a) 个人的睡眠表型是否随时间变化,以及 (b) 这些变化是否阐明了有关急性疾病时期(例如流感、发烧、COVID-19)的新信息。我们发现了 13 种与睡眠质量相关的睡眠表型的证据,并且个体随着时间的推移在这些表型之间进行转变。转变模式在 (i) 个体之间(有慢性健康状况与无慢性健康状况;卡方检验;p 值 < 1e-100)之间和 (ii) 个体内部随时间的推移(急性疾病之前与期间;卡方检验;p 值< 1e-100)之间存在显着差异。最后,我们发现转变模式比单独的表型成员携带更多关于慢性和急性健康状况的信息(纵向分析产生的信息是横向分析的 2-10 倍)。这些结果支持在纵向睡眠分析的未来发展中使用时间动力学。
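The comparison of transition patterns in entry 210 can be sketched as follows: count night-to-night transitions between phenotype labels in two groups, then compute a Pearson chi-square statistic on the resulting 2 x K table. This is a minimal sketch under assumptions. The phenotype labels and sequences are invented, the function names are hypothetical, and the abstract does not describe how the authors built their contingency tables.

```python
from collections import Counter


def transition_counts(sequences):
    """Count (from, to) phenotype transitions between consecutive nights."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts


def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x K table."""
    keys = sorted(set(counts_a) | set(counts_b))
    row_a = [counts_a.get(k, 0) for k in keys]
    row_b = [counts_b.get(k, 0) for k in keys]
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(keys) - 1


# Invented nightly phenotype sequences for two hypothetical groups.
group_a = [["A", "A", "B", "A"], ["A", "B", "B", "A"]]
group_b = [["A", "A", "A", "A"], ["B", "B", "B", "A"]]

stat, dof = chi_square_statistic(transition_counts(group_a),
                                 transition_counts(group_b))
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

Successive transitions from the same person are not independent, so a real analysis would need to account for that before reading a p-value off this statistic.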
211. PatchSorter: a high throughput deep learning digital pathology tool for object labeling.
PatchSorter:一种用于对象标记的高通量深度学习数字病理学工具。
PMID: 38902336 | DOI: 10.1038/s41746-024-01150-4 | 日期: 2024-06-20
摘要: The discovery of patterns associated with diagnosis, prognosis, and therapy response in digital pathology images often requires intractable labeling of large quantities of histological objects. Here we release an open-source labeling tool, PatchSorter, which integrates deep learning with an intuitive web interface. Using >100,000 objects, we demonstrate a >7x improvement in labels per second over unaided labeling, with minimal impact on labeling accuracy, thus enabling high-throughput labeling of large datasets.
中文摘要: 在数字病理图像中发现与诊断、预后和治疗反应相关的模式通常需要对大量组织学对象进行棘手的标记。在这里,我们发布了一个开源标签工具 PatchSorter,它将深度学习与直观的 Web 界面集成在一起。使用超过 100,000 个对象,我们证明每秒标签速度比无辅助标签提高了 7 倍以上,并且对标签准确性的影响最小,从而实现了大型数据集的高吞吐量标签。
212. From wearable sensor data to digital biomarker development: ten lessons learned and a framework proposal.
从可穿戴传感器数据到数字生物标记开发:十个经验教训和框架建议。
PMID: 38890529 | DOI: 10.1038/s41746-024-01151-3 | 日期: 2024-06-18
摘要: Wearable sensor technologies are becoming increasingly relevant in health research, particularly in the context of chronic disease management. They generate real-time health data that can be translated into digital biomarkers, which can provide insights into our health and well-being. Scientific methods to collect, interpret, analyze, and translate health data from wearables to digital biomarkers vary, and systematic approaches to guide these processes are currently lacking. This paper is based on an observational, longitudinal cohort study, BarKA-MS, which collected wearable sensor data on the physical rehabilitation of people living with multiple sclerosis (MS). Based on our experience with BarKA-MS, we provide and discuss ten lessons we learned in relation to digital biomarker development across key study phases. We then summarize these lessons into a guiding framework (DACIA) that aims to inform the use of wearable sensor data for digital biomarker development and chronic disease management for future research and teaching.
中文摘要: 可穿戴传感器技术在健康研究中变得越来越重要,特别是在慢性病管理方面。它们生成实时健康数据,可以转化为数字生物标记,从而深入了解我们的健康和福祉。收集、解释、分析和将可穿戴设备的健康数据转化为数字生物标记的科学方法各不相同,目前缺乏指导这些过程的系统方法。本文基于一项观察性纵向队列研究 BarKA-MS,该研究收集了有关多发性硬化症 (MS) 患者身体康复的可穿戴传感器数据。根据我们在 BarKA-MS 方面的经验,我们提供并讨论了我们在关键研究阶段的数字生物标志物开发方面学到的十个经验教训。然后,我们将这些经验教训总结成一个指导框架(DACIA),旨在为未来的研究和教学使用可穿戴传感器数据进行数字生物标志物开发和慢性病管理提供信息。
213. Head movement dynamics in dystonia: a multi-centre retrospective study using visual perceptive deep learning.
肌张力障碍的头部运动动力学:使用视觉感知深度学习的多中心回顾性研究。
PMID: 38890413 | DOI: 10.1038/s41746-024-01140-6 | 日期: 2024-06-18
摘要: Dystonia is a neurological movement disorder characterised by abnormal involuntary movements and postures, particularly affecting the head and neck. However, current clinical assessment methods for dystonia rely on simplified rating scales which lack the ability to capture the intricate spatiotemporal features of dystonic phenomena, hindering clinical management and limiting understanding of the underlying neurobiology. To address this, we developed a visual perceptive deep learning framework that utilizes standard clinical videos to comprehensively evaluate and quantify disease states and the impact of therapeutic interventions, specifically deep brain stimulation. This framework overcomes the limitations of traditional rating scales and offers an efficient and accurate method that is rater-independent for evaluating and monitoring dystonia patients. To evaluate the framework, we leveraged semi-standardized clinical video data collected in three retrospective, longitudinal cohort studies across seven academic centres. We extracted static head angle excursions for clinical validation and derived kinematic variables reflecting naturalistic head dynamics to predict dystonia severity, subtype, and neuromodulation effects. The framework was also applied to a fully independent cohort of generalised dystonia patients for comparison between dystonia sub-types. Computer vision-derived measurements of head angle excursions showed a strong correlation with clinically assigned scores. Across comparisons, we identified consistent kinematic features from full video assessments encoding information critical to disease severity, subtype, and effects of neural circuit interventions, independent of static head angle deviations used in scoring. Our visual perceptive machine learning framework reveals kinematic pathosignatures of dystonia, potentially augmenting clinical management, facilitating scientific translation, and informing personalized precision neurology approaches.
中文摘要: 肌张力障碍是一种神经运动障碍,其特征是异常的不自主运动和姿势,特别是影响头部和颈部。然而,目前肌张力障碍的临床评估方法依赖于简化的评分量表,缺乏捕捉肌张力障碍现象复杂的时空特征的能力,阻碍了临床管理并限制了对潜在神经生物学的理解。为了解决这个问题,我们开发了一个视觉感知深度学习框架,利用标准临床视频来全面评估和量化疾病状态以及治疗干预措施(特别是深部脑刺激)的影响。该框架克服了传统评定量表的局限性,并提供了一种独立于评定者的高效、准确的方法来评估和监测肌张力障碍患者。为了评估该框架,我们利用了在七个学术中心的三项回顾性纵向队列研究中收集的半标准化临床视频数据。我们提取静态头部角度偏移进行临床验证,并导出反映自然头部动力学的运动学变量,以预测肌张力障碍的严重程度、亚型和神经调节效应。该框架还应用于完全独立的全身性肌张力障碍患者队列,以比较肌张力障碍亚型。计算机视觉衍生的头角偏移测量结果显示与临床分配的分数有很强的相关性。通过比较,我们从完整的视频评估中确定了一致的运动学特征,这些评估编码对疾病严重程度、亚型和神经回路干预效果至关重要的信息,与评分中使用的静态头角偏差无关。我们的视觉感知机器学习框架揭示了肌张力障碍的运动学病理特征,有可能增强临床管理,促进科学转化,并为个性化精准神经病学方法提供信息。
214. Digital health technologies need regulation and reimbursement that enable flexible interactions and groupings.
数字医疗技术需要监管和报销,以实现灵活的交互和分组。
PMID: 38890404 | DOI: 10.1038/s41746-024-01147-z | 日期: 2024-06-18
摘要: Digital Health Technologies (DHTs) are being applied in a widening range of scenarios in medicine. We describe the emerging phenomenon of the grouping of individual DHTs, with a clinical use case and regulatory approval in their own right, into packages to perform specific clinical tasks in defined settings. Example groupings include suites of devices for remote monitoring, or for smart clinics. In this first article of a two-article series, we describe challenges in implementation and limitations in frameworks for the regulation, health technology assessment, and reimbursement of these device suites and linked novel care pathways.
中文摘要: 数字健康技术(DHT)正在广泛应用于医学领域。我们描述了将单个 DHT 分组的新兴现象,这些 DHT 本身具有临床用例和监管批准,分组到包中以在定义的环境中执行特定的临床任务。示例分组包括用于远程监控或智能诊所的设备套件。在两篇文章系列的第一篇文章中,我们描述了这些设备套件和相关新型护理途径的监管、卫生技术评估和报销框架的实施挑战和限制。
215. Exergaming and cognitive functions in people with mild cognitive impairment and dementia: a meta-analysis.
轻度认知障碍和痴呆症患者的运动游戏和认知功能:荟萃分析。
PMID: 38879695 | DOI: 10.1038/s41746-024-01142-4 | 日期: 2024-06-15
摘要: Exergaming is a combination of exercise and gaming. Evidence shows an association between exercise and cognition in older people. However, previous studies showed inconsistent results on the cognitive benefits of exergaming in people with cognitive impairment. Therefore, this study aims to examine the effect of exergaming intervention on cognitive functions in people with MCI or dementia. A systematic literature search was conducted via OVID databases. Randomized controlled trials (RCTs) that examined the effect of an exergaming intervention on cognitive functions in people with MCI or dementia were included. Subgroup analyses were conducted according to the type of intervention and training duration. Twenty RCTs with 1152 participants were identified, including 14 trials for MCI and 6 trials for dementia. In people with MCI, 13 studies used virtual-reality (VR)-based exergaming. Those who received VR-based exergaming showed significantly better global cognitive function [SMD (95%CI) = 0.67 (0.23-1.11)], learning and memory [immediate recall test: 0.79 (0.31-1.27); delayed recall test: 0.75 (0.20-1.31)], working memory [5.83 (2.27-9.39)], verbal fluency [0.58 (0.12-1.03)], and faster executive function performance than the controls. For people with dementia, all studies used video-based exergaming intervention. Participants with exergaming intervention showed significantly better global cognitive function than the controls [0.38 (0.10-0.67)]. Subgroup analyses showed that longer training duration generated larger effects. The findings suggest that exergaming impacts cognitive functions in people with MCI and dementia. Cognitive benefits are demonstrated for those with a longer training duration. With technological advancement, VR-based exergaming attracts the attention of people with MCI and performs well in improving cognitive functions.
中文摘要: 运动游戏是运动和游戏的结合。有证据表明老年人的运动和认知之间存在关联。然而,之前的研究显示运动游戏对认知障碍患者的认知益处并不一致。因此,本研究旨在探讨运动游戏干预对轻度认知障碍或痴呆症患者认知功能的影响。通过 OVID 数据库进行系统文献检索。随机对照试验 (RCT) 检查了运动游戏干预对轻度认知障碍 (MCI) 或痴呆症患者认知功能的影响。根据干预类型和培训持续时间进行亚组分析。确定了 20 项随机对照试验,共有 1152 名参与者,其中包括 14 项针对 MCI 的试验和 6 项针对痴呆症的试验。在 MCI 患者中,13 项研究使用了基于虚拟现实 (VR) 的运动游戏。接受基于 VR 的运动游戏的人表现出明显更好的整体认知功能 [SMD (95%CI) = 0.67 (0.23-1.11)]、学习和记忆 [立即回忆测试:0.79 (0.31-1.27);延迟回忆测试:0.75(0.20-1.31)],工作记忆[5.83(2.27-9.39)],言语流畅性[0.58(0.12-1.03)],执行功能比对照组更快。对于痴呆症患者,所有研究都使用基于视频的运动游戏干预。接受运动游戏干预的参与者的整体认知功能明显优于对照组[0.38 (0.10-0.67)]。亚组分析表明,较长的训练时间产生的效果更大。研究结果表明,运动游戏会影响轻度认知障碍和痴呆症患者的认知功能。对于那些训练时间较长的人来说,认知方面的好处得到了证明。随着技术的进步,基于VR的运动游戏吸引了MCI患者的注意力,并且在改善认知功能方面表现良好。
216. Effectiveness of telehealth versus in-person care during the COVID-19 pandemic: a systematic review.
COVID-19 大流行期间远程医疗与面对面护理的有效性:系统评价。
PMID: 38879682 | DOI: 10.1038/s41746-024-01152-2 | 日期: 2024-06-15
摘要: In this systematic review, we compared the effectiveness of telehealth with in-person care during the pandemic using PubMed, CINAHL, PsycINFO, and the Cochrane Central Register of Controlled Trials from March 2020 to April 2023. We included English-language, U.S.-healthcare relevant studies comparing telehealth with in-person care conducted after the onset of the pandemic. Two reviewers independently screened search results, serially extracted data, and independently assessed the risk of bias and strength of evidence. We identified 77 studies, the majority of which (47, 61%) were judged to have a serious or high risk of bias. Differences, if any, in healthcare utilization and clinical outcomes between in-person and telehealth care were generally small and/or not clinically meaningful and varied across the type of outcome and clinical area. For process outcomes, there was a mostly lower rate of missed visits and changes in therapy/medication and higher rates of therapy/medication adherence among patients receiving an initial telehealth visit compared with those receiving in-person care. However, the rates of up-to-date labs/paraclinical assessment were also lower among patients receiving an initial telehealth visit compared with those receiving in-person care. Most studies lacked a standardized approach to assessing outcomes. While we refrain from making an overall conclusion about the performance of telehealth versus in-person visits, the use of telehealth is comparable to in-person care across a variety of outcomes and clinical areas. As we transition through the COVID-19 era, models for integrating telehealth with traditional care become increasingly important, and ongoing evaluations of telehealth will be particularly valuable.
中文摘要: 在本次系统评价中,我们使用 PubMed、CINAHL、PsycINFO 和 Cochrane 对照试验中央登记册(2020 年 3 月至 2023 年 4 月)比较了大流行期间远程医疗与现场护理的有效性。我们纳入了英语、美国医疗保健相关研究,对大流行爆发后进行的远程医疗与现场护理进行了比较。两名评审员独立筛选检索结果,连续提取数据,并独立评估偏倚风险和证据强度。我们确定了 77 项研究,其中大多数(47 项,61%)被认为存在严重或高偏倚风险。现场医疗保健和远程医疗保健之间的医疗保健利用率和临床结果之间的差异(如果有的话)通常很小和/或没有临床意义,并且在结果类型和临床领域中存在差异。就流程结果而言,与接受面对面护理的患者相比,接受初次远程医疗就诊的患者的漏诊率和治疗/药物变更率大多较低,且治疗/药物依从率较高。然而,与接受面对面护理的患者相比,接受初次远程医疗就诊的患者进行最新实验室/临床旁评估的比率也较低。大多数研究缺乏评估结果的标准化方法。虽然我们不对远程医疗与面对面就诊的表现做出总体结论,但在各种结果和临床领域,远程医疗的使用与面对面护理相当。随着我们过渡到 COVID-19 时代,将远程医疗与传统护理相结合的模式变得越来越重要,对远程医疗的持续评估将特别有价值。
217. Learnings from the first AI-enabled skin cancer device for primary care authorized by FDA.
从 FDA 授权的首款用于初级护理的人工智能皮肤癌设备中汲取经验。
PMID: 38879640 | DOI: 10.1038/s41746-024-01161-1 | 日期: 2024-06-15
摘要: The U.S. Food and Drug Administration’s (FDA) recent authorization of DermaSensor, an AI-enabled device for skin cancer detection in primary care, marks a pivotal moment in digital health innovation. Clinically, the authorization of the first AI-enabled device for use by non-specialists for detecting skin cancer reinforces the feasibility of digital health technologies to bridge gaps in access and expertise in medical practice. The authorization also establishes a new regulatory precedent for FDA authorization of medical devices incorporating AI and machine learning (ML) technologies within dermatology. Together, this article uses the DermaSensor authorization to examine the clinical evidence and regulatory implications of emerging AI-enabled technologies in dermatology.
中文摘要: 美国食品和药物管理局 (FDA) 最近批准了 DermaSensor,这是一款用于初级保健中皮肤癌检测的人工智能设备,标志着数字健康创新的关键时刻。在临床上,第一款支持人工智能的设备被授权供非专业人士用于检测皮肤癌,这增强了数字健康技术的可行性,以弥补医疗实践中的获取和专业知识方面的差距。该授权还为 FDA 授权在皮肤科领域整合人工智能和机器学习 (ML) 技术的医疗设备开创了新的监管先例。总之,本文使用 DermaSensor 授权来检查皮肤科新兴人工智能技术的临床证据和监管影响。
218. Capturing relationships between suturing sub-skills to improve automatic suturing assessment.
捕获缝合子技能之间的关系以改进自动缝合评估。
PMID: 38862627 | DOI: 10.1038/s41746-024-01143-3 | 日期: 2024-06-11
摘要: Suturing skill scores have demonstrated strong predictive capabilities for patient functional recovery. Suturing can be broken down into several substep components, including needle repositioning, needle entry angle, etc. Artificial intelligence (AI) systems have been explored to automate suturing skill scoring. Traditional approaches to skill assessment typically focus on evaluating individual sub-skills required for particular substeps in isolation. However, surgical procedures require the integration and coordination of multiple sub-skills to achieve successful outcomes. Significant associations among the technical sub-skills have been established by existing studies. In this paper, we propose a framework for joint skill assessment that takes into account the interconnected nature of sub-skills required in surgery. The previously known relationships among sub-skills are first identified. Our proposed AI system is then empowered by these previously known relationships to perform the suturing skill scoring for each sub-skill domain simultaneously. Our approach can effectively improve skill assessment performance through the previously known relationships among sub-skills. Through the proposed approach to joint skill assessment, we aspire to enhance the evaluation of surgical proficiency and ultimately improve patient outcomes in surgery.
中文摘要: 缝合技能评分已显示出对患者功能恢复的强大预测能力。缝合可以分为几个子步骤,包括针重新定位、针进入角度等。人工智能 (AI) 系统已被探索用于自动缝合技能评分。传统的技能评估方法通常侧重于单独评估特定子步骤所需的个人子技能。然而,外科手术需要多种子技能的整合和协调才能取得成功的结果。现有的研究已经建立了技术子技能之间的显着关联。在本文中,我们提出了一个关节技能评估框架,该框架考虑了手术所需子技能的相互关联性。首先确定子技能之间先前已知的关系。然后,我们提出的人工智能系统由先前已知的关系授权,可以同时对每个子技能领域进行缝合技能评分。我们的方法可以通过子技能之间先前已知的关系有效地提高技能评估绩效。通过提出的关节技能评估方法,我们渴望加强对手术熟练程度的评估,并最终改善患者的手术结果。
219. Computationally derived transition points across phases of clinical care.
跨临床护理阶段的计算得出的过渡点。
PMID: 38862589 | DOI: 10.1038/s41746-024-01145-1 | 日期: 2024-06-11
摘要: The objective of this study is to use statistical techniques for the identification of transition points along the life course, aiming to identify fundamental changes in patient multimorbidity burden across phases of clinical care. This retrospective cohort analysis utilized 5.2 million patient encounters from 2013 to 2022, collected from a large academic institution and its affiliated hospitals. Structured information was systematically gathered for each encounter and three methodologies - clustering analysis, False Nearest Neighbor (FNN), and transitivity analysis - were employed to pinpoint transitions in patients’ clinical phase. Clustering analysis identified transition points at ages 2, 17, 41, and 66, FNN at 4.27, 5.83, 5.85, 14.12, 20.62, 24.30, 25.10, 29.08, 33.12, 35.7, 38.69, 55.66, 70.03, and transitivity analysis at 7.27, 23.58, 29.04, 35.00, 61.29, 67.03, 77.11. Clustering analysis identified transition points that align with the current clinical gestalt of pediatric, adult, and geriatric phases of care. Notably, over half of the transition points identified by FNN and transitivity analysis were between ages 20 and 40, a population that is traditionally considered to be clinically homogeneous. Few transition points were identified between ages 3 and 17. Despite large social and developmental transitions at those ages, the burden of multimorbidities may be consistent across the age range. Transition points derived through unsupervised machine learning approaches identify changes in the clinical phase that align with true differences in underlying multimorbidity burden. These transitions may be different from conventional pediatric and geriatric phases, which are often influenced by policy rather than clinical changes.
中文摘要: 本研究的目的是使用统计技术来识别生命历程中的过渡点,旨在确定临床护理各个阶段患者多重疾病负担的根本变化。这项回顾性队列分析利用了 2013 年至 2022 年从一家大型学术机构及其附属医院收集的 520 万例患者。我们系统地收集每次遭遇的结构化信息,并采用三种方法——聚类分析、虚假最近邻和传递性分析——来精确定位患者临床阶段的转变。聚类分析确定了 2、17、41 和 66 岁的转变点,FNN 为 4.27、5.83、5.85、14.12、20.62、24.30、25.10、29.08、33.12、35.7、38.69、55.66、70.03 和及物性分析于 7.27、23.58、29.04、35.00、61.29、67.03、77.11。聚类分析确定了与当前儿科、成人和老年护理阶段的临床形态相一致的转变点。值得注意的是,FNN 和传递性分析确定的过渡点中有一半以上位于 20 岁至 40 岁之间,传统上认为该人群在临床上具有同质性。 3 岁至 17 岁之间几乎没有确定的转变点。尽管这些年龄段发生了巨大的社会和发展转变,但多种疾病的负担在整个年龄范围内可能是一致的。通过无监督机器学习方法得出的过渡点可以识别临床阶段的变化,这些变化与潜在的多病负担的真正差异相一致。这些转变可能与传统的儿科和老年阶段不同,后者通常受到政策而不是临床变化的影响。
220. Assessing calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare system.
评估大型医疗保健系统中部署的机器学习营养不良预测模型的校准和偏差。
PMID: 38844546 | DOI: 10.1038/s41746-024-01141-5 | 日期: 2024-06-06
摘要: Malnutrition is a frequently underdiagnosed condition leading to increased morbidity, mortality, and healthcare costs. The Mount Sinai Health System (MSHS) deployed a machine learning model (MUST-Plus) to detect malnutrition upon hospital admission. However, in diverse patient groups, a poorly calibrated model may lead to misdiagnosis, exacerbating health care disparities. We explored the model’s calibration across different variables and methods to improve calibration. Data from adult patients admitted to five MSHS hospitals from January 1, 2021 to December 31, 2022 were analyzed. We compared MUST-Plus prediction to the registered dietitian’s formal assessment. Hierarchical calibration was assessed and compared between the recalibration sample (N = 49,562) of patients admitted between January 1, 2021 and December 31, 2022, and the hold-out sample (N = 17,278) of patients admitted between January 1, 2023 and September 30, 2023. Statistical differences in calibration metrics were tested using bootstrapping with replacement. Before recalibration, the overall model calibration intercept was -1.17 (95% CI: -1.20, -1.14), slope was 1.37 (95% CI: 1.34, 1.40), and Brier score was 0.26 (95% CI: 0.25, 0.26). Both weak and moderate measures of calibration were significantly different between White and Black patients and between male and female patients. Logistic recalibration significantly improved calibration of the model across race and gender in the hold-out sample. The original MUST-Plus model showed significant differences in calibration between White and Black patients. It also overestimated malnutrition in females compared to males. Logistic recalibration effectively reduced miscalibration across all patient subgroups. Continual monitoring and timely recalibration can improve model accuracy.
中文摘要: 营养不良是一种经常被诊断不足的疾病,导致发病率、死亡率和医疗费用增加。西奈山卫生系统 (MSHS) 部署了机器学习模型 (MUST-Plus) 来检测入院时的营养不良情况。然而,在不同的患者群体中,校准不当的模型可能会导致误诊,从而加剧医疗保健差异。我们探索了模型在不同变量和方法上的校准,以改进校准。对 2021 年 1 月 1 日至 2022 年 12 月 31 日期间五家 MSHS 医院收治的成年患者的数据进行了分析。我们将 MUST-Plus 预测与注册营养师的正式评估进行了比较。对2021年1月1日至2022年12月31日期间入院患者的重新校准样本(N = 49,562)和2023年1月1日至2023年9月30日期间入院患者的保留样本(N = 17,278)之间进行了层次校准评估和比较。使用引导法测试了校准指标的统计差异与更换。重新校准前,整体模型校准截距为-1.17(95% CI:-1.20,-1.14),斜率为1.37(95% CI:1.34,1.40),Brier评分为0.26(95% CI:0.25,0.26)。白人和黑人患者以及男性和女性患者之间的弱和中等校准测量值均存在显着差异。逻辑重新校准显着改善了保留样本中跨种族和性别模型的校准。最初的 MUST-Plus 模型显示白人与黑人患者之间的校准存在显着差异。与男性相比,它还高估了女性的营养不良情况。逻辑重新校准有效减少了所有患者亚组的错误校准。持续监测和及时重新校准可以提高模型准确性。
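The calibration intercept, calibration slope, Brier score and logistic recalibration named in entry 220 can be illustrated with a short sketch. It fits P(y = 1) = sigmoid(a + b * logit(p)) by Newton-Raphson, where p is the model's original predicted probability. Everything here is an assumption for illustration: the data are invented, the intercept and slope are estimated jointly (some analyses instead estimate the intercept with the slope fixed at 1), and none of this is the MUST-Plus pipeline.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def fit_logistic_recalibration(probs, labels, iterations=50):
    """Jointly estimate calibration intercept a and slope b by Newton-Raphson."""
    x = [logit(p) for p in probs]
    a, b = 0.0, 1.0  # start from "no recalibration"
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, labels):
            mu = sigmoid(a + b * xi)
            w = mu * (1 - mu)
            g_a += yi - mu
            g_b += (yi - mu) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system H * delta = gradient.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def recalibrate(probs, a, b):
    return [sigmoid(a + b * logit(p)) for p in probs]


# Invented predictions and outcomes; the model over-predicts on average.
probs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
         0.2, 0.4, 0.6, 0.8, 0.3, 0.7]
labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]

a, b = fit_logistic_recalibration(probs, labels)
adjusted = recalibrate(probs, a, b)
print(f"intercept = {a:.3f}, slope = {b:.3f}")
print(f"Brier before = {brier_score(probs, labels):.3f}, "
      f"after = {brier_score(adjusted, labels):.3f}")
```

A negative intercept, as in the abstract's -1.17, indicates that predicted risks are too high on average. A slope above 1, as in the abstract's 1.37, indicates that predictions are not spread widely enough.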
221. Deployment and validation of the CLL treatment infection model adjoined to an EHR system.
与 EHR 系统相连的 CLL 治疗感染模型的部署和验证。
PMID: 38839920 | DOI: 10.1038/s41746-024-01132-6 | 日期: 2024-06-05
摘要: Research algorithms are seldom externally validated or integrated into clinical practice, leaving unknown challenges in deployment. In such efforts, one needs to address challenges related to data harmonization, the performance of an algorithm in unforeseen missingness, automation and monitoring of predictions, and legal frameworks. We here describe the deployment of a high-dimensional data-driven decision support model into an EHR and derive practical guidelines informed by this deployment that include the necessary processes, stakeholders and design requirements for a successful deployment. For this, we describe our deployment of the chronic lymphocytic leukemia (CLL) treatment infection model (CLL-TIM) as a stand-alone platform adjoined to an EPIC-based Danish Electronic Health Record (EHR), with the presentation of personalized predictions in a clinical context. CLL-TIM is an 84-variable data-driven prognostic model utilizing 7-year medical patient records and predicts the 2-year risk composite outcome of infection and/or treatment post-CLL diagnosis. As an independent validation cohort for this deployment, we used a retrospective population-based cohort of patients diagnosed with CLL from 2018 onwards (n = 1480). Unexpectedly high levels of missingness for key CLL-TIM variables were exhibited upon deployment. High dimensionality, with the handling of missingness, and predictive confidence were critical design elements that enabled trustworthy predictions and thus serve as priorities for prognostic models seeking deployment in new EHRs. Our setup for deployment, including automation and monitoring into EHR that meets Medical Device Regulations, may be used as step-by-step guidelines for others aiming at designing and deploying research algorithms into clinical practice.
中文摘要: 研究算法很少经过外部验证或集成到临床实践中,从而在部署中留下未知的挑战。在这些努力中,人们需要解决与数据协调、算法在不可预见的缺失中的性能、预测的自动化和监控以及法律框架相关的挑战。我们在这里描述了将高维数据驱动的决策支持模型部署到 EHR 中,并根据此部署得出实用指南,其中包括成功部署所需的流程、利益相关者和设计要求。为此,我们将慢性淋巴细胞白血病 (CLL) 治疗感染模型 (CLL-TIM) 的部署描述为与基于 EPIC 的丹麦电子健康记录 (EHR) 相连的独立平台,并在临床背景下呈现个性化预测。 CLL-TIM 是一种 84 变量数据驱动的预后模型,利用 7 年的医疗记录,预测 CLL 诊断后感染和/或治疗的 2 年风险综合结果。作为此部署的独立验证队列,我们使用了 2018 年以来诊断为 CLL 的基于人群的回顾性队列 (n = 1480)。部署后,关键 CLL-TIM 变量出现了意外的高水平缺失。高维度、缺失处理和预测置信度是实现可信预测的关键设计元素,因此成为寻求在新 EHR 中部署的预后模型的优先事项。我们的部署设置,包括符合医疗器械法规的 EHR 自动化和监控,可以用作其他旨在设计研究算法并将其部署到临床实践中的分步指南。
222. Circadian rhythm analysis using wearable-based accelerometry as a digital biomarker of aging and healthspan.
使用基于可穿戴设备的加速度测量作为衰老和健康寿命的数字生物标记进行昼夜节律分析。
PMID: 38834756 | DOI: 10.1038/s41746-024-01111-x | 日期: 2024-06-04
摘要: Recognizing the pivotal role of circadian rhythm in the human aging process and its scalability through wearables, we introduce CosinorAge, a digital biomarker of aging developed from wearable-derived circadian rhythmicity from 80,000 midlife and older adults in the UK and US. A one-year increase in CosinorAge corresponded to 8-12% higher all-cause and cause-specific mortality risks and 3-14% increased prospective incidences of age-related diseases. CosinorAge also captured a non-linear decline in resilience and physical functioning, evidenced by an 8-33% reduction in self-rated health and a 3-23% decline in health-related quality of life score, adjusting for covariates and multiple testing. The associations were robust in sensitivity analyses and external validation using an independent cohort from a disparate geographical region using a different wearable device. Moreover, we illustrated a heterogeneous impact of circadian parameters associated with biological aging, with young (<45 years) and fast agers experiencing a substantially delayed acrophase with a 25-minute difference in peak timing compared to slow agers, diminishing to a 7-minute difference in older adults (>65 years). We demonstrated a significant enhancement in the predictive performance when integrating circadian rhythmicity in the estimation of biological aging over physical activity. Our findings underscore CosinorAge’s potential as a scalable, economic, and digital solution for promoting healthy longevity, elucidating the critical and multifaceted circadian rhythmicity in aging processes. Consequently, our research contributes to advancing preventive measures in digital medicine.
中文摘要: 认识到昼夜节律在人类衰老过程中的关键作用及其通过可穿戴设备的可扩展性,我们推出了 CosinorAge,这是一种衰老的数字生物标记,是根据英国和美国 80,000 名中年和老年人的可穿戴设备衍生的昼夜节律开发的。 CosinorAge 增加一年,全因和特定原因死亡风险增加 8-12%,与年龄相关疾病的预期发病率增加 3-14%。 CosinorAge 还捕捉到了复原力和身体机能的非线性下降,根据协变量和多重测试进行调整后,自评健康状况下降 8-33%,健康相关生活质量评分下降 3-23%。这些关联在敏感性分析和使用来自不同地理区域的独立队列使用不同可穿戴设备的外部验证中是稳健的。此外,我们还说明了与生物衰老相关的昼夜节律参数的异质性影响,年轻人(<45岁)和快速衰老者经历了大幅延迟的末期,与慢速衰老者相比,峰值时间有25分钟的差异,而老年人(>65岁)的峰值时间差异缩小到7分钟。我们证明,将昼夜节律整合到身体活动的生物衰老估计中时,预测性能显着增强。我们的研究结果强调了 CosinorAge 作为可扩展、经济和数字化解决方案的潜力,可促进健康长寿,阐明衰老过程中关键和多方面的昼夜节律。因此,我们的研究有助于推进数字医学的预防措施。
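The circadian parameters mentioned in entry 222 (including acrophase, the time of the daily peak) are typically obtained from a cosinor model. The sketch below fits y(t) = M + A cos(2π(t - t_peak)/24) by ordinary least squares, using the equivalent linear form M + β cos(ωt) + γ sin(ωt). This is a generic single-component cosinor fit on synthetic data and an assumption for illustration. The abstract does not describe how CosinorAge itself is computed, and the function names are hypothetical.

```python
import math


def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (a[i][3] - tail) / a[i][i]
    return x


def fit_cosinor(hours, values, period=24.0):
    """Return (mesor, amplitude, peak_hour) from a least-squares cosinor fit."""
    omega = 2 * math.pi / period
    rows = [(1.0, math.cos(omega * t), math.sin(omega * t)) for t in hours]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, beta, gamma = solve_3x3(xtx, xty)
    amplitude = math.hypot(beta, gamma)
    peak_hour = (math.atan2(gamma, beta) / omega) % period
    return mesor, amplitude, peak_hour


# Synthetic 48 hours of activity sampled every 30 minutes, peaking at 15:00.
hours = [i * 0.5 for i in range(96)]
values = [10 + 4 * math.cos(2 * math.pi * (t - 15) / 24) for t in hours]

mesor, amplitude, peak_hour = fit_cosinor(hours, values)
print(f"mesor = {mesor:.2f}, amplitude = {amplitude:.2f}, "
      f"peak at {peak_hour:.2f} h")
```

In these terms, the 25-minute acrophase difference reported in the abstract corresponds to a shift of about 0.42 hours in `peak_hour`.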
223. The effectiveness of digital twins in promoting precision health across the entire population: a systematic review.
数字孪生在促进全民精准健康方面的有效性:系统评价。
PMID: 38831093 | DOI: 10.1038/s41746-024-01146-0 | 日期: 2024-06-03
摘要: Digital twins represent a promising technology within the domain of precision healthcare, offering significant prospects for individualized medical interventions. Existing systematic reviews, however, mainly focus on the technological dimensions of digital twins, with a limited exploration of their impact on health-related outcomes. Therefore, this systematic review aims to explore the efficacy of digital twins in improving precision healthcare at the population level. The literature search for this study encompassed PubMed, Embase, Web of Science, Cochrane Library, CINAHL, SinoMed, CNKI, and Wanfang Database to retrieve potentially relevant records. Patient health-related outcomes were synthesized employing quantitative content analysis, whereas the Joanna Briggs Institute (JBI) scales were used to evaluate the quality and potential bias inherent in each selected study. Following established inclusion and exclusion criteria, 12 studies were screened from an initial 1321 records for further analysis. These studies included patients with various conditions, including cancers, type 2 diabetes, multiple sclerosis, heart failure, qi deficiency, post-hepatectomy liver failure, and dental issues. The review coded three types of interventions: personalized health management, precision individual therapy effects, and predicting individual risk, leading to a total of 45 outcomes being measured. The collective effectiveness of these outcomes at the population level was calculated at 80% (36 out of 45). No studies exhibited unacceptable differences in quality. Overall, employing digital twins in precision health demonstrates practical advantages, warranting its expanded use to facilitate the transition from the development phase to broad application. PROSPERO registry: CRD42024507256.
中文摘要: 数字孪生是精准医疗领域一项有前景的技术,为个性化医疗干预提供了广阔的前景。然而,现有的系统评价主要集中在数字孪生的技术层面,而对其对健康相关结果的影响的探索有限。因此,本系统综述旨在探讨数字孪生在改善人群层面精准医疗保健方面的功效。本研究的文献检索包括 PubMed、Embase、Web of Science、Cochrane Library、CINAHL、SinoMed、CNKI 和万方数据库,以检索潜在的相关记录。采用定量内容分析综合患者健康相关结果,而乔安娜·布里格斯研究所 (JBI) 量表用于评估每项选定研究固有的质量和潜在偏差。根据既定的纳入和排除标准,从最初的 1321 项记录中筛选出 12 项研究进行进一步分析。这些研究纳入了患有各种疾病的患者,包括癌症、2型糖尿病、多发性硬化症、心力衰竭、气虚、肝切除术后肝衰竭和牙齿问题。该审查编码了三种类型的干预措施:个性化健康管理、精准个体治疗效果和预测个体风险,总共测量了 45 种结果。这些结果在人群层面的集体有效性经计算为 80%(45 项中的 36 项)。没有研究表现出不可接受的质量差异。总体而言,在精准健康中使用数字孪生显示出实际优势,保证了其广泛使用,以促进从开发阶段到广泛应用的过渡。PROSPERO 注册:CRD42024507256。
224. Towards automatic home-based sleep apnea estimation using deep learning.
使用深度学习自动估计家庭睡眠呼吸暂停。
PMID: 38824175 | DOI: 10.1038/s41746-024-01139-z | 日期: 2024-06-01
摘要: Apnea and hypopnea are common sleep disorders characterized by the obstruction of the airways. Polysomnography (PSG) is a sleep study typically used to compute the Apnea-Hypopnea Index (AHI), the number of times a person has apnea or certain types of hypopnea per hour of sleep, and diagnose the severity of the sleep disorder. Early detection and treatment of apnea can significantly reduce morbidity and mortality. However, long-term PSG monitoring is unfeasible as it is costly and uncomfortable for patients. To address these issues, we propose a method, named DRIVEN, to estimate AHI at home from wearable devices and detect when apnea, hypopnea, and periods of wakefulness occur throughout the night. The method can therefore assist physicians in diagnosing the severity of apneas. Patients can wear a single sensor or a combination of sensors that can be easily measured at home: abdominal movement, thoracic movement, or pulse oximetry. For example, using only two sensors, DRIVEN correctly classifies 72.4% of all test patients into one of the four AHI classes, with 99.3% either correctly classified or placed one class away from the true one. This is a reasonable trade-off between the model’s performance and the patient’s comfort. We use publicly available data from three large sleep studies with a total of 14,370 recordings. DRIVEN consists of a combination of deep convolutional neural networks and a light-gradient-boost machine for classification. It can be implemented for automatic estimation of AHI in unsupervised long-term home monitoring systems, reducing costs to healthcare systems and improving patient care.
中文摘要: 呼吸暂停和呼吸不足是常见的睡眠障碍,其特征是气道阻塞。多导睡眠图 (PSG) 是一项睡眠研究,通常用于计算呼吸暂停-呼吸不足指数 (AHI),即一个人每小时睡眠时出现呼吸暂停或某些类型呼吸不足的次数,并诊断睡眠障碍的严重程度。早期发现和治疗呼吸暂停可以显着降低发病率和死亡率。然而,长期 PSG 监测是不可行的,因为它成本高昂且对患者来说不舒服。为了解决这些问题,我们提出了一种名为 DRIVEN 的方法,通过可穿戴设备在家中估算 AHI,并检测整个晚上何时发生呼吸暂停、呼吸不足和清醒期。因此,该方法可以帮助医生诊断呼吸暂停的严重程度。患者可以佩戴单个传感器或多个传感器组合,以便在家中轻松测量:腹部运动、胸部运动或脉搏血氧饱和度。例如,仅使用两个传感器,DRIVEN 就可以将 72.4% 的测试患者正确分类为四个 AHI 类别之一,其中 99.3% 的患者要么正确分类,要么与真实类别相差一类。这是模型性能和患者舒适度之间的合理权衡。我们使用来自三项大型睡眠研究的公开数据,总共 14,370 条记录。 DRIVEN 由深度卷积神经网络和用于分类的光梯度提升机组合而成。它可以在无人监督的长期家庭监测系统中自动估计 AHI,从而降低医疗保健系统的成本并改善患者护理。
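Entry 224 defines the Apnea-Hypopnea Index (AHI) as events per hour of sleep and reports both exact and within-one-class agreement over four AHI classes. The sketch below shows that arithmetic. The class cut-offs of 5, 15 and 30 events per hour are the conventional adult severity thresholds and are an assumption here, since the abstract does not list the boundaries used by DRIVEN. The example values are invented.

```python
def ahi(num_events, sleep_hours):
    """Apnea-Hypopnea Index: apnea plus hypopnea events per hour of sleep."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return num_events / sleep_hours


def ahi_class(value, thresholds=(5, 15, 30)):
    """Map AHI to 0 (normal), 1 (mild), 2 (moderate) or 3 (severe).

    The default thresholds are assumed conventional cut-offs.
    """
    return sum(value >= t for t in thresholds)


def class_agreement(true_classes, predicted_classes):
    """Return (exact accuracy, share within one class of the true class)."""
    pairs = list(zip(true_classes, predicted_classes))
    exact = sum(t == p for t, p in pairs) / len(pairs)
    within_one = sum(abs(t - p) <= 1 for t, p in pairs) / len(pairs)
    return exact, within_one


# Invented example: 84 events over 7 hours of sleep.
night_ahi = ahi(84, 7)
print(night_ahi, ahi_class(night_ahi))

# Invented true and predicted classes for four patients.
exact, within_one = class_agreement([0, 1, 2, 3], [0, 2, 2, 1])
print(f"exact = {exact:.2f}, within one class = {within_one:.2f}")
```

The two figures returned by `class_agreement` correspond to the kinds of numbers quoted in the abstract (72.4% exact, 99.3% exact or one class away).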
225. Deep learning to assess microsatellite instability directly from histopathological whole slide images in endometrial cancer.
深度学习直接从子宫内膜癌的组织病理学全切片图像评估微卫星不稳定性。
PMID: 38811811 | DOI: 10.1038/s41746-024-01131-7 | 日期: 2024-05-29
摘要: Molecular classification, particularly microsatellite instability-high (MSI-H), has gained attention for immunotherapy in endometrial cancer (EC). MSI-H is associated with DNA mismatch repair defects and is a crucial treatment predictor. The NCCN guidelines recommend pembrolizumab and nivolumab for advanced or recurrent MSI-H/mismatch repair deficient (dMMR) EC. However, evaluating MSI in all cases is impractical due to time and cost constraints. To overcome this challenge, we present an effective and efficient deep learning-based model designed to accurately and rapidly assess MSI status of EC using H&E-stained whole slide images. Our framework was evaluated on a comprehensive dataset of gigapixel histopathology images of 529 patients from the Cancer Genome Atlas (TCGA). The experimental results show that the proposed method performs well in assessing MSI status, achieving an F-measure, accuracy, precision and sensitivity of 96%, 94%, 93% and 100%, respectively, for endometrioid carcinoma G1G2, and 87%, 84%, 81% and 94%, respectively, for endometrioid carcinoma G3. Furthermore, the proposed deep learning framework outperforms four state-of-the-art benchmarked methods by a significant margin (p < 0.001) in terms of accuracy, precision, sensitivity and F-measure. Additionally, a run time analysis demonstrates that the proposed method achieves excellent quantitative results with high efficiency in AI inference time (1.03 seconds per slide), making the proposed framework viable for practical clinical usage. These results highlight the efficacy and efficiency of the proposed model to assess MSI status of EC directly from histopathological slides.
中文摘要: 分子分类,特别是高微卫星不稳定性(MSI-H),在子宫内膜癌(EC)的免疫治疗中引起了人们的关注。 MSI-H 与 DNA 错配修复缺陷相关,是重要的治疗预测因子。 NCCN 指南推荐派姆单抗和纳武单抗用于晚期或复发性 MSI-H/错配修复缺陷 (dMMR) EC。然而,由于时间和成本的限制,评估所有情况下的 MSI 是不切实际的。为了克服这一挑战,我们提出了一种有效且高效的基于深度学习的模型,旨在使用 H&E 染色的全切片图像准确快速地评估 EC 的 MSI 状态。我们的框架在来自癌症基因组图谱 (TCGA) 的 529 名患者的十亿像素组织病理学图像综合数据集上进行了评估。实验结果表明,该方法在评估 MSI 状态方面表现良好:对子宫内膜样癌 G1G2,F 值、准确率、精确率和灵敏度分别为 96%、94%、93% 和 100%;对子宫内膜样癌 G3,分别为 87%、84%、81% 和 94%。此外,所提出的深度学习框架在准确率、精确率、灵敏度和 F 值方面均显著优于四种最先进的基准方法(p < 0.001)。此外,运行时分析表明,所提出的方法取得了优异的定量结果,并且人工智能推理时间高效(每张切片 1.03 秒),使得所提出的框架可用于实际临床使用。这些结果强调了所提出的模型直接从组织病理学切片评估 EC MSI 状态的功效和效率。
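The reported F-measure can be cross-checked against the reported precision and sensitivity. The sketch below assumes the F-measure is the usual F1 score (harmonic mean of precision and sensitivity); the abstract does not say which variant was used, and the function name is illustrative.

```python
def f_measure(precision: float, sensitivity: float) -> float:
    """F1 score: harmonic mean of precision and sensitivity (recall).

    Assumes the beta = 1 variant; the abstract does not specify this.
    """
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Figures quoted in the abstract (precision, sensitivity).
g1g2 = f_measure(0.93, 1.00)  # endometrioid carcinoma G1G2
g3 = f_measure(0.81, 0.94)    # endometrioid carcinoma G3
print(round(g1g2, 2), round(g3, 2))
```

Under that assumption the rounded values come out at 0.96 and 0.87, which agrees with the 96% and 87% F-measures quoted above.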
226. A wearable sensor and machine learning estimate step length in older adults and patients with neurological disorders.
可穿戴传感器和机器学习可估计老年人和神经系统疾病患者的步长。
PMID: 38796519 | DOI: 10.1038/s41746-024-01136-2 | 日期: 2024-05-25
摘要: Step length is an important diagnostic and prognostic measure of health and disease. Wearable devices can estimate step length continuously (e.g., in clinic or real-world settings); however, the accuracy of current estimation methods is not yet optimal. We developed machine-learning models to estimate step length based on data derived from a single lower-back inertial measurement unit worn by 472 young and older adults, including healthy controls and people with different neurological conditions such as Parkinson’s disease. Studying more than 80,000 steps, the best model showed high accuracy for a single step (root mean square error, RMSE = 6.08 cm, ICC(2,1) = 0.89) and higher accuracy when averaged over ten consecutive steps (RMSE = 4.79 cm, ICC(2,1) = 0.93), successfully reaching the predefined goal of an RMSE below 5 cm (often considered the minimal clinically important difference). Combining machine learning with a single wearable sensor generates accurate step length measures, even in patients with neurologic disease. Additional research may be needed to further reduce the errors in certain conditions.
中文摘要: 步长是健康和疾病的重要诊断和预后指标。可穿戴设备可以连续估计步长(例如,在临床或现实环境中),但是当前估计方法的准确性尚未达到最佳。我们开发了机器学习模型,根据来自 472 名患有不同神经系统疾病(包括帕金森病)和健康对照的年轻人和老年人佩戴的单个下背部惯性测量装置的数据来估计步长。通过研究 80,000 多个步骤,最佳模型显示出单步的高精度(均方根误差,RMSE = 6.08 cm,ICC(2,1) = 0.89),并且连续十个步骤的平均精度更高(RMSE = 4.79 cm,ICC(2,1) = 0.93),成功达到RMSE 低于 5cm 的预定义目标(通常被认为是最小临床重要差异)。将机器学习与单个可穿戴传感器相结合,即使对于患有神经系统疾病的患者,也能产生准确的步长测量。可能需要进行额外的研究以进一步减少某些条件下的错误。
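Why averaging over consecutive steps lowers the RMSE can be illustrated with a small sketch. It assumes non-overlapping blocks of steps, which the abstract does not specify, and uses made-up values rather than study data.

```python
from math import sqrt
from typing import List


def rmse(predicted: List[float], actual: List[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


def block_means(values: List[float], block_size: int) -> List[float]:
    """Average consecutive, non-overlapping blocks (an assumed scheme)."""
    usable = len(values) - len(values) % block_size
    return [
        sum(values[i:i + block_size]) / block_size
        for i in range(0, usable, block_size)
    ]


# Illustrative step lengths in cm: errors alternate between +6 and -6.
actual = [60.0, 60.0, 60.0, 60.0]
predicted = [66.0, 54.0, 66.0, 54.0]

single_step_error = rmse(predicted, actual)
averaged_error = rmse(block_means(predicted, 2), block_means(actual, 2))
```

In this toy case the per-step errors cancel entirely within each block; with real data the cancellation is only partial, which fits the more modest improvement reported (6.08 cm to 4.79 cm).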
227. The three-year evolution of Germany’s Digital Therapeutics reimbursement program and its path forward.
德国数字治疗报销计划的三年演变及其前进道路。
PMID: 38789620 | DOI: 10.1038/s41746-024-01137-1 | 日期: 2024-05-24
摘要: The 2019 German Digital Healthcare Act introduced the Digital Health Application program, known in German as ‘Digitale Gesundheitsanwendungen’ (DiGA). The program has established a pioneering model for integrating Digital Therapeutics (DTx) into a healthcare system with scalable and effective reimbursement strategies. To date, the continuous upward trend enabled by this framework has resulted in more than 374,000 DiGA prescriptions, increasingly cementing its role in the German healthcare system. This perspective provides a synthesis of the DiGA program’s evolution since its inception three years ago, highlighting trends regarding prescriptions and pricing as well as criticisms and identified shortcomings. It further discusses forthcoming legislative amendments, including the anticipated integration of higher-risk medical devices, which have the potential to significantly transform the program. Despite encountering challenges related to effectiveness, evidence requirements, and integration within the healthcare system, the DiGA program continues to evolve and serves as a seminal example for the integration of DTx, offering valuable insights for healthcare systems globally.
中文摘要: 2019 年德国数字医疗法案引入了数字健康应用计划,德语称为“Digitale Gesundheitsanwendungen”(DiGA)。该计划建立了一个开创性的模型,将数字治疗 (DTx) 集成到具有可扩展且有效的报销策略的医疗保健系统中。迄今为止,该框架带来的持续上升趋势已促成超过 374,000 份 DiGA 处方,日益巩固其在德国医疗保健系统中的作用。本观点文章综合了 DiGA 计划自三年前启动以来的演变,重点介绍了处方和定价方面的趋势以及批评和已发现的缺陷。文章进一步讨论了即将到来的立法修正案,包括预期纳入更高风险的医疗器械,这有可能显著改变该计划。尽管遇到了与有效性、证据要求和医疗保健系统内集成相关的挑战,DiGA 计划仍在不断发展,并成为 DTx 集成的开创性示例,为全球医疗保健系统提供了宝贵的见解。
228. A digital twin model incorporating generalized metabolic fluxes to identify and predict chronic kidney disease in type 2 diabetes mellitus.
结合广义代谢流的数字孪生模型,用于识别和预测 2 型糖尿病慢性肾病。
PMID: 38789510 | DOI: 10.1038/s41746-024-01108-6 | 日期: 2024-05-24
摘要: We have developed a digital twin-based CKD identification and prediction model that leverages generalized metabolic fluxes (GMF) for patients with Type 2 Diabetes Mellitus (T2DM). GMF digital twins utilized basic clinical and physiological biomarkers as inputs for identification and prediction of CKD. We employed four diverse multi-ethnic cohorts (n = 7072): a Singaporean cohort (EVAS, n = 289) and a North American cohort (NHANES, n = 1044) for baseline CKD identification, and two multi-center Singaporean cohorts (CDMD, n = 2119 and SDR, n = 3627) for 3-year CKD prediction and risk stratification. We subsequently conducted a comprehensive study utilizing a single dataset to evaluate the clinical utility of GMF for CKD prediction. The GMF-based identification model performed strongly, achieving an AUC between 0.80 and 0.82. In prediction, the GMF generated with complete parameters attained high performance with an AUC of 0.86, while with incomplete parameters, it achieved an AUC of 0.75. The GMF-based prediction model utilizing complete inputs is the standard implementation of our algorithm: HealthVector Diabetes®. We have established the GMF digital twin-based model as a robust clinical tool capable of predicting and stratifying the risk of future CKD within a 3-year time horizon. We report the correlation of GMF with basic input parameters, their ability to differentiate between future health states and medication status at baseline, and their capability to quantify CKD progression rates. This holistic methodology provides insights into patients’ health states and CKD progression rates based on GMF metabolic profile differences, enabling personalized care plans.
中文摘要: 我们开发了一种基于数字孪生的 CKD 识别和预测模型,该模型利用 2 型糖尿病 (T2DM) 患者的广义代谢通量 (GMF)。 GMF 数字双胞胎利用基本的临床和生理生物标志物作为 CKD 识别和预测的输入。我们采用了四个不同的多种族队列(n = 7072):一个新加坡队列(EVAS,n = 289)和一个北美队列(NHANES,n = 1044)用于基线 CKD 识别,两个多中心新加坡队列(CDMD,n = 2119 和 SDR,n = 3627)用于 3 年 CKD 预测和风险分层。我们随后利用单个数据集进行了一项全面的研究,以评估 GMF 在 CKD 预测中的临床效用。基于 GMF 的识别模型表现强劲,AUC 达到 0.80 至 0.82 之间。在预测中,使用完整参数生成的 GMF 获得了较高的性能,AUC 为 0.86,而使用不完整参数生成的 GMF 则获得了 0.75 的 AUC。利用完整输入的基于 GMF 的预测模型是我们算法的标准实现:HealthVector Diabetes®。我们建立了基于 GMF 数字孪生的模型,作为一种强大的临床工具,能够预测和分层 3 年内未来 CKD 的风险。我们报告了 GMF 与基本输入参数的相关性、它们区分未来健康状态和基线药物状态的能力,以及它们量化 CKD 进展率的能力。这种整体方法可以根据 GMF 代谢特征差异深入了解患者的健康状况和 CKD 进展率,从而实现个性化护理计划。
229. Association of physical activity pattern and risk of Parkinson’s disease.
体力活动模式与帕金森病风险的关联。
PMID: 38783073 | DOI: 10.1038/s41746-024-01135-3 | 日期: 2024-05-23
摘要: Increasing evidence suggests an association between exercise duration and Parkinson’s disease. However, no high-quality prospective evidence exists confirming whether differences exist between the two modes of exercise, weekend warrior and equal distribution of exercise duration, and Parkinson’s risk. Hence, this study aimed to explore the association between different exercise patterns and Parkinson’s risk using exercise data from the UK Biobank. The study analyzed data from 89,400 UK Biobank participants without Parkinson’s disease. Exercise data were collected using the Axivity AX3 wrist-worn triaxial accelerometer. Participants were categorized into three groups: inactive, regularly active, and engaged in the weekend warrior (WW) pattern. The relationship between these exercise patterns and Parkinson’s risk was assessed using a multifactorial Cox model. During a mean follow-up of 12.32 years, 329 individuals developed Parkinson’s disease. In a multifactorial Cox model, using the World Health Organization-recommended threshold of 150 min of moderate-to-vigorous physical activity per week, both the active WW group [hazard ratio (HR) = 0.58; 95% confidence interval (CI) = 0.43-0.78; P < 0.001] and the active regular group (HR = 0.44; 95% CI = 0.34-0.57; P < 0.001) exhibited a lower risk of developing Parkinson’s disease compared with the inactive group. Further, no statistically significant difference was observed between the active WW and the active regular groups (HR = 0.77; 95% CI = 0.56-1.05; P = 0.099). In conclusion, in this cohort study, both the WW exercise pattern and an equal distribution of exercise hours were equally effective in reducing Parkinson’s risk.
中文摘要: 越来越多的证据表明运动时间与帕金森病之间存在关联。然而,没有高质量的前瞻性证据来证实两种运动方式(周末勇士和平均分配运动时间)以及帕金森病风险之间是否存在差异。因此,本研究旨在利用英国生物银行的运动数据探讨不同运动模式与帕金森病风险之间的关联。该研究分析了 89,400 名没有帕金森病的英国生物银行参与者的数据。使用 Axivity AX3 腕戴式三轴加速度计收集运动数据。参与者被分为三组:不活跃、经常活跃和参与周末战士 (WW) 模式。使用多因素 Cox 模型评估这些运动模式与帕金森病风险之间的关系。在平均 12.32 年的随访期间,329 人患上了帕金森病。在多因素 Cox 模型中,使用世界卫生组织建议的每周 150 分钟中度至剧烈体力活动阈值,活跃 WW 组[风险比 (HR) = 0.58; 95%置信区间(CI) = 0.43-0.78; P < 0.001]和经常活动组(HR = 0.44;95% CI= 0.34-0.57;P < 0.001)与不活动组相比,患帕金森病的风险较低。此外,活跃WW组和活跃常规组之间没有观察到统计学上的显着差异(HR = 0.77;95% CI = 0.56-1.05;P = 0.099)。总之,在这项队列研究中,WW 运动模式和平均分配运动时间对于降低帕金森病风险同样有效。
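The three-group categorization can be sketched as follows. The 150 min/week threshold comes from the abstract; the rule that a weekend warrior accrues at least half of weekly activity on the one or two most active days is an assumption borrowed from common usage and is not stated here. Names and defaults are illustrative.

```python
from typing import List


def classify_pattern(
    daily_mvpa_minutes: List[float],
    weekly_threshold: float = 150.0,  # WHO-recommended threshold (from the abstract)
    ww_share: float = 0.5,            # ASSUMED share defining the WW pattern
) -> str:
    """Classify one week of moderate-to-vigorous activity minutes."""
    if len(daily_mvpa_minutes) != 7:
        raise ValueError("expected exactly seven daily values")
    total = sum(daily_mvpa_minutes)
    if total < weekly_threshold:
        return "inactive"
    # Minutes accrued on the two most active days of the week.
    top_two = sum(sorted(daily_mvpa_minutes, reverse=True)[:2])
    if top_two >= ww_share * total:
        return "weekend warrior"
    return "regularly active"
```

For example, 90 minutes on each of two days and none otherwise meets the weekly threshold and falls in the weekend warrior group, while 30 minutes every day is classed as regularly active.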
230. Evaluation of stenoses using AI video models applied to coronary angiography.
使用应用于冠状动脉造影的 AI 视频模型评估狭窄。
PMID: 38783037 | DOI: 10.1038/s41746-024-01134-4 | 日期: 2024-05-23
摘要: The coronary angiogram is the gold standard for evaluating the severity of coronary artery disease stenoses. Presently, the assessment is conducted visually by cardiologists, a method that lacks standardization. This study introduces DeepCoro, a ground-breaking AI-driven pipeline that integrates advanced vessel tracking and a video-based Swin3D model that was trained and validated on a dataset comprised of 182,418 coronary angiography videos spanning 5 years. DeepCoro achieved a notable precision of 71.89% in identifying coronary artery segments and demonstrated a mean absolute error of 20.15% (95% CI: 19.88-20.40) and a classification AUROC of 0.8294 (95% CI: 0.8215-0.8373) in stenosis percentage prediction compared to traditional cardiologist assessments. When compared to two expert interventional cardiologists, DeepCoro achieved lower variability than the clinical reports (19.09%; 95% CI: 18.55-19.58 vs 21.00%; 95% CI: 20.20-21.76, respectively). In addition, DeepCoro can be fine-tuned to a different modality type. When fine-tuned on quantitative coronary angiography assessments, DeepCoro attained an even lower mean absolute error of 7.75% (95% CI: 7.37-8.07), underscoring the reduced variability inherent to this method. This study establishes DeepCoro as an innovative video-based, adaptable tool in coronary artery disease analysis, significantly enhancing the precision and reliability of stenosis assessment.
中文摘要: 冠状动脉造影是评估冠状动脉疾病狭窄严重程度的金标准。目前,评估是由心脏病专家通过视觉进行,这种方法缺乏标准化。本研究介绍了 DeepCoro,这是一种突破性的人工智能驱动管道,集成了先进的血管跟踪和基于视频的 Swin3D 模型,该模型在由 5 年 182,418 个冠状动脉造影视频组成的数据集上进行了训练和验证。与传统心脏病专家评估相比,DeepCoro 在识别冠状动脉节段方面实现了 71.89% 的显着精度,并在狭窄百分比预测方面显示出平均绝对误差为 20.15% (95% CI: 19.88-20.40) 和分类 AUROC 为 0.8294 (95% CI: 0.8215-0.8373)。与两位介入心脏病专家相比,DeepCoro 的变异性低于临床报告(分别为 19.09%;95% CI:18.55-19.58 vs 21.00%;95% CI:20.20-21.76)。此外,DeepCoro 可以针对不同的模态类型进行微调。当对定量冠状动脉造影评估进行微调时,DeepCoro 的平均绝对误差甚至更低,为 7.75%(95% CI:7.37-8.07),强调了该方法固有的变异性降低。这项研究将 DeepCoro 确立为冠状动脉疾病分析中基于视频的创新型适应性工具,显着提高了狭窄评估的精确度和可靠性。
231. Early adverse physiological event detection using commercial wearables: challenges and opportunities.
使用商业可穿戴设备进行早期不良生理事件检测:挑战和机遇。
PMID: 38783001 | DOI: 10.1038/s41746-024-01129-1 | 日期: 2024-05-23
摘要: Data from commercial off-the-shelf (COTS) wearables leveraged with machine learning algorithms provide an unprecedented potential for the early detection of adverse physiological events. However, several challenges inhibit this potential, including (1) heterogeneity among and within participants that make scaling detection algorithms to a general population less precise, (2) confounders that lead to incorrect assumptions regarding a participant’s healthy state, (3) noise in the data at the sensor level that limits the sensitivity of detection algorithms, and (4) imprecision in self-reported labels that misrepresent the true data values associated with a given physiological event. The goal of this study was two-fold: (1) to characterize the performance of such algorithms in the presence of these challenges and provide insights to researchers on limitations and opportunities, and (2) to subsequently devise algorithms to address each challenge and offer insights on future opportunities for advancement. Our proposed algorithms include techniques that build on determining suitable baselines for each participant to capture important physiological changes and label correction techniques as it pertains to participant-reported identifiers. Our work is validated on potentially one of the largest datasets available, obtained with 8000+ participants and 1.3+ million hours of wearable data captured from Oura smart rings. Leveraging this extensive dataset, we achieve pre-symptomatic detection of COVID-19 with a performance receiver operator characteristic (ROC) area under the curve (AUC) of 0.725 without correction techniques, 0.739 with baseline correction, 0.740 with baseline correction and label correction on the training set, and 0.777 with baseline correction and label correction on both the training and the test set. Using the same respective paradigms, we achieve ROC AUCs of 0.919, 0.938, 0.943 and 0.994 for the detection of self-reported fever, and 0.574, 0.611, 0.601, and 0.635 for detection of self-reported shortness of breath. These techniques offer improvements across almost all metrics and events, including PR AUC, sensitivity at 75% specificity, and precision at 75% recall. The ring allows continuous monitoring for detection of event onset, and we further demonstrate an improvement in the early detection of COVID-19 from an average of 3.5 days to an average of 4.1 days before a reported positive test result.
中文摘要: 来自商用现成 (COTS) 可穿戴设备的数据与机器学习算法相结合,为不良生理事件的早期检测提供了前所未有的潜力。然而,一些挑战抑制了这种潜力,包括(1)参与者之间和内部的异质性,使得将检测算法扩展到一般人群不太精确,(2)混杂因素导致对参与者健康状态的错误假设,(3)传感器级别数据中的噪声限制了检测算法的灵敏度,以及(4)自我报告标签的不精确性,歪曲了与给定生理事件相关的真实数据值。这项研究的目标有两个:(1) 表征此类算法在面临这些挑战时的性能,并为研究人员提供有关局限性和机遇的见解;(2) 随后设计算法来应对每个挑战,并提供有关未来发展机会的见解。我们提出的算法包括为每个参与者确定合适的基线以捕获重要的生理变化和标签校正技术(因为它与参与者报告的标识符相关)的技术。我们的工作在可能是最大的可用数据集之一上进行了验证,该数据集由 8000 多名参与者和从 Oura 智能戒指捕获的 130 万小时以上的可穿戴数据获得。利用这个广泛的数据集,我们实现了 COVID-19 的症状前检测,在没有校正技术的情况下,性能接收者操作特征 (ROC) 曲线下面积 (AUC) 为 0.725,在使用基线校正时为 0.739,在训练集上使用基线校正和标签校正时为 0.740,在训练集和测试集上使用基线校正和标签校正时为 0.777。使用相同的各自范例,我们在检测自我报告的发烧时实现了 0.919、0.938、0.943 和 0.994 的 ROC AUC,在检测自我报告的呼吸急促时实现了 0.574、0.611、0.601 和 0.635。这些技术提供了几乎所有指标和事件的改进,包括 PR AUC、75% 特异性的灵敏度和 75% 召回率的精确度。该环可以持续监测以检测事件发生,并且我们进一步证明了在报告阳性检测结果之前,对 COVID-19 的早期检测从平均 3.5 天提高到平均 4.1 天。
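One way to picture per-participant baseline correction is to express each reading as a deviation from that participant's own recent history. The sketch below is not the authors' algorithm: the trailing window, its 14-day default, and the use of a median are all assumptions made for illustration.

```python
from statistics import median
from typing import List, Optional


def baseline_deviation(
    values: List[float], window: int = 14  # window length is an assumption
) -> List[Optional[float]]:
    """Deviation of each reading from the median of the preceding readings.

    Returns None where no history is available yet.
    """
    deviations: List[Optional[float]] = []
    for i, value in enumerate(values):
        history = values[max(0, i - window):i]
        if not history:
            deviations.append(None)
        else:
            deviations.append(value - median(history))
    return deviations


# Illustrative nightly temperature readings for one participant.
nightly_temperature = [36.5, 36.5, 36.5, 37.5]
print(baseline_deviation(nightly_temperature, window=3))
```

Working with deviations rather than raw values lets a single detection threshold apply across participants whose resting values differ, which is the heterogeneity problem the abstract lists first.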
232. The EU passes the AI Act and its implications for digital medicine are unclear.
欧盟通过了人工智能法案,其对数字医学的影响尚不清楚。
PMID: 38778162 | DOI: 10.1038/s41746-024-01116-6 | 日期: 2024-05-22
摘要: On 13 March 2024, the much-anticipated AI Act was passed by the EU parliament and will soon be adopted as EU law. It will apply new requirements for developers and deployers of AI-enabled digital health tools (DHTs), including for a defined class of high-risk AI systems and for general-purpose AI. Although the text of the law is available, complete in all but the final checks of legal wording, much is still not known about how the AI Act will affect the digital health landscape in the EU and beyond. The wording of many aspects of the Act is ambiguous, and often high-level objectives are stated, with the detail to come later in associated guidance, standards, and member state law and policy. It is also uncertain how the Act will intersect with pre-existing sector-specific legislation for medical AI. There are future steps in the legislative process that can clarify ambiguity, including standards, guidelines, and implementing laws, and the author remains optimistic the EU will get the implementation right.
中文摘要: 2024 年 3 月 13 日,万众期待的《人工智能法案》在欧盟议会获得通过,并将很快被采纳为欧盟法律。它将对人工智能数字健康工具(DHT)的开发者和部署者提出新的要求,包括针对特定类别的高风险人工智能系统和通用人工智能。尽管该法律的文本已经全部完成,但法律措辞的最终检查仍不清楚人工智能法案将如何影响欧盟及其他地区的数字健康格局。该法案的许多方面的措辞都含糊不清,并且通常阐述了高层目标,详细信息将在稍后的相关指南、标准以及成员国法律和政策中提供。目前还不确定该法案将如何与现有的医疗人工智能行业特定立法相交叉。立法过程中的未来步骤可以澄清模糊性,包括标准、指南和实施法律,作者对欧盟将得到正确的实施保持乐观。
233. Generalization-a key challenge for responsible AI in patient-facing clinical applications.
泛化——负责任的人工智能在面向患者的临床应用中的一个关键挑战。
PMID: 38773304 | DOI: 10.1038/s41746-024-01127-3 | 日期: 2024-05-21
摘要: Generalization – the ability of AI systems to apply and/or extrapolate their knowledge to new data which might differ from the original training data – is a major challenge for the effective and responsible implementation of human-centric AI applications. Current debate in bioethics proposes selective prediction as a solution. Here we explore data-based reasons for generalization challenges and look at how selective predictions might be implemented technically, focusing on clinical AI applications in real-world healthcare settings.
中文摘要: 泛化——人工智能系统将其知识应用和/或推断到可能与原始训练数据不同的新数据的能力——是以人为本的人工智能应用的有效和负责任的实施的一个主要挑战。当前生物伦理学的争论提出选择性预测作为解决方案。在这里,我们探讨基于数据的泛化挑战的原因,并研究如何在技术上实施选择性预测,重点关注现实世界医疗保健环境中的临床人工智能应用。
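Selective prediction, in its simplest form, means the model abstains when it is not confident enough and defers the case to a human. The sketch below uses a plain confidence threshold; the abstract does not describe a specific mechanism, so the threshold rule, names, and example labels are assumptions.

```python
from typing import Dict, List, Optional


def predict_or_abstain(
    class_probabilities: Dict[str, float], threshold: float
) -> Optional[str]:
    """Return the top class if its probability meets the threshold, else None."""
    label = max(class_probabilities, key=class_probabilities.get)
    if class_probabilities[label] >= threshold:
        return label
    return None  # abstain: route the case to a clinician


def coverage(decisions: List[Optional[str]]) -> float:
    """Fraction of cases on which the model actually made a prediction."""
    if not decisions:
        return 0.0
    return sum(d is not None for d in decisions) / len(decisions)


cases = [
    {"benign": 0.90, "malignant": 0.10},
    {"benign": 0.55, "malignant": 0.45},
]
decisions = [predict_or_abstain(c, threshold=0.8) for c in cases]
```

Raising the threshold trades coverage for reliability: fewer cases receive an automated answer, but those that do are the ones the model is most confident about.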
234. Virtual fitness buddy ecosystem: a mixed reality precision health physical activity intervention for children.
虚拟健身伙伴生态系统:针对儿童的混合现实精准健康体育活动干预。
PMID: 38773297 | DOI: 10.1038/s41746-024-01133-5 | 日期: 2024-05-21
摘要: 6-11-year-old children provide a critical window for physical activity (PA) interventions. The Virtual Fitness Buddy (VFB) ecosystem is a precision health PA intervention for children integrating mixed reality technology to connect people and devices. A cluster randomized, controlled trial was conducted across 19 afterschool sites over two 6-month cohorts to test its efficacy in increasing PA and decreasing sedentary behavior. In the treatment group, a custom virtual dog via a mixed reality kiosk helped children set PA goals while sharing progress with parents to receive feedback and support. Children in the control group set PA goals using a computer without support from the virtual dog or parents. A total of 303 children had 8+ hours of PA data on at least one day of each of the 3 intervention time intervals. Conversion of sedentary time was primarily to light-intensity PA and was stronger for children with low baseline moderate-to-vigorous PA than for children above 45 min of baseline moderate-to-vigorous PA. Findings suggest that the VFB ecosystem can promote sustainable PA in children and may be rapidly diffused for widespread public health impact.
中文摘要: 6-11岁儿童是体力活动(PA)干预的关键窗口期。 Virtual Fitness Buddy 生态系统是一种针对儿童的精准健康 PA 干预措施,集成了混合现实技术以连接人和设备。在 19 个课外地点进行了一项为期 6 个月的整群随机对照试验,以测试其在增加 PA 和减少久坐行为方面的功效。在治疗组中,定制的虚拟狗通过混合现实信息亭帮助孩子们设定 PA 目标,同时与父母分享进展以获得反馈和支持。对照组的孩子在没有虚拟狗或父母支持的情况下使用电脑设定 PA 目标。 303 名儿童在 3 个干预时间间隔中的每一天至少有 1 天有 8 小时以上的 PA 数据。久坐时间的转换主要是向轻强度 PA 的转化,对于基线中度至强度 PA 较低的儿童,比基线中度至强度 PA 高于 45 分钟的儿童效果最强。研究结果表明,VFB 生态系统可以促进儿童的可持续 PA,并可能迅速扩散,产生广泛的公共卫生影响。
235. Self-supervised learning of accelerometer data provides new insights for sleep and its association with mortality.
加速度计数据的自我监督学习为睡眠及其与死亡率的关系提供了新的见解。
PMID: 38769347 | DOI: 10.1038/s41746-024-01065-0 | 日期: 2024-05-20
摘要: Sleep is essential to life. Accurate measurement and classification of sleep/wake and sleep stages is important in clinical studies for sleep disorder diagnoses and in the interpretation of data from consumer devices for monitoring physical and mental well-being. Existing non-polysomnography sleep classification techniques mainly rely on heuristic methods developed in relatively small cohorts. Thus, we aimed to establish the accuracy of wrist-worn accelerometers for sleep stage classification and subsequently describe the association between sleep duration and efficiency (proportion of total time asleep when in bed) with mortality outcomes. We developed a self-supervised deep neural network for sleep stage classification using concurrent laboratory-based polysomnography and accelerometry. After exclusion, 1448 participant nights of data were used for training. The difference between polysomnography and the model classifications on the external validation was 34.7 min (95% limits of agreement (LoA): -37.8-107.2 min) for total sleep duration, 2.6 min for REM duration (95% LoA: -68.4-73.4 min) and 32.1 min (95% LoA: -54.4-118.5 min) for NREM duration. The sleep classifier was deployed in the UK Biobank with 100,000 participants to study the association of sleep duration and sleep efficiency with all-cause mortality. Among 66,214 UK Biobank participants, 1642 mortality events were observed. Short sleepers (<6 h) had a higher risk of mortality compared to participants with normal sleep duration of 6-7.9 h, regardless of whether they had low sleep efficiency (Hazard ratios (HRs): 1.58; 95% confidence intervals (CIs): 1.19-2.11) or high sleep efficiency (HRs: 1.45; 95% CIs: 1.16-1.81). Deep-learning-based sleep classification using accelerometers has a fair to moderate agreement with polysomnography. Our findings suggest that having short overnight sleep confers mortality risk irrespective of sleep continuity.
中文摘要: 睡眠对于生命来说至关重要。睡眠/觉醒和睡眠阶段的准确测量和分类对于睡眠障碍诊断的临床研究以及解读用于监测身心健康的消费设备数据非常重要。现有的非多导睡眠图睡眠分类技术主要依赖于在相对较小的队列中开发的启发式方法。因此,我们的目的是确定腕戴式加速度计用于睡眠阶段分类的准确性,并随后描述睡眠持续时间和效率(在床时间中睡眠时间所占比例)与死亡率结果之间的关联。我们开发了一种自监督深度神经网络,使用同步采集的实验室多导睡眠图和加速度测量数据进行睡眠阶段分类。排除后,1448 个参与者夜晚的数据被用于训练。在外部验证中,多导睡眠图与模型分类之间的差异为:总睡眠持续时间 34.7 分钟(95% 一致性界限 (LoA):-37.8-107.2 分钟),快速眼动 (REM) 持续时间 2.6 分钟(95% LoA:-68.4-73.4 分钟),非快速眼动 (NREM) 持续时间 32.1 分钟(95% LoA:-54.4-118.5 分钟)。睡眠分类器被部署于拥有 100,000 名参与者的英国生物银行,以研究睡眠持续时间和睡眠效率与全因死亡率之间的关系。在 66,214 名英国生物银行参与者中,观察到 1642 起死亡事件。与正常睡眠时间为 6-7.9 小时的参与者相比,短睡眠者(<6 小时)的死亡风险较高,无论他们的睡眠效率是低(风险比 (HR):1.58;95% 置信区间 (CI):1.19-2.11)还是高(HR:1.45;95% CI:1.16-1.81)。使用加速度计的基于深度学习的睡眠分类与多导睡眠图具有尚可至中等的一致性。我们的研究结果表明,无论睡眠连续性如何,夜间睡眠时间短都会带来死亡风险。
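Two quantities in this abstract have simple closed forms: sleep efficiency (time asleep divided by time in bed) and the 95% limits of agreement. The sketch below assumes the limits follow the usual Bland-Altman convention of mean difference ± 1.96 standard deviations; the abstract does not name the method, and the function names are illustrative.

```python
from statistics import mean, stdev
from typing import List, Tuple


def sleep_efficiency(minutes_asleep: float, minutes_in_bed: float) -> float:
    """Proportion of time in bed that was spent asleep."""
    if minutes_in_bed <= 0:
        raise ValueError("minutes_in_bed must be positive")
    return minutes_asleep / minutes_in_bed


def limits_of_agreement(differences: List[float]) -> Tuple[float, float, float]:
    """Return (bias, lower, upper) as mean difference ± 1.96 sample SD.

    Assumes the Bland-Altman convention, which is not stated in the abstract.
    """
    if len(differences) < 2:
        raise ValueError("need at least two paired differences")
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread
```

The reported total-sleep figures are consistent with this form: the interval from -37.8 to 107.2 min is symmetric around the 34.7 min mean difference, with a half-width of 72.5 min.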
236. Predicting recurrent chat contact in a psychological intervention for the youth using natural language processing.
使用自然语言处理预测青少年心理干预中的反复聊天接触。
PMID: 38762694 | DOI: 10.1038/s41746-024-01121-9 | 日期: 2024-05-18
摘要: Chat-based counseling hotlines emerged as a promising low-threshold intervention for youth mental health. However, despite the resulting availability of large text corpora, little work has investigated Natural Language Processing (NLP) applications within this setting. Therefore, this preregistered approach (OSF: XA4PN) utilizes a sample of approximately 19,000 children and young adults that received a chat consultation from a 24/7 crisis service in Germany. Around 800,000 messages were used to predict whether chatters would contact the service again, as this would allow the provision of or redirection to additional treatment. We trained an XGBoost Classifier on the words of the anonymized conversations, using repeated cross-validation and Bayesian optimization for hyperparameter search. The best model was able to achieve an AUROC score of 0.68 (p < 0.01) on the previously unseen 3942 newest consultations. A Shapley-based explainability approach revealed that words indicating younger age or female gender and terms related to self-harm and suicidal thoughts were associated with a higher chance of recontacting. We conclude that NLP-based predictions of recurrent contact are a promising path toward personalized care at chat hotlines.
中文摘要: 基于聊天的咨询热线成为青少年心理健康的一种有希望的低门槛干预措施。然而,尽管大型文本语料库变得可用,但在此背景下研究自然语言处理(NLP)应用的工作却很少。因此,这种预先注册的方法(OSF:XA4PN)利用了大约 19,000 名儿童和年轻人的样本,这些儿童和年轻人接受了德国 24/7 危机服务机构的聊天咨询。大约 800,000 条消息被用来预测聊天者是否会再次联系该服务,因为这将允许提供或重定向到额外的治疗。我们使用重复交叉验证和贝叶斯优化进行超参数搜索,基于匿名对话中的词语训练 XGBoost 分类器。最佳模型在之前未见的 3942 次最新咨询中获得了 0.68 的 AUROC 分数(p < 0.01)。基于 Shapley 值的可解释性方法显示,表明年龄较小或女性的词语以及与自残和自杀想法相关的术语与更高的重新联系机会相关。我们的结论是,基于 NLP 的再次联系预测是在聊天热线实现个性化护理的一条有希望的途径。
237. A validated web-application (GFDC) for automatic classification of glaucomatous visual field defects using Hodapp-Parrish-Anderson criteria.
经验证的网络应用程序 (GFDC),用于使用 Hodapp-Parrish-Anderson 标准对青光眼视野缺陷进行自动分类。
PMID: 38762669 | DOI: 10.1038/s41746-024-01122-8 | 日期: 2024-05-18
摘要: Subjectivity and ambiguity of visual field classification limit the accuracy and reliability of glaucoma diagnosis, prognostication, and management decisions. Standardised rules for classifying glaucomatous visual field defects exist, but these are labour-intensive and therefore impractical for day-to-day clinical work. Here a web-application, Glaucoma Field Defect Classifier (GFDC), for automatic application of the Hodapp-Parrish-Anderson criteria, is presented and validated in a cross-sectional study. GFDC exhibits perfect accuracy in classifying mild, moderate, and severe glaucomatous field defects. GFDC may thereby improve the accuracy and fairness of clinical decision-making in glaucoma. The application and its source code are freely hosted online for clinicians and researchers to use with glaucoma patients.
中文摘要: 视野分类的主观性和模糊性限制了青光眼诊断、预测和管理决策的准确性和可靠性。存在对青光眼视野缺陷进行分类的标准化规则,但这些规则是劳动密集型的,因此对于日常临床工作来说不切实际。这里提出了一个 Web 应用程序,即青光眼视野缺陷分类器 (GFDC),用于自动应用 Hodapp-Parrish-Anderson,并在横断面研究中进行了验证。 GFDC 在分类轻度、中度和重度青光眼视野缺陷方面表现出完美的准确性。 GFDC 从而可以提高青光眼临床决策的准确性和公平性。该应用程序及其源代码免费在线托管,供临床医生和研究人员用于治疗青光眼患者。
238. Development and validation of machine learning algorithms based on electrocardiograms for cardiovascular diagnoses at the population level.
基于心电图的机器学习算法的开发和验证,用于人群层面的心血管诊断。
PMID: 38762623 | DOI: 10.1038/s41746-024-01130-8 | 日期: 2024-05-18
摘要: Artificial intelligence-enabled electrocardiogram (ECG) algorithms are gaining prominence for the early detection of cardiovascular (CV) conditions, including those not traditionally associated with conventional ECG measures or expert interpretation. This study develops and validates such models for simultaneous prediction of 15 different common CV diagnoses at the population level. We conducted a retrospective study that included 1,605,268 ECGs of 244,077 adult patients presenting to 84 emergency departments or hospitals, who underwent at least one 12-lead ECG from February 2007 to April 2020 in Alberta, Canada, and considered 15 CV diagnoses, as identified by International Classification of Diseases, 10th revision (ICD-10) codes: atrial fibrillation (AF), supraventricular tachycardia (SVT), ventricular tachycardia (VT), cardiac arrest (CA), atrioventricular block (AVB), unstable angina (UA), ST-elevation myocardial infarction (STEMI), non-STEMI (NSTEMI), pulmonary embolism (PE), hypertrophic cardiomyopathy (HCM), aortic stenosis (AS), mitral valve prolapse (MVP), mitral valve stenosis (MS), pulmonary hypertension (PHTN), and heart failure (HF). We employed ResNet-based deep learning (DL) using ECG tracings and extreme gradient boosting (XGB) using ECG measurements. When evaluated on the first ECGs per episode of 97,631 holdout patients, the DL models had an area under the receiver operating characteristic curve (AUROC) of <80% for 3 CV conditions (PE, SVT, UA), 80-90% for 8 CV conditions (CA, NSTEMI, VT, MVP, PHTN, AS, AF, HF) and an AUROC > 90% for 4 diagnoses (AVB, HCM, MS, STEMI). DL models outperformed XGB models with about 5% higher AUROC on average. Overall, ECG-based prediction models demonstrated good-to-excellent prediction performance in diagnosing common CV conditions.
中文摘要: 支持人工智能的心电图 (ECG) 算法在心血管 (CV) 疾病的早期检测方面越来越受到重视,包括那些传统上与传统心电图测量或专家解释无关的疾病。本研究开发并验证了此类模型,可在人群水平上同时预测 15 种不同的常见 CV 诊断。我们进行了一项回顾性研究,纳入了 244,077 名成年患者的 1,605,268 份心电图,这些患者于 2007 年 2 月至 2020 年 4 月在加拿大艾伯塔省的 84 个急诊科或医院接受了至少一次 12 导联心电图检查,并考虑了国际疾病分类第十版 (ICD-10) 代码确定的 15 种心血管诊断:心房颤动 (AF)、室上性心动过速 (SVT)、室性心动过速 (VT)、心脏骤停 (CA)、房室传导阻滞 (AVB)、不稳定型心绞痛 (UA)、ST 抬高型心肌梗死 (STEMI)、非 STEMI (NSTEMI)、肺栓塞 (PE)、肥厚性心肌病 (HCM)、主动脉瓣狭窄 (AS)、二尖瓣脱垂 (MVP)、二尖瓣狭窄 (MS)、肺动脉高压 (PHTN) 和心力衰竭 (HF)。我们采用基于 ResNet 的深度学习 (DL)(使用心电图波形)和极限梯度提升 (XGB)(使用心电图测量值)。当对 97,631 名留出集患者每次就诊的首份心电图进行评估时,DL 模型的受试者工作特征曲线下面积 (AUROC) 对于 3 种 CV 疾病(PE、SVT、UA)为 <80%,对于 8 种 CV 疾病(CA、NSTEMI、VT、MVP、PHTN、AS、AF、HF)为 80-90%,对于 4 种诊断(AVB、HCM、MS、STEMI)为 >90%。 DL 模型的 AUROC 平均高出约 5%,优于 XGB 模型。总体而言,基于心电图的预测模型在诊断常见的 CV 疾病方面表现出良好至优秀的预测性能。
239. StrokeClassifier: ischemic stroke etiology classification by ensemble consensus modeling using electronic health records.
StrokeClassifier:使用电子健康记录通过整体共识模型对缺血性中风病因进行分类。
PMID: 38760474 | DOI: 10.1038/s41746-024-01120-w | 日期: 2024-05-17
摘要: Determining acute ischemic stroke (AIS) etiology is fundamental to secondary stroke prevention efforts but can be diagnostically challenging. We trained and validated an automated classification tool, StrokeClassifier, using electronic health record (EHR) text from 2039 non-cryptogenic AIS patients at 2 academic hospitals to predict the 4-level outcome of stroke etiology adjudicated by agreement of at least 2 board-certified vascular neurologists’ review of the EHR. StrokeClassifier is an ensemble consensus meta-model of 9 machine learning classifiers applied to features extracted from discharge summary texts by natural language processing. StrokeClassifier was externally validated in 406 discharge summaries from the MIMIC-III dataset reviewed by a vascular neurologist to ascertain stroke etiology. Compared with vascular neurologists’ diagnoses, StrokeClassifier achieved the mean cross-validated accuracy of 0.74 and weighted F1 of 0.74 for multi-class classification. In MIMIC-III, its accuracy and weighted F1 were 0.70 and 0.71, respectively. In binary classification, the two metrics ranged from 0.77 to 0.96. The top 5 features contributing to stroke etiology prediction were atrial fibrillation, age, middle cerebral artery occlusion, internal carotid artery occlusion, and frontal stroke location. We designed a certainty heuristic to grade the confidence of StrokeClassifier’s diagnosis as non-cryptogenic by the degree of consensus among the 9 classifiers and applied it to 788 cryptogenic patients, reducing cryptogenic diagnoses from 25.2% to 7.2%. StrokeClassifier is a validated artificial intelligence tool that rivals the performance of vascular neurologists in classifying ischemic stroke etiology. With further training, StrokeClassifier may have downstream applications including its use as a clinical decision support system.
中文摘要: 确定急性缺血性中风(AIS)病因是中风二级预防工作的基础,但诊断上可能具有挑战性。我们使用来自 2 家学术医院 2039 名非隐源性 AIS 患者的电子健康记录 (EHR) 文本来训练和验证自动分类工具 StrokeClassifier,以预测中风病因的 4 级结果,该结果由至少 2 名经过委员会认证的血管神经科医生对 EHR 的审查达成一致。 StrokeClassifier 是 9 个机器学习分类器的集成共识元模型,应用于通过自然语言处理从出院摘要文本中提取的特征。 StrokeClassifier 在来自 MIMIC-III 数据集的 406 份出院摘要中进行了外部验证,并由血管神经科医生审查,以确定中风病因。与血管神经科医生的诊断相比,StrokeClassifier 的多类分类平均交叉验证准确度为 0.74,加权 F1 为 0.74。在MIMIC-III中,其准确度和加权F1分别为0.70和0.71。在二元分类中,这两个指标的范围为 0.77 到 0.96。对中风病因预测贡献最大的 5 个特征是房颤、年龄、大脑中动脉闭塞、颈内动脉闭塞和额叶中风位置。我们设计了一种确定性启发法,根据 9 个分类器之间的共识程度将 StrokeClassifier 诊断的置信度分级为非隐源性,并将其应用于 788 名隐源性患者,将隐源性诊断从 25.2% 减少到 7.2%。 StrokeClassifier 是一种经过验证的人工智能工具,在对缺血性中风病因进行分类方面可与血管神经科医生的表现相媲美。通过进一步的培训,StrokeClassifier 可能具有下游应用,包括用作临床决策支持系统。
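The idea of grading confidence by the degree of consensus among classifiers can be sketched with a simple vote count. This is an illustration only: the abstract does not describe the meta-model or the heuristic in detail, so plain majority voting, the two-thirds cut-off, and the etiology labels used below are all assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple


def consensus(votes: List[str]) -> Tuple[str, float]:
    """Return the most common label and the fraction of classifiers agreeing."""
    if not votes:
        raise ValueError("at least one vote is required")
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def grade(votes: List[str], min_agreement: float = 2 / 3) -> Optional[str]:
    """Accept the consensus label only if agreement is high enough.

    Returns None (diagnosis left as cryptogenic) otherwise.
    The 2/3 cut-off is an assumed value, not taken from the abstract.
    """
    label, agreement = consensus(votes)
    return label if agreement >= min_agreement else None


# Hypothetical outputs from nine classifiers for one patient.
strong = ["cardioembolic"] * 7 + ["large-artery"] * 2
weak = ["cardioembolic"] * 4 + ["large-artery"] * 3 + ["small-vessel"] * 2
```

With these inputs `grade(strong)` returns the consensus label, while `grade(weak)` returns `None`, leaving the case cryptogenic. A stricter cut-off reclassifies fewer cryptogenic cases but with higher certainty.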
240. An operational guide to translational clinical machine learning in academic medical centers.
学术医疗中心转化临床机器学习的操作指南。
PMID: 38760407 | DOI: 10.1038/s41746-024-01094-9 | 日期: 2024-05-17
摘要: Few published data science tools are ever translated from academia to real-world clinical settings for which they were intended. One dimension of this problem is the software engineering task of turning published academic projects into tools that are usable at the bedside. Given the complexity of the data ecosystem in large health systems, this task often represents a significant barrier to the real-world deployment of data science tools for prospective piloting and evaluation. Many information technology companies have created Machine Learning Operations (MLOps) teams to help with such tasks at scale, but the low penetration of home-grown data science tools in regular clinical practice precludes the formation of such teams in healthcare organizations. Based on experiences deploying data science tools at two large academic medical centers (Beth Israel Deaconess Medical Center, Boston, MA; Mayo Clinic, Rochester, MN), we propose a strategy to facilitate this transition from academic product to operational tool, defining the responsibilities of the principal investigator, data scientist, machine learning engineer, health system IT administrator, and clinician end-user throughout the process. We first enumerate the technical resources and stakeholders needed to prepare for model deployment. We then propose an approach to planning how the final product will work from data extraction and analysis to visualization of model outputs. Finally, we describe how the team should execute on this plan. We hope to guide health systems aiming to deploy minimum viable data science tools and realize their value in clinical practice.
中文摘要: 很少有已发表的数据科学工具能够从学术界转化为现实世界的临床环境。这个问题的一个方面是软件工程任务,即将已发表的学术项目转化为可在床边使用的工具。鉴于大型卫生系统中数据生态系统的复杂性,这项任务通常对现实世界中部署数据科学工具进行前瞻性试点和评估构成重大障碍。许多信息技术公司已经创建了机器学习运营 (MLOps) 团队来帮助大规模完成此类任务,但本土数据科学工具在常规临床实践中的渗透率较低,阻碍了医疗保健组织中此类团队的形成。根据在两个大型学术医疗中心(马萨诸塞州波士顿贝斯以色列女执事医疗中心;明尼苏达州罗彻斯特梅奥诊所)部署数据科学工具的经验,我们提出了一项战略,以促进从学术产品到操作工具的转变,定义整个过程中首席研究员、数据科学家、机器学习工程师、卫生系统 IT 管理员和临床医生最终用户的职责。我们首先列举准备模型部署所需的技术资源和利益相关者。然后,我们提出一种方法来规划最终产品从数据提取和分析到模型输出可视化的工作方式。最后,我们描述团队应如何执行该计划。我们希望指导卫生系统部署最小可行的数据科学工具并在临床实践中实现其价值。
241. Patient-centricity in digital measure development: co-evolution of best practice and regulatory guidance.
数字测量开发中以患者为中心:最佳实践和监管指南的共同演化。
PMID: 38755349 | DOI: 10.1038/s41746-024-01110-y | 日期: 2024-05-16
摘要: Digital health technologies (DHTs) have the potential to modernize drug development and clinical trial operations by remotely, passively, and continuously collecting ecologically valid evidence that is meaningful to patients’ lived experiences. Such evidence holds potential for all drug development stakeholders, including regulatory agencies, as it will help create a stronger evidentiary link between approval of new therapeutics and the ultimate aim of improving patient lives. However, only a very small number of novel digital measures have matured from exploratory usage into regulatory qualification or efficacy endpoints. This shows that despite the clear potential, actually gaining regulatory agreement that a new measure is both fit-for-purpose and delivers value remains a serious challenge. One of the key stumbling blocks for developers has been the requirement to demonstrate that a digital measure is meaningful to patients. This viewpoint aims to examine the co-evolution of regulatory guidance in the United States (U.S.) and best practice for integration of DHTs into the development of clinical outcome assessments. Contextualizing guidance on meaningfulness within the larger shift towards a patient-centric drug development approach, this paper reviews the U.S. Food and Drug Administration (FDA) guidance and existing literature surrounding the development of meaningful digital measures and patient engagement, including the recent examples of rejections by the FDA that further emphasize patient-centricity in digital measures. Finally, this paper highlights remaining hurdles and provides insights into the established frameworks for development and adoption of digital measures in clinical research.
中文摘要: 数字健康技术(DHT)有潜力通过远程、被动、持续收集对患者生活经历有意义的生态有效证据,实现药物开发和临床试验操作的现代化。这些证据对包括监管机构在内的所有药物开发利益相关者都具有潜力,因为它将有助于在新疗法的批准与改善患者生活的最终目标之间建立更强有力的证据联系。然而,只有极少数新颖的数字措施已经从探索性使用成熟到监管资格或功效终点。这表明,尽管潜力明显,但实际上获得监管机构的一致认可,即新措施既适合目的又可以带来价值,仍然是一个严峻的挑战。开发人员面临的主要障碍之一是需要证明数字测量对患者有意义。这一观点旨在研究美国监管指南的共同演变以及将 DHT 纳入临床结果评估开发的最佳实践。本文结合向以患者为中心的药物开发方法的更大转变中的意义指导,回顾了美国食品和药物管理局 (FDA) 的指南和围绕有意义的数字措施的开发和患者参与的现有文献,包括 FDA 最近拒绝的例子,这些例子进一步强调了数字措施中以患者为中心的原则。最后,本文强调了剩余的障碍,并提供了对临床研究中数字测量的开发和采用的既定框架的见解。
242. An in-depth evaluation of federated learning on biomedical natural language processing for information extraction.
用于信息提取的生物医学自然语言处理联邦学习的深入评估。
PMID: 38750290 | DOI: 10.1038/s41746-024-01126-4 | 日期: 2024-05-15
摘要: Language models (LMs) such as BERT and GPT have revolutionized natural language processing (NLP). However, the medical field faces challenges in training LMs due to limited data access and privacy constraints imposed by regulations like the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR). Federated learning (FL) offers a decentralized solution that enables collaborative learning while ensuring data privacy. In this study, we evaluated FL on 2 biomedical NLP tasks encompassing 8 corpora using 6 LMs. Our results show that: (1) FL models consistently outperformed models trained on individual clients’ data and sometimes performed comparably with models trained with pooled data; (2) with a fixed total amount of data, FL models trained with more clients produced inferior performance, but pre-trained transformer-based models exhibited great resilience; (3) FL models significantly outperformed pre-trained LLMs with few-shot prompting.
中文摘要: BERT 和 GPT 等语言模型 (LM) 彻底改变了自然语言处理 (NLP)。然而,由于《健康保险流通与责任法案》(HIPAA) 和《通用数据保护条例》(GDPR) 等法规所施加的数据访问限制和隐私限制,医疗领域在训练 LM 方面面临挑战。联邦学习 (FL) 提供了一种去中心化的解决方案,可以在确保数据隐私的同时实现协作学习。在这项研究中,我们使用 6 个语言模型在 2 个生物医学 NLP 任务(涵盖 8 个语料库)上评估了 FL。我们的结果表明:(1)FL 模型的表现始终优于基于个体客户端数据训练的模型,有时与使用汇总数据训练的模型表现相当; (2) 在数据总量固定的情况下,使用更多客户端进行 FL 模型训练会产生较差的性能,但预训练的基于 Transformer 的模型表现出很强的稳健性; (3) FL 模型显著优于采用少样本提示的预训练 LLM。
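To make the federated setting in the abstract above concrete, the sketch below shows sample-size-weighted parameter averaging, a common FL aggregation rule often called federated averaging. The abstract does not say which aggregation algorithm the study used, so this choice is an assumption, and the function and variable names are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of training examples held by each client.
    Only parameters leave the clients; raw data never does.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical clients holding 1 and 3 examples respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)  # [2.5, 3.5]
```

Weighting by dataset size means a client with more data pulls the global model further toward its local solution.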
243. A systematic review and meta-analysis of artificial intelligence versus clinicians for skin cancer diagnosis.
人工智能与临床医生在皮肤癌诊断方面的系统回顾和荟萃分析。
PMID: 38744955 | DOI: 10.1038/s41746-024-01103-x | 日期: 2024-05-14
摘要: Scientific research on artificial intelligence (AI) in dermatology has increased exponentially. The objective of this study was to perform a systematic review and meta-analysis to evaluate the performance of AI algorithms for skin cancer classification in comparison to clinicians with different levels of expertise. Based on PRISMA guidelines, 3 electronic databases (PubMed, Embase, and Cochrane Library) were screened for relevant articles up to August 2022. The quality of the studies was assessed using QUADAS-2. A meta-analysis of sensitivity and specificity was performed for the accuracy of AI and clinicians. Fifty-three studies were included in the systematic review, and 19 met the inclusion criteria for the meta-analysis. Considering all studies and all subgroups of clinicians, we found a sensitivity (Sn) and specificity (Sp) of 87.0% and 77.1% for AI algorithms, respectively, and a Sn of 79.78% and Sp of 73.6% for all clinicians (overall); differences were statistically significant for both Sn and Sp. The difference between AI (Sn 92.5%, Sp 66.5%) and generalists (Sn 64.6%, Sp 72.8%) was greater than the difference between AI and expert clinicians. The performance of AI algorithms (Sn 86.3%, Sp 78.4%) and expert dermatologists (Sn 84.2%, Sp 74.4%) was clinically comparable. Limitations of AI algorithms in clinical practice should be considered, and future studies should focus on real-world settings and on AI assistance.
中文摘要: 皮肤科人工智能(AI)的科学研究呈指数级增长。本研究的目的是进行系统回顾和荟萃分析,以与不同专业水平的临床医生相比,评估人工智能算法在皮肤癌分类方面的性能。根据 PRISMA 指南,筛选了 3 个电子数据库(PubMed、Embase 和 Cochrane Library)截至 2022 年 8 月的相关文章。使用 QUADAS-2 评估研究质量。为了人工智能和临床医生的准确性,对敏感性和特异性进行了荟萃分析。系统评价纳入 53 项研究,其中 19 项符合荟萃分析的纳入标准。考虑到所有研究和所有临床医生亚组,我们发现 AI 算法的敏感性 (Sn) 和特异性 (Sp) 分别为 87.0% 和 77.1%,所有临床医生的 Sn 为 79.78%,Sp 为 73.6%(总体); Sn 和 Sp 的差异具有统计学意义。与专家临床医生相比,人工智能表现(Sn 92.5%,Sp 66.5%)与通才(Sn 64.6%,Sp 72.8%)之间的差异更大。 AI 算法(Sn 86.3%,Sp 78.4%)与皮肤科医生专家(Sn 84.2%,Sp 74.4%)之间的性能在临床上具有可比性。应考虑人工智能算法在临床实践中的局限性,未来的研究应侧重于现实环境和人工智能辅助。
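The Sn and Sp figures quoted in the abstract above are the standard diagnostic accuracy measures. As a reminder of how they are derived from a confusion matrix, here is a minimal sketch. The counts in the example are made up for illustration and are not taken from the meta-analysis.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true cancers that were flagged.
    specificity = TN / (TN + FP): share of benign lesions correctly cleared.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts: 100 malignant and 100 benign lesions.
sn, sp = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
print(f"Sn = {sn:.1%}, Sp = {sp:.1%}")  # Sn = 90.0%, Sp = 80.0%
```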
244. Shortcut learning in medical AI hinders generalization: method for estimating AI model generalization without external data.
医疗人工智能中的捷径学习阻碍了泛化:无需外部数据即可估计人工智能模型泛化的方法。
PMID: 38744921 | DOI: 10.1038/s41746-024-01118-4 | 日期: 2024-05-14
摘要: Healthcare datasets are becoming larger and more complex, necessitating the development of accurate and generalizable AI models for medical applications. Unstructured datasets, including medical imaging, electrocardiograms, and natural language data, are gaining attention with advancements in deep convolutional neural networks and large language models. However, estimating the generalizability of these models to new healthcare settings without extensive validation on external data remains challenging. In experiments across 13 datasets including X-rays, CTs, ECGs, clinical discharge summaries, and lung auscultation data, our results demonstrate that model performance is frequently overestimated by up to 20% on average due to shortcut learning of hidden data acquisition biases (DAB). Shortcut learning refers to a phenomenon in which an AI model learns to solve a task based on spurious correlations present in the data as opposed to features directly related to the task itself. We propose an open source, bias-corrected external accuracy estimate, PEst, that better estimates external accuracy to within 4% on average by measuring and calibrating for DAB-induced shortcut learning.
中文摘要: 医疗保健数据集变得越来越大、越来越复杂,需要为医疗应用开发准确且通用的人工智能模型。随着深度卷积神经网络和大型语言模型的进步,包括医学成像、心电图和自然语言数据在内的非结构化数据集正在引起人们的关注。然而,在不对外部数据进行广泛验证的情况下估计这些模型在新的医疗保健环境中的普遍适用性仍然具有挑战性。在 13 个数据集(包括 X 射线、CT、心电图、临床出院摘要和肺部听诊数据)的实验中,我们的结果表明,由于隐藏数据采集偏差 (DAB) 的快捷学习,模型性能经常被平均高估高达 20%。捷径学习是指人工智能模型根据数据中存在的虚假相关性而不是与任务本身直接相关的特征来学习解决任务的现象。我们提出了一种开源的、偏差校正的外部精度估计 PEst,通过测量和校准 DAB 引发的快捷学习,可以更好地估计外部精度,平均在 4% 以内。
245. FDA-cleared home sleep apnea testing devices.
FDA 批准的家用睡眠呼吸暂停测试设备。
PMID: 38740907 | DOI: 10.1038/s41746-024-01112-w | 日期: 2024-05-13
摘要: The demand for home sleep apnea testing (HSAT) devices is escalating, particularly in the context of the coronavirus disease 2019 (COVID-19) pandemic. The absence of standardized development and verification procedures poses a significant challenge. This study meticulously analyzed the approval process characteristics of HSAT devices by the U.S. Food and Drug Administration (FDA) from September 1, 2003, to September 1, 2023, with a primary focus on ensuring safety and clinical effectiveness. We examined 58 reports out of 1046 that underwent FDA clearance via the 510(k) and de novo pathways. A substantial surge in certifications after the 2022 pandemic was observed. Type-3 devices dominated, signifying a growing trend for both home and clinical use. Key measurement items included respiration and sleep analysis, with the apnea-hypopnea index (AHI) and sleep stage emerging as pivotal indicators. The majority of FDA-cleared HSAT devices adhered to electrical safety and biocompatibility standards. Critical considerations encompass performance and function testing, usability, and cybersecurity. This study emphasized the nearly indispensable role of clinical trials in ensuring the clinical effectiveness of HSAT devices. Future studies should propose guidance that specifies stringent requirements, robust clinical trial designs, and comprehensive performance criteria to guarantee the minimum safety and clinical effectiveness of HSATs.
中文摘要: 对家庭睡眠呼吸暂停测试 (HSAT) 设备的需求正在不断增加,特别是在 2019 年冠状病毒 (COVID-19) 大流行的背景下。缺乏标准化的开发和验证程序构成了重大挑战。本研究细致分析了美国食品药品监督管理局(FDA)从2003年9月1日至2023年9月1日期间HSAT器械的审批流程特点,主要关注确保安全性和临床有效性。我们检查了 1046 份通过 510(k) 和 de novo 途径获得 FDA 批准的报告中的 58 份。 2022 年大流行之后,认证数量大幅增加。 3 类设备占主导地位,这表明家庭和临床使用的趋势不断增长。主要测量项目包括呼吸和睡眠分析,其中呼吸暂停低通气指数(AHI)和睡眠阶段成为关键指标。大多数经 FDA 批准的 HSAT 设备都遵守电气安全和生物相容性标准。关键考虑因素包括性能和功能测试、可用性和网络安全。这项研究强调了临床试验在确保 HSAT 设备的临床有效性方面几乎不可或缺的作用。未来的研究应提出指导意见,明确严格的要求、稳健的临床试验设计和综合性能标准,以保证 HSAT 的最低安全性和临床有效性。
246. Generalized sleep decoding with basal ganglia signals in multiple movement disorders.
多种运动障碍中基底神经节信号的广义睡眠解码。
PMID: 38729977 | DOI: 10.1038/s41746-024-01115-7 | 日期: 2024-05-10
摘要: Sleep disturbances profoundly affect the quality of life in individuals with neurological disorders. Closed-loop deep brain stimulation (DBS) holds promise for alleviating sleep symptoms; however, this technique necessitates automated sleep stage decoding from intracranial signals. We leveraged overnight data from 121 patients with movement disorders (Parkinson’s disease, Essential Tremor, Dystonia, Huntington’s disease, and Tourette’s syndrome) in whom synchronized polysomnograms and basal ganglia local field potentials were recorded, to develop a generalized, multi-class, sleep-specific decoder, BGOOSE. This generalized model achieved 85% average accuracy across patients and across disease conditions, even in the presence of recordings from different basal ganglia targets. Furthermore, we also investigated the role of electrocorticography in decoding performance and proposed an optimal decoding map, which was shown to facilitate channel selection for optimal model performance. BGOOSE emerges as a powerful tool for generalized sleep decoding, offering exciting potential for the precision stimulation delivery of DBS and better management of sleep disturbances in movement disorders.
中文摘要: 睡眠障碍深刻影响神经系统疾病患者的生活质量。闭环深部脑刺激(DBS)有望缓解睡眠症状,然而,该技术需要根据颅内信号自动解码睡眠阶段。我们利用 121 名运动障碍患者(帕金森病、特发性震颤、肌张力障碍、亨廷顿病和抽动秽语综合征)的夜间数据,记录了同步多导睡眠图和基底神经节局部场电位,开发了一种通用的、多类别的睡眠特定解码器 BGOOSE。即使存在来自不同基底神经节目标的记录,该广义模型在不同患者和不同疾病条件下的平均准确率达到 85%。此外,我们还研究了皮质电图对解码性能的作用,并提出了最佳解码图,该图被证明有助于通道选择以获得最佳模型性能。 BGOOSE 成为广义睡眠解码的强大工具,为 DBS 的精确刺激传递和更好地管理运动障碍中的睡眠障碍提供了令人兴奋的潜力。
247. The application of machine learning techniques in posttraumatic stress disorder: a systematic review and meta-analysis.
机器学习技术在创伤后应激障碍中的应用:系统评价和荟萃分析。
PMID: 38724610 | DOI: 10.1038/s41746-024-01117-5 | 日期: 2024-05-09
摘要: Posttraumatic stress disorder (PTSD) has recently become one of the most important mental health concerns. However, no previous study has comprehensively reviewed the application of big data and machine learning (ML) techniques in PTSD. We found 873 studies that met the inclusion criteria, and a total of 31 of those, in a sample of 210,001, were included in the quantitative analysis. ML algorithms were able to discriminate PTSD with an overall accuracy of 0.89. Pooled estimates of classification accuracy from multi-dimensional data (0.96) were higher than those from single data types (0.86 to 0.90). ML techniques can effectively classify PTSD, and models using multi-dimensional data perform better than those using single data types. While selecting optimal combinations of data types and ML algorithms to be clinically applied at the individual level remains a big challenge, these findings provide insights into the classification, identification, diagnosis and treatment of PTSD.
中文摘要: 创伤后应激障碍(PTSD)最近成为最重要的心理健康问题之一。然而,此前还没有研究全面回顾大数据和机器学习(ML)技术在创伤后应激障碍(PTSD)中的应用。我们发现 873 项研究符合纳入标准,210,001 份样本中总共有 31 项纳入定量分析。 ML 算法能够以 0.89 的总体准确度区分 PTSD。多维数据分类准确度的汇总估计 (0.96) 高于单一数据类型(0.86 至 0.90)。机器学习技术可以有效地对 PTSD 进行分类,并且使用多维数据的模型比使用单一数据类型的模型表现更好。虽然选择在个体层面临床应用的数据类型和机器学习算法的最佳组合仍然是一个巨大的挑战,但这些发现为 PTSD 的分类、识别、诊断和治疗提供了见解。
248. Distribution shift detection for the postmarket surveillance of medical AI algorithms: a retrospective simulation study.
医疗人工智能算法上市后监测的分布变化检测:一项回顾性模拟研究。
PMID: 38724581 | DOI: 10.1038/s41746-024-01085-w | 日期: 2024-05-09
摘要: Distribution shifts remain a problem for the safe application of regulated medical AI systems, and may impact their real-world performance if undetected. Postmarket shifts can occur, for example, if algorithms developed on data from various acquisition settings and a heterogeneous population are predominantly applied in hospitals with lower quality data acquisition or other centre-specific acquisition factors, or where some ethnicities are over-represented. Therefore, distribution shift detection could be important for monitoring AI-based medical products during postmarket surveillance. We implemented and evaluated three deep-learning based shift detection techniques (classifier-based, deep kernel, and multiple univariate Kolmogorov-Smirnov tests) on simulated shifts in a dataset of 130’486 retinal images. We trained a deep learning classifier for diabetic retinopathy grading. We then simulated population shifts by changing the prevalence of patients’ sex, ethnicity, and co-morbidities, and example acquisition shifts by changes in image quality. We observed classification subgroup performance disparities with respect to image quality, patient sex, ethnicity and co-morbidity presence. The sensitivity at detecting referable diabetic retinopathy ranged from 0.50 to 0.79 for different ethnicities. This motivates the need for detecting shifts after deployment. Classifier-based tests performed best overall, with perfect detection rates for quality and co-morbidity subgroup shifts at a sample size of 1000. It was the only method to detect shifts in patient sex, but required large sample sizes (>30’000). All methods identified easier-to-detect out-of-distribution shifts with small (≤300) sample sizes. We conclude that effective tools exist for detecting clinically relevant distribution shifts. In particular, classifier-based tests can easily be implemented as components in the post-market surveillance strategy of medical device manufacturers.
中文摘要: 分布变化仍然是受监管医疗人工智能系统安全应用的一个问题,如果未被发现,可能会影响其现实世界的性能。例如,如果根据来自不同采集环境和异质人群的数据开发的算法主要应用于数据采集质量较低或其他中心特定采集因素的医院,或者某些种族代表性过高的医院,则可能会发生上市后变化。因此,分布变化检测对于在上市后监测期间监测基于人工智能的医疗产品可能很重要。我们在 130,486 个视网膜图像数据集中的模拟移位上实现并评估了三种基于深度学习的移位检测技术(基于分类器、深度内核和多个单变量柯尔莫哥洛夫-斯米尔诺夫测试)。我们训练了一个用于糖尿病视网膜病变分级的深度学习分类器。然后,我们通过改变患者性别、种族和合并症的患病率来模拟人口变化,并通过图像质量的变化来模拟采集变化。我们观察到分类亚组的表现在图像质量、患者性别、种族和合并症方面存在差异。对于不同种族,检测可参考的糖尿病视网膜病变的敏感性范围为 0.50 至 0.79。这激发了在部署后检测变化的需要。基于分类器的测试总体表现最佳,样本量为 1000 时,质量和共病亚组变化的检出率完美。这是检测患者性别变化的唯一方法,但需要大样本量(>30,000)。所有方法都通过小样本量(≤300)识别出更容易检测的分布外变化。我们的结论是,存在检测临床相关分布变化的有效工具。特别是基于分类器的测试可以很容易地成为医疗器械制造商上市后监控策略中的组成部分。
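One of the three techniques named in the abstract above is multiple univariate Kolmogorov-Smirnov tests. The sketch below illustrates the general idea under stated assumptions: each feature column is tested separately with a two-sample KS test, and a shift is flagged if any p-value falls below a Bonferroni-corrected threshold. The use of Bonferroni correction, the asymptotic p-value approximation, and all names are assumptions for illustration. The abstract does not describe the study's implementation or which features (for example, learned embeddings) were tested.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (approximation)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.3:  # the series converges slowly here; p is effectively 1
        return 1.0
    s = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, s))

def detect_shift(reference, target, alpha=0.05):
    """Run one KS test per feature; flag a shift if any survives Bonferroni."""
    n_features = len(reference[0])
    p_values = []
    for j in range(n_features):
        ref_col = [row[j] for row in reference]
        tgt_col = [row[j] for row in target]
        d = ks_statistic(ref_col, tgt_col)
        p_values.append(ks_pvalue(d, len(ref_col), len(tgt_col)))
    return min(p_values) < alpha / n_features, p_values

# Hypothetical 2-feature data: the target's first feature is shifted by 0.5.
reference = [[i / 200, (i * 7 % 200) / 200] for i in range(200)]
target = [[x + 0.5, y] for x, y in reference]
shifted, p_values = detect_shift(reference, target)
print(shifted)
```

Testing features one at a time is cheap but can miss shifts that only appear in the joint distribution.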
249. Charting a new course in healthcare: early-stage AI algorithm registration to enhance trust and transparency.
制定医疗保健新路线:早期人工智能算法注册以增强信任和透明度。
PMID: 38720011 | DOI: 10.1038/s41746-024-01104-w | 日期: 2024-05-08
摘要: AI holds the potential to transform healthcare, promising improvements in patient care. Yet, realizing this potential is hampered by over-reliance on limited datasets and a lack of transparency in validation processes. To overcome these obstacles, we advocate the creation of a detailed registry for AI algorithms. This registry would document the development, training, and validation of AI models, ensuring scientific integrity and transparency. Additionally, it would serve as a platform for peer review and ethical oversight. By bridging the gap between scientific validation and regulatory approval, such as by the FDA, we aim to enhance the integrity and trustworthiness of AI applications in healthcare.
中文摘要: 人工智能具有改变医疗保健的潜力,有望改善患者护理。然而,过度依赖有限的数据集和验证过程缺乏透明度阻碍了这一潜力的实现。为了克服这些障碍,我们主张为人工智能算法创建详细的注册表。该注册表将记录人工智能模型的开发、培训和验证,确保科学的完整性和透明度。此外,它将作为同行评审和道德监督的平台。通过弥合科学验证与 FDA 等监管机构批准之间的差距,我们的目标是提高医疗保健领域人工智能应用的完整性和可信度。
250. Multimodal data fusion using sparse canonical correlation analysis and cooperative learning: a COVID-19 cohort study.
使用稀疏典型相关分析和合作学习的多模态数据融合:一项 COVID-19 队列研究。
PMID: 38714751 | DOI: 10.1038/s41746-024-01128-2 | 日期: 2024-05-07
摘要: Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: intensive care unit admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (cor(Xu1, Zv1) = 0.596, p value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. Our study illustrates that sparse CCA analysis and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
中文摘要: 通过技术创新,可以利用高维、多尺度生物医学数据从多个角度检查患者群体,以对临床表型进行分类并预测结果。在这里,我们的目标是介绍使用无监督和有监督稀疏线性方法在 COVID-19 患者队列中分析多模态数据的方法。这项针对 149 名成年患者的前瞻性队列研究是在三级医疗学术中心进行的。首先,我们使用稀疏典型相关分析(CCA)来识别和量化不同数据模式之间的关系,包括病毒基因组测序、成像、临床数据和实验室结果。然后,我们使用合作学习来预测 COVID-19 患者的临床结果:重症监护病房入院。我们表明,代表严重疾病和急性期反应的血清生物标志物与 LLL 频率通道中的原始和小波放射组学特征相关 (cor(Xu1, Zv1) = 0.596,p 值 < 0.001)。在放射组学特征中,报告偏度、峰度和均匀性的基于直方图的一阶特征具有最低的负系数,而与熵相关的特征具有最高的正系数。此外,对临床数据和实验室结果的无监督分析可以深入了解不同的临床表型。利用全球病毒基因组数据库的可用性,我们证明了 Word2Vec 自然语言处理模型可用于病毒基因组编码。它不仅可以分离主要的 SARS-CoV-2 变体,还可以保留它们之间的系统发育关系。我们使用Word2Vec编码的四元模型在监督任务中取得了更好的预测结果。该模型的曲线下面积 (AUC) 和准确度值分别为 0.87 和 0.77。我们的研究表明,稀疏 CCA 分析和合作学习是处理高维、多模态数据以研究无监督和监督任务中多变量关联的强大技术。
251. Online cognitive monitoring technology for people with Parkinson’s disease and REM sleep behavioural disorder.
针对帕金森病和快速眼动睡眠行为障碍患者的在线认知监测技术。
PMID: 38714742 | DOI: 10.1038/s41746-024-01124-6 | 日期: 2024-05-07
摘要: Automated online cognitive assessments are set to revolutionise clinical research and healthcare. However, their applicability for Parkinson’s Disease (PD) and REM Sleep Behavioural Disorder (RBD), a strong PD precursor, is underexplored. Here, we developed an online battery to measure early cognitive changes in PD and RBD. Evaluating 19 candidate tasks showed significant global accuracy deficits in PD (0.65 SD, p = 0.003) and RBD (0.45 SD, p = 0.027), driven by memory, language, attention and executive underperformance, and global reaction time deficits in PD (0.61 SD, p = 0.001). We identified a brief 20-min battery that had sensitivity to deficits across these cognitive domains while being robust to the device used. This battery was more sensitive to early-stage and prodromal deficits than the supervised neuropsychological scales. It also diverged from those scales, capturing additional cognitive factors sensitive to PD and RBD. This technology offers an economical and scalable method for assessing these populations that can complement standard supervised practices.
中文摘要: 自动化在线认知评估将彻底改变临床研究和医疗保健。然而,它们对帕金森病 (PD) 和快速眼动睡眠行为障碍 (RBD)(PD 的强前兆)的适用性尚未得到充分研究。在这里,我们开发了一套在线测试组合来测量 PD 和 RBD 的早期认知变化。评估 19 项候选任务显示,PD (0.65 SD,p = 0.003) 和 RBD (0.45 SD,p = 0.027) 存在显着的全局准确性缺陷,这是由记忆、语言、注意力和执行能力不佳以及 PD 的全局反应时间缺陷 (0.61 SD,p = 0.001) 驱动的。我们确定了一套简短的 20 分钟测试组合,它对这些认知领域的缺陷敏感,同时对所使用的设备具有鲁棒性。与监督神经心理学量表相比,该测试组合对早期和前驱缺陷更敏感。它还与这些量表不同,捕获了对 PD 和 RBD 敏感的其他认知因素。该技术提供了一种经济且可扩展的方法来评估这些人群,可以补充标准的监督实践。
252. Exploring cognitive reserve’s influence: unveiling the dynamics of digital telerehabilitation in Parkinson’s Disease Resilience.
探索认知储备的影响:揭示数字远程康复在帕金森病恢复力中的动态。
PMID: 38710915 | DOI: 10.1038/s41746-024-01113-9 | 日期: 2024-05-06
摘要: Telerehabilitation is emerging as a promising digital method for delivering rehabilitation to Parkinson’s Disease (PD) patients, especially in the early stages to promote brain resilience. This study explores how cognitive reserve (CR), the brain’s ability to withstand aging and disease, impacts the effectiveness of telerehabilitation. It specifically examines the influence of lifelong cognitive activities on the relationship between neural reserve and improved functional abilities following rehabilitation. In the study, 42 PD patients underwent a 4-month neuromotor telerehabilitation program. CR proxies were assessed using the Cognitive Reserve Index questionnaire (CRIq), brain changes via 3T-MRI, and functional response through changes in the 6-Minute Walk Distance (6MWD). Participants were divided into responders (n = 23) and non-responders (n = 19) based on their 6MWD improvement. A multiple regression model was run to test significant predictors of 6MWD after treatment in each group. The results revealed a significant correlation between 6MWD and CRIq scores, but only among responders. Notably, the CRIq Leisure-Time sub-index, along with baseline 6MWD, were predictors of post-treatment 6MWD. These findings highlight CR’s role in enhancing the benefits of telerehabilitation on PD patients’ neuromotor functions. Clinically, these results suggest that neurologists and clinicians should consider patients’ lifestyles and cognitive engagement as important factors in predicting and enhancing the outcomes of telerehabilitation. The study underscores the potential of CR as both a predictor and booster of telerehabilitation’s effects, advocating for a personalized approach to PD treatment that takes into account individual CR levels.
中文摘要: 远程康复正在成为一种有前景的数字方法,可为帕金森病 (PD) 患者提供康复服务,特别是在早期阶段,以促进大脑恢复能力。这项研究探讨了认知储备(CR),即大脑抵御衰老和疾病的能力,如何影响远程康复的有效性。它专门研究了终生认知活动对神经储备和康复后功能能力提高之间关系的影响。在这项研究中,42 名 PD 患者接受了为期 4 个月的神经运动远程康复计划。使用认知储备指数问卷 (CRIq)、通过 3T-MRI 的大脑变化以及通过 6 分钟步行距离 (6MWD) 的变化来评估 CR 代理。根据 6MWD 的改善情况,参与者被分为有反应者 (n = 23) 和无反应者 (n = 19)。运行多元回归模型来测试每组治疗后 6MWD 的显着预测因子。结果显示 6MWD 和 CRIq 评分之间存在显着相关性,但仅限于应答者。值得注意的是,CRIq 休闲时间分指数以及基线 6MWD 是治疗后 6MWD 的预测因子。这些发现强调了 CR 在增强远程康复对 PD 患者神经运动功能的益处方面的作用。在临床上,这些结果表明神经科医生和临床医生应将患者的生活方式和认知参与视为预测和增强远程康复结果的重要因素。该研究强调了 CR 作为远程康复效果的预测因子和增强剂的潜力,提倡考虑个人 CR 水平的个性化 PD 治疗方法。
253. Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy.
数字病理学中的人工智能:诊断测试准确性的系统回顾和荟萃分析。
PMID: 38704465 | DOI: 10.1038/s41746-024-01106-8 | 日期: 2024-05-04
摘要: Ensuring diagnostic performance of artificial intelligence (AI) before introduction into clinical practice is essential. Growing numbers of studies using AI for digital pathology have been reported over recent years. The aim of this work is to examine the diagnostic accuracy of AI in digital pathology images for any disease. This systematic review and meta-analysis included diagnostic accuracy studies using any type of AI applied to whole slide images (WSIs) for any disease. The reference standard was diagnosis by histopathological assessment and/or immunohistochemistry. Searches were conducted in PubMed, EMBASE and CENTRAL in June 2022. Risk of bias and concerns of applicability were assessed using the QUADAS-2 tool. Data extraction was conducted by two investigators and meta-analysis was performed using a bivariate random effects model, with additional subgroup analyses also performed. Of 2976 identified studies, 100 were included in the review and 48 in the meta-analysis. Studies were from a range of countries, including over 152,000 whole slide images (WSIs), representing many diseases. These studies reported a mean sensitivity of 96.3% (CI 94.1-97.7) and mean specificity of 93.3% (CI 90.5-95.4). There was heterogeneity in study design and 99% of studies identified for inclusion had at least one area at high or unclear risk of bias or applicability concerns. Details on selection of cases, division of model development and validation data and raw performance data were frequently ambiguous or missing. AI is reported as having high diagnostic accuracy in the reported areas but requires more rigorous evaluation of its performance.
中文摘要: 在引入临床实践之前确保人工智能 (AI) 的诊断性能至关重要。近年来,越来越多的研究使用人工智能进行数字病理学。这项工作的目的是检查人工智能在数字病理图像中对任何疾病的诊断准确性。这项系统回顾和荟萃分析包括使用任何类型的人工智能应用于任何疾病的整个幻灯片图像(WSI)的诊断准确性研究。参考标准是通过组织病理学评估和/或免疫组织化学进行诊断。检索于 2022 年 6 月在 PubMed、EMBASE 和 CENTRAL 中进行。使用 QUADAS-2 工具评估偏倚风险和适用性问题。数据提取由两名研究人员进行,并使用双变量随机效应模型进行荟萃分析,还进行了额外的亚组分析。在 2976 项已确定的研究中,100 项纳入综述,48 项纳入荟萃分析。研究来自多个国家,包括超过 152,000 张完整幻灯片图像 (WSI),代表多种疾病。这些研究报告的平均敏感性为 96.3% (CI 94.1-97.7),平均特异性为 93.3% (CI 90.5-95.4)。研究设计存在异质性,99% 确定纳入的研究至少有一个领域存在较高或不明确的偏倚风险或适用性问题。有关案例选择、模型开发和验证数据划分以及原始性能数据的详细信息经常含糊不清或缺失。据报道,人工智能在所报告的领域具有很高的诊断准确性,但需要对其性能进行更严格的评估。
254. Optical coherence tomography choroidal enhancement using generative deep learning.
使用生成深度学习进行光学相干断层扫描脉络膜增强。
PMID: 38704440 | DOI: 10.1038/s41746-024-01119-3 | 日期: 2024-05-04
摘要: Spectral-domain optical coherence tomography (SDOCT) is the gold standard of imaging the eye in clinics. Penetration depth with such devices is, however, limited and visualization of the choroid, which is essential for diagnosing chorioretinal disease, remains limited. Whereas swept-source OCT (SSOCT) devices allow for visualization of the choroid, these instruments are expensive and availability in practice is limited. We present an artificial intelligence (AI)-based solution to enhance the visualization of the choroid in OCT scans and allow for quantitative measurements of choroidal metrics using generative deep learning (DL). Synthetically enhanced SDOCT B-scans with improved choroidal visibility were generated, leveraging matching images to learn deep anatomical features during the training. Using a single-center tertiary eye care institution cohort comprising a total of 362 SDOCT-SSOCT paired subjects, we trained our model with 150,784 images from 410 healthy, 192 glaucoma, and 133 diabetic retinopathy eyes. An independent external test dataset of 37,376 images from 146 eyes was deployed to assess the authenticity and quality of the synthetically enhanced SDOCT images. Experts’ ability to differentiate real versus synthetic images was poor (47.5% accuracy). Measurements of choroidal thickness, area, volume, and vascularity index, from the reference SSOCT and synthetically enhanced SDOCT, showed high Pearson’s correlations of 0.97 [95% CI: 0.96-0.98], 0.97 [0.95-0.98], 0.95 [0.92-0.98], and 0.87 [0.83-0.91], with intra-class correlation values of 0.99 [0.98-0.99], 0.98 [0.98-0.99], 0.95 [0.96-0.98], and 0.93 [0.91-0.95], respectively. Thus, our DL generative model successfully generated realistic enhanced SDOCT data that is indistinguishable from SSOCT images, providing improved visualization of the choroid. This technology enabled accurate measurements of choroidal metrics previously limited by the imaging depth constraints of SDOCT. The findings open new possibilities for utilizing affordable SDOCT devices in studying the choroid in both healthy and pathological conditions.
中文摘要: 谱域光学相干断层扫描 (SDOCT) 是临床眼部成像的黄金标准。然而,此类设备的穿透深度有限,并且对于诊断脉络膜视网膜疾病至关重要的脉络膜的可视化仍然有限。虽然扫源 OCT (SSOCT) 设备可以实现脉络膜的可视化,但这些仪器价格昂贵,而且在实践中的可用性有限。我们提出了一种基于人工智能 (AI) 的解决方案,以增强 OCT 扫描中脉络膜的可视化,并允许使用生成深度学习 (DL) 定量测量脉络膜指标。生成了具有改善的脉络膜可视性的综合增强 SDOCT B 扫描,利用匹配图像在训练期间学习深层解剖特征。我们使用由 362 名 SDOCT-SSOCT 配对受试者组成的单中心三级眼保健机构队列,使用来自 410 只健康眼睛、192 只青光眼和 133 只糖尿病视网膜病变眼睛的 150,784 张图像来训练我们的模型。部署了包含来自 146 只眼睛的 37,376 幅图像的独立外部测试数据集,以评估综合增强的 SDOCT 图像的真实性和质量。专家区分真实图像和合成图像的能力很差(准确度为 47.5%)。根据参考 SSOCT 和综合增强 SDOCT 测量的脉络膜厚度、面积、体积和血管分布指数显示,Pearson 相关性高达 0.97 [95% CI:0.96-0.98]、0.97 [0.95-0.98]、0.95 [0.92-0.98] 和 0.87 [0.83-0.91],组内相关值分别为 0.99 [0.98-0.99]、0.98 [0.98-0.99] 和 0.95 [0.96-0.98]、0.93 [0.91-0.95]。因此,我们的深度学习生成模型成功生成了逼真的增强型 SDOCT 数据,该数据与 SSOCT 图像无法区分,从而提供了改进的脉络膜可视化效果。该技术能够准确测量脉络膜指标,而该指标之前受到 SDOCT 成像深度的限制。这些发现为利用经济实惠的 SDOCT 设备研究健康和病理条件下的脉络膜开辟了新的可能性。
255. Sync fast and solve things-best practices for responsible digital health.
快速同步并解决问题 - 负责任的数字健康的最佳实践。
PMID: 38704413 | DOI: 10.1038/s41746-024-01105-9 | 日期: 2024-05-04
摘要: Digital health innovation is expected to transform healthcare, but it also generates ethical and societal concerns, such as privacy risks, and biases that can compound existing health inequalities. While such concerns are widely recognized, existing regulatory principles, oversight methods and ethical frameworks seem out of sync with digital health innovation. New governance and innovation best practices are thus needed to bring such principles to bear with the reality of business, innovation, and regulation. To grant practical insight into best practices for responsible digital health innovation, we conducted a qualitative study based on an interactive engagement methodology. We engaged key stakeholders (n = 46) operating at the translational frontier of digital health. This approach allowed us to identify three clusters of governance and innovation best practices in digital health innovation: i) inclusive co-creation, ii) responsive regulation, and iii) value-driven innovation. Our study shows that realizing responsible digital health requires diverse stakeholders’ commitment to adapt innovation and regulation practices, embracing co-creation as the default modus operandi for digital health development. We describe these collaborative practices and show how they can ensure that innovation is neither slowed by overregulation, nor leads to unethical outcomes.
中文摘要: 数字健康创新有望改变医疗保健,但它也会产生道德和社会问题,例如隐私风险以及可能加剧现有健康不平等的偏见。尽管这些担忧得到了广泛认可,但现有的监管原则、监督方法和道德框架似乎与数字健康创新不同步。因此,需要新的治理和创新最佳实践,以使这些原则适应商业、创新和监管的现实。为了对负责任的数字健康创新的最佳实践提供实际见解,我们基于交互式参与方法进行了定性研究。我们与数字健康转化前沿的主要利益相关者 (n = 46) 进行了接触。这种方法使我们能够确定数字健康创新中的三个治理和创新最佳实践:i) 包容性共同创造,ii) 响应式监管,以及 iii) 价值驱动的创新。我们的研究表明,实现负责任的数字健康需要不同利益相关者致力于调整创新和监管实践,将共同创造作为数字健康发展的默认运作方式。我们描述了这些协作实践,并展示了它们如何确保创新既不会因过度监管而减慢,也不会导致不道德的结果。
256. A physiologically-based digital twin for alcohol consumption-predicting real-life drinking responses and long-term plasma PEth.
基于生理学的数字孪生,用于预测现实生活中的饮酒反应和长期血浆 PEth。
PMID: 38702474 | DOI: 10.1038/s41746-024-01089-6 | 日期: 2024-05-03
摘要: Alcohol consumption is associated with a wide variety of preventable health complications and is a major risk factor for all-cause mortality in the age group 15-47 years. To reduce dangerous drinking behavior, eHealth applications have shown promise. A particularly interesting potential lies in the combination of eHealth apps with mathematical models. However, existing mathematical models do not consider real-life situations, such as combined intake of meals and beverages, and do not connect drinking to clinical markers, such as phosphatidylethanol (PEth). Herein, we present such a model which can simulate real-life situations and connect drinking to long-term markers. The new model can accurately describe both estimation data according to a χ2 -test (187.0 < Tχ2 = 226.4) and independent validation data (70.8 < Tχ2 = 93.5). The model can also be personalized using anthropometric data from a specific individual and can thus be used as a physiologically-based digital twin. This twin is also able to connect short-term consumption of alcohol to the long-term dynamics of PEth levels in the blood, a clinical biomarker of alcohol consumption. Here we illustrate how connecting short-term consumption to long-term markers allows for a new way to determine patient alcohol consumption from measured PEth levels. An additional use case of the twin could include the combined evaluation of patient-reported AUDIT forms and measured PEth levels. Finally, we integrated the new model into an eHealth application, which could help guide individual users or clinicians to help reduce dangerous drinking.
中文摘要: 饮酒与多种可预防的健康并发症有关,并且是 15-47 岁年龄段全因死亡的主要危险因素。为了减少危险的饮酒行为,电子健康应用已显示出希望。一个特别有趣的潜力在于电子健康应用与数学模型的结合。然而,现有的数学模型没有考虑现实生活中的情况,例如膳食和饮料的同时摄入,并且没有将饮酒与临床标志物(例如磷脂酰乙醇(PEth))联系起来。在这里,我们提出了这样一个模型,它可以模拟现实生活中的情况并将饮酒与长期标志物联系起来。根据 χ2 检验,新模型既能准确描述估计数据(187.0 < Tχ2 = 226.4),也能准确描述独立验证数据(70.8 < Tχ2 = 93.5)。该模型还可以使用特定个体的人体测量数据进行个性化,因此可以用作基于生理学的数字孪生。该数字孪生还能够将短期饮酒与血液中 PEth 水平的长期动态联系起来,PEth 是饮酒的临床生物标志物。在这里,我们说明如何将短期饮酒量与长期标志物联系起来,从而提供一种新方法,根据测量的 PEth 水平确定患者的饮酒量。该数字孪生的另一个用例可能包括对患者报告的 AUDIT 问卷和测量的 PEth 水平进行综合评估。最后,我们将新模型集成到电子健康应用程序中,这可以帮助指导个人用户或临床医生减少危险饮酒。
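The notation "187.0 < Tχ2 = 226.4" in the abstract above reads as a model cost that falls below a χ2 rejection threshold, so the model is not rejected. A minimal Python sketch of that kind of check follows. The abstract does not state the degrees of freedom or the confidence level, so `dof` and `confidence` are assumptions, the data points are invented, and the threshold uses the Wilson-Hilferty approximation rather than an exact quantile.

```python
from statistics import NormalDist


def chi2_cost(observed, simulated, sigma):
    """Sum of squared residuals, each scaled by its measurement uncertainty."""
    return sum(((o - s) / e) ** 2 for o, s, e in zip(observed, simulated, sigma))


def chi2_threshold(dof, confidence=0.95):
    """Approximate chi-square quantile (Wilson-Hilferty transformation)."""
    z = NormalDist().inv_cdf(confidence)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3


def model_not_rejected(cost, dof, confidence=0.95):
    """True when the cost stays below the rejection threshold."""
    return cost < chi2_threshold(dof, confidence)


# Hypothetical measurements, model outputs, and uncertainties (NOT study data).
observed = [1.0, 2.1, 2.9, 4.2]
simulated = [1.1, 2.0, 3.0, 4.0]
sigma = [0.1, 0.1, 0.1, 0.1]

cost = chi2_cost(observed, simulated, sigma)
# Assumption: degrees of freedom taken as the number of data points.
dof = len(observed)
print(f"cost = {cost:.2f}, threshold = {chi2_threshold(dof):.2f}")
print("model not rejected:", model_not_rejected(cost, dof))
```

The approximation keeps the sketch free of third-party dependencies; where SciPy is available, `scipy.stats.chi2.ppf` gives the exact quantile.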
257. FFA-GPT: an automated pipeline for fundus fluorescein angiography interpretation and question-answer.
FFA-GPT:用于眼底荧光素血管造影解释和问答的自动化管道。
PMID: 38702471 | DOI: 10.1038/s41746-024-01101-z | 日期: 2024-05-03
摘要: Fundus fluorescein angiography (FFA) is a crucial diagnostic tool for chorioretinal diseases, but its interpretation requires significant expertise and time. Prior studies have used Artificial Intelligence (AI)-based systems to assist FFA interpretation, but these systems lack user interaction and comprehensive evaluation by ophthalmologists. Here, we used large language models (LLMs) to develop an automated interpretation pipeline for both report generation and medical question-answering (QA) for FFA images. The pipeline comprises two parts: an image-text alignment module (Bootstrapping Language-Image Pre-training) for report generation and an LLM (Llama 2) for interactive QA. The model was developed using 654,343 FFA images with 9392 reports. It was evaluated both automatically, using language-based and classification-based metrics, and manually by three experienced ophthalmologists. The automatic evaluation of the generated reports demonstrated that the system can generate coherent and comprehensible free-text reports, achieving a BERTScore of 0.70 and F1 scores ranging from 0.64 to 0.82 for detecting top-5 retinal conditions. The manual evaluation revealed acceptable accuracy (68.3%, Kappa 0.746) and completeness (62.3%, Kappa 0.739) of the generated reports. The generated free-form answers were evaluated manually, with the majority meeting the ophthalmologists’ criteria (error-free: 70.7%, complete: 84.0%, harmless: 93.7%, satisfied: 65.3%, Kappa: 0.762-0.834). This study introduces an innovative framework that combines multi-modal transformers and LLMs, enhancing ophthalmic image interpretation, and facilitating interactive communications during medical consultation.
中文摘要: 眼底荧光素血管造影(FFA)是脉络膜视网膜疾病的重要诊断工具,但其解释需要大量的专业知识和时间。先前的研究已使用基于人工智能(AI)的系统来辅助FFA解释,但这些系统缺乏用户交互和眼科医生的综合评估。在这里,我们使用大型语言模型 (LLM) 开发自动解释管道,用于 FFA 图像的报告生成和医学问答 (QA)。该管道由两部分组成:用于生成报告的图像文本对齐模块(Bootstrapping Language-Image Pre-training)和用于交互式 QA 的 LLM (Llama 2)。该模型是使用 654,343 个 FFA 图像和 9392 份报告开发的。它使用基于语言和基于分类的指标进行自动评估,并由三位经验丰富的眼科医生进行手动评估。对生成的报告的自动评估表明,该系统可以生成连贯且易于理解的自由文本报告,在检测前 5 种视网膜病变时获得 0.70 的 BERTScore 和 0.64 至 0.82 的 F1 分数。手动评估显示生成的报告的准确性(68.3%,Kappa 0.746)和完整性(62.3%,Kappa 0.739)可接受。生成的自由形式答案经过手动评估,大多数符合眼科医生的标准(无错误:70.7%,完整:84.0%,无害:93.7%,满意:65.3%,Kappa:0.762-0.834)。这项研究引入了一个创新框架,该框架结合了多模态 Transformer 和大型语言模型 (LLM),增强了眼科图像解释,并促进医疗咨询期间的交互式沟通。
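The abstract above summarizes report quality with F1 scores and inter-rater Kappa values. The sketch below shows how the two statistics are defined for binary labels. It assumes the two-rater Cohen's form of kappa, which the abstract does not specify (three ophthalmologists took part, so the study may have used a multi-rater variant), and all labels are invented for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1) of a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(lab) / n) * (rater_b.count(lab) / n) for lab in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical condition labels: 1 = condition present, 0 = absent.
reference = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
print(f"F1 = {f1_score(reference, predicted):.2f}")

# Hypothetical ratings from two graders: 1 = report acceptable, 0 = not.
grader_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
grader_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(f"kappa = {cohen_kappa(grader_a, grader_b):.3f}")
```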
258. Robust language-based mental health assessments in time and space through social media.
通过社交媒体在时间和空间上进行稳健的基于语言的心理健康评估。
PMID: 38698174 | DOI: 10.1038/s41746-024-01100-0 | 日期: 2024-05-02
摘要: In the most comprehensive population surveys, mental health is only broadly captured through questionnaires asking about “mentally unhealthy days” or feelings of “sadness.” Further, population mental health estimates are predominantly consolidated to yearly estimates at the state level, which is considerably coarser than the best estimates of physical health. Through the large-scale analysis of social media, robust estimation of population mental health is feasible at finer resolutions. In this study, we created a pipeline that used ~1 billion Tweets from 2 million geo-located users to estimate mental health levels and changes for depression and anxiety, the two leading mental health conditions. Language-based mental health assessments (LBMHAs) had substantially higher levels of reliability across space and time than available survey measures. This work presents reliable assessments of depression and anxiety down to the county-weeks level. Where surveys were available, we found moderate to strong associations between the LBMHAs and survey scores for multiple levels of granularity, from the national level down to weekly county measurements (fixed effects β = 0.34 to 1.82; p < 0.001). LBMHAs demonstrated temporal validity, showing clear absolute increases after a list of major societal events (+23% absolute change for depression assessments). LBMHAs showed improved external validity, evidenced by stronger correlations with measures of health and socioeconomic status than population surveys. This study shows that the careful aggregation of social media data yields spatiotemporal estimates of population mental health that exceed the granularity achievable by existing population surveys, and does so with generally greater reliability and validity.
中文摘要: 在最全面的人口调查中,心理健康状况只是通过询问“心理不健康的日子”或“悲伤”的感觉的问卷来广泛了解。此外,人口心理健康估计主要合并为州一级的年度估计,这比身体健康的最佳估计要粗糙得多。通过对社交媒体的大规模分析,可以以更精细的分辨率对人口心理健康状况进行稳健估计。在这项研究中,我们创建了一个管道,使用来自 200 万地理位置用户的约 10 亿条推文来估计心理健康水平以及抑郁和焦虑这两种主要心理健康状况的变化。基于语言的心理健康评估(LBMHA)在空间和时间上的可靠性远远高于现有的调查措施。这项工作提供了对县周级别的抑郁和焦虑的可靠评估。在有调查的情况下,我们发现 LBMHA 与多个粒度级别的调查得分之间存在中等到强的关联,从国家层面到每周县测量(固定效应 β = 0.34 到 1.82;p < 0.001)。 LBMHA 表现出时间有效性,在一系列重大社会事件后显示出明显的绝对增加(抑郁症评估的绝对变化+23%)。 LBMHA 显示出更高的外部效度,这通过与健康和社会经济状况指标的相关性比人口调查更强来证明。这项研究表明,社交媒体数据的仔细汇总可以对人口心理健康状况进行时空估计,超出了现有人口调查所能达到的粒度,并且通常具有更高的可靠性和有效性。
259. Constructing personalized characterizations of structural brain aberrations in patients with dementia using explainable artificial intelligence.
使用可解释的人工智能构建痴呆症患者大脑结构畸变的个性化特征。
PMID: 38698139 | DOI: 10.1038/s41746-024-01123-7 | 日期: 2024-05-02
摘要: Deep learning approaches for clinical predictions based on magnetic resonance imaging data have shown great promise as a translational technology for diagnosis and prognosis in neurological disorders, but its clinical impact has been limited. This is partially attributed to the opaqueness of deep learning models, causing insufficient understanding of what underlies their decisions. To overcome this, we trained convolutional neural networks on structural brain scans to differentiate dementia patients from healthy controls, and applied layerwise relevance propagation to procure individual-level explanations of the model predictions. Through extensive validations we demonstrate that deviations recognized by the model corroborate existing knowledge of structural brain aberrations in dementia. By employing the explainable dementia classifier in a longitudinal dataset of patients with mild cognitive impairment, we show that the spatially rich explanations complement the model prediction when forecasting transition to dementia and help characterize the biological manifestation of disease in the individual brain. Overall, our work exemplifies the clinical potential of explainable artificial intelligence in precision medicine.
中文摘要: 基于磁共振成像数据的临床预测深度学习方法作为神经系统疾病诊断和预后的转化技术显示出巨大的前景,但其临床影响有限。这部分归因于深度学习模型的不透明性,导致对其决策背后的原因缺乏足够的了解。为了克服这个问题,我们在结构性脑扫描上训练卷积神经网络,以区分痴呆症患者和健康对照组,并应用分层相关性传播来获得模型预测的个体层面的解释。通过广泛的验证,我们证明模型识别的偏差证实了痴呆症中大脑结构畸变的现有知识。通过在轻度认知障碍患者的纵向数据集中使用可解释的痴呆分类器,我们表明,在预测向痴呆的转变时,丰富的空间解释补充了模型预测,并有助于表征个体大脑中疾病的生物学表现。总的来说,我们的工作体现了可解释的人工智能在精准医学中的临床潜力。
260. A critical assessment of using ChatGPT for extracting structured data from clinical notes.
对使用 ChatGPT 从临床记录中提取结构化数据的批判性评估。
PMID: 38693429 | DOI: 10.1038/s41746-024-01079-8 | 日期: 2024-05-01
摘要: Existing natural language processing (NLP) methods to convert free-text clinical notes into structured data often require problem-specific annotations and model training. This study aims to evaluate ChatGPT’s capacity to extract information from free-text medical notes efficiently and comprehensively. We developed a large language model (LLM)-based workflow, utilizing systems engineering methodology and spiral “prompt engineering” process, leveraging OpenAI’s API for batch querying ChatGPT. We evaluated the effectiveness of this method using a dataset of more than 1000 lung cancer pathology reports and a dataset of 191 pediatric osteosarcoma pathology reports, comparing the ChatGPT-3.5 (gpt-3.5-turbo-16k) outputs with expert-curated structured data. ChatGPT-3.5 demonstrated the ability to extract pathological classifications with an overall accuracy of 89%, in lung cancer dataset, outperforming the performance of two traditional NLP methods. The performance is influenced by the design of the instructive prompt. Our case analysis shows that most misclassifications were due to the lack of highly specialized pathology terminology, and erroneous interpretation of TNM staging rules. Reproducibility shows the relatively stable performance of ChatGPT-3.5 over time. In pediatric osteosarcoma dataset, ChatGPT-3.5 accurately classified both grades and margin status with accuracy of 98.6% and 100% respectively. Our study shows the feasibility of using ChatGPT to process large volumes of clinical notes for structured information extraction without requiring extensive task-specific human annotation and model training. The results underscore the potential role of LLMs in transforming unstructured healthcare data into structured formats, thereby supporting research and aiding clinical decision-making.
中文摘要: 现有的自然语言处理 (NLP) 方法将自由文本临床笔记转换为结构化数据,通常需要针对特定问题的注释和模型训练。本研究旨在评估 ChatGPT 高效、全面地从自由文本医疗笔记中提取信息的能力。我们开发了基于大型语言模型(LLM)的工作流程,利用系统工程方法和螺旋式“提示工程”流程,利用 OpenAI 的 API 批量查询 ChatGPT。我们使用 1000 多份肺癌病理报告的数据集和 191 份儿科骨肉瘤病理报告的数据集评估了该方法的有效性,将 ChatGPT-3.5 (gpt-3.5-turbo-16k) 输出与专家策划的结构化数据进行比较。 ChatGPT-3.5 在肺癌数据集中展