Confidence in artificial intelligence (AI) chatbots continues to grow, and more and more people are using them, including patients seeking medical information. The large language models underlying these chatbots enable them to mimic human language, sometimes so convincingly that it becomes difficult to discern whether the information they provide is accurate. Large language models have demonstrated the ability to pass the US Medical Licensing Examination, encode clinical knowledge, and provide medical diagnoses. In a study published in JAMA Oncology, Shan Chen and colleagues evaluated the accuracy of AI chatbot cancer treatment recommendations by comparing chatbot responses with National Comprehensive Cancer Network (NCCN) guidelines for the treatment of breast, prostate, and lung cancers.
The authors used 4 prompt templates and benchmarked the chatbot's answers against the 2021 NCCN guidelines, because ChatGPT's knowledge cutoff was September 2021. Recommendations varied on the basis of how the question was posed. A panel of board-certified oncologists assessed concordance with the NCCN guidelines and found that every output containing a recommendation included at least 1 NCCN-concordant treatment, but one-third of outputs also recommended nonconcordant treatments. Concordance with the guidelines varied by cancer type and by extent of disease. Treatments that were not part of any recommended regimen appeared in 13% of outputs, primarily involving localized treatment of advanced disease, targeted therapy, or immunotherapy.
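To make the template-and-benchmark workflow concrete, here is a minimal sketch in Python. The template wording, treatment names, and scoring function are hypothetical illustrations under the assumptions above; they are not the study's actual materials or criteria.

```python
# Minimal sketch of the template-and-benchmark workflow described above.
# All template wording, treatment names, and helper functions are hypothetical
# illustrations; they are not the study's actual materials.

# Four hypothetical prompt templates, varying how the question is posed.
TEMPLATES = [
    "What is a recommended treatment for {dx}?",
    "What is the treatment for {dx}?",
    "How should {dx} be treated?",
    "What treatment options are there for {dx}?",
]

# Toy stand-in for the NCCN-concordant treatment set for one clinical scenario.
CONCORDANT = {"surgery", "radiation", "chemotherapy"}

def score_output(recommended: set) -> dict:
    """Score one chatbot output along the lines summarized above: does it
    include at least one concordant treatment, and does it also include
    any nonconcordant ones?"""
    return {
        "any_concordant": bool(recommended & CONCORDANT),
        "any_nonconcordant": bool(recommended - CONCORDANT),
    }

if __name__ == "__main__":
    diagnosis = "example localized carcinoma"  # hypothetical scenario label
    prompts = [t.format(dx=diagnosis) for t in TEMPLATES]
    # In the study each prompt would be sent to the chatbot; here we hard-code
    # one fake output that mixes a concordant and a nonconcordant treatment.
    fake_output = {"surgery", "experimental agent"}
    print(prompts[0])
    print(score_output(fake_output))
    # -> {'any_concordant': True, 'any_nonconcordant': True}
```

An output like the fake one above would count toward both findings at once: it contains at least 1 NCCN-concordant treatment, yet it also recommends a nonconcordant one.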
Disagreement among the expert panel occurred in nearly 40% of scores, highlighting the difficulty of interpreting the chatbot's descriptive output. The chatbot often mixed incorrect recommendations with correct ones, making the misinformation difficult even for experts to detect.
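As a rough illustration of what a disagreement rate means here, the sketch below counts outputs on which the panel members did not all give the same rating. The scores are invented for illustration; the study's actual rating scheme may differ.

```python
# Hypothetical panel scores: each row is one chatbot output, each column one
# oncologist's judgment (1 = concordant, 0 = not). A score "disagrees" when
# the raters do not all give the same value. Data are invented for illustration.
panel_scores = [
    (1, 1, 1),
    (1, 0, 1),  # raters disagree
    (0, 0, 0),
    (1, 1, 1),
    (1, 0, 0),  # raters disagree
]
disagreements = sum(1 for row in panel_scores if len(set(row)) > 1)
print(f"Disagreement rate: {disagreements / len(panel_scores):.0%}")  # 40%
```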
High level
The findings of this research highlight areas of concern, and future research is needed to improve the use of AI chatbots for cancer treatment information. Chatbot developers should make it clear that chatbots are not medical devices, and they should be held to high standards to ensure these new technologies do not cause harm. Because guidelines are updated and new treatments approved frequently, there is a continuous gap between a model's knowledge cutoff and the latest scientific data, presenting both a challenge and an opportunity for improvement in these technologies.
Ground level
Patient use of chatbot technologies for self-education is likely to continue to increase, so it is important for patients and clinicians to be aware of their limitations, particularly that chatbots may not include the most recent scientific data. Given these results, the authors suggest that clinicians counsel patients about the potential for inaccuracies when using AI chatbots for cancer treatment information and discourage reliance on them for specific treatment recommendations.