ChatGPT & AI – A New Era in Evidence-Based Medicine?

Systematic reviews are fundamental to evidence-based decision-making but are notoriously time-consuming to conduct. Large Language Models such as ChatGPT have shown promise in speeding up the work, but applying them is not straightforward. An experiment by Mahuli et al. illustrates both the potential and the limitations of ChatGPT in conducting systematic reviews. Pitts, meanwhile, has taken a different approach by integrating GPT directly into its web application for systematic reviews.

ChatGPT in Systematic Reviews: Potential and Limitations
Mahuli et al. report that ChatGPT was able to conduct Risk of Bias (ROB) analysis and data extraction with considerable success [1]. In practice, however, the ways ChatGPT can assist with a systematic review may differ from what you expect.

For one, the standard ChatGPT interface is restrictive for systematic review tasks. You cannot upload a PDF directly; instead, you have to copy and paste the text, which quickly runs into the limit on prompt length. Nor does the interface offer a way to verify the quality of GPT’s output for systematic reviews, something OpenAI has not yet addressed [1].
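To make the prompt-length problem concrete, here is a minimal sketch of how pasted article text could be split into pieces that each fit under a size budget. It is an illustration only: the function name `chunk_text` and the `max_chars` budget are assumptions made for this example, and a character count is only a rough stand-in for the model's real limit, which is measured in tokens.

```python
def chunk_text(text, max_chars=8000):
    """Split text into chunks of at most max_chars, preferring paragraph breaks.

    max_chars is an assumed, illustrative budget, not an official limit.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # A single paragraph longer than the budget is cut into fixed-size pieces.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        # Otherwise, keep adding paragraphs to the current chunk while they fit.
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    article = "Methods paragraph.\n\nResults paragraph.\n\nDiscussion paragraph."
    for i, chunk in enumerate(chunk_text(article, max_chars=40), start=1):
        print(i, repr(chunk))
```

Each chunk would then have to be pasted as a separate prompt, which shows why manual copy-and-paste scales poorly across the dozens of papers in a typical review.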

Integrating GPT into Pitts: A New Approach
Pitts has been integrating GPT into its web application for systematic reviews, with a focus on GPT-assisted data extraction. Unlike the standard ChatGPT interface, the Pitts tool lets you upload PDFs, and it includes a system for verifying the accuracy of GPT’s data extractions.

How Does the Pitts Integration Work?
1. Upload Search Results: Either manually or directly from PubMed via the Pitts web interface.
2. Screening: Complete abstract and full-text screening, including PDF uploads.
3. Data Extraction: After configuring the review settings, GPT can be used for data extraction with a prediction box and configurable settings.

This can be tried directly at 

Cautions and Challenges
Pitts emphasizes that this integration is experimental. Trials have identified clear areas where GPT’s output was valid, but also numerous instances where the information was incorrect and required manual correction. Subject-matter expertise therefore remains essential: the technology shows potential, but it does not replace expert involvement. The priority is to develop the tool further so that users can scientifically validate the use of GPT for data extraction.
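One simple way to think about this kind of validation is to compare GPT's extracted values against an expert's manual extraction, field by field. The sketch below is a hypothetical illustration, not a description of how the Pitts tool works: the function name `field_agreement`, the dictionary layout, and the example fields are all assumptions made for this example.

```python
def field_agreement(gpt_rows, expert_rows):
    """Return, per field, the share of studies where GPT matched the expert.

    Both arguments map a study ID to a dict of {field name: extracted value}.
    The expert extraction is treated as the reference standard.
    """
    def norm(value):
        # Compare case-insensitively and ignore extra whitespace.
        return " ".join(str(value).lower().split())

    matches, totals = {}, {}
    for study_id, expert in expert_rows.items():
        gpt = gpt_rows.get(study_id, {})
        for field, expected in expert.items():
            totals[field] = totals.get(field, 0) + 1
            # A field GPT did not return counts as a mismatch.
            if field in gpt and norm(gpt[field]) == norm(expected):
                matches[field] = matches.get(field, 0) + 1
    return {field: matches.get(field, 0) / totals[field] for field in totals}


if __name__ == "__main__":
    # Made-up values, for illustration only.
    expert = {
        "study_1": {"sample_size": 120, "design": "RCT"},
        "study_2": {"sample_size": 45, "design": "Cohort"},
    }
    gpt = {
        "study_1": {"sample_size": "120", "design": "rct"},
        "study_2": {"sample_size": 54, "design": "cohort "},
    }
    print(field_agreement(gpt, expert))
```

Exact matching after light normalization is a deliberately crude measure. Free-text fields such as outcome descriptions would still need an expert to judge whether two differently worded answers agree.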

Both the experiment by Mahuli et al. and the development work by Pitts highlight the evolving relationship between AI and systematic reviews [1]. ChatGPT’s standard interface has limitations, but its potential, as shown through integration in specialized tools like the one developed by Pitts, is encouraging.

While it’s clear that AI can contribute significantly to the systematic review process, caution is needed. The technology is still in its infancy and requires substantial development and validation. Continued collaboration between researchers, developers, and experts will pave the way for AI’s broader applicability in generating evidence and potentially revolutionize the field of systematic reviews.

[1] Mahuli, S., Rai, A., Mahuli, A. et al. Application ChatGPT in conducting systematic reviews and meta-analyses. Br Dent J 235, 90–92 (2023).