ChatGPT & AI – A New Era in Evidence-Based Medicine?

Systematic reviews are fundamental to evidence-based decision-making but notoriously time-consuming. Integrating AI, specifically large language models such as ChatGPT, has shown promise, but applying it is not straightforward. An experiment by Mahuli et al. illustrates ChatGPT’s potential and limitations in conducting systematic literature reviews (SLRs). At the same time, our own work on integrating GPT into our web application for systematic reviews shows one way to incorporate large language models into the SLR workflow.

ChatGPT in Systematic Reviews: Potential and Limitations
As Mahuli et al. report, ChatGPT was able to conduct Risk of Bias (ROB) analysis and data extraction with considerable success [1]. However, the ChatGPT chat window might not be the right interface for large language models in the systematic review process. For one, the standard ChatGPT interface is restrictive for systematic review tasks: you cannot upload a PDF directly, so you must copy and paste the text, and you quickly run into the limit on prompt length. The ChatGPT interface also offers no way to verify the quality of GPT’s work for systematic reviews at scale [1].
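To make the prompt-length problem concrete, here is a minimal Python sketch of the workaround a reviewer ends up doing by hand: cutting the pasted full text into pieces that each fit a size budget. The function name `split_into_chunks` and the `max_chars` budget are our own illustrative assumptions. Real limits are counted in tokens rather than characters, so the character count is only a rough proxy.

```python
def split_into_chunks(text: str, max_chars: int = 8000) -> list[str]:
    """Split text on blank-line paragraph breaks so no chunk exceeds max_chars.

    max_chars is an assumed character budget standing in for a token limit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A single paragraph longer than the budget is cut at fixed offsets.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        # Otherwise, keep adding paragraphs while the chunk stays in budget.
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be pasted or sent separately, which is exactly why per-paper context gets lost and why checking the results at scale becomes difficult.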

Integrating GPT into Pitts: A New Approach
We have been working on a different approach: integrating GPT into our web application for systematic reviews, with a focus on GPT-assisted data extraction. Unlike the standard ChatGPT tool, the Pitts tool lets you upload PDFs, and we have a system in place to verify the accuracy of the GPT data extractions.

How Does the Pitts Integration Work?

  1. Upload Search Results: Either manually with a .ris file or directly from PubMed via the Pitts web interface.
  2. Screening: Complete abstract and full-text screening, including PDF uploads.
  3. Data Extraction: After configuring the review settings, you can use GPT with a prediction box and configurable settings for data extraction.
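For readers who have not looked inside a .ris file, the sketch below shows roughly what step 1 involves: each record is a series of two-character tags followed by `  - ` and a value, and ends with an `ER` line. This is an illustrative parser under our own assumptions, not the code Pitts uses. The function name `parse_ris` is hypothetical, and the sample record is simply the Mahuli et al. reference written out in RIS form.

```python
import re

# A RIS line is a two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

SAMPLE_RIS = """TY  - JOUR
AU  - Mahuli, S.
AU  - Rai, A.
AU  - Mahuli, A.
TI  - Application ChatGPT in conducting systematic reviews and meta-analyses
JO  - Br Dent J
VL  - 235
SP  - 90
EP  - 92
PY  - 2023
ER  - 
"""


def parse_ris(text: str) -> list[dict[str, list[str]]]:
    """Parse RIS text into one dict per record, mapping tag -> list of values."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    for raw_line in text.splitlines():
        match = RIS_LINE.match(raw_line.rstrip())
        if match is None:
            continue  # blank or non-tag lines are ignored in this sketch
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":
            # End of record: store it and start a fresh one.
            if current:
                records.append(current)
            current = {}
        else:
            # Tags such as AU can repeat, so values are kept as lists.
            current.setdefault(tag, []).append(value)
    if current:
        records.append(current)  # tolerate a missing final ER line
    return records


records = parse_ris(SAMPLE_RIS)
```

Keeping every value in a list is a deliberate simplification: it handles repeated tags such as multiple authors without special cases, at the cost of one extra index when reading single-valued fields like the year.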

You can try this directly at

Cautions and Challenges
We emphasize that this integration is experimental. Research has identified clear areas where GPT’s output was valid, as well as instances where the information was incorrect and required manual correction. Expertise in conducting systematic reviews remains essential. Our priority is to develop the tool further so that users can check whether GPT’s data extractions are accurate enough for scientific use.
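One simple way to think about such validation is to compare GPT’s extracted values with the values a human reviewer confirmed, field by field. The following Python sketch illustrates that idea only. It does not describe how Pitts performs verification, the function names are hypothetical, and the field names and values in the usage example are invented for demonstration.

```python
def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(value.lower().split())


def compare_extractions(gpt: dict[str, str], reviewer: dict[str, str]) -> dict:
    """Compare GPT-extracted fields against reviewer-confirmed values.

    Returns the share of reviewer fields that GPT matched, plus the mismatches
    as field -> (gpt_value, reviewer_value) for manual correction.
    """
    fields = sorted(reviewer)
    mismatches: dict[str, tuple[str, str]] = {}
    for field in fields:
        predicted = gpt.get(field, "")  # a missing field counts as a mismatch
        if normalize(predicted) != normalize(reviewer[field]):
            mismatches[field] = (predicted, reviewer[field])
    agreement = 1 - len(mismatches) / len(fields) if fields else 0.0
    return {"agreement": agreement, "mismatches": mismatches}


# Invented example values, for illustration only.
gpt_output = {
    "sample_size": "120",
    "design": "Randomized  controlled trial",
    "follow_up": "6 months",
}
reviewer_values = {
    "sample_size": "120",
    "design": "randomized controlled trial",
    "follow_up": "12 months",
}
report = compare_extractions(gpt_output, reviewer_values)
```

Exact matching after normalization is deliberately strict. It flags anything that differs and leaves the judgment to the reviewer, which fits the point that systematic review expertise remains essential.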

Mahuli et al.’s approach and our own development work highlight the dynamic and evolving relationship between AI and systematic reviews [1]. ChatGPT’s standard interface has limitations, but large language models still have potential in the SLR process, as we try to show by integrating GPT into the SLR workflow.

While it’s clear that AI can contribute significantly to the systematic review process, caution is needed. The technology is still in its infancy and requires substantial development and validation. Continued collaboration between researchers, developers, and experts will pave the way for AI’s broader applicability in generating evidence and potentially revolutionize the field of systematic reviews.

1. Mahuli, S., Rai, A., Mahuli, A. et al. Application ChatGPT in conducting systematic reviews and meta-analyses. Br Dent J 235, 90–92 (2023).