An OpenAI large language model is showing promise in rare-disease medicine. A new study found that the AI tool helped researchers identify new diagnoses in rare disease cases that had gone unsolved for years.
The study, published in NEJM AI, the New England Journal of Medicine’s AI-focused publication, found that OpenAI’s o3 Deep Research model helped diagnose 18 children at Boston Children’s Hospital. Previously, these children and their families were left in the dark as doctors struggled to find the causes of their rare illnesses. The findings suggest that a stalled search for answers need not be the end of the road.
The researchers called the findings “a total game changer,” noting that the software increased new diagnoses by five per cent. “Which doesn’t sound like a lot,” Catherine Brownstein, PhD, one of the study’s lead researchers, told NBC News, “but considering how many times these had already been analysed, that’s a huge number, and each one means an answer for a family.”
Brownstein is the scientific director of the genetic investigations arm of the Manton Center for Orphan Disease Research at Boston Children’s Hospital. The organisation conducted the research last year by running the genomes of 376 patients who did not have diagnoses through the o3 system. At the time, o3 was the most powerful system available.
How AI can diagnose
For each case, the researchers gave the o3 model the clinicians’ notes, a description of the patient’s symptoms, and a list of the genes that might be responsible for those symptoms.
The research team reviewed the model’s outputs using standard criteria from the American College of Medical Genetics and Genomics and the Association for Molecular Pathology before confirming any diagnosis.
The 376 cases were divided into four disease areas. The researchers found new diagnoses for 10 patients with rare neurodevelopmental diseases, four with neuromuscular disorders, two with early childhood psychosis, and two who had died suddenly, with no further details specified.
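The per-area counts above can be checked against the headline figures of 18 new diagnoses and roughly five per cent. The short Python sketch below only redoes that arithmetic from the numbers reported in this article; the dictionary keys are shorthand labels, not terms from the study.

```python
# Counts of new diagnoses per disease area, as reported in the article.
# The keys are shorthand labels chosen for this sketch.
new_diagnoses = {
    "neurodevelopmental": 10,
    "neuromuscular": 4,
    "early childhood psychosis": 2,
    "sudden death": 2,
}
total_cases = 376  # previously undiagnosed patients whose cases were reanalysed

total_new = sum(new_diagnoses.values())
yield_pct = 100 * total_new / total_cases

print(f"{total_new} new diagnoses out of {total_cases} cases = {yield_pct:.1f}%")
```

The total comes to 18, and 18 of 376 is a little under five per cent, in line with the figure the researchers cited.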
AI’s role and limits
Brownstein told Inc. that the technology is a way to clear a longstanding bottleneck in genetic diagnosis. It is not meant as a replacement for the humans doing the work.
Genetic diagnoses involve weighing phenotypes, inheritance patterns, variant data, researcher experience, and a body of scientific literature that’s constantly shifting. She said AI can move through that information without getting tired or making frequent mistakes. This can free up geneticists to spend their time on the strongest leads.
“AI will never have the human research experience that is an essential part of the process,” she told Inc. “Our jobs are safe.”
Brownstein said that the biggest risk is conflating fluency with validity. AI can provide a coherent explanation that is still wrong, an error known as a hallucination. She said that her team is aware of the problem and is watching for hallucinations, as well as “misinterpreted evidence, automation bias, uncalibrated confidence, model drift, and biases arising from the populations represented (and populations not represented) in genomic databases and the literature.”
“We have to be careful that we don’t start relying too much on models, and make sure we treat them only as copilots,” she said. Every lead, she added, still has to go through expert review, confirmatory clinical testing, and genetic counselling before it reaches a family.
Regarding patient privacy, Brownstein said the team used a data-minimisation approach rather than just feeding the raw medical records into a consumer chatbot. “Each case was converted into a de-identified packet containing standardised phenotype terms, limited relevant clinical information, basic metadata, and a filtered variant table with the information needed for the analysis,” she said. No protected health information was used.
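To make the idea of a de-identified packet concrete, here is a minimal Python sketch of what such a structure might look like. It is an illustration only: the article does not describe the study’s actual schema, so every field name, the `ALLOWED_METADATA` whitelist, and the `build_packet` helper are assumptions, and the sample record is invented.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical whitelist: only these metadata keys are copied into a packet.
ALLOWED_METADATA = {"age_band", "sex", "disease_area"}


@dataclass(frozen=True)
class Variant:
    """One row of the filtered variant table (hypothetical columns)."""
    gene: str
    change: str
    zygosity: str


@dataclass(frozen=True)
class CasePacket:
    """A de-identified case, holding only what the analysis needs."""
    case_id: str                      # study pseudonym, not a hospital record number
    phenotype_terms: Tuple[str, ...]  # standardised phenotype terms
    clinical_summary: str             # limited relevant clinical information
    metadata: Dict[str, str]          # basic metadata, whitelisted keys only
    variants: Tuple[Variant, ...]     # filtered variant table


def build_packet(case_id: str, record: dict) -> CasePacket:
    """Copy only whitelisted fields out of a raw record (data minimisation)."""
    metadata = {
        key: value
        for key, value in record.get("metadata", {}).items()
        if key in ALLOWED_METADATA
    }
    variants = tuple(Variant(**row) for row in record.get("variants", []))
    return CasePacket(
        case_id=case_id,
        phenotype_terms=tuple(record.get("phenotype_terms", [])),
        clinical_summary=record.get("clinical_summary", ""),
        metadata=metadata,
        variants=variants,
    )


# Invented example record; the name and record number never reach the packet.
raw_record = {
    "name": "Jane Doe",
    "metadata": {"sex": "F", "medical_record_number": "000000"},
    "phenotype_terms": ["seizures", "developmental delay"],
    "clinical_summary": "Onset in infancy; prior testing negative.",
    "variants": [{"gene": "GENE1", "change": "c.100A>G", "zygosity": "het"}],
}
packet = build_packet("case-001", raw_record)
print(packet.metadata)
```

The point of the whitelist is that anything not explicitly named is dropped by default, which is one common way to implement data minimisation; the study team’s own approach may differ.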
The future of diagnosing
Brownstein said her goal is to democratise access to best-in-class genetic analysis by turning this workflow into an easy-to-use genetics AI copilot, one designed to cut down the time and cost of reanalysis rather than replace the people doing it.
“My hope is to see prospective, multi-centre pilots over the next one to two years. My aspiration, not a prediction, is to have a publicly accessible reanalysis tool in two to three years,” she said. – Inc./TNS
