Luo, X., Rechardt, A., Sun, G. et al., University College London (UCL)
An international team led by researchers at University College London (UCL) has found that large language models (LLMs), a form of artificial intelligence trained to analyze textual data, can predict the results of proposed neuroscience studies more accurately than human experts. Published in Nature Human Behaviour, the study demonstrates that LLMs trained on extensive text datasets can identify patterns within the scientific literature, enabling them to forecast experimental outcomes with accuracy exceeding that of human specialists.
AI’s Untapped Potential in Research
Lead author Dr. Ken Luo from UCL’s Department of Psychology & Language Sciences noted that while much of the recent focus has been on LLMs’ ability to retrieve and summarize existing knowledge, their potential to synthesize information and predict future scientific results has been less explored. “Scientific progress often depends on trial and error, which is both time-consuming and resource-intensive,” said Dr. Luo. “Even the most skilled researchers might overlook critical insights from the literature. Our study investigates whether LLMs can detect patterns across vast scientific texts to forecast experimental outcomes.”
Developing BrainBench: A Testing Ground
To evaluate this potential, the research team developed a tool called BrainBench. This tool comprises numerous pairs of neuroscience study abstracts. In each pair, one abstract is from an actual study detailing the research background, methods, and results. The other abstract has identical background and methods but features a plausible yet incorrect outcome, crafted by experts in the relevant neuroscience domain.
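The article does not spell out how a model registers its choice on such a benchmark, but two-alternative tests like this are often scored by comparing how surprising the model finds each candidate text. The sketch below is a minimal illustration under that assumption, not the authors' protocol: it picks whichever abstract a small causal language model assigns the lower perplexity. The model name ("gpt2") and the scoring rule are placeholders.

```python
# Minimal sketch of scoring one benchmark item by perplexity comparison.
# Assumption: the model "chooses" whichever abstract it finds less surprising.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Exponentiated mean token-level cross-entropy of the model on `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def choose_real_abstract(abstract_a: str, abstract_b: str) -> str:
    # Lower perplexity = the model judges that version more plausible.
    return abstract_a if perplexity(abstract_a) < perplexity(abstract_b) else abstract_b
```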
The team tested 15 general-purpose LLMs alongside 171 human neuroscience experts, all of whom had passed a screening test confirming their expertise. Participants were asked to identify which abstract in each pair reported the actual study results. Remarkably, all of the LLMs outperformed the human experts, achieving an average accuracy of 81% compared with the humans' 63%. Even among the human experts with the highest self-reported expertise in a given neuroscience domain, accuracy reached only 66%.
Further analysis revealed that the LLMs were more likely to be correct when they were more confident in their decisions. This finding suggests the potential for future collaboration between human experts and well-calibrated AI models.
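To make "confidence" concrete: with a perplexity-based scorer like the one sketched above, the gap between the two abstracts' perplexities can serve as a confidence proxy, and calibration can be checked by asking whether accuracy rises with that gap. The sketch below assumes that setup; the input format and the equal-sized binning scheme are illustrative choices, not details from the study.

```python
# Hypothetical calibration check: does accuracy rise with the model's
# confidence? Confidence is proxied by the perplexity gap between abstracts.
import numpy as np

def calibration_by_confidence(scored_items, n_bins: int = 5):
    """scored_items: list of (perplexity_of_real, perplexity_of_altered) pairs,
    e.g. as produced by the perplexity helper in the earlier sketch."""
    scored = np.array(scored_items, dtype=float)
    margins = np.abs(scored[:, 0] - scored[:, 1])  # confidence proxy
    correct = scored[:, 0] < scored[:, 1]          # real abstract preferred?
    order = np.argsort(margins)                    # low to high confidence
    for i, chunk in enumerate(np.array_split(order, n_bins)):
        print(f"bin {i}: mean margin={margins[chunk].mean():.3f}, "
              f"accuracy={correct[chunk].mean():.2%}")
```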
Specialized AI Shows Even Greater Promise
Building on these results, the researchers adapted an existing open-source LLM called Mistral by training it specifically on neuroscience literature. The specialized model, named BrainGPT, achieved an even higher accuracy of 86% in predicting study results, outperforming the general-purpose version of Mistral, which had an 83% accuracy rate.
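The article does not detail the training recipe, but a common way to adapt an open model to a domain corpus is parameter-efficient fine-tuning, for example with LoRA adapters. The sketch below shows that general pattern using the Hugging Face transformers, peft, and datasets libraries; the base checkpoint, the "neuro_corpus.txt" file, and all hyperparameters are assumed placeholders, not the authors' settings.

```python
# Illustrative domain adaptation of an open model with LoRA; every name and
# hyperparameter here is a placeholder, not the BrainGPT training recipe.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "mistralai/Mistral-7B-v0.1"            # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token     # Mistral has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the model so only small low-rank adapter matrices are trained.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# "neuro_corpus.txt" stands in for a corpus of neuroscience papers/abstracts.
corpus = load_dataset("text", data_files={"train": "neuro_corpus.txt"})["train"]
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="braingpt-lora", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=tokenized,
    # mlm=False yields the standard causal-LM objective (labels = inputs).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```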
Senior author Professor Bradley Love of UCL’s Department of Psychology & Language Sciences commented on the implications: “In light of our results, we suspect it won’t be long before scientists are using AI tools to design the most effective experiment for their question. While our study focused on neuroscience, our approach was universal and should successfully apply across all of science.”
Dr. Luo added, “Building on our results, we are developing AI tools to assist researchers. We envision a future where researchers can input their proposed experiment designs and anticipated findings, with AI offering predictions on the likelihood of various outcomes. This would enable faster iteration and more informed decision-making in experiment design.”
The study involved collaborators from multiple institutions worldwide, including the University of Cambridge, the University of Oxford, the Max Planck Institute for Neurobiology of Behavior in Germany, Bilkent University in Turkey, and other institutions across the UK, USA, Switzerland, Russia, Germany, Belgium, Denmark, Canada, Spain, and Australia.