Evidence Aggregation: Extracting Key Findings from Academic Papers

Policy Context

Over the past two decades, the body of evidence from academic impact evaluations has grown dramatically. For policymakers and donors committed to evidence-based decision-making, this means an ever-expanding pool of studies to draw on. The issue these decision-makers face is not a lack of evidence; rather, it is accessing that evidence and identifying the most effective interventions to improve welfare. This has sharpened interest in interventions that scale effectively and travel well across contexts. Aggregating findings from multiple studies into a unified database makes it easier to surface these interventions and draw meaningful comparisons across the literature. While meta-analyses remain the gold standard for evidence synthesis, many donors are shifting toward “living” evidence reviews—an approach that allows new findings to be continuously integrated as they emerge.

However, the existing infrastructure for producing evidence syntheses is labor-intensive. On average, it takes a trained human coder 5–6 hours to extract data from a single study because key information is often scattered or locked in unstructured PDFs. Extracting data from hundreds of papers, each with a different structure and reporting format, requires immense time and labor. Large Language Models (LLMs) could make this extraction cheaper, faster, and potentially more accurate. This project was launched to test LLM-based methods for evidence aggregation, benchmarking LLM extractions against human-coded datasets to verify accuracy on an ongoing basis.
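As a rough illustration of what benchmarking LLM extractions against human-coded data can look like, the Python sketch below computes per-field agreement between the two. It is a minimal sketch under stated assumptions, not the project's actual pipeline: the field names, the matching rules, and the two example records are hypothetical placeholders, and the values are made up for illustration rather than drawn from any study.

```python
from typing import Dict, List

# Hypothetical field names; the project's real metadata schema is not shown here.
FIELDS = ["intervention", "outcome", "effect_size", "standard_error"]


def values_match(human, llm, tol: float = 1e-6) -> bool:
    """Numbers match within a tolerance; text matches ignoring case and outer spaces."""
    if isinstance(human, (int, float)) and isinstance(llm, (int, float)):
        return abs(human - llm) <= tol
    return str(human).strip().lower() == str(llm).strip().lower()


def field_accuracy(human_rows: List[dict], llm_rows: List[dict],
                   fields: List[str] = FIELDS) -> Dict[str, float]:
    """Share of records, per field, where the LLM value agrees with the human-coded value."""
    if not human_rows or len(human_rows) != len(llm_rows):
        raise ValueError("Need two non-empty, equally long lists of records")
    hits = {f: 0 for f in fields}
    for human, llm in zip(human_rows, llm_rows):
        for f in fields:
            # A field missing from the LLM output counts as a miss.
            if f in llm and values_match(human[f], llm[f]):
                hits[f] += 1
    return {f: hits[f] / len(human_rows) for f in fields}


# Made-up placeholder records (not real results), aligned by position.
human_coded = [
    {"intervention": "Cash transfer", "outcome": "Consumption",
     "effect_size": 0.12, "standard_error": 0.04},
    {"intervention": "Deworming", "outcome": "School attendance",
     "effect_size": 0.25, "standard_error": 0.10},
]
llm_extracted = [
    {"intervention": "cash transfer ", "outcome": "Consumption",
     "effect_size": 0.12, "standard_error": 0.05},
    {"intervention": "Deworming", "outcome": "Attendance",
     "effect_size": 0.25, "standard_error": 0.10},
]

accuracy = field_accuracy(human_coded, llm_extracted)
for name, share in accuracy.items():
    print(f"{name}: {share:.0%}")
```

Reporting agreement field by field, rather than as a single score, shows which parts of an extraction (for example, standard errors versus intervention names) would most need human review in a semi-automated workflow.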

Study Design

CEGA’s Berkeley Initiative for Transparency in the Social Sciences (BITSS) and the Global Poverty Research Lab (GPRL) at Northwestern University are working together to test approaches for automating the extraction of key information from academic papers. We are using a specialized metadata schema, co-developed with partners such as the World Bank and J-PAL, to specify which information must be extracted from each paper to recover treatment effects. By testing both fully automated and semi-automated approaches, the team is continually assessing which workflows deliver the most accurate data at the lowest cost.

Ultimately, BITSS and GPRL aim to build a publicly available dataset of treatment effects from global development papers, branded “DEVIDENCE” for Development Evidence. DEVIDENCE will serve as a resource for decision-makers interested in evidence for reducing poverty.

Results and Policy Lessons

Results forthcoming. The tool produced by this collaboration aims to enable the construction of large, open-access datasets of treatment effect data from social science papers. By using automation to speed up and reduce the cost of producing meta-analyses, DEVIDENCE will enable the global development community and other research disciplines to synthesize evidence, helping donors, researchers, and policymakers identify the most effective interventions based on academic research.

Areas of work
Transparency