Enhancing UNHCR Microdata Access and Privacy with Differential Privacy

Policy Context

Humanitarian organizations face a persistent tension between making critical data publicly available and protecting the privacy of vulnerable populations. UNHCR, which maintains some of the world’s most comprehensive microdata on forcibly displaced people, has relied on statistical disclosure control as its standard method for anonymizing microdata since 2020. But these techniques carry a significant trade-off: anonymization can sharply reduce a dataset’s utility. This limitation has prevented UNHCR from publicly sharing its registration data, narrowing the evidence base other organizations can draw on to make informed decisions about forcibly displaced people.

The challenge illustrates a broader problem across the humanitarian and development sectors: traditional anonymization methods often force an unacceptable choice between data privacy and data usefulness, creating exactly the kind of gap that privacy-enhancing technologies (PETs) are designed to close.

Study Design

In 2024, Nitin Kohli of CEGA’s Data Privacy Lab partnered with UNHCR to test whether differential privacy (a type of PET) could offer a viable alternative to traditional anonymization for its microdata. The approach generates full-size synthetic datasets by injecting carefully calibrated statistical noise, preserving the analytical properties of the original data while providing formal guarantees that no individual record can be re-identified. The team built this pipeline with the open-source OpenDP library, navigating significant technical challenges posed by UNHCR’s data, including a complex relational structure spanning multiple tables and a mix of data types within registration records.
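To make "carefully calibrated statistical noise" concrete, the sketch below shows the Laplace mechanism, the basic building block of differential privacy: a query's true answer is perturbed with noise scaled to the query's sensitivity and the privacy parameter epsilon. This is an illustrative toy example, not UNHCR's pipeline; the actual project used OpenDP to generate full synthetic datasets, and the function name, epsilon value, and example count here are hypothetical.

```python
import numpy as np

def laplace_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A counting query has sensitivity 1: adding or removing any one
    person's record changes the count by at most 1. Laplace noise with
    scale = sensitivity / epsilon = 1 / epsilon therefore suffices.
    """
    scale = 1.0 / epsilon
    return true_count + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)

# Hypothetical query: number of registered households in some region.
# The released value is the true count plus calibrated noise.
noisy = laplace_count(12_345, epsilon=1.0, rng=rng)
```

A smaller epsilon means larger noise and a stronger privacy guarantee; generating an entire synthetic dataset, as in this project, composes many such noisy measurements under a total privacy budget.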

Results and Policy Lessons

The project demonstrated that differential privacy can produce synthetic datasets that are both richer and more analytically accurate than those generated through conventional statistical disclosure control — while offering mathematically provable privacy protections. For UNHCR, this opens a path to publicly releasing full-size registration data for the first time, substantially expanding what researchers and policymakers can learn to improve humanitarian response to forced displacement.

More broadly, the work offers a proof of concept for other organizations holding sensitive administrative or survey data: PETs like differential privacy can address the longstanding trade-off between disclosure risk and data utility, making it possible to share detailed data without compromising the people it describes.

Check out our webinar about this work