The Microdata Library in 2024: Expanding Access to Data While Strengthening Privacy Protections | UNHCR Blog
In 2024, UNHCR worked with Nitin Kohli of CEGA to test the application of differential privacy, a state-of-the-art approach that enables the secure release of full-size synthetic datasets. These datasets are generated by adding ‘noise’ to the data, which keeps individual records confidential while preserving many of the properties of the original dataset, so that users can derive the same results and insights. The work was challenging because UNHCR registration records have a complex relational structure spanning multiple tables and contain mixed data. Using the open-source OpenDP library, however, it was possible to generate full-size synthetic data with strong privacy properties that is also richer and more accurate than data produced with previously used statistical disclosure methods.
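To give a feel for how adding noise can yield synthetic records, here is a minimal sketch in plain Python. It is not the method used in the UNHCR and CEGA work and does not use OpenDP. It is a textbook illustration under stated assumptions: a single categorical column (the `age_group` values are hypothetical), a fixed list of categories known in advance, and the Laplace mechanism with a privacy parameter `epsilon`. Noise is added to the count of each category, and synthetic records are then sampled from the noisy counts. Real registration data, with several linked tables and mixed data, needs considerably more machinery than this.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(values, categories, epsilon, rng):
    # Adding or removing one record changes exactly one count by 1,
    # so Laplace noise with scale 1/epsilon is applied to every count.
    # `categories` must be fixed in advance, not read off the data.
    # Values outside `categories` are ignored.
    counts = Counter(values)
    scale = 1.0 / epsilon
    return {c: counts.get(c, 0) + laplace_noise(scale, rng) for c in categories}


def synthesize(noisy_counts, rng):
    # Post-processing only: clamp negative counts to zero, then sample
    # synthetic records in proportion to the noisy counts. The number of
    # synthetic records comes from the noisy total, not the true total.
    weights = {c: max(0.0, v) for c, v in noisy_counts.items()}
    total = sum(weights.values())
    if total <= 0:
        return []
    cats = list(weights)
    return rng.choices(cats, weights=[weights[c] for c in cats], k=round(total))


# Hypothetical example data: one made-up column of age groups.
CATEGORIES = ["0-17", "18-59", "60+"]
original = ["0-17"] * 400 + ["18-59"] * 500 + ["60+"] * 100

rng = random.Random(2024)
noisy = dp_histogram(original, CATEGORIES, epsilon=1.0, rng=rng)
synthetic = synthesize(noisy, rng)
print(Counter(synthetic))
```

A smaller `epsilon` means more noise and stronger privacy, at the cost of accuracy. Because the synthetic records are drawn only from the noisy counts, they inherit the privacy guarantee of the histogram step.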
This year will mark a milestone as differentially private synthetic registration datasets will be made available for the first time on the Microdata Library. These comprehensive datasets will provide new opportunities for researchers, NGOs, and policymakers to gain deeper insights into refugee demographics, protection needs, skills, and assistance received—ultimately supporting more effective responses to forced displacement.