Randomized controlled trials (RCTs) are a powerful way to assess the impact of interventions and policies, and among the most prominent tools guiding evidence-informed decision-making. However, RCTs often rely on sensitive data that is difficult to protect, creating privacy risks and raising ethical questions.
To improve transparency in the social sciences, data and code need to be available for replication, an approach that CEGA's Transparency initiatives have advocated for years.
Transparency and ethics might thus seem to be in conflict in such cases. Policy-oriented stakeholders, citizens who contribute their data to RCTs, and companies that provide key data to researchers, such as fintech providers, all have a vested interest in protecting privacy. However, they also need the research that uses these data to be transparent, credible, and trustworthy.
This project aims to provide a systematic assessment of the feasibility of using formally private methods, such as differential privacy (DP), for data publication and adjusted valid inference. The goal is to provide stronger privacy protections for those who contribute data to RCTs, while maintaining the high level of transparency that distinguishes RCTs. Consumer and citizen protection agencies, ethics review boards, and other regulators should be interested in such methods, which may facilitate approval of studies that come with strong privacy guarantees.
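Differential privacy is typically achieved by adding calibrated noise to statistics before they are released. The sketch below shows the classic Laplace mechanism applied to a bounded mean; it is illustrative only, and the function name, clipping bounds, and synthetic data are assumptions, not the project's actual tooling:

```python
import numpy as np

def laplace_mean(x, lower, upper, epsilon, rng):
    """Release an epsilon-differentially private mean of bounded data.

    Values are clipped to [lower, upper], so one person can change the
    mean by at most (upper - lower) / n (the sensitivity). Adding Laplace
    noise with scale sensitivity / epsilon then satisfies epsilon-DP.
    """
    clipped = np.clip(x, lower, upper)
    sensitivity = (upper - lower) / len(x)
    noise = rng.laplace(0.0, sensitivity / epsilon)
    return clipped.mean() + noise

rng = np.random.default_rng(0)
incomes = rng.exponential(scale=200.0, size=5000)  # synthetic outcome data
private_mean = laplace_mean(incomes, lower=0.0, upper=1000.0, epsilon=1.0, rng=rng)
```

Note the trade-off this makes concrete: a smaller epsilon (stronger privacy) or a smaller sample means a larger noise scale, which is precisely why downstream inference must be adjusted.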
The project comprises two primary components, beginning with an assessment of the feasibility of stronger privacy protections for data collected in low- and middle-income countries (LMICs). Researchers will re-analyze 10-12 published studies that used data from LMICs, working with partners at Innovations for Poverty Action to first choose relevant studies and then identify the analysis method used, the variables of interest, and the data generating process. The team will then choose and apply the most efficient differentially private data protection method, re-run the analysis using DP-aware methods, and compare the protected inference to the original inference. Because privacy-protected data are noisier, the effective statistical power of these studies is expected to fall, making it harder to produce rigorous and statistically reliable impact estimates once privacy protections are introduced. Researchers will therefore also estimate the sample size that would have been needed to obtain the originally intended power.
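As a rough illustration of that power-recovery step, the sketch below solves for the per-arm sample size that restores a target minimum detectable effect once DP noise is included in the variance. It assumes a simple two-arm difference-in-means design in which each arm releases one epsilon-DP mean of clipped data; the function and its parameters are hypothetical, not the project's method:

```python
import math
from statistics import NormalDist

def required_n_per_arm(sigma, value_range, epsilon, mde,
                       alpha=0.05, power=0.80):
    """Smallest per-arm n for a two-sample difference-in-means test to
    detect effect `mde` with the target power, when each arm publishes an
    epsilon-DP mean of data clipped to a range of width `value_range`.

    Variance of the DP difference = sampling variance (2 * sigma^2 / n)
    plus Laplace noise variance (2 arms * 2 * b^2 with
    b = value_range / (n * epsilon), since Var(Laplace(b)) = 2 * b^2).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n = 2
    while True:
        var = 2 * sigma**2 / n + 4 * (value_range / (n * epsilon))**2
        if z * math.sqrt(var) <= mde:
            return n
        n += 1

# Effectively no privacy (huge epsilon) vs. a tight budget of epsilon = 0.5:
n_plain = required_n_per_arm(sigma=1.0, value_range=4.0, epsilon=1e9, mde=0.2)
n_dp = required_n_per_arm(sigma=1.0, value_range=4.0, epsilon=0.5, mde=0.2)
```

Because the noise variance shrinks like 1/n^2 while sampling variance shrinks like 1/n, the extra sample needed to compensate for DP is modest at large n but can be substantial for small studies or small epsilon.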
Throughout this process, the researchers will leverage, where possible, existing methods and emerging toolkits (e.g., OpenDP, Tumult Labs' forthcoming system) that are well understood and accessible to other researchers. Using well-known approaches is important because, after completing their analysis, the team will produce a short guide and sample code for practitioners on how to plan sample sizes in the presence of privacy protection, implement differential privacy for data publication, and adjust analysis in the presence of differential privacy.
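One concrete piece of "adjusting analysis in the presence of differential privacy" is widening confidence intervals to account for the injected noise. A minimal stdlib-only sketch, in which the function and its parameters are illustrative assumptions rather than the planned guide's contents:

```python
import math
from statistics import NormalDist

def dp_mean_ci(dp_mean, sigma, n, value_range, epsilon, level=0.95):
    """Confidence interval for a mean released via the Laplace mechanism.

    The half-width combines the usual sampling standard error with the
    variance of the Laplace noise, 2 * b^2 where b = value_range / (n * epsilon).
    A normal approximation is used for the noise term, which is reasonable
    when sampling error dominates the privacy noise.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    b = value_range / (n * epsilon)
    se = math.sqrt(sigma**2 / n + 2 * b**2)
    return dp_mean - z * se, dp_mean + z * se

lo, hi = dp_mean_ci(dp_mean=10.0, sigma=3.0, n=2000, value_range=20.0, epsilon=1.0)
```

Ignoring the noise term and reporting the standard interval would understate uncertainty, which is exactly the kind of invalid inference that DP-aware methods are meant to prevent.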
This project is ongoing; results are forthcoming.