More Data, More (Privacy) Problems
CEGA launches the Data Privacy Lab, plus a new effort to enhance privacy protections and data sharing in LMICs to improve humanitarian aid and development
Tsd_studio via Unsplash
Brad Sagara (Research Project Manager, Data Privacy Lab, CEGA) and Nitin Kohli (Principal Investigator, Data Privacy Lab, CEGA) introduce CEGA’s Data Privacy Lab and the launch of the Privacy Playbook, a new project dedicated to leveraging privacy-enhancing technologies (PETs) to bolster data-driven innovation in low- and middle-income countries. The goal: to transform the landscape of social welfare, financial inclusion, and public service delivery.
A Privacy Mirage Amidst a Data Deluge
We live in a time of unprecedented data abundance, fueled by a diverse ecosystem of data sources: mobile phones in the hands of more than 70 percent of the global population, government statistics, and a constellation of thousands of satellites generating high-resolution images of Earth. Like any resource, data can be harnessed for social good (e.g., targeting humanitarian assistance or predicting agricultural yields for food security) or exploited, without knowledge or consent, for malicious ends.
Even when data is shared for legitimate reasons, what data is shared, and how, matters. Existing data protections can create the illusion of security while leaving individual privacy at real risk. Privacy researchers have repeatedly demonstrated the vulnerabilities of commonly applied protections such as removing names or other obvious personally identifiable information. In a now-famous study published in 2002, researchers re-identified the medical records of a then-governor of Massachusetts by merging public medical and voter registration data. Even aggregate statistics are not safe: an internal study at the US Census Bureau found that, using the Bureau’s publicly available 2010 aggregate statistics, analysts could reconstruct microdata for 46% of the US population (142 million people) and, by linking the reconstructed census records with auxiliary data, match names to 52 million of them.
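The governor’s case was a linkage attack: records stripped of names can still be re-identified by joining them with a public dataset on shared quasi-identifiers such as ZIP code, birth date, and sex. The sketch below illustrates the mechanics with entirely hypothetical toy data (the names, ZIP codes, and diagnoses are invented for illustration):

```python
import pandas as pd

# "Anonymized" medical records: names removed, but quasi-identifiers retained.
medical = pd.DataFrame({
    "zip": ["02138", "02139"],
    "birth_date": ["1945-07-31", "1960-01-15"],
    "sex": ["M", "F"],
    "diagnosis": ["hypertension", "diabetes"],
})

# Public voter registration roll: names alongside the same quasi-identifiers.
voters = pd.DataFrame({
    "name": ["A. Smith", "B. Jones"],
    "zip": ["02138", "02139"],
    "birth_date": ["1945-07-31", "1960-01-15"],
    "sex": ["M", "F"],
})

# Joining on the quasi-identifiers re-attaches names to diagnoses,
# defeating the removal of direct identifiers.
reidentified = medical.merge(voters, on=["zip", "birth_date", "sex"])
```

If the combination of ZIP code, birth date, and sex is unique to a person in both datasets, as it is for a surprisingly large share of any population, the join deanonymizes them completely.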
Mitigating this risk does not mean stopping data sharing altogether. On the contrary, it is more important than ever to maximize the utility of data for the public good through more sharing. The urgency is particularly acute in low- and middle-income countries (LMICs), where dramatic cuts in humanitarian and development funding have forced decision-makers to do more with less to meet humanitarian needs and development goals.
Privacy-enhancing technologies, or PETs, are a suite of tools that can help mitigate this risk, facilitating the analysis and sharing of sensitive data without compromising individual privacy. These technologies leverage advances in mathematics and computer science to provide robust privacy guarantees across a range of applications. For an overview of PETs and example use cases, refer to our blog series on How PETs Make Data Work for All.
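To make the idea of a mathematical privacy guarantee concrete, here is a minimal sketch of one of the best-known PETs, differential privacy, via the classic Laplace mechanism: a numeric statistic is released with calibrated random noise, so that no single individual’s presence in the data meaningfully changes what an observer can learn. This is a generic textbook illustration, not the specific deployment described in this post:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a numeric statistic with epsilon-differential privacy.

    Adds Laplace noise with scale sensitivity/epsilon, where
    `sensitivity` is the most the statistic can change when one
    person's record is added or removed, and smaller `epsilon`
    means stronger privacy (and more noise).
    """
    rng = rng or np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Example: privately release a household count for a survey region.
# A counting query has sensitivity 1: one person changes it by at most 1.
true_count = 1_283
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```

The released value is close enough to the truth to be useful in aggregate, while the noise provably limits what can be inferred about any one individual, which is the tradeoff between utility and privacy that PET deployments must tune.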
Privacy Laws Proliferate, PET Adoption Lags
The challenge then becomes how to share data to improve the lives of the most vulnerable in a secure way that conforms with the modern understanding of privacy risk, as well as with local and international privacy legislation and norms. There are now 172 countries and self-governing jurisdictions and territories around the world that have adopted comprehensive data privacy laws, covering approximately 85 percent of the global population. PETs are increasingly recognized for their ability to protect individual privacy in accordance with such legislation. In the EU, the Article 29 Data Protection Working Party’s opinion on anonymisation techniques (WP216) finds that differential privacy (a type of PET) can sufficiently address all anonymization concerns when applied appropriately. And in the US, Executive Order 14110 identifies PETs as an appropriate “technical tool to protect privacy and combat the broader legal and societal risks…that result from improper collection and use of people’s data.”
Enter the Data Privacy Lab: Making Privacy Work(able)
No single tool can eliminate all privacy risks under all conditions; there are always tradeoffs to consider when deploying PETs, and they must be tailored to their purpose. Those with access to data, particularly in LMICs, may have limited awareness of PETs or limited technical capacity to deploy them effectively.
The Data Privacy Lab at CEGA is a group of interdisciplinary privacy researchers and engineers working to address these challenges in LMICs by bridging the gap between state-of-the-art privacy research and real-world needs. The lab accomplishes this by working with governments and other partners to build a body of evidence and knowledge around a wide range of PET use cases and applications that address real-world privacy challenges in LMICs.
Last year, UNHCR worked with Data Privacy Lab PI Nitin Kohli to responsibly share data on forcibly displaced populations in its microdata library. Leveraging CEGA’s deployment of PETs, UNHCR can now share more than 150 datasets that were previously withheld due to privacy concerns. With support from the lab, UNHCR integrated robust privacy protections that not only enabled the release of previously unreleased data but also strengthened the privacy protections, and improved the accuracy, of previously shared data by transitioning from statistical disclosure controls to state-of-the-art PETs.
With generous support from the Gates Foundation, the Data Privacy Lab is actively expanding this evidence base, and sharing insights, tools, and lessons learned in a “privacy playbook” to enable users to make informed decisions about which PETs will be most appropriate for their circumstances. This playbook will serve two primary audiences:
- Policymakers will learn about common privacy challenges and the range of potential solutions available to them, along with those solutions’ strengths, limitations, and tradeoffs
- Technical staff will learn how to appropriately adapt and apply PETs to their specific privacy challenges, leveraging clear guidance and open-source code
As the Data Privacy Lab implements these use cases through 2027, we will share our insights and lessons learned via webinars and newsletters. To stay up to date on these and other events, please subscribe to the Data Science for Development mailing list.
If you are interested in learning more or collaborating with the Data Privacy Lab, please contact: brad.sagara@berkeley.edu.