Privacy Enhancing Technologies (PETs) are at the forefront of efforts to protect privacy and enable responsible data use. But what are they and how are they used? Dan Cassara, Project Manager for the Digital Credit Observatory (DCO), provides a case study of how PETs can be used in public health responses in the second part of our series on ‘How PETs Make Data Work for All.’ Read part one.
Privacy Enhancing Technologies (PETs) aim to preserve the insights of a dataset and the privacy of the individuals in the data. The first case study in the How PETs Make Data Work for All series showed how Multiparty Computation could enable banks to work together to enhance regulatory efforts without compromising sensitive customer data. In other cases, researchers may already have a central dataset, so the challenge isn’t in pooling the data but in sharing the insights that can be gleaned from it. PETs can help with that, too.
Mobility data from cell phones is collected in centralized datasets and can play an important role in improving humanitarian relief efforts and public health responses. This data, which tracks location over time, can help model the spread of epidemics, track displaced populations after natural disasters, detect government violence against citizens, or provide insight into population movements.
While collecting and analyzing mobility data can lead to large social benefits when done ethically, it also carries risks for individuals, their privacy, and their autonomy. For example, the New York Times reported how cell phone location data could be used to identify political protesters from past demonstrations and learn about their current daily activities. The places people visit — trips to a place of worship, a bar, or a doctor’s office — reveal a person’s life and carry different social meanings. In the wrong hands, mobility data could be used to target religious minorities, track an ex-partner, or blackmail political officials.
Privacy enhancing technologies like differential privacy can play a pivotal part in the ethical use of data for social good. Consider a hypothetical pandemic scenario in which decision makers want to gauge compliance with stay-at-home orders: can differential privacy address location and mobility privacy concerns in this context? And conversely, how does the context of a pandemic response inform the design of systems that rely on location and mobility data?
Call Detail Records (CDRs), metadata such as the date, time, duration, and location of phone calls, can be used to compute statistics on mobility. Of particular interest for our example is population flow between distinct regions in a given time period, an important factor in determining the spread of disease (a sketch of this computation follows the guidelines below). Analyzing sensitive data demands that we first think through privacy risks, both to the data used and to the statistics generated. The WHO provides 17 guidelines for ethical public health surveillance; the five below relate to the collection and sharing of public health data:
[3]: Surveillance data should be collected only for a legitimate public health purpose.
[4]: Countries have an obligation to ensure that the data collected are of sufficient quality, including being timely, reliable and valid, to achieve public health goals.
[13]: Results of surveillance must be effectively communicated to relevant target audiences.
[15]: During a public health emergency, it is imperative that all parties involved in surveillance share data in a timely fashion.
[17]: Personally identifiable surveillance data should not be shared with agencies that are likely to use them to take action against individuals or for uses unrelated to public health.
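To make the flow statistic concrete, here is a minimal sketch of how it might be computed from a CDR extract: pair each user's consecutive tower sightings and count moves between regions. The schema, column names, and toy records are assumptions for illustration, not a real CDR format:

```python
import pandas as pd

# Toy CDR extract (hypothetical schema): one row per call, with a
# pseudonymous user ID, the region of the handling cell tower, and a timestamp.
cdr = pd.DataFrame({
    "user_id":   ["a", "a", "b", "b", "c", "c"],
    "region":    ["North", "South", "North", "North", "East", "South"],
    "timestamp": pd.to_datetime([
        "2020-04-01 08:00", "2020-04-01 18:00",
        "2020-04-01 09:00", "2020-04-01 17:00",
        "2020-04-01 10:00", "2020-04-01 19:00",
    ]),
})

# Pair each user's consecutive sightings to get origin -> destination moves.
cdr = cdr.sort_values(["user_id", "timestamp"])
cdr["origin"] = cdr.groupby("user_id")["region"].shift()
moves = cdr.dropna(subset=["origin"])

# Population flow: the number of recorded moves between each pair of regions.
flows = moves.groupby(["origin", "region"]).size().rename("count").reset_index()
print(flows)
```

Raw flow counts like these are exactly the kind of statistic the rest of this example considers sharing.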
Best practices such as sharing data only when necessary and de-identifying data as early as possible can help protect individuals. These steps, combined with the public health need in our hypothetical, satisfy WHO guidelines 3, 13, and 15, but guideline 17 may not be met. A growing body of research shows that true anonymization with these approaches is nearly impossible, meaning these mobility statistics could leak the presence or absence of an individual. Enter differential privacy.
We can privatize statistics by adding random deviations, or “noise,” to them, ensuring that no one can learn too much about any individual in the dataset. How much a published statistic can reveal about any one person depends on the circumstances and on the statistic itself, and so does the amount of noise needed to protect them. And while adding noise increases privacy, it also decreases the accuracy of the published statistics; this is known as the accuracy-privacy tradeoff.
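The canonical way to add such noise is the Laplace mechanism: draw noise from a Laplace distribution whose scale is the statistic's sensitivity divided by the privacy parameter epsilon. Here is a minimal sketch for a single count; the count and epsilon values are illustrative assumptions, not figures from this case study:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def private_count(true_count, epsilon):
    """Release a count with epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1 (its
    "sensitivity"), so Laplace noise with scale 1/epsilon is enough to
    hide any single individual's contribution.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

true_count = 1342             # hypothetical: people who stayed within 1 km of home
for eps in (0.1, 1.0, 10.0):  # smaller epsilon = stronger privacy, more noise
    print(f"epsilon={eps}: released count {private_count(true_count, eps):.0f}")
```

Running this shows the released count wandering by tens of people at epsilon = 0.1 but by only a fraction of a person at epsilon = 10, the accuracy-privacy tradeoff in miniature.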
Imagine we’d like to know more about compliance with stay-at-home orders related to COVID-19. We could produce a histogram showing the count of people who traveled certain distances, adding noise to make the counts differentially private. This provides insight into general compliance with travel restrictions, as well as the risks posed by long-distance travelers, who could contract or spread the disease in many places. To see why choosing the right statistic matters, contrast the histogram with an alternative. Average distance traveled may seem like the more intuitive choice, but because the maximum distance an individual can travel is quite large, the average is sensitive to any one person’s presence or absence, and a lot of noise is needed to mask their data’s contribution. Adding more noise reduces the accuracy of our estimates, highlighting the accuracy-privacy tradeoff. Producing an entire histogram may seem more invasive because it shares more data points, but a single person can change the counts by at most one, far less than they could shift an average, so much less noise is needed to privatize the histogram. The histogram thus buys more privacy for each bit of noise we add, and produces meaningful insights for policymakers without revealing “too much” about any individual in the data, even if the statistics were leaked.
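The sensitivity argument can be made concrete with a short sketch comparing the two releases. The simulated distances, bin edges, distance cap, and epsilon below are all illustrative assumptions, not values from this case study:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical data: distances traveled (km) by n people during the order,
# capped at a known maximum possible trip length.
max_km = 1000.0
distances = rng.exponential(scale=20.0, size=500).clip(max=max_km)
n = distances.size
epsilon = 1.0

# Histogram: each person falls in exactly one bin, so adding or removing one
# person changes the counts by at most 1 in total. Laplace noise with scale
# 1/epsilon per bin suffices, no matter how far anyone could travel.
bin_edges = [0, 1, 5, 10, 25, 50, 100, max_km]
counts, _ = np.histogram(distances, bins=bin_edges)
dp_counts = counts + rng.laplace(scale=1.0 / epsilon, size=counts.size)

# Mean: one person can shift it by up to about max_km / n, so its noise
# scale grows with the longest possible trip (and shrinks only with n).
dp_mean = distances.mean() + rng.laplace(scale=max_km / (n * epsilon))

print("noise scale per histogram bin:", 1.0 / epsilon)           # 1 person
print("noise scale for the mean (km):", max_km / (n * epsilon))  # 2 km here
print(f"true mean {distances.mean():.1f} km, private mean {dp_mean:.1f} km")
```

The design point is that each bin's noise scale is fixed at 1/epsilon regardless of the range of distances, while the mean's noise must grow with the longest trip anyone could plausibly take.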
Big data and new technologies have created many opportunities to improve the distribution of aid and the delivery of public health interventions, but they also carry risks, including threats to individual privacy. Modern mobility data can play a useful role in policymaking and evaluation, yet it presents ethical dilemmas. Thankfully, PETs and a well-designed analysis can deliver useful insights to public health officials ethically, while limiting infringements on privacy.
Mobility Statistics for Public Health Response was originally published in CEGA on Medium.