Identifying Fraud Patterns and Financial Exclusion with Privacy-Preserving Synthetic Data
Context
According to the GSM Association (GSMA), there are over 548 million registered mobile money accounts in sub-Saharan Africa (SSA), which have helped spur the growth of digital financial services in the region. In Uganda, the Central Bank reports that while only 43% of adults have a bank account, 77% have active mobile money accounts. Unfortunately, mobile money systems have also become a conduit for data breaches, identity theft, unauthorized sharing of personally identifiable information (PII), and fraud, costing consumers hundreds of millions of dollars annually.
Researchers could use mobile money transaction data to study challenges such as fraud or patterns of financial exclusion, but privacy concerns, regulation, and business interests often limit access to such data, constraining progress. Synthetic datasets, generated by computers rather than real-world events, have the potential to fill this gap and contribute novel techniques to detect and reduce fraud and privacy-related risks. However, most publicly available synthetic datasets cover general financial and bank transactions and are not based on the unique properties of mobile money transactions in SSA, which limits their usefulness.
Because of this, researchers are now aiming to (1) investigate and develop privacy-preserving synthetic data generation techniques for mobile money transactions; (2) develop and demonstrate the utility of privacy-preserving synthetic datasets in the detection of fraud patterns; and (3) study the utility of privacy-preserving synthetic datasets in characterizing financial inclusion and exclusion.
Study Design
This project builds on recent work from the research team investigating synthetic dataset generation for mobile money transactions. They will extend that work by integrating privacy-preserving technologies into synthetic data generation methods and then analyzing the resulting datasets for patterns of fraud and/or exclusion.
To ensure that insights from synthetic data translate to real-world datasets, the synthetic data should have statistical properties (for example, the distribution of transaction volumes or sizes) similar to those of a real sample dataset obtained from an implementing partner organization. Researchers will use a model incorporating a primary agent and clients who can deposit, withdraw, pay for goods and services, or transfer money to other clients. Other entities such as merchants, banks, and fraudulent agents are designed to carry out transactions in their respective capacities, as observed in the real mobile money ecosystem. Using existing methods and additional data from the implementing partner, the research team will then analyze synthetic datasets from the platform for patterns of financial exclusion and fraud.
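To make the agent-based idea concrete, the following is a minimal sketch of clients performing the four actions named above. It is not the research team's simulator: the names `Client`, `ACTIONS`, and `simulate`, the uniform amount distribution, and all parameter values are hypothetical choices made only for illustration, and merchants, banks, and fraudulent agents are left out.

```python
import random
from dataclasses import dataclass

# Actions taken from the text; the names themselves are illustrative.
ACTIONS = ("deposit", "withdraw", "payment", "transfer")


@dataclass
class Client:
    client_id: int
    balance: float = 0.0


def simulate(num_clients=50, num_steps=1000, seed=0):
    """Return (transaction_log, clients) for a toy mobile money simulation.

    Amounts are drawn uniformly from [1, 100]; this is an assumption,
    not a distribution calibrated to any real dataset.
    Requires num_clients >= 2 so that transfers have a possible receiver.
    """
    rng = random.Random(seed)
    clients = [Client(i) for i in range(num_clients)]
    log = []
    for step in range(num_steps):
        sender = rng.choice(clients)
        action = rng.choice(ACTIONS)
        amount = round(rng.uniform(1.0, 100.0), 2)
        receiver = None
        if action == "deposit":
            sender.balance += amount
        else:
            if sender.balance < amount:
                continue  # insufficient funds: the attempt is not recorded
            sender.balance -= amount
            if action == "transfer":
                receiver = rng.choice([c for c in clients if c is not sender])
                receiver.balance += amount
            # Withdrawals and payments simply leave the client's wallet;
            # the agent or merchant side is not modelled in this sketch.
        log.append({
            "step": step,
            "action": action,
            "sender": sender.client_id,
            "receiver": receiver.client_id if receiver else None,
            "amount": amount,
        })
    return log, clients
```

Calling `simulate(num_clients=10, num_steps=500, seed=1)` returns a reproducible transaction log that can then be summarized, for instance by action type or amount.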
Results & Policy Lessons
Exploring the development and application of privacy-preserving synthetic datasets in sub-Saharan Africa showed that fraudulent activities such as split deposits, refund fraud, and PIN fraud are prevalent and were effectively simulated in the mobile money transaction simulator (MoMTSim). The synthetic datasets closely resembled real transaction data, enabling the identification of fraud patterns and improving detection techniques.
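One common way to quantify how closely a synthetic sample resembles a real one is the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical cumulative distribution functions. The sketch below is an illustration only: the source does not state which similarity measure the study used, and the function name `ks_statistic` is hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns a value in [0, 1]: 0 means the empirical distributions
    coincide, 1 means the samples do not overlap at all.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The largest gap between two step functions occurs at a data point,
    # so it is enough to evaluate both empirical CDFs at every observation.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap
```

Applied to, say, transaction amounts from a real sample and from a synthetic dataset, a smaller value indicates closer agreement between the two distributions.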
The study also demonstrated that machine learning models such as XGBoost and decision trees are resilient to the data degradation caused by privacy measures. The datasets were also useful in analyzing patterns of financial exclusion, offering insights into regional transaction activity and fraud hotspots.
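The trade-off between privacy and data quality can be illustrated with Laplace noise, a common building block of differential privacy. This is an assumption made for illustration: the source does not say which privacy mechanism the study applied, and the names `perturb_amounts` and `mean_abs_error` and the parameters `epsilon` and `sensitivity` are hypothetical. The sketch makes no formal privacy claim; it only shows how stronger perturbation degrades the data.

```python
import random


def perturb_amounts(amounts, epsilon, sensitivity=1.0, seed=0):
    """Add Laplace(0, sensitivity / epsilon) noise to each amount.

    A smaller epsilon means more noise, and therefore more degradation.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    rate = epsilon / sensitivity  # inverse of the Laplace scale
    # The difference of two independent exponential draws with the same
    # rate follows a Laplace distribution centred at zero.
    return [x + rng.expovariate(rate) - rng.expovariate(rate) for x in amounts]


def mean_abs_error(original, noisy):
    """Average absolute distortion introduced by the perturbation."""
    return sum(abs(o - n) for o, n in zip(original, noisy)) / len(original)
```

Comparing `mean_abs_error` at several values of `epsilon` gives a rough sense of how much distortion a downstream model would need to tolerate.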
By creating large-scale synthetic datasets and developing tools for financial analytics, this research has significant implications for enhancing data privacy while enabling the study of mobile money systems. This approach could substantially increase access to financial data for many independent modeling efforts that can collectively advance knowledge about digital financial services. The scalable simulation platform positions this study as a foundation for future advancements in secure and inclusive financial services.