In 2020, rural households experiencing poverty in Sindh, Pakistan’s second-most populous province, were hit with twin shocks: the onset of the Covid-19 pandemic in March and one of the wettest monsoon seasons since 1961. The Social Protection Strategic Unit of Pakistan’s Sindh Province, charged with providing social assistance in this context, wanted to identify households in need of aid without relying on traditional survey collection methods. The Bristol University research team proposed an approach using ensemble learning models that could reliably predict household poverty in very small geographic areas (1 km² cells).
This project developed a geographical targeting system that government partners can use to quickly and accurately target households experiencing poverty at scale. The research team structured their classification problem by creating a binary variable, using a wealth threshold defined by the government, to label households as “chronically poor” or “not chronically poor” based on previous household wealth surveys collected between 2016 and 2019. They then overlaid a 1 km² grid on Sindh Province and labeled each grid cell with the wealth status of its median household using this binary classification.
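The labeling step described above can be illustrated with a minimal sketch. It is not the research team's code: the threshold value, the function names, and the assumption that household locations are given as projected coordinates in metres are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import median

CELL_SIZE_M = 1000       # 1 km cells; assumes projected coordinates in metres
POVERTY_THRESHOLD = 0.3  # hypothetical value; the real threshold is set by the government


def cell_id(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map a household location to the (column, row) index of its grid cell."""
    return (int(x_m // cell_size), int(y_m // cell_size))


def label_cells(households, threshold=POVERTY_THRESHOLD):
    """Label each cell 1 (chronically poor) or 0 (not chronically poor).

    `households` is an iterable of (x_m, y_m, wealth_score) tuples.
    A cell is labeled by comparing its median wealth score to the threshold.
    With an even number of households, statistics.median averages the two
    middle scores, which is one of several reasonable conventions.
    """
    scores_by_cell = defaultdict(list)
    for x_m, y_m, wealth in households:
        scores_by_cell[cell_id(x_m, y_m)].append(wealth)
    return {
        cell: int(median(scores) < threshold)
        for cell, scores in scores_by_cell.items()
    }
```

Calling `label_cells` on a list of surveyed households returns a dictionary from grid cell index to binary label, which is the kind of target a classifier could then be trained to predict.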
Researchers then developed a set of algorithms tasked with predicting the wealth label of the median household in each grid cell. Using satellite imagery, nighttime satellite imagery, and accessibility data, researchers trained three different convolutional neural network models: (1) ResNet-50, (2) ResNet-50V2, and (3) ResNet-101. Each model predicted a label for each 1 km² grid cell; in cases where the models disagreed, the ensemble assigned the majority label.
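The majority-vote rule can be sketched in a few lines. This is only an illustration of the voting step, assuming each model has already produced one binary label per grid cell; the function name and the example predictions are hypothetical, and the models themselves are not reproduced here.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model labels into one ensemble label per grid cell.

    `predictions_per_model` is a list of equal-length label lists,
    one list per model. With three models and binary labels there is
    always a strict majority, so no tie-breaking rule is needed.
    """
    lengths = {len(preds) for preds in predictions_per_model}
    if len(lengths) != 1:
        raise ValueError("every model must predict the same number of cells")
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_model)
    ]


# Hypothetical predictions for four grid cells (1 = chronically poor).
resnet50_preds = [1, 0, 1, 0]
resnet50v2_preds = [1, 1, 0, 0]
resnet101_preds = [0, 1, 1, 0]

ensemble = majority_vote([resnet50_preds, resnet50v2_preds, resnet101_preds])
```

Using an odd number of models is what makes a plain majority rule well defined for a binary label.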
To appraise the performance of the predictions from this ensemble of models, researchers used three different validation approaches. First, the research team compared predictions against random holdout test samples. Second, they used a spatial cross-validation approach: whole districts were omitted from the training process and then used for out-of-sample testing. Both of these approaches are commonly used to validate machine learning models. Third, researchers implemented a novel validation approach. They generated predictions for Ghotki, a district without previous household wealth surveys, and then collected original survey data there. This survey, conducted in 2022, sampled 7,194 households from 174 1 km² grid cells; researchers compared the model predictions with this newly collected “ground truth” data. Across all three validation exercises, the ensemble model outperformed random assignment.
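The spatial cross-validation idea, holding out whole districts, can be sketched as follows. The source does not say how many districts were held out at a time, so this example assumes a leave-one-district-out scheme; the record layout, function name, and district codes are hypothetical.

```python
def leave_one_district_out(cells):
    """Yield (held_out_district, train_cells, test_cells) splits.

    `cells` is a list of dicts, each with at least a "district" key.
    Every cell in the held-out district goes to the test set, so the
    model is never evaluated on a district it saw during training.
    """
    districts = sorted({cell["district"] for cell in cells})
    for held_out in districts:
        train = [c for c in cells if c["district"] != held_out]
        test = [c for c in cells if c["district"] == held_out]
        yield held_out, train, test


# Hypothetical labeled grid cells in three districts, coded A, B, and C.
cells = [
    {"cell": (0, 0), "district": "A", "label": 1},
    {"cell": (0, 1), "district": "A", "label": 0},
    {"cell": (5, 2), "district": "B", "label": 1},
    {"cell": (9, 9), "district": "C", "label": 0},
]

folds = list(leave_one_district_out(cells))
```

Compared with a random holdout, splitting by district reduces the chance that neighboring, highly similar cells end up on both sides of the train/test boundary.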
This research shows the promise of using similar geographic targeting systems to improve the efficacy of social assistance programs, particularly in contexts where it may be difficult or dangerous to collect survey data to inform allocation decisions. It significantly improves upon existing geographic targeting models, particularly in rural areas, by predicting poverty in grid areas ten times smaller than those of earlier models and by producing lower exclusion errors.