This paper addresses the challenge of missing crop yield data in large-scale agricultural surveys, where crop-cutting, the most accurate method for yield measurement, is often limited due to cost constraints. Multiple imputation techniques, supported by machine learning models, are used to predict missing yield data. This method is validated using survey data from Mali, which includes both crop-cut and self-reported yield information. The analysis covers several crops, providing insights into the importance of different predictors, including farmer-reported yields and geo-spatial variables, and the conditions under which the approach is valid. The findings show that machine learning-based imputations can provide accurate yield estimates, especially for crops with low intercropping rates and higher commercialization. However, survey-to-survey imputations are less accurate than within-survey imputations, suggesting limitations in extrapolating data across different survey rounds. The study contributes valuable insights into improving cost-efficiency in agricultural surveys and the potential of imputation methods.
- Citation
- “Djima, Ismaël Yacoubou; Tiberti, Marco; Kilic, Talip. 2024. Yielding Insights: Machine Learning-Driven Imputations to Filling Agricultural Data Gaps. Policy Research Working Paper; 10964. © Washington, DC: World Bank. http://hdl.handle.net/10986/42371 License: CC BY 3.0 IGO.”
- Collection(s)
- Policy Research Working Papers
- DOI
- http://dx.doi.org/10.1596/1813-9450-10964
- Identifier externaldocumentum
- 34417194
- Identifier internaldocumentum
- 34417194
- Pages
- 52
- Published in
- United States of America
- RelationisPartofseries
- Policy Research Working Paper; 10964
- Report
- WPS10964
- Rights
- CC BY 3.0 IGO
- Rights Holder
- World Bank
- Rights URI
- https://creativecommons.org/licenses/by/3.0/igo/
- UNIT
- Strategy & Collaboratives (DECSC)
- URI
- https://hdl.handle.net/10986/42371
- date disclosure
- 2024-11-04
- region geographical
- World
Table of Contents
- Introduction
- Data situations
- Partially missing: Within-survey imputation
- Completely missing: Across surveys imputation
- Data
- Summary statistics and sample counts
- Measurement errors in SR yields
- Empirical approach
- Validation approach
- Imputation framework
- Results
- Modeling lessons
- Top predictors
- Relative importance of SR yields and GPS extracted variables
- Prediction accuracy by crops
- Training set size
- Imputation results
- Within-survey imputation
- Survey-to-survey imputation
- Conclusion
- Machine learning modeling
- Additional Tables and Figures
- List of Covariates