Replication Data for: Multi-label Prediction for Political Text-as-Data
Political scientists increasingly use supervised machine learning to code multiple relevant labels from a single set of texts. The current "best practice" of individually applying supervised machine learning to each label ignores information on inter-label association(s), and is likely to under-perform as a result. We introduce multi-label prediction as a solution to this problem. After reviewing the multi-label prediction framework, we apply it to code multiple features of (i) access to information requests made to the Mexican government and (ii) country-year human rights reports. We find that multi-label prediction outperforms standard supervised learning approaches, even in instances where the correlations among one's multiple labels are low. This repository replicates the figures and tables in the article and appendix. More information can be found in the "README.md" file. (2021-02-12)
| Item Type | Dataset |
|---|---|
| Publisher | Harvard Dataverse |
| DOI | 10.7910/dvn/sovpa4 |
| Date made available | 1 April 2021 |
| Keywords | Computer and Information Science, social sciences |
| Resource language | Other |
| Departments | LSE |
Explore Further
- Erlich, A., Dantas, S. G., Bagozzi, B. E., Berliner, D. & Palmer-Rubin, B. (2022). Multi-label prediction for political text-as-data. Political Analysis, 30(4), 463–480. https://doi.org/10.1017/pan.2021.15 (Repository Output)