Replication Data for: Multi-label Prediction for Political Text-as-Data

Berliner, D., Erlich, A., Dantas, S., Bagozzi, B. & Palmer-Rubin, B. (2021). Replication Data for: Multi-label Prediction for Political Text-as-Data. [Dataset]. Harvard Dataverse. https://doi.org/10.7910/dvn/sovpa4

Political scientists increasingly use supervised machine learning to code multiple relevant labels from a single set of texts. The current "best practice" of individually applying supervised machine learning to each label ignores information on inter-label association(s) and is likely to under-perform as a result. We introduce multi-label prediction as a solution to this problem. After reviewing the multi-label prediction framework, we apply it to code multiple features of (i) access to information requests made to the Mexican government and (ii) country-year human rights reports. We find that multi-label prediction outperforms standard supervised learning approaches, even in instances where the correlations among one's multiple labels are low. This repository replicates the figures and tables in the article and appendix. More information can be found in the "README.md" file. (2021-02-12)
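
As a rough illustration of the contrast the abstract draws, here is a minimal sketch comparing the per-label approach (commonly called binary relevance) with one standard multi-label method, the classifier chain, in which each label's classifier also receives the preceding labels as features. This is not the authors' replication code and is not necessarily one of the methods used in the article (the repository's README.md documents those). The helper names, the plain gradient-descent logistic regression, its lr and n_iter settings, and the synthetic data are all hypothetical choices made to keep the example self-contained with NumPy only.

```python
import numpy as np

# Hypothetical helpers for illustration; not taken from the replication archive.

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression with an intercept by batch gradient descent."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_logistic(w, X):
    """Return 0/1 predictions using a 0.5 probability threshold."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (Xb @ w > 0).astype(int)

def binary_relevance(X_train, Y_train, X_test):
    """One independent classifier per label; label associations are ignored."""
    return np.column_stack([
        predict_logistic(fit_logistic(X_train, Y_train[:, k]), X_test)
        for k in range(Y_train.shape[1])
    ])

def classifier_chain(X_train, Y_train, X_test):
    """Classifier k also sees labels 0..k-1: the true labels during training,
    the chain's own earlier predictions at test time."""
    n_labels = Y_train.shape[1]
    preds = np.zeros((X_test.shape[0], n_labels), dtype=int)
    for k in range(n_labels):
        w = fit_logistic(np.hstack([X_train, Y_train[:, :k]]), Y_train[:, k])
        preds[:, k] = predict_logistic(w, np.hstack([X_test, preds[:, :k]]))
    return preds

# Synthetic data (assumption): label 2 is the logical AND of labels 0 and 1,
# so it is strongly associated with the other two labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
Y = np.column_stack([X[:, 0] > 0, X[:, 1] > 0,
                     (X[:, 0] > 0) & (X[:, 1] > 0)]).astype(int)
X_train, X_test, Y_train, Y_test = X[:300], X[300:], Y[:300], Y[300:]

br_pred = binary_relevance(X_train, Y_train, X_test)
cc_pred = classifier_chain(X_train, Y_train, X_test)
print("binary relevance accuracy per label:", (br_pred == Y_test).mean(axis=0))
print("classifier chain accuracy per label:", (cc_pred == Y_test).mean(axis=0))
```

Because a chain feeds earlier predictions forward, its results can depend on the order of the labels, and errors on early labels can propagate to later ones. Whether exploiting label associations this way helps on real text data is an empirical question; the article's own comparisons are reproduced by the files in this repository.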

Available at: 10.7910/dvn/sovpa4

Access level: Open

Licence: CC0 1.0
