Data for The Repeated Adjustment of Measurement Protocols (RAMP) method for developing high-validity text classifiers

Goddard, A. (2025). Data for The Repeated Adjustment of Measurement Protocols (RAMP) method for developing high-validity text classifiers. [Dataset]. London School of Economics and Political Science. https://doi.org/10.17605/OSF.IO/PE4JY

This repository contains the data required to replicate the results of the article titled "The Repeated Adjustment of Measurement Protocols (RAMP) method for developing high-validity text classifiers". The data is used in the Python notebooks found at: https://github.com/alexiamhe93/RAMP_method/tree/main

There are five CSV files in the zip folder.

train_test.csv: Contains the raw text data, the manual coding for misunderstandings, and the train/test split for both the One-shot and RAMP groups.

RAMP_Stage1.csv: Contains details of all the manual coding iterations over the inference loop for the RAMP group, including the final shared set used to calculate reported inter-rater reliability. Also includes the manual coding for the One-shot group.

RAMP_Stage2.csv: Contains a record of classifier runs (predictive accuracy and adjustments to protocols) for RAMP.

RAMP_Stage3.csv: Contains the final classifications of all classifiers on the RAMP test data.

results_OneShot.csv: Contains the classification results of all classifiers on the One-shot test data.
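As a quick check before running the notebooks, the downloaded archive can be inspected with the Python standard library. The following is a minimal sketch and not part of the published notebooks: the helper name summarize_csvs is hypothetical, the column names in each file are not documented here, so the code only reports whatever header row it finds, and UTF-8 encoding is assumed.

```python
import csv
import io
import zipfile


def summarize_csvs(zip_source):
    """Return {csv_name: (header, n_data_rows)} for every CSV in a zip archive.

    zip_source may be a file path or a binary file-like object.
    Assumption: the CSV files are UTF-8 encoded (a leading BOM is tolerated).
    """
    summary = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Skip folders and any non-CSV members of the archive.
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                reader = csv.reader(text)
                header = next(reader, [])       # first row, or [] if the file is empty
                n_rows = sum(1 for _ in reader)  # remaining rows are data rows
            summary[name] = (header, n_rows)
    return summary
```

Calling summarize_csvs on the path of the downloaded zip file returns one entry per CSV, which makes it easy to confirm that all of the files listed above are present and to see their column headers before loading them in the notebooks.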

The raw text data in train_test.csv comprises 21,994 sentences coded for misunderstanding. All author names and sentences have been anonymized. The dataset draws on three sources:

Reddit data: This data was downloaded through the Reddit API for the purposes of this study.

Twitter Customer Support (Thought Vector & Axelbrooke, 2017): This data was downloaded from: https://www.kaggle.com/datasets/thoughtvector/customer-support-on-twitter (Copyright: CC BY-NC-SA 4.0).

Wikipedia Talk Pages (Danescu-Niculescu-Mizil et al., 2012): This data was downloaded using Cornell University's ConvoKit Python package (see: https://convokit.cornell.edu/documentation/wiki.html) (Copyright: CC BY 4.0).

References: Danescu-Niculescu-Mizil, C., Lee, L., Pang, B., & Kleinberg, J. (2012). Echoes of power: Language effects and power differences in social interaction. Proceedings of the 21st International Conference on World Wide Web, 699–708. https://doi.org/10.1145/2187836.2187931

Thought Vector, & Axelbrooke, S. (2017). Customer Support on Twitter (v10). https://kaggle.com/thoughtvector/customer-support-on-twitter

Available at: 10.17605/OSF.IO/PE4JY

Access level: Open

Licence: Creative Commons: Attribution 4.0

