Data for The Repeated Adjustment of Measurement Protocols (RAMP) method for developing high-validity text classifiers
This repository contains the data required to replicate the results of the article titled "The Repeated Adjustment of Measurement Protocols (RAMP) method for developing high-validity text classifiers". The data are used in the Python notebooks found at: https://github.com/alexiamhe93/RAMP_method/tree/main
The zip folder contains five CSV files:
train_test.csv: Contains the raw text data, the manual coding for misunderstandings, and the train/test split for both the One-shot and RAMP groups.
RAMP_Stage1.csv: Contains details of all the manual coding iterations over the inference loop for the RAMP group, including the final shared set used to calculate the reported inter-rater reliability. Also includes the manual coding for the One-shot group.
RAMP_Stage2.csv: Contains a record of classifier runs (predictive accuracy and adjustments to protocols) for RAMP.
RAMP_Stage3.csv: Contains the final classifications of all classifiers on the RAMP test data.
results_OneShot.csv: Contains the classification results of all classifiers on the One-shot test data.
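As a starting point for working with these files, the sketch below reads every CSV in the archive using only the Python standard library. It is a minimal illustration rather than the loading code used in the accompanying notebooks: the helper names `load_csvs_from_zip` and `missing_files` are hypothetical, UTF-8 encoding is an assumption, and no column names are assumed because they are not documented here.

```python
import csv
import io
import zipfile

# File names as listed in this description.
EXPECTED_FILES = [
    "train_test.csv",
    "RAMP_Stage1.csv",
    "RAMP_Stage2.csv",
    "RAMP_Stage3.csv",
    "results_OneShot.csv",
]


def load_csvs_from_zip(zip_path, encoding="utf-8"):
    """Return {file name: list of row dicts} for every CSV in the archive.

    Hypothetical helper; the encoding is an assumption, not documented.
    """
    tables = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Skip non-CSV members and macOS resource-fork entries.
            if member.startswith("__MACOSX/"):
                continue
            if not member.lower().endswith(".csv"):
                continue
            with archive.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                rows = list(csv.DictReader(text))
            # Key by base name so nested folders inside the zip do not matter.
            tables[member.rsplit("/", 1)[-1]] = rows
    return tables


def missing_files(tables):
    """List the expected files that were not found in the archive."""
    return [name for name in EXPECTED_FILES if name not in tables]
```

Calling `load_csvs_from_zip` with the path to the downloaded zip file returns one list of rows per CSV, and `missing_files` on the result gives a quick check that all of the listed files were found. Values are returned as strings, so numeric columns need converting before analysis.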
The raw text data in train_test.csv comprises 21,994 sentences coded for misunderstanding. All author names and sentences have been anonymized. The dataset comprises:
Reddit data: This data was downloaded through the Reddit API for the purposes of this study.
Twitter Customer Support (Thought Vector & Axelbrooke, 2017): This data was downloaded from: https://www.kaggle.com/datasets/thoughtvector/customer-support-on-twitter (Copyright: CC BY-NC-SA 4.0).
Wikipedia Talk Pages (Danescu-Niculescu-Mizil et al., 2012): This data was downloaded using Cornell University's ConvoKit Python package (see: https://convokit.cornell.edu/documentation/wiki.html) (Copyright: CC BY 4.0).
References:
Danescu-Niculescu-Mizil, C., Lee, L., Pang, B., & Kleinberg, J. (2012). Echoes of power: Language effects and power differences in social interaction. Proceedings of the 21st International Conference on World Wide Web, 699–708. https://doi.org/10.1145/2187836.2187931
Thought Vector, & Axelbrooke, S. (2017). Customer Support on Twitter (v10). https://kaggle.com/thoughtvector/customer-support-on-twitter
| Item Type | Dataset |
|---|---|
| Publisher | London School of Economics and Political Science |
| DOI | 10.17605/OSF.IO/PE4JY |
| Date made available | 16 June 2025 |
| Keywords | Social and Behavioral Sciences |
| Resource language | Other |
| Departments | LSE > Academic Departments > Psychological and Behavioural Science |
Explore Further
- Goddard, A., & Gillespie, A. (2025). The repeated adjustment of measurement protocols method for developing high-validity text classifiers. Psychological Methods. https://doi.org/10.1037/met0000787 (Repository Output)
- https://github.com/alexiamhe93/RAMP_method