RealToxicityPrompts dataset: 43/100 via “multi-dimensional toxicity scoring for prompt-completion pairs”
100K naturally occurring prompts for evaluating toxic text generation.
Unique: Provides 8-dimensional toxicity scoring (not binary classification), with toxicity, severe_toxicity, threat, insult, identity_attack, profanity, sexually_explicit, and flirtation treated as independent dimensions, enabling nuanced analysis of different harm types rather than aggregate toxicity only. Includes source document tracking via filename and character offsets for traceability.
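A minimal sketch of how the multi-dimensional scoring can be used, assuming the field names above; the record and its score values are invented for illustration, not drawn from the dataset:

```python
# Hypothetical record illustrating the per-prompt schema: eight independent
# toxicity dimensions plus source tracking. Score values are invented.
record = {
    "filename": "0123456.txt",     # source document (traceability)
    "begin": 340, "end": 417,      # character offsets within the source
    "text": "example prompt text ...",
    "toxicity": 0.82,
    "severe_toxicity": 0.31,
    "threat": 0.05,
    "insult": 0.74,
    "identity_attack": 0.12,
    "profanity": 0.66,
    "sexually_explicit": 0.03,
    "flirtation": 0.09,
}

DIMENSIONS = [
    "toxicity", "severe_toxicity", "threat", "insult",
    "identity_attack", "profanity", "sexually_explicit", "flirtation",
]

def dominant_dimension(rec):
    """Return the single highest-scoring harm dimension for a record."""
    return max(DIMENSIONS, key=lambda d: rec[d])

def flagged_dimensions(rec, threshold=0.5):
    """Per-dimension flags instead of one binary toxic/non-toxic label."""
    return [d for d in DIMENSIONS if rec[d] >= threshold]

print(dominant_dimension(record))   # -> "toxicity"
print(flagged_dimensions(record))   # -> ["toxicity", "insult", "profanity"]
```

Per-dimension flags make it possible to distinguish, say, a profanity-heavy but non-threatening prompt from a genuine threat, which a single aggregate score collapses.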
vs others: More granular than binary toxicity datasets (e.g., Jigsaw Toxic Comments) because it decomposes toxicity into 8 independent dimensions; more practical for model evaluation than human-annotated safety benchmarks because its pre-scored baselines allow comparison without manual annotation of model outputs.