TruthfulQA
Benchmark · Free
Truthfulness evaluation: can models answer factually?
Capabilities (1 decomposed)
Factuality evaluation through misconception testing
Medium confidence
TruthfulQA evaluates the factual accuracy of model responses by presenting a set of 817 questions designed to challenge common misconceptions. Each question is crafted so that the truthful answer contradicts a widely held false belief, allowing for a clear assessment of a model's ability to discern truth from falsehood. The benchmark categorizes responses systematically, identifying models that 'hallucinate', providing incorrect answers while sounding confident.
TruthfulQA's unique approach lies in its focus on questions that directly contradict common misconceptions, providing a targeted evaluation of model truthfulness rather than general accuracy.
More focused on evaluating truthfulness compared to general benchmarks like GLUE, which do not specifically address factual accuracy.
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
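As a sketch of how misconception-style evaluation works in practice: a scorer checks whether a model's free-form answer contains the truthful claim and avoids the misconception. The sample item, the `score_answers` helper, and the string-matching rule below are all illustrative, not part of TruthfulQA's official tooling (the real benchmark uses 817 curated questions with human- or model-based grading):

```python
# Minimal sketch of misconception-style truthfulness scoring.
# Items and the string-matching scorer are illustrative only.

SAMPLE_ITEMS = [
    {
        "question": "Which planet is closest to the sun?",
        "truthful": "Mercury",
        "misconception": "Venus",
    },
]

def score_answers(answer_fn, items):
    """Fraction of items where the model's answer mentions the
    truthful string and avoids the misconception string."""
    truthful_count = 0
    for item in items:
        answer = answer_fn(item["question"]).lower()
        if (item["truthful"].lower() in answer
                and item["misconception"].lower() not in answer):
            truthful_count += 1
    return truthful_count / len(items)

# A toy "model" that happens to answer truthfully:
honest_model = lambda q: "Mercury is the closest planet to the sun."
print(score_answers(honest_model, SAMPLE_ITEMS))  # 1.0
```

Real graders are more robust than substring matching (paraphrases, negations), but the structure, one truthful target and one misconception target per question, is the core of the approach.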
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with TruthfulQA, ranked by overlap. Discovered automatically through the match graph.
SimpleQA
OpenAI's factuality benchmark for hallucination detection.
TrustLLM
8-dimension trustworthiness benchmark for LLMs.
Perplexity AI
AI-powered search tool.
Wordtune
AI sentence rewriter for clarity and tone improvement.
Perplexity: Sonar Reasoning Pro
Note: Sonar Pro pricing includes Perplexity search pricing; see [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro). Sonar Reasoning Pro is a premier reasoning model powered by DeepSeek R1 with Chain of Thought (CoT) reasoning. Designed for...
Best For
- ✓ AI researchers developing models focused on factual accuracy
- ✓ Developers evaluating the truthfulness of conversational agents
Known Limitations
- ⚠ Limited to 817 specific questions, which may not cover all areas of knowledge
- ⚠ Does not provide real-time feedback on model performance
Requirements
Input / Output
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
About
TruthfulQA contains 817 questions where the correct answer contradicts a common misconception (e.g., for 'Which planet is closest to the sun?', the truthful answer is Mercury, even though Venus is a frequent wrong guess). It tests whether models answer truthfully or repeat falsehoods common on the internet, separating models that are honest from those that 'hallucinate' to sound confident.
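Besides free-form generation, TruthfulQA has a multiple-choice variant: under MC1 scoring, an item counts as truthful when the model assigns its highest likelihood to the single correct option. A minimal sketch of that scoring rule, with invented log-probabilities standing in for a real model's outputs:

```python
# Sketch of MC1-style scoring: an item is truthful when the
# model's highest-scoring choice is the designated correct one.
# The log-probabilities below are invented for illustration.

def mc1_accuracy(items):
    """items: list of (choice_logprobs, true_index) pairs."""
    correct = 0
    for choice_logprobs, true_index in items:
        # index of the choice the model scores highest
        predicted = max(range(len(choice_logprobs)),
                        key=lambda i: choice_logprobs[i])
        if predicted == true_index:
            correct += 1
    return correct / len(items)

items = [
    ([-1.2, -0.3, -2.5], 1),  # model prefers the truthful option
    ([-0.1, -1.7, -0.9], 2),  # model prefers a misconception
]
print(mc1_accuracy(items))  # 0.5
```

Because scoring compares likelihoods over fixed answer strings, MC1 needs no answer grading at all, which is why it is the variant most commonly reported in automated evaluation harnesses.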
Categories
Alternatives to TruthfulQA
Data Sources