{"passport":{"unfragile":{"@version":"1.0","version":"2026-05","artifact":{"id":"wildchat","slug":"wildchat","name":"WildChat","type":"dataset","url":"https://huggingface.co/datasets/allenai/WildChat","page_url":"https://unfragile.ai/wildchat","categories":["model-training"],"tags":[],"pricing":{"model":"free","free":true,"starting_price":null},"status":"active","verified":false},"capabilities":[{"id":"wildchat__cap_0","uri":"capability://data.processing.analysis.real.world.conversation.dataset.collection.and.curation","name":"real-world conversation dataset collection and curation","description":"Aggregates over 1 million authentic user conversations with ChatGPT and GPT-4 captured through a research chatbot interface, preserving full conversation threads with metadata including timestamps, user demographics (country, browser type), and conversation-level toxicity annotations. The dataset captures genuine, unfiltered user intents across diverse domains without synthetic generation or prompt engineering, enabling analysis of actual AI usage patterns in production environments.","intents":["I need authentic conversation data to understand how real users interact with LLMs across different use cases","I want to study genuine user needs and failure modes that aren't captured in synthetic benchmarks","I need demographic-stratified conversation data to analyze AI usage patterns across geographies and user populations","I want to train models on real-world conversation distributions rather than curated or synthetic data"],"best_for":["ML researchers studying LLM behavior and user interaction patterns","teams building instruction-tuned models requiring diverse, authentic training data","researchers analyzing geographic and demographic variations in AI usage","safety researchers studying real-world toxicity, jailbreaks, and edge cases"],"limitations":["Dataset is English-dominant with limited multilingual coverage despite claims of multilingual conversations","Toxicity labels are coarse-grained (binary or limited categories) rather than fine-grained harm taxonomy","No explicit consent from original ChatGPT/GPT-4 users — raises privacy and licensing questions for derivative use","Conversation metadata is limited to country and browser; lacks temporal distribution analysis or user segmentation by expertise level","No conversation quality scores or user satisfaction ratings — cannot distinguish high-value from low-value interactions"],"requires":["Hugging Face account for dataset access","Python 3.7+ with datasets library (huggingface_hub)","Disk space: ~50-100GB for full dataset depending on format","Understanding of conversation JSON schema and metadata structure"],"input_types":["conversation JSON objects with nested message arrays","metadata fields: user_id, country, browser, timestamp, conversation_id"],"output_types":["structured conversation records with turn-level text and metadata","toxicity labels (conversation-level)","demographic stratification by country and browser"],"categories":["data-processing-analysis","model-training-datasets"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"wildchat__cap_1","uri":"capability://data.processing.analysis.demographic.stratified.conversation.analysis.and.filtering","name":"demographic-stratified conversation analysis and filtering","description":"Enables filtering and analysis of conversations by user demographics (country, browser type) and conversation-level metadata, allowing researchers to slice the dataset by geographic region, device type, or other user attributes. The dataset structure preserves demographic fields as queryable attributes, supporting cohort analysis, geographic bias detection, and population-specific model evaluation without requiring external demographic inference.","intents":["I want to analyze how users in different countries interact with AI differently","I need to identify geographic or device-based biases in conversation patterns","I want to train region-specific or device-optimized models using stratified subsets","I need to study how browser type or device constraints affect user behavior with LLMs"],"best_for":["researchers studying geographic variation in AI usage and user needs","teams building localized or region-specific AI products","fairness researchers analyzing demographic disparities in AI interactions","product teams understanding device-specific usage patterns"],"limitations":["Demographic data is limited to country and browser type — no age, education, expertise level, or socioeconomic indicators","Country-level granularity masks within-country variation and urban/rural differences","Browser type is a weak proxy for device type and user technical sophistication","No temporal stratification — cannot analyze how usage patterns evolved over time","Imbalanced demographic distribution — some countries/browsers likely overrepresented relative to global population"],"requires":["Python 3.7+ with pandas for filtering and aggregation","Familiarity with dataset schema and demographic field names","Statistical tools for cohort analysis (scipy, statsmodels)"],"input_types":["conversation records with country and browser metadata fields","filter criteria: country codes, browser names, conversation IDs"],"output_types":["filtered conversation subsets by demographic cohort","aggregated statistics: conversation count, average length, toxicity rate by country/browser","stratified samples for balanced model training"],"categories":["data-processing-analysis","search-retrieval"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"wildchat__cap_2","uri":"capability://safety.moderation.toxicity.annotation.and.content.safety.labeling","name":"toxicity annotation and content safety labeling","description":"Provides conversation-level toxicity labels assigned through automated or human annotation, enabling researchers to identify and filter harmful content, study safety patterns, and train content moderation models. Labels are attached at the conversation level (not per-message), allowing downstream filtering of unsafe conversations or stratified analysis of toxicity distribution across user demographics and conversation types.","intents":["I need to filter out toxic conversations from training data to reduce harmful outputs in fine-tuned models","I want to study the prevalence and characteristics of toxic user interactions with LLMs","I need labeled data to train or evaluate content moderation classifiers","I want to analyze how toxicity patterns vary across geographic regions or user demographics"],"best_for":["safety researchers studying real-world toxicity in LLM interactions","teams training content moderation or safety classifiers","model builders filtering training data to reduce harmful outputs","researchers analyzing geographic or demographic variation in toxic content"],"limitations":["Toxicity labels are conversation-level only — cannot identify which specific messages or turns contain harmful content","Label granularity unknown — likely binary (toxic/non-toxic) rather than multi-class harm taxonomy (hate speech, violence, sexual content, etc.)","No inter-annotator agreement scores or label confidence estimates — unclear label quality and reliability","Annotation methodology not documented — unclear if labels are automated (rule-based, classifier-based) or human-annotated","May not capture subtle harms, context-dependent toxicity, or cultural variation in what constitutes harmful content"],"requires":["Python 3.7+ with pandas for label filtering and analysis","Understanding of toxicity label schema and encoding","Statistical tools for distribution analysis and fairness metrics"],"input_types":["conversation records with toxicity label field","filter criteria: toxicity threshold, label values"],"output_types":["filtered conversation subsets (toxic/non-toxic)","toxicity distribution statistics by country, browser, conversation length","labeled datasets for training content moderation models"],"categories":["safety-moderation","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"wildchat__cap_3","uri":"capability://data.processing.analysis.multilingual.conversation.corpus.extraction.and.analysis","name":"multilingual conversation corpus extraction and analysis","description":"Provides access to non-English conversations within the dataset, enabling analysis of how users in different languages interact with English-trained LLMs and supporting training of multilingual or cross-lingual models. Conversations are preserved in original language with metadata indicating language or country of origin, allowing language-specific filtering and comparative analysis across linguistic communities.","intents":["I want to understand how non-English speakers interact with English-trained LLMs","I need multilingual conversation data to train or evaluate cross-lingual models","I want to study language-specific usage patterns and user needs","I need to analyze how well LLMs handle code-switching or multilingual conversations"],"best_for":["researchers building multilingual or cross-lingual LLMs","teams studying non-English user needs and LLM behavior","researchers analyzing language-specific biases or performance gaps","product teams localizing AI products for non-English markets"],"limitations":["Multilingual coverage is limited and imbalanced — dataset is English-dominant with sparse non-English conversations","Language identification not explicit — requires automatic language detection or country-based inference","No language-level metadata or language pair information for code-switching analysis","Limited non-Latin script languages — likely skewed toward European languages","No translation or parallel data — cannot directly compare same conversation across languages"],"requires":["Python 3.7+ with language detection library (langdetect, textblob)","Understanding of dataset language distribution and country-to-language mapping","Multilingual NLP tools for analysis (spaCy, transformers for multiple languages)"],"input_types":["conversation records in multiple languages","country metadata for language inference","conversation text in original language"],"output_types":["filtered conversation subsets by language or language pair","language distribution statistics","language-specific conversation characteristics (length, topics, toxicity)"],"categories":["data-processing-analysis","text-generation-language"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"wildchat__cap_4","uri":"capability://data.processing.analysis.conversation.metadata.extraction.and.temporal.analysis","name":"conversation metadata extraction and temporal analysis","description":"Provides structured metadata for each conversation including timestamps, conversation IDs, user IDs, and conversation length, enabling temporal analysis of usage patterns, trend detection, and time-series studies of how user needs and LLM interactions evolved. Metadata is queryable and filterable, supporting cohort analysis by time period and correlation analysis between temporal patterns and conversation characteristics.","intents":["I want to analyze how user needs and conversation patterns changed over time","I need to identify trending topics or use cases in LLM interactions","I want to study how conversation length and complexity evolved as users became more familiar with LLMs","I need temporal stratification for time-aware model evaluation and training"],"best_for":["researchers studying temporal trends in LLM usage and user behavior","teams analyzing how user needs evolved as LLMs became more capable","product teams understanding seasonal or temporal patterns in AI usage","researchers building time-aware or adaptive LLM systems"],"limitations":["Temporal granularity limited to conversation-level timestamps — no turn-level or message-level timing","No explicit time period documentation — unclear if dataset spans days, weeks, months, or years","No conversation duration or wall-clock time — cannot analyze how long users spend per conversation","Temporal distribution likely skewed toward recent conversations — no uniform sampling across time","No explicit topic or intent labels — requires manual annotation or topic modeling for trend analysis"],"requires":["Python 3.7+ with pandas and datetime libraries","Statistical tools for time-series analysis (statsmodels, scipy)","Understanding of timestamp format and timezone handling"],"input_types":["conversation records with timestamp and conversation_id fields","time range filters: start date, end date","grouping criteria: day, week, month, user_id"],"output_types":["time-series statistics: conversation count, average length, toxicity rate by time period","temporal cohorts for stratified analysis","trend detection results and anomaly flags"],"categories":["data-processing-analysis","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"wildchat__cap_5","uri":"capability://data.processing.analysis.domain.and.use.case.diversity.sampling.and.stratification","name":"domain and use-case diversity sampling and stratification","description":"Provides conversations spanning diverse user intents and domains (coding help, creative writing, sensitive topics, general Q&A, etc.) captured from real users without prompt engineering, enabling researchers to sample representative conversations across use cases and train models on realistic domain distributions. The dataset's scale and authenticity allow stratified sampling by inferred domain or use case without requiring explicit domain labels.","intents":["I want training data that covers diverse real-world use cases, not just curated or synthetic examples","I need to understand how users approach different types of tasks with LLMs","I want to evaluate model performance across diverse domains and use cases","I need to identify underrepresented use cases or user needs in LLM training data"],"best_for":["teams training general-purpose LLMs requiring diverse domain coverage","researchers studying how LLM behavior varies across use cases","product teams understanding real-world user needs and use cases","researchers analyzing domain-specific biases or performance gaps"],"limitations":["No explicit domain or use-case labels — requires manual annotation or topic modeling for stratification","Domain distribution likely reflects ChatGPT user base biases — overrepresented technical/coding use cases, underrepresented specialized domains","No use-case difficulty or complexity scores — cannot distinguish simple vs complex tasks within domains","No success/failure labels — cannot identify which use cases LLMs handle well vs poorly","No user expertise or intent clarity — cannot distinguish expert vs novice users or clear vs vague requests"],"requires":["Python 3.7+ with topic modeling libraries (gensim, sklearn) or manual annotation","NLP tools for domain inference (transformers, zero-shot classification)","Understanding of conversation content and implicit domain signals"],"input_types":["conversation records with full message text","optional: domain labels or use-case categories for stratification"],"output_types":["stratified conversation samples by inferred domain","domain distribution statistics and coverage analysis","domain-specific conversation characteristics (length, complexity, success rate)"],"categories":["data-processing-analysis","search-retrieval"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"wildchat__cap_6","uri":"capability://data.processing.analysis.conversation.metadata.extraction.and.statistical.summarization","name":"conversation metadata extraction and statistical summarization","description":"The dataset includes structured metadata for each conversation (user demographics, browser/device info, conversation length, timestamps, toxicity labels) that can be extracted and aggregated for statistical analysis. Researchers can compute summary statistics (e.g., average conversation length by country, toxicity prevalence by domain) without processing full conversation text, enabling efficient exploratory analysis and dataset characterization. Metadata is stored in queryable fields, supporting both individual record lookup and bulk aggregation.","intents":["Understand overall dataset composition and statistical properties","Identify patterns in conversation length, user engagement, or request distribution","Characterize user demographics and geographic distribution","Compare statistical properties across subsets (e.g., by country, domain, toxicity level)"],"best_for":["Researchers conducting exploratory data analysis and dataset characterization","Teams assessing dataset quality and coverage for specific use cases","Organizations analyzing user engagement and conversation patterns","Researchers investigating whether statistical properties vary by demographic group"],"limitations":["Metadata completeness and accuracy not documented — some fields may be missing or inaccurate","Statistical summaries may mask important outliers or long-tail patterns","Metadata does not capture qualitative aspects of conversations (e.g., user satisfaction, task completion)","Temporal metadata (timestamps) may not be granular enough for time-series analysis","Aggregation across large datasets may require significant computational resources"],"requires":["Data analysis tools (pandas, polars, SQL, etc.) for metadata extraction and aggregation","Statistical knowledge for appropriate summary statistics and comparative analysis","Understanding of potential biases in metadata collection and inference","Visualization tools for exploratory analysis"],"input_types":["Conversation records with metadata fields","Aggregation criteria (grouping by country, domain, toxicity level, etc.)"],"output_types":["Summary statistics tables (mean, median, distribution by metadata field)","Comparative analysis across demographic groups or domains","Metadata distribution visualizations","Dataset characterization reports"],"categories":["data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"wildchat__cap_7","uri":"capability://data.processing.analysis.instruction.following.and.user.intent.distribution.analysis","name":"instruction-following and user intent distribution analysis","description":"The dataset captures authentic user requests and model responses, enabling analysis of instruction-following patterns, user intent distribution, and how well models address diverse user needs. Researchers can analyze which types of instructions users provide, how models interpret and respond to them, and where misalignment or misunderstanding occurs. This supports studying instruction-following quality, identifying common user frustrations, and understanding the diversity of real-world use cases beyond typical benchmarks.","intents":["Analyze how well models follow diverse user instructions in production","Identify common user intents and request patterns","Study where models misunderstand or misalign with user expectations","Create instruction-following evaluation sets that reflect real user needs"],"best_for":["Researchers studying instruction-following and alignment in production systems","Teams analyzing user satisfaction and model performance on real requests","Organizations identifying common failure modes and user frustrations","Researchers investigating whether instruction-following quality varies by user demographic"],"limitations":["No explicit user satisfaction or success metrics — requires inference from conversation content","Intent labels not provided — requires manual annotation or inference","Instruction complexity and diversity may not be uniformly distributed","Model responses reflect ChatGPT/GPT-4 behavior — may not generalize to other models","No explicit feedback on whether users were satisfied with responses"],"requires":["Intent classification system or taxonomy for categorizing user requests","Text analysis tools for extracting user intent and instruction patterns","Understanding of instruction-following evaluation methodologies","Potentially human annotators for validating intent classification"],"input_types":["Conversation records with user requests and model responses","Intent filter criteria or instruction type categories"],"output_types":["Intent distribution statistics","Instruction-following success/failure examples","Intent-stratified conversation subsets","User frustration or misalignment patterns"],"categories":["data-processing-analysis","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"wildchat__cap_8","uri":"capability://data.processing.analysis.model.behavior.and.response.quality.comparative.analysis","name":"model behavior and response quality comparative analysis","description":"The dataset includes conversations with both ChatGPT and GPT-4, enabling direct comparison of model behavior, response quality, and user satisfaction across model versions. Researchers can analyze how model improvements manifest in real-world usage, identify domains where newer models perform better, and study whether user satisfaction or request patterns differ by model. This supports understanding model evolution, identifying model-specific failure modes, and studying how users adapt to model capabilities.","intents":["Compare response quality and user satisfaction between ChatGPT and GPT-4","Identify domains or request types where newer models show improvement","Study how users adapt their requests based on model capabilities","Analyze model-specific failure modes and user frustrations"],"best_for":["Researchers studying model evolution and improvement across versions","Teams analyzing user experience and satisfaction across model versions","Organizations identifying model-specific performance gaps or strengths","Researchers investigating whether users adjust behavior based on model capabilities"],"limitations":["Model version information may not be explicitly labeled — requires inference or documentation review","No explicit user satisfaction metrics — requires inference from conversation content","Conversation distribution between models unknown — may be unbalanced","Temporal confounds — conversations with different models may be from different time periods","User behavior may differ between models due to interface or marketing differences, not just model quality"],"requires":["Ability to identify and filter conversations by model version","Comparative analysis tools and statistical methods for model comparison","Understanding of potential confounds in model comparison (temporal, user selection, etc.)","Domain expertise for interpreting model-specific performance differences"],"input_types":["Conversation records with model version labels","Model comparison criteria (by domain, request type, etc.)"],"output_types":["Model-stratified conversation subsets","Comparative quality metrics by model","Domain-specific model performance comparison","Model-specific failure mode examples"],"categories":["data-processing-analysis","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"wildchat__headline","uri":"capability://model.training.real.user.conversation.dataset.for.ai.training","name":"real user conversation dataset for ai training","description":"A comprehensive dataset of over 1 million real user conversations with ChatGPT and GPT-4, valuable for training AI models to understand diverse user interactions and needs.","intents":["best dataset for AI model training","real user conversation data for AI","dataset for understanding AI usage patterns","multilingual conversation dataset for AI","AI training data with toxicity labels"],"best_for":["AI researchers","developers training conversational models"],"limitations":["may not cover all languages","limited to user interactions with ChatGPT and GPT-4"],"requires":[],"input_types":[],"output_types":[],"categories":["model-training"],"confidence":0.5,"matches":0,"success_rate":0}],"trust":{"score":56,"verified":false,"data_access_risk":"high","permissions":["Hugging Face account for dataset access","Python 3.7+ with datasets library (huggingface_hub)","Disk space: ~50-100GB for full dataset depending on format","Understanding of conversation JSON schema and metadata structure","Python 3.7+ with pandas for filtering and aggregation","Familiarity with dataset schema and demographic field names","Statistical tools for cohort analysis (scipy, statsmodels)","Python 3.7+ with pandas for label filtering and analysis","Understanding of toxicity label schema and encoding","Statistical tools for distribution analysis and fairness metrics"],"failure_modes":["Dataset is English-dominant with limited multilingual coverage despite claims of multilingual conversations","Toxicity labels are coarse-grained (binary or limited categories) rather than fine-grained harm taxonomy","No explicit consent from original ChatGPT/GPT-4 users — raises privacy and licensing questions for derivative use","Conversation metadata is limited to country and browser; lacks temporal distribution analysis or user segmentation by expertise level","No conversation quality scores or user satisfaction ratings — cannot distinguish high-value from low-value interactions","Demographic data is limited to country and browser type — no age, education, expertise level, or socioeconomic indicators","Country-level granularity masks within-country variation and urban/rural differences","Browser type is a weak proxy for device type and user technical sophistication","No temporal stratification — cannot analyze how usage patterns evolved over time","Imbalanced demographic distribution — some countries/browsers likely overrepresented relative to global population","builder identity is not verified yet","no observed match outcomes yet"],"rank_breakdown":{"adoption":0.7,"quality":0.8500000000000001,"ecosystem":0.3,"match_graph":0.25,"freshness":0.75,"weights":{"adoption":0.3,"quality":0.25,"ecosystem":0.1,"match_graph":0.3,"freshness":0.05}},"observed_outcomes":{"matches":0,"success_rate":0,"avg_confidence":0,"top_intents":[],"last_matched_at":null},"maintenance":{"status":"active","updated_at":"2026-05-24T12:16:34.803Z","last_scraped_at":null,"last_commit":null},"community":{"stars":null,"forks":null,"weekly_downloads":null,"model_downloads":null,"model_likes":null}},"distribution":{"claim_url":"https://unfragile.ai/submit?claim=wildchat","compare_url":"https://unfragile.ai/compare?artifact=wildchat"}},"signature":"P4kIvtAj8Y96u8beWD6b7SefK6YMVDMKhAeVLADMmntqfesJpGgPxgyZ826hKPRvmdI6d57xzUxM65UfVcyoDA==","signedAt":"2026-06-22T22:01:53.167Z","signedBy":"unfragile.ai","version":1},"_links":{"self":"https://unfragile.ai/api/v1/passport/wildchat","artifact":"https://unfragile.ai/wildchat","verify":"https://unfragile.ai/api/v1/verify?slug=wildchat","publicKey":"https://unfragile.ai/api/v1/trust-passport-public-key","spec":"https://unfragile.ai/trust","schema":"https://unfragile.ai/schema.json","docs":"https://unfragile.ai/docs"}}