zero-shot machine-generated text detection via probability curvature analysis
Detects machine-generated text without requiring training data by analyzing the curvature of token probability distributions from a reference language model. The method computes the difference between the log-probabilities the reference model assigns to the original text and to perturbed versions of it (with randomly masked tokens replaced), measuring how sharply the probability landscape changes around the text. Machine-generated text tends to sit near a local maximum of the model's log-probability, so perturbations consistently lower it, whereas human-written text sits off such maxima and shifts in both directions; this curvature signature separates the two without fine-tuning or labeled datasets.
Unique: Uses probability curvature (the local, second-order shape of the model's log-probability function around the text) rather than supervised classifiers or fine-tuned models, enabling zero-shot detection by leveraging inherent distributional differences between human and machine text without labeled training data
vs alternatives: Eliminates the need for labeled training datasets and fine-tuning, making it immediately deployable across domains, whereas supervised detection methods (e.g., RoBERTa-based classifiers) require domain-specific labeled data and degrade when LLM architectures change
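The curvature score described above can be sketched in a few lines, assuming per-text log-probabilities have already been obtained from a reference model. The function name and the numeric values are illustrative, not from the source:

```python
import statistics

def curvature_score(logp_original, logp_perturbed):
    """Perturbation discrepancy: how far the original text's log-probability
    sits above the average over its perturbed variants, normalized by the
    spread of the perturbed scores so different texts are comparable."""
    mean_p = statistics.fmean(logp_perturbed)
    std_p = statistics.stdev(logp_perturbed) or 1.0  # guard against zero spread
    return (logp_original - mean_p) / std_p

# Machine text tends to sit at a local probability maximum: every
# perturbation lowers its log-probability, giving a large positive score.
machine = curvature_score(-120.0, [-135.0, -133.0, -138.0, -134.0])
# Human text is not at a maximum: perturbations shift it both ways,
# so the original log-probability is typical of the perturbed ones.
human = curvature_score(-150.0, [-149.0, -152.0, -148.0, -151.0])
assert machine > human
```

A positive score means perturbations systematically hurt the text's probability, which is the machine-generation signal; scores near zero are consistent with human writing.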
masked token perturbation for probability distribution sampling
Generates perturbed versions of input text by randomly masking tokens and replacing them with samples from the reference model's probability distribution. For each masked position, the method samples alternative tokens according to the model's predicted probabilities, creating multiple variants of the original text. This perturbation strategy allows the detector to measure how probability distributions shift when text is modified, providing the signal for curvature-based detection without requiring explicit training on synthetic data.
Unique: Applies masked token perturbation specifically to expose probability curvature differences rather than for data augmentation or paraphrasing, using the perturbation as a diagnostic tool to measure how sharply a model's probability landscape changes around the original text
vs alternatives: More computationally efficient than generating full paraphrases or using external paraphrase models, and directly targets the probability distribution properties that distinguish machine-generated text rather than relying on surface-level linguistic features
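The perturbation step can be sketched as follows. `fill_fn` stands in for the mask-filling model's sampler, and the synonym table is a toy substitute for a real masked language model; both are assumptions for illustration, not part of the original description:

```python
import random

def perturb(tokens, mask_fraction, fill_fn, n_variants, seed=0):
    """Create perturbed copies of a token sequence: mask a random fraction
    of positions and refill each from the model's predicted distribution,
    here delegated to fill_fn(tokens, pos, rng)."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_fraction * len(tokens)))
    variants = []
    for _ in range(n_variants):
        positions = rng.sample(range(len(tokens)), n_mask)
        variant = list(tokens)
        for pos in positions:
            variant[pos] = fill_fn(variant, pos, rng)
        variants.append(variant)
    return variants

# Toy stand-in for a masked LM's sampler: pick an alternative token
# uniformly; a real detector would sample from a model such as T5 or BERT.
SYNONYMS = {"big": ["large", "huge"], "fast": ["quick", "rapid"]}

def toy_fill(tokens, pos, rng):
    return rng.choice(SYNONYMS.get(tokens[pos], [tokens[pos]]))

variants = perturb(["the", "big", "fast", "dog"], 0.5, toy_fill, 3)
assert len(variants) == 3 and all(len(v) == 4 for v in variants)
```

Each variant keeps the original length and differs only at the masked positions, which is what lets the detector attribute probability shifts to the perturbation itself rather than to length or structure changes.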
reference model-agnostic detection scoring with cross-model compatibility
Computes detection scores using any pre-trained language model as a reference, without requiring the reference model to be the same model that generated the suspect text. The method calculates probability curvature relative to the reference model's distribution, enabling detection even when the generating model is unknown or proprietary. This architecture allows deployment with readily available models (e.g., GPT-2, open-source LLMs) while detecting text from any LLM, including closed-source systems.
Unique: Decouples the reference model from the generating model, enabling detection without knowing or having access to the LLM that produced the text, whereas most supervised detection methods require training on outputs from specific target models
vs alternatives: Provides immediate detection capability for new LLMs without retraining, whereas supervised classifiers must be retrained for each new generating model or architecture change
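The decoupling can be made concrete as an interface: the detector only needs a `log_prob` callable from some reference model, which never has to be the generator. The toy scorer and perturber below are placeholders (assumed for this sketch) for a real causal LM and a mask-filling model:

```python
import random
import statistics
from typing import Callable, List, Sequence

def detect(tokens: Sequence[str],
           log_prob: Callable[[Sequence[str]], float],
           perturb_fn: Callable[[Sequence[str], int], List[Sequence[str]]],
           n_variants: int = 10,
           threshold: float = 0.5):
    """Score text against ANY reference model exposing log_prob; the model
    that generated the text never enters the computation."""
    lp_orig = log_prob(tokens)
    lp_pert = [log_prob(v) for v in perturb_fn(tokens, n_variants)]
    mean_p = statistics.fmean(lp_pert)
    std_p = statistics.stdev(lp_pert) or 1.0
    score = (lp_orig - mean_p) / std_p
    return score, score > threshold

# Toy reference model: log-prob 0 for "the", -1 for anything else.
def toy_log_prob(tokens):
    return sum(0.0 if t == "the" else -1.0 for t in tokens)

# Toy perturber: swap the first token for a random determiner.
def toy_perturb(tokens, n):
    rng = random.Random(1)
    return [[rng.choice(["a", "an", "one"])] + list(tokens[1:]) for _ in range(n)]

score, flagged = detect(["the", "cat"], toy_log_prob, toy_perturb, n_variants=5)
assert flagged  # the original sits above every perturbation under this toy model
```

Swapping `toy_log_prob` for per-token log-probabilities from GPT-2 or any open-source LLM changes nothing else in the pipeline, which is what makes the architecture deployable against unknown or closed-source generators.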
probability curvature computation with statistical significance testing
Calculates a numerical score representing the curvature of token probability distributions by measuring the gap between the log-probability of the original text and those of its perturbed variants. The method computes statistics such as the mean and variance of these log-probability differences across tokens and perturbations, enabling a significance test that distinguishes genuine machine-generated text from natural variation in human writing. This statistical framework provides both a point estimate (the curvature score) and confidence intervals for detection decisions.
Unique: Frames detection as a statistical hypothesis test on probability curvature rather than a binary classifier, providing principled uncertainty quantification and enabling adaptive thresholding based on text properties
vs alternatives: Offers interpretable, threshold-independent scores with statistical justification, whereas neural classifiers produce opaque confidence scores without principled uncertainty estimates
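The statistical framing can be sketched as a one-sided z-test: under the null hypothesis that the text is human-written, the original log-probability is a typical draw from the perturbation distribution. The normal approximation and the 1.96 interval are simplifying assumptions for this sketch (a t-test would be more careful when only a few perturbations are available):

```python
import math
import statistics

def curvature_test(logp_original, logp_perturbed):
    """Return (z statistic, one-sided p-value, 95% normal interval for
    perturbed log-probs). A small p-value means the original text sits
    implausibly far above its perturbations under H0 (human-written)."""
    mean_p = statistics.fmean(logp_perturbed)
    std_p = statistics.stdev(logp_perturbed) or 1.0
    z = (logp_original - mean_p) / std_p
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) under H0
    interval = (mean_p - 1.96 * std_p, mean_p + 1.96 * std_p)
    return z, p_value, interval

z, p, (lo, hi) = curvature_test(-120.0, [-135.0, -133.0, -138.0, -134.0])
assert p < 0.01 and -120.0 > hi  # original lies outside the H0 interval
```

Reporting a p-value and an interval, rather than a bare classifier confidence, is what enables the adaptive thresholding and principled uncertainty quantification described above.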