Capability
Masked Language Model Token Prediction With Bidirectional Context
20 artifacts provide this capability.
fill-mask model. 60,675,227 downloads.
Unique: Bidirectional transformer architecture (unlike GPT's unidirectional design) enables context-aware predictions by attending to both preceding and following tokens simultaneously; with only 110M parameters it is lightweight enough for edge deployment while maintaining strong performance on GLUE benchmark tasks (see the sketch after this card for the fill-mask capability in practice)
vs others: Smaller and faster than BERT-large (110M vs 340M parameters) with minimal accuracy trade-off, and more widely adopted than RoBERTa for fill-mask tasks due to its earlier release and the extensive fine-tuning examples available in the community
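
A minimal sketch of the fill-mask capability described above, using the Hugging Face transformers pipeline. The model id bert-base-uncased is an assumption: the listing does not name the artifact, though the 110M-parameter, bidirectional fill-mask description matches BERT-base.

from transformers import pipeline

# Load a fill-mask pipeline; the model predicts the token hidden behind
# [MASK] by attending to context on both sides of it.
# "bert-base-uncased" is an assumed model id, not taken from the listing.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The prediction for [MASK] is conditioned on both the left context
# ("The capital of France,") and the right context ("is beautiful.").
predictions = unmasker("The capital of France, [MASK], is beautiful.")

for p in predictions:
    # Each candidate comes with the filled-in token and a softmax score.
    print(f"{p['token_str']:>12}  score={p['score']:.3f}")

Because attention is bidirectional, changing the words after the mask also changes the ranking of candidates, which a purely left-to-right model cannot take into account at the masked position.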