Capability

Masked Language Model Token Prediction With Bidirectional Context

20 artifacts provide this capability.


Top Matches

Fill-mask model. 60,675,227 downloads.

Unique: A bidirectional Transformer architecture (unlike GPT's unidirectional, left-to-right design) makes context-aware predictions by attending to both the preceding and the following tokens at once. At 110M parameters, the model is lightweight enough for edge deployment while still performing strongly on GLUE benchmark tasks after fine-tuning.
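To make the bidirectional-versus-unidirectional distinction concrete, here is a minimal sketch of the two attention-mask patterns. It assumes a single 0/1 mask per sequence and ignores padding masks, multiple heads, and everything else a real BERT or GPT implementation does; the function names `attention_mask` and `visible_context` and the example sentence are hypothetical, chosen only for illustration.

```python
from typing import List


def attention_mask(seq_len: int, causal: bool) -> List[List[int]]:
    """Return a seq_len x seq_len 0/1 matrix where mask[i][j] == 1 means
    position i may attend to position j.

    causal=False: every position sees every other position (bidirectional).
    causal=True:  position i sees only positions j <= i (left-to-right).
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


def visible_context(tokens: List[str], position: int, causal: bool) -> List[str]:
    """List the tokens that `position` is allowed to attend to under the mask."""
    row = attention_mask(len(tokens), causal)[position]
    return [tok for tok, allowed in zip(tokens, row) if allowed]


# Hypothetical example: predicting the masked token at index 2.
tokens = ["the", "cat", "[MASK]", "on", "the", "mat"]
print(visible_context(tokens, 2, causal=False))  # full sentence, both sides
print(visible_context(tokens, 2, causal=True))   # left context only
```

With the bidirectional mask, the `[MASK]` position can use "on the mat" as evidence; with the causal mask it sees only "the cat", which is why a left-to-right model is a weaker fit for fill-mask prediction.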

vs others: Smaller and faster than BERT-large (110M vs. 340M parameters) with only a small accuracy trade-off, and more widely adopted than RoBERTa for fill-mask tasks thanks to its earlier release and the many fine-tuning examples available in the community.
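The 110M vs. 340M comparison can be checked with a short parameter-count calculation. This is a sketch under stated assumptions: the listing does not name the exact checkpoint, so it assumes the published BERT-base (12 layers, hidden size 768, feed-forward size 3072) and BERT-large (24 layers, hidden size 1024, feed-forward size 4096) hyperparameters with the 30,522-token uncased WordPiece vocabulary and 512 positions. It counts only the encoder backbone plus pooler and leaves out the masked-LM prediction head that a fill-mask checkpoint adds on top. The helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    # Assumed BERT-style hyperparameters (not taken from the listing itself).
    vocab_size: int
    hidden: int
    layers: int
    intermediate: int
    max_positions: int = 512
    type_vocab: int = 2


def linear(n_in: int, n_out: int) -> int:
    # Dense layer: weight matrix plus bias vector.
    return n_in * n_out + n_out


def layer_norm(h: int) -> int:
    # LayerNorm: gain and bias vectors.
    return 2 * h


def encoder_params(cfg: EncoderConfig) -> int:
    h = cfg.hidden
    embeddings = (
        cfg.vocab_size * h         # token embeddings
        + cfg.max_positions * h    # learned position embeddings
        + cfg.type_vocab * h       # segment (token type) embeddings
        + layer_norm(h)
    )
    # The head count does not change the total: heads split the hidden size.
    per_layer = (
        4 * linear(h, h)                  # Q, K, V and attention output projections
        + layer_norm(h)
        + linear(h, cfg.intermediate)     # feed-forward up-projection
        + linear(cfg.intermediate, h)     # feed-forward down-projection
        + layer_norm(h)
    )
    pooler = linear(h, h)
    return embeddings + cfg.layers * per_layer + pooler


base = EncoderConfig(vocab_size=30522, hidden=768, layers=12, intermediate=3072)
large = EncoderConfig(vocab_size=30522, hidden=1024, layers=24, intermediate=4096)
print(f"{encoder_params(base):,}")   # 109,482,240
print(f"{encoder_params(large):,}")  # 335,141,888
```

Under these assumptions the totals come to roughly 109M and 335M, which round to the 110M and 340M figures quoted above; most of the difference comes from doubling the layer count and widening each layer.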

