Capability
Batched Token Generation With Continuous Batching Scheduler
13 artifacts provide this capability.
Top Matches
via “efficient batch inference with dynamic batching”
text-generation model. 7,106,872 downloads.
Unique: Inherits standard transformer batching from the PyTorch/transformers stack, with no custom optimization; it relies on framework-level CUDA kernel fusion and memory management rather than model-specific batching logic.
vs others: Simpler than specialized inference engines (vLLM, TGI) but slower; it lacks custom kernel optimization, but remains compatible with standard PyTorch tooling and profilers.
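The continuous-batching idea named in the heading (the scheduling used by engines like vLLM and TGI, as opposed to the plain framework-level batching this artifact inherits) can be sketched in plain Python: a scheduler keeps the batch full by swapping finished sequences out for queued prompts at every decode step, instead of waiting for the whole batch to finish. This is a minimal illustrative sketch, not any library's API; `continuous_batch_generate`, `toy_step`, and the token-emitting step function are hypothetical, with the step function standing in for a real model's per-token forward pass.

```python
from collections import deque


def continuous_batch_generate(prompts, max_batch_size, step_fn):
    """Toy continuous-batching loop (hypothetical sketch).

    Finished sequences are retired and queued prompts admitted on
    every step, so the batch stays as full as possible. step_fn(seq)
    returns the next token for a sequence, or None when it is done.
    """
    queue = deque(enumerate(prompts))  # (prompt_id, prompt_tokens)
    active = {}                        # prompt_id -> growing token list
    results = {}

    while queue or active:
        # Admit new sequences into any free batch slots.
        while queue and len(active) < max_batch_size:
            pid, prompt = queue.popleft()
            active[pid] = list(prompt)

        # One decode step across the whole active batch.
        finished = []
        for pid, seq in active.items():
            tok = step_fn(seq)
            if tok is None:
                finished.append(pid)
            else:
                seq.append(tok)

        # Retire finished sequences; their slots free up next iteration.
        for pid in finished:
            results[pid] = active.pop(pid)

    return [results[i] for i in range(len(prompts))]


# Toy greedy "model": emit the current length as the next token,
# and stop (return None) once a sequence reaches 5 tokens.
def toy_step(seq):
    return None if len(seq) >= 5 else len(seq)


print(continuous_batch_generate([[0], [1, 2], [3]], max_batch_size=2,
                                step_fn=toy_step))
```

The key design point is the admit-on-retire loop: a real continuous-batching scheduler does the same bookkeeping per decode iteration (plus KV-cache slot management), which is exactly the model-specific logic this artifact omits by delegating to standard transformers batching.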