Capability
Local REST API Inference with Streaming Output
20 artifacts provide this capability.
Top Matches
via “OpenAI-compatible REST API server with streaming support”
vLLM: high-throughput LLM serving engine with PagedAttention, continuous batching, and an OpenAI-compatible API.
Unique: implements the OpenAI API contract via FastAPI with SSE streaming, so existing OpenAI client code migrates to vLLM with zero changes (see the sketch below).
vs others: a drop-in replacement for the OpenAI API, claiming 10-24x lower latency and cost than the hosted OpenAI service while client code stays identical.
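The zero-code migration claim amounts to pointing the official OpenAI Python SDK at the local server's base URL. A minimal sketch, assuming a vLLM server is already running locally on port 8000; the model name and port here are illustrative, not part of the original listing:

```python
# Assumes a local vLLM server started with something like:
#   vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
# (model and port are illustrative; substitute your own)
from openai import OpenAI

# Point the official OpenAI SDK at the local vLLM endpoint.
# vLLM does not require an API key by default, but the client needs a value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# stream=True makes the server reply with Server-Sent Events (SSE):
# one "data: {json}" chunk per token delta, terminated by "data: [DONE]".
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    stream=True,
)

# Print each streamed token delta as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

Because the server speaks the same wire protocol, the only change from calling OpenAI directly is the `base_url`; the request shape, streaming loop, and response objects are unchanged.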