Capability

Multi-LoRA Adapter Composition and Switching

9 artifacts provide this capability.


Top Matches

via “lora adapter loading and switching with dynamic model patching”

Fast LLM/VLM serving — RadixAttention, prefix caching, structured output, automatic parallelism.

Unique: Switches LoRA adapters dynamically within a batch by maintaining an adapter registry and patching model layers per request during the forward pass. Adapters are merged into the base weights for inference efficiency instead of being kept as separate model copies.

vs others: Switches adapters per request without reloading the model, whereas naive approaches reload the full model for every adapter change. Uses less memory than storing a separate full model for each adapter.
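To make the registry-and-patching idea concrete, here is a minimal sketch in plain Python, assuming the standard LoRA formulation y = Wx + scale * B(Ax). It is not the listed artifact's implementation: `LoraAdapter`, `LoraLinear`, `forward_batch`, and `merged_weight` are hypothetical names chosen for illustration, and a real serving system would presumably use batched GPU kernels rather than per-request Python loops.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


@dataclass
class LoraAdapter:  # hypothetical container, not a real library type
    a: Matrix       # down-projection, shape (rank, d_in)
    b: Matrix       # up-projection, shape (d_out, rank)
    scale: float    # commonly alpha / rank


class LoraLinear:
    """One linear layer: a shared base weight plus a registry of adapters."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # shape (d_out, d_in); never modified in place
        self.registry: Dict[str, LoraAdapter] = {}

    def register(self, name: str, adapter: LoraAdapter) -> None:
        self.registry[name] = adapter

    def forward(self, x: Vector, adapter_name: Optional[str] = None) -> Vector:
        """Compute W x, plus scale * B (A x) when an adapter is selected."""
        y = matvec(self.weight, x)
        if adapter_name is None:
            return y
        ad = self.registry[adapter_name]
        delta = matvec(ad.b, matvec(ad.a, x))
        return [yi + ad.scale * di for yi, di in zip(y, delta)]

    def forward_batch(
        self, batch: List[Tuple[Vector, Optional[str]]]
    ) -> List[Vector]:
        """Each request carries its own adapter name (or None for the base model)."""
        return [self.forward(x, name) for x, name in batch]

    def merged_weight(self, adapter_name: str) -> Matrix:
        """Return W + scale * B A as a new matrix (single-adapter serving)."""
        ad = self.registry[adapter_name]
        ba = matmul(ad.b, ad.a)
        return [
            [w + ad.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, ba)
        ]


# Toy example: a 2x2 identity base weight and two rank-1 adapters.
layer = LoraLinear([[1.0, 0.0], [0.0, 1.0]])
layer.register("style_a", LoraAdapter(a=[[1.0, 1.0]], b=[[1.0], [0.0]], scale=0.5))
layer.register("style_b", LoraAdapter(a=[[1.0, -1.0]], b=[[0.0], [2.0]], scale=1.0))

x = [2.0, 3.0]
outputs = layer.forward_batch([(x, None), (x, "style_a"), (x, "style_b")])
merged_out = matvec(layer.merged_weight("style_a"), x)
```

In this sketch the base weight is shared and each adapter adds only its two low-rank matrices, which is where the memory saving over full per-adapter model copies comes from. Merging gives the same output as the unmerged path for a single adapter, but a merged weight serves only that one adapter, so mixed-adapter batches need the unmerged per-request path.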
