Multi-Adapter Composition and Routing

1

PEFT | Repository | 55/100

via “multi-adapter composition and switching”

Parameter-efficient fine-tuning (LoRA, QLoRA, and other adapter methods) for LLMs on consumer GPUs.

Unique: Implements a named adapter registry pattern where each adapter is stored independently with its own configuration and weights, allowing dynamic activation without model reloading. The PeftModel wrapper maintains a mapping of adapter names to tuner instances, enabling O(1) adapter switching by updating the active adapter reference.
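The registry pattern described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not PEFT's code: `Adapter` and `AdapterRegistry` are hypothetical names, and the real library exposes the behavior through `PeftModel.load_adapter` and `PeftModel.set_adapter`.

```python
from dataclasses import dataclass, field


@dataclass
class Adapter:
    """One independently stored adapter: its own config and weights."""
    config: dict
    weights: dict = field(default_factory=dict)


class AdapterRegistry:
    """Hypothetical stand-in for the name -> adapter mapping described above."""

    def __init__(self):
        self._adapters = {}   # adapter name -> Adapter
        self.active = None    # name of the currently active adapter

    def add(self, name, adapter):
        if name in self._adapters:
            raise ValueError(f"adapter {name!r} is already registered")
        self._adapters[name] = adapter
        if self.active is None:   # first adapter becomes active by default
            self.active = name

    def set_active(self, name):
        # Switching is a dict lookup plus a reference update: no weights are copied.
        if name not in self._adapters:
            raise KeyError(f"unknown adapter {name!r}")
        self.active = name

    def current(self):
        return self._adapters[self.active]


registry = AdapterRegistry()
registry.add("summarize", Adapter(config={"r": 8}))
registry.add("translate", Adapter(config={"r": 16}))
registry.set_active("translate")
print(registry.active, registry.current().config)
```

Because `set_active` only rebinds a name, the cost of a switch does not depend on how many adapters are registered or how large they are.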

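vs others: More memory-efficient than training a separate model for each task because every task shares the same base model weights. When adapters are small relative to the base model, serving N tasks this way cuts the memory footprint by close to 1 − 1/N compared with keeping N independent models, so the saving passes 90% only once more than ten tasks are served. Tasks can also be switched at runtime without reloading the model.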
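The memory claim can be checked with simple arithmetic. The sketch below assumes every parameter is stored at the same precision and uses illustrative sizes (a 7B-parameter base and 20M-parameter adapters) that do not come from the listing; `memory_reduction` is a hypothetical helper.

```python
def memory_reduction(base_params, adapter_params, n_tasks):
    """Fractional saving of one shared base + N adapters vs N full models."""
    separate = n_tasks * base_params
    shared = base_params + n_tasks * adapter_params
    return 1 - shared / separate


# Illustrative sizes only: a 7B-parameter base and 20M-parameter adapters.
BASE = 7_000_000_000
ADAPTER = 20_000_000

for n in (2, 10, 20):
    print(n, round(memory_reduction(BASE, ADAPTER, n), 3))
```

The saving equals 1 − 1/N minus the adapter-to-base size ratio, so it is driven mainly by the number of tasks: about half with two tasks, and above 90% only when N is larger than ten.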

2

peft | Fine-tune | 23/100

via “multi-adapter composition and routing”

Parameter-Efficient Fine-Tuning (PEFT)

Unique: Implements a stateful adapter registry within PeftModel that tracks active adapters and their configurations, enabling runtime switching without model recompilation. The design separates adapter loading (from disk) from adapter activation (in forward pass), allowing multiple adapters to coexist in memory with minimal overhead.
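The split between loading and activation can be illustrated with a toy LoRA-style layer. This is a sketch under stated assumptions, not PEFT's implementation: `LoraLinear` and `matvec` are hypothetical, the method names merely echo PEFT's `load_adapter` and `set_adapter`, and plain lists stand in for tensors.

```python
def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


class LoraLinear:
    """Toy linear layer: frozen base weight plus optional low-rank updates."""

    def __init__(self, weight):
        self.weight = weight      # frozen base weight, shared by all adapters
        self.adapters = {}        # name -> (A, B, scaling), resident in memory
        self.active = None        # consulted only inside forward()

    def load_adapter(self, name, A, B, scaling=1.0):
        # Loading only stores the matrices; the layer's output is unchanged.
        self.adapters[name] = (A, B, scaling)

    def set_adapter(self, name):
        if name is not None and name not in self.adapters:
            raise KeyError(name)
        self.active = name

    def forward(self, x):
        y = matvec(self.weight, x)
        if self.active is not None:
            A, B, scaling = self.adapters[self.active]
            delta = matvec(B, matvec(A, x))          # B @ (A @ x), rank-r path
            y = [yi + scaling * di for yi, di in zip(y, delta)]
        return y


layer = LoraLinear([[1.0, 0.0], [0.0, 1.0]])
layer.load_adapter("task_a", A=[[1.0, 1.0]], B=[[0.5], [0.0]])
layer.load_adapter("task_b", A=[[0.0, 1.0]], B=[[0.0], [2.0]])

x = [1.0, 2.0]
print(layer.forward(x))        # base only: both adapters loaded, none active
layer.set_adapter("task_a")
print(layer.forward(x))
layer.set_adapter("task_b")
print(layer.forward(x))
```

Both adapters stay in memory the whole time; only the one named by `active` contributes to the forward pass, which is why a single active adapter costs the same as a single-adapter model.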

vs others: More flexible than single-adapter approaches because it supports arbitrary composition patterns and dynamic routing, while maintaining the same inference latency as single adapters when only one is active. Enables multi-tenant serving that would otherwise require separate model instances.
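For the multi-tenant case, one plausible routing scheme is to group incoming requests by the adapter they need so that each adapter is activated once per batch. The sketch below is an assumption about how such routing could look, not something the listing or the library prescribes; `ROUTES`, `route`, `serve`, and the tenant names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical tenant -> adapter-name routing table.
ROUTES = {"acme": "legal", "globex": "support", "initech": "legal"}


def route(requests, routes, default="base"):
    """Group (tenant, prompt) pairs by adapter so each adapter is activated once."""
    groups = defaultdict(list)
    for tenant, prompt in requests:
        groups[routes.get(tenant, default)].append(prompt)
    return dict(groups)


def serve(requests, routes, run_with_adapter):
    """Run every group under its adapter; run_with_adapter is supplied by the caller."""
    results = {}
    for adapter_name, prompts in route(requests, routes).items():
        results[adapter_name] = run_with_adapter(adapter_name, prompts)
    return results


requests = [("acme", "q1"), ("globex", "q2"), ("initech", "q3"), ("unknown", "q4")]
batches = route(requests, ROUTES)
print(batches)
```

In a real deployment `run_with_adapter` would activate the named adapter on the shared model and then generate for the whole group; unknown tenants fall back to the `default` name here.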

3

QLoRA: Efficient Finetuning of Quantized LLMs (QLoRA) | Product | 22/100

via “adapter composition and inference with merged weight strategies”


Unique: Provides systematic adapter composition strategies (sequential, weighted ensemble) with automatic precision handling when merging full-precision adapters into quantized base weights, enabling flexible multi-task model construction. Prior LoRA work focused on single-adapter inference.
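A weighted-ensemble merge of this kind can be sketched as follows. This is a minimal illustration, not QLoRA's code: it assumes a toy absmax integer quantization instead of NF4, keeps the merged result in full precision, and uses hypothetical helpers (`matmul`, `dequantize`, `merge_weighted`). The precision handling amounts to dequantizing the base weight before adding the full-precision adapter deltas.

```python
def matmul(a, b):
    """Multiply two list-of-rows matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def dequantize(q_weight, scale):
    """Map integer codes back to floats (toy absmax scheme, not NF4)."""
    return [[q * scale for q in row] for row in q_weight]


def merge_weighted(q_weight, scale, adapters, mix):
    """Return W + sum_i mix[i] * scaling_i * (B_i @ A_i) in full precision."""
    merged = dequantize(q_weight, scale)
    for (A, B, scaling), w in zip(adapters, mix):
        delta = matmul(B, A)
        for i, row in enumerate(delta):
            for j, d in enumerate(row):
                merged[i][j] += w * scaling * d
    return merged


q_weight = [[4, 0], [0, 4]]          # integer codes
scale = 0.25                          # dequantized base = identity
adapter_a = ([[1.0, 0.0]], [[1.0], [0.0]], 1.0)   # (A, B, scaling), rank 1
adapter_b = ([[0.0, 1.0]], [[0.0], [1.0]], 2.0)

merged = merge_weighted(q_weight, scale, [adapter_a, adapter_b], mix=[0.5, 0.25])
print(merged)
```

After merging, inference needs no adapter lookup at all, at the cost of fixing the mixing weights ahead of time. In the Hugging Face PEFT library, the comparable operations are `add_weighted_adapter` and `merge_and_unload`.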

vs others: Enables multi-task inference without maintaining separate models or adapter routing logic, and supports weighted ensemble composition that would otherwise require custom inference code or model-ensembling infrastructure.
