peft
Repository · Free
Parameter-Efficient Fine-Tuning (PEFT)
Capabilities (12 decomposed)
low-rank adapter injection with dynamic module wrapping
Medium confidence: Injects trainable low-rank decomposition matrices (LoRA) into transformer model layers by wrapping linear modules with a parallel adapter path that adds a scaled low-rank update (B @ A) to the layer's output. Uses a registry-based dispatch mechanism (src/peft/mapping.py) to identify target layers by name pattern, then replaces them with LoRA Linear wrappers that keep the base weights frozen while training only the rank-r adapter matrices, adding roughly 0.1-2% parameter overhead per adapter.
Uses a unified PeftModel wrapper (src/peft/peft_model.py) that abstracts away the complexity of layer identification and replacement, supporting 25+ PEFT methods through a single configuration interface. The registry-based dispatch (src/peft/mapping.py) automatically maps method names to tuner implementations, enabling seamless switching between LoRA, AdaLoRA, QLoRA, and other methods without code changes.
More flexible than Hugging Face's native LoRA implementation because it supports dynamic adapter composition, multi-adapter stacking, and method-agnostic serialization, while maintaining full compatibility with quantized models (8-bit, 4-bit) through the same API.
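A minimal sketch of the public entry point, assuming a causal language model; the checkpoint name and target-module names are illustrative and depend on the base architecture:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Target the attention projections by name pattern; matching linear modules are
# wrapped so the frozen base weight and the rank-r adapter path run in parallel.
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                 # adapter rank
    lora_alpha=16,       # scaling numerator (the update is scaled by alpha / r)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts
```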
dynamic rank allocation with gradient-based importance scoring
Medium confidence: AdaLoRA extends LoRA by maintaining per-layer importance scores that guide automatic rank allocation during training. The implementation estimates parameter importance from the element-wise (Hadamard) product of adapter parameters and their gradients, then raises the rank budget for high-importance layers and prunes it for low-importance ones, reporting 40-50% parameter reduction vs fixed-rank LoRA while maintaining task performance.
Uses this gradient-based sensitivity estimate to guide rank allocation, integrated into the standard PEFT training loop via the BaseTuner abstraction. Unlike static LoRA, AdaLoRA modifies adapter structure during training through a periodic update-and-allocate step, enabling adaptive parameter allocation without requiring a separate rank-search phase.
More principled than manual rank selection and faster than grid-search alternatives because it uses gradient information directly from the training process, while remaining compatible with all PEFT infrastructure (quantization, distributed training, multi-adapter composition).
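A configuration-level sketch of the rank schedule, using the hyperparameter names of peft's AdaLoraConfig; the concrete values are illustrative, and `base` is assumed to be a pretrained transformers model as in the LoRA sketch above:

```python
from peft import AdaLoraConfig, TaskType, get_peft_model

# The rank budget starts at init_r per targeted layer and is pruned toward
# target_r between steps tinit and tfinal, reallocating every deltaT steps
# based on the sensitivity scores accumulated during training.
config = AdaLoraConfig(
    task_type=TaskType.CAUSAL_LM,
    init_r=12,
    target_r=4,
    tinit=200,
    tfinal=1000,
    deltaT=10,
    total_step=2000,     # total optimizer steps, used to build the budget schedule
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base, config)
```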
adapter merging and unmerging with weight fusion
Medium confidence: Provides merge_adapter() and unmerge_adapter() methods that fuse adapter weights into the base model weights or extract them back out. For LoRA, merging computes W' = W + (alpha/r) * B @ A to produce a single weight matrix, reducing inference latency by eliminating the adapter computation path. Unmerging recovers the original base weights and adapter weights from the merged state, enabling reversible adapter composition. Implemented through method-specific merge logic in each tuner class.
Implements reversible adapter merging through method-specific merge logic that fuses adapter weights into base weights mathematically (for LoRA, W' = W + (alpha/r) * B @ A), enabling both merged and unmerged states from the same checkpoint. The unmerge operation recovers the original weights by subtracting the adapter contribution.
More flexible than permanent merging because unmerge() enables recovery of original weights and adapter separation, while merged models achieve inference latency parity with non-adapter baselines. Supports both merged and adapter-based deployment strategies from the same training run.
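A sketch of the reversible and one-way merge paths, continuing from the LoRA-wrapped `model` in the earlier example:

```python
# Reversible: fold the adapter update into the base weights for low-latency
# inference, then restore the original separation if training should continue.
model.merge_adapter()     # W' = W + (alpha / r) * B @ A; the adapter path is skipped
# ... run latency-critical inference ...
model.unmerge_adapter()   # subtract the adapter contribution back out

# One-way: return a plain transformers model with the adapters baked in,
# suitable for export; keep the adapter checkpoint if you need to undo this.
merged = model.merge_and_unload()
```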
configuration validation and compatibility checking
Medium confidence: Validates PEFT configurations against the model architecture and detects incompatibilities before training begins. The system checks that target_modules exist in the model, that adapter ranks are compatible with layer dimensions, and that method-specific constraints are satisfied. Implemented through PeftConfig validation methods and pre-training checks in get_peft_model() that raise informative errors for common misconfiguration patterns.
Implements configuration validation in PeftConfig subclasses and get_peft_model() that checks method-specific constraints (e.g., LoRA rank < layer dimension) before model wrapping, catching errors at configuration time rather than training time. Validation is method-aware, enabling checks specific to each PEFT approach.
More helpful than silent failures because it provides early error detection with informative messages, while remaining lightweight enough to not impact training startup. Method-specific validation catches issues that generic checks would miss.
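A small illustration of the failure mode this catches, assuming `base` is a loaded transformers model with no module named `does_not_exist`:

```python
from peft import LoraConfig, get_peft_model

bad_config = LoraConfig(r=8, target_modules=["does_not_exist"])
try:
    get_peft_model(base, bad_config)
except ValueError as err:
    # The error is raised while wrapping the model, before any training step,
    # and names the target modules that could not be found.
    print(err)
```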
quantization-aware adapter training with frozen base weights
Medium confidence: Enables fine-tuning of 4-bit and 8-bit quantized models by freezing the quantized base weights and training only adapter parameters, implemented through integration with the bitsandbytes quantization library. The system detects quantized layers (Linear4bit, Linear8bitLt) and injects adapters in the forward pass without dequantizing base weights, reducing memory footprint by 75-90% compared to full-precision training while maintaining numerical stability through careful gradient flow management.
Integrates seamlessly with bitsandbytes quantization through the PeftModel wrapper, automatically detecting quantized layer types and routing adapter computations appropriately. The implementation preserves gradient flow through quantized weights without dequantization, achieved via careful handling of backward passes in the adapter injection layer.
More flexible than standalone QLoRA implementations because PEFT's unified adapter interface works across quantization backends, whereas standalone implementations are often tightly coupled to a specific quantization library. Supports both 4-bit and 8-bit quantization with an identical API.
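A sketch of the 4-bit (QLoRA-style) setup via transformers and bitsandbytes; the checkpoint name and target modules are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", quantization_config=bnb)
base = prepare_model_for_kbit_training(base)   # housekeeping for gradient flow on k-bit models

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base, config)   # adapters train in full precision; the 4-bit base stays frozen
```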
multi-adapter composition and routing
Medium confidence: Enables loading and composing multiple adapters on a single base model through add_adapter(), set_adapter(), and delete_adapter() methods that manage an adapter registry. Supports sequential composition (stacking adapters), parallel composition (weighted averaging), and task-specific routing where different adapters activate based on input characteristics. Implemented via the PeftModel wrapper maintaining a dictionary of adapter states and switching between them without reloading the base model.
Implements a stateful adapter registry within PeftModel that tracks active adapters and their configurations, enabling runtime switching without model recompilation. The design separates adapter loading (from disk) from adapter activation (in forward pass), allowing multiple adapters to coexist in memory with minimal overhead.
More flexible than single-adapter approaches because it supports arbitrary composition patterns and dynamic routing, while maintaining the same inference latency as single adapters when only one is active. Enables multi-tenant serving that would otherwise require separate model instances.
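A sketch of runtime adapter management on one base model; the adapter paths and names are hypothetical:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Attach two adapters to the same frozen base model.
model = PeftModel.from_pretrained(base, "adapters/sql-lora", adapter_name="sql")
model.load_adapter("adapters/chat-lora", adapter_name="chat")

model.set_adapter("chat")   # route the forward pass through "chat"
model.set_adapter("sql")    # switch back without reloading the base weights

# Blend both LoRA adapters into a new named adapter via weighted combination
# (the "linear" combination assumes both adapters share the same rank).
model.add_weighted_adapter(["sql", "chat"], weights=[0.7, 0.3],
                           adapter_name="blend", combination_type="linear")
model.set_adapter("blend")
model.delete_adapter("chat")
```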
prompt learning and soft prompt optimization
Medium confidence: Implements prefix tuning and prompt tuning methods that prepend learnable soft-prompt vectors to the model's inputs, optimizing only the prompt parameters while freezing all model weights. The implementation maintains a learnable embedding matrix that is injected ahead of the transformer stack, enabling task adaptation through prompt optimization rather than weight updates. Supports both prefix tuning (learned key/value prefixes injected at every attention layer) and prompt tuning (virtual tokens prepended to the input embeddings only).
Implements prompt learning as a first-class PEFT method through the same PeftModel abstraction as LoRA, enabling direct comparison and composition with other methods. The implementation uses virtual tokens (learnable embeddings) that are prepended to inputs, integrated into the forward pass through a minimal wrapper that doesn't require model architecture changes.
More parameter-efficient than LoRA for extreme constraints (<0.01% overhead) and enables frozen-model fine-tuning, but typically requires longer training. Unique advantage is interpretability potential through prompt analysis, though learned prompts remain largely opaque.
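A prompt-tuning sketch; the initialization text, token count, and tokenizer path are illustrative, and `base` is assumed to be a causal LM as in the earlier examples:

```python
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,                     # length of the learnable soft prompt
    prompt_tuning_init=PromptTuningInit.TEXT,  # initialize from a natural-language prompt
    prompt_tuning_init_text="Classify the sentiment of this review:",
    tokenizer_name_or_path="facebook/opt-350m",
)
model = get_peft_model(base, config)   # only the virtual-token embeddings are trainable
```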
adapter serialization and checkpoint management
Medium confidence: Provides save_pretrained() and from_pretrained() methods that serialize only adapter weights and configurations to disk, enabling efficient checkpoint storage and loading. The system saves adapter parameters as .safetensors or .bin files alongside adapter_config.json containing method-specific hyperparameters, supporting both local filesystem and HuggingFace Hub uploads. Implemented through a unified serialization interface (src/peft/utils/save_and_load.py) that abstracts method-specific serialization logic.
Implements a unified serialization interface that works across all 25+ PEFT methods without method-specific code, achieved through the configuration system where each method's PeftConfig subclass handles its own serialization. The design separates adapter weights from base model weights, enabling ~100x smaller checkpoints than full fine-tuning.
More efficient than full-model checkpointing (e.g., ~50MB of adapter weights vs ~14GB for a 7B base model) and more portable than method-specific serialization because the same checkpoint format works across all PEFT methods and base architectures; note that adapter weights must still match the layer dimensions of the base model they were trained against, so an adapter trained on a 7B model cannot simply be loaded onto a 70B model. Hub integration enables community sharing of adapters.
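A round-trip sketch, continuing from the `model` of the earlier examples; the adapter directory name is illustrative:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Writes only the adapter weights plus adapter_config.json (a few MB),
# not the multi-GB base model.
model.save_pretrained("opt-350m-lora-sentiment")

# Later: pair the small checkpoint with the original base model to restore it.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
restored = PeftModel.from_pretrained(base, "opt-350m-lora-sentiment")
```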
distributed training with adapter synchronization
Medium confidence: Enables distributed training across multiple GPUs/TPUs by synchronizing adapter gradients using standard PyTorch DistributedDataParallel (DDP) or DeepSpeed integration. The implementation treats adapters as regular parameters in the distributed training graph, with gradient accumulation and all-reduce operations handled by the distributed backend. Supports both data parallelism (same adapters across devices) and model parallelism (adapters sharded across devices) through integration with transformers' distributed training utilities.
Integrates with PyTorch's DistributedDataParallel and DeepSpeed through the standard transformers Trainer API, requiring no PEFT-specific distributed code. Adapters are treated as regular parameters in the distributed graph, enabling seamless use of existing distributed training infrastructure.
More straightforward than custom distributed implementations because it leverages standard PyTorch/DeepSpeed primitives, while maintaining full compatibility with all PEFT methods. Enables scaling from single-GPU to multi-node training without API changes.
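A sketch of the training-side integration, assuming a PEFT-wrapped `model` and a tokenized `train_ds` dataset already exist; the distributed behavior comes entirely from how the script is launched (e.g. torchrun for DDP) or from a DeepSpeed config passed to TrainingArguments:

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    bf16=True,
    # deepspeed="ds_config.json",   # optional: ZeRO sharding via DeepSpeed
)

# Only the adapter parameters have requires_grad=True, so DDP/DeepSpeed
# all-reduce and checkpoint just the adapter gradients and states.
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
```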
vision model and diffusion model adapter support
Medium confidence: Extends PEFT methods (LoRA, prefix tuning, etc.) to vision transformers (ViT, DeiT) and diffusion models (Stable Diffusion, DDPM) by identifying and wrapping attention/linear layers in these architectures. The implementation uses the same adapter injection mechanism as language models but adapts layer identification patterns for vision-specific architectures. Supports fine-tuning image generation, classification, and segmentation tasks with minimal parameter overhead.
Applies the same PeftModel wrapper and adapter injection logic to vision architectures by adapting layer identification patterns, enabling code reuse across modalities. The implementation handles vision-specific challenges like attention head dimensions and timestep embeddings through method-specific configuration options.
More unified than vision-specific fine-tuning libraries because it uses the same PEFT API across language and vision models, enabling practitioners to apply learned patterns across domains. Supports diffusion model fine-tuning which most general-purpose libraries don't address.
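The same entry point applied to a vision transformer; the checkpoint and target-module names follow the layer naming of Hugging Face's ViT implementation and are illustrative:

```python
from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

vit = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224", num_labels=10, ignore_mismatched_sizes=True
)

config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],  # ViT self-attention projections
    modules_to_save=["classifier"],     # also train the new task head in full precision
)
model = get_peft_model(vit, config)
model.print_trainable_parameters()
```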
custom peft method registration and extension
Medium confidence: Provides a plugin architecture for implementing new PEFT methods by extending BaseTuner and registering them in the method registry (src/peft/mapping.py). New methods define a configuration class (inheriting from PeftConfig), a tuner class (inheriting from BaseTuner), and register themselves via PEFT_TYPE_TO_CONFIG_MAPPING. The system automatically handles adapter lifecycle (initialization, forward pass injection, serialization) through the base class, enabling new methods to integrate with all PEFT infrastructure without reimplementation.
Implements a registry-based plugin system where new methods register themselves in PEFT_TYPE_TO_CONFIG_MAPPING and PEFT_TYPE_TO_TUNER_MAPPING, enabling automatic dispatch through get_peft_model(). The BaseTuner abstraction handles common functionality (parameter tracking, serialization, lifecycle management), reducing implementation burden for new methods.
More extensible than monolithic fine-tuning libraries because the plugin architecture enables new methods to integrate without modifying core code. Automatic inheritance of PEFT infrastructure (quantization support, distributed training, multi-adapter composition) means new methods work with all existing tooling.
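A schematic of the registry-dispatch pattern in plain Python, not peft's actual classes or module paths (which vary across versions): each method contributes a config class and a tuner class, and a lookup table routes a generic entry point to the right implementation. All names below are hypothetical.

```python
from dataclasses import dataclass

CONFIG_REGISTRY: dict = {}
TUNER_REGISTRY: dict = {}

def register(method_name, config_cls, tuner_cls):
    """Counterpart of adding entries to PEFT_TYPE_TO_CONFIG_MAPPING / _TUNER_MAPPING."""
    CONFIG_REGISTRY[method_name] = config_cls
    TUNER_REGISTRY[method_name] = tuner_cls

@dataclass
class ToyAdapterConfig:              # hypothetical method, for illustration only
    rank: int = 4
    method: str = "toy_adapter"

class ToyAdapterTuner:
    def __init__(self, model, config):
        # A real tuner would locate the target modules here and wrap them with
        # the adapter layer; lifecycle and serialization come from the base class.
        self.model, self.config = model, config

def wrap_model(model, config):
    """Counterpart of get_peft_model(): dispatch on the config's method name."""
    return TUNER_REGISTRY[config.method](model, config)

register("toy_adapter", ToyAdapterConfig, ToyAdapterTuner)
wrapped = wrap_model(object(), ToyAdapterConfig(rank=8))
```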
layer-wise learning rate scheduling and gradient management
Medium confidence: Enables fine-grained control over adapter training through layer-wise learning rate schedules and gradient clipping strategies. The implementation integrates with PyTorch optimizers to apply different learning rates to different adapter layers, and supports gradient accumulation patterns specific to adapter training. Implemented through integration with transformers' Trainer API and custom callback hooks that modify optimizer parameter groups per training step.
Integrates layer-wise learning rate control through the transformers Trainer API using callback hooks that modify optimizer parameter groups, enabling discriminative learning rates without custom training loops. The implementation works with any PEFT method by operating on the adapter parameter groups.
More flexible than fixed learning rate approaches because it enables layer-wise tuning, while remaining compatible with standard PyTorch optimizers. Integrates with transformers Trainer, avoiding custom training loop implementation.
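A sketch of discriminative learning rates via standard PyTorch parameter groups, handed to the transformers Trainer through its `optimizers` argument; the layer-index heuristic, depth threshold, and `train_ds` dataset are assumptions about the surrounding training script:

```python
import torch
from transformers import Trainer, TrainingArguments

low, high = [], []
for name, param in model.named_parameters():
    if not param.requires_grad:          # skip the frozen base weights
        continue
    # Crude depth heuristic: parse the ".layers.<idx>." segment of the parameter name.
    depth = int(name.split(".layers.")[1].split(".")[0]) if ".layers." in name else 0
    (low if depth < 16 else high).append(param)

optimizer = torch.optim.AdamW([
    {"params": low,  "lr": 5e-5},        # earlier layers: smaller updates
    {"params": high, "lr": 2e-4},        # later layers: larger updates
])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_ds,
    optimizers=(optimizer, None),        # Trainer builds a default LR scheduler
)
```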
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with peft, ranked by overlap. Discovered automatically through the match graph.
PEFT
Parameter-efficient fine-tuning — LoRA, QLoRA, adapter methods for LLMs on consumer GPUs.
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks (VL-Adapter)
exllamav2
Python AI package: exllamav2
llama.cpp
C/C++ LLM inference — GGUF quantization, GPU offloading, foundation for local AI tools.
QLoRA: Efficient Finetuning of Quantized LLMs (QLoRA)
LitGPT
Lightning AI's LLM library — pretrain, fine-tune, deploy with clean PyTorch Lightning code.
Best For
- ✓ ML engineers fine-tuning large pretrained models on consumer/enterprise GPUs
- ✓ Teams deploying multiple task-specific models from a single base checkpoint
- ✓ Researchers experimenting with adapter composition and multi-task learning
- ✓ Practitioners without the domain knowledge to set LoRA ranks manually
- ✓ Resource-constrained deployments requiring minimal adapter footprint
- ✓ Research into layer-wise importance in transformer models
- ✓ Production inference where latency is critical and adapter overhead is unacceptable
- ✓ Scenarios where model deployment requires single monolithic weights
Known Limitations
- ⚠ LoRA rank selection requires manual tuning; no automated rank discovery (use AdaLoRA for dynamic ranks)
- ⚠ Adapter inference adds ~5-10% latency overhead due to the additional matrix multiplications (eliminated by merging)
- ⚠ Adapter injection targets standard layer types; custom or non-linear modules require a custom implementation
- ⚠ Merging with merge_and_unload() discards the adapter modules; recovery requires the saved adapter or original checkpoint (merge_adapter()/unmerge_adapter() remains reversible)
- ⚠ AdaLoRA: computing importance scores requires additional work per step, adding ~15-20% training time overhead
- ⚠ AdaLoRA: rank reallocation happens at fixed intervals (configurable), not continuously, potentially missing the optimal allocation