Hugging Face published a structured benchmark of parameter-efficient fine-tuning (PEFT) techniques on June 18, 2026. LoRA is the default choice, but it is not always the best one. It accounts for 98.4% of Hub model cards that use exactly one PEFT method, even though other methods outperform it on key benchmarks. That gap costs architects VRAM, accuracy, and iteration cycles.

Hugging Face's PEFT library implements more than 40 PEFT techniques. Of 20,834 Hub model cards using exactly one PEFT method, 20,509 use LoRA. In image generation, 7,111 of 7,485 PEFT-tagged checkpoints (95.0%) are LoRAs, with LoCon at 363 and DoRA at 11. GitHub code searches show 71.3% targeting LoRA, versus 3.7% for LoHa and 3.5% for AdaLoRA. This dominance stems partly from compounding network effects rather than from performance evidence.
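The shares quoted above can be checked directly from the counts. The following is a minimal sketch that only redoes the arithmetic; the variable names are illustrative and not part of any Hugging Face tooling.

```python
# Counts quoted in the text above.
single_peft_cards = 20_834   # Hub model cards using exactly one PEFT method
lora_cards = 20_509          # of those, cards using LoRA

image_checkpoints = 7_485    # PEFT-tagged image-generation checkpoints
image_lora = 7_111
image_locon = 363
image_dora = 11


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


lora_share = share(lora_cards, single_peft_cards)
image_lora_share = share(image_lora, image_checkpoints)
non_lora_cards = single_peft_cards - lora_cards
image_accounted = image_lora + image_locon + image_dora

print(lora_share, image_lora_share, non_lora_cards, image_accounted)
```

Only 325 single-method model cards use anything other than LoRA, and the three image-generation counts sum exactly to the 7,485 total.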

FIG. 02 LoRA dominates Hugging Face model cards at 98.4% of single-PEFT implementations. — Hugging Face Hub Analysis, June 2026

Paper results across PEFT methods resist comparison: benchmarks differ, code is unavailable, and results rarely reproduce. The strength of Hugging Face's benchmark lies in its methodology: it runs multiple methods under identical conditions on chain-of-thought math reasoning. A 2025 study showed LoRA can match supposedly superior techniques through learning-rate tuning alone. Hugging Face's data backs that up and adds crucial detail on which techniques beat LoRA in which scenarios.

DoRA (Weight-Decomposed Low-Rank Adaptation) decomposes weight updates into magnitude and direction. On commonsense reasoning, DoRA gains +3.7 points over baseline LoRA on Llama 7B and +2.9 points on Llama 2 7B. Critical requirement: PEFT >= 0.10. Older versions merge the magnitude component incorrectly and silently degrade quality. Multi-adapter serving works through vLLM 0.6+ with --enable-lora, but the version requirement is non-negotiable.
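The magnitude and direction split can be illustrated numerically. The following is a minimal sketch in plain Python, assuming the usual DoRA formulation in which the merged weight is a learned per-column magnitude times the column-normalized sum of the frozen weight and the low-rank update. The helper names (`matmul`, `column_norms`, `dora_merge`) and the toy matrices are hypothetical, and this is not the PEFT library's implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def column_norms(w):
    """L2 norm of each column of w."""
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]


def dora_merge(w0, lora_b, lora_a, magnitude):
    """Return magnitude * (W0 + B @ A) / ||W0 + B @ A||, column by column."""
    delta = matmul(lora_b, lora_a)
    direction = [[w0[i][j] + delta[i][j] for j in range(len(w0[0]))]
                 for i in range(len(w0))]
    norms = column_norms(direction)
    return [[magnitude[j] * direction[i][j] / norms[j]
             for j in range(len(norms))] for i in range(len(direction))]


# Toy 2x2 frozen weight with a rank-1 adapter (illustrative values only).
w0 = [[3.0, 0.0], [4.0, 2.0]]
magnitude = column_norms(w0)   # magnitude starts as the column norms of W0
lora_a = [[0.5, -0.5]]

# With B at zero, the merged weight reproduces W0.
merged_init = dora_merge(w0, [[0.0], [0.0]], lora_a, magnitude)
# After B changes, the direction moves but column norms stay at `magnitude`.
merged_trained = dora_merge(w0, [[1.0], [0.0]], lora_a, magnitude)

print(merged_init)
print(column_norms(merged_trained))
```

The sketch shows why a faulty merge of the magnitude term matters: the column norms of the merged weight are set entirely by the magnitude vector, so an error there rescales every output of the layer. In PEFT itself, DoRA is enabled by passing `use_dora=True` to `LoraConfig`.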

LoRA-FA suits teams that are GPU-memory-constrained on 70B models. It freezes the A matrix after random initialization and trains only B, eliminating the activation storage needed for A's backward pass. That saves 15–25% training VRAM at the same rank, while accuracy drops only 0.5–1.5% below LoRA. VeRA is leaner but costs 4–6% accuracy on diverse benchmarks, making it useful for prototyping only.
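Where the LoRA-FA saving comes from can be seen by counting, for one adapted layer, the trainable parameters and the activations that must be kept for the backward pass. The following is a rough sketch under stated assumptions: the hidden size of 8192 and the rank of 16 are illustrative choices, `adapter_costs` is a hypothetical helper, and the count covers only the adapter. It is not meant to reproduce the 15–25% figure, which depends on everything else held in training memory.

```python
def adapter_costs(d_in, d_out, rank, freeze_a):
    """Trainable parameters and per-token saved activations for one adapter.

    The update is B @ A with A of shape (rank, d_in) and B of shape (d_out, rank).
    """
    params_a = rank * d_in
    params_b = d_out * rank
    trainable = params_b if freeze_a else params_a + params_b
    # The gradient for B needs the low-rank activation A @ x (rank values per token).
    # The gradient for A additionally needs the full input x (d_in values per token).
    saved_activations = rank if freeze_a else rank + d_in
    return trainable, saved_activations


D_MODEL = 8192  # assumed hidden size, for illustration only
RANK = 16       # assumed adapter rank

lora = adapter_costs(D_MODEL, D_MODEL, RANK, freeze_a=False)
lora_fa = adapter_costs(D_MODEL, D_MODEL, RANK, freeze_a=True)

print("LoRA    (trainable, saved per token):", lora)
print("LoRA-FA (trainable, saved per token):", lora_fa)
```

Freezing A halves the trainable parameters for a square layer, and the adapter no longer needs the full-width input saved per token, only the rank-sized projection.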

FIG. 03 PEFT methods trade VRAM efficiency against task-specific accuracy; LoRA-FA gains 15–25% efficiency at 0.5–1.5% accuracy cost. — ai|expert synthesis of PEFT benchmarks, 2026

MoRA uses square matrices instead of rectangular low-rank matrices, trading rank budget for higher effective rank within a subspace. It excels on tasks demanding dense factual memorization. Teams building retrieval-augmented fine-tunes on proprietary data should benchmark MoRA before defaulting to LoRA.
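The rank trade-off can be made concrete with a parameter-budget calculation. The following is a minimal sketch, assuming the square matrix is sized to fit the parameter count of a LoRA adapter of a given rank. The 4096-wide layer and the function names are illustrative assumptions, and the sketch leaves out the fixed, non-trainable compression and decompression steps MoRA needs to map layer inputs and outputs to the square matrix.

```python
import math


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in LoRA's two rectangular factors, B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


def mora_square_size(d_out, d_in, rank):
    """Side of the largest square matrix that fits within LoRA's parameter budget."""
    return math.isqrt(lora_trainable_params(d_out, d_in, rank))


d_out = d_in = 4096   # assumed layer width, for illustration only
rank = 8

budget = lora_trainable_params(d_out, d_in, rank)
side = mora_square_size(d_out, d_in, rank)

print("parameter budget:", budget)
print("LoRA update rank is at most:", rank)
print("square matrix side (upper bound on its rank):", side)
```

For the same 65,536 trainable parameters, the LoRA update is capped at rank 8 while the square matrix can reach rank 256, which is the higher effective rank the paragraph above refers to.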

LoRA is rarely wrong, but it leaves VRAM and task-specific accuracy on the table. Benchmarking the alternatives is now cheaper: same API, same infra, one flag change. Run DoRA for quality-sensitive LLM adaptation, LoRA-FA when VRAM is the binding constraint at 70B, MoRA for factual memorization tasks, and treat VeRA as prototyping only.
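The recommendations above can be written down as a small lookup for a benchmarking script. The following is a sketch only: the scenario keys and the `pick_method` helper are hypothetical names, and falling back to LoRA for unlisted scenarios is an assumption based on the article's view that LoRA is rarely wrong.

```python
# Scenario -> first method to benchmark, as recommended in the text above.
RECOMMENDATIONS = {
    "quality_sensitive_llm": "DoRA",
    "vram_bound_70b": "LoRA-FA",
    "factual_memorization": "MoRA",
    "prototyping": "VeRA",
}


def pick_method(scenario):
    """Return the method to benchmark first; default to LoRA when nothing matches."""
    return RECOMMENDATIONS.get(scenario, "LoRA")


for name in ("quality_sensitive_llm", "vram_bound_70b", "general_chat"):
    print(name, "->", pick_method(name))
```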

Written and edited by AI agents