Chapter 5 (Coming Soon)

Fine-tuning & Model Adaptation

How to customize LLMs for your needs without training from scratch

The Challenge

You have GPT-4 or Llama 3, but you need it to:

  • Follow your company's specific writing style
  • Understand your domain-specific terminology
  • Behave more helpfully and safely
  • Answer questions in your brand's voice

The solution isn't training a model from scratch ($10M+ and months of work).
Instead, you adapt an existing model through fine-tuning techniques.

What You'll Learn

Fine-tuning is the most practical way to customize LLMs in 2025. This chapter covers all the modern techniques for adapting models efficiently:

01. Why Fine-tuning? Training vs Adaptation

Understanding when and why to fine-tune instead of prompting or training from scratch

  • Prompting: Fast but limited (can't change behavior fundamentally)
  • Pre-training: Powerful but expensive ($10M+, requires massive data)
  • Fine-tuning: The sweet spot (costs $100-$10k, takes days not months)
  • When to use each approach

Prompting: ~$0.01/query, no training needed
Fine-tuning: $100-$10k, permanent adaptation
Pre-training: $10M+, full model creation
02. LoRA: Low-Rank Adaptation

The 2021 breakthrough that made fine-tuning accessible to everyone

  • The Problem: Fine-tuning updates billions of parameters
  • LoRA's Insight: Only update a small "adapter" layer
  • How it works: Low-rank matrix decomposition
  • 99% fewer trainable parameters, same quality
  • QLoRA: 4-bit quantization + LoRA = fine-tune 70B on 1 GPU

Full fine-tuning: update all 70B parameters (requires 8x A100 GPUs, $5k+)
LoRA: update ~0.1% of parameters, about 70M (requires 1 GPU, ~$50)

Comparable quality at roughly 1/100th the cost; this is why LoRA revolutionized fine-tuning. A minimal sketch of the mechanism follows.
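
To make the low-rank idea concrete, here is a small, self-contained PyTorch sketch of a LoRA-wrapped linear layer. It is illustrative only: the class name, shapes, and initialization are assumptions, not any particular library's API. The pretrained weights stay frozen and only the two small matrices A and B are trained.

# Minimal LoRA sketch: freeze the original linear layer and learn a low-rank
# update B @ A on top of it. Names, shapes, and init values are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pretrained weights
            p.requires_grad_(False)
        out_features, in_features = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # rank x in
        self.B = nn.Parameter(torch.zeros(out_features, rank))        # out x rank, zero init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen output + scaled low-rank correction x A^T B^T
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,}")   # ~65K of ~16.8M for this one layer

Zero-initializing B makes the adapter a no-op at the start, so training begins exactly from the pretrained model's behavior.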

03. Instruction Tuning: Teaching Models to Follow Instructions

How base models become helpful assistants

  • Base model: Completes text but doesn't follow instructions
  • Instruction-tuned model: Understands "Write a poem" or "Summarize this"
  • Dataset format: (instruction, input, output) triples
  • Alpaca, Dolly, FLAN: Instruction tuning datasets
  • Why ChatGPT follows instructions but GPT-3 didn't

Base model (Llama 2 base):
  Input: "Explain photosynthesis"
  Output: "in plants. The process involves..." (continues the sentence rather than answering)

Instruction-tuned (Llama 2 Chat):
  Input: "Explain photosynthesis"
  Output: "Photosynthesis is the process by which plants..." (actually answers the question!)
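
To make the dataset format concrete, here is a small sketch of Alpaca-style (instruction, input, output) records written out as JSONL, the format most fine-tuning tools expect. The records themselves are invented examples.

# Alpaca-style instruction-tuning data: (instruction, input, output) triples,
# stored one JSON object per line (JSONL). The records below are made up.
import json

records = [
    {
        "instruction": "Explain photosynthesis",
        "input": "",
        "output": "Photosynthesis is the process by which plants convert light, water, and CO2 into sugars and oxygen.",
    },
    {
        "instruction": "Summarize the following text",
        "input": "Large language models are neural networks trained on vast amounts of text...",
        "output": "The passage explains that large language models are neural networks trained on text.",
    },
]

with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")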
04. RLHF: Reinforcement Learning from Human Feedback

The technique that made ChatGPT helpful, harmless, and honest

  • Step 1: Collect human preferences (A vs B, which is better?)
  • Step 2: Train a reward model to predict human preferences
  • Step 3: Use RL (PPO algorithm) to optimize for high rewards
  • Why RLHF creates conversational, helpful models
  • The cost: Complex, unstable training, requires ML expertise

  1. Human feedback: annotators rank outputs (Response A > Response B)
  2. Reward model: train a model to predict which response humans prefer
  3. RL optimization: update the LLM to maximize the predicted reward
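
The heart of step 2 is a pairwise (Bradley-Terry style) loss that pushes the reward model to score the preferred response higher than the rejected one. A minimal PyTorch sketch, assuming the reward model has already produced a scalar score for each response:

# Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected).
# Assumes a reward model maps each (prompt, response) pair to a scalar score.
import torch
import torch.nn.functional as F

def reward_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # Minimized when chosen responses consistently outscore rejected ones.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Toy scores for a batch of three preference pairs.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.9, 1.5])
print(reward_loss(chosen, rejected))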

05. DPO: Direct Preference Optimization

The 2023 method that achieves RLHF-style alignment without a reward model or RL loop

  • The Problem: RLHF is complex, unstable, requires reward model
  • DPO's Insight: Skip the reward model and RL entirely
  • Directly optimize the LLM using preference pairs
  • Simpler, more stable, same quality as RLHF
  • Why DPO is becoming the standard in 2025
RLHF (2020-2023):
  1. Collect preferences
  2. Train a reward model
  3. Run PPO (complex!)
  Three stages, unstable training.

DPO (2023-2025):
  1. Collect preferences
  2. Directly optimize the LLM (that's it!)
  One stage, stable, simpler.
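
The comparison above can be written down directly: DPO's loss has the same pairwise shape as the reward-model loss in the previous section, but it is applied to the LLM itself, using log-probabilities from the policy and a frozen reference model in place of a learned reward. A sketch, assuming each log-prob tensor is already summed over the response's tokens:

# DPO loss sketch: optimize the policy directly on preference pairs, with the
# frozen reference model's log-probs standing in for a separate reward model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1) -> torch.Tensor:
    # Implicit "rewards" are log-prob ratios against the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Same -log sigmoid(difference) form as the RLHF reward loss above.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy per-sequence log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -8.0]),
                torch.tensor([-13.0, -9.0]), torch.tensor([-13.5, -8.5]))
print(loss)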
06. Practical Fine-tuning: Tools & Workflows

How to actually fine-tune models in 2025

  • Platforms: OpenAI API, HuggingFace, Modal, Replicate
  • Frameworks: Axolotl, LLaMA-Factory, Unsloth
  • Data prep: Format, clean, split train/validation
  • Hyperparameters: Learning rate, batch size, epochs
  • Evaluation: How to know if fine-tuning worked
  • Cost analysis: What to expect to pay
Typical LoRA Fine-tuning Workflow:
  1. Prepare 1,000-10,000 examples in JSONL format
  2. Choose base model (Llama 3, Mistral, etc.)
  3. Configure LoRA parameters (rank=8, alpha=16)
  4. Train for 3 epochs (~1-6 hours)
  5. Evaluate on held-out test set
  6. Merge adapter and deploy
Cost: $50-$500 | Time: 4-8 hours | Result: Custom model
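
Steps 2-3 and 6 of this workflow map onto the Hugging Face peft library roughly as follows. This is a sketch, not a complete training script: the model name is a placeholder, API details vary between library versions, and the training loop itself (transformers' Trainer or trl's SFTTrainer) is only hinted at.

# Sketch of attaching a LoRA adapter with Hugging Face peft (treat names and
# defaults as assumptions; library APIs change between versions).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Meta-Llama-3-8B"          # placeholder base model
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)       # wraps the model; only adapters train
model.print_trainable_parameters()               # typically well under 1% of all weights

# ...train on your JSONL data, evaluate on a held-out split, then merge the
# adapter back into the base weights for deployment:
merged = model.merge_and_unload()
merged.save_pretrained("my-custom-model")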
07. PEFT: Parameter-Efficient Fine-Tuning

The broader family of efficient adaptation techniques

  • LoRA: Low-rank adapters (most popular)
  • Prompt tuning: Only tune soft prompts, not model
  • Prefix tuning: Prepend learnable tokens
  • Adapter layers: Insert small trainable modules
  • Trade-offs: efficiency vs quality vs flexibility
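
Several of the methods in this list are available in the Hugging Face peft library as drop-in config classes. A brief sketch of how the choice maps to code (parameter values are illustrative and APIs differ between versions):

# PEFT method configs in Hugging Face peft (values illustrative). Each can be
# passed to get_peft_model(base_model, config); only the added parameters train.
from peft import LoraConfig, PrefixTuningConfig, PromptTuningConfig

peft_choices = {
    "lora":          LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"),
    "prefix_tuning": PrefixTuningConfig(num_virtual_tokens=20, task_type="CAUSAL_LM"),
    "prompt_tuning": PromptTuningConfig(num_virtual_tokens=20, task_type="CAUSAL_LM"),
}
for name, config in peft_choices.items():
    print(name, type(config).__name__)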
08. When to Fine-tune vs When to Prompt

Decision framework for choosing your approach

  • Use prompting when: Need flexibility, quick iteration, no budget
  • Use RAG when: Need up-to-date knowledge, specific documents
  • Use fine-tuning when: Need consistent style, domain expertise, cost-per-query matters
  • Hybrid approaches: Fine-tune + RAG for best results

Why This Matters

Fine-tuning Democratized AI

Before LoRA (2021), only organizations with large GPU clusters could customize big models. Now anyone can fine-tune a 70B-parameter model on a single GPU for around $50.

This is why you see thousands of specialized models on HuggingFace: medical LLMs, legal LLMs, coding assistants, language-specific models. Fine-tuning made it possible.

Real-World Applications

🏥

Healthcare

Fine-tune Llama 3 on medical literature to create doctor-assistant chatbots

💼

Customer Support

Fine-tune on your support tickets to match your brand voice and handle common issues

⚖️

Legal

Fine-tune on case law and contracts for legal document analysis

💻

Code Generation

Fine-tune on your codebase for company-specific coding assistants

Quick Reference: Adaptation Techniques

Technique         | Cost           | Complexity | Use When
Prompting         | $0.01/query    | Easy       | Quick iteration, simple tasks
RAG               | $100-$1k setup | Medium     | Need a specific knowledge base
LoRA Fine-tuning  | $50-$500       | Medium     | Domain adaptation, style consistency
DPO/RLHF          | $1k-$10k       | Hard       | Behavior alignment, safety
Pre-training      | $10M+          | Expert     | Creating a foundation model

Coming Soon!

This chapter will include interactive demos, code examples, and step-by-step tutorials for fine-tuning your first model with LoRA. You'll learn exactly how to customize LLMs for your specific needs.
