Chapter 5 (Coming Soon)

Fine-tuning & Model Adaptation

How to customize LLMs for your needs without training from scratch

The Challenge

You have GPT-4 or Llama 3, but you need it to:

  • Follow your company's specific writing style
  • Understand your domain-specific terminology
  • Behave more helpfully and safely
  • Answer questions in your brand's voice

The solution isn't training a model from scratch ($10M+ and months of work).
Instead, you adapt an existing model through fine-tuning techniques.

What You'll Learn

Fine-tuning is the most practical way to customize LLMs in 2025. This chapter covers all the modern techniques for adapting models efficiently:

01. Why Fine-tuning? Training vs Adaptation

Understanding when and why to fine-tune instead of prompting or training from scratch

  • Prompting: Fast but limited (can't change behavior fundamentally)
  • Pre-training: Powerful but expensive ($10M+, requires massive data)
  • Fine-tuning: The sweet spot (costs $100-$10k, takes days not months)
  • When to use each approach

Prompting: ~$0.01/query, no training needed
Fine-tuning: $100-$10k, permanent adaptation
Pre-training: $10M+, full model creation
02. LoRA: Low-Rank Adaptation

The 2021 breakthrough that made fine-tuning accessible to everyone

  • The Problem: Fine-tuning updates billions of parameters
  • LoRA's Insight: Only update a small "adapter" layer
  • How it works: Low-rank matrix decomposition
  • 99% fewer trainable parameters, same quality
  • QLoRA: 4-bit quantization + LoRA = fine-tune 70B on 1 GPU

Full fine-tuning: update all 70B parameters (requires 8x A100 GPUs, $5k+)
LoRA: update ~0.1% of parameters, about 70M (requires 1 GPU, ~$50)

Comparable quality at roughly 1/100th the cost; this is why LoRA revolutionized fine-tuning. A minimal sketch of the mechanism follows.
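
To make the low-rank idea concrete, here is a small, self-contained PyTorch sketch of a LoRA-wrapped linear layer. It is illustrative only: the class name, shapes, and initialization are assumptions, not any particular library's API. The pretrained weights stay frozen and only the two small matrices A and B are trained.

# Minimal LoRA sketch: freeze the original linear layer and learn a low-rank
# update B @ A on top of it. Names, shapes, and init values are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pretrained weights
            p.requires_grad_(False)
        out_features, in_features = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # rank x in
        self.B = nn.Parameter(torch.zeros(out_features, rank))        # out x rank, zero init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen output + scaled low-rank correction x A^T B^T
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,}")   # ~65K of ~16.8M for this one layer

Zero-initializing B makes the adapter a no-op at the start, so training begins exactly from the pretrained model's behavior.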

03. Instruction Tuning: Teaching Models to Follow Instructions

How base models become helpful assistants

  • Base model: Completes text but doesn't follow instructions
  • Instruction-tuned model: Understands "Write a poem" or "Summarize this"
  • Dataset format: (instruction, input, output) triples
  • Alpaca, Dolly, FLAN: Instruction tuning datasets
  • Why ChatGPT follows instructions but GPT-3 didn't

Base model (Llama 2 base):
  Input: "Explain photosynthesis"
  Output: "in plants. The process involves..." (continues the sentence rather than answering)

Instruction-tuned (Llama 2 Chat):
  Input: "Explain photosynthesis"
  Output: "Photosynthesis is the process by which plants..." (actually answers the question!)
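
To make the dataset format concrete, here is a small sketch of Alpaca-style (instruction, input, output) records written out as JSONL, the format most fine-tuning tools expect. The records themselves are invented examples.

# Alpaca-style instruction-tuning data: (instruction, input, output) triples,
# stored one JSON object per line (JSONL). The records below are made up.
import json

records = [
    {
        "instruction": "Explain photosynthesis",
        "input": "",
        "output": "Photosynthesis is the process by which plants convert light, water, and CO2 into sugars and oxygen.",
    },
    {
        "instruction": "Summarize the following text",
        "input": "Large language models are neural networks trained on vast amounts of text...",
        "output": "The passage explains that large language models are neural networks trained on text.",
    },
]

with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")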
04. RLHF: Reinforcement Learning from Human Feedback

The technique that made ChatGPT helpful, harmless, and honest

  • Step 1: Collect human preferences (A vs B, which is better?)
  • Step 2: Train a reward model to predict human preferences
  • Step 3: Use RL (PPO algorithm) to optimize for high rewards
  • Why RLHF creates conversational, helpful models
  • The cost: Complex, unstable training, requires ML expertise

  1. Human feedback: annotators rank outputs (Response A > Response B)
  2. Reward model: train a model to predict which response humans prefer
  3. RL optimization: update the LLM to maximize the predicted reward
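
The heart of step 2 is a pairwise (Bradley-Terry style) loss that pushes the reward model to score the preferred response higher than the rejected one. A minimal PyTorch sketch, assuming the reward model has already produced a scalar score for each response:

# Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected).
# Assumes a reward model maps each (prompt, response) pair to a scalar score.
import torch
import torch.nn.functional as F

def reward_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # Minimized when chosen responses consistently outscore rejected ones.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Toy scores for a batch of three preference pairs.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.9, 1.5])
print(reward_loss(chosen, rejected))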

05. DPO: Direct Preference Optimization

The 2023 method that achieves RLHF-style alignment without a reward model or RL loop

  • The Problem: RLHF is complex, unstable, requires reward model
  • DPO's Insight: Skip the reward model and RL entirely
  • Directly optimize the LLM using preference pairs
  • Simpler, more stable, same quality as RLHF
  • Why DPO is becoming the standard in 2025
RLHF (2020-2023):
  1. Collect preferences
  2. Train a reward model
  3. Run PPO (complex!)
  Three stages, unstable training.

DPO (2023-2025):
  1. Collect preferences
  2. Directly optimize the LLM (that's it!)
  One stage, stable, simpler.
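
The comparison above can be written down directly: DPO's loss has the same pairwise shape as the reward-model loss in the previous section, but it is applied to the LLM itself, using log-probabilities from the policy and a frozen reference model in place of a learned reward. A sketch, assuming each log-prob tensor is already summed over the response's tokens:

# DPO loss sketch: optimize the policy directly on preference pairs, with the
# frozen reference model's log-probs standing in for a separate reward model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1) -> torch.Tensor:
    # Implicit "rewards" are log-prob ratios against the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Same -log sigmoid(difference) form as the RLHF reward loss above.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy per-sequence log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -8.0]),
                torch.tensor([-13.0, -9.0]), torch.tensor([-13.5, -8.5]))
print(loss)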
06. Practical Fine-tuning: Tools & Workflows

How to actually fine-tune models in 2025

  • Platforms: OpenAI API, HuggingFace, Modal, Replicate
  • Frameworks: Axolotl, LLaMA-Factory, Unsloth
  • Data prep: Format, clean, split train/validation
  • Hyperparameters: Learning rate, batch size, epochs
  • Evaluation: How to know if fine-tuning worked
  • Cost analysis: What to expect to pay
Typical LoRA Fine-tuning Workflow:
  1. Prepare 1,000-10,000 examples in JSONL format
  2. Choose base model (Llama 3, Mistral, etc.)
  3. Configure LoRA parameters (rank=8, alpha=16)
  4. Train for 3 epochs (~1-6 hours)
  5. Evaluate on held-out test set
  6. Merge adapter and deploy
Cost: $50-$500 | Time: 4-8 hours | Result: Custom model
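
Steps 2-3 and 6 of this workflow map onto the Hugging Face peft library roughly as follows. This is a sketch, not a complete training script: the model name is a placeholder, API details vary between library versions, and the training loop itself (transformers' Trainer or trl's SFTTrainer) is only hinted at.

# Sketch of attaching a LoRA adapter with Hugging Face peft (treat names and
# defaults as assumptions; library APIs change between versions).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Meta-Llama-3-8B"          # placeholder base model
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)       # wraps the model; only adapters train
model.print_trainable_parameters()               # typically well under 1% of all weights

# ...train on your JSONL data, evaluate on a held-out split, then merge the
# adapter back into the base weights for deployment:
merged = model.merge_and_unload()
merged.save_pretrained("my-custom-model")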
07. PEFT: Parameter-Efficient Fine-Tuning

The broader family of efficient adaptation techniques

  • LoRA: Low-rank adapters (most popular)
  • Prompt tuning: Only tune soft prompts, not model
  • Prefix tuning: Prepend learnable tokens
  • Adapter layers: Insert small trainable modules
  • Trade-offs: efficiency vs quality vs flexibility
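
Several of the methods in this list are available in the Hugging Face peft library as drop-in config classes. A brief sketch of how the choice maps to code (parameter values are illustrative and APIs differ between versions):

# PEFT method configs in Hugging Face peft (values illustrative). Each can be
# passed to get_peft_model(base_model, config); only the added parameters train.
from peft import LoraConfig, PrefixTuningConfig, PromptTuningConfig

peft_choices = {
    "lora":          LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"),
    "prefix_tuning": PrefixTuningConfig(num_virtual_tokens=20, task_type="CAUSAL_LM"),
    "prompt_tuning": PromptTuningConfig(num_virtual_tokens=20, task_type="CAUSAL_LM"),
}
for name, config in peft_choices.items():
    print(name, type(config).__name__)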
08. When to Fine-tune vs When to Prompt

Decision framework for choosing your approach

  • Use prompting when: Need flexibility, quick iteration, no budget
  • Use RAG when: Need up-to-date knowledge, specific documents
  • Use fine-tuning when: Need consistent style, domain expertise, cost-per-query matters
  • Hybrid approaches: Fine-tune + RAG for best results

Why This Matters

Fine-tuning Democratized AI

Before LoRA (2021), only organizations with large GPU clusters could customize big models. Now anyone can fine-tune a 70B-parameter model on a single GPU for around $50.

This is why you see thousands of specialized models on HuggingFace: medical LLMs, legal LLMs, coding assistants, language-specific models. Fine-tuning made it possible.

Real-World Applications

🏥

Healthcare

Fine-tune Llama 3 on medical literature to create doctor-assistant chatbots

💼

Customer Support

Fine-tune on your support tickets to match your brand voice and handle common issues

⚖️

Legal

Fine-tune on case law and contracts for legal document analysis

💻

Code Generation

Fine-tune on your codebase for company-specific coding assistants

Quick Reference: Adaptation Techniques

Technique         | Cost           | Complexity | Use When
Prompting         | $0.01/query    | Easy       | Quick iteration, simple tasks
RAG               | $100-$1k setup | Medium     | Need a specific knowledge base
LoRA Fine-tuning  | $50-$500       | Medium     | Domain adaptation, style consistency
DPO/RLHF          | $1k-$10k       | Hard       | Behavior alignment, safety
Pre-training      | $10M+          | Expert     | Creating a foundation model

Coming Soon!

This chapter will include interactive demos, code examples, and step-by-step tutorials for fine-tuning your first model with LoRA. You'll learn exactly how to customize LLMs for your specific needs.
