Pre-Training vs Fine-Tuning vs Instruction Tuning

Creating a modern LLM happens in distinct stages. It’s like the education of a human: first you learn to read, then you learn a profession, then you learn to be polite.

Stage 1: Pre-Training (The Heavy Lifting)

  • Goal: Learn the statistical patterns of language and world knowledge.
  • Data: Trillions of tokens (the Internet).
  • Cost: Millions of dollars.
  • Output: A “Base Model.”

A base model is essentially a text completion engine.

  • Input: “The recipe for cake is”
  • Output: “flour, sugar, eggs…”
  • Input: “The capital of Paris is”
  • Output: “France.” (The prompt is garbled, but the model happily pattern-matches a plausible completion rather than correcting you.)

Base models are not chatty. If you ask “What is the capital of France?”, a base model may continue the text with “and its population is 2M” instead of answering you; it is completing text, not holding a conversation.
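The “completion engine” behavior can be illustrated with a toy trigram model. This is only a stand-in: a real base model learns the same kind of next-token statistics with a transformer over trillions of tokens, not a lookup table over two sentences.

```python
from collections import defaultdict

# Toy "base model": a trigram table built from a tiny corpus.
corpus = ("the capital of france is paris . "
          "the recipe for cake is flour , sugar and eggs .").split()

table = defaultdict(list)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    table[(a, b)].append(c)

def complete(prompt, n_tokens=4):
    """Continue the prompt with the most common next token at each step."""
    words = prompt.lower().split()
    for _ in range(n_tokens):
        options = table.get((words[-2], words[-1]))
        if not options:
            break
        words.append(max(set(options), key=options.count))
    return " ".join(words)

print(complete("the recipe for cake is"))
# -> the recipe for cake is flour , sugar and
```

Note that the model never “answers” anything: given “the capital of france is” it emits “paris .” and then keeps going with whatever text tended to follow, which is exactly the base-model behavior described above.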

Stage 2: Instruction Tuning (SFT)

  • Goal: Teach the model to follow orders.
  • Data: Thousands of (Instruction, Output) pairs written by humans.
  • Cost: Thousands of dollars.

  • Example Data:
    • User: “Summarize this article.”
    • Assistant: “Here is a summary…”

This turns the Base Model into an Instruct Model. Now it understands that when you ask a question, it should provide an answer, not just continue the sentence.
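Concretely, each (Instruction, Output) pair is usually serialized with a chat template before training. Here is a sketch with a made-up template; real templates (Llama 3’s included) use different special tokens, but the idea is the same: the model is trained to predict the assistant’s text given the formatted prompt.

```python
# Hypothetical chat template for illustration only.
def to_training_text(example):
    """Serialize one SFT pair into a single training string."""
    return (f"<|user|>\n{example['instruction']}\n"
            f"<|assistant|>\n{example['output']}<|end|>")

sft_dataset = [
    {"instruction": "Summarize this article.",
     "output": "Here is a summary..."},
]

print(to_training_text(sft_dataset[0]))
```

Training on thousands of strings shaped like this is what teaches the model that text after `<|assistant|>` should answer the text after `<|user|>`.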

Stage 3: Alignment / Preference Tuning (RLHF/DPO)

  • Goal: Safety, style, and helpfulness.
  • Data: Comparisons (“A is better than B”).

Even an instruction-tuned model might be rude, dangerous, or verbose. This stage polishes the edges. It aligns the model with human values (don’t be racist, don’t help build bioweapons, be concise).
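As a rough sketch of how those comparisons are used, DPO turns each “A is better than B” pair directly into a loss. The log-probabilities below are made-up placeholders; in practice they come from scoring the chosen and rejected answers with the policy and with a frozen reference model.

```python
import math

def dpo_loss(chosen_logp, rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Simplified DPO loss for a single (chosen, rejected) pair.

    Pushes the policy to widen the chosen answer's margin over the
    rejected one, measured relative to the reference model.
    """
    margin = beta * ((chosen_logp - ref_chosen_logp)
                     - (rejected_logp - ref_rejected_logp))
    return -math.log(1 / (1 + math.exp(-margin)))  # -log sigmoid(margin)

# Placeholder numbers: a policy that already prefers the chosen answer
# gets a lower loss than one that prefers the rejected answer.
assert dpo_loss(-10.0, -14.0, -12.0, -12.0) < dpo_loss(-14.0, -10.0, -12.0, -12.0)
```

RLHF reaches the same goal differently (a learned reward model plus reinforcement learning), but both start from the same comparison data.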

Summary Pipeline

  1. Pre-Training: Reading the entire library. (Result: Smart but unruly).
  2. Instruction Tuning: Learning to take tests. (Result: Helpful).
  3. Alignment: Learning manners. (Result: Safe and Chatty).

Most open-source model families (like Llama 3) are released in both variants:

  • Llama-3-Base (Stage 1) - For developers who want to do their own fine-tuning.
  • Llama-3-Instruct (Stage 2+3) - For users who want to chat immediately.