Temperature, Top-K, Top-P: Controlling AI Creativity
What actually happens when you adjust the settings of an LLM? A guide to sampling parameters.
When an LLM generates text, it doesn’t just pick “the best” word every time. If it did, it would be boring and repetitive. Instead, it predicts a probability distribution for the next token and then samples from it.
You can control this sampling process with three main dials: Temperature, Top-K, and Top-P.
How the Model Predicts
Imagine the model has finished the sentence: “The cat sat on the…”
It assigns probabilities to the next possible token:
- `mat` (50%)
- `floor` (20%)
- `couch` (10%)
- `universe` (0.001%)
1. Temperature (The Chaos Dial)
Temperature controls the randomness of the selection.
- Low Temperature (0.0 - 0.3): The model becomes conservative. It almost always picks the most likely token (`mat`).
  - Use case: Coding, math, factual retrieval where accuracy matters.
- High Temperature (0.7 - 1.5): The model flattens the probability curve. `floor` and `couch` get a higher chance of being picked, and even `universe` becomes possible.
  - Use case: Creative writing, brainstorming, poetry.
Math Note: Temperature divides the logits before the softmax function: $\text{softmax}(x_i / T)$. As $T \to 0$, the highest logit dominates (its probability approaches 100%). As $T \to \infty$, all tokens become equally likely.
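As a sketch, here is that formula in plain Python, applied to made-up logits for the four tokens above (the logit values are illustrative, not real model output):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T before softmax; lower T sharpens, higher T flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for: mat, floor, couch, universe
logits = [2.0, 1.1, 0.4, -6.0]

low = softmax_with_temperature(logits, 0.2)   # near-deterministic: mat dominates
high = softmax_with_temperature(logits, 1.5)  # flatter: the others gain probability
```

Printing `low` and `high` makes the effect concrete: at T = 0.2 the first token holds nearly all the probability mass, while at T = 1.5 it loses a large share to the alternatives.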
2. Top-K Sampling (The VIP List)
Top-K truncates the list of possibilities. It says: “Only consider the top K most likely words. Zero out the rest.”
- If Top-K = 3: The model considers only [`mat`, `floor`, `couch`].
- It renormalizes their probabilities to sum to 100% and picks one.
- It completely ignores `universe` (the weird long-tail words).
This prevents the model from going completely off the rails by picking a nonsense word that had a 0.0001% chance.
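A minimal sketch of that truncate-and-renormalize step, using the toy distribution from earlier:

```python
import random

def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalize to sum to 1."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

probs = {"mat": 0.50, "floor": 0.20, "couch": 0.10, "universe": 0.00001}
pool = top_k_filter(probs, k=3)
# "universe" is gone; the remaining three are renormalized
token = random.choices(list(pool), weights=list(pool.values()))[0]
```

After filtering, `mat` rises from 50% to 62.5% (0.5 / 0.8), because the removed tail's probability is redistributed across the survivors.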
3. Top-P Sampling (Nucleus Sampling)
Top-P is smarter than Top-K. Instead of a fixed number of words (K), it picks a dynamic set of words whose probabilities add up to P.
- If Top-P = 0.9: The model goes down the list:
  - `mat` (0.5): sum is 0.5. Keep going.
  - `floor` (0.2): sum is 0.7. Keep going.
  - `couch` (0.1): sum is 0.8. Keep going.
  - `bed` (0.1): sum is 0.9. Stop.
The pool is now [mat, floor, couch, bed].
Why is this better?
- In a clear sentence (“The capital of France is…”), the top word `Paris` might have 99% probability. Top-P will pick only `Paris`.
- In a vague sentence (“The meaning of life is…”), the probability is spread out flat. Top-P will include dozens of words in the pool, allowing for variety.
Summary: How to Tune
- Precision Task (Code/Facts): Temp = 0.0, Top-P = 1.0.
- Creative Task (Story): Temp = 0.7-0.9, Top-P = 0.9.
- Wild Brainstorming: Temp = 1.2+.
Most users only need to tweak Temperature. Leave Top-P alone unless you know what you’re doing.
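Putting the three dials together, a toy end-to-end sampler might look like this (parameter names and defaults are illustrative, not any particular library's API):

```python
import math
import random

def sample(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Toy next-token sampler: temperature, then Top-K, then Top-P, then draw."""
    # Temperature: divide logits by T before softmax
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices by probability, most likely first
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # Top-K: hard cutoff on the list length

    # Top-P: keep the smallest prefix whose cumulative probability reaches p
    pool, cumulative = [], 0.0
    for i in ranked:
        pool.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    weights = [probs[i] for i in pool]
    return random.choices(pool, weights=weights)[0]  # a token index
```

With `temperature` near zero or `top_k=1`, this collapses to always picking the most likely token, matching the “Precision Task” setting above.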