A general AI model knows a lot about everything and not quite enough about your specific domain. Fine-tuning solves this by continuing the training process on your specific data — your company’s communication style, your medical specialty’s terminology, your legal framework’s conventions. The result is a model that behaves like a general AI that has also interned at your organisation for a year.
When Fine-Tuning Is (and Is Not) Worth It
Fine-tuning is worth it when:
- You need consistent style or tone the base model does not adopt from prompting
- Your domain has specialised terminology or format requirements
- You are making thousands of API calls and want to use a smaller, cheaper model
- You have more than 50 high-quality training examples
Fine-tuning is NOT the right answer when:
- Prompt engineering has not been seriously attempted — try prompts first, always
- You have fewer than 50 examples — not enough signal
- You want the model to learn new factual knowledge — fine-tuning teaches style and format, not facts. Use RAG (Retrieval Augmented Generation) for knowledge.
The Three-Step Process
Step 1 — Prepare Your Training Data
OpenAI’s fine-tuning API (the easiest starting point) requires data in JSONL format — one training example per line. Each example is a conversation:
training_data.jsonl (each example must sit on a single line, and the file cannot contain comments):
{"messages": [{"role": "system", "content": "You are a customer support agent for TechCorp. Be concise and helpful."}, {"role": "user", "content": "My order hasn't arrived after 7 days."}, {"role": "assistant", "content": "I apologize for the delay. Can you share your order number? I'll check the status and arrange a resolution within 24 hours."}]}
{"messages": [{"role": "system", "content": "You are a customer support agent for TechCorp. Be concise and helpful."}, {"role": "user", "content": "How do I return a defective product?"}, {"role": "assistant", "content": "You can return defective products within 30 days for a full refund. Visit techcorp.com/returns or reply here with your order number and I will send a return label."}]}
Aim for at least 50 examples, ideally 100-500. The API itself accepts as few as 10, but that is rarely enough signal. Quality matters far more than quantity: 100 excellent examples outperform 1,000 mediocre ones.
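Before uploading, it can help to check the file locally. The following is a minimal sketch, assuming plain text chat examples like the ones above (no tool calls or other message types); the function names and the specific checks are illustrative choices, not part of OpenAI's tooling.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption: plain chat examples only


def check_example(messages):
    """Return a description of the first problem found, or None if the example looks fine."""
    if not isinstance(messages, list) or not messages:
        return "missing a non-empty 'messages' list"
    for message in messages:
        if not isinstance(message, dict):
            return "every message must be a JSON object"
        if message.get("role") not in ALLOWED_ROLES:
            return f"unexpected role {message.get('role')!r}"
        if not isinstance(message.get("content"), str):
            return "every message needs string 'content'"
    if messages[-1]["role"] != "assistant":
        return "the last message should be the assistant reply"
    return None


def validate_jsonl(lines, min_examples=50):
    """Check JSONL lines; return a list of problems (an empty list means no problems found)."""
    problems = []
    valid = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            example = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: not valid JSON ({err.msg})")
            continue
        messages = example.get("messages") if isinstance(example, dict) else None
        problem = check_example(messages)
        if problem:
            problems.append(f"line {number}: {problem}")
        else:
            valid += 1
    if valid < min_examples:
        problems.append(f"only {valid} valid examples; aim for at least {min_examples}")
    return problems
```

To run it on a real file, open `training_data.jsonl` with `encoding="utf-8"` and pass the file object straight to `validate_jsonl`, since iterating over a file yields one line at a time.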
Step 2 — Upload and Start Training
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Upload training file
with open("training_data.jsonl", "rb") as f:
    file_response = client.files.create(file=f, purpose="fine-tune")
print(f"File uploaded: {file_response.id}")

# Start fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=file_response.id,
    model="gpt-4o-mini-2024-07-18",  # Small, inexpensive model; good starting point
    hyperparameters={"n_epochs": 3},  # 3 epochs is a good default
)
print(f"Fine-tuning job started: {job.id}")
print(f"Status: {job.status}")
Step 3 — Monitor and Test
# Check job status (reuses `client` and `job` from Step 2)
job_status = client.fine_tuning.jobs.retrieve(job.id)
print(f"Status: {job_status.status}")
# fine_tuned_model stays None until the job has succeeded
print(f"Fine-tuned model: {job_status.fine_tuned_model}")

# Once complete, test your model
if job_status.fine_tuned_model:
    response = client.chat.completions.create(
        model=job_status.fine_tuned_model,  # Your fine-tuned model ID
        messages=[
            {"role": "system", "content": "You are a customer support agent for TechCorp."},
            {"role": "user", "content": "My invoice shows the wrong address."},
        ],
    )
    print(response.choices[0].message.content)
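Training takes a while, so a single status check will usually report that the job is still queued or running. A minimal polling sketch, assuming the `client` object from Step 2 is passed in; `wait_for_job`, the polling interval, and the poll limit are illustrative choices rather than anything the API prescribes:

```python
import time

# Statuses after which a fine-tuning job will not change any more
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=60, max_polls=240, sleep=time.sleep):
    """Poll a fine-tuning job until it reaches a terminal status, then return it."""
    for _ in range(max_polls):
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"job {job_id} still not finished after {max_polls} polls")
```

Calling `wait_for_job(client, job.id)` returns the finished job; check that its `status` is `"succeeded"` before using `fine_tuned_model`, because a failed or cancelled job leaves that field empty.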
Open-Source Alternative: Fine-Tuning with Unsloth
If you want to avoid API costs and run fine-tuning locally, Unsloth speeds up fine-tuning of Llama and Mistral models on consumer hardware (the project reports roughly 2-5x faster than standard methods):
# Install first (in a shell): pip install unsloth
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit weights need roughly a quarter of the memory of 16-bit weights
)

# Add LoRA adapters (fine-tunes only a small fraction of parameters).
# Unsloth's own examples usually also target v_proj, o_proj and the MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj"],
)

# Then train with your data using the Hugging Face Trainer (or TRL's SFTTrainer)
A full fine-tuning tutorial with Unsloth is available at github.com/unslothai/unsloth — the README is detailed and well-maintained.
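To make "a small fraction of parameters" concrete, here is a rough worked calculation for the LoRA settings above. It assumes the published Llama 3 8B shapes (32 layers, hidden size 4096, and a key projection of 4096 to 1024 because of grouped-query attention) and a total of about 8.03 billion parameters; `lora_param_count` is a helper written for this illustration, not an Unsloth function.

```python
def lora_param_count(layers, r, shapes):
    """Trainable parameters added by LoRA.

    Each adapted weight matrix of shape (d_in, d_out) gains two low-rank
    factors: A with r * d_in entries and B with d_out * r entries.
    """
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes)
    return layers * per_layer


# Assumed shapes: q_proj is 4096 -> 4096, k_proj is 4096 -> 1024
trainable = lora_param_count(layers=32, r=16, shapes=[(4096, 4096), (4096, 1024)])
print(trainable)                    # 6815744
print(f"{trainable / 8.03e9:.3%}")  # 0.085%
```

Under these assumptions the adapters add about 6.8 million trainable parameters, well under a tenth of a percent of the model, which is why LoRA fits on hardware that could not hold the gradients for full fine-tuning.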
Evaluating Your Fine-Tuned Model
After training, test your model on a held-out set of examples that were not in the training data. Evaluate:
- Does it follow the style and format you defined?
- Does it correctly handle edge cases?
- Is it worse than the base model at anything you care about?
The last question is critical — fine-tuning can cause “catastrophic forgetting” of capabilities the base model had. Always test broadly, not just on your target task.
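The held-out set is easiest to carve off before uploading anything, and the "is it worse than the base model" question can be checked by running the same prompts through both models. A minimal sketch of both ideas follows; `base_answer`, `tuned_answer`, and `passes` are hypothetical callables you would supply (for example, thin wrappers around the chat completion call and a check you define), and the 20% split is an assumption rather than a rule.

```python
import random


def split_examples(lines, held_out_fraction=0.2, seed=0):
    """Shuffle JSONL lines reproducibly and return (training, held_out) lists."""
    examples = [line for line in lines if line.strip()]
    random.Random(seed).shuffle(examples)  # fixed seed keeps the split repeatable
    n_held_out = max(1, round(len(examples) * held_out_fraction))
    return examples[n_held_out:], examples[:n_held_out]


def find_regressions(prompts, base_answer, tuned_answer, passes):
    """Return the prompts where the base model passes the check but the tuned model fails."""
    return [
        prompt
        for prompt in prompts
        if passes(prompt, base_answer(prompt)) and not passes(prompt, tuned_answer(prompt))
    ]
```

Include prompts from outside your target task in the list passed to `find_regressions`; those are where forgotten capabilities tend to show up.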
Key Takeaway: Fine-tuning is most valuable for style, format, and domain vocabulary — not for adding new knowledge. Use at least 50-100 high-quality examples. OpenAI’s API is the easiest starting point; Unsloth + Llama is the best option for those who want local, free fine-tuning.
