Finetune Llama 3.2 1B for Tool Calling: A Practical Guide

Feb 18, 2024

6 Min Read

Post By: Raj Gupta

Hey there! Today we’re going to build something exciting — an AI assistant that can understand natural language requests and convert them into specific tool calls. Think of it like teaching an AI to press the right buttons when someone asks for help.

What We’re Going to Build

Imagine someone asks: “What’s the weather in Paris?” Our AI will respond with: {"function_call": {"name": "get_weather", "arguments": {"location": "Paris"}}}

Or if someone says “Find me red shoes under $100”, it’ll know to use: {"function_call": {"name": "search_products", "arguments": {"query": "red shoes", "max_price": 100}}}

Let’s build this together! 🚀

What You’ll Need

Before we start, let’s make sure you have:

A Google Colab account (Pro recommended for faster training)
Basic Python knowledge
About 2–3 hours of time
Some example data (don’t worry, we’ll create this together!)

Step 1: Preparing Our Training Data

Let’s start with the most important part — our training data. Think of this as the textbook we’ll use to teach our AI.

Creating Our CSV File

First, let’s create a CSV file that looks like this:

user_message,tool_name,parameters
"What's the weather in Paris?","get_weather","{""location"": ""Paris"", ""units"": ""celsius""}"
"Find red shoes under $100","search_products","{""query"": ""red shoes"", ""max_price"": 100}"

Let me explain each column:

user_message: What a person might ask (like a text message)
tool_name: Which tool should handle this request
parameters: What information the tool needs (in JSON format)

💡 Why JSON format?: We use JSON because it’s a standard way to structure data. It’s like giving the AI a consistent format to follow.
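Typing the doubled quotes that CSV needs around JSON by hand is easy to get wrong. One way around it, sketched here with Python's standard `csv` and `json` modules, is to let the library do the escaping. The `rows` list and the in-memory buffer are only for illustration; in practice you would write to the `training_data.csv` file we load later.

```python
import csv
import io
import json

# Illustrative rows: (user_message, tool_name, parameters as a dict)
rows = [
    ("What's the weather in Paris?", "get_weather",
     {"location": "Paris", "units": "celsius"}),
    ("Find red shoes under $100", "search_products",
     {"query": "red shoes", "max_price": 100}),
]

# Write to an in-memory buffer; use open("training_data.csv", "w", newline="")
# instead to produce a real file.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerow(["user_message", "tool_name", "parameters"])
for message, tool, params in rows:
    # json.dumps builds the JSON string; csv.writer doubles the inner quotes
    writer.writerow([message, tool, json.dumps(params)])

csv_text = buffer.getvalue()
print(csv_text)

# Round-trip check: each parameters cell should parse back into a dict
parsed = list(csv.DictReader(io.StringIO(csv_text)))
for row in parsed:
    print(row["tool_name"], json.loads(row["parameters"]))
```

Because the writer handles quoting, a message that contains commas or quotation marks will not break the file.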

Let’s Create Some Examples Together

Think about common requests people might make. Here are some patterns we can follow:

1. Weather requests:

# Example structure for weather
{    
    "user_message": "What's the weather like in [CITY]?",   
    "tool_name": "get_weather",   
    "parameters": {"location": "[CITY]", "units": "celsius"}
}

2. Shopping requests:

# Example structure for shopping
{   
    "user_message": "Find [ITEM] under $[PRICE]",
    "tool_name": "search_products",    
    "parameters": {"query": "[ITEM]", "max_price": [PRICE]}
}
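If you would rather not write every row by hand, the two patterns above can be expanded with a few lines of Python. This is only a sketch: the `cities` and `products` lists are made-up fill-in values, and `build_examples` is a hypothetical helper name, not something used later in the guide.

```python
import json

# Hypothetical fill-in values; replace them with ones that match your use case
cities = ["Paris", "London", "Tokyo"]
products = [("red shoes", 100), ("a gaming laptop", 1000)]

def build_examples(cities, products):
    """Expands the weather and shopping patterns into concrete examples."""
    examples = []
    for city in cities:
        examples.append({
            "user_message": f"What's the weather like in {city}?",
            "tool_name": "get_weather",
            "parameters": {"location": city, "units": "celsius"},
        })
    for item, price in products:
        examples.append({
            "user_message": f"Find {item} under ${price}",
            "tool_name": "search_products",
            "parameters": {"query": item, "max_price": price},
        })
    return examples

examples = build_examples(cities, products)
for ex in examples:
    print(ex["user_message"], "->", json.dumps(ex["parameters"]))
```

Template-generated rows all share the same wording, so it is worth mixing in some hand-written phrasings as well ("Is it raining in Tokyo?", "Any cheap red shoes?").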

🔍 Quick Check: Before moving on, make sure your CSV file:

Opens correctly in Excel or a text editor
Has properly formatted JSON in the parameters column
Contains at least 10–15 diverse examples
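The checklist above can also be automated. Here is a minimal sketch using only the standard library; `check_csv` is a hypothetical helper, and the sample text deliberately contains one broken row so you can see what a failure looks like.

```python
import csv
import io
import json

REQUIRED_COLUMNS = ["user_message", "tool_name", "parameters"]

def check_csv(handle):
    """Returns a list of human-readable problems found in the CSV."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"Missing columns: {missing}"]

    problems = []
    # Row 1 is the header, so data rows are numbered from 2
    for row_number, row in enumerate(reader, start=2):
        if not (row["user_message"] or "").strip():
            problems.append(f"Row {row_number}: empty user_message")
        try:
            params = json.loads(row["parameters"] or "")
            if not isinstance(params, dict):
                problems.append(f"Row {row_number}: parameters is not a JSON object")
        except json.JSONDecodeError:
            problems.append(f"Row {row_number}: parameters is not valid JSON")
    return problems

# Sample data: the second data row has unquoted keys, which is not valid JSON
sample = io.StringIO(
    'user_message,tool_name,parameters\n'
    '"What\'s the weather in Paris?","get_weather","{""location"": ""Paris""}"\n'
    '"Find red shoes","search_products","{query: red shoes}"\n'
)
problems = check_csv(sample)
print(problems)
```

For a real file, pass in `open("training_data.csv", newline="", encoding="utf-8")` instead of the sample buffer.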

Step 2: Setting Up Our Environment

Now let’s set up everything we need to train our model. I’ll explain each part as we go:

# First, let's install our tools
!pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install "bitsandbytes>=0.41.1" pandas

# Now import what we need
from unsloth import FastLanguageModel
import torch
import json
import pandas as pd
from datasets import Dataset
from transformers import BitsAndBytesConfig

# Set up 4-bit quantization (this helps us use less memory!)
bnb_config = BitsAndBytesConfig(    
    load_in_4bit=True,  
    bnb_4bit_quant_type="nf4",  
    bnb_4bit_compute_dtype=torch.float16,  
    bnb_4bit_use_double_quant=True
)

💡 Why 4-bit quantisation?: Think of this like compressing a file. We’re making the model smaller without losing too much quality. It’s like turning a high-res photo into a smaller file that still looks good!
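To put rough numbers on that compression, here is a back-of-the-envelope calculation. The parameter count is an approximation for a "1B"-class model, and the estimate covers the stored weights only; it ignores quantization constants, activations, gradients, and optimizer state.

```python
# Rough weight-memory estimate. APPROX_PARAMS is an assumed round figure,
# not an exact count for any particular checkpoint.
APPROX_PARAMS = 1.2e9

def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to store the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

fp16_gb = weight_memory_gb(APPROX_PARAMS, 16)
nf4_gb = weight_memory_gb(APPROX_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{nf4_gb:.1f} GB")
print(f"Reduction:      {fp16_gb / nf4_gb:.0f}x")
```

The weights shrink to about a quarter of their 16-bit size, which leaves more GPU memory for the training batch.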

✅ Success Check: You should see no error messages after running these commands.

Step 3: Loading Our Training Data

Let’s create a helper class to load and validate our CSV data. I’ll explain each part:

class DataLoader:
    def __init__(self, csv_path):
        self.csv_path = csv_path
        print("Loading data from:", csv_path)

    def load_data(self):
        """Loads and checks our training data"""
        try:
            # Read the CSV file
            df = pd.read_csv(self.csv_path)
            print(f"Found {len(df)} examples")

            # Process each row
            processed_examples = []
            for idx, row in df.iterrows():
                try:
                    # Convert the parameters string to JSON
                    parameters = json.loads(row['parameters'])
                except (json.JSONDecodeError, TypeError):
                    # TypeError covers empty cells, which pandas reads as NaN
                    print(f"⚠️ Row {idx + 1} has invalid JSON - skipping it")
                    continue

                # Create our training example
                example = {
                    "input": row['user_message'],
                    "output": {
                        "function_call": {
                            "name": row['tool_name'],
                            "arguments": parameters
                        }
                    }
                }

                # Keep only examples that pass our checks
                if self.validate_example(example):
                    processed_examples.append(example)
                else:
                    print(f"⚠️ Row {idx + 1} failed validation - skipping it")

            print(f"Successfully processed {len(processed_examples)} examples")
            return processed_examples

        except Exception as e:
            print(f"Error loading CSV: {str(e)}")
            return []

    def validate_example(self, example):
        """Checks if an example looks good"""
        # Check for empty or missing messages
        message = example['input']
        if not isinstance(message, str) or not message.strip():
            return False

        # Make sure we have a tool name
        name = example['output']['function_call']['name']
        if not isinstance(name, str) or not name.strip():
            return False

        # Check parameters
        if not example['output']['function_call']['arguments']:
            return False

        return True

Let’s load our data:

# Create our data loader
loader = DataLoader('training_data.csv')

# Load and check our examples
training_examples = loader.load_data()

print(f"\nData Summary:")
print(f"- Total examples: {len(training_examples)}")
print(f"- Unique tools: {len(set(ex['output']['function_call']['name'] for ex in training_examples))}")

🔍 What to Look For:

You should see the number of examples loaded
No error messages about invalid JSON
At least a few different tools being used
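Beyond counting unique tools, it helps to see how many examples each tool has, since a tool with only one or two examples is hard for the model to learn. A small sketch with `collections.Counter` follows; the `sample_examples` list is made up, but it has the same shape as the output of `load_data()`.

```python
from collections import Counter

# Made-up examples in the same shape that load_data() returns
sample_examples = [
    {"input": "What's the weather in Paris?",
     "output": {"function_call": {"name": "get_weather",
                                  "arguments": {"location": "Paris"}}}},
    {"input": "Is it raining in Tokyo?",
     "output": {"function_call": {"name": "get_weather",
                                  "arguments": {"location": "Tokyo"}}}},
    {"input": "Find red shoes under $100",
     "output": {"function_call": {"name": "search_products",
                                  "arguments": {"query": "red shoes",
                                                "max_price": 100}}}},
]

def tool_distribution(examples):
    """Counts how many examples each tool has."""
    return Counter(ex["output"]["function_call"]["name"] for ex in examples)

counts = tool_distribution(sample_examples)
for tool, count in counts.most_common():
    print(f"{tool}: {count}")
```

In the notebook you would call `tool_distribution(training_examples)` instead of using the sample list.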

Step 4: Preparing for Training

Now let’s format our data for training. We’ll create a class to handle this:

class TrainingPreparator:
    def __init__(self, examples):
        self.examples = examples
        self.tools = self._gather_tools()

    def _gather_tools(self):
        """Collects info about all our tools"""
        tools = {}
        for ex in self.examples:
            tool_call = ex['output']['function_call']
            tool_name = tool_call['name']

            if tool_name not in tools:
                tools[tool_name] = {
                    "name": tool_name,
                    "parameters": set()
                }

            # Collect all parameters this tool uses
            tools[tool_name]["parameters"].update(tool_call['arguments'].keys())

        return tools

    def format_for_training(self):
        """Gets our data ready for the model"""
        formatted_data = []

        for example in self.examples:
            # Create the instruction
            instruction = self._create_instruction(example)

            # Format the expected output
            output = json.dumps(example['output'], indent=2)

            # Put it all together
            formatted = {
                "text": f"<s>[INST] {instruction} [/INST]\n{output}</s>"
            }

            formatted_data.append(formatted)

        return formatted_data

    def _create_instruction(self, example):
        """Creates clear instructions for the model"""
        instruction = f"User Request: {example['input']}\n\n"
        instruction += "Available tools:\n"

        # List all tools and their parameters
        for tool_name, tool_info in self.tools.items():
            # Sort so the parameter order is the same on every run
            params = ", ".join(sorted(tool_info["parameters"]))
            instruction += f"- {tool_name} (Parameters: {params})\n"

        return instruction

Let’s use our preparator:

# Prepare our training data
preparator = TrainingPreparator(training_examples)
prepared_data = preparator.format_for_training()

print("\nTraining Data Preview:")
print(prepared_data[0]["text"])

✅ Success Check: You should see:

A formatted example with instructions
List of available tools
Properly formatted JSON output
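A quick way to confirm the last point is to pull the JSON back out of a formatted string and parse it. The sketch below assumes the `[INST] ... [/INST]` layout used in this step; `extract_target` is a hypothetical helper and `sample_text` is a hand-written stand-in for `prepared_data[0]["text"]`.

```python
import json

def extract_target(text):
    """Pulls the JSON target out of one formatted training string."""
    # Everything after the closing [/INST] tag is the expected output
    _, _, after_inst = text.partition("[/INST]")
    target = after_inst.strip()
    if target.endswith("</s>"):
        target = target[: -len("</s>")]
    return json.loads(target)

# Hand-written stand-in for prepared_data[0]["text"]
sample_text = (
    "<s>[INST] User Request: What's the weather in Paris?\n\n"
    "Available tools:\n"
    "- get_weather (Parameters: location, units)\n"
    " [/INST]\n"
    '{"function_call": {"name": "get_weather", '
    '"arguments": {"location": "Paris", "units": "celsius"}}}</s>'
)

target = extract_target(sample_text)
print(target["function_call"]["name"])
print(target["function_call"]["arguments"])
```

If `json.loads` raises an error for any of your prepared examples, the problem is in the data rather than the model, and it is much cheaper to find that out before training.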

Step 5: Training Our Model

Now for the exciting part — let’s train our model! I’ll explain each step:

from unsloth import is_bfloat16_supported

def setup_training():
    """Sets up everything for training"""
    print("Loading base model...")
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "unsloth/Llama-3.2-1B-Instruct",
        max_seq_length = 2048,
        dtype = None,         # Let Unsloth pick float16 or bfloat16 for this GPU
        load_in_4bit = True   # Unsloth sets up 4-bit quantization itself
    )

    print("Preparing for fine-tuning...")
    model = FastLanguageModel.get_peft_model(
        model,
        r = 16,  # LoRA rank: this controls how much the model can learn
        target_modules = [
            "q_proj", "k_proj", "v_proj", "o_proj",  # Attention layers
            "gate_proj", "up_proj", "down_proj"      # FFN layers
        ],
        lora_alpha = 16,
        lora_dropout = 0,
        use_gradient_checkpointing = True  # Saves memory
    )

    return model, tokenizer

# Set up training settings
from transformers import TrainingArguments

def get_training_settings():
    """Gets our training configuration"""
    return TrainingArguments(
        output_dir = "my_tool_calling_model",  # Where to save our model
        per_device_train_batch_size = 4,       # How many examples to process at once
        gradient_accumulation_steps = 4,       # Helps with memory usage
        learning_rate = 2e-4,                  # How fast to learn
        max_steps = 100,                       # How long to train
        logging_steps = 1,                     # How often to see progress
        fp16 = not is_bfloat16_supported(),    # Use float16 on older GPUs
        bf16 = is_bfloat16_supported(),        # Use bfloat16 where the GPU supports it
        gradient_checkpointing = True          # Even more memory savings!
    )

# Start training
from trl import SFTTrainer

def train_model(model, tokenizer, prepared_data, training_args):
    """Trains our model"""
    print("Setting up training...")
    trainer = SFTTrainer(
        model = model,
        tokenizer = tokenizer,
        train_dataset = Dataset.from_dict({"text": [ex["text"] for ex in prepared_data]}),
        dataset_text_field = "text",
        max_seq_length = 2048,
        args = training_args
    )

    print("Starting training...")
    trainer.train()
    return trainer

# Let's put it all together!
model, tokenizer = setup_training()
training_args = get_training_settings()
trainer = train_model(model, tokenizer, prepared_data, training_args)
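To make the `r = 16` setting more concrete, we can estimate how many weights LoRA actually trains. Each adapted layer gets two small matrices, one of shape (inputs × r) and one of shape (r × outputs). The model dimensions below are assumptions about the Llama 3.2 1B architecture; check them against `model.config` before relying on the result.

```python
# Assumed Llama 3.2 1B dimensions; confirm against model.config
HIDDEN = 2048        # hidden_size
INTERMEDIATE = 8192  # intermediate_size (FFN width)
KV_DIM = 512         # num_key_value_heads * head_dim (assumed 8 * 64)
NUM_LAYERS = 16      # num_hidden_layers
RANK = 16            # the r value passed to get_peft_model

# (input features, output features) for each adapted projection
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(rank, shapes, num_layers):
    """Counts LoRA weights: rank * (in + out) per projection, per layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers

trainable = lora_params(RANK, PROJECTIONS, NUM_LAYERS)
print(f"Trainable LoRA parameters: {trainable:,}")
```

Under these assumptions only around one percent of the model's weights are updated, which is why fine-tuning fits on a single Colab GPU. Doubling `r` doubles this number.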

💡 What’s Happening?:

We load the base model (Llama 3.2 1B)
Set it up for efficient training (4-bit quantization)
Configure how it should learn
Start the training process!
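It is worth checking how much data those settings actually cover. With a batch size of 4 and 4 gradient accumulation steps, each training step sees 16 examples. The sketch below assumes a single GPU, and the dataset sizes are hypothetical; in the notebook you would use `len(prepared_data)`.

```python
# Values from get_training_settings()
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 100
num_devices = 1  # assumption: a single Colab GPU

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
examples_seen = effective_batch * max_steps

def passes_over_data(dataset_size):
    """How many times training loops over a dataset of this size."""
    return examples_seen / dataset_size

print(f"Effective batch size: {effective_batch}")
print(f"Examples seen in total: {examples_seen}")
for size in (15, 200):  # hypothetical dataset sizes
    print(f"{size} examples -> about {passes_over_data(size):.0f} passes")
```

With only 15 examples the model sees each one more than a hundred times, so it is likely to memorize rather than generalize. For a dataset that small, lower `max_steps` or add more examples.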

You should see progress updates as it trains. The exact layout depends on your library versions, but it will look roughly like this:

Training Step: 1/100
Loss: 2.345
Training Step: 2/100
Loss: 2.123
...

The loss number should go down — that means it’s learning! 📉
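Individual steps are noisy, so comparing averages is more reliable than watching single numbers. The Hugging Face `Trainer` keeps its logs in `trainer.state.log_history`, a list of dictionaries. Here is a sketch of a check over that list; `loss_trend` is a hypothetical helper and `fake_history` holds made-up numbers so the example runs on its own.

```python
def loss_trend(log_history, window=5):
    """Returns the average loss over the first and last `window` logged steps."""
    # Some entries (like the final summary) have no "loss" key, so skip them
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    if len(losses) < 2 * window:
        raise ValueError("Not enough logged steps to compare")
    start = sum(losses[:window]) / window
    end = sum(losses[-window:]) / window
    return start, end

# Made-up numbers in the shape of trainer.state.log_history
fake_history = [{"loss": 2.4 - 0.2 * i, "step": i + 1} for i in range(10)]
fake_history.append({"train_runtime": 12.3})  # summary entry without a loss

start, end = loss_trend(fake_history)
print(f"Average loss at start: {start:.2f}")
print(f"Average loss at end:   {end:.2f}")
print("Learning!" if end < start else "Loss is not improving")
```

After training, call `loss_trend(trainer.state.log_history)` to run the same check on your real logs.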

Step 6: Testing Our Model

Let’s see what our model can do:

def test_our_model(model, tokenizer, user_input):
    """Tests what our model has learned"""
    # Format the input exactly like in training, including the tool list
    instruction = preparator._create_instruction({"input": user_input})
    input_text = f"<s>[INST] {instruction} [/INST]\n"

    # Get the model's response
    inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
    outputs = model.generate(
        **inputs,
        max_new_tokens = 100,
        do_sample = True,   # Sampling must be on for temperature to take effect
        temperature = 0.7   # This adds some creativity
    )

    # Decode only the newly generated tokens, not the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return response

# Switch Unsloth to inference mode
FastLanguageModel.for_inference(model)

# Let's try some examples!
test_questions = [
    "What's the weather like in London?",
    "Find me a gaming laptop under $1000",
    "Is it raining in Tokyo?"
]

print("\nTesting our model:")
for question in test_questions:
    print(f"\nQuestion: {question}")
    print("Response:", test_our_model(model, tokenizer, question))

✅ Success Looks Like:

Clean JSON responses
Correct tool selection
Proper parameter formatting
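You can check these three points in code instead of by eye. The sketch below finds the first JSON object in a response and tests it against the `function_call` structure we trained on. Both helper names are hypothetical, and `sample_response` is a made-up string, not real model output.

```python
import json

def extract_tool_call(response):
    """Returns the first JSON object found in the text, or None."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value and ignores trailing text
            obj, _ = decoder.raw_decode(response[start:])
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = response.find("{", start + 1)
    return None

def is_valid_call(call, known_tools):
    """Checks the structure and that the tool is one we trained on."""
    if not isinstance(call, dict):
        return False
    function_call = call.get("function_call")
    return (
        isinstance(function_call, dict)
        and function_call.get("name") in known_tools
        and isinstance(function_call.get("arguments"), dict)
    )

# Made-up response with extra text around the JSON
sample_response = (
    'Sure! {"function_call": {"name": "get_weather", '
    '"arguments": {"location": "London"}}} Hope that helps.'
)

call = extract_tool_call(sample_response)
print(call)
print(is_valid_call(call, {"get_weather", "search_products"}))
```

In the notebook, `preparator.tools.keys()` gives you the set of known tool names.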

Common Problems and Solutions

“Help! My CSV isn’t loading!”

Check the file encoding (use UTF-8)
Make sure JSON is properly formatted
Look for missing quotes or commas

“The training is so slow!”

Reduce max_steps for testing
Use a smaller dataset first
Make sure 4-bit quantisation is enabled

“I’m getting memory errors!”

Reduce batch size
Enable all memory optimisations
Use shorter sequences
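When you reduce the batch size, you can raise `gradient_accumulation_steps` by the same factor so the effective batch stays the same and the learning rate still fits. A tiny sketch of that trade follows; `shrink_batch` is a hypothetical helper.

```python
def shrink_batch(per_device_batch, accumulation_steps):
    """Halves the per-device batch and raises accumulation to compensate."""
    if per_device_batch <= 1:
        return per_device_batch, accumulation_steps  # nothing left to shrink
    new_batch = per_device_batch // 2
    new_accumulation = accumulation_steps * per_device_batch // new_batch
    return new_batch, new_accumulation

# Start from the values in get_training_settings()
settings = (4, 4)
for _ in range(2):
    settings = shrink_batch(*settings)
    batch, accumulation = settings
    print(f"batch={batch}, accumulation={accumulation}, effective={batch * accumulation}")
```

Each step halves the memory needed for a batch at the cost of slower training, since the GPU processes fewer examples at once.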

Next Steps

Now that your model is working, try:

Adding more training examples
Creating new tools
Testing with different questions
Fine-tuning the parameters

Remember:

Start small and test often
Add complexity gradually
Keep your training data clean
Monitor the training loss

You’ve built your own tool-calling AI! 🎉

Happy training! 🚀
