Ollama + gpt-oss:20b not using tools properly

TLDR: The vanilla gpt‑oss:20b ships with a 4 k (4096‑token) context window – too small for tool‑calling.

Run the following commands:
ollama run gpt-oss:20b
/set parameter num_ctx 131072
/save gpt-oss:20b-128k
/bye


- This will clone the model with a 128 k context window
- Then point your UI (OpenWebUI, VSCode, OpenCode, …) at the cloned model.
- Tool‑calling works again.

1. The Problem: Why Tool‑Calling Fails

When you ask an LLM to use a tool (e.g., run git status, call a REST API, or execute a shell command), the model needs to:

  1. Understand the prompt – what the user wants.
  2. Retrieve the tool definition – name, description, arguments.
  3. Format a tool‑call – JSON with the function name and arguments.
  4. Read the tool’s response – and then incorporate it into the final answer.

All of that happens inside the model’s context window. Think of it as a “memory box” that holds tokens. The larger the box, the more information the model can keep in mind while it’s reasoning.

The default gpt‑oss:20b image from Ollama is configured with a ~4 k‑token context window (4096 tokens). That’s great for a quick chat, but it’s too small for the four steps above. The model runs out of “room” and does one of the following:

  • Drops the tool definition – so it can’t know what the tool is.
  • Truncates the user prompt – so it loses context.
  • Returns an error – e.g., “I’m sorry, I can’t do that.”
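
A back‑of‑the‑envelope budget makes the squeeze concrete. All the numbers in this sketch are illustrative assumptions, not measurements from gpt‑oss:20b:

```python
# Illustrative token budget for a single tool-calling turn.
# Every number below is an ASSUMED round figure for demonstration only.
system_prompt   = 600    # assumed: system preamble and formatting rules
tool_schemas    = 1200   # assumed: JSON definitions for a handful of tools
user_prompt     = 400    # assumed: the user's request plus recent history
tool_response   = 1500   # assumed: a verbose `git status` or API reply
model_answer    = 800    # assumed: room the model needs to reason and answer

total = system_prompt + tool_schemas + user_prompt + tool_response + model_answer
print(f"needed ≈ {total} tokens vs. a 4096-token default window")
```

Even with modest assumptions the turn overflows the 4096‑token default, which is exactly when tool definitions or the prompt get silently dropped.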

2. Quick Check: What Context Is Your Model Using?

Open a terminal (or your terminal emulator) and run:

ollama ps

You’ll see a table listing each loaded model; recent Ollama versions include a CONTEXT column. If that column shows ≈ 4096, you’re stuck with the default 4 k window.


3. Clone the Model with a Larger Context Window

Why Clone?

Ollama allows you to modify a model’s configuration in‑memory and then save that configuration as a new image. This is the safest way to upgrade the context window because:

  • You keep the original image untouched (you can always fall back).
  • The new image can be used like any other model (in OpenWebUI, VSCode, OpenCode, etc.).

The Command Flow

  1. ollama run gpt-oss:20b – loads the 20 B model into RAM and starts an interactive session.
  2. /set parameter num_ctx 131072 – sets the context window to 128 k tokens (131 072 = 128 × 1024).
  3. /save gpt-oss:20b-128k – persists the configuration as a new image named gpt-oss:20b-128k.
  4. /bye – exits the interactive session.

Note – If you’re on a machine with limited RAM, you can set num_ctx to 65536 (64 k tokens). The same steps apply, just replace the number.
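
If you prefer a non‑interactive route, the same clone can be built from a Modelfile. The two lines below mirror the /set command (the filename Modelfile is arbitrary):

```
FROM gpt-oss:20b
PARAMETER num_ctx 131072
```

Save the file and run ollama create gpt-oss:20b-128k -f Modelfile – the result is the same image that the interactive /save produces.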

What About the 120B Model?

If you’re using the larger gpt-oss:120b, the process is identical—just replace the model name:

ollama run gpt-oss:120b
/set parameter num_ctx 131072
/save gpt-oss:120b-128k
/bye

4. Verify the New Model

Run ollama ps again:

ollama ps

If the cloned model is loaded, its CONTEXT column should now read ≈ 131072. You can also run ollama list to confirm that both images – the original gpt-oss:20b and the new gpt-oss:20b-128k – exist side by side.


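You can also confirm the saved parameter programmatically through Ollama’s REST API (POST /api/show). A minimal sketch, assuming the server is running on its default port 11434:

```python
import json
import urllib.request

# Request body for Ollama's /api/show endpoint.
body = json.dumps({"model": "gpt-oss:20b-128k"}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/show",
    data=body,
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        info = json.load(resp)
    # The saved num_ctx shows up in the model's parameter listing.
    print(info.get("parameters", ""))
except OSError:
    print("Ollama server not reachable -- start it with `ollama serve`")
```
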
5. Point Your UI to the New Model

OpenWebUI

  1. Open Settings → General.
  2. Under Model choose gpt‑oss:20b-128k.
  3. Reload the page.

VSCode (Continue.dev Extension)

  1. Click the Chat sidebar icon.
  2. Click the gear icon → Change Model.
  3. Pick gpt‑oss:20b-128k.
  4. Restart the chat session.

OpenCode

  1. Go to Settings → LLM Model.
  2. Select gpt‑oss:20b-128k.
  3. Save and reload.

6. Test a Tool‑Call

Now, let’s make the assistant run a simple tool, e.g., the built‑in git status tool:

User: Please run the `git status` tool.

You should see the assistant output the tool’s JSON response, followed by a human‑readable answer that includes the tool’s result.
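
Under the hood, your UI sends something like the following request to Ollama’s /api/chat endpoint, which accepts a tools array of JSON‑schema function definitions. This is a minimal sketch – the git_status tool below is a hypothetical example definition, not something built into Ollama:

```python
import json
import urllib.request

# A hypothetical tool definition in the JSON-schema format /api/chat accepts.
tools = [{
    "type": "function",
    "function": {
        "name": "git_status",
        "description": "Run `git status` and return its output",
        "parameters": {"type": "object", "properties": {}},
    },
}]

payload = {
    "model": "gpt-oss:20b-128k",   # the 128 k clone we created above
    "messages": [{"role": "user", "content": "Please run the git status tool."}],
    "tools": tools,
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        reply = json.load(resp)
    # With enough context, the reply's message should carry a tool_calls entry.
    print(reply["message"].get("tool_calls"))
except OSError:
    print("Ollama server not reachable -- start it with `ollama serve`")
```

If the context window is large enough, the response contains a tool_calls entry naming git_status; with the 4 k default it often comes back as plain text or an apology instead.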


7. Troubleshooting

  • “Tool not found” or “No tool definitions” – cause: context window too small. Fix: clone with 64 k or 128 k.
  • “Out of memory” when starting the model – cause: 128 k requires a lot of RAM (≈ 10 GB for 20 B). Fix: use 64 k (num_ctx 65536) or upgrade hardware.
  • “Failed to load model” after saving – cause: the image name already exists. Fix: delete the old image (ollama rm gpt-oss:20b-128k) or use a new suffix.
  • “Error: No model named gpt‑oss:20b-128k” in the UI – cause: the UI wasn’t refreshed, or the name is misspelled. Fix: restart the UI and double‑check the name.
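
The RAM cost of a big window comes mostly from the KV cache, which grows linearly with num_ctx. The sketch below uses assumed round hyperparameters for illustration only (not gpt‑oss:20b’s actual architecture), so treat the result as an order‑of‑magnitude estimate:

```python
# KV-cache size ≈ 2 (K and V) × layers × kv_heads × head_dim × context × bytes/elem.
# All hyperparameters below are ASSUMED round numbers for illustration.
n_layers   = 24       # assumed transformer depth
n_kv_heads = 8        # assumed grouped-query KV heads
head_dim   = 64       # assumed per-head dimension
num_ctx    = 131072   # the 128 k window we configured
bytes_per  = 2        # fp16 cache entries

kv_bytes = 2 * n_layers * n_kv_heads * head_dim * num_ctx * bytes_per
kv_gib = kv_bytes / 2**30
print(f"KV cache ≈ {kv_gib:.1f} GiB on top of the model weights")
```

Halving num_ctx halves this figure, which is why dropping to 65536 is the usual escape hatch on RAM‑constrained machines.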

8. Quick Recap

  1. Check your current context: ollama ps.
  2. Re‑clone the model with a larger window (num_ctx 131072 for 128 k).
  3. Save the new image (gpt-oss:20b-128k).
  4. Switch your UI to the new model.
  5. Test a tool‑call – it should work now.

Final Thought

Tool‑calling is a powerful feature that lets an LLM act like a software engineer rather than just a chatbot. With the right context window, your self‑hosted Ollama + GPT‑OSS setup becomes a truly autonomous coding assistant. Happy hacking!