# Introduction
Agentic AI systems rely on a model’s ability to call tools reliably: selecting the right function, formatting arguments correctly, and integrating results into multi-step workflows. Large frontier models such as ChatGPT, Claude, and Gemini handle this well, but they come with tradeoffs in cost, latency, and hardware requirements that make them impractical for many real-world deployments. Small language models have narrowed that gap, and several compact, open-weight options now offer first-class tool-calling support without requiring a data center to run them.
And now, in no particular order, here are 5 small language models for agentic tool calling. Note that, for convenience and consistency, all model links point to Hugging Face-hosted models.
# 1. SmolLM3-3B
| Technical Aspect | Details |
|---|---|
| Parameters | 3B |
| Architecture | Decoder-only transformer (GQA + NoPE, 3:1 ratio) |
| Context Length | 64K native; up to 128K with YaRN extrapolation |
| Training Tokens | 11.2T |
| Multilingual Support | 6 languages (EN, FR, ES, DE, IT, PT) |
| Reasoning Mode | Dual-mode (thinking / no-think toggle) |
| Tool Calling | Yes: JSON/XML (xml_tools) and Python (python_tools) |
| License | Apache 2.0 |
SmolLM3 is a 3B parameter language model designed to push the boundaries of small models, supporting dual-mode reasoning, 6 languages, and long context. It is a decoder-only transformer using Grouped Query Attention (GQA) and No Positional Embeddings (NoPE) (with a 3:1 ratio), pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-training included a mid-training phase on 140 billion reasoning tokens, followed by supervised fine-tuning and alignment via Anchored Preference Optimization (APO), Hugging Face’s off-policy approach to preference alignment. The model supports two distinct tool-calling interfaces: JSON blobs wrapped in XML tags via xml_tools, and Python-style function calls via python_tools. This gives agentic pipelines a choice of calling convention. As a fully open release, including weights, datasets, and training code, SmolLM3 is well suited to chatbots, RAG systems, and code assistants on constrained hardware such as edge devices or low-VRAM machines.
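To make the two interfaces concrete, here is a minimal sketch that parses both output styles and dispatches them to a local function. It assumes the wrapping conventions described on the SmolLM3 model card: a JSON blob inside `<tool_call>` tags for xml_tools, and a Python-style call inside `<code>` tags for python_tools. The `get_weather` tool, the sample output strings, and the helper names are hypothetical, and a production parser would need to handle malformed output more defensively.

```python
import ast
import json
import re

# Hypothetical tool, used only for illustration.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def parse_xml_tool_call(text: str):
    """Extract (name, kwargs) from a JSON blob wrapped in <tool_call> tags."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    if match is None:
        return None
    call = json.loads(match.group(1))
    return call["name"], call.get("arguments", {})

def parse_python_tool_call(text: str):
    """Extract (name, kwargs) from a call such as <code>f(a="b")</code>.

    The call is parsed with ast and never executed; only keyword
    arguments with literal values are accepted.
    """
    match = re.search(r"<code>(.*?)</code>", text, re.DOTALL)
    if match is None:
        return None
    node = ast.parse(match.group(1).strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a simple function call")
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs

def dispatch(call):
    """Look up the named tool and invoke it with the parsed arguments."""
    name, kwargs = call
    return TOOLS[name](**kwargs)

# Assumed sample outputs in each style (not captured from a real model run).
xml_output = '<tool_call>{"name": "get_weather", "arguments": {"city": "Copenhagen"}}</tool_call>'
py_output = '<code>get_weather(city="Copenhagen")</code>'

xml_result = dispatch(parse_xml_tool_call(xml_output))
py_result = dispatch(parse_python_tool_call(py_output))
print(xml_result)
print(py_result)
```

Both paths end in the same `(name, kwargs)` pair, so the rest of an agent loop does not need to know which interface the model was prompted with.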
# 2. Qwen3-4B-Instruct-2507
| Technical Aspect | Details |
|---|---|
| Parameters | 4.0B (3.6B non-embedding) |
| Architecture | Causal LM, 36 layers, GQA (32 Q heads / 8 KV heads) |
| Context Length | 262,144 tokens (native) |
| Reasoning Mode | Non-thinking only (no `<think>` blocks) |
| Multilingual | 100+ languages |
| Tool Calling | Yes: native, via Qwen-Agent / MCP |
| License | Apache 2.0 |
Qwen3-4B-Instruct-2507 is an updated version of the Qwen3-4B non-thinking mode, featuring significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It also shows substantial gains in long-tail knowledge coverage across multiple languages. Both the Instruct and Thinking variants have 4 billion total parameters (3.6B excluding embeddings) across 36 transformer layers and use GQA with 32 query heads and 8 key/value heads, which shrinks the key/value cache and makes very long contexts more memory-efficient. This non-thinking variant is optimized for direct, fast responses: it delivers concise answers without explicit chain-of-thought traces, which makes it well suited to chatbots, customer support, and tool-calling agents where low latency matters. Tool calling is a particular strength of Qwen3, and Alibaba recommends using the Qwen-Agent framework, which encapsulates tool-calling templates and parsers internally and supports MCP server configuration files, reducing the amount of code needed.
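The memory effect of GQA can be estimated with simple arithmetic. The sketch below compares the key/value cache size at the full 262,144-token context for 8 key/value heads versus a hypothetical 32 (one per query head). The per-head dimension of 128 and the 16-bit cache entries are assumptions not stated above, so treat the figures as rough estimates rather than measured numbers.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size; the leading 2 covers keys and values."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

GIB = 1024 ** 3
CONTEXT = 262_144   # native context length from the table above
LAYERS = 36         # from the table above
HEAD_DIM = 128      # assumption: not stated in this article

gqa = kv_cache_bytes(CONTEXT, LAYERS, kv_heads=8, head_dim=HEAD_DIM)
mha = kv_cache_bytes(CONTEXT, LAYERS, kv_heads=32, head_dim=HEAD_DIM)

print(f"GQA, 8 KV heads:  {gqa / GIB:.0f} GiB")   # 36 GiB
print(f"MHA, 32 KV heads: {mha / GIB:.0f} GiB")   # 144 GiB
```

Under these assumptions, sharing each key/value head across four query heads cuts the cache to a quarter of what full multi-head attention would need for a single full-length sequence.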
# 3. Phi-3-mini-4k-instruct
| Technical Aspect | Details |
|---|---|
| Parameters | 3.8B |
| Architecture | Decoder-only transformer |
| Context Length | 4K tokens |
| Vocabulary Size | 32,064 tokens |
| Training Data | Synthetic + filtered public web data |
| Post-training | SFT + DPO |
| Tool Calling | Yes: via chat template (requires Hugging Face’s transformers library) |



