# Available Models

FreeInference provides access to multiple state-of-the-art LLMs for coding agents and IDEs.

## Model Overview

| Model ID | Name | Context Length | Max Output | Features |
|----------|------|----------------|------------|----------|
| `glm-5.1` | GLM-5.1 | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| `glm-5` | GLM-5 | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| `glm-4.7` | GLM-4.7 | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| `glm-4.7-flash` | GLM-4.7-Flash | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| `minimax-m2.5` | MiniMax M2.5 | 1M tokens | 131K tokens | Function calling, Structured output, Thinking mode, Multimodal (text+image) |
| `minimax-m2` | MiniMax M2 | 196K tokens | 8K tokens | Function calling, Structured output |
| `qwen3-coder-30b` | Qwen3 Coder 30B | 32K tokens | 8K tokens | Function calling, Structured output |
| `llama-3.3-70b-instruct` | Llama 3.3 70B Instruct | 131K tokens | 8K tokens | Function calling, Structured output |
| `llama-4-scout` | Llama 4 Scout | 128K tokens | 16K tokens | Function calling, Structured output |
| `llama-4-maverick` | Llama 4 Maverick | 128K tokens | 16K tokens | Function calling, Structured output, Multimodal (text+image) |

> **Note:** Llama models are available with limited capacity. Availability may vary during peak usage.
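
When choosing a model programmatically, the table above can be encoded as a small lookup and filtered by requirement. The following is a minimal sketch rather than part of any FreeInference SDK: `ModelSpec`, `MODELS`, `find_models`, and the feature labels (`thinking`, `image_input`, and so on) are hypothetical names chosen for illustration, and the token limits are copied from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    model_id: str
    context_tokens: int
    max_output_tokens: int
    features: frozenset


# Every chat model on this page lists function calling and structured output.
BASE = frozenset({"function_calling", "structured_output"})

# Token limits are the exact figures listed on this page (hypothetical encoding).
MODELS = [
    ModelSpec("glm-5.1", 200_000, 128_000, BASE | {"thinking"}),
    ModelSpec("glm-5", 200_000, 128_000, BASE | {"thinking"}),
    ModelSpec("glm-4.7", 200_000, 128_000, BASE | {"thinking"}),
    ModelSpec("glm-4.7-flash", 200_000, 128_000, BASE | {"thinking"}),
    ModelSpec("minimax-m2.5", 1_000_000, 131_072, BASE | {"thinking", "image_input"}),
    ModelSpec("minimax-m2", 196_608, 8_192, BASE),
    ModelSpec("qwen3-coder-30b", 32_768, 8_192, BASE),
    ModelSpec("llama-3.3-70b-instruct", 131_072, 8_192, BASE),
    ModelSpec("llama-4-scout", 128_000, 16_384, BASE),
    ModelSpec("llama-4-maverick", 128_000, 16_384, BASE | {"image_input"}),
]


def find_models(min_context=0, min_output=0, required=()):
    """Return IDs of models that meet the token limits and have every required feature."""
    needed = set(required)
    return [
        m.model_id
        for m in MODELS
        if m.context_tokens >= min_context
        and m.max_output_tokens >= min_output
        and needed <= m.features
    ]


if __name__ == "__main__":
    print(find_models(min_context=500_000))        # ['minimax-m2.5']
    print(find_models(required={"image_input"}))   # ['minimax-m2.5', 'llama-4-maverick']
```

Keeping the specs in one list means a new model only needs one new row; the filter itself does not change.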

### Embedding Models

| Model ID | Name | Dimensions | Context Length | Use Case |
|----------|------|------------|----------------|----------|
| `bge-m3` | BGE-M3 | 1024 | 8K tokens | Codebase indexing, semantic search |

---

## Model Details

### GLM-5.1

**Model ID:** `glm-5.1`

- Context length: 200,000 tokens
- Max output: 128,000 tokens
- Quantization: fp8
- Input modalities: text
- Output modalities: text
- Language support: Chinese, English
- Function calling: Yes
- Structured output: Yes
- Thinking mode: Yes
- Tool streaming: Yes

---

### GLM-5

**Model ID:** `glm-5`

- Context length: 200,000 tokens
- Max output: 128,000 tokens
- Architecture: 745B MoE (44B active parameters)
- Quantization: fp8
- Input modalities: text
- Output modalities: text
- Language support: Chinese, English
- Function calling: Yes
- Structured output: Yes
- Thinking mode: Yes
- Tool streaming: Yes

---

### GLM-4.7

**Model ID:** `glm-4.7`

- Context length: 200,000 tokens
- Max output: 128,000 tokens
- Quantization: fp8
- Input modalities: text
- Output modalities: text
- Language support: Chinese, English
- Function calling: Yes
- Structured output: Yes
- Thinking mode: Yes
- Tool streaming: Yes

---

### GLM-4.7-Flash

**Model ID:** `glm-4.7-flash`

- Context length: 200,000 tokens
- Max output: 128,000 tokens
- Quantization: fp8
- Input modalities: text
- Output modalities: text
- Language support: Chinese, English
- Function calling: Yes
- Structured output: Yes
- Thinking mode: Yes
- Tool streaming: Yes

---

### MiniMax M2.5

**Model ID:** `minimax-m2.5`

- Context length: 1,000,000 tokens
- Max output: 131,072 tokens
- Architecture: 230B MoE (10B active parameters)
- Quantization: bf16
- Input modalities: text, image
- Output modalities: text
- Function calling: Yes
- Structured output: Yes
- Thinking mode: Yes

---

### MiniMax M2

**Model ID:** `minimax-m2`

- Context length: 196,608 tokens
- Max output: 8,192 tokens
- Quantization: bf16
- Input modalities: text
- Output modalities: text
- Function calling: Yes
- Structured output: Yes

---

### Qwen3 Coder 30B

**Model ID:** `qwen3-coder-30b`

- Context length: 32,768 tokens
- Max output: 8,192 tokens
- Quantization: bf16
- Input modalities: text
- Output modalities: text
- Function calling: Yes
- Structured output: Yes

---

### Llama 3.3 70B Instruct (Limited Capacity)

**Model ID:** `llama-3.3-70b-instruct`

- Context length: 131,072 tokens
- Max output: 8,192 tokens
- Quantization: bf16
- Input modalities: text
- Output modalities: text
- Function calling: Yes
- Structured output: Yes

---

### Llama 4 Scout (Limited Capacity)

**Model ID:** `llama-4-scout`

- Context length: 128,000 tokens
- Max output: 16,384 tokens
- Quantization: fp8
- Input modalities: text
- Output modalities: text
- Function calling: Yes
- Structured output: Yes

---

### Llama 4 Maverick (Limited Capacity)

**Model ID:** `llama-4-maverick`

- Context length: 128,000 tokens
- Max output: 16,384 tokens
- Quantization: fp8
- Input modalities: text, image
- Output modalities: text
- Function calling: Yes
- Structured output: Yes

---

### BGE-M3 (Embedding)

**Model ID:** `bge-m3`

- Type: Embedding
- Dimensions: 1024
- Context length: 8,192 tokens
- Quantization: fp16
- Input modalities: text
- Output modalities: embedding
- Multilingual: Yes (100+ languages)

Use this model for codebase indexing in Roo Code, Kilo Code, and other tools that support semantic code search. See the [integration guide](integrations.md) for setup instructions.
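
To illustrate what semantic search does with these embeddings, the sketch below ranks code chunks by cosine similarity to a query vector. It does not call the FreeInference API: the vectors are assumed to have been obtained from `bge-m3` already, and `cosine_similarity` and `rank_chunks` are hypothetical helper names. Tools such as Roo Code perform this step internally, typically through a vector database.

```python
import math

EMBEDDING_DIM = 1024  # bge-m3 output size, per the table on this page


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs):
    """Return chunk names sorted from most to least similar to the query."""
    return sorted(
        chunk_vecs,
        key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy 2-dimensional vectors stand in for real 1024-dimensional embeddings.
    chunks = {"auth.py": [0.0, 1.0], "db.py": [1.0, 0.0]}
    print(rank_chunks([0.9, 0.1], chunks))  # ['db.py', 'auth.py']
```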

---

## Switching Models

To switch models, change the model ID in your IDE or agent configuration:

**Cursor:** Select the model from the dropdown in settings.

**Codex:** Edit `~/.codex/config.toml`:
```toml
model = "glm-5"  # Replace with any model ID from the Model Overview table
```

**Roo Code / Kilo Code:** Select the model from the dropdown in extension settings.
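
For the Codex case, a typo in the model ID is easy to make when editing the file by hand. The sketch below validates an ID against the chat models listed on this page before producing the `model = "..."` line. `CHAT_MODEL_IDS` and `codex_model_line` are hypothetical names, and the script only builds the line; it does not modify `~/.codex/config.toml`.

```python
# Chat model IDs copied from the Model Overview table (bge-m3 is an
# embedding model, so it is deliberately left out).
CHAT_MODEL_IDS = {
    "glm-5.1",
    "glm-5",
    "glm-4.7",
    "glm-4.7-flash",
    "minimax-m2.5",
    "minimax-m2",
    "qwen3-coder-30b",
    "llama-3.3-70b-instruct",
    "llama-4-scout",
    "llama-4-maverick",
}


def codex_model_line(model_id):
    """Return the TOML line for a known chat model ID, or raise ValueError."""
    if model_id not in CHAT_MODEL_IDS:
        known = ", ".join(sorted(CHAT_MODEL_IDS))
        raise ValueError(f"unknown model ID {model_id!r}; expected one of: {known}")
    return f'model = "{model_id}"'


if __name__ == "__main__":
    print(codex_model_line("glm-5"))  # model = "glm-5"
```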