Unified Thinking / Reasoning
The gateway normalizes thinking and reasoning across all supported providers into a single request contract and a single response shape.
Request: The reasoning Parameter
Send a top-level reasoning object in any OpenAI-format request body:
{
"model": "deepseek-reasoner",
"reasoning": {
"effort": "medium",
"max_tokens": 8000
},
"messages": [{ "role": "user", "content": "..." }]
}
Fields
| Field | Type | Description |
|---|---|---|
| enabled | boolean | false disables thinking. Equivalent to effort: "none". |
| effort | string | One of "none", "low", "medium", "high", "xhigh". Passed through to models that accept an effort level. |
| max_tokens | number | Maximum tokens the model may spend on reasoning (thinking budget). |
| exclude | boolean | When true, tells the provider not to include thought content in the response (Google-specific). |
All fields are optional. reasoning: {} (empty object) enables thinking with provider defaults.
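As a minimal client-side sketch of the fields above, the helper below assembles a reasoning object that contains only the fields the caller set. The name build_reasoning and the client-side validation of effort are assumptions for illustration; they are not part of the gateway.

```python
import json
from typing import Optional

# Effort levels listed in the Fields table.
VALID_EFFORTS = {"none", "low", "medium", "high", "xhigh"}


def build_reasoning(enabled: Optional[bool] = None,
                    effort: Optional[str] = None,
                    max_tokens: Optional[int] = None,
                    exclude: Optional[bool] = None) -> dict:
    """Return a reasoning object holding only the fields that were set.

    Hypothetical helper: every field is optional, so calling it with no
    arguments yields {} (thinking enabled with provider defaults).
    """
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"unsupported effort: {effort!r}")
    fields = {
        "enabled": enabled,
        "effort": effort,
        "max_tokens": max_tokens,
        "exclude": exclude,
    }
    return {name: value for name, value in fields.items() if value is not None}


body = {
    "model": "deepseek-reasoner",
    "reasoning": build_reasoning(effort="medium", max_tokens=8000),
    "messages": [{"role": "user", "content": "..."}],
}
print(json.dumps(body, indent=2))
```

Omitting unset fields rather than sending null keeps the empty-object case meaningful: build_reasoning() returns {}, which the gateway treats as "enable thinking with provider defaults".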
Disabling thinking
{ "reasoning": { "enabled": false } }
or
{ "reasoning": { "effort": "none" } }
Provider-native thinking parameter (Anthropic / Google)
Anthropic does not read the unified reasoning field — it requires the Anthropic-native thinking param directly:
{
"thinking": { "type": "enabled", "budget_tokens": 8000 }
}
{
"thinking": { "type": "disabled" }
}
Google accepts both reasoning and thinking; they are merged into generationConfig.thinking_config.
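Because Anthropic needs the native thinking param while other providers read reasoning, a client that targets several providers may want to pick the field in one place. The sketch below is one way to do that; the function name thinking_fields and the provider label "anthropic" are hypothetical, and None is used here to mean "disable thinking".

```python
from typing import Optional


def thinking_fields(provider: str, budget_tokens: Optional[int]) -> dict:
    """Return the top-level request fields that control thinking.

    Hypothetical helper. budget_tokens=None disables thinking; any integer
    enables it with that budget.
    """
    if provider == "anthropic":
        # Anthropic does not read the unified reasoning field.
        if budget_tokens is None:
            return {"thinking": {"type": "disabled"}}
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    # Every other provider goes through the unified reasoning object.
    if budget_tokens is None:
        return {"reasoning": {"enabled": False}}
    return {"reasoning": {"max_tokens": budget_tokens}}


request = {
    "messages": [{"role": "user", "content": "..."}],
    **thinking_fields("anthropic", 8000),
}
print(request["thinking"])
```

Whether a provider actually honors reasoning.max_tokens depends on the gateway's per-provider mapping; for providers that do not receive it, the object still enables thinking.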
How the Gateway Maps reasoning Per Provider
| Provider | reasoning.effort | reasoning.enabled / effort: "none" | reasoning.max_tokens |
|---|---|---|---|
| OpenAI | → reasoning_effort (string) | not sent | not sent |
| DeepSeek | → reasoning_effort + thinking.type | → thinking.type: "disabled" | not sent |
| Google | → generationConfig.thinkingConfig.thinkingLevel | not sent | → generationConfig.thinking_config.thinking_budget |
| Cohere | → thinking.type: "enabled" | → thinking.type: "disabled" | → thinking.budget_tokens + token_budget |
| Ollama | "low"/"medium"/"high" → think: <effort> | → think: false | not sent |
| DashScope | not sent | → enable_thinking: false | not sent |
| Anthropic | (not mapped — use thinking directly) | (not mapped) | (not mapped) |
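To make the table concrete, here is a rough sketch of the mapping for three of the rows (OpenAI, Ollama, DashScope). It is an illustration of the table, not the gateway's actual code: the function name map_reasoning and the lowercase provider labels are assumptions, and cases the table does not specify (for example "xhigh" on Ollama) simply send nothing.

```python
def map_reasoning(provider: str, reasoning: dict) -> dict:
    """Translate a unified reasoning object into provider-specific fields.

    Hypothetical sketch covering only the OpenAI, Ollama, and DashScope rows.
    """
    effort = reasoning.get("effort")
    # enabled: false and effort: "none" are equivalent ways to disable thinking.
    disabled = reasoning.get("enabled") is False or effort == "none"

    if provider == "openai":
        # Only an effort level is forwarded; disabling and max_tokens are not sent.
        if disabled or effort is None:
            return {}
        return {"reasoning_effort": effort}

    if provider == "ollama":
        if disabled:
            return {"think": False}
        if effort in ("low", "medium", "high"):
            return {"think": effort}
        return {}

    if provider == "dashscope":
        # Only the disabled state is mapped.
        return {"enable_thinking": False} if disabled else {}

    raise ValueError(f"provider not covered by this sketch: {provider!r}")


print(map_reasoning("ollama", {"effort": "medium", "max_tokens": 8000}))
```

Note that max_tokens is silently dropped for all three providers here, matching the "not sent" cells in the table.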
Response: Normalized Output Shape
Regardless of provider, all reasoning text is returned in:
choices[n].message.reasoning (non-streaming)
choices[n].delta.reasoning (streaming)
Both are a plain string. The field is omitted when the model produced no thinking output.
Non-streaming
{
"choices": [{
"message": {
"role": "assistant",
"content": "The answer is 42.",
"reasoning": "Let me work through this step by step..."
},
"finish_reason": "stop"
}]
}
Streaming
data: {"choices":[{"delta":{"reasoning":"Let me work"},"index":0}]}
data: {"choices":[{"delta":{"reasoning":" through this"},"index":0}]}
data: {"choices":[{"delta":{"content":"The answer is 42."},"index":0}]}
data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}]}
Reasoning deltas and content deltas are sent in separate chunks. A chunk will have at most one of delta.reasoning or delta.content.
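On the client side, the stream can be reassembled by appending each delta to the matching buffer. The sketch below parses the four example events shown above; collect_stream is a hypothetical name, the handling of a "[DONE]" sentinel is an assumption borrowed from the OpenAI streaming convention, and all choices are folded into one buffer, so it only suits single-choice responses.

```python
import json


def collect_stream(lines):
    """Accumulate reasoning and content deltas from SSE data lines."""
    reasoning_parts, content_parts = [], []
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # A chunk carries at most one of these two fields.
            if "reasoning" in delta:
                reasoning_parts.append(delta["reasoning"])
            if "content" in delta:
                content_parts.append(delta["content"])
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(reasoning_parts), "".join(content_parts), finish_reason


events = [
    'data: {"choices":[{"delta":{"reasoning":"Let me work"},"index":0}]}',
    'data: {"choices":[{"delta":{"reasoning":" through this"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"The answer is 42."},"index":0}]}',
    'data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}]}',
]
reasoning, content, finish = collect_stream(events)
print(reasoning)  # Let me work through this
print(content)    # The answer is 42.
```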
Response Normalization Pipeline
All providers that return thinking in non-standard shapes pass through normalizeOpenAIReasoningMessage before the response leaves the gateway. The normalizer collects reasoning from the following locations (in order) and concatenates them into a single reasoning string:
1. message.reasoning: already normalized (OpenAI-compatible providers).
2. message.reasoning_content: DeepSeek's field name; stripped after extraction.
3. message.thinking: some providers; stripped after extraction.
4. message.content_blocks[]: array form used by some providers; reasoning fields extracted per block.
5. message.content[] with type: "thinking" or type: "redacted_thinking": Anthropic streaming content parts; those blocks are removed from the content array.
6. <think>...</think> tags in message.content: inline tags emitted by open-source models (Qwen, etc.); extracted and removed from the content string.
After normalization:
- message.reasoning_content is deleted.
- message.thinking is deleted.
- message.content_blocks is deleted.
- message.content has all <think> tags and thinking-type content parts stripped out.
- If no reasoning was found, the reasoning field is omitted entirely (not set to null or "").
The same logic runs on each streaming delta via normalizeOpenAIReasoningDelta, with the additional rule that an empty delta.content ("") is deleted rather than emitted.
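The collection order and cleanup rules above can be sketched roughly as follows. This is not the source of normalizeOpenAIReasoningMessage; it is a simplified Python illustration under several assumptions: content_blocks handling is left out because its block shape is not described here, thinking parts are read from the thinking and data fields, pieces are joined without a separator, and leftover whitespace around removed tags is trimmed.

```python
import re

# Matches inline <think>...</think> spans, including multi-line ones.
THINK_TAG = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def normalize_message(message: dict) -> dict:
    """Return a copy of message with reasoning consolidated into one field."""
    msg = dict(message)
    parts = []

    # Already-normalized reasoning comes first.
    if isinstance(msg.get("reasoning"), str):
        parts.append(msg["reasoning"])

    # Provider-specific string fields are removed after extraction.
    for key in ("reasoning_content", "thinking"):
        value = msg.pop(key, None)
        if isinstance(value, str):
            parts.append(value)

    content = msg.get("content")
    if isinstance(content, list):
        kept = []
        for part in content:
            if part.get("type") == "thinking":
                parts.append(part.get("thinking", ""))
            elif part.get("type") == "redacted_thinking":
                parts.append(part.get("data", ""))
            else:
                kept.append(part)
        msg["content"] = kept
    elif isinstance(content, str):
        parts.extend(THINK_TAG.findall(content))
        msg["content"] = THINK_TAG.sub("", content).strip()

    reasoning = "".join(p for p in parts if p)
    if reasoning:
        msg["reasoning"] = reasoning
    else:
        msg.pop("reasoning", None)  # omit entirely, never null or ""
    return msg


print(normalize_message({
    "role": "assistant",
    "content": "<think>step 1</think>The answer is 42.",
}))
```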
Anthropic-specific handling
Anthropic returns thinking as a typed content block array. The Anthropic response transformer iterates response.content[] before normalization:
- type: "text" blocks → concatenated into message.content
- type: "thinking" blocks → block.thinking appended to reasoning
- type: "redacted_thinking" blocks → block.data appended to reasoning
In streaming, a delta.type === "thinking_delta" chunk maps delta.thinking to delta.reasoning.
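A minimal sketch of the non-streaming block handling described above, assuming text blocks carry their text in a text field (as in Anthropic's Messages API) and that pieces are joined without a separator. The function name anthropic_to_message is hypothetical and this is not the gateway's transformer.

```python
def anthropic_to_message(response: dict) -> dict:
    """Fold Anthropic content blocks into an OpenAI-style assistant message."""
    text_parts, reasoning_parts = [], []
    for block in response.get("content", []):
        kind = block.get("type")
        if kind == "text":
            text_parts.append(block.get("text", ""))
        elif kind == "thinking":
            reasoning_parts.append(block.get("thinking", ""))
        elif kind == "redacted_thinking":
            reasoning_parts.append(block.get("data", ""))
    message = {"role": "assistant", "content": "".join(text_parts)}
    reasoning = "".join(reasoning_parts)
    if reasoning:
        # The field is only added when some thinking was returned.
        message["reasoning"] = reasoning
    return message


example = {
    "content": [
        {"type": "thinking", "thinking": "Let me work through this step by step..."},
        {"type": "text", "text": "The answer is 42."},
    ]
}
print(anthropic_to_message(example))
```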
Google-specific handling
Google returns thinking inside content parts[] with a thought: true flag. The Google response transformer collects these parts into message.reasoning and excludes them from message.content.
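The split on the thought flag can be illustrated with a short sketch. It assumes each part carries its text in a text field and that parts are joined without a separator; google_parts_to_message is a hypothetical name, not the gateway's transformer.

```python
def google_parts_to_message(parts: list) -> dict:
    """Separate thought parts from answer parts in a Google-style parts list."""
    reasoning = "".join(p.get("text", "") for p in parts if p.get("thought") is True)
    content = "".join(p.get("text", "") for p in parts if p.get("thought") is not True)
    message = {"role": "assistant", "content": content}
    if reasoning:
        # Omitted when the model returned no thought parts.
        message["reasoning"] = reasoning
    return message


parts = [
    {"text": "Let me work through this step by step...", "thought": True},
    {"text": "The answer is 42."},
]
print(google_parts_to_message(parts))
```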
Provider Coverage Summary
| Provider | reasoning param | thinking param | Response normalized |
|---|---|---|---|
| OpenAI / Azure OpenAI | ✅ effort | — | ✅ |
| Anthropic | ❌ use thinking | ✅ | ✅ |
| Google AI / Vertex AI | ✅ | ✅ | ✅ |
| DeepSeek | ✅ effort + enabled | — | ✅ |
| Cohere | ✅ | — | ✅ |
| Ollama | ✅ | — | ✅ |
| DashScope | ✅ enabled only | — | ✅ |
| OpenRouter | — | — | ✅ (passthrough normalized) |
| Novita AI | — | — | ✅ (passthrough normalized) |
"Response normalized" means all reasoning fields are consolidated into choices[n].message.reasoning regardless of what the upstream provider returned.