Unified Thinking / Reasoning

The gateway normalizes thinking and reasoning across all supported providers into a single request contract and a single response shape.

Request: The `reasoning` Parameter

Send a top-level `reasoning` object in any OpenAI-format request body:

```json
{
  "model": "deepseek-reasoner",
  "reasoning": {
    "effort": "medium",
    "max_tokens": 8000
  },
  "messages": [{ "role": "user", "content": "..." }]
}
```
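As a rough illustration of assembling such a body on the client side, the sketch below builds the request as a plain dictionary. The helper name `build_chat_request` and the prompt text are hypothetical and not part of the gateway; only the field names come from this page.

```python
import json

def build_chat_request(model, prompt, effort=None, max_tokens=None):
    """Build an OpenAI-format body with an optional top-level `reasoning` object."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    reasoning = {}
    if effort is not None:
        reasoning["effort"] = effort
    if max_tokens is not None:
        reasoning["max_tokens"] = max_tokens
    # Attach `reasoning` only when something was requested: per this page,
    # an empty object would itself enable thinking with provider defaults.
    if reasoning:
        body["reasoning"] = reasoning
    return body

body = build_chat_request("deepseek-reasoner", "What is 6 * 7?",
                          effort="medium", max_tokens=8000)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent with whatever HTTP client the application already uses.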

Fields

| Field | Type | Description |
| --- | --- | --- |
| `enabled` | `boolean` | `false` disables thinking. Equivalent to `effort: "none"`. |
| `effort` | `string` | `"none"` \| `"low"` \| `"medium"` \| `"high"` \| `"xhigh"`. Passed through to models that accept an effort level. |
| `max_tokens` | `number` | Maximum tokens the model may spend on reasoning (thinking budget). |
| `exclude` | `boolean` | When `true`, tells the provider not to include thought content in the response (Google-specific). |

All fields are optional. An empty object (`reasoning: {}`) enables thinking with provider defaults.
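A client may want to check a `reasoning` object against the field list before sending it. The following is a minimal sketch of such a check, not the gateway's own validation: the function name `validate_reasoning` is hypothetical, and flagging unknown fields is a choice made for this sketch rather than documented gateway behavior.

```python
ALLOWED_EFFORTS = {"none", "low", "medium", "high", "xhigh"}

def validate_reasoning(reasoning):
    """Return a list of problems found in a `reasoning` object (empty list = OK)."""
    problems = []
    for key, value in reasoning.items():
        if key in ("enabled", "exclude"):
            if not isinstance(value, bool):
                problems.append(f"{key} must be a boolean")
        elif key == "effort":
            if not isinstance(value, str) or value not in ALLOWED_EFFORTS:
                problems.append(f"effort must be one of {sorted(ALLOWED_EFFORTS)}")
        elif key == "max_tokens":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append("max_tokens must be a number")
        else:
            # Assumption of this sketch: unknown keys are reported, not ignored.
            problems.append(f"unknown field: {key}")
    return problems

print(validate_reasoning({"effort": "medium", "max_tokens": 8000}))
print(validate_reasoning({"effort": "max"}))
```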

Disabling thinking

Either of the following forms disables thinking:

```json
{ "reasoning": { "enabled": false } }
```

```json
{ "reasoning": { "effort": "none" } }
```
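Since the two forms are described as equivalent, code that inspects a request can treat them the same way. A minimal sketch, assuming a hypothetical helper named `is_thinking_disabled`; how the gateway resolves a conflicting pair such as `enabled: false` with `effort: "high"` is not stated on this page, and this sketch simply treats either "off" signal as decisive.

```python
def is_thinking_disabled(reasoning):
    """True when the object uses either documented 'off' form."""
    return reasoning.get("enabled") is False or reasoning.get("effort") == "none"

print(is_thinking_disabled({"enabled": False}))
print(is_thinking_disabled({"effort": "none"}))
print(is_thinking_disabled({}))
```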

Provider-native `thinking` parameter (Anthropic / Google)

Anthropic does not read the unified `reasoning` field; it requires the Anthropic-native `thinking` parameter directly:

```json
{
  "thinking": { "type": "enabled", "budget_tokens": 8000 }
}
```

```json
{
  "thinking": { "type": "disabled" }
}
```

Google accepts both `reasoning` and `thinking`; they are merged into `generationConfig.thinking_config`.
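A client that targets several providers could choose the field by provider. The sketch below is one way to do that under stated assumptions: the function name `thinking_params` and the provider identifiers (`"anthropic"`, `"google"`) are hypothetical labels chosen for illustration, not values the gateway defines.

```python
def thinking_params(provider, budget_tokens):
    """Pick the request field that carries a thinking budget for a provider."""
    if provider == "anthropic":
        # Anthropic does not read `reasoning`; send the native parameter.
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    # Elsewhere, use the unified object. Whether `max_tokens` is actually
    # forwarded upstream depends on the provider.
    return {"reasoning": {"max_tokens": budget_tokens}}

print(thinking_params("anthropic", 8000))
print(thinking_params("google", 8000))
```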

How the Gateway Maps `reasoning` Per Provider

| Provider | `reasoning.effort` | `reasoning.enabled` / `effort: "none"` | `reasoning.max_tokens` |
| --- | --- | --- | --- |
| OpenAI | → `reasoning_effort` (string) | not sent | not sent |
| DeepSeek | → `reasoning_effort` + `thinking.type` | → `thinking.type: "disabled"` | not sent |
| Google | → `generationConfig.thinkingConfig.thinkingLevel` | not sent | → `generationConfig.thinking_config.thinking_budget` |
| Cohere | → `thinking.type: "enabled"` | → `thinking.type: "disabled"` | → `thinking.budget_tokens` + `token_budget` |
| Ollama | `"low"/"medium"/"high"` → `think: <effort>` | → `think: false` | not sent |
| DashScope | not sent | → `enable_thinking: false` | not sent |
| Anthropic | (not mapped; use `thinking` directly) | (not mapped) | (not mapped) |
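To make the table concrete, here is a partial sketch of the mapping for three of the rows (OpenAI, Ollama, DashScope). It is an illustration of the table, not the gateway's code: the function name `map_reasoning` and the lowercase provider keys are hypothetical, and dropping an Ollama effort outside `low`/`medium`/`high` is an assumption, since the table does not say what happens to `xhigh`.

```python
def map_reasoning(provider, reasoning):
    """Translate a unified `reasoning` object into provider fields (partial sketch)."""
    effort = reasoning.get("effort")
    disabled = reasoning.get("enabled") is False or effort == "none"
    out = {}
    if provider == "openai":
        # Effort is forwarded; the disabled case and max_tokens are not sent.
        if effort and not disabled:
            out["reasoning_effort"] = effort
    elif provider == "ollama":
        if disabled:
            out["think"] = False
        elif effort in ("low", "medium", "high"):
            out["think"] = effort
        # Assumption: any other effort value results in nothing being sent.
    elif provider == "dashscope":
        # Only the disabled case is mapped.
        if disabled:
            out["enable_thinking"] = False
    return out

print(map_reasoning("openai", {"effort": "medium", "max_tokens": 8000}))
print(map_reasoning("ollama", {"effort": "none"}))
print(map_reasoning("dashscope", {"enabled": False}))
```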

Response: Normalized Output Shape

Regardless of provider, all reasoning text is returned in:

```text
choices[n].message.reasoning   (non-streaming)
choices[n].delta.reasoning     (streaming)
```

Both are plain strings. The field is omitted when the model produced no thinking output.

Non-streaming

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The answer is 42.",
      "reasoning": "Let me work through this step by step..."
    },
    "finish_reason": "stop"
  }]
}
```
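Because `reasoning` is omitted when there is no thinking output, client code should read it defensively. A minimal sketch, assuming the response has already been parsed into a dictionary; the helper name `read_message` is hypothetical.

```python
def read_message(response, choice=0):
    """Return (content, reasoning) for one choice; reasoning is None when omitted."""
    message = response["choices"][choice]["message"]
    return message.get("content"), message.get("reasoning")

response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "The answer is 42.",
            "reasoning": "Let me work through this step by step...",
        },
        "finish_reason": "stop",
    }]
}

content, reasoning = read_message(response)
print(content)
print(reasoning)
```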

Streaming

```text
data: {"choices":[{"delta":{"reasoning":"Let me work"},"index":0}]}
data: {"choices":[{"delta":{"reasoning":" through this"},"index":0}]}
data: {"choices":[{"delta":{"content":"The answer is 42."},"index":0}]}
data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}]}
```

Reasoning deltas and content deltas are sent in separate chunks: a chunk has at most one of `delta.reasoning` or `delta.content`.
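A client can rebuild the full reasoning and content strings by appending each delta as it arrives. The sketch below works on already-received `data:` lines rather than a live connection. The function name `collect_stream` is hypothetical, and the `[DONE]` sentinel check is an assumption borrowed from common OpenAI-format streams; this page does not say whether the gateway emits it.

```python
import json

def collect_stream(lines, index=0):
    """Accumulate reasoning and content deltas for one choice from SSE data lines."""
    reasoning_parts, content_parts = [], []
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != index:
                continue
            delta = choice.get("delta", {})
            if "reasoning" in delta:
                reasoning_parts.append(delta["reasoning"])
            if "content" in delta:
                content_parts.append(delta["content"])
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(reasoning_parts), "".join(content_parts), finish_reason

stream = [
    'data: {"choices":[{"delta":{"reasoning":"Let me work"},"index":0}]}',
    'data: {"choices":[{"delta":{"reasoning":" through this"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"The answer is 42."},"index":0}]}',
    'data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}]}',
]
reasoning_text, content_text, finish = collect_stream(stream)
print(reasoning_text)
print(content_text)
```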

Response Normalization Pipeline

Responses from providers that return thinking in non-standard shapes pass through `normalizeOpenAIReasoningMessage` before they leave the gateway. The normalizer collects reasoning from the following locations (in order) and concatenates the results into a single `reasoning` string:

1. `message.reasoning`: already normalized (OpenAI-compatible providers).
2. `message.reasoning_content`: DeepSeek's field name; stripped after extraction.
3. `message.thinking`: used by some providers; stripped after extraction.
4. `message.content_blocks[]`: array form used by some providers; reasoning fields are extracted per block.
5. `message.content[]` parts with `type: "thinking"` or `type: "redacted_thinking"`: Anthropic streaming content parts; those blocks are removed from the content array.
6. `<think>...</think>` tags in `message.content`: inline tags emitted by open-source models (Qwen, etc.); extracted and removed from the content string.

After normalization:

- `message.reasoning_content` is deleted.
- `message.thinking` is deleted.
- `message.content_blocks` is deleted.
- `message.content` has all `<think>` tags and thinking-type content parts stripped out.
- If no reasoning was found, the `reasoning` field is omitted entirely (not set to `null` or `""`).
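The collection order and cleanup rules can be sketched as follows. This is an illustrative Python rendering, not the gateway's `normalizeOpenAIReasoningMessage`. Several details are assumptions: the name `normalize_message`, joining pieces with a newline, trimming whitespace, and reading `thinking` / `data` from thinking-type content parts. The `content_blocks` step is left out because its block shape is not described here.

```python
import re

THINK_TAG = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def normalize_message(message):
    """Consolidate reasoning from several locations into `message['reasoning']`."""
    msg = dict(message)
    found = []
    # Plain string fields, in the documented order; each is removed once read.
    for key in ("reasoning", "reasoning_content", "thinking"):
        value = msg.pop(key, None)
        if isinstance(value, str):
            found.append(value)
    content = msg.get("content")
    if isinstance(content, list):
        kept = []
        for part in content:
            if part.get("type") == "thinking":
                found.append(part.get("thinking", ""))
            elif part.get("type") == "redacted_thinking":
                found.append(part.get("data", ""))
            else:
                kept.append(part)
        msg["content"] = kept
    elif isinstance(content, str):
        # Inline <think> tags: extract the inner text, then strip the tags.
        found.extend(m.strip() for m in THINK_TAG.findall(content))
        msg["content"] = THINK_TAG.sub("", content).strip()
    found = [piece for piece in found if piece]
    if found:
        msg["reasoning"] = "\n".join(found)  # separator is an assumption
    return msg

raw = {
    "role": "assistant",
    "content": "<think>Check units.</think>The answer is 42.",
    "reasoning_content": "Start with the givens.",
}
print(normalize_message(raw))
```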

The same logic runs on each streaming delta via `normalizeOpenAIReasoningDelta`, with the additional rule that an empty `delta.content` (`""`) is deleted rather than emitted.
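The extra streaming rule is small enough to show in isolation. A minimal sketch, assuming a hypothetical function `normalize_delta` that handles only the plain string fields; it is not the gateway's `normalizeOpenAIReasoningDelta`.

```python
def normalize_delta(delta):
    """Move provider-specific reasoning fields to `reasoning`; drop empty content."""
    out = dict(delta)
    parts = []
    for key in ("reasoning", "reasoning_content", "thinking"):
        value = out.pop(key, None)
        if isinstance(value, str) and value:
            parts.append(value)
    if parts:
        out["reasoning"] = "".join(parts)
    # Streaming-only rule: an empty content string is removed, not emitted.
    if out.get("content") == "":
        del out["content"]
    return out

print(normalize_delta({"reasoning_content": "Let me work", "content": ""}))
print(normalize_delta({"content": "The answer is 42."}))
```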

Anthropic-specific handling

Anthropic returns thinking as an array of typed content blocks. The Anthropic response transformer iterates `response.content[]` before normalization:

- `type: "text"` blocks → concatenated into `message.content`
- `type: "thinking"` blocks → `block.thinking` appended to `reasoning`
- `type: "redacted_thinking"` blocks → `block.data` appended to `reasoning`

In streaming, a chunk with `delta.type === "thinking_delta"` maps `delta.thinking` to `delta.reasoning`.
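The block-by-block rules above might look like this in code. This is a sketch rather than the gateway's transformer: the function names are hypothetical, and concatenating pieces with no separator is an assumption.

```python
def transform_anthropic(response):
    """Fold Anthropic content blocks into an OpenAI-style message."""
    text_parts, reasoning_parts = [], []
    for block in response.get("content", []):
        kind = block.get("type")
        if kind == "text":
            text_parts.append(block.get("text", ""))
        elif kind == "thinking":
            reasoning_parts.append(block.get("thinking", ""))
        elif kind == "redacted_thinking":
            reasoning_parts.append(block.get("data", ""))
    message = {"role": "assistant", "content": "".join(text_parts)}
    if reasoning_parts:
        message["reasoning"] = "".join(reasoning_parts)
    return message

def map_thinking_delta(delta):
    """Streaming: map a `thinking_delta` to a `reasoning` delta, else nothing."""
    if delta.get("type") == "thinking_delta":
        return {"reasoning": delta.get("thinking", "")}
    return {}

upstream = {"content": [
    {"type": "thinking", "thinking": "Let me work through this."},
    {"type": "text", "text": "The answer is 42."},
]}
print(transform_anthropic(upstream))
```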

Google-specific handling

Google returns thinking inside content `parts[]` with a `thought: true` flag. The Google response transformer collects these parts into `message.reasoning` and excludes them from `message.content`.
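Splitting parts on the `thought` flag can be sketched as below. The function name `transform_google_parts` is hypothetical, and the sketch assumes each part carries its text in a `text` field and that pieces are concatenated without a separator.

```python
def transform_google_parts(parts):
    """Split Google content parts into message content and reasoning."""
    content, reasoning = [], []
    for part in parts:
        text = part.get("text")
        if text is None:
            continue  # non-text parts are outside the scope of this sketch
        if part.get("thought") is True:
            reasoning.append(text)
        else:
            content.append(text)
    message = {"role": "assistant", "content": "".join(content)}
    if reasoning:
        message["reasoning"] = "".join(reasoning)
    return message

parts = [
    {"text": "Let me work through this.", "thought": True},
    {"text": "The answer is 42."},
]
print(transform_google_parts(parts))
```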

Provider Coverage Summary

| Provider | `reasoning` param | `thinking` param | Response normalized |
| --- | --- | --- | --- |
| OpenAI / Azure OpenAI | ✅ effort | — | ✅ |
| Anthropic | ❌ use `thinking` | ✅ | ✅ |
| Google AI / Vertex AI | ✅ | ✅ | ✅ |
| DeepSeek | ✅ effort + enabled | — | ✅ |
| Cohere | ✅ | — | ✅ |
| Ollama | ✅ | — | ✅ |
| DashScope | ✅ enabled only | — | ✅ |
| OpenRouter | — | — | ✅ (passthrough normalized) |
| Novita AI | — | — | ✅ (passthrough normalized) |

"Response normalized" means all reasoning fields are consolidated into `choices[n].message.reasoning` regardless of what the upstream provider returned.
