
Unified Thinking / Reasoning

The gateway normalizes thinking and reasoning across all supported providers into a single request contract and a single response shape.

Request: The reasoning Parameter

Send a top-level reasoning object in any OpenAI-format request body:

json
{
  "model": "deepseek-reasoner",
  "reasoning": {
    "effort": "medium",
    "max_tokens": 8000
  },
  "messages": [{ "role": "user", "content": "..." }]
}

Fields

| Field | Type | Description |
| --- | --- | --- |
| enabled | boolean | false disables thinking. Equivalent to effort: "none". |
| effort | string | One of "none", "low", "medium", "high", "xhigh". Passed through to models that accept an effort level. |
| max_tokens | number | Maximum tokens the model may spend on reasoning (thinking budget). |
| exclude | boolean | When true, tells the provider not to include thought content in the response (Google-specific). |

All fields are optional. reasoning: {} (empty object) enables thinking with provider defaults.
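As a rough illustration of the request contract, the following Python sketch builds an OpenAI-format body with a top-level reasoning object. The helper name build_request and its keyword arguments are hypothetical and not part of the gateway; only the field names and effort values come from the table above.

```python
from typing import Any, Optional

# Effort values listed in the Fields table.
VALID_EFFORTS = {"none", "low", "medium", "high", "xhigh"}


def build_request(model: str, prompt: str, *,
                  enabled: Optional[bool] = None,
                  effort: Optional[str] = None,
                  max_tokens: Optional[int] = None,
                  exclude: Optional[bool] = None) -> dict[str, Any]:
    """Build an OpenAI-format body with a top-level `reasoning` object.

    Hypothetical client-side helper, not a gateway API.
    """
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"unsupported effort: {effort!r}")
    fields = {"enabled": enabled, "effort": effort,
              "max_tokens": max_tokens, "exclude": exclude}
    # Every field is optional, so unset ones are left out of the object.
    reasoning = {key: value for key, value in fields.items() if value is not None}
    return {
        "model": model,
        "reasoning": reasoning,
        "messages": [{"role": "user", "content": prompt}],
    }
```

Calling build_request("deepseek-reasoner", "...", effort="medium", max_tokens=8000) reproduces the reasoning object from the example request. With no reasoning arguments the helper emits reasoning: {}, which per the note above enables thinking with provider defaults.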

Disabling thinking

json
{ "reasoning": { "enabled": false } }

or

json
{ "reasoning": { "effort": "none" } }

Provider-native thinking parameter (Anthropic / Google)

The unified reasoning field is not mapped for Anthropic. Send the Anthropic-native thinking parameter directly:

json
{
  "thinking": { "type": "enabled", "budget_tokens": 8000 }
}

json
{
  "thinking": { "type": "disabled" }
}

Google accepts both reasoning and thinking; they are merged into generationConfig.thinking_config.
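Because the unified field is not mapped for Anthropic, a client that keeps one reasoning object for all providers could translate it before sending. The sketch below is a client-side convenience under stated assumptions: the function name to_anthropic_thinking is hypothetical, and the default budget of 8000 is borrowed from the example above rather than from any documented default.

```python
def to_anthropic_thinking(reasoning: dict, default_budget: int = 8000) -> dict:
    """Translate a unified `reasoning` object into an Anthropic-native `thinking` param.

    Hypothetical helper; `default_budget` is an assumption, not a gateway default.
    """
    # `enabled: false` and `effort: "none"` are documented as equivalent.
    if reasoning.get("enabled") is False or reasoning.get("effort") == "none":
        return {"type": "disabled"}
    # The unified thinking budget is `max_tokens`; Anthropic calls it `budget_tokens`.
    return {
        "type": "enabled",
        "budget_tokens": reasoning.get("max_tokens", default_budget),
    }
```

The result can be placed under the top-level thinking key of the request body in place of reasoning.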


How the Gateway Maps reasoning Per Provider

| Provider | reasoning.effort | reasoning.enabled: false or effort: "none" | reasoning.max_tokens |
| --- | --- | --- | --- |
| OpenAI | reasoning_effort (string) | not sent | not sent |
| DeepSeek | reasoning_effort + thinking.type | thinking.type: "disabled" | not sent |
| Google | generationConfig.thinkingConfig.thinkingLevel | not sent | generationConfig.thinking_config.thinking_budget |
| Cohere | thinking.type: "enabled" | thinking.type: "disabled" | thinking.budget_tokens + token_budget |
| Ollama | think: <effort> for "low" / "medium" / "high" | think: false | not sent |
| DashScope | not sent | enable_thinking: false | not sent |
| Anthropic | (not mapped; use thinking directly) | (not mapped) | (not mapped) |
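To make the table concrete, here is a minimal Python sketch of the mapping for four of the providers. It is an illustration of the documented rows, not the gateway's code: the function name map_reasoning and the lowercase provider keys are hypothetical, and the value "enabled" for DeepSeek's thinking.type when an effort is set is an assumption, since the table names the field but not its value.

```python
def map_reasoning(provider: str, reasoning: dict) -> dict:
    """Return the provider-specific fields derived from a unified `reasoning` object."""
    effort = reasoning.get("effort")
    # `enabled: false` and `effort: "none"` both mean "thinking off".
    disabled = reasoning.get("enabled") is False or effort == "none"

    if provider == "openai":
        # Nothing is sent when thinking is disabled or no effort is given.
        return {} if disabled or not effort else {"reasoning_effort": effort}
    if provider == "deepseek":
        if disabled:
            return {"thinking": {"type": "disabled"}}
        if effort:
            # Assumption: thinking.type is "enabled" whenever an effort is passed.
            return {"reasoning_effort": effort, "thinking": {"type": "enabled"}}
        return {}
    if provider == "ollama":
        if disabled:
            return {"think": False}
        # Only these three levels are listed for Ollama in the table.
        return {"think": effort} if effort in ("low", "medium", "high") else {}
    if provider == "dashscope":
        # Effort is not forwarded; only the disable switch is.
        return {"enable_thinking": False} if disabled else {}
    raise ValueError(f"provider not covered by this sketch: {provider!r}")
```

Google and Cohere are left out because their rows involve nested configuration objects, and Anthropic is left out because it is not mapped at all.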

Response: Normalized Output Shape

Regardless of provider, all reasoning text is returned in:

choices[n].message.reasoning   (non-streaming)
choices[n].delta.reasoning     (streaming)

Both are a plain string. The field is omitted when the model produced no thinking output.

Non-streaming

json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The answer is 42.",
      "reasoning": "Let me work through this step by step..."
    },
    "finish_reason": "stop"
  }]
}
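Since the reasoning field is omitted when the model produced no thinking output, client code should read it defensively. A small Python sketch follows; the helper name split_message is hypothetical.

```python
import json
from typing import Optional


def split_message(response: dict, index: int = 0) -> tuple[str, Optional[str]]:
    """Return (content, reasoning) for one choice of a non-streaming response."""
    message = response["choices"][index]["message"]
    # `reasoning` is absent, not null or "", when there was no thinking output.
    return message.get("content", ""), message.get("reasoning")


raw = """
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The answer is 42.",
      "reasoning": "Let me work through this step by step..."
    },
    "finish_reason": "stop"
  }]
}
"""
content, reasoning = split_message(json.loads(raw))
```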

Streaming

data: {"choices":[{"delta":{"reasoning":"Let me work"},"index":0}]}
data: {"choices":[{"delta":{"reasoning":" through this"},"index":0}]}
data: {"choices":[{"delta":{"content":"The answer is 42."},"index":0}]}
data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}]}

Reasoning deltas and content deltas are sent in separate chunks. A chunk will have at most one of delta.reasoning or delta.content.
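A client can rebuild the full reasoning and answer text by appending each delta to the matching buffer. The Python sketch below assumes a single choice per stream and treats a data: [DONE] line as the end marker; that sentinel is the usual OpenAI-format convention and is an assumption here, since the example stream above does not show it. The function name collect_stream is hypothetical.

```python
import json
from typing import Iterable, Optional


def collect_stream(lines: Iterable[str]) -> tuple[str, str, Optional[str]]:
    """Accumulate SSE `data:` lines into (reasoning, content, finish_reason)."""
    reasoning: list[str] = []
    content: list[str] = []
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # A chunk carries at most one of these two fields.
            if "reasoning" in delta:
                reasoning.append(delta["reasoning"])
            if "content" in delta:
                content.append(delta["content"])
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(reasoning), "".join(content), finish_reason
```

Fed the four example lines above, this returns the reasoning "Let me work through this", the content "The answer is 42.", and the finish reason "stop".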


Response Normalization Pipeline

Responses from providers that return thinking in non-standard shapes pass through normalizeOpenAIReasoningMessage before leaving the gateway. The normalizer collects reasoning from the following locations (in order) and concatenates them into a single reasoning string:

  1. message.reasoning — already normalized (OpenAI-compatible providers).
  2. message.reasoning_content — DeepSeek's field name; stripped after extraction.
  3. message.thinking — some providers; stripped after extraction.
  4. message.content_blocks[] — array form used by some providers; reasoning fields extracted per block.
  5. message.content[] with type: "thinking" or type: "redacted_thinking" — Anthropic streaming content parts; those blocks are removed from the content array.
  6. <think>...</think> tags in message.content — inline tags emitted by open-source models (Qwen, etc.); extracted and removed from the content string.

After normalization:

  • message.reasoning_content is deleted.
  • message.thinking is deleted.
  • message.content_blocks is deleted.
  • message.content has all <think> tags and thinking-type content parts stripped out.
  • If no reasoning was found, the reasoning field is omitted entirely (not set to null or "").

The same logic runs on each streaming delta via normalizeOpenAIReasoningDelta, with the additional rule that an empty delta.content ("") is deleted rather than emitted.
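The pipeline can be approximated in a few lines of Python. This is a simplified sketch, not the gateway's normalizeOpenAIReasoningMessage: it covers sources 1, 2, 3, 5, and 6, skips content_blocks because the block shape is not specified here, joins the pieces with no separator, and trims whitespace left behind by removed tags. The last two choices are assumptions. It also does not handle a <think> tag split across streaming chunks.

```python
import re

THINK_TAG = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def normalize_message(message: dict) -> dict:
    """Consolidate known reasoning fields into a single `reasoning` string."""
    msg = dict(message)  # shallow copy so the caller's dict is untouched
    found = []
    if isinstance(msg.get("reasoning"), str):
        found.append(msg["reasoning"])
    for key in ("reasoning_content", "thinking"):
        value = msg.pop(key, None)  # stripped after extraction
        if isinstance(value, str):
            found.append(value)
    content = msg.get("content")
    if isinstance(content, list):
        kept = []
        for part in content:
            if part.get("type") == "thinking":
                found.append(part.get("thinking", ""))
            elif part.get("type") == "redacted_thinking":
                found.append(part.get("data", ""))
            else:
                kept.append(part)
        msg["content"] = kept
    elif isinstance(content, str):
        found.extend(THINK_TAG.findall(content))
        msg["content"] = THINK_TAG.sub("", content).strip()
    text = "".join(piece for piece in found if piece)
    if text:
        msg["reasoning"] = text
    else:
        msg.pop("reasoning", None)  # omitted, never null or ""
    return msg


def normalize_delta(delta: dict) -> dict:
    """Same logic for a streaming delta, dropping an empty `content`."""
    out = normalize_message(delta)
    if out.get("content") == "":
        del out["content"]
    return out
```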

Anthropic-specific handling

Anthropic returns thinking as a typed content block array. The Anthropic response transformer iterates response.content[] before normalization:

  • type: "text" blocks → concatenated into message.content
  • type: "thinking" blocks → block.thinking appended to reasoning
  • type: "redacted_thinking" blocks → block.data appended to reasoning

In streaming, a delta.type === "thinking_delta" chunk maps delta.thinking to delta.reasoning.
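The block handling described above could look roughly like the following Python sketch. The function names are hypothetical, the pieces are joined without a separator (an assumption), and the text_delta branch reflects the Anthropic streaming format rather than anything stated in this section.

```python
def transform_anthropic_content(blocks: list) -> dict:
    """Fold Anthropic `response.content[]` blocks into an OpenAI-style message."""
    text, reasoning = [], []
    for block in blocks:
        kind = block.get("type")
        if kind == "text":
            text.append(block.get("text", ""))
        elif kind == "thinking":
            reasoning.append(block.get("thinking", ""))
        elif kind == "redacted_thinking":
            reasoning.append(block.get("data", ""))
    message = {"role": "assistant", "content": "".join(text)}
    joined = "".join(reasoning)
    if joined:  # leave the field out when there was no thinking
        message["reasoning"] = joined
    return message


def map_anthropic_delta(delta: dict) -> dict:
    """Map one Anthropic streaming delta onto the unified delta shape."""
    if delta.get("type") == "thinking_delta":
        return {"reasoning": delta.get("thinking", "")}
    if delta.get("type") == "text_delta":  # from Anthropic's format, not this page
        return {"content": delta.get("text", "")}
    return {}
```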

Google-specific handling

Google returns thinking inside content parts[] with a thought: true flag. The Google response transformer collects these parts into message.reasoning and excludes them from message.content.
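A minimal Python sketch of that split, assuming each part is a dict with a text field and an optional thought flag. The function name is hypothetical, non-text parts are ignored, and pieces are joined without a separator.

```python
def transform_google_parts(parts: list) -> dict:
    """Split Google content parts into answer text and reasoning text."""
    content, reasoning = [], []
    for part in parts:
        text = part.get("text")
        if text is None:
            continue  # non-text parts are outside the scope of this sketch
        # Parts flagged `thought: true` carry thinking, not the answer.
        if part.get("thought") is True:
            reasoning.append(text)
        else:
            content.append(text)
    message = {"role": "assistant", "content": "".join(content)}
    if reasoning:
        message["reasoning"] = "".join(reasoning)
    return message
```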


Provider Coverage Summary

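| Provider | reasoning param | thinking param | Response normalized |
| --- | --- | --- | --- |
| OpenAI / Azure OpenAI | ✅ effort | | |
| Anthropic | ❌ use thinking | | |
| Google AI / Vertex AI | | | |
| DeepSeek | ✅ effort + enabled | | |
| Cohere | | | |
| Ollama | | | |
| DashScope | ✅ enabled only | | |
| OpenRouter | ✅ (passthrough normalized) | | |
| Novita AI | ✅ (passthrough normalized) | | |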

"Response normalized" means all reasoning fields are consolidated into choices[n].message.reasoning regardless of what the upstream provider returned.

Unified AI Gateway documentation for multi-model, multi-provider, and coding-plan use cases.