A Z.ai desktop user reported that thinking reverted to medium effort after
one turn, burning ~200% of a week's credits in 4 days despite
reasoning_effort: false in config.yaml. Four compounding bugs:
- _session_info reported reasoning_effort "" for disabled reasoning,
  indistinguishable from unset — the desktop adopted it after the first
  turn, wiping its sticky "thinking off" pick so every later chat
  reverted to the default effort.
- config.set key=reasoning always wrote agent.reasoning_effort to global
  config.yaml, so every desktop model-menu selection (preset.effort ??
  'medium') clobbered the user's configured value. Now session-scoped
  like the messaging gateway's /reasoning, landing on
  create_reasoning_override so lazily-built sessions keep it too.
- YAML `reasoning_effort: false`/`off`/`no` (boolean False) was coerced
  to "" by every loader's `str(x or "")`, silently re-enabling thinking.
  parse_reasoning_effort now treats False/"false"/"disabled" as
  {"enabled": False}; loaders (tui gateway, gateway, cli, cron,
  delegate) pass the raw value through. The desktop config reader also
  crashed on the boolean (false.trim()), aborting voice/STT settings.
- The zai provider profile never sent thinking on the wire, and GLM-4.5+
  defaults to thinking ON server-side — so disabling reasoning was a
  silent no-op on direct Z.ai, the actual token burner. The profile now
  emits extra_body.thinking {"type": "enabled"|"disabled"} for
  thinking-capable GLM models, mirroring the DeepSeek profile.
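
The coercion bug in the third bullet can be reproduced in isolation. The
sketch below is a hypothetical stand-in, not the actual
parse_reasoning_effort: the function names, the set of effort names, and
the return shape for enabled values are assumptions made for illustration.
Only the False/"false"/"disabled" handling (plus "none", which the profile
docstring lists as a disabling value) comes from this change.

```python
from __future__ import annotations

from typing import Any

# Assumed effort names; the real parser may accept a different set.
_KNOWN_EFFORTS = {"low", "medium", "high"}
# Strings treated as "reasoning disabled" in this sketch.
_DISABLED_STRINGS = {"false", "disabled", "none"}


def old_loader_coerce(raw: Any) -> str:
    """The buggy loader pattern: boolean False collapses to "" (looks unset)."""
    return str(raw or "")


def parse_reasoning_effort_sketch(raw: Any) -> dict | None:
    """Return None for unset, {"enabled": False} for disabled, else an effort."""
    if raw is False:  # YAML false/off/no arrive as the boolean False
        return {"enabled": False}
    if raw is None:
        return None
    text = str(raw).strip().lower()
    if not text:
        return None
    if text in _DISABLED_STRINGS:
        return {"enabled": False}
    if text in _KNOWN_EFFORTS:
        return {"enabled": True, "effort": text}
    return None  # unknown values are treated as unset in this sketch
```

With the old pattern, old_loader_coerce(False) and old_loader_coerce(None)
both yield "", which is why a disabled setting looked the same as no
setting. Passing the raw value through keeps the two cases apart.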
Also: /new (session reset) now carries reasoning_config across the
rebuild like model_override; config.get reasoning prefers the session's
live value and maps a config False to "none"; Settings shows "Off"
instead of a blank select for hand-written false.
"""ZAI / GLM provider profile.

Z.AI's GLM-4.5-and-later chat models default to thinking-mode ON when the
request omits ``thinking``. Hermes' ``reasoning_config = {"enabled": False}``
was previously a silent no-op on this route — the base profile emits nothing,
so users who turned thinking off (desktop toggle, ``/reasoning none``,
``reasoning_effort: none``/``false`` in config.yaml) kept burning thinking
tokens on every turn.

:meth:`ZaiProfile.build_api_kwargs_extras` translates the Hermes reasoning
config into the wire shape Z.AI's OpenAI-compat endpoint expects:

    {"extra_body": {"thinking": {"type": "enabled" | "disabled"}}}

When no reasoning preference is set (``reasoning_config is None``) the field
is omitted so the server default applies, matching prior behavior. GLM
models before 4.5 (e.g. ``glm-4-9b``) don't accept ``thinking`` and are left
untouched.
"""

from __future__ import annotations

import re
from typing import Any

from providers import register_provider
from providers.base import ProviderProfile

_GLM_VERSION_RE = re.compile(r"^glm-(\d+)(?:\.(\d+))?")


def _model_supports_thinking(model: str | None) -> bool:
    """GLM thinking-capable model families: glm-4.5 and later (4.5, 4.6, 5…)."""
    m = (model or "").strip().lower()
    match = _GLM_VERSION_RE.match(m)
    if not match:
        return False
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor) >= (4, 5)


class ZaiProfile(ProviderProfile):
    """Z.AI / GLM — extra_body.thinking enabled/disabled."""

    def build_api_kwargs_extras(
        self, *, reasoning_config: dict | None = None, model: str | None = None, **context
    ) -> tuple[dict[str, Any], dict[str, Any]]:
        extra_body: dict[str, Any] = {}
        top_level: dict[str, Any] = {}

        if not _model_supports_thinking(model):
            return extra_body, top_level

        # Only emit when the user expressed a preference; omitting the field
        # keeps the server default (enabled) exactly as before.
        if isinstance(reasoning_config, dict):
            enabled = reasoning_config.get("enabled") is not False
            extra_body["thinking"] = {"type": "enabled" if enabled else "disabled"}

        return extra_body, top_level


zai = ZaiProfile(
    name="zai",
    aliases=("glm", "z-ai", "z.ai", "zhipu"),
    env_vars=("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"),
    display_name="Z.AI (GLM)",
    description="Z.AI / GLM — Zhipu AI models",
    signup_url="https://z.ai/",
    fallback_models=(
        "glm-5.2",
        "glm-5",
        "glm-4-9b",
    ),
    base_url="https://api.z.ai/api/paas/v4",
    default_aux_model="glm-4.5-flash",
)

register_provider(zai)
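
The module above depends on the providers package, so it cannot be run on
its own. The standalone sketch below copies the version gate and the emit
rule into plain functions to show what each model/config combination
yields. The names model_supports_thinking and thinking_extra_body are
hypothetical helpers for this illustration, not part of the profile.

```python
from __future__ import annotations

import re
from typing import Any

# Same pattern as the profile: "glm-<major>" with an optional ".<minor>".
_GLM_VERSION_RE = re.compile(r"^glm-(\d+)(?:\.(\d+))?")


def model_supports_thinking(model: str | None) -> bool:
    """True for glm-4.5 and later, judged by the leading version number."""
    match = _GLM_VERSION_RE.match((model or "").strip().lower())
    if not match:
        return False
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor) >= (4, 5)


def thinking_extra_body(model: str | None, reasoning_config: Any) -> dict[str, Any]:
    """Hypothetical helper: the extra_body the profile would emit."""
    if not model_supports_thinking(model):
        return {}  # pre-4.5 or non-GLM model: never send the field
    if not isinstance(reasoning_config, dict):
        return {}  # no preference expressed: keep the server default
    enabled = reasoning_config.get("enabled") is not False
    return {"thinking": {"type": "enabled" if enabled else "disabled"}}


if __name__ == "__main__":
    cases = [
        ("glm-4.6", {"enabled": False}),
        ("glm-4.6", None),
        ("glm-4-9b", {"enabled": False}),
        ("glm-5", {"enabled": True}),
    ]
    for model, config in cases:
        print(model, config, "->", thinking_extra_body(model, config))
```

Because the pattern is anchored at the start of the string, a namespaced
identifier such as zai/glm-4.6 would not match in this sketch. Whether
prefixed names ever reach the profile is not stated in the source.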