Chat API¶

The gateway exposes two chat endpoints: one in the OpenAI wire format and one in the Ollama wire format. Both accept the same extended set of parameters and return the same structure — just wrapped in the respective format.

POST /v1/chat/completions¶

OpenAI-compatible chat endpoint. Drop-in replacement for https://api.openai.com/v1/chat/completions.

Request Body¶

All standard OpenAI fields are accepted verbatim and forwarded to the upstream. Geodesia adds the following optional fields:

Standard OpenAI Fields (forwarded)¶

Field	Type	Required	Description
`messages`	`array`	✅	Conversation history. Each element has `role` (`user`, `assistant`, `system`, `tool`) and `content` (string or content-parts array).
`model`	`string`	✅	Model name. If omitted, defaults to the gateway's `upstream_model`.
`stream`	`boolean`	—	`true` for Server-Sent Events (SSE) streaming, `false` for a single JSON response. Default `true` in the gateway.
`max_tokens`	`integer`	—	Maximum tokens to generate. The gateway clamps values that exceed the model's context window rather than raising an error.
`temperature`	`float`	—	Sampling temperature (0 = deterministic, 1 = creative).
`top_p`	`float`	—	Nucleus sampling probability (0–1).
`n`	`integer`	—	Number of completions to generate. Forwarded directly to the upstream.
`stop`	`string` \| `array`	—	Stop sequences. Forwarded directly.
`frequency_penalty`	`float`	—	Forwarded directly.
`presence_penalty`	`float`	—	Forwarded directly.

Geodesia Extensions¶

Field	Type	Default	Description
`context`	`string`	`""`	Explicit grounding context. Geodesia scores the answer against this text using the faithfulness detection axis (`halluc_context`). It is also injected into the upstream request so the model can use it when answering.
`mode` / `glad_mode`	`string`	(gateway config)	Per-request enforcement mode. `"block"` withholds flagged content. `"passthrough"` returns the answer but annotates it. Overrides the gateway's configured `block_input`/`block_output` for this request only. See Enforcement Modes.
`threshold_overrides`	`object`	—	Per-axis detection probability thresholds for this request only. Example: `{"prompt_safety": 0.75, "jailbreak": 0.60}`. Supported keys: `halluc_context`, `halluc_closedbook`, `prompt_safety`, `answer_safety`, `jailbreak`. These take precedence over calibrated defaults and the gateway's configured `thresholds`.
`rag`	`object`	—	Knowledge base configuration for this request. See Knowledge Base. Fields: `collection_id` (string, required), `top_k` (int, default 5), `rerank` (bool, default true), `verify` (bool, default true), `verify_deep` (bool, default true).
`web_search`	`boolean`	`false`	When `true`, the gateway searches the live web, screens every page through the GLAD-BERT firewall, and grounds the answer in the safe pages (citing sources). Requires `GW_WEBSEARCH_ENABLED=1` (default). With a Tavily key configured it uses Tavily, otherwise the key-less DuckDuckGo fallback. See Live Web Search.
`pass_extra`	`integer`	`1`	Number of independent answer samples to draw for closed-book uncertainty estimation. Values > 1 cause the gateway to re-sample the answer `N` times at temperature 0.8 and compute consistency signals. Only applied when the upstream supports log-probabilities and the turn has no explicit context. Use sparingly — each extra sample doubles the generation cost.
`self_consistency`	`boolean`	`false`	Shorthand for `pass_extra > 1`. When `true`, the gateway draws `self_consistency_samples` extra samples.
`self_consistency_samples`	`integer`	—	Number of extra samples when `self_consistency` is `true`.

Response Body¶

The standard OpenAI response is returned unchanged, with three additional top-level fields:

{
  "id": "chatcmpl-geodesia-1718000000000",
  "object": "chat.completion",
  "model": "my-model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "glad_decision": "passed",
  "glad_mode": "blocking",
  "geodesia": {
    "axis_energy": {
      "halluc_context": { "p_detector": 0.12, "flag": false, "threshold": 0.35, "available": true },
      "halluc_closedbook": { "p_detector": 0.08, "flag": false, "threshold": 0.50, "fact_seeking": true, "available": true },
      "prompt_safety": { "p_detector": 0.02, "flag": false, "threshold": 0.90, "available": true },
      "answer_safety": { "p_detector": 0.03, "flag": false, "threshold": 0.57, "available": true },
      "jailbreak": { "p_detector": 0.01, "flag": false, "threshold": 0.57, "available": true }
    },
    "brake": false,
    "dominant_axis": "halluc_context"
  }
}

Top-level Extension Fields¶

Field	Values	Description
`glad_decision`	`"passed"` \| `"blocked"`	Overall decision. `"blocked"` means at least one axis was flagged (the answer may still be returned in passthrough mode).
`glad_mode`	`"blocking"` \| `"passthrough"`	Enforcement mode that was applied to this response.
`glad_scores`	`{"safety_decision_rule": "axis_name"}`	Present only when `glad_decision` is `"blocked"`. Identifies which axis triggered the block.

The `geodesia` Object¶

See API Response Format for the full structure. In brief:

Field	Description
`axis_energy`	Per-axis detection results (score, flag, threshold, availability)
`brake`	`true` if any answer-region axis is flagged
`dominant_axis`	The axis with the highest detection score
`energy_dHmax_joule`	(internal diagnostic; not used in standard integrations)
`rag`	Present when RAG is active; contains `sources`, `n_sources`, and `verification` results
`input`	Present when streaming — the input-region analysis sent in the first SSE chunk

When the request is blocked¶

If blocking mode is on and an axis fires, the content field is replaced with a block notice:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "[Geodesia blocked — prompt safety (input)]"
      },
      "finish_reason": "content_filter"
    }
  ],
  "glad_decision": "blocked",
  "glad_mode": "blocking"
}

The finish_reason is "content_filter" (compatible with standard OpenAI client handling).

POST /api/chat¶

Ollama-compatible chat endpoint. Identical behaviour to /v1/chat/completions but uses Ollama's request/response wire format.

Request Body¶

{
  "model": "llama3.2",
  "stream": true,
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "context": "",
  "mode": "block",
  "threshold_overrides": {}
}

All Geodesia extension fields (context, mode, threshold_overrides, rag, pass_extra) are supported identically.

Response Body¶

{
  "model": "llama3.2",
  "message": {
    "role": "assistant",
    "content": "The capital of France is Paris."
  },
  "done": true,
  "glad_decision": "passed",
  "glad_mode": "blocking",
  "geodesia": { ... }
}

Streaming¶

When stream: true, responses are delivered as Server-Sent Events (SSE) for OpenAI format or NDJSON for Ollama format.

Early detection in streams¶

For streaming requests, Geodesia evaluates the accumulating generation every cadence_tokens tokens (default 32). If an output-region axis fires before the generation is complete:

The gateway injects a mid-stream break notice: \n\n[Geodesia: generation halted — energy barrier]
It sends the final SSE chunk with finish_reason: "content_filter" and the detection payload in the geodesia field
It sends data: [DONE] and closes the stream

This means a user may see the beginning of a response before it is cut off. If your application should never show partial flagged content, use non-streaming mode or apply client-side filtering on finish_reason.

Detection payload in streams¶

For streaming responses, the detection payload arrives in two places:

First chunk — the input analysis (prompt-region axes: prompt_safety, jailbreak) is attached to the very first empty-delta chunk so you can display safety status immediately.
Final chunk (or the early-stop chunk) — the full output analysis for all 5 axes is attached to the last chunk.

Working Examples¶

Basic request — blocking mode¶

curl -s http://localhost:8800/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "stream": false,
    "messages": [{"role":"user","content":"What is the capital of France?"}]
  }' | jq '{answer: .choices[0].message.content, decision: .glad_decision, brake: .geodesia.brake}'

Grounded request with context¶

curl -s http://localhost:8800/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "stream": false,
    "context": "The Eiffel Tower was built between 1887 and 1889 and stands 330 metres tall.",
    "messages": [{"role":"user","content":"How tall is the Eiffel Tower?"}]
  }' | jq '{answer: .choices[0].message.content, halluc_context: .geodesia.axis_energy.halluc_context}'

Passthrough mode — review what the model would have said¶

curl -s http://localhost:8800/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "stream": false,
    "mode": "passthrough",
    "messages": [{"role":"user","content":"How do I pick a lock?"}]
  }' | jq '{answer: .choices[0].message.content, decision: .glad_decision, axis: .glad_scores}'

Lower the safety threshold for a specific request¶

curl -s http://localhost:8800/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "stream": false,
    "threshold_overrides": {"prompt_safety": 0.70, "jailbreak": 0.55},
    "messages": [{"role":"user","content":"Explain a penetration test methodology."}]
  }'

Request with RAG knowledge base¶

curl -s http://localhost:8800/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "stream": false,
    "rag": {"collection_id": "my-policy-docs", "top_k": 5, "verify": true},
    "messages": [{"role":"user","content":"What is our refund policy?"}]
  }' | jq '{answer: .choices[0].message.content, sources: .geodesia.rag.sources[].title}'

Chat API¶

POST /v1/chat/completions¶

Request Body¶

Standard OpenAI Fields (forwarded)¶

Geodesia Extensions¶

Response Body¶

Top-level Extension Fields¶

The geodesia Object¶

When the request is blocked¶

POST /api/chat¶

Request Body¶

Response Body¶

Streaming¶

Early detection in streams¶

Detection payload in streams¶

Working Examples¶

Basic request — blocking mode¶

Grounded request with context¶

Passthrough mode — review what the model would have said¶

Lower the safety threshold for a specific request¶

Request with RAG knowledge base¶

The `geodesia` Object¶