Skip to content

Evaluate Endpoint¶

The evaluate endpoint generates a response from the loaded model and scores it in a single call. It is the core API for direct batch evaluation workflows where you supply a prompt and want both the generated answer and its full detection scores in one HTTP round-trip.

POST /glad/evaluate¶

Request Body (`EvaluateRequest`)¶

Field	Type	Required	Default	Description
`model_path`	`string`	✅	—	Absolute path to the model checkpoint directory. Relative paths are resolved against the working directory.
`prompt`	`string`	✅	—	The user's input prompt. Minimum length: 1 character.
`context`	`string`	—	`null`	Optional grounding context. When provided, the `halluc_context` (faithfulness) axis scores the answer against this text.
`generation_config`	`object`	—	See below	Parameters that control how the model generates the answer.
`session_id`	`string`	—	auto-generated	Logical grouping for a conversation. Multiple calls with the same `session_id` are linked in the audit trail.
`system_prompt_text`	`string`	—	`null`	The exact system or constitutional prompt prepended at generation time. Used by the MuPAX explainability engine to exclude those tokens from attribution, so explanations cover only user text and the generated answer.
`explain`	`boolean`	—	`false`	When `true` and the server has explainability enabled, compute per-token MuPAX attribution and include it in the response under `xai.mupax_halluc` / `xai.mupax_safety`.
`explain_mode`	`string`	—	`"standard"`	`"standard"` returns χ attribution per answer token. `"causal"` additionally computes a token→token causal matrix for the highest-importance flagged answer token — answering "which prompt tokens caused this specific answer token?"
`credit_tiers`	`array[string]`	—	`["mupax"]` when `explain=true`	Which attribution methods to run. Options: `"gradient"` (fast, deterministic), `"learned"` (learned attribution head if present), `"mupax"` (statistically robust Monte Carlo), `"pss"` (Positional Semantic Stability — consistency-based, ~N× generation cost).
`pss_n_samples`	`integer`	—	`5`	Number of extra generation samples for PSS attribution. Range 2–16. Each extra sample costs roughly one generation pass.
`pss_temperature`	`float`	—	`0.7`	Sampling temperature for PSS resamples. Must be > 0.
`pss_match_mode`	`string`	—	`"ngram"`	PSS alignment algorithm. Options: `"ngram"` (combines n-gram containment + entity strict match), `"strict"` (exact surface match), `"fuzzy"` (Levenshtein distance), `"entity"` (entity-only), `"claim"` (sentence-level bidirectional).
`threshold_overrides`	`object`	—	`null`	Runtime threshold overrides in probability space. Supported keys: `prompt_safety`, `answer_safety`, `halluc`, `combined_halluc`.
`bypass_prompt_block_for_generation`	`boolean`	—	`false`	When `true`, an unsafe-prompt detection does not abort generation. The model still generates a response, all scores are computed, but no content is withheld. Used by the "passthrough" review mode.
`enable_edfl_isr`	`boolean`	—	`false`	Experimental. Compute the EDFL/ISR sufficiency gate for evidence-grounded binary questions. Only applies when an evidence list is available through the agent-flow layer. Requires 3–12× more computation.

`generation_config` Fields¶

Field	Type	Default	Description
`max_new_tokens`	`integer`	`160`	Maximum tokens to generate.
`temperature`	`float`	`0.7`	Sampling temperature.
`top_p`	`float`	`0.9`	Nucleus sampling probability.
`do_sample`	`boolean`	`true`	Whether to use sampling. When `false`, generation is greedy (deterministic).

Response Body (`EvaluateResponse`)¶

{
  "session_id": "sess_abc123",
  "call_id": "call_def456",
  "timestamp": "2026-06-10T14:23:00.123Z",
  "prompt_blocked": false,
  "response": "The capital of France is Paris.",
  "block_reason": null,
  "context_truncated": false,
  "safety": { ... },
  "hallucination": { ... },
  "model_metadata": { ... },
  "generation_diagnostics": { ... },
  "xai": null,
  "reasoning_trace": { ... }
}

Field	Description
`session_id`	Session identifier (provided or auto-generated)
`call_id`	Unique identifier for this specific call
`timestamp`	ISO 8601 timestamp
`prompt_blocked`	`true` if the prompt was blocked before generation
`response`	The model's generated answer. `null` if `prompt_blocked` is `true`.
`block_reason`	Human-readable reason for blocking, or `null`
`context_truncated`	`true` if the provided context was truncated to fit the model's context window
`safety`	Safety detection results. See below.
`hallucination`	Hallucination detection results. See below.
`model_metadata`	Checkpoint information
`generation_diagnostics`	Timing and token count diagnostics
`xai`	Explainability results when `explain=true`. Contains `mupax_halluc`, `mupax_safety`, `mupax_halluc_causal` depending on `explain_mode` and `credit_tiers`.
`reasoning_trace`	Internal scoring trace for debugging

`safety` Object¶

Field	Description
`prompt_unsafe_logit`	Raw logit score for prompt unsafety (uncalibrated)
`prompt_unsafe_score`	Probability [0, 1] for prompt unsafety
`answer_unsafe_logit`	Raw logit score for answer unsafety (uncalibrated)
`answer_unsafe_score`	Probability [0, 1] for answer unsafety
`decision`	`"safe"`, `"unsafe"`, or `"blocked"`
`decision_rule`	Which rule triggered the decision
`threshold_used`	The detection threshold that was applied
`combined_answer_safety_score`	Calibrated combined safety score (0–1). This is the production score used for decisions.
`combined_answer_safety_threshold`	The threshold for the combined score
`combined_answer_safety_triggered`	Whether the combined threshold was exceeded
`combined_answer_safety_per_signal`	Per-signal breakdown. Each signal shows `raw_logit`, `zscore`, `weight`, `contribution`.

`hallucination` Object¶

Field	Description
`hallucination_score`	Primary hallucination probability [0, 1]
`decision`	`"grounded"` or `"hallucinated"`
`threshold_used`	The threshold applied
`context_provided`	Whether context was provided (affects which axes run)
`combined_halluc_score`	Calibrated combined hallucination score (0–1). This is the production score.
`combined_halluc_threshold`	The threshold for the combined score
`combined_halluc_triggered`	Whether the combined threshold was exceeded
`combined_halluc_per_signal`	Per-signal breakdown with contributions from all 10 internal signals. Each: `raw_logit`, `zscore`, `weight`, `contribution`.
`combined_halluc_n_signals`	Number of signals included in the combined score
`nsp_commission_score`	Narrative Semantic Probe — commission error score
`nsp_coverage_score`	NSP — coverage score
`nsp_assertiveness_score`	NSP — assertiveness score
`drift_score`	Contextual drift score across generation layers
`closed_book_fabrication_score`	Closed-book fabrication probability (when context is absent)
`closed_book_fabrication_reason`	Human-readable reason for the closed-book score
`edfl_isr`	EDFL/ISR sufficiency gate result (when `enable_edfl_isr=true`). Fields: `isr`, `delta_bar`, `decision` (`answer`/`abstain`/`disabled`).

GET /glad/finetune/status/{job_id}¶

Returns the status of an active or completed fine-tuning job.

curl http://localhost:8199/glad/finetune/status/job_abc123

{
  "job_id": "job_abc123",
  "status": "running",
  "progress": 0.34,
  "current_step": 340,
  "total_steps": 1000,
  "last_loss": 0.0423
}

Field	Description
`status`	`"queued"`, `"running"`, `"completed"`, or `"failed"`
`progress`	Fraction completed (0–1), present only while `"running"`
`current_step`	Current training step
`total_steps`	Total steps in the job
`last_loss`	Most recent training loss
`output_path`	Path to the output checkpoint (when `"completed"`)
`error`	Error message (when `"failed"`)

POST /glad/export_audit¶

Exports an audit bundle for one or more inference sessions.

Request Body (`ExportAuditRequest`)¶

Field	Type	Required	Description
`session_id`	`string`	⚠️ One of these two	Export all calls from this session.
`call_ids`	`array[string]`	⚠️ One of these two	Export a specific list of call IDs.
`client_info`	`object`	—	Deployer information for the report cover page: `company_name`, `system_name`, `deployment_date`, `responsible_person`, `contact_email`.
`regulatory_framework`	`array[string]`	—	List of regulatory frameworks to include in the audit report. Defaults to `["EU_AI_ACT"]`. See Supported Laws for valid codes.
`include_raw_scores`	`boolean`	—	Whether to include raw numeric scores in the export. Default `false`.
`include_compliance_bundle`	`boolean`	—	Whether to include the full compliance artifact bundle (hash chain, watermarks, FRIA reference). Default `true`.
`compliance_context`	`object`	—	Additional metadata to include in compliance reports.
`output_path`	`string`	—	Absolute file path to write the export. If omitted, the file is written to a temp directory and returned as a file download.