Causal Explainability
The Causal Explainability (XAI) feature lets you understand why Geodesia flagged — or passed — a particular response. It computes token-level attribution: a score for each word in the input that indicates how much that word contributed to the detection outcome.
This is done entirely black-box: Geodesia needs no access to the upstream model's internals, no GPU memory on the generator side, and no autograd. It perturbs the input and re-scores the output with its own compact detection model, which makes causal XAI available regardless of which upstream LLM you are using.
Two Attribution Methods
Occlusion (gradient_causal)
Occlusion removes one text segment at a time from the input (prompt or context) and re-scores the answer with each perturbation. The importance of a segment is proportional to the change in the detection score when that segment is removed.
- Speed: ~3–8 seconds per response (depends on the number of segments and the answer length)
- Interpretation: A high positive score means that removing the segment would have lowered the detection score, i.e., the segment contributed to the problem.
- Best for: Short inputs with a clear structure, or when you want a deterministic explanation without Monte Carlo sampling.
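The leave-one-out idea described above can be sketched in a few lines of Python. This is a minimal illustration, not Geodesia's implementation: `occlusion_importance` and `toy_score` are hypothetical names, and the toy scorer stands in for the real detection model.

```python
from typing import Callable, List


def occlusion_importance(
    segments: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[float]:
    """Leave-one-out attribution: base score minus the score without segment i."""
    base = score_fn(segments)
    deltas = []
    for i in range(len(segments)):
        # Drop exactly one segment and re-score the remaining input.
        perturbed = segments[:i] + segments[i + 1:]
        deltas.append(base - score_fn(perturbed))
    return deltas


def toy_score(segments: List[str]) -> float:
    """Hypothetical stand-in detector: a fixed baseline plus a penalty
    when the word 'guaranteed' appears anywhere in the input."""
    text = " ".join(segments)
    return 0.25 + (0.5 if "guaranteed" in text else 0.0)


example_segments = [
    "The product is safe.",
    "Results are guaranteed.",
    "Contact support for details.",
]
example_scores = occlusion_importance(example_segments, toy_score)
```

A positive entry means the score dropped when that segment was removed, which matches the interpretation given for occlusion scores. The number of scoring passes grows linearly with the number of segments, which is consistent with latency depending on segment count.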
MuPAX (mupax_causal)
MuPAX (Monte Carlo Perturbation Attribution via Exclusion) draws random subsets of input segments, scores the answer with each subset, and fits a linear surrogate model over the results. This gives a statistically robust attribution via kernel SHAP.
- Speed: ~0.4–2 seconds for default settings (batched scoring)
- Interpretation: Each segment's χ (chi) value is its SHAP-style contribution to the detection score. Positive values increase the score; negative values reduce it.
- Best for: Longer inputs, more reliable attribution, interactive exploration in the UI.
- Configurable: You can adjust the number of samples (mupax_n_samples) and the acceptance threshold (mupax_threshold_percentile) to trade speed for precision.
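To make the sampling idea concrete, the sketch below draws random subsets of segments and estimates each segment's contribution as the mean score with the segment minus the mean score without it. This difference of means is a deliberately simplified stand-in for the linear surrogate fit, not the MuPAX algorithm itself; the function names, the 0.5 inclusion probability, and the toy scorer are all assumptions made for illustration.

```python
import random
from typing import Callable, List


def sample_subset_attribution(
    segments: List[str],
    score_fn: Callable[[List[str]], float],
    n_samples: int = 200,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo attribution: mean score with segment i minus mean without it."""
    rng = random.Random(seed)
    n = len(segments)
    masks, scores = [], []
    for _ in range(n_samples):
        # Assumption: each segment is kept independently with probability 0.5.
        mask = [rng.random() < 0.5 for _ in range(n)]
        kept = [s for s, keep in zip(segments, mask) if keep]
        masks.append(mask)
        scores.append(score_fn(kept))

    chi = []
    for i in range(n):
        with_i = [sc for m, sc in zip(masks, scores) if m[i]]
        without_i = [sc for m, sc in zip(masks, scores) if not m[i]]
        if not with_i or not without_i:
            chi.append(0.0)  # Not enough evidence to estimate a contribution.
            continue
        chi.append(sum(with_i) / len(with_i) - sum(without_i) / len(without_i))
    return chi


def toy_score(segments: List[str]) -> float:
    """Hypothetical stand-in detector used only for this example."""
    text = " ".join(segments)
    return 0.25 + (0.5 if "guaranteed" in text else 0.0)


mc_segments = ["The product is safe.", "Results are guaranteed.", "Contact support."]
mc_chi = sample_subset_attribution(mc_segments, toy_score, n_samples=200)
```

Because all subsets can be scored in one batch, this style of estimator parallelises well, and raising `n_samples` reduces the sampling noise on segments that do not affect the score.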
API Endpoint
POST /v1/glad/causal-explainability/analyze
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| prompt | string | ✅ | The user's prompt. Do not include the system/constitutional prompt here; it is automatically excluded from attribution. |
| response / full_response | string | ✅ | The model's generated answer that you want to explain. |
| context | string | — | The grounding context (RAG chunks, document text). When provided, attribution is computed over the context rather than the prompt. |
| method | string | — | "gradient_causal" for occlusion (default), "mupax_causal" for Monte Carlo MuPAX. |
| mupax_n_samples / mupax_samples / mc_samples | integer | — | Number of Monte Carlo samples for MuPAX. Default 200. More samples = more accurate attribution but slower. |
| mupax_threshold_percentile | float | — | Acceptance threshold percentile for MuPAX sampling (0–1). Default 0.2. Lower = stricter (more samples rejected, slower). |
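A small client-side helper can assemble a request body from the fields in the table. The sketch below is hypothetical: `build_xai_payload` is not part of any Geodesia SDK, and the range checks are assumptions derived from the table, so the gateway's own validation may differ.

```python
from typing import Any, Dict, Optional

VALID_METHODS = {"gradient_causal", "mupax_causal"}


def build_xai_payload(
    prompt: str,
    response: str,
    context: Optional[str] = None,
    method: str = "gradient_causal",
    mupax_n_samples: Optional[int] = None,
    mupax_threshold_percentile: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a request body using the documented field names."""
    if method not in VALID_METHODS:
        raise ValueError(f"unknown method: {method!r}")
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "response": response,
        "method": method,
    }
    if context is not None:
        payload["context"] = context
    # The MuPAX tuning fields are only attached for the MuPAX method.
    if method == "mupax_causal":
        if mupax_n_samples is not None:
            if mupax_n_samples <= 0:
                raise ValueError("mupax_n_samples must be positive")
            payload["mupax_n_samples"] = mupax_n_samples
        if mupax_threshold_percentile is not None:
            if not 0.0 <= mupax_threshold_percentile <= 1.0:
                raise ValueError("mupax_threshold_percentile must be in [0, 1]")
            payload["mupax_threshold_percentile"] = mupax_threshold_percentile
    return payload
```

Omitted optional fields are left out of the payload so that the server-side defaults (200 samples, 0.2 percentile) apply.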
Response Body
{
  "prompt": "What is the capital of France?",
  "full_response": "The capital of France is Paris.",
  "detection_type": "halluc_context",
  "base_score": 0.12,
  "xai": {
    "method": "mupax_causal",
    "mupax_halluc_causal": {
      "detection_type": "halluc_context",
      "base_score": 0.12,
      "units": [
        {"text": "What is", "importance": 0.02, "effect": "neutral"},
        {"text": "the capital of France?", "importance": 0.48, "effect": "positive"}
      ],
      "top_tokens": [
        {"token": "capital", "position": 2, "importance": 0.48, "retention_frequency": 0.62, "conditional_goodness": 0.87},
        {"token": "France", "position": 4, "importance": 0.39, "retention_frequency": 0.58, "conditional_goodness": 0.82}
      ],
      "threshold_W": 0.14,
      "n_accepted": 187,
      "n_total": 200
    }
  }
}
Response Fields
| Field | Description |
|---|---|
| detection_type | Which detection axis the attribution is for (e.g., halluc_context, answer_safety) |
| base_score | The detection score for the full, unperturbed input (matches the score in the regular chat response) |
| xai.mupax_halluc_causal | Full MuPAX attribution result (present when method="mupax_causal") |
| xai.gradient_causal | Full occlusion attribution result (present when method="gradient_causal") |
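Since the attribution result sits under a different key depending on the method, client code may want a single accessor. The following is a minimal sketch: `extract_attribution` is a hypothetical helper, and it only knows the two keys documented above. Whether other detection types use differently named keys is not covered here.

```python
from typing import Any, Dict, Tuple

# Only the two result keys documented in the response fields table.
RESULT_KEYS = ("mupax_halluc_causal", "gradient_causal")


def extract_attribution(body: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (key, result) for whichever documented attribution block is present."""
    xai = body.get("xai") or {}
    for key in RESULT_KEYS:
        if key in xai:
            return key, xai[key]
    raise KeyError("no known attribution result found under 'xai'")


sample_body = {
    "detection_type": "halluc_context",
    "base_score": 0.12,
    "xai": {
        "method": "mupax_causal",
        "mupax_halluc_causal": {"base_score": 0.12, "units": [], "top_tokens": []},
    },
}
found_key, found_result = extract_attribution(sample_body)
```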
Per-unit attribution (units)
Each entry in units represents one text segment (typically a sentence or clause):
| Field | Description |
|---|---|
| text | The text of this segment |
| importance | Absolute attribution score [0, 1]. Higher = more important to the detection outcome |
| effect | "positive" (increased the score), "negative" (decreased it), or "neutral" |
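Because importance is an absolute value and the direction is carried separately in effect, it can be handy to combine the two before sorting or plotting. The sketch below shows one possible convention (neutral units map to zero); the helper names are hypothetical and the unit data is taken from the response example.

```python
from typing import Any, Dict, List

# Assumed convention for this sketch: neutral units contribute a signed value of 0.
EFFECT_SIGN = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}


def signed_importance(unit: Dict[str, Any]) -> float:
    """Combine the absolute importance with the direction given by effect."""
    return EFFECT_SIGN[unit["effect"]] * unit["importance"]


def rank_units(units: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort units from most to least important, ignoring direction."""
    return sorted(units, key=lambda u: u["importance"], reverse=True)


example_units = [
    {"text": "What is", "importance": 0.02, "effect": "neutral"},
    {"text": "the capital of France?", "importance": 0.48, "effect": "positive"},
]
ranked_units = rank_units(example_units)
```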
Per-token attribution (top_tokens)
The top-K most important individual tokens:
| Field | Description |
|---|---|
| token | The token text (e.g., a word or subword) |
| position | Token position in the input sequence |
| importance | χ attribution value |
| retention_frequency | Proportion of samples in which this token was included in "high-score" configurations |
| conditional_goodness | Mean detection score in samples where this token was retained |
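The two sampling statistics can be illustrated on a handful of hand-written samples. This sketch rests on assumptions that the documentation does not spell out: a sample counts as "high-score" when its score is at or above a given threshold W, and conditional goodness is averaged over all samples that retain the token. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[List[bool], float]  # (retention mask per token, detection score)


def token_statistics(samples: List[Sample], token_index: int, threshold_w: float) -> Dict[str, float]:
    """Compute retention_frequency and conditional_goodness for one token."""
    # Assumption: "high-score" configurations are those scoring >= threshold_w.
    accepted = [mask for mask, score in samples if score >= threshold_w]
    retained_scores = [score for mask, score in samples if mask[token_index]]

    retention_frequency = (
        sum(1 for mask in accepted if mask[token_index]) / len(accepted)
        if accepted else 0.0
    )
    conditional_goodness = (
        sum(retained_scores) / len(retained_scores) if retained_scores else 0.0
    )
    return {
        "retention_frequency": retention_frequency,
        "conditional_goodness": conditional_goodness,
    }


# Four samples over two tokens; only token 1 moves the score.
toy_samples: List[Sample] = [
    ([True, True], 0.75),
    ([True, False], 0.25),
    ([False, True], 0.75),
    ([False, False], 0.25),
]
stats_token0 = token_statistics(toy_samples, 0, threshold_w=0.5)
stats_token1 = token_statistics(toy_samples, 1, threshold_w=0.5)
```

In this toy data the influential token is retained in every accepted sample, while the irrelevant one is retained in only half of them.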
Usage Example
curl -s -X POST http://localhost:8800/v1/glad/causal-explainability/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "According to the document, when was the Eiffel Tower built?",
    "context": "The Eiffel Tower was constructed between 1887 and 1889.",
    "response": "The Eiffel Tower was built in 1885.",
    "method": "mupax_causal",
    "mupax_n_samples": 200
  }'
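In this example the answer's date ("1885") contradicts the context. Because attribution is computed over the context when one is provided, the highest-scoring units and tokens should be the ones carrying the construction dates ("1887" and "1889"): removing them from the document changes the detection score for the hallucination flag most strongly.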
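The same request can be sent from Python using only the standard library. This is a sketch under the assumption that the gateway listens on the host and port used in the curl example; `make_request` and `analyze` are hypothetical helper names, not part of a Geodesia client library.

```python
import json
import urllib.request
from typing import Any, Dict

# URL taken from the curl example; adjust host and port for your deployment.
ANALYZE_URL = "http://localhost:8800/v1/glad/causal-explainability/analyze"


def make_request(payload: Dict[str, Any], url: str = ANALYZE_URL) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(payload: Dict[str, Any], url: str = ANALYZE_URL, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(make_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


eiffel_payload = {
    "prompt": "According to the document, when was the Eiffel Tower built?",
    "context": "The Eiffel Tower was constructed between 1887 and 1889.",
    "response": "The Eiffel Tower was built in 1885.",
    "method": "mupax_causal",
    "mupax_n_samples": 200,
}
eiffel_request = make_request(eiffel_payload)
```

Calling `analyze(eiffel_payload)` against a running gateway returns the parsed response body. The 30-second timeout is an arbitrary choice that leaves headroom above the documented per-response latencies.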
Configuration
The following gateway environment variables control XAI behaviour:
| Variable | Default | Description |
|---|---|---|
| GW_XAI_MAXLEN | 512 | Maximum token length for XAI scoring passes. Lower values reduce memory usage and latency but may miss long-context attribution. |
| GW_XAI_SRC_CHARS | 2400 | Maximum characters of source text (context + prompt) submitted for attribution. Longer inputs are truncated from the beginning. |
| GW_ANALYZE_MIN_MAXLEN | 512 | Lower bound for maxlen during out-of-memory recovery. If a scoring pass hits a GPU out-of-memory error, the gateway retries with half the maxlen, down to this minimum. |
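The interaction between the maxlen setting and its minimum can be pictured as a halving schedule. The sketch below is an assumption about how such a retry loop could behave, not the gateway's actual code: `env_int` and `maxlen_retry_schedule` are hypothetical helpers, and the schedule simply stops once a further halving would fall below the floor.

```python
import os
from typing import List


def env_int(name: str, default: int) -> int:
    """Read an integer environment variable, falling back to a default."""
    raw = os.environ.get(name)
    return int(raw) if raw else default


def maxlen_retry_schedule(start: int, floor: int) -> List[int]:
    """List the maxlen values tried in order: start, start/2, ... never below floor."""
    schedule = [start]
    current = start
    while current // 2 >= floor:
        current //= 2
        schedule.append(current)
    return schedule


# Defaults mirror the configuration table above.
xai_maxlen = env_int("GW_XAI_MAXLEN", 512)
min_maxlen = env_int("GW_ANALYZE_MIN_MAXLEN", 512)
```

Under this reading, `maxlen_retry_schedule(2048, 512)` yields three attempts (2048, 1024, 512), while the documented defaults of 512 and 512 leave no room for a retry at a shorter length.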
In the Web UI
The Causal Intelligence page in the web UI provides an interactive view of token attribution:
- Select any assistant message in the chat history
- Click Compute to run attribution on that message
- Toggle between Outcome view (aggregated per-axis scores) and Token→Token view (MuPAX causal graph showing which prompt tokens caused which answer tokens)
- Adjust Samples and W% percentile to trade speed for accuracy
Causal XAI is off by default
The UI's "Causal XAI ready" toggle is off by default to avoid unexpected latency. Enable it in Settings before using the Causal Intelligence page.