Every query is scored for confidence against a growing corpus. High-confidence queries resolve on CPU. Everything else routes to GPU. The corpus builds itself — your cost floor drops automatically over time.
Routing precision and local-serve economics vary with the confidence threshold. Measured on a 48-claim benchmark spanning science, history, technology, medical, and economics domains — including adversarial cases designed to trap keyword classifiers.
| Threshold | Local Serve% routed to CPU | Routing Precisionaccuracy when CPU serves | REFUTED Recallfalse claims correctly flagged | GPU Savingsvs. all-GPU baseline |
|---|---|---|---|---|
| 0.50 | 72.9% | 68.6% | 45.0% | 79% |
| 0.55 | 72.9% | 68.6% | 45.0% | 79% |
| 0.60 | 64.6% | 71.0% | 55.0% | 75% |
| 0.65 | 50.0% | 70.8% | 65.0% | 65% |
| 0.70 deployed | 25.0% | 75.0% | 85.0% | 46% |
Why 0.70? Lower thresholds serve more queries locally but increase false-positive CPU routes — claims incorrectly marked as supported. At 0.70, routing precision reaches 75% with 85% REFUTED recall: the minimum acceptable bar for production use. As the corpus grows via the response flywheel, more claims naturally clear 0.70 — local serve rate and savings increase without touching the threshold.
Detects structural divergence between two passages — works best on texts with meaningfully different vocabulary and flow, not just different numbers in the same sentence structure.