Benchmarking insurance extraction

Insurance extraction benchmarks need document pass rate, required-field score, exact-label score, and evidence score. One aggregate number is not enough.

Published May 10, 2026

One score hides too much

An extraction can look successful overall while still missing a required field or citing weak evidence. A single aggregate score averages those failures away, hiding exactly the ones that matter in production.

PolDex tracks four scores (document pass rate, required-field score, exact-label score, and evidence score) alongside run metadata: corpus count, evaluated documents, blocked reason, and publication state. Each dimension answers a different reliability question.
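To make the separate dimensions concrete, here is a minimal Python sketch of a report that keeps each score apart instead of folding them into one number. It is an illustration only: the class names, field names, and the micro-averaged score definitions are assumptions, not PolDex's actual schema or formulas.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocResult:
    """Per-document outcome. All field names are hypothetical."""
    required_total: int   # required fields in the gold label set
    required_found: int   # required fields the extraction produced
    labels_total: int     # gold labels compared
    labels_exact: int     # labels that matched gold exactly
    evidence_total: int   # extracted values that should cite evidence
    evidence_ok: int      # values whose cited evidence was accepted
    passed: bool          # document-level pass/fail decision


@dataclass
class BenchmarkReport:
    """One field per dimension named in the text above."""
    corpus_count: int
    evaluated_documents: int
    document_pass_rate: float
    required_field_score: float
    exact_label_score: float
    evidence_score: float
    blocked_reason: Optional[str]
    publication_state: str


def _ratio(numerator: int, denominator: int) -> float:
    # Avoid dividing by zero when nothing was evaluated.
    return numerator / denominator if denominator else 0.0


def build_report(
    corpus_count: int,
    results: List[DocResult],
    publication_state: str = "draft",
    blocked_reason: Optional[str] = None,
) -> BenchmarkReport:
    # Assumed definition: each score is pooled over all evaluated documents.
    return BenchmarkReport(
        corpus_count=corpus_count,
        evaluated_documents=len(results),
        document_pass_rate=_ratio(sum(r.passed for r in results), len(results)),
        required_field_score=_ratio(
            sum(r.required_found for r in results),
            sum(r.required_total for r in results),
        ),
        exact_label_score=_ratio(
            sum(r.labels_exact for r in results),
            sum(r.labels_total for r in results),
        ),
        evidence_score=_ratio(
            sum(r.evidence_ok for r in results),
            sum(r.evidence_total for r in results),
        ),
        blocked_reason=blocked_reason,
        publication_state=publication_state,
    )
```

Under these assumed definitions, a run can show a high exact-label score while the required-field or evidence score stays low, which is the kind of failure a single aggregate would hide.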

Labels must be source-verifiable

Gold labels are only useful if they can be tied to visible source text, tables, clauses, schedules, or declarations. If the label cannot be verified against the document, it should not become benchmark truth.
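One simple way to enforce this rule is to require every gold label to carry a quote and to reject the label when that quote cannot be found in the document. The sketch below assumes the document is already available as plain text; `GoldLabel`, `is_source_verifiable`, and `split_labels` are hypothetical names, and a real check would likely also need to handle tables and schedules.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GoldLabel:
    """A candidate benchmark label. Field names are hypothetical."""
    field: str
    value: str
    evidence_quote: str  # text the labeler claims is visible in the document


def _normalize(text: str) -> str:
    # Collapse whitespace and ignore case so line wraps do not cause misses.
    return re.sub(r"\s+", " ", text).strip().casefold()


def is_source_verifiable(label: GoldLabel, document_text: str) -> bool:
    # A label with no quote, or a quote absent from the text, is not verifiable.
    quote = _normalize(label.evidence_quote)
    return bool(quote) and quote in _normalize(document_text)


def split_labels(
    labels: List[GoldLabel], document_text: str
) -> Tuple[List[GoldLabel], List[GoldLabel]]:
    # Only the accepted labels should become benchmark truth.
    accepted = [l for l in labels if is_source_verifiable(l, document_text)]
    rejected = [l for l in labels if not is_source_verifiable(l, document_text)]
    return accepted, rejected
```

Substring matching is deliberately strict here: it errs toward rejecting a label rather than admitting one that cannot be traced back to the page.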

This matters most when benchmark results are published. PolDex builds its diagnostic corpora from real public documents, so the results can be inspected without relying on claims about private customer data.

Benchmarks become release gates

A benchmark is not just a marketing page. It is a release gate. New schema behavior should pass the current corpus before it becomes public-facing.

That turns regression testing into product discipline. Every hardened schema must keep passing as FastScript gains new readers, rules, and normalizers.
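A release gate of this kind can be sketched as a comparison between the scores of the current release and those of a candidate build. The example below is a minimal illustration, assuming scores are stored as a name-to-value mapping; the function names and the `tolerance` parameter are hypothetical and do not describe FastScript's or PolDex's actual tooling.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """List every score that dropped below the baseline by more than tolerance."""
    regressions = []
    for name, base in baseline.items():
        new = candidate.get(name)
        if new is None:
            # A score that disappeared is treated as a failure, not a pass.
            regressions.append(f"{name}: missing from candidate run")
        elif new < base - tolerance:
            regressions.append(f"{name}: {base:.3f} -> {new:.3f}")
    return regressions


def release_allowed(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
) -> bool:
    # The gate opens only when no tracked score regressed.
    return not find_regressions(baseline, candidate, tolerance)
```

Checking each score separately keeps the gate consistent with the multi-dimensional view: a gain in one score cannot offset a drop in another.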
