PolDex abstains or surfaces conflicts instead of fabricating certainty. Schema-valid is not the same as correct.
An abstention on an ambiguous field is better than a confident wrong answer.
Every returned fact must be traceable to a specific location in the source document.
When a base policy and an endorsement contradict each other, both values are surfaced as an explicit conflict rather than silently resolved into one.
Benchmark documents include messy scans, unusual formatting, and edge-case endorsements, not just clean template policies.
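One way to picture these principles is as a result type in which a field is either extracted, unknown, or in conflict, and in which no value can exist without an evidence pointer. The sketch below is illustrative only: the names `Status`, `Evidence`, `Candidate`, and `FieldResult`, the field name, and the quoted citation text are assumptions made for the example, not PolDex's actual schema or output.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Status(Enum):
    EXTRACTED = "extracted"  # exactly one supported value
    UNKNOWN = "unknown"      # abstention on an ambiguous field
    CONFLICT = "conflict"    # sources disagree; nothing is silently resolved


@dataclass(frozen=True)
class Evidence:
    page: int
    section: str
    citation: str  # quoted source text backing the value


@dataclass(frozen=True)
class Candidate:
    value: str
    evidence: Evidence  # required, so every value stays traceable


@dataclass(frozen=True)
class FieldResult:
    name: str
    status: Status
    candidates: Tuple[Candidate, ...] = ()

    def __post_init__(self) -> None:
        # Reject shapes that would hide an abstention or a conflict.
        n = len(self.candidates)
        if self.status is Status.EXTRACTED and n != 1:
            raise ValueError("an extracted field needs exactly one candidate")
        if self.status is Status.CONFLICT and n < 2:
            raise ValueError("a conflict needs at least two candidates")
        if self.status is Status.UNKNOWN and n != 0:
            raise ValueError("an unknown field must not carry a value")


# Made-up example: base policy and endorsement disagree on a limit.
base = Candidate(
    "1,000,000",
    Evidence(3, "Declarations", "General Aggregate Limit $1,000,000"),
)
endorsement = Candidate(
    "2,000,000",
    Evidence(41, "Endorsement 7", "General Aggregate Limit is amended to $2,000,000"),
)
result = FieldResult("general_aggregate_limit", Status.CONFLICT, (base, endorsement))
```

Under this shape, a schema-valid record cannot hold a value without a page, section, and citation, and a consumer has to handle the unknown and conflict states explicitly.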
Rate at which the extracted value matches the ground truth for a field, measured across a held-out evaluation set.
Rate at which the evidence pointer (page, section, citation) correctly identifies the source of the extracted value.
Rate at which known contradictions between document sections are identified and surfaced.
Rate at which PolDex correctly returns an unknown state rather than hallucinating a value for ambiguous fields.
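The four measures above could be computed from annotated rows roughly as follows. This is a minimal sketch under stated assumptions: the `Row` layout, the `"value"` / `"conflict"` / `"ambiguous"` / `"unknown"` labels, the use of page number alone as the evidence check, and the choice to score evidence only on correctly extracted values are all hypothetical choices for illustration, not the published methodology. The rows are toy data, not benchmark results.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Row:
    gold_kind: str             # "value", "conflict", or "ambiguous"
    gold_value: Optional[str]
    gold_page: Optional[int]
    pred_kind: str             # "value", "conflict", or "unknown"
    pred_value: Optional[str]
    pred_page: Optional[int]


def rate(hits: int, total: int) -> Optional[float]:
    # None (not 0.0) when the evaluation set has no rows of that kind.
    return hits / total if total else None


def score(rows: List[Row]) -> Dict[str, Optional[float]]:
    valued = [r for r in rows if r.gold_kind == "value"]
    correct = [
        r for r in valued
        if r.pred_kind == "value" and r.pred_value == r.gold_value
    ]
    conflicts = [r for r in rows if r.gold_kind == "conflict"]
    ambiguous = [r for r in rows if r.gold_kind == "ambiguous"]
    return {
        "field_accuracy": rate(len(correct), len(valued)),
        # Assumption: evidence is only scored where the value was right.
        "evidence_accuracy": rate(
            sum(r.pred_page == r.gold_page for r in correct), len(correct)
        ),
        "conflict_detection": rate(
            sum(r.pred_kind == "conflict" for r in conflicts), len(conflicts)
        ),
        "abstention": rate(
            sum(r.pred_kind == "unknown" for r in ambiguous), len(ambiguous)
        ),
    }


# Toy rows for illustration only.
rows = [
    Row("value", "1000000", 3, "value", "1000000", 3),        # right value, right page
    Row("value", "ACME LLC", 5, "value", "ACME LLC", 6),      # right value, wrong page
    Row("value", "2024-01-01", 1, "value", "2024-10-01", 1),  # wrong value
    Row("conflict", None, None, "conflict", None, None),      # conflict surfaced
    Row("conflict", None, None, "value", "500000", 9),        # conflict missed
    Row("ambiguous", None, None, "unknown", None, None),      # correct abstention
    Row("ambiguous", None, None, "value", "Yes", 2),          # hallucinated value
]
metrics = score(rows)
```

Keeping the four rates separate matters because a system can score well on field accuracy while still missing conflicts or filling in ambiguous fields.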
PolDex benchmark documents include poorly OCR'd scans, mid-cycle endorsement stacks, non-standard certificate formats, and carrier-specific layouts, not just clean template policies.
Base policy, CG forms, AI endorsements, schedule of locations, policy jacket
Fleet schedules, certificates, driver rosters, MCS-90, hired/non-owned
BPP, BI/EE, commercial building, blanket vs specific limits, co-insurance
NCCI forms, experience mod worksheets, payroll class schedules
Following-form, aggregate limits, underlying required schedules
E&O, D&O, cyber, claims-made, retro dates, consent-to-settle
Policies with 30+ endorsements where each endorsement modifies a previous one. Effective-date precedence must be applied correctly.
Base policy and endorsement state different aggregate limits. PolDex must surface both values and identify which supersedes.
Real-world scanned documents with noise, rotation, incomplete OCR. PolDex operates on extracted text, not raw pixels.
Carrier-specific forms that do not follow ACORD or standard ISO structure. Field labels are inconsistent or absent.
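The endorsement-stack and contradictory-limit cases above can be sketched as an ordering problem: sort every statement of a field by effective date, keep all values visible, and report which source governs. The sketch assumes that the latest effective date supersedes and that ties are broken by an endorsement sequence number; real precedence rules depend on policy wording and may differ. `Statement`, `resolve`, and the sample values are hypothetical names and data for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Statement:
    source: str      # e.g. "Base policy" or "Endorsement 3"
    effective: date  # effective date of the form
    sequence: int    # tie-breaker when effective dates match
    value: str


def resolve(statements: List[Statement]) -> Dict[str, object]:
    if not statements:
        raise ValueError("no statements for this field")
    # Oldest first, so the last element is the governing statement.
    ordered = sorted(statements, key=lambda s: (s.effective, s.sequence))
    distinct_values = {s.value for s in ordered}
    return {
        # Every value stays visible, each tied to its source.
        "values": [(s.source, s.value) for s in ordered],
        "conflict": len(distinct_values) > 1,
        "supersedes": ordered[-1].source,
    }


# Toy aggregate-limit example, listed out of order on purpose.
out = resolve([
    Statement("Endorsement 3", date(2024, 6, 1), 3, "2,000,000"),
    Statement("Base policy", date(2024, 1, 1), 0, "1,000,000"),
    Statement("Endorsement 2", date(2024, 3, 1), 2, "1,500,000"),
])
```

The result reports a conflict and names the superseding source without discarding the earlier values, so a reviewer can still see what the base policy stated.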
Benchmark results are based on held-out evaluation sets annotated by domain experts. Results are reported per-field, per-document-family, and segmented by document quality tier.
PolDex does not report a single headline accuracy number. Accuracy is field-specific: extracting an aggregate limit is not the same difficulty as identifying an additional insured.
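Segmented reporting of this kind amounts to grouping graded rows by field, document family, and quality tier before computing any rate. The sketch below shows the grouping only; the function name `segmented_accuracy`, the family and tier labels, and the rows themselves are made up for illustration and are not benchmark results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (field, document family, quality tier)


def segmented_accuracy(rows: Iterable[Tuple[str, str, str, bool]]) -> Dict[Key, float]:
    counts = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for field, family, tier, correct in rows:
        bucket = counts[(field, family, tier)]
        bucket[0] += int(correct)
        bucket[1] += 1
    # One rate per segment; no pooled headline number is produced.
    return {key: hit / total for key, (hit, total) in sorted(counts.items())}


# Toy graded rows for illustration only.
graded = [
    ("aggregate_limit", "general_liability", "clean", True),
    ("aggregate_limit", "general_liability", "clean", True),
    ("aggregate_limit", "general_liability", "poor_scan", True),
    ("aggregate_limit", "general_liability", "poor_scan", False),
    ("additional_insured", "general_liability", "clean", True),
    ("additional_insured", "general_liability", "clean", False),
]
report = segmented_accuracy(graded)
```

In this toy data a pooled figure would hide that the same field behaves differently on clean and poorly scanned documents, which is the reason for reporting per segment.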
The full benchmark methodology is available to enterprise buyers under NDA.