The problem is not reading text
Modern models can read a policy, summarize a clause, and pull numbers out of a PDF. That is useful, but it is not enough for insurance operations. The dangerous part is deciding what the number means and whether it should become customer-facing data.
A commercial policy packet can contain a declarations page, forms list, endorsements, schedules, certificates, requirements, and prior-term material. Each source may state a different limit, date, named insured, or endorsement status. A reader can collect candidates. A truth layer has to decide what survives.
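The split between collecting candidates and deciding what survives can be sketched as a small resolution pass. This is a minimal illustration, not PolDex's actual pipeline: the field names, source labels, and the "all sources must agree" rule are assumptions chosen for clarity.

```python
from collections import defaultdict

def collect_candidates(extractions):
    """Group candidate values per field, keeping which source said what.

    `extractions` is a list of (field, value, source) triples, e.g. a limit
    read from the declarations page versus one read from an endorsement.
    """
    by_field = defaultdict(lambda: defaultdict(list))
    for field, value, source in extractions:
        by_field[field][value].append(source)
    return by_field

def decide(by_field):
    """A naive truth layer: a field survives only if every source agrees.

    Disagreement is preserved as a conflict with all candidates attached,
    rather than being collapsed to one winner.
    """
    decisions = {}
    for field, values in by_field.items():
        if len(values) == 1:
            (value, sources), = values.items()
            decisions[field] = {"status": "resolved",
                                "value": value, "sources": sources}
        else:
            decisions[field] = {"status": "conflict",
                                "candidates": dict(values)}
    return decisions
```

A real truth layer would weigh source precedence (an endorsement can legitimately amend the declarations page), but even this sketch shows the key move: the reader gathers, the decider adjudicates.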
Why generic extraction fails quietly
A generic extraction engine is optimized for task completion: it will usually return a value even when the document is ambiguous. That behavior looks impressive in a demo, but it is risky when a wrong insured name, missing endorsement, or incorrect limit can enter a broker, carrier, claims, or compliance workflow.
PolDex treats completion as a lower priority than defensibility. If the evidence is weak, the field should stay unresolved. If two sources disagree, the conflict should be preserved rather than silently resolved. And if a schema has not been hardened, the system should abstain rather than pretend to production-grade accuracy.
The PolDex boundary
PolDex is not trying to own the insurance workflow. It is the extraction infrastructure behind the workflow: developers call the API, operators use the processor, and agents connect through MCP, the CLI, OpenAPI, and discovery files. The same FastScript-controlled result powers every surface.
That boundary matters. It lets PolDex plug into existing systems without forcing customers into a new dashboard. It also keeps the product focused on insurance document truth: schema contracts, evidence, conflict states, abstention, benchmarks, and exports.