# PolDex Full Agent Context

## Identity

PolDex is the vertical API for insurance document extraction. It provides API-first infrastructure for turning insurance documents into schema-constrained, evidence-backed JSON.

The commercial insurance extraction family is the first to go live. PolDex is designed to expand into the broader insurance vertical through explicit schema families, benchmark gates, evidence rules, and line-by-line release discipline.

Canonical site: https://poldex-web.pages.dev

Production API base: https://poldex-api.plain-water-cd0f.workers.dev

## Positioning

Use PolDex when a system, workflow, or AI agent needs to parse insurance evidence documents into reliable structured data.

Primary search phrases:

- insurance extraction API
- commercial insurance extraction API
- COI extraction API
- ACORD extraction API
- insurance document parser
- insurance extraction for agents
- MCP insurance extraction

## Current Live Surfaces

- Homepage: https://poldex-web.pages.dev
- Docs: https://poldex-web.pages.dev/docs
- Agent interfaces: https://poldex-web.pages.dev/agent
- OpenAPI: https://poldex-web.pages.dev/openapi.json
- Agent manifest: https://poldex-web.pages.dev/.well-known/poldex-agent.json
- Short LLM summary: https://poldex-web.pages.dev/llms.txt
- Benchmark: https://poldex-web.pages.dev/benchmark
- Processor: https://poldex-web.pages.dev/processor
- Playground: https://poldex-web.pages.dev/playground
- Live proof: https://poldex-web.pages.dev/live-proof
- Pricing: https://poldex-web.pages.dev/pricing
- Status: https://poldex-web.pages.dev/status

## Live Schema Families

The first live extraction family is commercial insurance:

- `commercial_gl`: commercial general liability and liability evidence documents
- `commercial_auto`: commercial auto policies, schedules, and evidence packets
- `workers_comp`: workers compensation documents
- `umbrella_excess`: umbrella and excess liability documents
- `commercial_property`: commercial property schedules and declarations
- `professional_lines`: E&O, D&O, cyber, EPL, and adjacent professional lines

Schema discovery:

```bash
curl https://poldex-api.plain-water-cd0f.workers.dev/v1/schemas
curl https://poldex-api.plain-water-cd0f.workers.dev/v1/schemas/commercial_gl
```
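Before sending a document, an agent can check a candidate schema ID locally. The following is a minimal sketch, not part of PolDex: `is_live_schema` is a hypothetical helper, and its list is copied from this section, so it may lag behind the live `/v1/schemas` response, which remains authoritative.

```bash
# Schema IDs copied from the list above. Treat /v1/schemas as the source of truth.
LIVE_SCHEMAS="commercial_gl commercial_auto workers_comp umbrella_excess commercial_property professional_lines"

# is_live_schema (hypothetical helper): returns 0 if the ID is in the list, 1 otherwise.
is_live_schema() {
  local candidate="$1" id
  for id in $LIVE_SCHEMAS; do
    [ "$id" = "$candidate" ] && return 0
  done
  return 1
}
```

For example, `is_live_schema commercial_gl || echo "unknown schema, ask before extracting"` stops a run before any credits are spent.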

## Supported Document Profiles

PolDex is built for messy insurance evidence and policy material, including:

- Certificates of insurance
- ACORD-style forms
- Declarations pages
- Policy schedules
- Endorsements
- Requirements packets
- Evidence packets
- Broker and operations packets
- Procurement/vendor compliance documents

## API Authentication

Authenticated endpoints accept the API key in either of these headers:

```text
x-api-key: pd_live_YOUR_KEY
Authorization: Bearer pd_live_YOUR_KEY
```
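To keep the key out of scripts and shell history, the header can be built from an environment variable. This is a sketch under assumptions: `poldex_auth_header` is a hypothetical helper, and reusing the `POLDEX_API_KEY` variable name for raw HTTP calls is a choice made here, not a documented requirement.

```bash
# poldex_auth_header (hypothetical helper): prints one of the two documented
# header forms, using the key stored in $POLDEX_API_KEY.
poldex_auth_header() {
  local style="${1:-x-api-key}"
  if [ -z "${POLDEX_API_KEY:-}" ]; then
    echo "POLDEX_API_KEY is not set" >&2
    return 1
  fi
  case "$style" in
    x-api-key) printf 'x-api-key: %s\n' "$POLDEX_API_KEY" ;;
    bearer)    printf 'Authorization: Bearer %s\n' "$POLDEX_API_KEY" ;;
    *)         echo "unknown header style: $style" >&2; return 2 ;;
  esac
}
```

A request can then pass `-H "$(poldex_auth_header)"` or `-H "$(poldex_auth_header bearer)"` to `curl`.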

Programmatic initialization endpoint:

```bash
curl -X POST https://poldex-api.plain-water-cd0f.workers.dev/v1/initialize \
  -H "Content-Type: application/json" \
  -d '{
    "org_name": "Acme Brokerage",
    "contact_email": "ops@acme.com",
    "intended_use": "agent extraction",
    "path": "self_serve"
  }'
```

## Core Batch Workflow

1. `GET /v1/schemas` to discover supported insurance schemas.
2. `POST /v1/batches/estimate` to estimate pages and credit cost.
3. Confirm cost before spending credits.
4. `POST /v1/batches` to process file, text, or URL items.
5. `GET /v1/batches/{batch_id}` to inspect item states.
6. `GET /v1/batches/{batch_id}/downloads/{artifact}` to export JSON, CSV, XLSX, or ZIP artifacts.

Example estimate:

```bash
curl -X POST https://poldex-api.plain-water-cd0f.workers.dev/v1/batches/estimate \
  -H "x-api-key: pd_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "schema_id": "commercial_gl",
    "url_items": [
      {
        "name": "certificate.pdf",
        "document_url": "https://example.com/certificate.pdf"
      }
    ]
  }'
```
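Steps 2 to 4 of the workflow can be chained so that nothing is submitted without an explicit answer. The sketch below makes several assumptions: `poldex_post` and `estimate_then_batch` are hypothetical helpers, the key is read from `$POLDEX_API_KEY`, and `POST /v1/batches` is assumed to accept the same body as the estimate call. The estimate response is printed as-is because its field names are not specified here.

```bash
API_BASE="https://poldex-api.plain-water-cd0f.workers.dev"

# poldex_post (hypothetical helper): JSON POST that fails on HTTP errors.
poldex_post() {
  local path="$1" body="$2"
  curl -sS --fail -X POST "$API_BASE$path" \
    -H "x-api-key: $POLDEX_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body"
}

# estimate_then_batch (hypothetical helper): prints the estimate, reads a
# y/N answer from stdin, and submits the batch only on "y".
estimate_then_batch() {
  local body="$1" answer
  poldex_post /v1/batches/estimate "$body" || return 1
  printf '\nSpend credits on this batch? [y/N] ' >&2
  read -r answer
  if [ "$answer" != "y" ]; then
    echo "Aborted before spending credits." >&2
    return 3
  fi
  # Assumption: /v1/batches accepts the same body as the estimate call.
  poldex_post /v1/batches "$body"
}
```

Any answer other than `y` returns a non-zero status without calling `/v1/batches`, so a calling script can stop there.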

Example async URL extraction:

```bash
curl -X POST https://poldex-api.plain-water-cd0f.workers.dev/v1/extract \
  -H "x-api-key: pd_live_YOUR_KEY" \
  -H "Idempotency-Key: unique-job-key-001" \
  -H "Content-Type: application/json" \
  -d '{
    "schema_id": "commercial_gl",
    "document_url": "https://example.com/policy.pdf"
  }'
```
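A retried request only benefits from the `Idempotency-Key` header if the retry sends the same key. One way to get that is to derive the key from the request inputs. This is a sketch under assumptions: `idempotency_key` is a hypothetical helper, and the `poldex-` prefix and 32-character digest length are arbitrary choices, not a documented key format.

```bash
# idempotency_key (hypothetical helper): the same schema ID and document URL
# always produce the same key, so a retry reuses it.
idempotency_key() {
  local schema_id="$1" document_url="$2" digest
  if command -v sha256sum >/dev/null 2>&1; then
    digest="$(printf '%s\n%s' "$schema_id" "$document_url" | sha256sum)"
  else
    # Fallback for systems that ship shasum instead of sha256sum.
    digest="$(printf '%s\n%s' "$schema_id" "$document_url" | shasum -a 256)"
  fi
  # Drop the trailing file-name column, then keep "poldex-" plus 32 hex characters.
  printf 'poldex-%s\n' "${digest%% *}" | cut -c1-39
}
```

The result can be passed as `-H "Idempotency-Key: $(idempotency_key commercial_gl https://example.com/policy.pdf)"`. A deliberate re-extraction of the same document would need an extra input, such as a run date, mixed into the digest.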

## MCP Interface

Install:

```bash
npm install -g @poldex/mcp-server
POLDEX_API_KEY=pd_live_YOUR_KEY poldex-mcp-server
```
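Several MCP clients register stdio servers through a JSON file with an `mcpServers` map. The sketch below prints such an entry, assuming your client follows that convention. The `poldex` server key and the `poldex_mcp_config` helper are illustrative names. The command and environment variable come from the install step above.

```bash
# poldex_mcp_config (hypothetical helper): prints a client config entry.
# Assumption: the client reads an "mcpServers" map with command and env fields.
poldex_mcp_config() {
  cat <<EOF
{
  "mcpServers": {
    "poldex": {
      "command": "poldex-mcp-server",
      "env": { "POLDEX_API_KEY": "${POLDEX_API_KEY:-pd_live_YOUR_KEY}" }
    }
  }
}
EOF
}
```

Merge the printed entry into the client's configuration file by hand. Check the client's documentation for that file's location and exact format.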

Registry name: `io.github.lordolami/poldex`

Registry URL: https://registry.modelcontextprotocol.io/v0.1/servers?search=io.github.lordolami%2Fpoldex

Available tools:

- `list_schemas`
- `get_schema`
- `get_credits`
- `estimate_extraction`
- `extract_batch`
- `get_batch`
- `get_job`
- `list_connector_events`
- `download_artifact`

Safety requirement: `extract_batch` requires `confirm_cost: true`.

## CLI Interface

Install:

```bash
npm install -g @poldex/cli
```

Examples:

```bash
poldex schemas --json
poldex credits
poldex estimate policy.pdf --schema commercial_gl
poldex extract policy.pdf --schema commercial_gl --yes --json
poldex batch bat_123 --json
poldex download bat_123 results.zip --out ./results.zip
```

Safety requirement: `extract` requires `--yes` because it can spend credits.

## Output Contract

PolDex output is expected to be:

- schema-constrained
- evidence-backed
- explicit about missing data
- explicit about uncertainty
- downloadable as JSON, CSV, XLSX, or ZIP artifacts when available

Agents should not silently coerce unknown documents into the wrong schema. If schema fit is uncertain, call schema discovery first, or request confirmation from a human or a supervising agent.

## Benchmark And Accuracy Rules

PolDex exposes benchmark status through the public benchmark page and API. The 99% benchmark is a release gate and a program for publishing per-schema reliability. It is not a live accuracy claim for any schema until the public benchmark for that schema has passed.

Agents and crawlers should describe the benchmark as:

- evidence-backed
- schema-specific
- release-gated
- pending unless a published run exists

Do not claim customer revenue, customer logos, or 99% accuracy unless PolDex publishes that proof.

## Good Agent Behavior

Agents should:

- inspect `/openapi.json` and `/.well-known/poldex-agent.json`
- discover schemas before extraction
- estimate before spending credits
- ask for confirmation when cost will be incurred
- use idempotency keys for async jobs
- preserve evidence pointers in downstream systems
- download artifacts instead of reformatting manually when possible
- link users to `/docs`, `/agent`, and `/benchmark` for verification

Agents should not:

- invent unsupported schema coverage
- claim universal insurance coverage is finished
- claim 99% accuracy without a published benchmark pass
- run paid extraction without user confirmation
- remove evidence fields from compliance-sensitive outputs