# Unified Identity, Document Hub, and the Path to an MCP Social Credit Layer

**WE CLEANER · APERSON**
**Status:** Phase 1 shipped (browser-only document hub). Phases 2–4 designed, not built.
**Last updated:** 2026-04-20

---

## 1. The problem we're solving

Commercial cleaners in NSW enter the same information on 6–12 different forms per year:

| Event | Forms touched |
|---|---|
| Register a business | ABN, TFN, ASIC, GST |
| Get insured | Public liability quote × 3, workers comp policy |
| Every new FM contract | Subcontractor's Statement (OPT 011), SWMS, contractor onboarding |
| Every week, month, or quarter | Tax invoice, quarterly BAS, super statements |
| Every time ID is requested | Driver's licence copy, ABN lookup, insurance certificate of currency |

The data inside these forms is 80–90% identical. The duplication wastes the cleaner's time, and — worse — every extra retype introduces a new opportunity for a typo, stale policy number, or missed expiry. FM companies reject contractors over clerical errors that have nothing to do with competence.

On top of that, **trust is rebuilt from scratch for every new principal contractor.** A cleaner who has flawless compliance with ISS has to re-submit the same bundle of certificates to Spotless next week. There is no portable, verifiable reputation.

## 2. The four-phase path

### Phase 1 — Document Hub (shipped)

**What it is:** A browser page (`business-documents.html`) with:
- One "Business Profile" form storing ABN, legal name, address, insurance policy numbers, workers comp status, payroll tax status, authorised contact
- Profile persisted to `localStorage` (it stays on the device; the only thing that can leave is a PDF the user generates and chooses to share)
- A one-click generator for the NSW Subcontractor's Statement (Form OPT 011) that auto-fills from the profile
- PDF output via `html2pdf.js` (client-side — no server roundtrip, no upload)
- Live preview that updates as the user types
- Profile completion meter
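
As an illustration of the profile store and completion meter listed above, here is a minimal sketch. The `BusinessProfile` field names and the helper names are hypothetical and are not taken from `business-documents.html`; the sketch only assumes that the profile is a flat JSON object and that the meter counts non-blank fields.

```typescript
// Hypothetical profile shape; the real page may use different keys.
interface BusinessProfile {
  abn?: string;
  legalName?: string;
  address?: string;
  publicLiabilityPolicy?: string;
  workersCompStatus?: string;
  payrollTaxStatus?: string;
  authorisedContact?: string;
}

// Fields counted by the meter (assumption: all seven are weighted equally).
const REQUIRED_FIELDS: (keyof BusinessProfile)[] = [
  "abn",
  "legalName",
  "address",
  "publicLiabilityPolicy",
  "workersCompStatus",
  "payrollTaxStatus",
  "authorisedContact",
];

// Returns the percentage (0-100) of required fields holding non-blank text.
function profileCompletion(profile: BusinessProfile): number {
  const filled = REQUIRED_FIELDS.filter(
    (key) => (profile[key] ?? "").trim().length > 0
  ).length;
  return Math.round((filled / REQUIRED_FIELDS.length) * 100);
}

// Parses the raw localStorage value, tolerating a missing or corrupt entry.
function parseStoredProfile(raw: string | null): BusinessProfile {
  if (raw === null) return {};
  try {
    const parsed: unknown = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null
      ? (parsed as BusinessProfile)
      : {};
  } catch {
    return {};
  }
}
```

In the browser this would be called as `profileCompletion(parseStoredProfile(localStorage.getItem("businessProfile")))`, where the storage key `businessProfile` is likewise an assumption.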

**Why localStorage, not Supabase, at Phase 1:**
- Legal caution — the document contains PII + tax-sensitive data, and we haven't yet done a DPIA for cloud storage of this payload
- No server-side trust required: users can see that their profile stays on-device and that the only output is the PDF they download
- Lets us validate the UX before committing to a data model we'd have to migrate

**Why html2pdf.js instead of jsPDF-only or server-side Puppeteer:**
- Preserves exact CSS styling we already control
- No server dependency, no cold-start latency on Vercel functions
- Works offline once the page is loaded
- 500KB bundle is acceptable for a page users visit a few times per month, not per session

### Phase 2 — Cloud-synced vault (next quarter)

**What changes:**
- Add a `business_profiles` table in Supabase, keyed by `cleaner_id`, encrypted at rest using Supabase's pgsodium extension
- Opt-in toggle: "Sync my business profile to my Cleaner Card account"
- On sync, local profile → server (with the cleaner's session-authenticated `cleaner_id`)
- On another device/browser, pulling the profile down auto-populates the form
- Generated documents are stored as PDF blobs in a private Supabase Storage bucket, scoped to the cleaner
- Each document gets a SHA-256 hash recorded on the row — tamper-evident audit trail

**Schema sketch:**
```sql
CREATE TABLE business_profiles (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  cleaner_id        UUID NOT NULL UNIQUE REFERENCES cleaners(id) ON DELETE CASCADE,  -- one profile per cleaner
  legal_name        TEXT,
  trading_name      TEXT,
  structure         TEXT,
  abn               TEXT,
  acn               TEXT,
  registered_addr   JSONB,
  contact           JSONB,           -- {name, position, phone, email}
  public_liability  JSONB,           -- {insurer, policy, cover, expiry}
  workers_comp      JSONB,           -- {status, insurer, policy, expiry}
  payroll_tax       TEXT,
  gst_registered    BOOLEAN,
  profile_hash      TEXT,            -- sha256(canonical JSON) for change tracking
  last_verified_at  TIMESTAMPTZ,     -- set when a field is externally verified
  created_at        TIMESTAMPTZ DEFAULT NOW(),
  updated_at        TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE generated_documents (
  id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  cleaner_id       UUID REFERENCES cleaners(id) ON DELETE CASCADE,
  document_type    TEXT,             -- 'subcontractor-statement', 'tax-invoice', etc.
  principal_party  JSONB,            -- FM company details
  period_start     DATE,
  period_end       DATE,
  storage_path     TEXT,             -- path in Supabase Storage
  payload_hash     TEXT,             -- SHA-256 of PDF bytes
  generated_at     TIMESTAMPTZ DEFAULT NOW()
);
```
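
The `profile_hash` column is described as `sha256(canonical JSON)`, which only works for change tracking if the same profile always serialises to the same string. A minimal sketch of one way to get that, by sorting object keys recursively, is shown below. The `Json` type and `canonicalJson` name are illustrative; this is a simplification and not a full implementation of a standard such as RFC 8785 (JCS), which also fixes number formatting.

```typescript
// Any JSON-compatible value.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Serialises a value with object keys in sorted order, so that two profiles
// with the same data produce the same string regardless of key order.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value); // primitives: defer to the built-in encoder
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved.
    return "[" + value.map((item) => canonicalJson(item)).join(",") + "]";
  }
  const members = Object.keys(value)
    .sort()
    .map((key) => JSON.stringify(key) + ":" + canonicalJson(value[key]));
  return "{" + members.join(",") + "}";
}
```

The resulting string would then be passed to a SHA-256 implementation (for example the Web Crypto API in the browser) and the hex digest stored in `profile_hash`.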

### Phase 3 — Verified identity layer

Turn unverified fields into verified fields. For each field, define the verification source, record a per-field timestamp (e.g. `phone_verified_at`) when it is confirmed, and update the profile's `last_verified_at`:

| Field | Verification source | Evidence stored |
|---|---|---|
| Mobile number | Twilio SMS OTP (already live) | `phone_verified_at` |
| Email | Magic-link sign-in or click-through | `email_verified_at` |
| ABN | Lookup against `https://abr.business.gov.au/json/AbnDetails.aspx` | ABN status + entity type returned |
| Public liability policy | Upload certificate of currency → admin review → `pl_verified_at` + expiry from doc | Signed PDF cached in storage |
| Workers comp policy | Same pattern; optionally icare API if available | Signed PDF cached in storage |
| Principal contractor receipts | Signed OPT 011 returned by FM company triggers `fm_attestation` | Countersigned PDF |
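
Before calling the ABR lookup for the ABN row, a cheap local check can reject obviously mistyped numbers. The sketch below applies the published ABN checksum (subtract 1 from the first digit, weight the 11 digits, and test divisibility by 89). Running this precheck is an assumption about how the integration might work, and `isValidAbnFormat` is a hypothetical name; passing it only shows the number is well formed, not that the ABN is active.

```typescript
// Weights for the 11 ABN digits, per the ABN checksum algorithm.
const ABN_WEIGHTS = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19];

// Returns true when the input is 11 digits (spaces allowed) with a valid checksum.
function isValidAbnFormat(input: string): boolean {
  const digits = input.replace(/\s/g, "");
  if (!/^\d{11}$/.test(digits)) return false;
  const sum = digits
    .split("")
    .map(Number)
    .reduce(
      // The first digit is reduced by 1 before weighting.
      (total, digit, i) => total + (i === 0 ? digit - 1 : digit) * ABN_WEIGHTS[i],
      0
    );
  return sum % 89 === 0;
}
```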

Each verified field raises the cleaner's **trust score** by a defined weight. Verifications lapse after 90 days unless refreshed (insurance expires, so cleaners need to re-upload).
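
The 90-day rule can be expressed as a small predicate. This is a sketch under stated assumptions: the function name and the optional `policyExpiry` parameter are hypothetical, and it assumes a verification also stops counting once the underlying policy has expired, even inside the 90-day window.

```typescript
const VERIFICATION_TTL_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns true while a verification still counts towards the trust score.
function isVerificationCurrent(
  verifiedAt: Date | null,
  now: Date,
  policyExpiry?: Date
): boolean {
  if (verifiedAt === null) return false; // never verified
  if (policyExpiry !== undefined && policyExpiry.getTime() <= now.getTime()) {
    return false; // assumption: an expired policy voids the verification
  }
  const ageDays = (now.getTime() - verifiedAt.getTime()) / MS_PER_DAY;
  return ageDays >= 0 && ageDays <= VERIFICATION_TTL_DAYS;
}
```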

### Phase 4 — MCP endpoint + social credit

**The MCP server.** A Model Context Protocol server that exposes the verified, non-sensitive portions of a cleaner's profile to authorised third parties (primarily FM companies).

Tools the MCP server exposes:
- `get_cleaner_trust_profile(cleaner_id)` — returns verified status fields, trust score, active cluster, tier
- `verify_subcontractor_statement(cleaner_id, contract_id)` — given a principal contractor requesting verification, return a signed JSON attestation that the cleaner's s.175B declaration is current and backed by live policy data
- `list_current_policies(cleaner_id)` — returns PL + WC policy numbers, insurer, expiry (never the PDF itself; FM requests the PDF via a separate audited endpoint)
- `request_document_pack(cleaner_id, fm_company_id, document_types[])` — pushes a request to the cleaner's inbox; on approval, delivers a bundle of signed PDFs
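
To make the "never the PDF itself" rule for `list_current_policies` concrete, here is a minimal sketch of the projection such a tool might apply before returning data. The record shape, field names, and function name are assumptions for illustration; no MCP SDK calls are shown, only the data filtering.

```typescript
// Hypothetical stored policy row, including the private PDF location.
interface PolicyRecord {
  kind: "PL" | "WC";
  insurer: string;
  policyNumber: string;
  expiry: string; // ISO date, e.g. "2026-12-31"
  storagePath: string; // certificate PDF location; must never be returned
}

// What an FM is allowed to see through the tool.
interface PolicySummary {
  kind: "PL" | "WC";
  insurer: string;
  policyNumber: string;
  expiry: string;
}

// Returns unexpired policies with the storage path stripped out.
function listCurrentPolicies(records: PolicyRecord[], today: string): PolicySummary[] {
  return records
    .filter((record) => record.expiry >= today) // ISO dates sort correctly as strings
    .map(({ kind, insurer, policyNumber, expiry }) => ({
      kind,
      insurer,
      policyNumber,
      expiry,
    }));
}
```

Building the response from an explicit list of fields, rather than deleting the sensitive one, means a column added to the table later is not exposed by default.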

**What this unlocks:**
- FM onboarding goes from "email 6 PDFs, wait 3 days" → "paste cleaner's WE CLEANER ID, get a live, signed compliance bundle in 10 seconds"
- An FM's AI assistant can query this MCP during contractor vetting, e.g. "Is Jane Doe's workers comp current and does her PL cover match our ISO requirement of $20M?"
- Cleaners keep ownership of their data: every MCP query requires the cleaner's delegated consent and is logged, at minimum to an immutable audit table in Supabase and optionally on-chain
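
The consent-and-audit requirement in the last point could be enforced by a single gate that every tool call passes through. The sketch below keeps grants and the log in memory purely for illustration; `ConsentGate` and its field names are hypothetical, and a real deployment would presumably back both with database tables.

```typescript
interface ConsentGrant {
  cleanerId: string;
  fmCompanyId: string;
  tools: string[]; // tool names the cleaner has approved for this FM
  expiresAt: number; // epoch milliseconds
}

interface AuditEntry {
  at: number;
  cleanerId: string;
  fmCompanyId: string;
  tool: string;
  allowed: boolean;
}

class ConsentGate {
  private readonly grants: ConsentGrant[] = [];
  private readonly audit: AuditEntry[] = [];

  grant(consent: ConsentGrant): void {
    this.grants.push(consent);
  }

  // Checks for a live grant and records the attempt whether or not it succeeds.
  authorise(cleanerId: string, fmCompanyId: string, tool: string, now: number): boolean {
    const allowed = this.grants.some(
      (g) =>
        g.cleanerId === cleanerId &&
        g.fmCompanyId === fmCompanyId &&
        g.tools.indexOf(tool) !== -1 &&
        g.expiresAt > now
    );
    this.audit.push(Object.freeze({ at: now, cleanerId, fmCompanyId, tool, allowed }));
    return allowed;
  }

  // Returns a copy so callers cannot rewrite history.
  auditLog(): readonly AuditEntry[] {
    return this.audit.slice();
  }
}
```

Denied attempts are logged as well, so the cleaner can see who asked, not only who was answered.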

**Social credit dimension.**
The trust score is not opaque. It is an open formula, published in the user's profile:

```
trust_score =
    25 × identity_verified
  + 20 × pl_active_and_verified
  + 20 × wc_status_verified
  + 15 × subcontractor_statements_in_good_standing (last 12 months)
  + 10 × fm_attestations_positive
  +  5 × payroll_tax_clean
  +  5 × career_depth (from existing WE CLEANER scoring)
  = 0–100
```

A score over 80 becomes a badge an FM trusts on sight. A score trending down (expired PL, missed statement) triggers a private notification to the cleaner before any FM ever sees the dip.
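
The formula and the badge threshold translate directly into code. This sketch assumes each input is normalised to the range 0 to 1 (the formula does not say how inputs such as `career_depth` are scaled), and the camel-cased names are illustrative.

```typescript
// Each input is assumed to be normalised to the range 0..1.
interface TrustInputs {
  identityVerified: number;
  plActiveAndVerified: number;
  wcStatusVerified: number;
  statementsInGoodStanding: number; // last 12 months
  fmAttestationsPositive: number;
  payrollTaxClean: number;
  careerDepth: number;
}

// Weights from the published formula; they sum to 100.
const WEIGHTS: Record<keyof TrustInputs, number> = {
  identityVerified: 25,
  plActiveAndVerified: 20,
  wcStatusVerified: 20,
  statementsInGoodStanding: 15,
  fmAttestationsPositive: 10,
  payrollTaxClean: 5,
  careerDepth: 5,
};

// Weighted sum of the inputs, each clamped to 0..1, rounded to an integer 0-100.
function trustScore(inputs: TrustInputs): number {
  const keys = Object.keys(WEIGHTS) as (keyof TrustInputs)[];
  const total = keys.reduce((sum, key) => {
    const clamped = Math.min(1, Math.max(0, inputs[key]));
    return sum + WEIGHTS[key] * clamped;
  }, 0);
  return Math.round(total);
}

const BADGE_THRESHOLD = 80;

// "A score over 80" is read as strictly greater than 80.
function earnsBadge(score: number): boolean {
  return score > BADGE_THRESHOLD;
}
```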

**Why this is "social credit" and not surveillance:**
- Only the cleaner can authorise a third party to see the score
- The formula is public and the inputs are auditable by the cleaner
- Negative inputs decay (a late statement from a year ago should not haunt a cleaner forever)
- There is no invisible peer-group ranking — just the cleaner's own documented compliance footprint
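
The document does not yet define how negative inputs decay. As one possible shape, the sketch below fades a penalty linearly to zero over 12 months; both the linear curve and the 365-day window are assumptions made for illustration, as are the names.

```typescript
// Assumption: a negative input has no effect once it is a year old.
const DECAY_WINDOW_DAYS = 365;

// Returns the share of the original penalty that still applies after ageDays.
function decayedPenalty(initialPenalty: number, ageDays: number): number {
  if (ageDays <= 0) return initialPenalty;
  if (ageDays >= DECAY_WINDOW_DAYS) return 0;
  return initialPenalty * (1 - ageDays / DECAY_WINDOW_DAYS);
}
```

Because the curve is a published function of age alone, a cleaner can work out in advance the date on which a past late statement stops affecting the score.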

**Why MCP specifically:**
Because the LLM-native orchestration layer is where buyers (FM procurement teams, insurance partners, councils) are going. A Claude-based procurement agent that can query `wecleaner.mcp` directly bypasses the PDF-email-screenshot dance entirely. First movers in the compliance-MCP space will become the default identity layer for their vertical.

## 3. What's needed to ship each phase

| Phase | Dev effort | Blockers |
|---|---|---|
| 1 (done) | — | — |
| 2 (cloud sync + storage) | ~2 weeks | Supabase storage bucket policies, client-side encryption decision, DPIA sign-off |
| 3 (verified identity) | ~4 weeks | ABN lookup integration, admin review workflow, expiry reminder system |
| 4 (MCP + trust score) | ~6 weeks | MCP server hosting, authorisation model (OAuth scopes per FM), scoring formula calibration with real partner data |

## 4. Open questions

- **Credential format** — are we shipping cleartext JSON over TLS, or do we adopt Verifiable Credentials (W3C VC spec) from day one for the attestation payload? VC adds complexity but makes the attestations portable outside WE CLEANER.
- **Chain-of-custody for PDFs** — once an FM has received a signed OPT 011, they may re-use or re-distribute it. Do we timestamp-sign each download with a per-FM watermark and log the intended recipient on the cleaner's side?
- **Revocation** — if a cleaner's PL policy is cancelled mid-contract, the cleaner needs to push a revocation notice to every FM currently relying on that attestation. This is a real-time event bus, not a polling model.
- **Pricing model for FMs** — free read of the trust score, paid for the signed attestation bundle? Or free tier for small FMs and subscription for enterprise procurement?
- **Regulatory posture** — Revenue NSW may eventually offer its own subcontractor registry. Are we complementary (fill the gap today, integrate later) or competing (lobby to become the standard integration)?

## 5. The near-term ask

Phase 1 is live today. The immediate next step is a focused 2-week sprint to ship Phase 2: cloud sync and storage, with a tight privacy disclosure. By our estimate, that alone removes about 60% of the retyping pain, and it proves the data model we'll extend into Phase 3.

Phase 3 should start as soon as one FM partner commits in principle to consuming the trust score output — without that demand signal, the verification work is speculative.

Phase 4 should not begin until Phase 3 has at least three FM partners piloting, because the MCP contract is the thing we least want to break once it has real agents querying it.
