Vetting protocol

What the badge actually asserts.

Five pillars, weighted. Each pillar is scored independently on the L0–L4 scale by two reviewers, with a calibration discussion on any divergence greater than one level. The pillar categories, methods, and pass bars are public. The internal scoring keys are not, because once scoring keys are public, applicants train to them and the rubric stops measuring anything.
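The divergence rule is simple enough to state as code. The sketch below is illustrative only: it assumes L0–L4 are stored as the integers 0–4, and `PillarScores` and `needs_calibration` are hypothetical names. It encodes nothing about the unpublished scoring keys.

```python
from dataclasses import dataclass

# Assumption: levels L0–L4 are encoded as the integers 0–4.
MIN_LEVEL, MAX_LEVEL = 0, 4
# Protocol rule: a gap of more than one level triggers a calibration discussion.
MAX_ALLOWED_GAP = 1


@dataclass(frozen=True)
class PillarScores:
    """Hypothetical record of the two independent reviewer levels for one pillar."""

    pillar: str      # e.g. "P-01"
    reviewer_a: int  # level from the first reviewer
    reviewer_b: int  # level from the second reviewer

    def __post_init__(self) -> None:
        # Reject levels outside the L0–L4 scale.
        for level in (self.reviewer_a, self.reviewer_b):
            if not MIN_LEVEL <= level <= MAX_LEVEL:
                raise ValueError(f"{self.pillar}: level {level} is outside L0-L4")


def needs_calibration(scores: PillarScores) -> bool:
    """True when the two reviewers diverge by more than one level."""
    return abs(scores.reviewer_a - scores.reviewer_b) > MAX_ALLOWED_GAP


if __name__ == "__main__":
    for s in (PillarScores("P-01", 3, 4), PillarScores("P-03", 1, 3)):
        print(s.pillar, "calibrate" if needs_calibration(s) else "no calibration needed")
```

In this example, P-01 (3 vs 4) stays within tolerance, while P-03 (1 vs 3) would go to a calibration discussion.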

P-01
Technical screen
weight · 25%
Method

Two senior engineers independently tear down a code sample of the agency's choice and a representative client repo. Borderline files receive a live technical interview with a third reviewer.

What we score for
  • Architectural judgment under real constraints
  • Code review hygiene. PR conventions, comment density, test coverage at the boundary
  • Operational maturity. Deploy story, observability, incident response
  • Honest scoping. What was deferred and why
Evidence captured

Repo samples (with NDA), reviewer notes, per-criterion scoresheet, calibration discussion if scores diverge.

Red flags

Inability to produce a representative repo. Test coverage that ends at the controller. PRs without review.

P-02
Past-work review
weight · 20%
Method

Two redacted case studies are reviewed for the shape of the engagement, not the screenshots. We look for retention, scope evolution, and how the agency handled the moment something went wrong.

What we score for
  • Multi-year retention with at least one named client
  • A documented incident or scope shock and the written outcome
  • Shipped artifacts traceable to the agency's contribution
  • Realistic claims. Outcomes the buyer would confirm
Evidence captured

Case study packets, shipped-artifact links, retention data, score.

Red flags

Logo wall without case studies. Outcomes the named client would not corroborate. Single-engagement portfolios.

P-03
Reference checks
weight · 20%
Method

Three structured calls with current or former clients of the agency's choice. Standardized question set; recorded with consent; transcribed and retained. We pull at least one reference from each of: in-flight, recently closed, and concluded over a year ago.
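The coverage requirement for the reference set can be checked mechanically. This is a sketch under a stated assumption: the page does not define "recently closed", so the code treats it as an engagement that ended within the last 365 days and anything older as concluded over a year ago. `bucket` and `check_reference_set` are hypothetical helpers.

```python
from datetime import date, timedelta
from typing import Optional

REQUIRED_CALLS = 3
# Assumption: "recently closed" means the engagement ended no more than 365 days ago.
RECENT_WINDOW = timedelta(days=365)
REQUIRED_BUCKETS = ("in-flight", "recently closed", "concluded over a year ago")


def bucket(end_date: Optional[date], today: date) -> str:
    """Classify one reference by its engagement end date (None = still running)."""
    # A missing or future end date means the engagement is still in flight.
    if end_date is None or end_date > today:
        return "in-flight"
    if today - end_date <= RECENT_WINDOW:
        return "recently closed"
    return "concluded over a year ago"


def check_reference_set(end_dates: list[Optional[date]], today: date) -> list[str]:
    """Return problems with a proposed reference set (an empty list means acceptable)."""
    problems = []
    if len(end_dates) != REQUIRED_CALLS:
        problems.append(f"expected {REQUIRED_CALLS} references, got {len(end_dates)}")
    covered = {bucket(d, today) for d in end_dates}
    for b in REQUIRED_BUCKETS:
        if b not in covered:
            problems.append(f"missing reference: {b}")
    return problems


if __name__ == "__main__":
    today = date(2025, 1, 15)
    proposed = [None, date(2024, 9, 30), date(2023, 6, 1)]
    print(check_reference_set(proposed, today) or "reference set covers all three buckets")
```

With exactly three calls and three required buckets, an acceptable set has one reference in each bucket.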

What we score for
  • Consistency between the agency's narrative and the client's recollection
  • Specifics on communications, escalation, and milestone behavior
  • Willingness to re-engage on the next project
  • Honest accounts of what didn't go well
Evidence captured

Reference contacts and transcribed responses, scored against the rubric and cross-checked against the case study set.

Red flags

Refusal to provide an old reference. References that hedge on willingness to re-engage. Glowing reviews with no specifics.

P-04
Communications assessment
weight · 20%
Method

An async writing sample (a real expectation-setting memo or escalation note) plus a 30-minute live call with the principal. Tested for clarity, responsiveness, and the specific muscle of saying hard things in writing.

What we score for
  • Concise, unambiguous writing in English
  • Ability to set expectations without softening them into vagueness
  • Live presence. Active listening, summarization, clean follow-ups
  • Recovery posture when challenged
Evidence captured

Writing sample, call notes, scored response to a scripted prompt.

Red flags

Memo that buries the bad news. Live call dominated by junior staff with no decision rights. AI-generated writing without disclosure.

P-05
Security & process baseline
weight · 15%
Method

A structured baseline questionnaire covering data handling, access controls, SDLC maturity, IP and contract hygiene, and incident response. Material claims are spot-checked against supporting artifacts.

What we score for
  • Least-privilege access in source and infra
  • A written incident response runbook that names a human
  • Subcontractor disclosure and IP assignment chain
  • Background-check posture appropriate to client tier
Evidence captured

Checklist responses, supporting documents (policies, runbooks), spot-check notes.

Red flags

Shared admin accounts. No written incident response. IP assignments that don't follow the labor chain.
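Taken together, the five pillar weights sum to 100%. The sketch below shows one way a weighted composite could be computed from them. The page does not say how pillar scores are combined, so the code assumes each pillar already has one agreed level on the 0–4 scale and takes a weighted mean. The function name `composite` is hypothetical, and no pass bar is applied because none is stated here.

```python
# Pillar weights as published on this page, in percent.
WEIGHTS = {"P-01": 25, "P-02": 20, "P-03": 20, "P-04": 20, "P-05": 15}
assert sum(WEIGHTS.values()) == 100


def composite(levels: dict[str, float]) -> float:
    """Weighted mean of per-pillar levels; the result stays on the 0-4 scale.

    Assumption (not stated by the protocol): each pillar arrives as one agreed
    level, for example the reviewers' post-calibration score.
    """
    missing = WEIGHTS.keys() - levels.keys()
    if missing:
        raise ValueError(f"unscored pillars: {sorted(missing)}")
    for pillar, level in levels.items():
        if not 0 <= level <= 4:
            raise ValueError(f"{pillar}: level {level} is outside L0-L4")
    return sum(WEIGHTS[p] * levels[p] for p in WEIGHTS) / 100


if __name__ == "__main__":
    print(composite({"P-01": 3, "P-02": 4, "P-03": 2, "P-04": 3, "P-05": 4}))
```

The example works out to (75 + 80 + 40 + 60 + 60) / 100 = 3.15. Keeping the weights as integer percentages makes the 100% check exact rather than subject to floating-point rounding.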

Revocable, by design

The mark you can lose is the mark worth holding.

QA flags accumulate from client complaints, missed SLAs, and security events. Severe events trigger immediate suspension; the verification page and directory update within 15 minutes of a revocation decision. Published revocations are recorded in the changelog without retraction; appeals are reviewed by a partner who did not work the file.
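The 15-minute propagation commitment can be audited from timestamps. A minimal sketch follows, assuming someone records when a revocation decision was made and when it became visible on each surface. `propagation_breaches` and the timestamp map are hypothetical, and nothing here describes how revocations are actually propagated.

```python
from datetime import datetime, timedelta
from typing import Optional

# Commitment from the protocol: both surfaces update within 15 minutes of the decision.
PROPAGATION_WINDOW = timedelta(minutes=15)
SURFACES = ("verification page", "directory")


def propagation_breaches(decided_at: datetime, updated_at: dict[str, datetime]) -> list[str]:
    """Return the surfaces that updated late or have no recorded update.

    `updated_at` is a hypothetical map from surface name to the time the
    revocation became visible there.
    """
    late = []
    for surface in SURFACES:
        seen: Optional[datetime] = updated_at.get(surface)
        # A surface with no recorded update counts as a breach.
        if seen is None or seen - decided_at > PROPAGATION_WINDOW:
            late.append(surface)
    return late


if __name__ == "__main__":
    decided = datetime(2025, 3, 1, 12, 0)
    observed = {
        "verification page": decided + timedelta(minutes=4),
        "directory": decided + timedelta(minutes=22),
    }
    print(propagation_breaches(decided, observed))
```

An update landing at exactly 15 minutes counts as on time; only a strictly longer gap is reported.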