Vetting protocol

What the badge actually asserts.

Five pillars, weighted. Each pillar is scored independently on the L0–L4 scale by two reviewers, with a calibration discussion on any divergence greater than one level. The pillar categories, methods, and pass bars are public. The internal scoring keys are not: the moment scoring keys are public, applicants train to them and the rubric stops measuring anything.
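To make the arithmetic concrete, here is a minimal sketch of the two-reviewer check and a weighted composite. The weights are the ones published per pillar on this page. The aggregation rule (average the two reviewers, then take a weighted sum) and the function names `needs_calibration` and `weighted_composite` are assumptions for illustration; the actual scoring keys and pass bars are not reproduced here.

```python
# Pillar weights as published on this page (they sum to 100%).
WEIGHTS = {"P-01": 0.25, "P-02": 0.20, "P-03": 0.20, "P-04": 0.20, "P-05": 0.15}
MAX_LEVEL = 4  # top of the L0-L4 scale


def needs_calibration(a: int, b: int) -> bool:
    """True when two reviewers diverge by more than one level."""
    return abs(a - b) > 1


def weighted_composite(scores: dict) -> float:
    """Weighted composite on the L0-L4 scale.

    ASSUMPTION: each pillar's two reviewer scores are averaged, then
    combined as a weighted sum. The page does not state its real
    aggregation rule, so treat this as illustrative only.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five pillars")
    total = 0.0
    for pillar, (a, b) in scores.items():
        if not (0 <= a <= MAX_LEVEL and 0 <= b <= MAX_LEVEL):
            raise ValueError(f"{pillar}: levels must be between 0 and {MAX_LEVEL}")
        total += WEIGHTS[pillar] * (a + b) / 2
    return total


# Hypothetical file: (reviewer 1, reviewer 2) per pillar.
example = {
    "P-01": (3, 3),
    "P-02": (2, 4),  # two levels apart, so this pillar goes to calibration
    "P-03": (3, 4),
    "P-04": (4, 4),
    "P-05": (2, 3),
}
to_calibrate = [p for p, (a, b) in example.items() if needs_calibration(a, b)]
composite = weighted_composite(example)
```

In this hypothetical file only P-02 is flagged for a calibration discussion, since a one-level gap (as on P-03 and P-05) stays under the threshold.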

P-01

Technical screen

weight · 25%

Method

Two senior engineers independently tear down a code sample of the agency's choice and a representative client repo. Borderline files receive a live technical interview with a third reviewer.

What we score for

Architectural judgment under real constraints
Code review hygiene. PR conventions, comment density, test coverage at the boundary
Operational maturity. Deploy story, observability, incident response
Honest scoping. What was deferred and why

Evidence captured

Repo samples (with NDA), reviewer notes, per-criterion scoresheet, calibration discussion if scores diverge.

Red flags

Inability to produce a representative repo. Test coverage that ends at the controller. PRs without review.

P-02

Past-work review

weight · 20%

Method

Two redacted case studies are reviewed for the shape of the engagement, not the screenshots. We look for retention, scope evolution, and how the agency handled the moment something went wrong.

What we score for

Multi-year retention with at least one named client
A documented incident or scope shock and the written outcome
Shipped artifacts traceable to the agency's contribution
Realistic claims. Outcomes the buyer would confirm

Evidence captured

Case study packets, shipped-artifact links, retention data, score.

Red flags

Logo wall without case studies. Outcomes the named client would not corroborate. Single-engagement portfolios.

P-03

Reference checks

weight · 20%

Method

Three structured calls with clients of the agency's choice. Standardized question set; recorded with consent; transcribed and retained. At least one reference must come from each engagement stage: in-flight, recently closed, and concluded over a year ago.
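The stage-coverage rule can be checked mechanically. The sketch below is illustrative only: the helper names `stage` and `missing_stages` are hypothetical, and it assumes "recently closed" means an engagement that ended within the last 365 days, a cutoff this page does not define.

```python
from datetime import date, timedelta
from typing import Iterable, Optional

REQUIRED_STAGES = {"in-flight", "recently closed", "concluded over a year ago"}
ONE_YEAR = timedelta(days=365)


def stage(end_date: Optional[date], today: date) -> str:
    """Classify an engagement by its end date (None means still running)."""
    if end_date is None or end_date > today:
        return "in-flight"
    if today - end_date > ONE_YEAR:
        return "concluded over a year ago"
    # ASSUMPTION: anything that ended within the last 365 days is "recently closed".
    return "recently closed"


def missing_stages(end_dates: Iterable[Optional[date]], today: date) -> set:
    """Return the required stages that no supplied reference covers."""
    covered = {stage(d, today) for d in end_dates}
    return REQUIRED_STAGES - covered


today = date(2024, 6, 1)
complete = missing_stages([None, date(2024, 3, 1), date(2022, 1, 15)], today)
incomplete = missing_stages([None, date(2024, 3, 1), date(2024, 1, 1)], today)
```

Here `complete` is empty, while `incomplete` reports that the set lacks a reference from an engagement concluded over a year ago.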

What we score for

Consistency between the agency's narrative and the client's recollection
Specifics on communications, escalation, and milestone behavior
Willingness to re-engage on the next project
Honest accounts of what didn't go well

Evidence captured

Reference contacts and transcribed responses, scored against the rubric and cross-checked against the case study set.

Red flags

Refusal to provide an old reference. References that hedge on willingness to re-engage. Glowing reviews with no specifics.

P-04

Communications assessment

weight · 20%

Method

An async writing sample (a real expectation-setting memo or escalation note) plus a 30-minute live call with the principal. Tested for clarity, responsiveness, and the specific muscle of saying hard things in writing.

What we score for

Concise, unambiguous writing in English
Ability to set expectations without softening them into vagueness
Live presence. Active listening, summarization, clean follow-ups
Recovery posture when challenged

Evidence captured

Writing sample, call notes, scored response to a scripted prompt.

Red flags

Memo that buries the bad news. Live call dominated by junior staff with no decision rights. AI-generated writing without disclosure.

P-05

Security & process baseline

weight · 15%

Method

A structured baseline questionnaire covering data handling, access controls, SDLC maturity, IP and contract hygiene, and incident response. Material claims are spot-checked against supporting artifacts.

What we score for

Least-privilege access in source and infra
A written incident response runbook that names a human
Subcontractor disclosure and IP assignment chain
Background-check posture appropriate to client tier

Evidence captured

Questionnaire responses, supporting documents (policies, runbooks), spot-check notes.

Red flags

Shared admin accounts. No written incident response. IP assignments that don't follow the labor chain.

Revocable, by design

The mark you can lose is the mark worth holding.

QA flags accumulate from client complaints, missed SLAs, and security events. Severe events trigger immediate suspension; the verification page and directory update within 15 minutes of a revocation decision. Published revocations are recorded in the changelog without retraction; appeals are reviewed by a partner who did not work the file.
