
How to write a technical interview rubric that actually works

A step-by-step guide to building a technical interview rubric — choosing dimensions, weighting them, and grounding every score in evidence so your hiring decisions are defensible.

Most "rubrics" are a number out of ten and a gut feeling wearing a lab coat. A real rubric does two jobs: it makes interviewers measure the same things, and it produces a hiring decision you can explain to the candidate, the team, and your legal counsel. Here's how to build one.

Step 1: Start from the job, not the format

Before you pick questions, list what the role actually requires in its first six months. A backend role drowning in on-call incidents needs failure-mode instinct. A greenfield product role needs decomposition and speed. Write down 4–6 capabilities that, if the hire were great at them, would make the role a success.

Those become your dimensions. Resist the urge to grade things that don't matter for this role just because they're easy to score.

Step 2: Define each dimension behaviorally

A dimension is useless if two interviewers read it differently. For each one, write what strong and weak look like in observable behavior:

  • Verification — strong: writes a test that fails before the fix and passes after; reads test output rather than trusting a green check.
  • Verification — weak: declares the bug fixed without reproducing it; writes a test that passes whether or not the fix works.

Now "7/10 on verification" points to specific observed behaviors, which is what makes the score defensible.
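One way to keep definitions like these from drifting is to store them as data rather than as prose in a wiki. The sketch below is a minimal illustration, not a prescribed schema: the `Dimension` class, its field names, and the `is_behavioral` check are all hypothetical, and the anchor text is copied from the verification example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One rubric dimension with behavioral anchors (hypothetical structure)."""

    name: str
    strong: tuple[str, ...]  # observable behaviors that signal a strong performance
    weak: tuple[str, ...]    # observable behaviors that signal a weak performance

    def is_behavioral(self) -> bool:
        # Usable only if both ends of the scale are anchored in behavior.
        return bool(self.strong) and bool(self.weak)


verification = Dimension(
    name="verification",
    strong=(
        "writes a test that fails before the fix and passes after",
        "reads test output rather than trusting a green check",
    ),
    weak=(
        "declares the bug fixed without reproducing it",
        "writes a test that passes whether or not the fix works",
    ),
)

print(verification.is_behavioral())
```

A dimension that fails `is_behavioral()` is a prompt to go back and write the missing anchors before anyone scores against it.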

Step 3: Weight to the role

Not every dimension matters equally. Assign weights that sum to 100%. A staff systems role might put 30% on tradeoff reasoning and 10% on raw code quality; a junior role might invert that. The weights encode what you actually care about — make them explicit so the composite score reflects the role, not the loudest interviewer.
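The arithmetic behind a weighted composite is simple enough to sketch. In the example below, only the 30% on tradeoff reasoning and 10% on code quality come from the staff-role example above; the other dimension names, their weights, the 0 to 10 scale, and the function name `composite_score` are assumptions made for illustration.

```python
def composite_score(scores: dict[str, int], weights: dict[str, int]) -> float:
    """Weighted average of per-dimension scores; weights are whole percentages."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    return sum(scores[name] * weights[name] for name in weights) / 100


# Hypothetical weights for a staff systems role.
STAFF_WEIGHTS = {
    "tradeoff_reasoning": 30,
    "verification": 25,
    "decomposition": 20,
    "failure_modes": 15,
    "code_quality": 10,
}

# Hypothetical scores for one candidate, each on a 0-10 scale.
candidate = {
    "tradeoff_reasoning": 8,
    "verification": 7,
    "decomposition": 6,
    "failure_modes": 9,
    "code_quality": 5,
}

print(composite_score(candidate, STAFF_WEIGHTS))  # (240 + 175 + 120 + 135 + 50) / 100 = 7.2
```

Rejecting weights that do not sum to 100 is a deliberate choice here: a silent renormalization would hide the fact that someone changed what the role values.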

Step 4: Demand evidence for every score

This is the step most teams skip, and it's the one that matters most. Every score must point to a specific moment: a line of code, a prompt, a test run, a comment. "No hire" backed by "accepted the assistant's caching layer without noticing it broke invalidation — 14:32" is a defensible decision. "No hire, felt junior" is a lawsuit and a coin flip.

Evidence-cited scoring also counters two biases at once: the halo effect (a charming candidate gets inflated scores) and hindsight bias ("I knew they were bad"). If you can't cite it, it doesn't count.
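If scores are recorded in a tool rather than on paper, the "no evidence, no score" rule can be enforced at the point of entry. The following is a rough sketch under that assumption; the `Evidence` and `Score` types, their fields, and the 0 to 10 range are hypothetical, and the example observation is the one quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    timestamp: str    # where in the session it happened, e.g. "14:32"
    observation: str  # what the interviewer actually saw


@dataclass(frozen=True)
class Score:
    dimension: str
    value: int                     # assumed 0-10 scale
    evidence: tuple[Evidence, ...]

    def __post_init__(self) -> None:
        if not 0 <= self.value <= 10:
            raise ValueError("score must be between 0 and 10")
        if not self.evidence:
            # A score with no cited moment cannot be recorded at all.
            raise ValueError(f"score for {self.dimension!r} cites no evidence")


cited = Score(
    dimension="tradeoff_reasoning",
    value=3,
    evidence=(
        Evidence(
            timestamp="14:32",
            observation="accepted the assistant's caching layer "
            "without noticing it broke invalidation",
        ),
    ),
)

print(cited.evidence[0].timestamp)
```

Attempting `Score(dimension="verification", value=3, evidence=())` raises `ValueError`, which is the code equivalent of refusing to accept "felt junior".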

Step 5: Calibrate, then trust the process

Have two interviewers independently score the same recorded session, then compare. Where they diverge, your dimension definitions are ambiguous: fix the words, not the people. Once calibrated, the rubric should produce consistent scores across interviewers and apply the same standard to every candidate.
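The comparison itself can be as simple as a per-dimension gap check. This is a minimal sketch: the function name `divergent_dimensions`, the sample scores, and the default threshold of 2 points on a 0 to 10 scale are all assumptions, and the right threshold depends on your own scale and tolerance.

```python
def divergent_dimensions(
    a: dict[str, int], b: dict[str, int], max_gap: int = 2
) -> dict[str, int]:
    """Return dimensions where two interviewers differ by more than max_gap."""
    if set(a) != set(b):
        raise ValueError("both interviewers must score the same dimensions")
    gaps = {name: abs(a[name] - b[name]) for name in sorted(a)}
    return {name: gap for name, gap in gaps.items() if gap > max_gap}


# Hypothetical scores from two interviewers watching the same recording.
interviewer_a = {"verification": 8, "tradeoff_reasoning": 6, "code_quality": 7}
interviewer_b = {"verification": 4, "tradeoff_reasoning": 7, "code_quality": 6}

print(divergent_dimensions(interviewer_a, interviewer_b))  # {'verification': 4}
```

Each dimension this returns is a candidate for rewording: the two graders read the same definition and saw different things.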

How Probe uses your rubric

This is exactly the model Probe runs. You define the dimensions and weights; a silent watcher AI grades each session against them and produces a scorecard where every dimension cites the transcript moment behind it. You stay in control — disagree, open the evidence, override with a click. The rubric isn't decoration; it's the contract the grader is held to.

A good rubric turns interviewing from an art into something you can audit. That's the whole point.

