Engineers work with AI now. Your interview should too.
Probe runs realistic engineering tasks with the AI assistant turned on, while a second AI silently grades the work against your rubric. You stop policing the tool and start measuring judgment.
Every candidate in your pipeline already has an AI assistant open in another tab. The traditional technical interview pretends otherwise, and in doing so measures the wrong thing. Whiteboards test recall. Take-homes test patience and give you no visibility into who actually wrote the code. Both assume a tool-free setting that the candidate will never work in again after the offer.
An AI-native technical interview assumes the assistant is present, because in the job it always will be. The interview becomes a test of the thing you actually wanted to measure: can this person decompose a real problem, prompt well, and notice when their assistant is confidently wrong?
How it works
You write the rubric. Probe runs the interview. The candidate works in a real editor with a real AI assistant they can ask anything — while a silent watcher AI observes everything and grades against the dimensions you care about.
- A session you shape. Pick the task type, the language, the rounds, and the rubric dimensions and weights.
- Real work, real tools. A real editor, real compilation, real test runs, and the AI assistant the candidate would actually use on the job — no simulation.
- A silent watcher. A second AI sees the diff, the prompts, and the test runs. It never talks to the candidate; it builds an evidence-cited score and flags the moments that matter.
- A scorecard you can defend. Every recommendation traces back to a verbatim transcript moment. Disagree? Override it with one click.
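To make the rubric idea concrete, here is a minimal sketch of how weighted dimensions, evidence citations, and a reviewer override could fit together. It is an illustration only, not Probe's actual data model: the `DimensionScore` fields, the 0-5 scale, and the `weighted_total` and `apply_override` helpers are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DimensionScore:
    """One rubric dimension on a scorecard (hypothetical shape)."""
    name: str       # rubric dimension, e.g. "verification"
    weight: float   # relative weight chosen by the interviewer
    score: float    # watcher's score, assumed here to be on a 0-5 scale
    evidence: str   # verbatim transcript moment that backs the score


def weighted_total(scores):
    """Weighted average of the dimension scores."""
    total_weight = sum(s.weight for s in scores)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    return sum(s.weight * s.score for s in scores) / total_weight


def apply_override(scores, name, new_score):
    """Return a new scorecard with one dimension's score replaced by a reviewer."""
    return [replace(s, score=new_score) if s.name == name else s for s in scores]


# Illustrative scorecard; the evidence strings are made up for the example.
example_scores = [
    DimensionScore("decomposition", 2.0, 4.0, "Split the task into parser and validator first."),
    DimensionScore("verification", 3.0, 2.0, "Accepted the assistant's patch without running tests."),
    DimensionScore("judgment", 1.0, 5.0, "Rejected a suggested retry loop and explained why."),
]

overall = weighted_total(example_scores)
overridden = weighted_total(apply_override(example_scores, "verification", 4.0))
```

Normalizing by the sum of the weights means the weights only need to express relative importance; they do not have to add up to 1. Keeping the evidence string next to each score is what lets a reviewer check a number before overriding it.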
What it measures that a whiteboard can't
The signal lives in the process: prompt quality, verification discipline, and recovery when the assistant goes down a wrong path. A final diff hides all of it. The watcher captures the whole event stream, so you see how someone thinks with the tools they'll really use — not how well they memorized an algorithm. For the why behind this, read why banning AI won't fix your interview.
Common questions
Doesn't letting candidates use AI make it impossible to tell who's good?
The opposite. The assistant amplifies whoever is driving it. A candidate who leans on AI without judgment falls apart on a realistic task faster than on a clean-room puzzle — they prompt their way into corners, miss the assistant's mistakes, and can't explain their choices. Strong engineers stand out more clearly, not less.
How is the candidate scored if the AI did some of the work?
A second AI, a silent watcher, observes the whole session (diff, prompt history, test runs) and grades it against the dimensions you define: decomposition, verification, judgment with the assistant, and so on. Every score cites a specific moment in the transcript, so the scorecard reflects the candidate's judgment, not the assistant's output.
What kinds of tasks does Probe use?
Five coding task types: productionize working-but-flawed code, build a feature in an unfamiliar codebase, refactor while preserving behavior, review an AI-written PR, and build something open-ended. An optional behavioral round is also available. Each task type is chosen to surface a different kind of judgment. Python, Java, and C++ are supported today.
Is it live or asynchronous?
Asynchronous. Candidates open a link and work on their own schedule — no installs, no scheduling, no live proctor. You get an evidence-cited scorecard within minutes of submission and review it whenever it's convenient.
Run your first AI-native interview this week.
Set up in under five minutes. No installs for candidates.
Get started