Question 1

Doesn't letting candidates use AI make it impossible to tell who's good?

Accepted Answer

The opposite. The assistant amplifies whoever is driving it. A candidate who leans on AI without judgment falls apart on a realistic task faster than on a clean-room puzzle — they prompt their way into corners, miss the assistant's mistakes, and can't explain their choices. Strong engineers stand out more clearly, not less.

Question 2

How is the candidate scored if the AI did some of the work?

Accepted Answer

A second AI — a silent watcher — observes the whole session (diff, prompt history, test runs) and grades it against the dimensions you define: decomposition, verification, judgment with the assistant, and so on. Every score cites a specific moment in the transcript, so it measures the candidate's judgment, not the assistant's output.
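
As a rough illustration of what "every score cites a specific moment" could look like as data, here is a minimal Python sketch. It is not Probe's actual schema: the class names, the 1-5 scale, the `uncited` helper, and the sample entries are all hypothetical, made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """A pointer to one moment in the session transcript (hypothetical shape)."""
    event_index: int  # position of the event in the transcript
    kind: str         # e.g. "prompt", "diff", or "test_run"
    excerpt: str      # short quote or summary of what happened


@dataclass
class DimensionScore:
    """One graded dimension plus the transcript moments that justify it."""
    dimension: str
    score: int  # assumed 1-5 scale, for illustration only
    evidence: list[Evidence] = field(default_factory=list)


def uncited(scores):
    """Return the dimensions whose score cites no transcript moment."""
    return [s.dimension for s in scores if not s.evidence]


# Made-up sample data: one score with a citation, one without.
scorecard = [
    DimensionScore(
        "verification",
        4,
        [Evidence(17, "test_run", "re-ran tests after accepting a suggested patch")],
    ),
    DimensionScore("decomposition", 3),
]

print(uncited(scorecard))
```

A check like `uncited` is one way a reviewer could confirm that no score stands without evidence before trusting the scorecard.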

Question 3

What kinds of tasks does Probe use?

Accepted Answer

Five coding task types — productionize working-but-flawed code, build a feature in an unfamiliar codebase, refactor while preserving behavior, review an AI-written PR, and an open-ended build — plus an optional behavioral round. Each is chosen to surface a different kind of judgment. Python, Java, and C++ are supported today.

Question 4

Is it live or asynchronous?

Accepted Answer

Asynchronous. Candidates open a link and work on their own schedule — no installs, no scheduling, no live proctor. You get an evidence-cited scorecard within minutes of submission and review it whenever it's convenient.

Engineers work with AI now. Your interview should too.
