How to interview engineers who use AI (without getting fooled)

If you've decided to stop banning AI in your interviews, the next question is the practical one: how do you actually run the thing? An assistant that can one-shot FizzBuzz can also one-shot a naive answer to most screening questions, so both your question design and what you choose to watch have to change.

Here's the playbook.

1. Pick tasks where the first answer is a trap

The assistant will happily produce a solution. Your job is to choose problems where the obvious solution is subtly wrong, and where a thoughtful engineer would notice.

Good shapes:

Productionize working-but-flawed code. Hand them something that passes the happy path and quietly breaks on concurrency, large inputs, or a malformed request. The assistant rarely volunteers the failure mode; the candidate has to go looking.
Build a feature in an unfamiliar codebase. Now the bottleneck isn't writing code, it's comprehension — reading enough of the system to make the assistant useful.
Review an AI-written pull request. Put the candidate on the other side of the tool. Can they catch what a plausible-looking diff gets wrong?

These are three of the five task types Probe is built around, for exactly this reason.
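To make the "productionize working-but-flawed code" shape concrete, here's a minimal sketch of what a seed task might look like. Everything in it (the order_total function, the payload shape, the hidden cases) is hypothetical and invented for illustration; it isn't taken from Probe or any real codebase. The happy-path check passes, while three ordinary inputs either crash or quietly return nonsense.

```python
def order_total(payload):
    # Hypothetical seed code handed to the candidate.
    # Correct on the happy path; no validation of any kind.
    return sum(item["price"] * item["qty"] for item in payload["items"])


def probe(payload):
    """Return the computed total, or the name of the exception raised."""
    try:
        return order_total(payload)
    except Exception as exc:
        return type(exc).__name__


# The one case the provided test covers.
happy = {"items": [{"price": 5.0, "qty": 2}]}

# Cases the candidate has to go looking for (illustrative, not exhaustive).
hidden_cases = {
    "missing_items_key": {},
    "negative_quantity": {"items": [{"price": 5.0, "qty": -3}]},
    "quantity_as_string": {"items": [{"price": 5.0, "qty": "2"}]},
}

results = {name: probe(payload) for name, payload in hidden_cases.items()}

print("happy path:", order_total(happy))
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

The negative quantity is the case worth watching. Nothing fails loudly; the function just returns a negative total, so only a candidate who tests beyond the happy path will find it.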

2. Watch the process, not just the artifact

Final code tells you what they shipped. The path tells you who they are. The signals worth capturing:

Prompt quality. Vague, kitchen-sink prompts vs. precise, well-scoped ones. Strong candidates treat the assistant like a sharp junior engineer, not a search box.
Verification behavior. Do they run the tests? Write new ones? Read the output, or just trust a green checkmark?
Recovery. When the assistant goes down a wrong path, do they notice and correct, or follow it off the cliff?

You can't reconstruct this from a final diff. You need the event stream — edits, prompts, runs, in order.
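One way to picture that event stream is as an ordered log you can query. The sketch below assumes a hypothetical SessionEvent record with three event kinds; the names, fields, and the "unverified edit" heuristic are illustrative assumptions, not Probe's actual schema. It flags edits the candidate moved past without running any tests.

```python
from dataclasses import dataclass
from typing import Literal

# Assumed event kinds, mirroring "edits, prompts, runs".
EventKind = Literal["prompt", "edit", "test_run"]


@dataclass(frozen=True)
class SessionEvent:
    t_seconds: int   # seconds since the session started
    kind: EventKind
    detail: str


def unverified_edits(events: list[SessionEvent]) -> list[SessionEvent]:
    """Return edits with no test run before the next prompt or session end."""
    flagged = []
    pending = []  # edits made since the last test run or prompt
    for event in sorted(events, key=lambda e: e.t_seconds):
        if event.kind == "edit":
            pending.append(event)
        elif event.kind == "test_run":
            pending.clear()          # a run covers every pending edit
        elif event.kind == "prompt":
            flagged.extend(pending)  # moved on without verifying
            pending.clear()
    flagged.extend(pending)          # session ended with unverified edits
    return flagged


events = [
    SessionEvent(60, "prompt", "add caching to get_user"),
    SessionEvent(95, "edit", "pasted assistant's cache layer"),
    SessionEvent(140, "prompt", "now add pagination"),
    SessionEvent(180, "edit", "added pagination params"),
    SessionEvent(200, "test_run", "test suite executed"),
]

for event in unverified_edits(events):
    print(f"{event.t_seconds}s: {event.detail}")
```

Both edits would look the same in a final diff. Only the ordering shows that the caching change was never exercised before the candidate moved on.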

3. Separate the interviewer from the grader

Here's a subtle trap: if the same person is talking to the candidate and judging them, the conversation contaminates the judgment. You remember the candidate who was charming, not the one who was correct.

Probe splits these. The candidate works with their assistant; a silent watcher grades the session against your rubric without ever interrupting. No interviewer-in-the-loop bias, no "they seemed smart" halo. Just evidence-cited scoring tied to what actually happened.

4. Score judgment, with citations

Define your dimensions up front — decomposition, verification, code quality, communication, whatever matters for the role — and weight them. Then insist every score points at a moment: "Accepted the assistant's caching layer without noticing it broke invalidation — see 14:32." A rubric without citations is just a vibe with a number attached.
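As a rough sketch of what "weighted dimensions, every score cited" could look like in practice: the dimension names come from the list above, but the weights, the 1-5 scale, and the DimensionScore and weighted_total names are all assumptions made up for this example. The one rule it enforces is the one that matters here: a score with no citation is rejected rather than averaged in.

```python
from dataclasses import dataclass

# Assumed weights, in percent, so the arithmetic stays exact.
WEIGHTS = {
    "decomposition": 30,
    "verification": 40,
    "code_quality": 20,
    "communication": 10,
}


@dataclass(frozen=True)
class DimensionScore:
    dimension: str
    score: int      # assumed 1-5 scale
    citation: str   # timestamped evidence from the session


def weighted_total(scores, weights):
    """Weighted average on the 1-5 scale; refuses uncited or missing scores."""
    scored = {s.dimension for s in scores}
    if scored != set(weights):
        raise ValueError(f"dimensions do not match rubric: {sorted(scored)}")
    for s in scores:
        if not s.citation.strip():
            raise ValueError(f"{s.dimension}: score has no citation")
    return sum(weights[s.dimension] * s.score for s in scores) / sum(weights.values())


scores = [
    DimensionScore("decomposition", 4, "03:10 split the task into three steps"),
    DimensionScore("verification", 2, "14:32 accepted caching layer, missed invalidation"),
    DimensionScore("code_quality", 3, "21:05 left duplicated validation logic"),
    DimensionScore("communication", 4, "25:40 summarized trade-offs clearly"),
]

print(weighted_total(scores, WEIGHTS))
```

Making the citation a required field is a design choice: it means the grader, human or automated, can't record a number without also recording the moment that justifies it.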

A quick checklist

Before you run an AI-allowed loop, confirm you can answer yes to all of these:

The task has a non-obvious failure mode the assistant won't surface for them.
You're capturing prompts, edits, and test runs — not just the final submission.
Scoring is separated from the live conversation.
Every score on the rubric can point to a specific moment in the transcript.

Get those right and the assistant stops being a threat to your signal. It becomes the thing that reveals it.

Next: see how the silent watcher turns a messy session into a defensible scorecard, or browse the interview guides for role-specific rubrics.