Interviewing a senior Python engineer is not interviewing a junior one with harder questions. The thing you're testing is judgment: can this person own an ambiguous problem, make defensible tradeoffs, and keep a codebase healthy under real constraints? Syntax trivia and algorithm puzzles tell you almost none of that.
This guide covers what to actually measure, and how.
What "senior" means in Python specifically
Python's strengths (readability, batteries-included, huge ecosystem) and its sharp edges (the GIL, dynamic typing, dependency sprawl, runtime surprises) shape what senior looks like:
- Knows where the dynamism bites. Comfortable with type hints and `mypy`, understands when duck typing helps and when it hides bugs.
- Reasons about concurrency honestly. Knows the difference between I/O-bound (where `asyncio`/threads help) and CPU-bound (where the GIL means you reach for `multiprocessing` or a native extension).
- Treats dependencies as liabilities. Pins versions, understands the supply-chain surface, doesn't `pip install` their way out of every problem.
- Writes tests that test something. `pytest` fluency is table stakes; the senior signal is whether their tests would actually catch the regression they claim to prevent.
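To make the concurrency point concrete, here is a minimal sketch of the I/O-bound versus CPU-bound split, assuming the standard CPython build with the GIL enabled. The function names (`fake_io_call`, `count_primes`, `run_io_bound`, `run_cpu_bound`) are hypothetical, and `time.sleep` stands in for a real network or disk call.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_io_call(delay: float) -> float:
    """Stand-in for a network or disk call: sleeping releases the GIL."""
    time.sleep(delay)
    return delay


def count_primes(limit: int) -> int:
    """Pure-Python CPU work: holds the GIL for the whole computation."""
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def run_io_bound(delays: list[float]) -> list[float]:
    # Threads overlap the waiting, so wall time is close to the longest delay
    # rather than the sum of all delays.
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(fake_io_call, delays))


def run_cpu_bound(limits: list[int]) -> list[int]:
    # Each worker process has its own interpreter and its own GIL, so the
    # computations can run on separate cores.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))
```

A candidate who reasons well here will usually also mention the costs: process pools pay for pickling arguments and results, and on platforms that spawn workers (Windows, and macOS by default) `run_cpu_bound` needs to be called under an `if __name__ == "__main__":` guard.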
Questions that surface real signal
Skip the brain-teasers. Use tasks that look like the job:
- "Here's a working endpoint that falls over under load. Make it production-ready." Watch whether they find the N+1 query, the unbounded cache, the missing timeout — not whether they can recite Big-O.
- "Add a feature to this unfamiliar service." Comprehension is the senior bottleneck. Can they read enough of the system to extend it safely?
- "Review this AI-generated PR." Plausible, green-tests-passing Python that's subtly wrong is everywhere now. Catching it is a core senior skill.
These map directly to the five task types Probe ships with — chosen because each isolates a different dimension of judgment.
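As an illustration of the first task, the sketch below shows the kind of N+1 query a candidate might be expected to spot, using an in-memory SQLite database. The schema, the `CountingDB` helper, and both function names are hypothetical; a real exercise would more likely hide the pattern behind an ORM.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts how many queries are issued."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def fetchall(self, sql: str, params: tuple = ()) -> list[tuple]:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'ada'), (2, 'grace');
        INSERT INTO posts VALUES (1, 1, 'a1'), (2, 2, 'g1'), (3, 1, 'a2');
        """
    )
    return conn


def titles_with_authors_n_plus_one(db: CountingDB) -> list[tuple]:
    # One query for the posts, then one more query per post for its author.
    posts = db.fetchall("SELECT title, author_id FROM posts ORDER BY id")
    result = []
    for title, author_id in posts:
        [(name,)] = db.fetchall(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        )
        result.append((title, name))
    return result


def titles_with_authors_joined(db: CountingDB) -> list[tuple]:
    # Same result from a single query, regardless of how many posts exist.
    return db.fetchall(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    )
```

With three posts the first version issues four queries and the second issues one. The gap grows linearly with the row count, which is why the endpoint only falls over under load.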
Interviewing Python engineers who use AI
Your senior Python hire will use an AI assistant on day one. The interview should reflect that. An assistant can scaffold a Flask route or a Pydantic model instantly — so the writing isn't the test. The test is whether they catch the assistant's mistakes: the mutable default argument, the race in the "thread-safe" cache, the test that passes whether or not the bug is fixed.
That's the case for letting the assistant in and grading the judgment around it. (More on that in why banning AI won't fix your interview.)
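Two of the mistakes named above fit in a few lines. The sketch below pairs a mutable default argument with a test that passes whether or not the bug is fixed; all function names are hypothetical, and the "tests" return booleans only to keep the example self-contained.

```python
from typing import Callable, Optional


def add_tag_buggy(tag: str, tags: list[str] = []) -> list[str]:
    # Bug: the default list is created once, when the function is defined,
    # and is shared by every call that omits `tags`.
    tags.append(tag)
    return tags


def add_tag_fixed(tag: str, tags: Optional[list[str]] = None) -> list[str]:
    # Fix: use None as the sentinel and build a fresh list per call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


def weak_test(add_tag: Callable[..., list[str]]) -> bool:
    # Always passes its own list, so the default argument is never exercised.
    # Passes for both the buggy and the fixed version.
    return add_tag("a", []) == ["a"]


def strong_test(add_tag: Callable[..., list[str]]) -> bool:
    # Calls twice without a list: the second result must not contain the
    # first call's tag. Fails for the buggy version, passes for the fix.
    add_tag("a")
    return add_tag("b") == ["b"]
```

The useful interview signal is whether the candidate notices that `weak_test` is green for both versions and asks what the test would have looked like before the fix.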
A starting rubric
Weight these to the role, then insist every score cites a moment in the transcript:
| Dimension | What strong looks like |
|---|---|
| Decomposition | Breaks the ambiguous task into the right pieces before coding |
| Verification | Tests the actual failure mode, reads output, doesn't trust green |
| Code quality | Idiomatic, typed where it matters, no hidden tradeoffs |
| Judgment with AI | Catches the assistant's subtle errors, prompts precisely |
| Communication | Can explain why, not just what |
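One way to enforce the "every score cites a moment in the transcript" rule is to make evidence a required field and refuse to total a scorecard without it. This is a rough sketch only: the weights, the 1 to 4 scale, and the names `Score` and `weighted_total` are all assumptions for illustration, not a recommended calibration.

```python
from dataclasses import dataclass

# Hypothetical weights; adjust them to the role. They sum to 1.0.
WEIGHTS = {
    "Decomposition": 0.20,
    "Verification": 0.25,
    "Code quality": 0.20,
    "Judgment with AI": 0.25,
    "Communication": 0.10,
}


@dataclass(frozen=True)
class Score:
    dimension: str
    points: int    # assumed 1-4 scale
    evidence: str  # the transcript moment that justifies the score


def weighted_total(scores: list[Score], weights: dict[str, float]) -> float:
    by_dimension = {s.dimension: s for s in scores}
    if set(by_dimension) != set(weights):
        raise ValueError("scores must cover exactly the weighted dimensions")
    for s in scores:
        # Reject any score that does not point at something in the transcript.
        if not s.evidence.strip():
            raise ValueError(f"{s.dimension}: score has no transcript evidence")
    return sum(weights[d] * s.points for d, s in by_dimension.items())
```

Raising on missing evidence, rather than warning, keeps an uncited score from quietly making it into the hiring discussion.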
If you want a deeper treatment of building the rubric itself, see how to write a technical interview rubric.
Hiring across the stack? See the staff backend engineer interview guide, or the Java coding interview guide.