The Java coding interview: beyond the algorithm puzzle

Java interviews have a reputation for being the most ritualized of all: inheritance trivia, HashMap internals, and a whiteboard algorithm to finish. Almost none of it predicts whether someone writes good Java in a real service. Here's what to test instead.

What actually separates strong Java engineers

Java's verbosity and maturity mean the differentiators are rarely "can they write a loop." They're:

Concurrency judgment. Java gives you real threads and real footguns. Do they know when synchronized is enough and when to reach for java.util.concurrent, what volatile does and does not guarantee about visibility, and why a "thread-safe" class can still be used unsafely?
API and type design. Strong typing is a tool. Do they use it — immutability, sealed types, Optional where it clarifies — or fight it?
Resource discipline. Try-with-resources, connection/thread pools, bounded queues. The senior tell is whether resources are always released, even on the error path.
Verification. JUnit fluency is baseline; the signal is whether their tests exercise the concurrency or edge case they claim to cover.
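
To make the "thread-safe class used unsafely" point concrete, here is a minimal sketch. HitCounter and its method names are made up for illustration; the only real API involved is ConcurrentHashMap. Every individual map call is atomic, but a get followed by a put is a check-then-act sequence that two threads can interleave, while merge does the whole read-modify-write atomically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example class, not taken from any real codebase.
class HitCounter {
    private final Map<String, Integer> hits = new ConcurrentHashMap<>();

    // Unsafe use of a thread-safe map: get and put are each atomic,
    // but another thread can update the key between the two calls.
    void recordUnsafe(String key) {
        Integer current = hits.get(key);
        hits.put(key, current == null ? 1 : current + 1);
    }

    // Safe: merge performs the read-modify-write as one atomic step.
    void recordSafe(String key) {
        hits.merge(key, 1, Integer::sum);
    }

    int count(String key) {
        return hits.getOrDefault(key, 0);
    }

    // Runs `threads` workers that each record `perThread` hits on one key
    // and returns the final count for that key.
    static int hammer(boolean safe, int threads, int perThread) {
        HitCounter counter = new HitCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // line the workers up so they really overlap
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < perThread; i++) {
                    if (safe) counter.recordSafe("home");
                    else counter.recordUnsafe("home");
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.count("home");
    }

    public static void main(String[] args) {
        System.out.println("safe:   " + hammer(true, 8, 10_000));
        // The unsafe total is usually lower, but how much lower varies per run.
        System.out.println("unsafe: " + hammer(false, 8, 10_000));
    }
}
```

This also illustrates the verification point: a test that calls either method from a single thread passes for both versions, so it covers nothing about concurrency. Only a test shaped like hammer, with overlapping writers, exercises the behavior being claimed.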

Tasks that beat the puzzle

Replace "reverse a linked list" with tasks that look like the job:

Productionize a Spring endpoint that breaks under load. The interesting bugs — the unbounded thread pool, the missing timeout, the shared mutable state — are exactly the production failures you want them to catch.
Refactor a tangled service without changing behavior. Java codebases get crufty; can they clean one up and prove nothing broke?
Review an AI-generated PR. Modern assistants write fluent, plausible Java that's subtly wrong about concurrency surprisingly often. Catching it is the skill.
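
For the first task, it helps to know what a fixed version might look like before you grade one. The sketch below is plain java.util.concurrent rather than Spring configuration, and the class name, method names, and pool sizes are all hypothetical. It shows two of the fixes named above: a pool with a bounded queue that rejects work when saturated, and a caller that waits with a timeout instead of forever.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper class for illustration only.
class BoundedWork {
    // Fixed number of threads plus a bounded queue. Sizes here are placeholders;
    // real values would come from load testing.
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // reject rather than queue without limit
    }

    // Waits a bounded time for the result and cancels the task if the deadline passes.
    static String callWithTimeout(ThreadPoolExecutor pool, long workMillis, long timeoutMillis) {
        Future<String> future = pool.submit(() -> {
            Thread.sleep(workMillis); // stands in for a slow downstream call
            return "done";
        });
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so the thread is freed
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        }
    }

    // Saturates the pool with tasks that block on a latch and counts rejections.
    static int rejectedWhenSaturated(int threads, int queueCapacity, int submissions) {
        ThreadPoolExecutor pool = newBoundedPool(threads, queueCapacity);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < submissions; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // the caller finds out immediately that the service is full
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newBoundedPool(2, 2);
        try {
            System.out.println(callWithTimeout(pool, 2_000, 50));
            System.out.println(callWithTimeout(pool, 0, 2_000));
        } finally {
            pool.shutdownNow();
        }
        // 2 running + 3 queued are accepted, so 5 of 10 submissions are rejected.
        System.out.println("rejected: " + rejectedWhenSaturated(2, 3, 10));
    }
}
```

A strong candidate does not need to land on this exact shape. What matters is that they can say what happens when the pool is full and what happens when a dependency never answers.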

Each maps to one of Probe's five task types, chosen to isolate a specific dimension of judgment rather than recall.

Interviewing Java engineers who use AI

An assistant will happily generate a @RestController, a builder, or a stream pipeline. So the writing isn't the bottleneck — the judgment is. Does the candidate notice when the generated code mishandles a checked exception, leaks a resource, or introduces a data race? Letting the assistant in and grading the judgment around it is far more predictive than banning it (see why banning AI won't fix your interview).
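
As an illustration of the resource leak a reviewer should catch, here is a small sketch. TrackedResource is a hypothetical stand-in for a connection or stream; it only records what happened to it. The leaky method has the plausible-looking shape generated code can take, where close() is skipped whenever the body throws. The safe method uses try-with-resources, which closes the resource before the catch block runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for illustration only.
class ResourceDemo {
    // Stand-in for a connection, stream, or similar handle; logs its lifecycle.
    static final class TrackedResource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        TrackedResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        void use(boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated failure in " + name);
            }
            log.add("use " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Leaky shape: close() is only reached on the happy path.
    static List<String> leaky(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            TrackedResource r = new TrackedResource("conn", log);
            r.use(fail);
            r.close();
        } catch (IllegalStateException e) {
            log.add("error");
        }
        return log;
    }

    // try-with-resources: close() runs on both paths, before the catch block.
    static List<String> safe(boolean fail) {
        List<String> log = new ArrayList<>();
        try (TrackedResource r = new TrackedResource("conn", log)) {
            r.use(fail);
        } catch (IllegalStateException e) {
            log.add("error");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(leaky(true)); // prints [open conn, error]
        System.out.println(safe(true));  // prints [open conn, close conn, error]
    }
}
```

The happy path of both methods is identical, which is why this kind of bug survives a casual read and a happy-path test.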

A starting rubric

Dimension	What strong looks like in Java
Concurrency judgment	Reaches for the right primitive; spots unsafe sharing
API / type design	Uses the type system to prevent bugs, not just satisfy the compiler
Resource discipline	Releases everything, even on the error path
Verification	Tests actually exercise the claimed behavior
Judgment with AI	Catches the assistant's subtle concurrency/resource errors

Weight each dimension to the role, and cite evidence for every score. For the mechanics of building one, see how to write a technical interview rubric.
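
If you keep scores in code or a spreadsheet export, the weighting and the evidence rule can be sketched in a few lines. Everything here is an assumption made for illustration: the Rubric and Score names, the 1 to 4 scale, and the example weights. It assumes Java 16 or later for records. The constructor refuses a score that arrives without evidence, and the total is a plain weighted average.

```java
import java.util.List;

// Hypothetical scoring helper; names, scale, and weights are illustrative.
class Rubric {
    // One scored dimension. The compact constructor rejects a score with no evidence.
    record Score(String dimension, double weight, int points, String evidence) {
        Score {
            if (weight <= 0) {
                throw new IllegalArgumentException("weight must be positive");
            }
            if (points < 1 || points > 4) {
                throw new IllegalArgumentException("points must be between 1 and 4");
            }
            if (evidence == null || evidence.isBlank()) {
                throw new IllegalArgumentException("every score needs cited evidence");
            }
        }
    }

    // Sum of weight * points divided by the sum of weights.
    static double weightedAverage(List<Score> scores) {
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("no scores to average");
        }
        double weighted = 0;
        double totalWeight = 0;
        for (Score s : scores) {
            weighted += s.weight() * s.points();
            totalWeight += s.weight();
        }
        return weighted / totalWeight;
    }

    public static void main(String[] args) {
        List<Score> scores = List.of(
                new Score("Concurrency judgment", 2.0, 4, "Replaced get-then-put with an atomic merge"),
                new Score("Verification", 1.0, 1, "Only test ran on a single thread"));
        System.out.println(weightedAverage(scores)); // prints 3.0
    }
}
```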