Syntax, Struggle, and Expertise

I strike with words that will pierce through your heart

by RetnaI

For decades, the path to programming expertise ran through syntax struggle—writing, breaking, debugging, repeat. LLMs just eliminated that grind. Junior devs now learn by reviewing AI-generated code instead of writing from scratch.

Can this work?

A follow-up to Review-First Programming

Most developers built expertise through that struggle: write, break, fix, repeat. LLMs just eliminated the pipeline. We don't know what replaces it.

What Syntax Actually Did

When developers spent hours debugging why their list comprehension failed, they weren't just learning bracket placement. For most developers, the syntax struggle forced:

  • Deep pattern internalization: Recognizing when to think transformationally versus iteratively
  • Performance intuition: Understanding memory implications of different constructs
  • Error diagnosis: Building mental models of how things break
  • Idiom fluency: Absorbing community conventions through repeated exposure

Syntax wasn't measuring these skills from outside; it was the friction that typically built them. For most developers, you couldn't separate "learning Python syntax" from "learning to think in Python" because the struggle was the primary learning mechanism.

What LLMs Change

LLMs don't just reduce syntax knowledge; they eliminate the training pipeline that syntax struggle provided. A developer who prompts their way to list comprehensions might never develop the pattern recognition that came from writing (and breaking) them hundreds of times.

This creates a legitimately open question: Can review-first development build the same expertise that write-first development did?

The Optimistic Case

Film directors who can't operate cameras still develop visual judgment through repeated review. Editors who struggle with blank pages still learn narrative structure through evaluating others' writing. Maybe reviewing thousands of LLM-generated functions builds better pattern recognition than writing hundreds manually.

The cognitive load shift might even help: instead of juggling syntax details while learning architecture, developers can focus purely on architectural judgment while syntax is handled externally.

The Language Layer Complication

There's another possibility: the expertise never lived in syntax at all; rather, it lived in the ability to articulate intent clearly.

When you write if x is None vs if x == None, the real knowledge isn't muscle memory of Python operators. It's the English-encoded concept "check identity, not equality." That understanding lives in comments, variable names, test descriptions, PR discussions—the English layer wrapping the code.
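The distinction is easy to demonstrate. In this sketch (the toy class is mine, purely for illustration), a class overrides equality, which makes == None unreliable in a way is None never is:

```python
class AlwaysEqual:
    """Toy class whose instances claim equality with everything."""
    def __eq__(self, other):
        return True

x = AlwaysEqual()

# Equality can be overridden by any class, so == None can lie.
print(x == None)   # True: __eq__ claims equality with None
print(x is None)   # False: x is a real object, not the None singleton
```

The prompt-level concept "check identity, not equality" is exactly what separates the two lines of output.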

This suggests the training pipeline isn't broken, just inverted:

Old path: Struggle with syntax → build mental models → encode understanding in syntax muscle memory

New path: Articulate understanding in English prompts → verify LLM translated correctly → build mental models through mismatch detection

If this is true, LLMs might actually be better teachers because they force more explicit articulation. When you tell an LLM "check identity, not equality," you're being clearer than when you write is from habit. The prompt is self-documenting; the syntax was always ambiguous.

This expertise is also more transferable. "Check identity, not equality" works in Python, JavaScript, and Go. The syntax knowledge was always language-specific.

But there's a catch: this only works if you already know to distinguish identity from equality. The question remains whether you can learn these conceptual distinctions through prompting and review or whether you need to discover them through the friction of making the wrong choice and debugging the consequences.

The Pessimistic Case

Of course, there's a difference between directors who trained as cinematographers before moving to review, and directors who only ever reviewed. The expertise might require the struggle to develop initially, even if it's later expressed through review rather than generation.

We're running an uncontrolled experiment on an entire generation of developers. Some will build deep expertise through prompt-review cycles. Others will plateau at surface-level pattern matching, lacking the foundational understanding that comes from direct syntax struggle.

Where Syntax Still Matters Intensely

Even if review-first development works, certain contexts still require direct syntax knowledge:

Security-critical code: When reviewing authentication logic, knowing that Python's is checks identity while == checks equality isn't optional. The LLM might generate syntactically perfect but semantically dangerous code.

Performance optimization: Understanding that list comprehensions create new lists while generator expressions stream values matters when processing gigabytes of data. The syntax choice encodes critical performance characteristics.
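A minimal sketch of that difference, measuring only the container objects themselves with sys.getsizeof (element storage makes the real gap larger):

```python
import sys

n = 1_000_000

# List comprehension: materializes all n values in memory at once.
squares_list = [i * i for i in range(n)]

# Generator expression: produces values one at a time, on demand.
squares_gen = (i * i for i in range(n))

print(sys.getsizeof(squares_list))  # several megabytes
print(sys.getsizeof(squares_gen))   # a couple hundred bytes

# Both yield the same values; only one holds them all simultaneously.
assert sum(squares_list) == sum(i * i for i in range(n))
```

Swapping brackets for parentheses is a one-character syntax change that encodes an order-of-magnitude memory decision.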

Debugging corrupted output: Anthropic's September 2025 postmortem described infrastructure bugs that corrupted Claude's tokens—right syntax, wrong characters. Catching these required developers who knew what valid code should look like.

Legacy system integration: When interfacing with code written before LLMs, you need to read and understand syntax that no LLM will touch. The expertise can't be delegated.

This creates a paradox: syntax knowledge becomes simultaneously less necessary (for generation) and more critical (for verification)—but also much rarer. Like assembly programming today: rarely needed but irreplaceable when required.

The Research Reality

Recent studies show where "perfect syntax" fails in LLM output:

A November 2024 analysis of GPT-4 and Gemini found seven categories of non-syntactic errors—mistakes that occur even when syntax is flawless:

  • Conditional logic errors: Missing edge cases despite perfect if-statement syntax
  • Semantic misalignment: Implementing addition when asked for XOR operations
  • Mathematical errors: Calculating averages as (n+m+1)//2 instead of (n+m)/2
  • Reference errors: Calling methods that don't exist with perfect call syntax
  • API misuse: Using deprecated methods, such as Azure's old OpenAIClient instead of AzureOpenAIClient (as of February 2025)
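The mathematical-error category is easy to reproduce. A sketch (function names are mine) of the midpoint bug cited above, which only misbehaves on inputs where rounding matters:

```python
def average_wrong(n, m):
    # Syntactically flawless, semantically wrong: floor division on an
    # incremented sum yields a rounded integer midpoint, not the mean.
    return (n + m + 1) // 2

def average_right(n, m):
    return (n + m) / 2

print(average_wrong(2, 3))   # 3
print(average_right(2, 3))   # 2.5
```

Nothing about the wrong version looks broken; a reviewer catches it only by knowing what an average should be, not by scanning for syntax errors.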

A September 2024 study analyzing 12,837 errors found AssertionError as the most common failure—code that compiles, runs, and gives wrong results.

What's striking: GPT-4, despite having the highest overall accuracy, showed the largest deviations when wrong. When it makes mistakes, those mistakes require substantial logical revisions, not syntax tweaks.

Catching these requires exactly the expertise that syntax struggle used to build: deep understanding of what code should do, not just what it does do.

The Honest Uncertainty

We're in the middle of a transition we don't fully understand:

The old path (write → struggle → learn → master) built expertise reliably but slowly.

The new path (prompt → review → refine → ???) builds something quickly, but we don't yet know if that something is the same expertise or a different capability entirely.

Maybe review-first development creates better architects who think in systems rather than implementations. Maybe it creates developers who can't debug deeply because they never built that muscle memory. Probably both, depending on how developers use the tools. But we won't know the final form until we see what the developers who learned only through LLM review can and can't do compared to those who came up through syntax struggle.

Connecting to Review-First Programming

This explains both the power and the risk of review-first programming. LLMs excel at syntax generation, freeing developers to focus on architecture and verification. But that focus requires expertise that may have previously required syntax struggle to build.

Or maybe not. If expertise actually lives in the English layer (the ability to articulate "check identity, not equality" rather than reflexively typing is) then LLMs might be forcing developers to build clearer conceptual models from the start.

You're not worse at programming because you can't write boilerplate from memory. But you might also not be building the pattern recognition that syntax struggle provided. Or you might be building something better: language-agnostic conceptual clarity instead of language-specific muscle memory.

The question isn't whether syntax matters anymore. It's whether we can build programming expertise through explicit articulation and verification rather than through implicit struggle and discovery. And whether "explicit from the start" produces the same depth as "discovered through pain, then made explicit."

We're finding out.