Claude Code, Cursor, Antigravity — the tools are real, the productivity claims are mostly real, and the new skill is shaping work for an agent rather than doing it. A senior engineer's field notes on what changed and what didn't.
Agents produce code that compiles, passes tests, and is sometimes wrong in ways that no compiler or unit test catches. What an evaluation layer in CI looks like — beyond lint and test — when half the PRs were drafted by a model.