Agents produce code that compiles, passes tests, and is sometimes wrong in ways that no compiler or unit test catches. Here is what an evaluation layer in CI looks like, beyond lint and test, when half the PRs were drafted by a model.