'In some iterations, coding agent put on a hat of security engineer. For instance - it created a hasMinimalEntropy function meant to "detect obviously fake keys with low character variety". I don't know why.'
Yes, you do know why: somewhere in its training data, that kind of check was linked to "quality" or "improvement". Remember what these things are at their core: really good auto-complete.
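For context, a "low character variety" check is usually nothing more than a distinct-character count. The post doesn't show the agent's actual code, so the sketch below is only a guess at what a hasMinimalEntropy function like that typically looks like (assuming a TypeScript codebase, which the camelCase name suggests):

```typescript
// Hypothetical reconstruction: the original post only names the function and its
// stated purpose; this body is an assumption about what such a check usually does.
function hasMinimalEntropy(key: string, minUniqueChars = 6): boolean {
  // Count distinct characters: "aaaaaaaaaaaa" or "12121212" scores low.
  const uniqueChars = new Set(key).size;
  return uniqueChars >= minUniqueChars;
}

// It "detects obviously fake keys", but says nothing about whether a key is real:
hasMinimalEntropy("aaaaaaaaaaaaaaaa");       // false -- caught
hasMinimalEntropy("sk-totally-fake-key-42"); // true  -- waved through
```

Real key validation would check the expected format and length, or ask the provider; counting unique characters mostly catches placeholder strings. It's exactly the kind of change that pads a "quality" changelog without changing anything real.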
'The prompt, in all its versions, always focuses on us improving the codebase quality. It was disappointing to see how that metric is perceived by AI agent.'
Really? It's disappointing to see how that metric is perceived by humans, and the AIs are trained on what humans wrote. If people can't agree on "codebase quality", especially the ones who write loudly about it on the internet, AI agents aren't going to agree either. A prompt that actually specified what _you_ consider an improvement would have gone much further: perhaps minimize 3rd-party deps, or minimize local utils that reimplement existing 3rd-party libs, or add stricter typechecks.
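To make that concrete, here is one hypothetical way such a prompt could be scoped; the specific bullets are my own examples of the direction, not anything from the original prompt:

```
Improve codebase quality. "Quality" here means, in priority order:
1. Replace local utility code that duplicates an already-installed dependency
   with calls to that dependency.
2. Drop 3rd-party dependencies that are used in one place and are trivially
   replaceable with standard-library code.
3. Tighten typechecking (no new `any`, no suppressed type errors).
Do NOT add new dependencies, new tests, or new metrics unless required by 1-3.
```

With an explicit definition and an explicit "do not" list, "more is better" stops being the path of least resistance.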
'The leading principle was to define a few vanity metrics and push for "more is better".'
Yeah, because this is probably the most common thing it saw in training. Programmers who actually improve codebase quality do it quietly, while the ones shouting on the internet (hence into the training data) about how their [bad] techniques [appear to] improve quality are also the ones picking vanity metrics and pushing for "more is better".
'I've prompted Claude Code to failure here'
Not really a failure: it did exactly what you asked and improved "codebase quality" as its training data defines it. If you _required_ a human engineer to do the same thing 200 times, you'd get similar results as they run out of real improvements and start scouring the web for anything anybody ever called an "improvement", which very definitely includes vanity metrics and "more is better" applied to test count and coverage. You just showed that these AIs aren't much more than their training data. It isn't actually thinking about quality; it's just barfing up things it has seen labeled "codebase quality improvements", regardless of whether those changes improve anything.