My prediction is that the general limitation of multi-step agents is the quality of the reward function. Think of LLMs as throwing shit at the wall and seeing what sticks, except that unlike traditional brute force (which is "random" in the output space), we get a heuristic-guided search with much better expected results. But even a well-guided heuristic tapers off to noise after many steps without checkpoints or corrections in global space. This is why AI crushes most games but has a digital stroke and goes in circles on complicated problems.
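To make the "tapers off to noise" point concrete, here's a toy sketch (mine, not anything rigorous; the function names and numbers are made up): each step succeeds with some probability, and without intermediate checkpoints a single silent failure derails the whole run, so success decays roughly exponentially with the number of steps. Add a local reward signal every few steps that lets the agent detect and retry a bad segment, and most of that lost ground comes back.

```python
import random

def run_task(n_steps, p_step, checkpoint_every=None, retries=3):
    """Toy model: an agent takes n_steps, each succeeding with probability
    p_step. Without checkpoints, one bad step silently derails the run.
    With a checkpoint every k steps (a local reward signal), a failed
    segment is detected and retried up to `retries` times."""
    if checkpoint_every is None:
        return all(random.random() < p_step for _ in range(n_steps))
    step = 0
    while step < n_steps:
        segment = min(checkpoint_every, n_steps - step)
        for _ in range(retries + 1):
            if all(random.random() < p_step for _ in range(segment)):
                break  # checkpoint passed, commit this segment
        else:
            return False  # segment never passed its checkpoint
        step += segment
    return True

def success_rate(trials=10_000, **kwargs):
    return sum(run_task(**kwargs) for _ in range(trials)) / trials

# Even a strong per-step heuristic (98% accurate) collapses over 100 steps...
print(success_rate(n_steps=100, p_step=0.98))                       # ~0.13
# ...but periodic corrections against a local reward restore most of it.
print(success_rate(n_steps=100, p_step=0.98, checkpoint_every=10))  # ~0.99
```

The interesting part is that the per-step heuristic is identical in both runs; all the difference comes from whether there's a signal good enough to tell the agent it has gone off the rails mid-way.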
Anyway, what this means, I think, is that AI agents will keep colonizing spaces with meaningful local and global reward functions. But most importantly, it likely means that complex problem spaces will see only marginal improvements (where are all the new math theorems we were promised months ago?).
It's also very tempting to say "ah, but we can just make or even generate reward functions for those problems and train the AI". I suspect this won't happen, because if there were simple functions, we'd have discovered them already. Software engineering is one such mystery, and the reason I love it. Every year, we come up with new ideas and patterns. Many think they will solve all our problems, or at least consistently guide us in the right direction. And yet, here we are, debating language features, design patterns, tooling, UX, and so on. The vast majority of easy truths have already been found. The rest are either complex or hard to find. Even when we think we've found one, it often takes man-decades to conclude that it wasn't even a good idea. And they're certainly not inferable from existing training data.