FTA: In our "Mobile Actions" evaluation, fine-tuning transformed the model’s reliability, boosting accuracy from a 58% baseline to 85%. This confirms that for edge agents, a dedicated, trained specialist is an efficient path to production-grade performance.
I would be wary of having a LLM with 85% accuracy call tools on my system. Isn’t that fairly far away from production-grade performance?
I also don’t see that the fact that accuracy can be boosted from 50% to 85% is any indication that it can be boosted further.