the "lambda the ultimate" papers and the birth of scheme was a loong time ago, so it grates on my ears to hear this topic presented as "an optimization". Yes, it is sometimes an optimization a compiler can make, but the idea is much better presented as a useful semantic of a language.
in the same way that passing parameters to a subfunction "creates" a special set of local variables for the subfunction, the tail recursion semantic updates this set of local variables in an especially clean way for loop semantics, allowing "simultaneous assignment" from old values to new ones.
(yes, it would be confusing with side effected C/C++ operators like ++ because then you'd need to know order of evaluation or know not to do that, but those are already issues in those languages quite apart from tail recursion)
because it's the way I learned it, I tend to call the semantic "tail recursion" and the optimization "tail call elimination", but since other people don't do the same it's somewhat pointless; but I do like to crusade for awareness of the semantic beyond the optimization. If it's an optimization, you can't rely on it because you could blow the stack on large loops. If it's a semantic, you can rely on it.
(the semantic is not entirely "clean" either. it's a bit of a subtle point that you need to return straightaway the return value of the tail call or it's not a tail call. fibonacci is the sum of the current with the next so it's not a tail call unless you somewhat carefully arrange the values you pass/keep around. also worth pointing out that all "tail calls" are up for consideration, not just recursive ones)