It's well known, but this video[1] is a proof of concept demonstration from 4 years ago, Casey Muratori called out Microsoft's new Windows Terminal for slow performance and people argued that it wasn't possible, practical, or maintainable to make a faster terminal and that his claims of "thousands of frames per second" were hyperbolic, and one person said it would be a "PHD level research project".
In response, Casey spent <1 week making RefTerm, a skeleton proto-terminal with the same constraints Microsoft people had - using Windows APIs for things, using DirectDraw with GPU rendering, handling terminal escape codes, colours, blinking, custom fonts, missing font character fallback, line wrap, scrollback, Unicode and Right-to-Left Arabic combining characters, etc. RefTerm had 10x faster throughput than Windows Terminal and ran at 6-7000 frames per second. It was single-threaded, not profiled, not tuned, no advanced algorithms, no-cheating by sending some data to /dev/null, all it had to speed it up was simple code without tons of abstractions and a Least Recently Used (LRU) glyph cache to avoid re-rendering common characters, written the first way that he thought of. Around that time he did a video series on that YouTube channel about optimization and arguing that even talking about 'optimization' was too hopeful, we should be talking about 'not-pessimization', that most software is not slow because it has unavoidable complexity and abstractions needed to help maintenance, it's slow because it's choked by a big pile of do-nothing code and abstraction layers added for ideological reasons which hurt maintenance as well as performance.
[1] https://www.youtube.com/watch?v=hxM8QmyZXtg - "How fast should an unoptimized terminal run?"
This video[2] is another specific details one, Jason Booth talking about his experience of game development, and practical examples of changing data layout and C++ code to make it do less work, be more cache friendly, have better memory access patterns, and run orders of magnitude faster without adding much complexity and sometimes removing complexity.
[2] https://www.youtube.com/watch?v=NAVbI1HIzCE - "Practical Optimizations"