Thanks for sharing; you clearly spent a lot of time making this easy to digest. I especially like the tokens-to-embedding visualisation.
I recently had some trouble converting a HF transformer I trained with PyTorch to Core ML. I just couldn’t get the KV cache to work, which made it unusably slow after 50 tokens…