interesting exercise and well written. my follow-on questions/work would be:
1a. temperature=100000 is interesting too. obviously the "ideal" temperature lies somewhere between 0 and 100000. has anyone ablated temperature vs intelligence? surely i'm not the first person to have this idea. people commonly set temp=0 hoping for "deterministic" or "most factual" output, but we all know that's just Skinner-pigeon superstition.
1b. can we use "avg temperature" as a measure the way we use perplexity as a measure? if we see temperature as inverted perplexity with some randomness thrown in, are they basically the same thing inverted, or subtly different? (rough sketch of what i mean right after 1c.)
1c. what's the "avg temperature" of most human communication? what's the "avg temperature" of a subset of "good writers"? what's the "avg temperature" of a subset of "smart writers"?
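to make 1b concrete: i'd define the "avg temperature" of a text as the scalar tau that makes softmax(logits / tau) assign maximum likelihood to the tokens the writer actually chose. everything below is my own stand-in (mock logits, grid-search range, the fake "greedy writer"), not anything from the post; real use would pull per-position logits from an actual LM run over the text.

```python
import numpy as np

def nll_at_tau(logits, targets, tau):
    # mean negative log-likelihood of the observed tokens under softmax(logits / tau)
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)              # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

def avg_temperature(logits, targets, taus=np.linspace(0.05, 5.0, 200)):
    # grid-search the tau that best explains the writer's actual token choices
    return taus[int(np.argmin([nll_at_tau(logits, targets, t) for t in taus]))]

# toy demo with fake logits: 50 positions, 1000-token vocab
rng = np.random.default_rng(0)
logits = 4.0 * rng.normal(size=(50, 1000))
greedy_targets = logits.argmax(axis=-1)                # a "temp ~ 0" writer
print(avg_temperature(logits, greedy_targets))         # -> smallest tau in the grid
```

note the connection to perplexity: mean NLL at tau=1 is just log-perplexity, while the fitted tau asks how sharp or flat a sampler best explains the choices. so my guess is "subtly different" rather than "same thing inverted", but someone should actually check.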
2a. rerun this negative-temperature exercise with the vocab constrained to english tokens
2b. RL a model to dynamically adjust its own temperature when it is 1) less confident or 2) in brainstorm mode
2c. dynamically inject negative temperature every X tokens in a decode, then judge/verify the outcome, to create high-variance synthetic data? (rough mechanical sketch of 2a-2c right below.)
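here's a heuristic sketch of what 2a-2c could look like mechanically, with fake logits standing in for a real LM's next-token head. the english mask (pretend ids 0..199 are english), the entropy threshold, the two taus, and the every-X schedule are all made up for illustration; 2b proper would RL the tau policy instead of hard-coding it.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, STEPS, X = 1000, 40, 8
english_mask = np.zeros(VOCAB, dtype=bool)
english_mask[:200] = True                    # 2a: pretend ids 0..199 are english tokens
allowed = np.flatnonzero(english_mask)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

tokens = []
for step in range(STEPS):
    logits = 4.0 * rng.normal(size=VOCAB)    # stand-in for real next-token logits
    sub = logits[allowed]                    # 2a: constrain the vocab before sampling
    p_ref = softmax(sub)                     # model's own distribution over allowed ids
    entropy = -(p_ref * np.log(p_ref + 1e-12)).sum()
    # 2b: entropy-gated tau. direction is itself a design choice: here i cool down
    # when unsure (high entropy) and warm up when confident (brainstorm mode).
    tau = 0.7 if entropy > 3.0 else 1.3
    if step % X == X - 1:                    # 2c: negative-temp injection every X tokens
        tau = -1.0
    tokens.append(allowed[rng.choice(len(allowed), p=softmax(sub / tau))])

print(tokens)                                # 2c: these rollouts would then be judged/verified
```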
it's hard for me to follow the train of thought on 2 because negative temp is essentially not that different from ultrahigh temp in practice: formally one inverts the token ranking while the other flattens it toward uniform, but both sample tokens the model considers near-worthless, so both degrade into similar gibberish.
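toy numbers (mine, not from the post) to make that concrete:

```python
import numpy as np

logits = np.array([5.0, 2.0, 0.0, -3.0])    # pretend model preferences, best first

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for tau in (1.0, 100000.0, -1.0):
    print(tau, softmax(logits / tau).round(3))
# tau=1:      ~[0.946, 0.047, 0.006, 0.000]  mass on the top token
# tau=1e5:    ~[0.25, 0.25, 0.25, 0.25]      flat, i.e. uniform noise
# tau=-1:     ~[0.000, 0.006, 0.047, 0.946]  mass on the *worst* token, inverted
```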