Could this perform better if the agent were given Minecraft's internal game state instead of raw pixels?
It seems rather tenuous to keep hammering on 'training via pixels' when a game's 2D/3D output is ultimately just a rendering of underlying state, an optical trick at best.
I understand Sergey Brin et al. had a grandiose goal for DeepMind with their Atari games challenge, but why not try alternate methods, say, building or tweaking games to be RL-friendly (like MuJoCo, but for games)?
I don't see the pixel-based approach being as applicable to the practical real world as having the software expose its internal state directly to the agent, rather than rendering that same state into a significantly larger pixel buffer the agent then has to decode.
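To make that concrete, here's a minimal sketch of what an 'RL-friendly' game interface could look like, written against the Gymnasium API. The env and all its state fields (player_pos, health, inventory_logs) are hypothetical, purely for illustration; the point is just that the observation is a handful of floats and ints rather than a 64x64x3 image buffer:

    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces

    class StateBasedMiningEnv(gym.Env):
        """Toy Minecraft-like env whose observations are internal game
        state, not pixels (hypothetical, for illustration only)."""

        def __init__(self):
            # Structured state: far smaller than a pixel observation like
            # spaces.Box(0, 255, shape=(64, 64, 3), dtype=np.uint8)
            self.observation_space = spaces.Dict({
                "player_pos": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
                "health": spaces.Box(0.0, 20.0, shape=(1,), dtype=np.float32),
                "inventory_logs": spaces.Discrete(64),
            })
            self.action_space = spaces.Discrete(4)  # move N/S/E/W, say

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self._pos = np.zeros(3, dtype=np.float32)
            self._logs = 0
            return self._obs(), {}

        def step(self, action):
            # Dummy dynamics: wander around, occasionally "collect" a log.
            dirs = np.array([[0, 0, 1], [0, 0, -1], [1, 0, 0], [-1, 0, 0]],
                            dtype=np.float32)
            self._pos += dirs[action]
            if self.np_random.random() < 0.1 and self._logs < 63:
                self._logs += 1
            reward = float(self._logs)
            return self._obs(), reward, False, False, {}

        def _obs(self):
            return {
                "player_pos": self._pos.copy(),
                "health": np.array([20.0], dtype=np.float32),
                "inventory_logs": self._logs,
            }

The agent skips the whole render-then-reconstruct round trip; the game already knows where everything is, so why force the policy to re-infer it from pixels?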
That said, I understand Dreamer-like work is a great research area, and one that will certainly garner lots of citations.