Mm. This paper makes it hard to understand what they've done.
For example:
> MultiDiffusion remains confined to bounded domains: all windows must lie within a fixed finite canvas, limiting its applicability to unbounded worlds or continuously streamed environments.
> We introduce InfiniteDiffusion, an extension of MultiDiffusion that lifts this constraint. By reformulating the sampling process to operate over an effectively infinite domain, InfiniteDiffusion supports seamless, consistent generation at scale.
…but:
> The hierarchy begins with a coarse planetary model, which generates the basic structure of the world from a rough, procedural or user-provided layout. The next stage is the core latent diffusion model, which transforms that structure into realistic 46km tiles in latent space. Finally, a consistency decoder expands these latents into a high-fidelity elevation map.
So the novel thing here is slightly better seamless diffusion image gen.
…but the generation itself runs through a hierarchy seeded by a procedural layout.
So basically, tl;dr: take Perlin noise, resize it, and use it as an image-to-image seed to generate detailed tiles?
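If so, here's roughly what that reduces to with off-the-shelf tools. A minimal sketch, assuming a standard diffusers img2img setup; the checkpoint, the prompt, and the smoothed-noise stand-in for Perlin are all my placeholders, not anything from the paper:

```python
# Hypothetical reconstruction of the pipeline as I read it: coarse noise
# layout, upsample, then img2img to hallucinate the detail. None of the
# names below come from the paper.
import numpy as np
import torch
from PIL import Image
from scipy.ndimage import zoom
from diffusers import StableDiffusionImg2ImgPipeline

# Coarse "planetary" layout: low-res random noise, bilinearly upsampled.
# (Stand-in for Perlin noise or a user-provided layout.)
coarse = np.random.rand(16, 16)
layout = zoom(coarse, 512 / 16, order=1)  # 512x512, smooth
seed_img = Image.fromarray((layout * 255).astype(np.uint8)).convert("RGB")

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# img2img at moderate strength: the layout constrains large-scale
# structure, the model fills in detail, i.e. the "detailed tile" step.
tile = pipe(
    prompt="aerial elevation map, mountainous terrain",
    image=seed_img,
    strength=0.6,
).images[0]
```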
People have already been doing this.
It's not novel.
The novel part here is making the detailed tiles slightly nicer.
Eh. :shrug:
The paper obfuscates this, quite annoyingly.
It's unclear to me why you can't just use MultiDiffusion for this, given that the top-level input is already bounded (e.g. a user-provided layout), not infinite.
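For reference, the core MultiDiffusion trick is (as I understand the original paper) just this fusion loop: at each step, denoise overlapping windows and average the overlaps back into one bounded canvas. `denoise_step` below is a stand-in for a single reverse-diffusion step on a window:

```python
# Sketch of the MultiDiffusion fusion step over a bounded latent canvas.
# `denoise_step` is a placeholder for one reverse-diffusion step.
import torch

def _positions(size, window, stride):
    pos = list(range(0, size - window + 1, stride))
    if pos[-1] != size - window:  # make sure the edge is covered
        pos.append(size - window)
    return pos

def multidiffusion_step(latent, denoise_step, window=64, stride=48):
    """One fused denoising step: average overlapping window predictions."""
    _, _, H, W = latent.shape
    out = torch.zeros_like(latent)
    count = torch.zeros_like(latent)
    for y in _positions(H, window, stride):
        for x in _positions(W, window, stride):
            crop = latent[:, :, y:y + window, x:x + window]
            out[:, :, y:y + window, x:x + window] += denoise_step(crop)
            count[:, :, y:y + window, x:x + window] += 1
    return out / count  # overlaps are averaged, seams get blended
```

Nothing in that loop cares whether the canvas came from a user layout or a procedural one; it just needs the canvas to be bounded, which their top-level input already is.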