Semantically Guided Diffusion Models for Infinitely Large Toroidal Textures

Semantično vodeni difuzijski modeli za neskončno velike toroidne teksture

Supervisor: Assoc. Prof. Dr. Janez Perš¹; co-supervisors: Prof. Dr. Matija Marolt² and Asst. Dr. Žiga Lesar²
¹University of Ljubljana, Faculty of Electrical Engineering; ²University of Ljubljana, Faculty of Computer and Information Science
Master's thesis, 2026

Seamless textures of arbitrary size

A two-stage pipeline generates seamless bark textures of arbitrary size on a periodic toroidal canvas.

  • Stage 1: Wave Function Collapse (WFC) or a Markov Random Field (MRF) produces a low-resolution semantic map with controllable global class proportions.
  • Stage 2: A latent diffusion model conditioned on the map synthesises the full-resolution texture on an infinite canvas with periodic boundary conditions.

Semantic mask generated by the MRF model (5120×5120).

Bark texture synthesised by the diffusion model (5120×5120).

A single seamless texture at 10240×10240 — the sampler scales to very high resolution.

Abstract

Diffusion generative models have set the state of the art in image synthesis, but they are typically trained on small crops and produce images of comparable size. Many practical applications instead require very large, periodic images: seamless textures for graphics, 360° panoramas, or planet-scale maps. Existing approaches struggle with seam artefacts at tile boundaries and a lack of global consistency, such as overly long uniform regions or unrealistic class proportions.

We address this with a two-stage pipeline. First, simple generative models — Wave Function Collapse, Markov Random Fields, autoregressive models and neural cellular automata — produce a low-resolution semantic map that combines local pattern statistics with global class proportions. Second, a latent diffusion model in the spirit of DiffInfinite synthesises a high-resolution image from the map using overlapping patch updates, with periodic indexing adapted to toroidal, cylindrical and spherical geometries.

As the main practical case study we tackle tree-log bark: because a log is cylindrical, its surface texture must wrap seamlessly around the circumference, yet in practice only part of that surface can be photographed at high resolution. From a limited set of field-captured and segmented bark images we learn a model that generates seamless toroidal and cylindrical textures with controllable distributions of bark, knots and mechanical damage.

Method

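The full workflow runs in four stages: the first two build the training data, and the last two form the two-stage generation pipeline, which decouples what appears in the texture (the semantic layout) from how it looks (the appearance).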

1. Field capture and preprocessing

In collaboration with the Department of Forestry and Renewable Forest Resources at the Biotechnical Faculty, we captured high-resolution photos of spruce logs at several sites. A custom rotate-and-crop toolchain isolates each log from the background, while an unwrap step reprojects the visible cylindrical surface into a flat 2D image, removing the perspective distortion that would otherwise corrupt training patches.
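
The unwrap step can be illustrated under simplifying assumptions that the text does not state: an orthographic (distant) camera, the log axis vertical after rotation and cropping, and the log filling the full image width. A column at horizontal offset x from the axis then sees the surface at angle θ = arcsin(x / R), so resampling the columns uniformly in arc length R·θ undoes the foreshortening. `unwrap_cylinder` and `max_angle_deg` are hypothetical names, and a real perspective camera would need a full camera model.

```python
import numpy as np


def unwrap_cylinder(img: np.ndarray, max_angle_deg: float = 70.0) -> np.ndarray:
    """Resample a photo of a log so that columns are uniform in arc length.

    Hypothetical sketch assuming an orthographic view, a vertical log axis and a
    log that spans the full image width (radius = half the width in pixels).
    """
    h, w = img.shape[:2]
    radius = (w - 1) / 2.0                  # cylinder radius in pixels
    centre = radius                         # column of the log axis
    max_angle = np.deg2rad(max_angle_deg)   # drop the grazing band near the silhouette

    # Output columns are spaced one pixel apart in arc length s = radius * theta.
    out_w = int(round(2.0 * radius * max_angle))
    theta = np.linspace(-max_angle, max_angle, out_w)
    src_x = centre + radius * np.sin(theta)  # source column for each output column

    # Linear interpolation between the two neighbouring source columns.
    x0 = np.clip(np.floor(src_x).astype(int), 0, w - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    frac = src_x - x0
    frac = frac[None, :, None] if img.ndim == 3 else frac[None, :]
    img = img.astype(np.float64)
    return (1.0 - frac) * img[:, x0] + frac * img[:, x1]
```

Columns near the silhouette are heavily compressed in the photo, which is why the sketch keeps only the central ±70° band by default.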

2. Semantic segmentation

Approximately 100 hand-labelled bark images, with their seven annotation classes merged into three (bark, knots/blind buds, mechanical damage), are used to fine-tune a DeepLabv3+ResNet50 segmenter pre-trained on ImageNet. The segmenter produces both the masks used to train the diffusion model and the masks used to evaluate generated textures by comparing their class proportions. The fine-tuned segmenter is available on the Hugging Face Hub.
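
The class merging and the proportion-based comparison can be sketched with NumPy. The seven original label indices and their grouping in `MERGE_LUT` are hypothetical placeholders, and the L1 distance in `proportion_error` is only one simple choice of comparison, not necessarily the thesis's metric. In practice `generated_mask` would come from running the segmenter on a generated texture.

```python
import numpy as np

# Hypothetical mapping from seven original labels (0-6) to the three training
# classes; the actual grouping used in the thesis may differ.
BARK, KNOT, DAMAGE = 0, 1, 2
MERGE_LUT = np.array([BARK, BARK, KNOT, KNOT, DAMAGE, DAMAGE, DAMAGE], dtype=np.uint8)


def merge_classes(mask7: np.ndarray) -> np.ndarray:
    """Map a 7-class label mask to the 3-class scheme with a lookup table."""
    return MERGE_LUT[mask7]


def class_proportions(mask: np.ndarray, num_classes: int = 3) -> np.ndarray:
    """Fraction of pixels assigned to each class."""
    counts = np.bincount(mask.ravel(), minlength=num_classes)
    return counts / counts.sum()


def proportion_error(generated_mask, reference_mask, num_classes: int = 3) -> float:
    """L1 distance between class-proportion vectors (illustrative metric)."""
    diff = class_proportions(generated_mask, num_classes) - class_proportions(
        reference_mask, num_classes)
    return float(np.abs(diff).sum())
```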

3. Semantic map generation (WFC / MRF)

A Wave Function Collapse algorithm extracts local adjacency rules from the segmented bark masks and propagates them as hard constraints across an arbitrarily large canvas. A Markov Random Field variant offers a stochastic alternative that more closely matches global class proportions. Both produce low-resolution semantic maps that respect periodic boundary conditions, and the maps can also be edited or sketched by hand for controllable output.
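
As a rough sketch of both ideas, the code below extracts WFC-style adjacency rules from a mask with toroidal wrap-around (the collapse and propagation loop itself is omitted) and samples a generic Potts-model MRF on a torus by Gibbs sampling. The Potts form, the `coupling` and `bias` parameters and the bias-nudging heuristic for `target` proportions are assumptions for illustration; the thesis's MRF may define and fit its potentials differently.

```python
import numpy as np


def adjacency_rules(mask: np.ndarray) -> set:
    """WFC-style rules: every (class, neighbour class, direction) seen in a mask.
    Neighbours wrap around the edges, so the rules are toroidal."""
    rules = set()
    for name, (dy, dx) in {"down": (1, 0), "right": (0, 1)}.items():
        nb = np.roll(mask, shift=(-dy, -dx), axis=(0, 1))  # nb[y, x] = mask[y+dy, x+dx]
        rules |= {(int(a), int(b), name) for a, b in zip(mask.ravel(), nb.ravel())}
    return rules


def gibbs_potts_torus(h, w, num_classes, coupling, bias, sweeps, rng,
                      target=None, step=0.5):
    """Gibbs sampling of a Potts MRF on an h x w torus.

    coupling: log-reward per agreeing 4-neighbour (larger = bigger blobs).
    bias: per-class unary log-potential (larger = more of that class).
    target: optional class proportions; if given, biases are nudged towards it
    after every sweep (an illustrative heuristic, not taken from the thesis).
    """
    bias = np.asarray(bias, dtype=float).copy()
    labels = rng.integers(0, num_classes, size=(h, w))
    for _ in range(sweeps):
        for y in range(h):
            for x in range(w):
                # Modulo indexing makes opposite edges neighbours (periodic map).
                nbs = [labels[(y - 1) % h, x], labels[(y + 1) % h, x],
                       labels[y, (x - 1) % w], labels[y, (x + 1) % w]]
                logits = bias + coupling * np.bincount(nbs, minlength=num_classes)
                p = np.exp(logits - logits.max())
                labels[y, x] = rng.choice(num_classes, p=p / p.sum())
        if target is not None:
            current = np.bincount(labels.ravel(), minlength=num_classes) / labels.size
            bias += step * (np.log(np.asarray(target) + 1e-6) - np.log(current + 1e-6))
    return labels
```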

4. Infinite-canvas latent diffusion

Building on Stable Diffusion and DiffInfinite, a latent diffusion model is conditioned on the semantic map via cross-attention. Sampling is performed on a large latent canvas: overlapping patches are denoised while a timestep counter is kept for every latent pixel, so the whole canvas advances through the diffusion schedule consistently and no seams form at patch borders. By indexing patch coordinates modulo the canvas dimensions (in both directions for a torus, around the circumference only for a cylinder), the same model produces fully periodic toroidal textures, cylindrical textures wrapping around a log, and (as future work) spherical 360° panoramas via equirectangular projection.
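
The toroidal sampling loop can be sketched as follows. This is a minimal illustration, not the diffinfinite-bark code: `denoise_step` is a hypothetical stand-in for one reverse step of the conditioned model, the semantic map is assumed to be already resized to the latent resolution, and the patch grid is shifted by a random offset at every step. That is one simple scheme consistent with a per-pixel timestep counter; DiffInfinite's own patch schedule may differ.

```python
import numpy as np


def wrap(start: int, size: int, n: int) -> np.ndarray:
    """Indices start, ..., start+size-1 taken modulo n (toroidal wrap)."""
    return (start + np.arange(size)) % n


def sample_toroidal(denoise_step, cond, shape, patch, num_steps, rng):
    """Patch-wise denoising of a toroidal latent canvas of shape (c, h, w).

    denoise_step(x_patch, t, cond_patch) -> x_patch is a hypothetical stand-in
    for one reverse-diffusion step; cond is the semantic map at shape (h, w).
    """
    c, h, w = shape
    assert h % patch == 0 and w % patch == 0, "canvas must be a multiple of the patch size"
    x = rng.standard_normal(shape)
    counter = np.zeros((h, w), dtype=int)  # denoising steps applied to each latent pixel

    for step in range(num_steps):
        t = num_steps - 1 - step  # diffusion timestep, counting down to 0
        # A new random grid offset each step moves the patch borders, so no
        # fixed seam is ever denoised in isolation.
        oy, ox = (int(v) for v in rng.integers(0, patch, size=2))
        for y0 in range(oy, oy + h, patch):
            for x0 in range(ox, ox + w, patch):
                ys = wrap(y0, patch, h)[:, None]
                xs = wrap(x0, patch, w)[None, :]
                x[:, ys, xs] = denoise_step(x[:, ys, xs], t, cond[ys, xs])
                counter[ys, xs] += 1
        # After each sweep every latent pixel must sit at the same timestep.
        assert np.all(counter == step + 1)
    return x
```

For a cylinder, only the circumferential axis would be wrapped with `wrap`, and the two free edges would need separate handling, which this sketch omits.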

Seamless, Tileable Texture

A single generated bark texture:

The same texture tiled 2×2 repeats in both directions with no visible seams, confirming that the generated texture is fully tileable.
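
Visual inspection of a 2×2 tiling can be complemented by a simple numeric check. The hypothetical `seam_ratio` below compares the mean jump across the wrap-around boundary (where tiles meet) with the mean jump between neighbouring interior rows or columns; a ratio near 1 means the seam behaves like any other pixel boundary, at least by this crude measure. It is an illustrative heuristic, not an evaluation metric from the thesis.

```python
import numpy as np


def seam_ratio(tex: np.ndarray, axis: int) -> float:
    """Mean absolute jump across the wrap-around seam divided by the mean
    absolute jump between neighbouring interior rows/columns along `axis`."""
    tex = tex.astype(np.float64)
    interior = np.abs(np.diff(tex, axis=axis)).mean()
    first = np.take(tex, 0, axis=axis)   # row/column that follows the seam
    last = np.take(tex, -1, axis=axis)   # row/column that precedes the seam
    seam = np.abs(first - last).mean()
    return float(seam / interior)
```

Checking both `axis=0` and `axis=1` covers the vertical and horizontal seams of the tiling.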

Training Progression

Sample evolution across checkpoints

Samples from 40 checkpoints saved every 5,000 training steps, from 5k up to 200k steps, all generated from the same semantic map.

Early checkpoint (5k steps).

Final checkpoint (200k steps).

Quantitative evaluation

TODO — to be added.

Dataset and Code

The bark image dataset, segmentation masks and trained model weights are released alongside the thesis. The dataset and model weights are hosted on the Hugging Face Hub, and the full preprocessing, segmentation, semantic-map generation and training pipelines are on GitHub.

The DiffInfinite-derived diffusion code lives in a separate diffinfinite-bark fork, included as a submodule, so that upstream attribution and our modifications (P2 loss, learning-rate schedule, toroidal indexing) remain clearly visible in the commit history.
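
Assuming "P2 loss" refers to the perception-prioritized (P2) weighting of Choi et al. (2022), each timestep's noise-prediction loss is scaled by 1 / (k + SNR(t))^γ, where SNR(t) = ᾱ_t / (1 − ᾱ_t). The sketch below computes these weights for an illustrative linear β schedule; `k = 1`, `γ = 1` and the schedule are assumptions, not values taken from the diffinfinite-bark fork. Because SNR falls as t grows, the weights suppress the nearly clean, high-SNR timesteps that mostly shape imperceptible detail.

```python
import numpy as np


def p2_weights(alphas_cumprod: np.ndarray, k: float = 1.0, gamma: float = 1.0) -> np.ndarray:
    """P2 loss weights 1 / (k + SNR(t))**gamma, with SNR(t) = abar_t / (1 - abar_t)."""
    snr = alphas_cumprod / (1.0 - alphas_cumprod)
    return (k + snr) ** -gamma


def p2_loss(eps_pred: np.ndarray, eps_true: np.ndarray, t: np.ndarray,
            weights: np.ndarray) -> float:
    """Per-sample MSE on the predicted noise, scaled by the P2 weight of its timestep."""
    axes = tuple(range(1, eps_pred.ndim))
    mse = ((eps_pred - eps_true) ** 2).mean(axis=axes)
    return float((weights[t] * mse).mean())


# Illustrative DDPM-style linear schedule (an assumption, not the fork's schedule).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
w = p2_weights(alphas_cumprod)
```

With γ = 0 every weight equals 1 and the ordinary unweighted ε-prediction loss is recovered.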

BibTeX

The thesis is not published yet — a citation will be added once it is available.