Introduction
The Mandelbrot set is one of the most beautiful mathematical objects ever discovered: a fractal so intricate that no matter how much you zoom in, you keep finding infinite detail. But what if we asked a neural network to learn it?
At first glance, this sounds like an odd question. The Mandelbrot set is fully deterministic: there is no data, no noise, no hidden rules. But this very simplicity makes it a perfect sandbox for studying how neural networks represent complex functions.
In this article, we’ll explore how a simple neural network can learn to approximate the Mandelbrot set, and how Gaussian Fourier Features completely transform its performance, turning blurry approximations into sharp fractal boundaries.
Along the way, we’ll dive into why vanilla multi-layer perceptrons (MLPs) struggle with high-frequency patterns (the spectral bias problem) and how Fourier features solve it.
The Mandelbrot Set
The Mandelbrot set is defined over the complex plane. For each complex number \(c\in \mathbb{C}\), we consider the iterative sequence:
$$z_{n+1} = z_n^2 + c, \qquad z_0 = 0$$
If this sequence remains bounded, then \(c\) belongs to the Mandelbrot set.
In practice, we approximate this condition with the escape-time algorithm, which iterates the sequence up to a fixed maximum number of steps and monitors the magnitude \(|z_n|\). If \(|z_n|\) exceeds a chosen escape radius (typically 2), the sequence is guaranteed to diverge, and \(c\) is classified as outside the Mandelbrot set. If the sequence does not escape within the maximum number of iterations, \(c\) is assumed to belong to the Mandelbrot set, with the iteration count often used for visualization purposes.
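In code, the escape-time check is only a few lines. Here is a minimal sketch (the function name and defaults are illustrative, not the exact implementation used later in this article):

```python
from typing import Optional

def escape_time(c: complex, max_iter: int = 100, radius: float = 2.0) -> Optional[int]:
    """Return the iteration at which |z_n| exceeds the escape radius,
    or None if the sequence stays bounded for max_iter steps."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > radius:
            return n  # escaped: c is outside the set
    return None  # never escaped: c is assumed to be inside the set

# c = 0 stays at the fixed point z = 0 and never escapes,
# while c = 1 grows as 0, 1, 2, 5, ... and escapes after a few steps.
```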
Turning the Mandelbrot Set into a Learning Problem
To train our neural network, we need two things. First, we must define a learning problem, that is, what the model should predict and from what inputs. Second, we need labeled data: a large collection of input-output pairs drawn from that problem.
Defining the Learning Problem
At its core, the Mandelbrot set defines a function over the complex plane. Each point \(c = x + iy \in \mathbb{C}\) is mapped to an outcome: either the sequence generated by the iteration remains bounded, or it diverges. This immediately suggests a binary classification problem, where the input is a complex number and the output indicates whether the number is inside the set or not.
However, this formulation poses difficulties for learning. The boundary of the Mandelbrot set is infinitely complex, and arbitrarily small perturbations in \(c\) can change the classification outcome. From a learning perspective, this results in a highly discontinuous target function, making optimization unstable and data-inefficient.
To obtain a smoother and more informative learning target, we instead reformulate the problem using the escape-time information introduced in the previous section. Rather than predicting a binary label, the model is trained to predict a continuous variable derived from the iteration count at which the sequence escapes.
To produce a continuous target, we do not use the raw escape iteration count directly. The raw count is a discrete quantity, and using it would introduce discontinuities, particularly near the boundary of the Mandelbrot set, where small changes in \(c\) can cause large jumps in the iteration count. To address this, we employ a smooth escape-time value that interpolates the iteration count into a continuous target. In addition, we apply a logarithmic scaling that spreads early escape values and compresses larger ones, yielding a more balanced target distribution.
```python
import math

import numpy as np


def smooth_escape(x: float, y: float, max_iter: int = 1000) -> float:
    c = complex(x, y)
    z = 0j
    for n in range(max_iter):
        z = z*z + c
        r2 = z.real*z.real + z.imag*z.imag
        if r2 > 4.0:
            r = math.sqrt(r2)
            mu = n + 1 - math.log(math.log(r)) / math.log(2.0)  # smooth iteration count
            # log-scale to spread small mu
            v = math.log1p(mu) / math.log1p(max_iter)
            return float(np.clip(v, 0.0, 1.0))
    return 1.0
```

With this definition, the Mandelbrot set becomes a regression problem. The neural network is trained to approximate a function $$f : \mathbb{R}^2 \rightarrow [0,1]$$
mapping spatial coordinates \((x, y)\) in the complex plane to a smooth escape-time value.
Data Sampling Strategy
A uniform sampling of the complex plane would be highly inefficient: most points lie far from the boundary and carry little information. To address this, the dataset is biased towards boundary regions by oversampling and filtering points whose escape values fall within a predefined band.
```python
def build_boundary_biased_dataset(
    n_total=800_000,
    frac_boundary=0.7,
    xlim=(-2.4, 1.0),
    res_for_ylim=(3840, 2160),
    ycenter=0.0,
    max_iter=1000,
    band=(0.35, 0.95),
    seed=0,
):
    """
    - Mix of uniform samples + boundary-band samples.
    - 'band' selects points with target in (low, high), which tends to
      concentrate samples near the boundary.
    """
    rng = np.random.default_rng(seed)
    ylim = compute_ylim_from_x(xlim, res_for_ylim, ycenter=ycenter)
    n_boundary = int(n_total * frac_boundary)
    n_uniform = n_total - n_boundary
    # Uniform set
    Xu = sample_uniform(n_uniform, xlim, ylim, seed=seed)
    # Boundary pool: oversample, then filter by band
    pool_factor = 20
    pool = sample_uniform(n_boundary * pool_factor, xlim, ylim, seed=seed + 1)
    yp = np.empty((pool.shape[0],), dtype=np.float32)
    for i, (x, y) in enumerate(pool):
        yp[i] = smooth_escape(float(x), float(y), max_iter=max_iter)
    mask = (yp > band[0]) & (yp < band[1])
    Xb = pool[mask]
    yb = yp[mask]
    if len(Xb) < n_boundary:
        # If the band is too strict, fall back to however many points it yields
        keep = min(len(Xb), n_boundary)
        print(f"[warn] Boundary band too strict; got {len(Xb)} boundary points, using {keep}.")
        Xb = Xb[:keep]
        yb = yb[:keep]
        n_boundary = keep
        n_uniform = n_total - n_boundary
        Xu = sample_uniform(n_uniform, xlim, ylim, seed=seed)
    else:
        Xb = Xb[:n_boundary]
        yb = yb[:n_boundary]
    yu = np.empty((Xu.shape[0],), dtype=np.float32)
    for i, (x, y) in enumerate(Xu):
        yu[i] = smooth_escape(float(x), float(y), max_iter=max_iter)
    X = np.concatenate([Xu, Xb], axis=0).astype(np.float32)
    y = np.concatenate([yu, yb], axis=0).astype(np.float32)
    # Shuffle once
    perm = rng.permutation(X.shape[0])
    return X[perm], y[perm], ylim
```

Baseline Model: A Deep Residual MLP
Our first attempt uses a deep residual MLP that takes raw Cartesian coordinates \((x, y)\) as input and predicts the smooth escape value.
```python
# Baseline model
class MLPRes(nn.Module):
    def __init__(
        self,
        hidden_dim=256,
        num_blocks=8,
        act="silu",
        dropout=0.0,
        out_dim=1,
    ):
        super().__init__()
        activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
        self.in_proj = nn.Linear(2, hidden_dim)
        self.in_act = activation()
        self.blocks = nn.Sequential(*[
            ResidualBlock(hidden_dim, act=act, dropout=dropout)
            for _ in range(num_blocks)
        ])
        self.out_ln = nn.LayerNorm(hidden_dim)
        self.out_act = activation()
        self.out_proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        x = self.in_proj(x)
        x = self.in_act(x)
        x = self.blocks(x)
        x = self.out_act(self.out_ln(x))
        return self.out_proj(x)
```
```python
# Residual block
class ResidualBlock(nn.Module):
    def __init__(self, dim: int, act: str = "silu", dropout: float = 0.0):
        super().__init__()
        activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
        # Pre-norm style (LayerNorm helps a lot for stability in deep residual MLPs)
        self.ln1 = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, dim)
        self.ln2 = nn.LayerNorm(dim)
        self.fc2 = nn.Linear(dim, dim)
        self.act = activation()
        self.drop = nn.Dropout(dropout) if dropout and dropout > 0 else nn.Identity()
        # Optional: zero-init the last layer so each block starts near the identity
        nn.init.zeros_(self.fc2.weight)
        nn.init.zeros_(self.fc2.bias)

    def forward(self, x):
        h = self.ln1(x)
        h = self.act(self.fc1(h))
        h = self.drop(h)
        h = self.ln2(h)
        h = self.fc2(h)
        return x + h
```

This network has ample capacity: it is deep, residual, and trained on a large dataset with stable optimization.
Result
The global shape of the Mandelbrot set is clearly recognizable. However, fine details near the boundary are visibly blurred. Regions that should exhibit intricate fractal structure appear overly smooth, and thin filaments are either poorly defined or entirely missing.
This is not a matter of resolution, data, or depth. So what is going wrong?
The Spectral Bias Problem
Neural networks have a well-known problem called spectral bias:
they tend to learn low-frequency functions first, and struggle to represent functions with rapid oscillations or fine detail.
The Mandelbrot set is highly irregular and filled with small-scale structures, especially near its boundary. To capture it, the network must represent very high-frequency variations in the output as \(x\) and \(y\) change.
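A quick way to see spectral bias in action is a toy proxy: fit a low-frequency and a high-frequency sine by least squares on random ReLU features, a crude stand-in for a wide one-layer network in the lazy-training regime. The sizes and frequencies below are illustrative, not taken from the article's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 500)

# Random ReLU features: phi_j(x) = max(0, w_j * x + b_j)
w = rng.normal(0.0, 3.0, size=50)
b = rng.uniform(-3.0, 3.0, size=50)
phi = np.maximum(0.0, x[:, None] * w[None, :] + b[None, :])

def fit_rmse(target):
    """Least-squares fit on the random features; return the RMSE."""
    coef, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return float(np.sqrt(np.mean((phi @ coef - target) ** 2)))

err_low = fit_rmse(np.sin(2 * np.pi * x))    # 1 oscillation: easy
err_high = fit_rmse(np.sin(40 * np.pi * x))  # 20 oscillations: hard
# err_high comes out much larger than err_low: the same feature budget
# captures the slow sine but cannot track the fast one.
```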
Fourier Features: Encoding Coordinates in Frequency Space
One of the most elegant solutions to the spectral bias problem was introduced in 2020 by Tancik et al. in their paper Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains.
The idea is to transform the input coordinates before feeding them into the neural network. Instead of the raw \((x, y)\), we pass their sinusoidal projections onto random directions in a higher-dimensional space.
Formally:
$$\gamma(x) = [\sin(2\pi Bx), \cos(2\pi Bx)]$$
where \(B \in \mathbb{R}^{d_{\text{in}} \times d_{\text{feat}}}\) is a fixed random Gaussian matrix; its standard deviation \(\sigma\) controls the frequency bandwidth.
This mapping acts like a random Fourier basis expansion, letting the network more easily represent high-frequency details.
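Concretely, the mapping is a few lines of NumPy. This sketch (with toy sizes, chosen here only for illustration) shows how a 2-D coordinate becomes a \(2 \cdot d_{\text{feat}}\)-dimensional feature vector:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat, sigma = 2, 4, 5.0
B = rng.normal(0.0, sigma, size=(d_in, d_feat))  # random Gaussian frequency matrix

def gamma(x):
    """gamma(x) = [sin(2*pi*x B), cos(2*pi*x B)]."""
    proj = 2.0 * np.pi * (x @ B)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

x = np.array([[0.3, -0.7]])  # a single (x, y) coordinate
features = gamma(x)          # shape (1, 8): 2 * d_feat features, each in [-1, 1]
```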
```python
class GaussianFourierFeatures(nn.Module):
    def __init__(self, in_dim=2, num_feats=256, sigma=5.0):
        super().__init__()
        B = torch.randn(in_dim, num_feats) * sigma
        self.register_buffer("B", B)  # fixed, not trained

    def forward(self, x):
        proj = (2 * torch.pi) * (x @ self.B)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
```

Multi-Scale Gaussian Fourier Features
A single frequency scale might not be sufficient. The Mandelbrot set exhibits structure at all resolutions (a distinctive feature of fractal geometry).
To capture this, we use multi-scale Gaussian Fourier features, combining several frequency bands:
```python
class MultiScaleGaussianFourierFeatures(nn.Module):
    def __init__(self, in_dim=2, num_feats=512, sigmas=(2.0, 6.0, 10.0), seed=0):
        super().__init__()
        # Split the feature budget across scales
        k = len(sigmas)
        per = [num_feats // k] * k
        per[0] += num_feats - sum(per)
        Bs = []
        g = torch.Generator()
        g.manual_seed(seed)
        for s, m in zip(sigmas, per):
            B = torch.randn(in_dim, m, generator=g) * s
            Bs.append(B)
        self.register_buffer("B", torch.cat(Bs, dim=1))

    def forward(self, x):
        proj = (2 * torch.pi) * (x @ self.B)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
```

This effectively provides the network with a multi-resolution frequency basis, perfectly aligned with the self-similar nature of fractals.
Final Model
The final model shares the baseline architecture; the only difference is that it uses MultiScaleGaussianFourierFeatures to encode the input coordinates.
```python
class MLPFourierRes(nn.Module):
    def __init__(
        self,
        num_feats=256,
        sigma=5.0,
        hidden_dim=256,
        num_blocks=8,
        act="silu",
        dropout=0.0,
        out_dim=1,
    ):
        super().__init__()
        self.ff = MultiScaleGaussianFourierFeatures(
            2,
            num_feats=num_feats,
            sigmas=(2.0, 6.0, sigma),
            seed=0,
        )
        self.in_proj = nn.Linear(2 * num_feats, hidden_dim)
        self.blocks = nn.Sequential(*[
            ResidualBlock(hidden_dim, act=act, dropout=dropout)
            for _ in range(num_blocks)
        ])
        self.out_ln = nn.LayerNorm(hidden_dim)
        activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
        self.out_act = activation()
        self.out_proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        x = self.ff(x)
        x = self.in_proj(x)
        x = self.blocks(x)
        x = self.out_act(self.out_ln(x))
        return self.out_proj(x)
```

Training Dynamics
Training Without Fourier Features
The model slowly learns the overall shape of the Mandelbrot set, but then plateaus. Additional training fails to add more detail.
Training With Fourier Features
Here the coarse structures appear first, followed by progressively finer details. The model continues to refine its prediction instead of plateauing.
Final Results
Both models used the same architecture, dataset, and training procedure. The network is a deep residual multilayer perceptron (MLP) trained as a regression model on the smooth escape-time formulation.
- Dataset: 1,000,000 samples from the complex plane, with 70% of points concentrated near the fractal boundary thanks to the biased data sampling.
- Architecture: Residual MLP with 20 residual blocks, and a hidden dimension of 512 units.
- Activation Function: SiLU
- Training: 100 epochs, batch size 4096, Adam-based optimizer, Cosine Annealing scheduler.
The only difference between the two models is the representation of the input coordinates. The baseline model uses the raw Cartesian coordinates, while the second model uses the multi-scale Fourier Features for the representation.
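The training loop itself is standard. Here is a minimal sketch of the setup described above (Adam plus cosine annealing); the model, loader, and learning rate are illustrative stand-ins, not the repository's exact code:

```python
import torch
import torch.nn as nn

EPOCHS = 100  # as in the article; reduce for a quick smoke test

model = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 1))  # stand-in model
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=EPOCHS)

def train_epoch(loader):
    """One pass over (x, y) mini-batches, minimizing MSE on the smooth escape target."""
    model.train()
    for xb, yb in loader:
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(xb).squeeze(-1), yb)
        loss.backward()
        opt.step()
    sched.step()  # anneal the learning rate once per epoch
```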
Global View


Zoom 1 View


Zoom 2 View


Conclusions
Fractals such as the Mandelbrot set are an extreme example of functions dominated by high-frequency structure. Approximating them directly from raw coordinates forces neural networks to synthesize increasingly fine oscillations, a task for which MLPs are poorly suited.
What this article shows is that the limitation is not architectural capacity, data volume, or optimization. It is representation.
By encoding input coordinates with multi-scale Gaussian Fourier Features, we shift much of the problem’s complexity into the input space. High-frequency structure becomes explicit, allowing an otherwise ordinary neural network to approximate a function that would be too complex in its original form.
This idea extends beyond fractals. Coordinate-based neural networks are used in computer graphics, physics-informed learning, and signal processing. In all of these settings, the choice of input encoding can be the difference between smooth approximations and rich, highly detailed structure.
Note on visual assets
All images, animations, and videos shown in this article were generated by the author from the outputs of the neural network models described above. No external fractal renderers or third-party visual assets were used. The full code used for training, rendering images, and generating animations is available in the accompanying repository.