Introduction
The Mandelbrot set is one of the most beautiful mathematical objects ever discovered: a fractal so intricate that no matter how much you zoom in, you keep finding infinite detail. But what if we asked a neural network to learn it?
At first glance, this sounds like an odd question. The Mandelbrot set is fully deterministic: there is no data, no noise, no hidden rules. But this very simplicity makes it a perfect sandbox for studying how neural networks represent complex functions.
In this article, we’ll explore how a simple neural network can learn to approximate the Mandelbrot set, and how Gaussian Fourier Features completely transform its performance, turning blurry approximations into sharp fractal boundaries.
Along the way, we’ll dive into why vanilla multi-layer perceptrons (MLPs) struggle with high-frequency patterns (the spectral bias problem) and how Fourier features solve it.
The Mandelbrot Set
The Mandelbrot set is defined over the complex plane. For each complex number \(c\in \mathbb{C}\), we consider the iterative sequence:
$$z_{n+1} = z_n^2 + c, \qquad z_0 = 0$$
If this sequence remains bounded, then \(c\) belongs to the Mandelbrot set.
In practice, we approximate this condition with the escape-time algorithm, which iterates the sequence up to a fixed maximum number of steps and monitors the magnitude \(|z_n|\). If \(|z_n|\) exceeds a chosen escape radius (typically 2), the sequence is guaranteed to diverge, and \(c\) is classified as outside the Mandelbrot set. If the sequence does not escape within the maximum number of iterations, \(c\) is assumed to belong to the Mandelbrot set, with the iteration count often used for visualization purposes.
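In code, the escape-time check is only a few lines. Here is a minimal sketch (the function name and defaults are illustrative, not the exact implementation used later in this article):

```python
from typing import Optional

def escape_time(c: complex, max_iter: int = 100, radius: float = 2.0) -> Optional[int]:
    """Return the iteration at which |z_n| exceeds the escape radius,
    or None if the sequence stays bounded for max_iter steps."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > radius:
            return n  # escaped: c is outside the set
    return None  # never escaped: c is assumed to be inside the set

# c = 0 stays at the fixed point z = 0 and never escapes,
# while c = 1 grows as 0, 1, 2, 5, ... and escapes after a few steps.
```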
Turning the Mandelbrot Set into a Learning Problem
To train our neural network, we need two things. First, we must define a learning problem, that is, what the model should predict and from what inputs. Second, we need labeled data: a large collection of input-output pairs drawn from that problem.
Defining the Learning Problem
At its core, the Mandelbrot set defines a function over the complex plane. Each point \(c = x + iy \in \mathbb{C}\) is mapped to an outcome: either the sequence generated by the iteration remains bounded, or it diverges. This immediately suggests a binary classification problem, where the input is a complex number and the output indicates whether the number is inside the set or not.
However, this formulation poses difficulties for learning. The boundary of the Mandelbrot set is infinitely complex, and arbitrarily small perturbations in \(c\) can change the classification outcome. From a learning perspective, this results in a highly discontinuous target function, making optimization unstable and data-inefficient.
To obtain a smoother and more informative learning target, we instead reformulate the problem using the escape-time information introduced in the previous section. Rather than predicting a binary label, the model is trained to predict a continuous variable derived from the iteration count at which the sequence escapes.
To produce a continuous target, we do not use the raw escape iteration count directly. The raw count is a discrete quantity, and using it would introduce discontinuities, particularly near the boundary of the Mandelbrot set, where small changes in \(c\) can cause large jumps in the iteration count. To address this, we employ a smooth escape-time value that interpolates the iteration count into a continuous target. In addition, we apply a logarithmic scaling that spreads early escape values and compresses larger ones, yielding a more balanced target distribution.
```python
import math

import numpy as np


def smooth_escape(x: float, y: float, max_iter: int = 1000) -> float:
    c = complex(x, y)
    z = 0j
    for n in range(max_iter):
        z = z*z + c
        r2 = z.real*z.real + z.imag*z.imag
        if r2 > 4.0:
            r = math.sqrt(r2)
            mu = n + 1 - math.log(math.log(r)) / math.log(2.0)  # smooth iteration count
            # log-scale to spread small mu
            v = math.log1p(mu) / math.log1p(max_iter)
            return float(np.clip(v, 0.0, 1.0))
    return 1.0
```

With this definition, the Mandelbrot set becomes a regression problem. The neural network is trained to approximate a function $$f : \mathbb{R}^2 \rightarrow [0,1]$$
mapping spatial coordinates \((x, y)\) in the complex plane to a smooth escape-time value.
Data Sampling Strategy
A uniform sampling of the complex plane would be highly inefficient: most points lie far from the boundary and carry little information. To address this, the dataset is biased towards boundary regions by oversampling and filtering points whose escape values fall within a predefined band.
```python
def build_boundary_biased_dataset(
    n_total=800_000,
    frac_boundary=0.7,
    xlim=(-2.4, 1.0),
    res_for_ylim=(3840, 2160),
    ycenter=0.0,
    max_iter=1000,
    band=(0.35, 0.95),
    seed=0,
):
    """
    - Mix of uniform samples + boundary-band samples.
    - 'band' selects points with target in (low, high), which tends to
      concentrate samples near the boundary.
    """
    rng = np.random.default_rng(seed)
    ylim = compute_ylim_from_x(xlim, res_for_ylim, ycenter=ycenter)
    n_boundary = int(n_total * frac_boundary)
    n_uniform = n_total - n_boundary
    # Uniform set
    Xu = sample_uniform(n_uniform, xlim, ylim, seed=seed)
    # Boundary pool: oversample, then filter by band
    pool_factor = 20
    pool = sample_uniform(n_boundary * pool_factor, xlim, ylim, seed=seed + 1)
    yp = np.empty((pool.shape[0],), dtype=np.float32)
    for i, (x, y) in enumerate(pool):
        yp[i] = smooth_escape(float(x), float(y), max_iter=max_iter)
    mask = (yp > band[0]) & (yp < band[1])
    Xb = pool[mask]
    yb = yp[mask]
    if len(Xb) < n_boundary:
        # If the band is too strict, fall back to however many points it yields
        keep = min(len(Xb), n_boundary)
        print(f"[warn] Boundary band too strict; got {len(Xb)} boundary points, using {keep}.")
        Xb = Xb[:keep]
        yb = yb[:keep]
        n_boundary = keep
        n_uniform = n_total - n_boundary
        Xu = sample_uniform(n_uniform, xlim, ylim, seed=seed)
    else:
        Xb = Xb[:n_boundary]
        yb = yb[:n_boundary]
    yu = np.empty((Xu.shape[0],), dtype=np.float32)
    for i, (x, y) in enumerate(Xu):
        yu[i] = smooth_escape(float(x), float(y), max_iter=max_iter)
    X = np.concatenate([Xu, Xb], axis=0).astype(np.float32)
    y = np.concatenate([yu, yb], axis=0).astype(np.float32)
    # Shuffle once
    perm = rng.permutation(X.shape[0])
    return X[perm], y[perm], ylim
```

Baseline Model: A Deep Residual MLP
Our first attempt uses a deep residual MLP that takes raw Cartesian coordinates \((x, y)\) as input and predicts the smooth escape value.
```python
# Baseline model
class MLPRes(nn.Module):
    def __init__(
        self,
        hidden_dim=256,
        num_blocks=8,
        act="silu",
        dropout=0.0,
        out_dim=1,
    ):
        super().__init__()
        activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
        self.in_proj = nn.Linear(2, hidden_dim)
        self.in_act = activation()
        self.blocks = nn.Sequential(*[
            ResidualBlock(hidden_dim, act=act, dropout=dropout)
            for _ in range(num_blocks)
        ])
        self.out_ln = nn.LayerNorm(hidden_dim)
        self.out_act = activation()
        self.out_proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        x = self.in_proj(x)
        x = self.in_act(x)
        x = self.blocks(x)
        x = self.out_act(self.out_ln(x))
        return self.out_proj(x)
```
```python
# Residual block
class ResidualBlock(nn.Module):
    def __init__(self, dim: int, act: str = "silu", dropout: float = 0.0):
        super().__init__()
        activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
        # Pre-norm style (LayerNorm helps a lot for stability in deep residual MLPs)
        self.ln1 = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, dim)
        self.ln2 = nn.LayerNorm(dim)
        self.fc2 = nn.Linear(dim, dim)
        self.act = activation()
        self.drop = nn.Dropout(dropout) if dropout and dropout > 0 else nn.Identity()
        # Optional: zero-init the last layer so each block starts near the identity
        nn.init.zeros_(self.fc2.weight)
        nn.init.zeros_(self.fc2.bias)

    def forward(self, x):
        h = self.ln1(x)
        h = self.act(self.fc1(h))
        h = self.drop(h)
        h = self.ln2(h)
        h = self.fc2(h)
        return x + h
```

This network has ample capacity: it is deep, residual, and trained on a large dataset with stable optimization.
Result
The global shape of the Mandelbrot set is clearly recognizable. However, fine details near the boundary are visibly blurred. Regions that should exhibit intricate fractal structure appear overly smooth, and thin filaments are either poorly defined or entirely missing.
This is not a matter of resolution, data, or depth. So what is going wrong?
The Spectral Bias Problem
Neural networks have a well-known problem called spectral bias:
they tend to learn low-frequency functions first, and struggle to represent functions with rapid oscillations or fine detail.
The Mandelbrot set is highly irregular and filled with small-scale structures, especially near its boundary. To capture it, the network must represent very high-frequency variations in the output as \(x\) and \(y\) change.
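A quick way to see spectral bias in action is a toy proxy: fit a low-frequency and a high-frequency sine by least squares on random ReLU features, a crude stand-in for a wide one-layer network in the lazy-training regime. The sizes and frequencies below are illustrative, not taken from the article's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 500)

# Random ReLU features: phi_j(x) = max(0, w_j * x + b_j)
w = rng.normal(0.0, 3.0, size=50)
b = rng.uniform(-3.0, 3.0, size=50)
phi = np.maximum(0.0, x[:, None] * w[None, :] + b[None, :])

def fit_rmse(target):
    """Least-squares fit on the random features; return the RMSE."""
    coef, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return float(np.sqrt(np.mean((phi @ coef - target) ** 2)))

err_low = fit_rmse(np.sin(2 * np.pi * x))    # 1 oscillation: easy
err_high = fit_rmse(np.sin(40 * np.pi * x))  # 20 oscillations: hard
# err_high comes out much larger than err_low: the same feature budget
# captures the slow sine but cannot track the fast one.
```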
Fourier Features: Encoding Coordinates in Frequency Space
One of the most elegant solutions to the spectral bias problem was introduced in 2020 by Tancik et al. in their paper Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains.
The idea is to transform the input coordinates before feeding them into the neural network. Instead of the raw \((x, y)\), we pass their sinusoidal projections onto random directions in a higher-dimensional space.
Formally:
$$\gamma(x) = [\sin(2\pi Bx), \cos(2\pi Bx)]$$
where \(B \in \mathbb{R}^{d_{\text{in}} \times d_{\text{feat}}}\) is a fixed random Gaussian matrix; its standard deviation \(\sigma\) controls the frequency bandwidth.
This mapping acts like a random Fourier basis expansion, letting the network more easily represent high-frequency details.
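Concretely, the mapping is a few lines of NumPy. This sketch (with toy sizes, chosen here only for illustration) shows how a 2-D coordinate becomes a \(2 \cdot d_{\text{feat}}\)-dimensional feature vector:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat, sigma = 2, 4, 5.0
B = rng.normal(0.0, sigma, size=(d_in, d_feat))  # random Gaussian frequency matrix

def gamma(x):
    """gamma(x) = [sin(2*pi*x B), cos(2*pi*x B)]."""
    proj = 2.0 * np.pi * (x @ B)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

x = np.array([[0.3, -0.7]])  # a single (x, y) coordinate
features = gamma(x)          # shape (1, 8): 2 * d_feat features, each in [-1, 1]
```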
```python
class GaussianFourierFeatures(nn.Module):
    def __init__(self, in_dim=2, num_feats=256, sigma=5.0):
        super().__init__()
        B = torch.randn(in_dim, num_feats) * sigma
        self.register_buffer("B", B)  # fixed, not trained

    def forward(self, x):
        proj = (2 * torch.pi) * (x @ self.B)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
```

Multi-Scale Gaussian Fourier Features
A single frequency scale might not be sufficient. The Mandelbrot set exhibits structure at all resolutions (a distinctive feature of fractal geometry).
To capture this, we use multi-scale Gaussian Fourier features, combining several frequency bands:
```python
class MultiScaleGaussianFourierFeatures(nn.Module):
    def __init__(self, in_dim=2, num_feats=512, sigmas=(2.0, 6.0, 10.0), seed=0):
        super().__init__()
        # Split the feature budget across scales
        k = len(sigmas)
        per = [num_feats // k] * k
        per[0] += num_feats - sum(per)
        Bs = []
        g = torch.Generator()
        g.manual_seed(seed)
        for s, m in zip(sigmas, per):
            B = torch.randn(in_dim, m, generator=g) * s
            Bs.append(B)
        self.register_buffer("B", torch.cat(Bs, dim=1))

    def forward(self, x):
        proj = (2 * torch.pi) * (x @ self.B)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
```

This effectively provides the network with a multi-resolution frequency basis, perfectly aligned with the self-similar nature of fractals.
Final Model
The final model shares the baseline architecture; the only difference is that it uses MultiScaleGaussianFourierFeatures to encode the input coordinates.
```python
class MLPFourierRes(nn.Module):
    def __init__(
        self,
        num_feats=256,
        sigma=5.0,
        hidden_dim=256,
        num_blocks=8,
        act="silu",
        dropout=0.0,
        out_dim=1,
    ):
        super().__init__()
        self.ff = MultiScaleGaussianFourierFeatures(
            2,
            num_feats=num_feats,
            sigmas=(2.0, 6.0, sigma),
            seed=0,
        )
        self.in_proj = nn.Linear(2 * num_feats, hidden_dim)
        self.blocks = nn.Sequential(*[
            ResidualBlock(hidden_dim, act=act, dropout=dropout)
            for _ in range(num_blocks)
        ])
        self.out_ln = nn.LayerNorm(hidden_dim)
        activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
        self.out_act = activation()
        self.out_proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        x = self.ff(x)
        x = self.in_proj(x)
        x = self.blocks(x)
        x = self.out_act(self.out_ln(x))
        return self.out_proj(x)
```

Training Dynamics
Training Without Fourier Features
The model slowly learns the overall shape of the Mandelbrot set, but then plateaus. Additional training fails to add more detail.
Training With Fourier Features
Here the coarse structures appear first, followed by progressively finer details. The model continues to refine its prediction instead of plateauing.
Final Results
Both models used the same architecture, dataset, and training procedure. The network is a deep residual multilayer perceptron (MLP) trained as a regression model on the smooth escape-time formulation.
- Dataset: 1,000,000 samples from the complex plane, with 70% of points concentrated near the fractal boundary thanks to the biased data sampling.
- Architecture: Residual MLP with 20 residual blocks, and a hidden dimension of 512 units.
- Activation Function: SiLU
- Training: 100 epochs, batch size 4096, Adam-based optimizer, Cosine Annealing scheduler.
The only difference between the two models is the representation of the input coordinates. The baseline model uses the raw Cartesian coordinates, while the second model uses the multi-scale Fourier Features for the representation.
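The training loop itself is standard. Here is a minimal sketch of the setup described above (Adam plus cosine annealing); the model, loader, and learning rate are illustrative stand-ins, not the repository's exact code:

```python
import torch
import torch.nn as nn

EPOCHS = 100  # as in the article; reduce for a quick smoke test

model = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 1))  # stand-in model
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=EPOCHS)

def train_epoch(loader):
    """One pass over (x, y) mini-batches, minimizing MSE on the smooth escape target."""
    model.train()
    for xb, yb in loader:
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(xb).squeeze(-1), yb)
        loss.backward()
        opt.step()
    sched.step()  # anneal the learning rate once per epoch
```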
Global View


Zoom 1 View


Zoom 2 View


Conclusions
Fractals such as the Mandelbrot set are an extreme example of functions dominated by high-frequency structure. Approximating them directly from raw coordinates forces neural networks to synthesize increasingly fine oscillations, a task for which MLPs are poorly suited.
What this article shows is that the limitation is not architectural capacity, data volume, or optimization. It is representation.
By encoding input coordinates with multi-scale Gaussian Fourier Features, we shift much of the problem’s complexity into the input space. High-frequency structure becomes explicit, allowing an otherwise ordinary neural network to approximate a function that would be too complex in its original form.
This idea extends beyond fractals. Coordinate-based neural networks are used in computer graphics, physics-informed learning, and signal processing. In all of these settings, the choice of input encoding can be the difference between smooth approximations and rich, highly detailed structure.
Note on visual assets
All images, animations, and videos shown in this article were generated by the author from the outputs of the neural network models described above. No external fractal renderers or third-party visual assets were used. The full code used for training, rendering images, and generating animations is available in the accompanying repository.