Mandelbrot set acceleration

At a "hard" location suggested by Fractal Universe on the forums, the BLA method finished in 37seconds while when I stopped KF after an hour it had only reached 2% progress on the pixel iterations, and that was with the iteration count reduced from 2100100100 to 500100100 as otherwise it ran out of memory on my 32GB desktop...

This new method is magic, massive thanks to Zhuoran on the forum for dropping the two hints (rebasing when |Z+z|<|z| and using bivariate linear interpolation), albeit without great detail or source code for the latter.

Putting on hold proof of correctness using error bounds as I don't understand it enough (and hoping someone else can do it and/or teach me).

Next to investigate is applying the method to other escape time fractals like the Burning Ship. I suppose A and B would become 2x2 real matrices instead of complex numbers, not sure what matrix norm to use when calculating the safe radius. Needs to take into account distance from axis for abs-folding, as well as from origin for squaring nonlinearity...

Mandelbrot set acceleration

turns out the glitchy images I was getting were down to bad assumptions when looking up the data in the tables, adding extra data to the tables helped me diagnose where indexes where getting mixed up.... very lots of time wasted thinking I had bad "it's ok to do linear here" radius calculations....

rough benchmarks at 1920x1080:

"Flake" by Dinkydau

kf-2.15.4 : 5.1s

kf-2.15.5pre : 2.6s

bla : 0.5s

"Evolution of Trees" by Dinkydau

kf-2.15.4 : 1m21s

kf-2.15.5pre : 37s

bla : 2.4s

this is just with the linear approximation without error control, so results may turn out bad for some locations. the backward error analysis stuff is something I may revisit later (must remember to use relative error in output and absolute error in input: relative error in input is no good if inputs -> 0 like c does at the reference)

Mandelbrot set acceleration

Better is merging BLAs with the quadratic terms too so the backwards error radius can be computed. I think I'm doing it wrong because it's only valid for a narrow |c| range instead of all c in image...

Show thread

https://mathr.co.uk/smoltech/#exhibition

exhibition opening tonight!

Friday 26th November 2021 7pm

CT-20

73 Tontine St

Folkestone

CT20 1JR,

United Kingdom

http://ct-20.org

computer-based works by me

curated by @netzzz

Mandelbrot set acceleration

Maybe it would work to have a merging strategy.

For example, 1-step BLA is easy to compute. Merging neighbouring BLAs is possible:

ZA z + ZB c = YA(XA z + XB c) + YB c

So

ZA = YA XA

ZB = YA XB + YB

And

Z skip = X skip + Y skip

Finally

Z radius = min(X radius, Y radius / (|XA| + 1))

is probably wrong but I would need paper to work it out...

Then build tables of M 1-steps, M/2 2-steps, M/4 4-steps etc by merging neighbouring BLAs without overlaps. So the total table size is O(M), which equals the work needed to construct it. Much better than O(M^2)

When iterating pixels, try larger steps (with smaller radius) first if applicable, which means at most log M to search through. This was O(M) in the previous implementation. It may take O(log L) steps to skip L iterations which previously could have been 1 step, but the reduction in search time may be worth it.

Mandelbrot set acceleration

Seems the table gets fully populated even when computed on demand.

The backwards error control is expensive but seems to work (correct image first try); but skipping is pessimised: now takes 7mins vs 20secs with the earlier version with fudge factor to avoid bad images.

Wondering if there is some dynamic programming technique to apply to reduce required work from O(M^2) to O(M log M).

For example product_m^n 2z_i = product_1^n 2z_i / product_1^{m-1} 2 z_i. so a precomputed table of running products costing O(M) could reduce cost of sub products from O(n-m) to O(1).

Iteration of B -> 2 Z_i B + 1 starting from m until n gives

sum_k=m^{n+1} product_j=k^n 2Z_j

where the empty product gives 1.

Applying the previous identity, this is

Sum_k=m^n+1 product_j=1^n 2z_j / product_j=1^k-1 2z_j

=

(product_j=1^n 2z_j) (sum_k=m^n+1 1/product_j=1^k-1 2zj)

=

P[n] (S[n+1] - S[m-1])

where

P[n] = 2 Z_n P[n-1]

S[n] = S[n-1] + 1/P[n-1]

can be computed in O(M)

Mandelbrot set acceleration

...continued...

TABLE COMPUTATION

|z| := cut off radius

m := starting iteration

h := image size in pixels

k := pixel spacing

A := 1

B := 0

D := 0

E := 0

F := 0

l := 0

r := 4

while (m + l < M)

Z := Z[m + l]

A' := 2 Z A

B' := 2 Z B + 1

D' := 2 Z D + A^2

E' := 2 (Z E + A B)

F' := 2 Z F + B^2

R := max{ |D|, |E|/2, |F| }

a := R h

b := R h^2 k

c := R h^3 k^2 - |B| h k

r' := (-b +/- sqrt(b^2 - 4 a c)) / 2 a

if r' < r

Table[m].insert_front{l, r, A, B)

if r < |z|

break

r := r'

A := A'

B := B'

D := D'

E := E'

F := F'

l := l + 1

CAVEATS

- doing table computation up front with |z| = 0 takes O(M^2) time, which is infeasible for large M (even though it can be parallelized)

- doing table computation on demand makes parallelism much more complicated, but subsequent read-only accesses should be fast/simple - this is ok as long as Write Once Read Many applies)

something like this perhaps

T := atomic read Table[m]

if T == nullptr

acquire mutex lock for Table[m]

T := atomic read Table[m]

if T == nullptr

// we are the first thread

T = compute table

Table[m] := atomic write T

else

// other thread got there first

unlock mutex for Table[m]

use T

...(3/3) (maybe more later when I've actually implemented all of it)

Mandelbrot set acceleration

...continued...

h := image size in pixels

k := pixel spacing

so c is bounded by h k

R (|z|^2 + |z| |c| + |c|^2) / (|B| |c|) < 1 / h

becomes

R h |z|^2 + R h |c| |z| + R h |c|^2 - |B| |c| < 0

which is quadratic in |z|, so we can solve for |z| < r

want to find r such that this is valid for any |c| < h k, so can compute only once per image. then for all |z| < r, error in z -> A z + B c is small.

INPUT

C := view center (high precision)

M := reference count limit

N := iteration count limit

Zoom := magnifaction factor relative to [-2,2] x [-2,2]

REFERENCE

Z[M] low precision array

Z := 0 (high precision)

m := 0

while (|Z| < 2 && m < M)

Z[m] := round(Z)

Z := Z^2 + c

m := m + 1

TABLE DATA

struct TableItem

{

int k

real r

complex A, B

}

Table[M] : list<TableItem>

PIXEL

c := pixel coordinates (relative to C) * pixel_spacing

z := 0

n := 0

m := 0

while (|Z[m] + z| < 2 && n < N)

// perturbed iteration

z := (2 Z[m] + z) z + c

n := n + 1

m := m + 1

// rebase

if (|Z[m] + z| < z)

z := Z[m] + z

m := 0

if no Table[m]

// compute Table[m] until r < |z|

search Table[m] for smallest T->r > |z|

if T->k > 0

z := T->A * z + T->B * c

n := n + T->k

m := m + T->k

// rebase

if (|Z[m] + z| < z)

z := Z[m] + z

m := 0

...(2/n)...

Mandelbrot set acceleration

Z -> Z^2 + C

high precision reference orbit

z -> (2 Z+ z) z + c

low precision deltas relative to high precision orbit

|Z + z| < |z|

when this occurs, replace z with Z + z and reset the reference iteration count to 0 (this avoids glitches)

z -> A z + B c

A -> 2 Z A

B -> 2 Z B + 1

bivariate linear approximation to skip many iterations at once

|Z + z| >> |z|

|z| << (|B| |c| - |Z|) / (|A| + 1)

stopping condition based on ball arithmetic

flaw: this needs an arbitrary "fudge factor", 0.1 works for Dinkydau's Flake, 0.01 is required for Dinkydau's Evolution of Trees

solution (untested so far)

z -> A z + B c + D z^2 + E zc + F c^2

D -> 2 Z D + A^2

E -> 2 (Z E + A B)

F -> 2 Z F + B^2

R = max{ |D|, |E|/2, |F| }

z = A z + B c + R (|z^2| + |zc| + |c^2|)

Taylor remainder bound

z = (1 + e_z) (A z + B c)

= A z + B (1 + e_c) c

e_z =R (|z^2| + |zc| + |c^2|) / (A z + B c)

e_c = (A z + B c) e_z / (B c)

= R (|z^2| + |zc| + |c^2|) / (B c)

Backward error

For good enough image, want (1 + e_c) c to be in the same pixel as c: (1 + e_c) c < c + pixel_spacing so e_c < pixel_spacing / c. Now c is bounded by image size in pixels * pixel spacing, so want e_c < 1 / (image size in pixels)

...(1/n)...

I found a nice reference for #interval #Taylor methods:

> Rigorous Analysis of Nonlinear Motion in Particle Accelerators

> Kyoko Makino

> PhD thesis

> Department of Physics and Astronomy

> Michigan State University

> 1998

It uses regular real box intervals (two endpoints), but wasn't too hard to convert to complex ball intervals (midpoint and radius).

I implemented knighty's root finding algorithm (in the context of period detection in the #Mandelbrot set) on top of the primitive operations from the thesis, and now I think I understand how it works a bit more. I have to say knighty's original code that I copy/pasted was rather impenetrable:

https://code.mathr.co.uk/mandelbrot-numerics/blob/3964f6b78c8a6731a9dbdaa2d6d79cef9a7ed693:/c/lib/m_d_ball_period.c

I wrote exrmean because I found a way to emulate Fragmentarium-style multiple #subframe #antialiasing with #KF: render a zoom out sequence with the zoom size set to 1 (so each image shows the same location).

With jitter enabled, the pseudorandom seed for subpixel offsets is incremented each frame, so the images will be slightly different.

Averaging them all (in linear light, as used in EXR) will give a final image with higher quality than any individual frame, without having needed to render a huge image and downscale it (the previous option).

Enable "reuse reference" to avoid calculating it each subframe (unfortunately the series approximation is recalculated, which is not optimal).

Probably better than using vast amounts of temporary space for EXR files in this method would be to extend KF to do the subframe rendering itself. That'd be much more work than this quick hack though.

making art with maths and algorithms

Joined May 2018

<svg xmlns="http://www.w3.org/2000/svg" id="hometownlogo" x="0px" y="0px" viewBox="25 40 50 20" width="100%" height="100%"><g><path d="M55.9,53.9H35.3c-0.7,0-1.3,0.6-1.3,1.3s0.6,1.3,1.3,1.3h20.6c0.7,0,1.3-0.6,1.3-1.3S56.6,53.9,55.9,53.9z"/><path d="M55.9,58.2H35.3c-0.7,0-1.3,0.6-1.3,1.3s0.6,1.3,1.3,1.3h20.6c0.7,0,1.3-0.6,1.3-1.3S56.6,58.2,55.9,58.2z"/><path d="M55.9,62.6H35.3c-0.7,0-1.3,0.6-1.3,1.3s0.6,1.3,1.3,1.3h20.6c0.7,0,1.3-0.6,1.3-1.3S56.6,62.6,55.9,62.6z"/><path d="M64.8,53.9c-0.7,0-1.3,0.6-1.3,1.3v8.8c0,0.7,0.6,1.3,1.3,1.3s1.3-0.6,1.3-1.3v-8.8C66,54.4,65.4,53.9,64.8,53.9z"/><path d="M60.4,53.9c-0.7,0-1.3,0.6-1.3,1.3v8.8c0,0.7,0.6,1.3,1.3,1.3s1.3-0.6,1.3-1.3v-8.8C61.6,54.4,61.1,53.9,60.4,53.9z"/><path d="M63.7,48.3c1.3-0.7,2-2.5,2-5.6c0-3.6-0.9-7.8-3.3-7.8s-3.3,4.2-3.3,7.8c0,3.1,0.7,4.9,2,5.6v2.4c0,0.7,0.6,1.3,1.3,1.3 s1.3-0.6,1.3-1.3V48.3z M62.4,37.8c0.4,0.8,0.8,2.5,0.8,4.9c0,2.5-0.5,3.4-0.8,3.4s-0.8-0.9-0.8-3.4C61.7,40.3,62.1,38.6,62.4,37.8 z"/><path d="M57,42.7c0-0.1-0.1-0.1-0.1-0.2l-3.2-4.1c-0.2-0.3-0.6-0.5-1-0.5h-1.6v-1.9c0-0.7-0.6-1.3-1.3-1.3s-1.3,0.6-1.3,1.3V38 h-3.9h-1.1h-5.2c-0.4,0-0.7,0.2-1,0.5l-3.2,4.1c0,0.1-0.1,0.1-0.1,0.2c0,0-0.1,0.1-0.1,0.1C34,43,34,43.2,34,43.3v7.4 c0,0.7,0.6,1.3,1.3,1.3h5.2h7.4h8c0.7,0,1.3-0.6,1.3-1.3v-7.4c0-0.2,0-0.3-0.1-0.4C57,42.8,57,42.8,57,42.7z M41.7,49.5h-5.2v-4.9 h10.2v4.9H41.7z M48.5,42.1l-1.2-1.6h4.8l1.2,1.6H48.5z M44.1,40.5l1.2,1.6h-7.5l1.2-1.6H44.1z M49.2,44.6h5.5v4.9h-5.5V44.6z"/></g></svg>