Fast calculation of the distance to cubic Bezier curves on the GPU
Solving quintic polynomial equations: the state of the art
Bézier curves are a core building block of text and 2D shape rendering. There are several approaches to rendering them, but one especially challenging problem, both mathematically and technically, is computing the distance to a Bézier curve. For quadratic curves (one control point), this is fairly accessible, but for cubics (two control points) we're going to see why it is so hard.
Having this distance field opens up many rendering possibilities. It’s hard, but it’s possible; here is a live proof:
In this visualization, I’m borrowing your device resources to compute the distance to the curve for every single pixel. The yellow points are the control points of the curve (in white) and the blue zone is a representation of the distance field.
In a previous article, we explained that a Bézier curve can be expressed as a polynomial. In our case, a cubic polynomial:
For a given point p in 2D space, the distance to that Bézier curve can be expressed as a length between our curve and p:
\begin{aligned} d(t) &= \|B_3(t) - \textbf{p}\| \\ &= \|\textbf{a}t^3 + \textbf{b}t^2 + \textbf{c}t + \textbf{d} - \textbf{p}\| \end{aligned}
The length formula has an annoying square root, so for simplicity we start with the distance squared, which we are going to expand:
The derivative of that function will allow us to identify critical points: that is, points where the distance transitions between growing and shrinking. Said differently, solving D'(t)=0 will identify all the maxima and minima (we're interested in the latter) of D(t) (and thus of d(t) as well).
A polynomial, this time of degree 5, emerges here. For conciseness, we can express the polynomial coefficients of D'(t) as a bunch of dot products:
Finally, we notice that solving D'(t)=0 is equivalent to solving D'(t)/2 = 0, so we simplify the expression:
Assuming we are able to solve this equation, we will get at most 5 values of t, among which we should find the shortest distance from p to the curve. Since t is bound within 0 and 1 (start and end of the curve), we will also have to test the distance at these locations.
The red dot in the blue field is a random point in space. The red lines show which distances are evaluated (at most 5+2) to find the smallest one.
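To make that selection step concrete, here is a minimal sketch in Python (rather than the GLSL used for the demos): given the candidate roots coming out of a hypothetical quintic solver, we evaluate the distance at each valid root plus the two endpoints, and keep the smallest.

```python
# Sketch (Python, not the shader code): picking the closest point on the
# curve. `roots` stands in for the output of a quintic root finder such as
# root_find5(); a, b, c, d are the curve's vector coefficients (2D tuples)
# and p is the query point.
def min_distance(roots, a, b, c, d, p):
    def dist(t):
        x = a[0]*t**3 + b[0]*t**2 + c[0]*t + d[0] - p[0]
        y = a[1]*t**3 + b[1]*t**2 + c[1]*t + d[1] - p[1]
        return (x*x + y*y) ** 0.5
    # roots outside [0,1] are off the curve; the endpoints are always tested
    candidates = [t for t in roots if 0.0 <= t <= 1.0] + [0.0, 1.0]
    return min(dist(t) for t in candidates)  # at most 5 + 2 evaluations
```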
root_find5() is our 5th-degree root finder, that is, the function that gives us all the t (at most 5) which satisfy:
Diving into the rabbit hole of solving polynomials numerically will lead you to insanity. But we still have to scratch the surface, because higher-degree solvers usually rely on the lower-degree ones.
We cannot know in advance whether the division is going to succeed, so we run the divisions anyway and only then check whether they failed (and infer the reason for the failure). This is much more reliable than an arbitrary epsilon value. We also try to avoid duplicated roots.
As much as I like it, this implementation might not be the most stable numerically (even though I don't have strong data to back this claim). Instead, we may prefer the formula from Numerical Recipes:
\begin{aligned} \delta &= b^2-4ac \\ q &= -\frac{1}{2}\left(b + \mathrm{sign}(b)\sqrt{\delta}\right) \\ r_0 &= \frac{q}{a} \\ r_1 &= \frac{c}{q} \end{aligned}
This is not perfect at all (especially with the b²-4ac part). There are actually many other possible implementations, and this HAL CNRS paper shows how near impossible it is to make a correct one. It is an interesting but depressing read, especially since it “only” covers IEEE 754 floats, and we have no such guarantee on GPUs. We also don’t have fma() in WebGL, which greatly limits improvements. For now, it will have to do.
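As an illustration, here is a Python sketch of that Numerical Recipes formulation (not the shader code, and skipping degenerate cases such as a ≈ 0, which a caller would handle by falling back to the linear case). The sign(b) trick avoids subtracting two nearly equal numbers when b² ≫ 4ac:

```python
import math

# Sketch (Python): the Numerical Recipes quadratic formulation. Computing q
# with sign(b) sidesteps catastrophic cancellation in -b ± sqrt(delta).
# Assumes a != 0; complex roots are simply reported as "no real roots".
def solve_quadratic(a, b, c):
    delta = b*b - 4*a*c
    if delta < 0:
        return []  # no real roots
    q = -0.5 * (b + math.copysign(math.sqrt(delta), b))
    if q == 0.0:
        return [0.0]  # b == 0 and delta == 0: double root at 0
    return [q / a, c / q]
```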
Polynomials of degree 5 cannot be solved analytically the way quadratics can (a consequence of the Abel–Ruffini theorem). And even if they could be, we probably wouldn't do it because of numerical instability: in my experience, analytic 3rd-degree polynomial solvers typically do not provide reliable results.
The first iterative algorithm I picked was the Aberth–Ehrlich method. Nowadays, more appropriate algorithms exist, but at the time I started messing around with these problems (several years ago), it was a fairly good contender. This video explores how it works.
The convergence to the roots is quick, and it's overall simple to implement. But it's not without flaws. The main problem is that it works in complex space. We can't ignore the complex roots, because they all "respond" to each other. And filtering these roots out at the end implies some unreliable, arbitrary threshold mechanism (we keep a root only when its imaginary part is close to 0).
The initialization process also annoyingly requires you to come up with a guess at what the roots are, and the method itself doesn't provide anything relevant. Aberth–Ehrlich works by refining these initial roots, similar to a more elaborate version of Newton's iteration. Choosing better initial estimates leads to faster convergence (meaning fewer iterations).
The Cauchy bound defines the radius of a disk (complex numbers live in a 2D plane) within which all the roots of a polynomial must lie. We are going to use it for the initial guess, and more specifically its "tight" version (which unfortunately relies on pow()).
Since Aberth–Ehrlich is a refinement and not just a shrinking process, we define and use an inner disk that has half the area of the Cauchy bound disk. That way, we're more likely to start with initial guesses spread in the "middle" of the roots; this is where the √2 comes from in the formula below.
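Putting it together, here is a rough Python sketch of the whole Aberth–Ehrlich loop (the demos use GLSL). To keep it short, it uses the simple Cauchy bound (1 + max|kᵢ/k₅|) rather than the "tight" variant, and it spreads the initial guesses on the inner circle of radius R/√2:

```python
import cmath
import math

# Sketch (Python, complex arithmetic) of Aberth-Ehrlich for degree 5.
# coeffs = [k5, k4, ..., k0] with k5 != 0 (a real solver falls back to a
# lower degree otherwise). No convergence test: a fixed iteration budget,
# mirroring the hardcoded maximum discussed in the article.
def aberth5(coeffs, iters=16):
    k5 = coeffs[0]
    # simple Cauchy bound: all roots lie within |z| <= 1 + max|k_i / k5|
    radius = 1.0 + max(abs(k / k5) for k in coeffs[1:])
    r = radius / math.sqrt(2)  # inner disk with half the bound disk's area
    roots = [r * cmath.exp(2j * math.pi * (i + 0.25) / 5) for i in range(5)]

    def f(x):   # Horner evaluation of the quintic
        v = 0.0
        for k in coeffs:
            v = v*x + k
        return v

    def fp(x):  # Horner evaluation of its derivative
        v = 0.0
        for i, k in enumerate(coeffs[:-1]):
            v = v*x + (5 - i)*k
        return v

    for _ in range(iters):
        new = []
        for i, z in enumerate(roots):
            w = f(z) / fp(z)  # Newton step
            s = sum(1.0 / (z - roots[j]) for j in range(5) if j != i)
            new.append(z - w / (1 - w*s))  # Aberth correction
        roots = new
    return roots
```

Real roots then still have to be filtered out with an imaginary-part threshold, as discussed earlier.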
When the main coefficient is too small, we fall back on the 4th degree (and so on until we reach the analytic quadratic). The 4th and 3rd degree versions of this function are easy to guess (they’re pretty much identical, just removing one coefficient at each degree).
We’re also hardcoding a maximum of 16 iterations here because it’s usually enough. To get an idea of how many iterations are required in practice, here is a heat map of the iteration count for every pixel:
The big picture and the weaknesses of the algorithm should be pretty obvious by now. Among all the drawbacks of this approach, there are also surprising pathological cases where the algorithm does not perform well. Fortunately, there has been some progress on the state of the art in recent years.
In 2022, Cem Yuksel published a new algorithm for polynomial root solving. Initially I had my reservations because the official implementation had a few shortcomings on some edge cases, which made me question its reliability. It’s also optimized for CPU computation and is, to my very personal taste, overly complex.
Fortunately, Christoph Peters showed that it was possible on the GPU by implementing it for very large degrees, and without any recursion. Inspired by that, I decided to unroll it myself for degree 5.
One core difference with the Aberth–Ehrlich approach is that this algorithm is designed for arbitrary ranges. In our case this is actually convenient because, due to how Bézier curves are defined, we are only interested in roots between 0 and 1. We will need to adjust the quadratic function to work in this range, as well as keep the roots ordered:
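Such an adjusted quadratic solver could look like this (a Python sketch, assuming a non-degenerate leading coefficient; the function name is made up):

```python
import math

# Sketch (Python): quadratic solver adjusted for a range, as needed by the
# cascade; returns only the roots inside [x_min, x_max], in increasing
# order. Assumes a != 0 (the caller handles the linear fallback).
def quadratic_in_range(a, b, c, x_min=0.0, x_max=1.0):
    delta = b*b - 4*a*c
    if delta < 0:
        return []  # no real roots at all
    q = -0.5 * (b + math.copysign(math.sqrt(delta), b))
    roots = [q/a, c/q] if q != 0.0 else [0.0]
    return sorted(r for r in roots if x_min <= r <= x_max)
```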
The core logic of the algorithm relies on a cascade of derivatives for every degree. Christoph Peters provides an analytic formula to obtain the derivative for any degree. This is a huge helper when we need to work for an arbitrary degree, but in our case we can just differentiate manually:
Since we’re only interested in the roots, similar to what we did to D(t), we can simplify some of these expressions:
The purpose of that cascade of derivatives is to cut the curve into monotonic segments. In practice, the core function looks like this:
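The actual shader isn't reproduced here, but the key idea can be sketched as follows in Python (names are hypothetical, and plain bisection stands in for the safeguarded Newton used in the paper): the derivative's roots split [0,1] into monotonic segments, and each segment whose endpoints have opposite signs brackets exactly one root.

```python
# Sketch (Python) of the core idea: f is monotonic between consecutive
# roots of its derivative, so a sign change over a segment brackets exactly
# one root, which can then be polished safely. Roots landing exactly on a
# segment edge may be reported twice; this sketch does not deduplicate.
def roots_from_derivative(f, deriv_roots, x_min=0.0, x_max=1.0, iters=48):
    # the derivative's roots split [x_min, x_max] into monotonic segments
    edges = [x_min] + sorted(deriv_roots) + [x_max]
    roots = []
    for lo, hi in zip(edges, edges[1:]):
        flo, fhi = f(lo), f(hi)
        if flo == 0.0:
            roots.append(lo)
            continue
        if flo * fhi > 0:  # same sign: no root in this monotonic segment
            continue
        for _ in range(iters):  # bisection is guaranteed to converge here
            mid = 0.5 * (lo + hi)
            fm = f(mid)
            if flo * fm <= 0:
                hi = mid
            else:
                lo, flo = mid, fm
        roots.append(0.5 * (lo + hi))
    return roots
```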
We could unroll cy_find3, cy_find4, and cy_find5, but to keep the code simple, degrees 3 to 5 will share the same function, with the leading coefficients set to 0 (hopefully the compiler does its job properly).
The number of iterations might be larger, but the algorithm is much faster overall (I observed a factor of 3 on my machine), the code is shorter, and it is actually more reliable.
There isn’t a lot of code, and the paper provides pseudo-code. But implementing it was actually challenging in many ways.
First of all, the authors didn’t seem to find it relevant to mention that the method only works if f(a) and f(b) have opposite signs; when they don’t, you’re pretty much on your own. It requires just 2 lines of adjustment, but figuring out this shortcoming of the algorithm was particularly unexpected.
\begin{aligned} K_1 &= 0.1 \\ K_2 &= 0.98\left(1+\frac{1+\sqrt{5}}{2}\right) \approx 2.56567 \\ n_0 &= ? \end{aligned}
I played with them for a while and couldn’t find any set that would make a real difference, so I ended up with the following:
The next step is to work with chains of Bézier curves to make up complex shapes (such as font glyphs). It will lead us to build a signed distance field. This is not trivial at all and mandates one or several dedicated articles. We will hopefully study these subjects in the not-so-distant future.
B_3(t) = \textbf{a}t^3 + \textbf{b}t^2 + \textbf{c}t + \textbf{d}
Where a, b, c and d are the vector coefficients derived from the start (P_0), end (P_3), and control points (P_1, P_2) using the following formulas (you can refer to the previous article for details):
\begin{aligned} \textbf{a} &= -P_0 + 3(P_1-P_2) + P_3 \\ \textbf{b} &= 3P_0 - 6P_1 + 3P_2 \\ \textbf{c} &= -3P_0 + 3P_1 \\ \textbf{d} &= P_0 \end{aligned}
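As a sanity check, these formulas are easy to play with outside the shader; here is a Python sketch where 2D points are represented as complex numbers for brevity:

```python
# Sketch (Python, not GLSL): vector coefficients of a cubic Bézier curve,
# with 2D points represented as complex numbers.
def bezier3_coeffs(p0, p1, p2, p3):
    a = -p0 + 3*(p1 - p2) + p3
    b = 3*p0 - 6*p1 + 3*p2
    c = -3*p0 + 3*p1
    d = p0
    return a, b, c, d

# evaluate B_3(t) = a*t^3 + b*t^2 + c*t + d
def bezier3(t, p0, p1, p2, p3):
    a, b, c, d = bezier3_coeffs(p0, p1, p2, p3)
    return a*t**3 + b*t**2 + c*t + d
```

One quick check: B₃(0) must land on P₀ and B₃(1) on P₃.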
Our goal is to find the t value where d(t) is the smallest.
\begin{aligned} D(t) &= d(t)^2 \\ &= \|\textbf{a}t^3 + \textbf{b}t^2 + \textbf{c}t + \textbf{d} - \textbf{p}\|^2 \\ &= (a_xt^3 + b_xt^2 + c_xt + d_x - p_x)^2 + (a_yt^3 + b_yt^2 + c_yt + d_y - p_y)^2 \end{aligned}
It is a bit convoluted in our case but straightforward to compute:
\begin{aligned} D'(t) &= t^5\,6(\textbf{a}\cdot\textbf{a}) \\ &+ t^4\,10(\textbf{a}\cdot\textbf{b}) \\ &+ t^3\,(8(\textbf{a}\cdot\textbf{c})+4(\textbf{b}\cdot\textbf{b})) \\ &+ t^2\,6(\textbf{a}\cdot(\textbf{d}-\textbf{p}) + \textbf{b}\cdot\textbf{c}) \\ &+ t\,(4(\textbf{b}\cdot(\textbf{d}-\textbf{p})) + 2(\textbf{c}\cdot\textbf{c})) \\ &+ 2(\textbf{c}\cdot(\textbf{d}-\textbf{p})) \end{aligned}
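These dot products translate directly into code; here is a Python sketch (the demos use GLSL) computing the six coefficients, from the t⁵ term down to the constant:

```python
# Sketch (Python): the six coefficients of D'(t), the degree-5 polynomial
# whose roots are the critical points of the squared distance. a, b, c, d
# are the curve's vector coefficients and p the query point (2D tuples).
def dot(u, v):
    return u[0]*v[0] + u[1]*v[1]

def quintic_coeffs(a, b, c, d, p):
    dp = (d[0] - p[0], d[1] - p[1])  # the recurring (d - p) term
    return (
        6*dot(a, a),                 # t^5
        10*dot(a, b),                # t^4
        8*dot(a, c) + 4*dot(b, b),   # t^3
        6*(dot(a, dp) + dot(b, c)),  # t^2
        4*dot(b, dp) + 2*dot(c, c),  # t^1
        2*dot(c, dp),                # t^0
    )
```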


