In this post, I explained how to generate a random, uniform distribution of points on a disk. That problem turns out to be a special case of a more general technique that can be used to generate random numbers with any probability distribution you want. As a bonus, it also explains the seemingly magical fix (taking the square root of a uniform random number to find r) that generates the desired result. But it's not magic--it's a really cool bit of mathematics.
As in my previous post on stochastic geometry, assume that you have a random number generator that outputs numbers randomly selected from 0 to 1. Each possible number has an equal chance of being chosen, so if you plotted how often each number was picked, you would get a uniform distribution.
Let's say, instead, you wanted to generate random numbers between 0 and 1 with a probability distribution function proportional to the polynomial p(x) = -8x⁴ + 8x³ - 2x² + 2x. Broadly speaking, we'd like to pick numbers around 0.7 the most often, with larger numbers being generated more often than smaller numbers. The distribution looks something like this[1]:
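If it helps to make that concrete, here is a minimal Python sketch (the function name p and the grid spacing are just illustrative choices) that evaluates the polynomial on a fine grid and confirms the peak sits near 0.7:

```python
import numpy as np

def p(x):
    # The example distribution, proportional to -8x^4 + 8x^3 - 2x^2 + 2x.
    return -8 * x**4 + 8 * x**3 - 2 * x**2 + 2 * x

# Evaluate p on a fine grid over [0, 1] and report where it peaks.
xs = np.linspace(0.0, 1.0, 1001)
print(f"peak near x = {xs[np.argmax(p(xs))]:.2f}")  # should print roughly 0.70
```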
The first step is to find the cumulative distribution function c(x) of the probability distribution function p(x). If you imagine the chart above as a histogram, the cumulative distribution function would give, for every x, the total height of all the bars to the left of that x value. In other words, the cumulative distribution function gives the proportion of the area under the curve that lies to the left of x compared to the total area under the curve. This should sound familiar if you've ever taken a calculus course--to find c(x), we take the integral of p(x) from 0 to x and divide by the integral of p(x) from 0 to 1. If you haven't taken calculus, don't worry. Taking an integral in this context just means finding the area under a curve between two points, as described earlier.
Here is what c(x) looks like, plotted alongside p(x).
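For this particular polynomial, the integral is easy to work out by hand: the antiderivative of p(x) is -8x⁵/5 + 2x⁴ - 2x³/3 + x², and evaluating it at x = 1 gives a total area of 11/15. Continuing the Python sketch from above, c(x) is then just that antiderivative divided by the total area (the closed form below applies only to this example polynomial):

```python
def c(x):
    # Cumulative distribution: the integral of p from 0 to x,
    # divided by the total area under p over [0, 1].
    integral_to_x = -8 * x**5 / 5 + 2 * x**4 - 2 * x**3 / 3 + x**2
    total_area = 11 / 15  # the same antiderivative evaluated at x = 1
    return integral_to_x / total_area

print(c(0.0), c(1.0))  # should print approximately 0.0 and 1.0
```

If your distribution is awkward to integrate by hand, a numerical cumulative sum of p over a fine grid works just as well.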
The next step is the easiest. Use the random number generator to generate as many random numbers as you need between 0 and 1. I picked five: 0.77375, 0.55492, 0.08021, 0.51151, and 0.18437.[2]
Now, using c(x) as a sort of translator, we can figure out which random numbers in our non-uniform distribution these numbers correspond to. It's important to realize that the random numbers we generated are values of c(x), not values of x. No matter what interval we use, c(x) will always have values from 0 to 1, but we could always use a different probability distribution whose x values range between any two real numbers. In my research, I use this technique to generate random angles that have values from 0 to π, for instance. So, using these values of c(x), we can interpolate to find the values of x that they correspond to.
Here is the process of interpolation for the numbers I chose. The red points represent the uniformly distributed random values for c(x). The yellow points represent the randomly generated x values that have the same probability distribution as p(x). Very roughly, 0.77375, 0.55492, 0.08021, 0.51151, and 0.18437 correspond to 0.78, 0.65, 0.25, 0.62, and 0.38 respectively, via the green graph of c(x). Although it's hard to tell right now, if I generated enough numbers, we would indeed find we were picking numbers around 0.7 the most often, with more large numbers being generated than small numbers.
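Continuing the Python sketch, this interpolation can be done numerically: build a fine grid of (c(x), x) pairs and use it as a lookup table. Run on the five numbers above, it should land on roughly the same values I read off the graph (numpy's interp handles the interpolation; it needs c to be increasing, which it is here since p is never negative):

```python
grid = np.linspace(0.0, 1.0, 1001)
uniform_samples = np.array([0.77375, 0.55492, 0.08021, 0.51151, 0.18437])

# For each uniform number u, find the x with c(x) = u by interpolating
# along the grid of (c(x), x) pairs.
non_uniform_samples = np.interp(uniform_samples, c(grid), grid)
print(non_uniform_samples.round(2))  # roughly [0.78, 0.65, 0.25, 0.62, 0.38]
```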
In the case of generating random points over a disk, we need to generate random values of r. We are more likely to find points at larger radii than smaller radii simply because a circle with a larger radius has a greater perimeter: perimeter is proportional to radius. Thus, our p(r) is proportional to r, and our c(r) is proportional to r². This is why we need to take the non-intuitive step of taking the square root when generating uniform, random coordinates for the disk! While to our eyes the result looks like a uniform covering of the disk, the distribution of r underneath isn't uniform at all.
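Here is what that case looks like as a Python sketch (a disk of radius 1 is assumed; inverting c(r) = r² is exactly the square-root step):

```python
import numpy as np

rng = np.random.default_rng()
n = 10_000

# p(r) is proportional to r, so c(r) = r^2, and inverting c means taking a square root.
r = np.sqrt(rng.random(n))          # radii weighted toward the edge of the disk
theta = 2 * np.pi * rng.random(n)   # angles really are uniform, from 0 to 2*pi

# Convert to Cartesian coordinates; plotted, these points cover the disk uniformly.
x = r * np.cos(theta)
y = r * np.sin(theta)
```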
This technique is also useful if you want to generate random values according to a Gaussian distribution, also known as a normal distribution or a bell curve. These distributions are ubiquitous in statistics, and if you are familiar with image processing, they are the functions behind "Gaussian blur". But of course, the technique can be used to generate any probability distribution you like, not just these examples.
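The Gaussian's cumulative distribution function has no elementary closed form, so in practice you either invert it numerically (as with the interpolation above) or lean on a library routine. Here is a sketch assuming SciPy is available; its norm.ppf is the inverse of the normal CDF:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng()
u = rng.random(10_000)  # uniform random numbers between 0 and 1

# Translate each uniform number through the inverse of the Gaussian CDF.
samples = norm.ppf(u, loc=0.0, scale=1.0)  # standard normal: mean 0, std. deviation 1
```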
***
[1] I picked this distribution because it's very easy to integrate and visually interesting. It's not actually related to my research at all, and I don't think there's anything especially interesting about it.
[2] I really did generate these numbers with my computer--I didn't cherry-pick them to look good!