A primer in everything you (n)ever wanted to know about (convergence in) probability
Don’t be scared of the maths in this post. I want to share one of the reasons I enjoyed studying probability at university. It’s an infuriating, beautiful, brilliant subject with myriad applications to search marketing. It’s also (mainly) grounded in common sense if you are prepared to get past the maths. Don’t be scared of the terminology - I’m going to explain things from scratch. And then I’m going to try to apply it to some practical advice.
If you have A-level maths (or equivalent), you should be able to follow. Here we go...
The bit of probability I want to show you is to do with a subject called convergence, but first, some basics.
What is probability?
The kind of probability you learn about at school is most easily thought about as:
The probability of an event occurring = (the number of ways that event can occur) / (the total number of equally likely outcomes)
So the probability of getting a head when you toss a (fair) coin is 1 / 2 (i.e. the number of heads = 1 and the total number of outcomes = 2, H or T).
This serves us pretty well for a lot of things. But pretty soon, you find outcomes that aren’t equally likely - at which point you start thinking of probability as the proportion of times you expect to get a particular result in a set of trials (you can use this definition of probability to cope with click through rates for example - the probability of getting a click being 0.2 means over time, 1 in 5 people click on your link).
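To make the ‘proportion of trials’ idea concrete, here’s a minimal Python sketch that simulates a 0.2 click-through rate (the function name and parameters are just illustrative, not any real analytics API):

```python
import random

def estimated_probability(trials, click_chance=0.2, seed=42):
    """Estimate a probability as the proportion of clicks over `trials` trials."""
    rng = random.Random(seed)
    clicks = sum(1 for _ in range(trials) if rng.random() < click_chance)
    return clicks / trials

# The more trials we run, the closer the proportion tends to sit to 0.2.
print(estimated_probability(100))
print(estimated_probability(100_000))
```

Run it with more and more trials and the estimate tightens around 0.2 - which is exactly the ‘proportion over many trials’ definition above.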
[Side note: when you study more probability, you realise this is actually a circular definition as it relies on expectation which is a consequence of the definition of probability, but tough, we’re doing our best here.]
But what about when there are an infinite number of equally likely outcomes? What about if we are trying to pick a random integer? Then you start introducing (at school) probability distribution functions etc. (which are relatively boring, in my opinion). As you learn more about the subject, you start talking about probability spaces, metrics and integration. Ugggh. We’re not going to go there today (this is a blog post, not a textbook - I can talk about the interesting stuff...).
So let’s just carry on thinking about probability as the ‘chance of something happening’. For most practical purposes, you can think of it in terms of experiments: if I repeated this test a hundred (a thousand, a million) times, how many times would I expect to get a particular result?
One pretty cool (geek!) thing about probability when you start talking in terms of probability spaces, is that it becomes clear that you can have events that have zero probability of happening that could actually happen. The easiest way to think of this is to imagine you were to pick a random number (of any size) with equal likelihood of picking any number. Because there are infinitely many possible equally likely outcomes, it is not too hard to come to the conclusion that the probability of picking any particular number is 0. Yet you did pick one...
However, events with probability zero don’t factor into any calculations really - for this reason we talk about almost sure or almost certain for events with probability 1. Having probability 1 doesn’t actually mean something is certain - it simply means the probability of it not happening is 0 (though this could still happen). Confused yet? Loving it yet?
What is convergence?
Suppose you have a sequence of numbers x1, x2, ... , xn, ...
To say that the sequence ‘converges’ is to say that there is a number x such that no matter how small a number E you choose, there is a point in the sequence past which all the numbers in the sequence are closer to x than E.
An example would be the sequence 0.9, 0.99, 0.999, 0.9999, ... which converges to 1 - i.e. it gets arbitrarily close to 1 as you go along.
Note, however, that it never actually reaches 1. Nevertheless, we say that the ‘limit of the sequence as n tends to infinity’ is 1.
Not all infinite sequences converge. The simplest example of a non-converging sequence is one that alternates between two numbers: 0, 1, 0, 1, 0, ...
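You can check this definition mechanically for a finite chunk of a sequence. The sketch below is my own illustration (not standard library code): it finds the point past which every term stays within E of a proposed limit, or reports that the sequence never settles that close.

```python
def converges_within(sequence, limit, epsilon):
    """Return the first index past which every term stays within
    epsilon of limit, or None if the sequence never settles that close."""
    for i in range(len(sequence)):
        if all(abs(x - limit) < epsilon for x in sequence[i:]):
            return i
    return None

xs = [1 - 10 ** -(n + 1) for n in range(10)]   # 0.9, 0.99, 0.999, ...
print(converges_within(xs, 1, 0.001))          # settles within 0.001 of 1

alternating = [n % 2 for n in range(10)]       # 0, 1, 0, 1, ...
print(converges_within(alternating, 0.5, 0.1)) # None - it never settles
```

Of course, a computer can only check finitely many terms - the real definition is about the whole infinite tail - but the shape of the check is the same.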
While explaining that, I have subtly slipped in the concept of infinity. This is easily a subject large enough not just to be the subject of another post, but to be a graduate-level course all on its own. I am going to use infinity in a slightly sloppy way (in pure maths terms - so not many people will notice). I thought it was enough to try to explain convergence and probabilistic convergence in one day. Never mind the many levels of infinity!
The reason you (might) care about convergence is that it:
- underpins some of the elements of the Google pagerank algorithm
- is very important if you are working with any kind of iterative algorithm (especially those involving randomness)
- involves infinity, which is always cool
- gives you something to talk about at dinner parties
OK. Maybe just the first two reasons. But still.
What is random convergence?
Suppose you have a random sequence X1, X2, ... , Xn, ...
What I mean by this is that each of the Xi is a number picked according to some probability distribution (this is called a ‘Random Variable’). For example, I might say Xi = 0 if a coin lands heads up and Xi = 1 if a coin lands heads down for a sequence of coin tosses.
Then we define the probability space W as a set of points corresponding to “real-world outcomes”. For example, if x is a point in W, it might correspond to the outcome where we toss a coin and it keeps landing heads-up (H, H, H, H, ...) - this would give us a sequence X1(x), X2(x), X3(x), ... = 0, 0, 0, ... (according to our rule above). Another point y might correspond to one tail, then heads (T, H, H, H, ...) etc.
Now we can look at the probability of a particular outcome - let’s denote this P(x) - this is the probability of the outcome x occurring - in the example above, if our sequence is 10 coin tosses long, P(x) = P(10 heads in a row) = 0.5^10.
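Here’s a quick sketch to check that arithmetic - enumerating all the equally likely outcomes of 10 tosses and applying the heads-to-0, tails-to-1 rule from above (the function names are just for illustration):

```python
from itertools import product

def sequence_for(outcome):
    """Map each toss to 0 for heads, 1 for tails, as in the rule above."""
    return [0 if toss == "H" else 1 for toss in outcome]

# Every outcome of 10 fair tosses is equally likely.
all_outcomes = list(product("HT", repeat=10))
print(len(all_outcomes))           # 1024 equally likely outcomes
print(1 / len(all_outcomes))       # 0.0009765625, i.e. 0.5 ** 10
print(sequence_for("HHHHHHHHHH"))  # the all-heads outcome: [0, 0, ..., 0]
```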
Okey dokey. Now consider the behaviour as n grows larger and larger (mathematicians say “as n tends towards infinity”).
Kinda hard to imagine isn’t it?
For any given outcome x, we are left with a sequence X1(x), X2(x), X3(x), ... (which may or may not converge as n tends to infinity).
Because there are a whole range of these sequences, each with its own probability of occurring, we can define a number of kinds of convergence when we are talking about convergence of a random sequence. There are actually a few kinds, but I just wanted to quickly introduce two:
- convergence in probability
- convergence with probability 1
Convergence in probability
We say that a random sequence converges in probability if there exists a random variable X such that, for all e (no matter how small), the probability that the difference between Xn and X is greater than e tends to 0 as n tends to infinity.
This is actually quite a weak form of convergence - there are plenty of sequences that don’t look like they should converge but nevertheless do converge in probability.
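To see convergence in probability in action, here’s a simulation sketch of the classic example: the mean of n fair coin tosses converges in probability to 0.5 (this is the weak law of large numbers). The function name, trial counts and seed are all just illustrative choices.

```python
import random

def proportion_far_from_limit(n, epsilon, trials=2000, seed=1):
    """Estimate P(|mean of n coin tosses - 0.5| > epsilon) by simulation."""
    rng = random.Random(seed)
    far = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(n)) / n
        if abs(mean - 0.5) > epsilon:
            far += 1
    return far / trials

# As n grows, the chance of being further than epsilon from 0.5 shrinks:
for n in (10, 100, 1000):
    print(n, proportion_far_from_limit(n, 0.05))
```

That shrinking probability, for every choice of epsilon, is exactly what the definition above asks for.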
Convergence with probability 1
As mentioned above, something can have probability 1 without actually being certain - which is why we call it ‘almost sure’ or ‘almost certain’. Convergence with probability 1 (also known as almost sure convergence) is a common form of convergence, and one of the strongest.
What it means is that almost all of the sequences that can possibly be generated converge in the regular sense (i.e. that the probability of getting a sequence that doesn’t converge is 0).
As I mentioned, the concept of convergence is important when you are dealing with iterative algorithms (and probabilistic convergence comes into play when you are dealing with iterative algorithms that have an element of randomness). One example of this kind of algorithm is a class known as Monte Carlo algorithms. These are used to find the answer to questions of probability by running a large number of theoretical trials and seeing what the result is.
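As a sketch of the Monte Carlo idea, here’s the classic toy example - estimating pi by throwing random points at a unit square and counting how many land inside the quarter circle (this is a standard textbook illustration, not any particular production algorithm):

```python
import random

def estimate_pi(samples, seed=0):
    """Monte Carlo estimate of pi: a random point in the unit square
    lands inside the quarter circle with probability pi / 4."""
    rng = random.Random(seed)
    inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1
                 for _ in range(samples))
    return 4 * inside / samples

print(estimate_pi(100_000))  # close to 3.14159...
```

The estimate converges on pi as the number of samples grows - probabilistically, of course, which is why the earlier definitions matter.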
The famous Google pagerank algorithm is an iterative algorithm with random elements, so the question of whether or not it will converge is potentially a highly complex one. I haven’t put much thought into this (or done much background reading), but I imagine Google’s engineers have! Even so, I imagine it converges ‘almost surely’ at best. This means there is a chance the Internet wouldn’t converge... Wouldn’t that be bad? (Note - I know they use a limited number of iterations, so they’re not going to end up in an infinite loop, but still.)
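For a feel of how such an iterative algorithm settles down, here’s a toy power-iteration sketch in the pagerank style. The three-page link graph and the 0.85 damping factor are purely illustrative - this is not Google’s actual graph, parameters or implementation.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy power iteration: links[i] lists the pages that page i links to."""
    n = len(links)
    ranks = [1 / n] * n
    for _ in range(iterations):
        new = []
        for page in range(n):
            incoming = sum(ranks[src] / len(targets)
                           for src, targets in enumerate(links)
                           if page in targets)
            new.append((1 - damping) / n + damping * incoming)
        ranks = new
    return ranks

# Page 0 links to 1 and 2; page 1 links to 2; page 2 links to 0.
ranks = pagerank([[1, 2], [2], [0]])
print(ranks)                # the iteration settles towards a fixed point
print(abs(sum(ranks) - 1))  # the ranks stay a probability distribution
```

Each pass of the loop produces a new set of ranks, and the question of whether those sets of numbers settle down is precisely the convergence question discussed above.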
Having said all that, this post isn’t really about practical uses, but is more an adventure into some slightly advanced maths. I hope you enjoyed it.
Disclaimer, inspiration and further reading
Disclaimer: it’s a long time since I studied this stuff, so I might have some of it wrong. It’s still fun though.
If you enjoyed that bit of maths, you might like the following (though some of these need waaay more maths):