## A primer in everything you (n)ever wanted to know about (convergence in) probability

Don’t be scared of the maths in this post. I want to share one of the reasons I enjoyed studied probability at university. It’s an infuriating, beautiful, brilliant subject with myriad applications to search marketing. It’s also (mainly) based in common sense if you are prepared to try to get past the maths. Don’t be scared of the terminology - I’m going to explain things from scratch. And then I’m going to try to apply it to some practical advice.

If you have A-level maths (or equivalent), you should be able to follow. Here we go...

## Probability

The bit of probability I want to show you is to do with a subject called convergence, but first, some basics.

## What is probability?

The kind of probability you learn about at school is most easily thought about as:

The probability of an event occurring = (the number of ways that event can occur) / (the total number of equally likely outcomes)

So the probability of getting a head when you toss a (fair) coin is 1 / 2 (i.e. the number of heads = 1 and the total number of outcomes = 2, H or T).

This serves us pretty well for a lot of things. But pretty soon, you find outcomes that aren’t equally likely - at which point you start thinking of probability as the proportion of times you expect to get a particular result in a set of trials (you can use this definition of probability to cope with click through rates for example - the probability of getting a click being 0.2 means over time, 1 in 5 people click on your link).

[Side note: when you study more probability, you realise this is actually a circular definition as it relies on expectation which is a consequence of the definition of probability, but tough, we’re doing our best here.]

But what about when there are an infinite number of equally likely outcomes? What about if we are trying to pick a random integer? Then you start introducing (at school) probability distribution functions etc. (which are relatively boring, in my opinion). As you learn more about the subject, you start talking about probability spaces, metrics and integration. Ugggh. We’re not going to go there today (this is a blog post, not a textbook - I can talk about the interesting stuff...).

So let’s just carry on thinking about probability as the ‘chance of something happening’. For most practical purposes, you **can** think of it in terms of experiments. If I repeated this test a hundred (a thousand, a million) times, how many times would I expect to get a particular result.

One pretty cool (geek!) thing about probability when you start talking in terms of probability spaces, is that it becomes clear that you can have events that have zero probability of happening that **could actually happen**. The easiest way to think of this is to imagine you were to pick a random number (of any size) with equal likelihood of picking any number. Because there are infinitely many possible equally likely outcomes, it is not too hard to come to the conclusion that the probability of picking any **particular** number is 0. Yet you did pick one...

However, events with probability zero don’t factor into any calculations really - for this reason we talk about *almost sure* or *almost certain* for events with probability 1. Having probability 1 doesn’t actually mean something is certain - it simply means the probability of it **not happening** is 0 (though this could still happen). Confused yet? Loving it yet?

## What is convergence?

Suppose you have a sequence of numbers x1, x2, ... , xn, ...

To say that the sequence ‘converges’ is to say that there is a number x such that no matter how small a number E you choose, there is a point in the sequence past which all the numbers in the sequence are closer to x than E.

An example would be the sequence 0.9, 0.99, 0.999, 0.999, ... which converges to 1 - i.e. it gets arbitrarily close to 1 as you go along.

Note, however, that it never actually reaches 1. Nevertheless, we say that the ‘limit of the sequence as n tends to infinity’ is 1.

Not all infinite sequences converge. The simplest example of a non-converging sequence is one that alternates between two numbers: 0, 1, 0, 1, 0, ...

While explaining that, I have subtly slipped in the concept of infinity. This is easily a subject large enough not just to be the subject of another post, but to be a graduate-level course all on its own. I am going to use infinity in a slightly sloppy way (in pure maths terms - so not many people will notice). I thought it was enough to try to explain convergence and probabilistic convergence in one day. Never mind the many levels of infinity!

The reason you (might) care about convergence is that it:

- underpins some of the elements of the Google pagerank algorithm
- is very important if you are working with any kind of iterative algorithm (especially those involving randomness)
- involves infinity, which is always cool
- gives you something to talk about at dinner parties

OK. Maybe just the first two reasons. But still.

## What is random convergence?

Suppose you have a random sequence X1, X2, ... , Xn, ...

What I mean by this is that each of the Xi is a number picked according to some probability distribution (this is called a ‘Random Variable’). For example, I might say Xi = 0 if a coin lands heads up and Xi = 1 if a coin lands heads down for a sequence of coin tosses.

Then we define the probability space W as a set of points corresponding to “real-world outcomes”. For example, if x is a point in W, it might correspond to the outcome where we toss a coin and it keeps landing heads-up (H, H, H, H, ...) - this would give us a sequence X1(x), X2(x), X3(x), ... = 0, 0, 0, ... (according to our rule above). Another point y might correspond to one tail, then heads (T, H, H, H, ...) etc.

Now we can look at the probability of a particular outcome - let’s denote this P(x) - this is the probability of the outcome x occurring - in the example above, if our sequence is 10 coin tosses long, P(x) = P(10 heads in a row) = 0.5^10.

Okey dokey. Now consider the behaviour as n grows larger and larger (mathematicians say “as n tends towards infinity”).

Kinda hard to imagine isn’t it?

For any given outcome, we are left with a sequence x1, x2, x3, ... (which may or may not converge as n tends to infinity).

Because there are a whole range of these sequences, each with its own probability of occurring, we can define a number of kinds of convergence when we are talking about convergence of a random sequence. There are actually a few kinds, but I just wanted to quickly introduce two:

- convergence in probability
- convergence with probability 1

## Convergence in probability

We say that a random sequence converges in probability if there exists a random variable X such that, for all e (no matter how small), the probability that the difference between Xn and X is greater than e tends to 0 as n tends to infinity.

This is actually quite a weak form of convergence and there are a lot of sequences that don’t look like they shouldn’t converge that do converge in probability.

## Convergence with probability 1

As mentioned above, something can have probability 1 without actually being certain - which is why we call it ‘almost sure’ or ‘almost certain’. Convergence with probability 1 is a common form of convergence that is one of the strongest.

What it means is that almost all of the sequences that can possibly be generated converge in the regular sense (i.e. that the probability of getting a sequence that doesn’t converge is 0).

## Practical uses

As I mentioned, the concept of convergence is important when you are dealing with iterative algorithms (and probabilistic convergence comes into play when you are dealing with iterative algorithms that have an element of randomness). One example of this kind of algorithm is a class known as Monte Carlo algorithms. These are used to find the answer to questions of probability by running a large number of theoretical trials and seeing what the result is.

The famous Google pagerank algorithm is an iterative algorithm with random elements and so the question of whether or not it will converge is potentially a highly complex one. I haven’t put too much thought into this (or done much background reading), but I guess I imagine they have! Even so, I imagine it converges ‘almost surely’ at best. This means that there is a chance the Internet wouldn’t converge... Wouldn’t that be bad? (Note - I know they use a limited number of iterations, so they’re not going to end up in an infinite loop, but still).

Having said all that, this post isn’t really about practical uses, but is more an adventure into some slightly advanced maths. I hope you enjoyed it.

## Disclaimer, inspiration and further reading

This post was inspired by conversations with by Hamlet Batista on SEOmoz.

Disclaimer: it’s a long time since I studied this stuff so I might have any bit of it wrong. It’s still fun though.

If you enjoyed that bit of maths, you might like the following (though some of these need waaay more maths):

Will founded Distilled with Duncan in 2005. Since then, he has consulted with some of the world’s largest organisations and most famous websites, spoken at most major industry events and regularly appeared in local and national press. Will is part... read more

Damn that brings back memories... Good post Will, I missed that post on SEOMoz, must check it out later.

Ah, Monte Carlo simulations... good times, good times.

Not long ago, I used some probability theory to come up with an almost foolproof way to win on online roulette. Unfortunately: (1) online roulette is illegal in my home state, and (2) the rate of winning is so slow (risk/reward ratio and all that) that I'd be better off taking a part-time job.

Pete - does that system involve martingales perchance?! The problem with most 'almost foolproof' systems is that the one time you don't win you lose so much more than you win the other times!

Online roulette is easy to win at though if you use casino bonuses.

Tom - It does, indeed; honestly, I didn't know it had a name. Of course, the original doubling strategy was pretty dangerous (as you said, when you lose, you lose big) and also easily thwarted by table limits. I started calculating other multipliers for win conditions with lower odds (3:1, 4:1, etc.). Unfortunately, you basically need a spreadsheet in front of you and have to bet such odd amounts that you could never do it in a real casino. Plus, when you win, you win so incrementally, that it's almost not worth the bother.

Sorry, I'm not sure what this had to do with the original post, come to think of it :) Fun with probability, I guess...

Hey Will!

Thanks for this excellent writeup. When you were talking about 'limit' you brought back memories of my first Calculus class. That is a very interesting concept.

From my research, I think the main reason they made those adjustments I mentioned on the SEOmoz post is to make sure the Power method converges on relatively few iterations. I learned they originally converged at 50. I assume they need more iterations now that the web is a lot bigger and so has to be the Google matrix.

I am taking a short vacation, but I'm following most things through my blackberry. Happy Holidays!

@Pete: I'm sure the casinos love "almost foolproof" ways of winning :) Martingale theory is fun - now you know what it's called, there is loads of stuff to read about it. It applies to a lot of gambling theory - the fact that you can potentially go bust breaks a lot of otherwise sound schemes.

@Hamlet: thanks for reading on your blackberry! Your contribution is great - I think I need to look into the convergence of PR a little more.

Enjoy the season everyone :)

Somehow, I figured the average pit boss would probably put 2+2 together when my bets looked like:

$1, $1, $1, $2, $2, $3, $4, $7...

Plus, I'd always be asking for change :)

That reminds me when I was still studying Probability. I got headache with that. anyways, thanks for this post .