Let’s be absolutely clear. They’ll be presenting a carefully tailored PR play, and it won’t actually be true in the way it’ll be interpreted. But they’ll say it, or something like it, and people will lose their collective minds.
The statement in more detail will be something like:
Following the success of our machine-learning-based RankBrain experiments, we are rolling out more artificial intelligence into web search.
We have talked before about the large number of factors that have historically gone into determining which pages rank for specific queries. Anyone who has followed the history of Google knows that this has all been underpinned by the early breakthroughs in PageRank, which enabled us to use the hyperlink-based structure of the web to figure out which websites and pages were the right answers. This worked even for queries we had never seen before, and worked even better as the web scaled.
Back in October of last year, we announced that we were using our RankBrain artificial intelligence system to handle a very large fraction of the queries we see.
Today, we are excited to announce that RankBrain is the primary ranking factor, and that it is more important than any of the individual link-based elements of our older algorithms.
Blah blah excited to organize information. Mission. Users. Blah...
OK. It might not end exactly like that.
But to understand what they really mean when they say this, we have to be good at parsing PR-friendly statements, something those of us who’ve been in the industry a while are very used to doing.
There was a very illuminating comment on a Hacker News thread a little while ago about how much trial and error and hand tuning go into a human-written information-retrieval algorithm. It’s not all pure theoretical hypotheses leading to clean implementations; there is a lot of experimenting to see whether you get better results in the real world when you tweak parameters:
oh... am I allowed to write code that doesn’t make any sense? I thought I wasn’t supposed to do that. And he was just like, well, just don’t worry about it, you are overthinking it, you can take all the square roots you want, multiply by 2 if it helps, add 5, whatever, just make things work and we can make it make sense later.
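To make that concrete, here is a toy sketch of what such a hand-tuned relevance formula can end up lookingking like. Every name, transform, and constant below is my own invention for illustration, not anything from a real search engine:

```python
import math

def hand_tuned_score(pagerank, term_match, freshness):
    """Toy relevance score in the spirit of the quote above. The
    transforms and constants are illustrative inventions, kept only
    because they would have "made things work" in offline testing."""
    score = math.sqrt(pagerank) * 2   # square root dampens runaway link scores
    score += term_match * 5           # an "add 5"-style magic constant
    score += math.log1p(freshness)    # another hand-picked transform
    return score

# Tweak a constant, re-run the evaluation set, keep the change if results improve.
print(hand_tuned_score(pagerank=0.8, term_match=0.6, freshness=3.0))
```

Each piece survives only because it moved an evaluation metric at some point; nobody pretends the formula is principled, which is exactly the engineer’s point above.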
The imagined press release above actually turns on the balance between this kind of human trial and error and machine-led learning, which is a similarly “blind” optimization against goals rather than an opinionated, design-led process. The announcement I imagine from Google is that the relative importance has flipped: the machine is now tweaking the dials.
What I do not anticipate seeing at the same time is a huge shift in where those dials end up. Links and link-related metrics like PageRank and its successors will be data points considered by RankBrain, and, unless the humans have been massively missing something over the last 5-10 years, RankBrain will continue to weight those link-based metrics very highly.
I believe that Google engineers have run multiple experiments in recent years attempting to replace link data with something else (usage, social data, etc.), but I also believe they haven’t been successful.
In the case of social data, the public signal is simply too sparse, and the majority of it is proprietary to Facebook. Usage data can tweak existing preferences but can’t evaluate a new page; it is also extremely noisy, sends conflicting signals (is a short visit a success or a failure?), is often opaque, and is wide open to manipulation.
What this means in practice is that even after whatever change dials up the dependence on RankBrain and dials down the human-tweaked algorithm, I believe link metrics will continue to be better correlated with rankings than any other metric we have access to.
In other words, RankBrain will be more important than all the individual signals in the human-tweaked algorithm (including links) but links will remain the dominant signal that RankBrain itself uses.
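To see why the flip need not move the dials much, here is a minimal sketch of blind optimization, built entirely on synthetic data of my own devising, under the assumption (the article’s thesis) that link metrics really do predict relevance best. A simple learner tuning weights by gradient descent rediscovers links as its dominant feature without anyone hand-picking them:

```python
import random

# Entirely synthetic toy, not Google's system: a "blind" learner tunes
# feature weights against a relevance goal. If link metrics genuinely
# predict relevance, the learner rediscovers them as the dominant signal.
random.seed(0)

def make_example():
    links = random.random()    # link-based metric (PageRank-like)
    usage = random.random()    # noisy usage signal
    social = random.random()   # sparse social signal
    # Assumed ground truth: relevance is driven mostly by links.
    relevance = 0.8 * links + 0.15 * usage + 0.05 * social
    return (links, usage, social), relevance

data = [make_example() for _ in range(300)]

weights = [0.0, 0.0, 0.0]
lr = 0.1
for _ in range(1000):  # plain per-example gradient descent on squared error
    for features, target in data:
        pred = sum(w * f for w, f in zip(weights, features))
        err = pred - target
        weights = [w - lr * err * f for w, f in zip(weights, features)]

print([round(w, 3) for w in weights])  # the link weight comes out largest
```

The machine “tweaks the dials,” but because the underlying signal quality hasn’t changed, the dials settle in roughly the same place a careful human would have left them.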
[Image: the potential change in ranking signals once RankBrain becomes the primary signal.]
I might be wrong about any of this, which is what makes it interesting, but I think there’s precedent: when the original RankBrain announcement came out, Bloomberg attributed this quote to “senior research scientist” Greg Corrado:
“In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query.” [emphasis mine]
Now we just have to wait for them to say it’s the most important signal. Anyway, I’d love to hear your thoughts.
Want to know more about our thoughts on the future of search? Check out Searchscape - our detailed report on the topic (note that you can also drop your email address in at the bottom of that page to get future updates).
Oh, and while we’re at it, here’s another thought-provoking angle on the rise of the machines in relation to Google, but maybe that’s a blog post for another day:
“How much longer is text generated by computers officially against the guidelines? Can’t be much longer: https://t.co/iasQ4jrbUm”
— Tom Critchlow (@tomcritchlow), March 1, 2016