I was fortunate enough to be able to interview Google’s John Mueller at SearchLove and quiz him about domain authority metrics, sub-domains vs. sub-folders, and how bad rank tracking really is.
I have previously written and spoken about how to interpret Google’s official statements, technical documentation, engineers’ writing, patent applications, acquisitions, and more (see: From the Horse’s Mouth and the associated video as well as posts like “what do dolphins eat?”). When I got the chance to interview John Mueller from Google at our SearchLove London 2018 conference, I knew that there would be many things that he couldn’t divulge, but there were a couple of key areas in which I thought we had seen unnecessary confusion, and where I thought that I might be able to get John to shed some light. [DistilledU subscribers can check out the videos of the rest of the talks here - we’re still working on permission to share the interview with John].
Mueller is Webmaster Trends Analyst at Google, and these days he is one of the most visible spokespeople for Google. He is a primary source of technical search information in particular, and is one of the few figures at Google who will answer questions about (some!) ranking factors, algorithm updates and crawling / indexing processes.
- New information and official confirmations
- Learning more about the structure of webmaster relations
- More interesting quotes from the discussion
- Algorithm changes don’t map easily to actions you can take
- Do Googlers understand the Machine Learning ranking factors?
- Why are result counts soooo wrong?
- More detail on the domain authority question
- Maybe put your nsfw and sfw content on different sub-domains
- John can “kinda see where [rank tracking] makes sense”
- We’ve come a long way
- Personal lessons from conducting an interview on stage
- Things John didn’t say
In the post below, I have illustrated a number of the exchanges John and I had that I think revealed either new and interesting information, or confirmed things we had previously suspected, but had never seen confirmed before on the record.
I thought I’d start, though, by outlining what I think were the most substantial details:
We had previously seen numerous occasions where Google spokespeople had talked about how metrics like Moz's Domain Authority (DA) were proprietary external metrics that Google did not use as ranking factors (this, in response to many blog posts and other articles that conflated Moz's DA metric with the general concept of measuring some kind of authority for a domain). I felt that there was an opportunity to gain some clarity.
"We've seen a few times when people have asked Google: 'Do you use domain authority?' And this is an easy question. You can simply say: 'No, that's a proprietary Moz metric. We don't use Domain Authority.' But, do you have a concept that's LIKE domain authority?"
We had a bit of a back-and-forth, and ended up with Mueller confirming the following (see the relevant parts of the transcript below):
- Google does have domain level metrics that “map into similar things”
- New content added to an existing domain will initially inherit certain metrics from that domain
- It is not a purely link-based metric but rather attempts to capture a general sense of trust
I expect that practically everyone around the industry has seen at least some of the long-running back-and-forth between webmasters and Googlers on the question of sub-domains vs sub-folders (see for example this YouTube video from Google and this discussion of it). I really wanted to get to the bottom of this, because to me it represented a relatively clear-cut example of Google saying something that was different to what real-world experiments were showing.
I decided to set it up from this angle: acknowledging that we can totally believe that there isn’t an algorithmic “switch” at Google that classifies things as sub-domains and deliberately ranks them lower, while pointing out that we do regularly see real-world case studies showing uplifts from moving, and then asking John to think about why we might see that happen. He said [emphasis mine]:
To paraphrase that, the official line from Google is:
- Google has a concept of a “site” (see the discussion above about domain-level metrics)
- Sub-domains (or even sub-folders) can be viewed as not a part of the rest of the site under certain circumstances
- If we are looking at a sub-domain that Google views as not a part of the rest of the site, then webmasters may see an uplift in performance by moving the content to a sub-folder (that is viewed as part of the site)
Unfortunately, I couldn’t draw John out on the question of how one might tell in advance whether your sub-domain is being treated as part of the main site. As a result, my advice remains the same as it used to be:
In general, create new content areas in sub-folders rather than sub-domains, and consider moving sub-domains into sub-folders with appropriate redirects etc.
The thing that’s changed is that I think that I can now say this is in line with Google’s official view, whereas it used to be at odds with their official line.
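To make the redirect part of that advice concrete, here is a minimal Python sketch of mapping sub-domain URLs onto their sub-folder equivalents. The hostnames (`blog.example.com`, `www.example.com`), the `/blog` prefix and the function name are all hypothetical, and in practice you would issue the 301s from your web server or CDN rather than from a script like this:

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping: sub-domain host -> (main host, sub-folder prefix).
SUBDOMAIN_TO_FOLDER = {
    "blog.example.com": ("www.example.com", "/blog"),
}


def redirect_target(url: str) -> Optional[str]:
    """Return the sub-folder URL that a sub-domain URL should 301 to.

    Returns None when the host is not one we are migrating.
    """
    parts = urlsplit(url)
    mapping = SUBDOMAIN_TO_FOLDER.get(parts.hostname or "")
    if mapping is None:
        return None
    host, prefix = mapping
    # Keep the original path and query string so every old URL maps
    # one-to-one onto a new URL. Fragments never reach the server, so
    # they are dropped here.
    path = prefix + (parts.path or "/")
    return urlunsplit(("https", host, path, parts.query, ""))


if __name__ == "__main__":
    print(redirect_target("http://blog.example.com/post-1?x=1"))
```

The point of keeping the mapping one-to-one (rather than sending everything to the new sub-folder's homepage) is that each old URL passes its signals to its direct equivalent.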
Another area that I was personally curious about going into our conversation was about how John’s role fits into the broader Google teams, how he works with his colleagues, and what is happening behind the scenes when we learn new things directly from them. Although I don’t feel like we got major revelations out of this line of questioning, it was nonetheless interesting:
For example, I was curious to know whether, when we ask a question to which John doesn’t already know the answer, he reviews the source code himself, turns to an engineer, etc. Turns out:
- He does not generally attend search quality meetings (timezones!) and does not review the source code directly
- He does turn to engineers from around the team to find specialists who can answer his questions, but does not have engineers dedicated to webmaster relations
For understandable reasons, there is a general reluctance among engineers to put their heads above the parapet and be publicly visible talking about how things work in their world. We did dive into one particularly confusing area that turned out to be illuminating - I made the point to John that we would love to get more direct access to engineers to answer these kinds of edge cases:
Concrete example: the case of noindex pages becoming nofollow
At the back end of last year, John surprised us with a statement that pages that are noindex will, in the long run, eventually be treated as nofollow as well.
Although it’s a minor detail in many ways, many of us felt that this exposed gaps in our mental model. I certainly felt that the existence of the “noindex, follow” directive meant that there was a way for pages to be excluded from the index, but have their links included in the link graph.
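As a way of pinning down that mental model, here is a toy Python sketch. It is only an illustration of John’s statement as I understood it, not a description of Google’s actual logic; the function names and the `long_term` flag are my own assumptions:

```python
from typing import Set


def parse_robots_directives(content: str) -> Set[str]:
    """Split a robots meta content value such as "noindex, follow" into directives."""
    return {token.strip().lower() for token in content.split(",") if token.strip()}


def links_counted(content: str, long_term: bool) -> bool:
    """Toy model: are links on this page expected to stay in the link graph?

    Assumption (taken from John's statement, not from any Google code):
    a page that stays noindex is eventually treated as nofollow too,
    whatever the "follow" directive says.
    """
    directives = parse_robots_directives(content)
    # "none" is shorthand for "noindex, nofollow".
    if "nofollow" in directives or "none" in directives:
        return False
    if "noindex" in directives and long_term:
        return False
    return True


if __name__ == "__main__":
    print(links_counted("noindex, follow", long_term=False))
    print(links_counted("noindex, follow", long_term=True))
```

The gap in my old mental model is the second call: I had assumed “noindex, follow” would behave the same way in the long run as it does at first.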
What I found more interesting than the revelation itself was what it exposed about the thought process within Google. What it boiled down to was that the folk who knew how this worked - the engineers who’d built it - had a curse of knowledge. They knew that there was no way a page that was dropped permanently from the index could continue to have its links in the link graph, but they’d never thought to tell John (or the outside world) because it had never occurred to them that those on the outside hadn’t realised it worked this way [emphasis mine]:
Our conversation covered quite a wide range of topics, and so I’ve included some of the other interesting snippets here:
Googlers don’t necessarily know what you need to do differently in order to perform better, and especially in the case of algorithm updates, their thinking about “search results are better now than they were before” doesn’t translate easily into “sites that have lost visibility in this update can do XYZ to improve from here”. My reading of this situation is that there is ongoing value in the work SEOs do to interpret algorithm changes and the longer-running directional themes in Google’s changes, in order to guide webmasters’ roadmaps:
I’ve speculated that there is a long-run trend towards less explainability of search rankings, and that this will impact search engineers as well as those of us on the outside. We did at least get clarity from John that at the moment, they primarily use ML to create ranking factors that feed into more traditional ranking algorithms, and that they can debug and separate the parts (rather than an ML-generated SERP, which would be much less inspectable):
And yes, they do have secret internal debugging tools, obviously, which John described as:
We had a bit of back-and-forth on result counts. I get that Google has told us that they aren’t meant to be exact, and are just approximations. But when a site: query claims 13m results, and you click through to page 2 and find that there are actually only 11 (not 11m, just 11), you say to yourself that this isn’t a particularly helpful approximation. We didn’t really get any further on this than the official line we’ve heard before, but if you need that confirmed again:
On the domain authority question that I mentioned above (not the Moz Domain Authority proprietary metric, but the general concept of a domain-level authority metric), here’s the rest of what John said:
I talked above about the new clarity we got on the sub-domain vs. sub-folder question and John explained some of the “is this all one site or not” thinking with reference to safe search. If you run a site with not safe for work / adult content that might be filtered out of safe search and have other content you want to have appear in regular search results, you could consider splitting that apart - presumably onto a different sub-domain - and Google can think about treating them as separate sites:
I wanted to see if I could draw John into acknowledging why marketers and webmasters might want or need rank tracking - my argument being that it’s the only way of getting certain kinds of competitive insight (since you only get Search Console for your own domains) and also that it’s the only way of understanding the impact of algorithm updates on your own site and on your competitive landscape.
I struggled to get past the kind of line that says that Google doesn’t want you to do it, it’s against their terms, and some people do bad things to hide their activity from Google. I have a little section on this below, but John did say:
But the ToS thing causes him problems when it comes to recommending tools:
It was nice to end with a shout out to everyone working hard around the industry, as well as a little plug for our conference [emphasis mine, obviously]:
Everything above is about things we learned or confirmed about search, or about how Google works. I also learned some things about what it’s like to conduct an interview, and in particular what it’s like to do so on stage in front of lots of people.
I mean, firstly, I learned that I enjoy it, so I do hope to do more of this kind of thing in the future. In particular, I found it a lot more fun than chairing a panel. In my personal experience, chairing a panel (which I’ve done more of in the past) requires a ton of mental energy on making sure that people are speaking for the right amount of time, that you’re moving them onto the next topic at the right moment, that everyone is getting to say their piece, that you’re getting actually interesting content etc. In a 1:1 interview, it’s simple: you want the subject talking as much as possible, and you can focus on one person’s words and whether they are interesting enough to your audience.
In my preparation, I thought hard about how to make sure my questions were short but open, and that they were self-contained enough to be comprehensible to John and the audience, and allow John to answer them well. I think I did a reasonable job but can definitely continue practicing to get my questions shorter. Looking at the transcript, I did too much of the talking. Having said that, my preparation was valuable. It was worth it to have understood John’s background and history first, to have gathered my thoughts, and to have given him enough information about my main lines of questioning to enable him to have gone looking for information he might not have had at his fingertips. I think I got that balance roughly right; enabling him to prep a reasonable amount while keeping a couple of specific questions for on the day.
I also need to get more agile and ask more follow-ups and continuation questions. This is hard because you have to think on your feet. I think I did it reasonably well in areas where I’d deliberately prepped to do it, which were mainly the more controversial areas where I knew what John’s initial line might be but also knew what I ultimately wanted to get out of it or dive deeper into. I found it harder when I hadn’t expected to need a follow-up and only realised in the moment that I hadn’t quite got 100% of what I was looking for. It’s surprisingly hard to parse everything that’s just been said and figure out on the fly whether it’s interesting, new, and complete.
And that’s all from the comfort of the interrogator’s chair. It’s harder to be the questioned than the questioner, so thank you to John for agreeing to come, for his work in the prep, and for being a good sport as I poked and prodded at what he’s allowed to talk about.
I also got to see one of his 3D-printed Googlebot-in-a-skirt characters - a nice counterbalance to the gender assumptions that are too common in technical areas:
There are a handful of areas where I wish I’d thought quicker on my feet or where I couldn’t get deeper than the PR line:
"Kind of like Search Console"
I don’t know if I’d have been able to get more out of him even if I’d pushed, but looking back at the conversation, I think I gave up too quickly, and gave John too much of an “out” when I was asking about their internal toolset. He said it was “kind of like Search Console” and I put words in his mouth by saying “but better”. I should have dug deeper and asked for some specific information they can see about our sites that we can’t see in Search Console.
John can “kinda see where [rank tracking] makes sense”
I promised above to get a bit deeper into our rank tracking discussion. I made the point that “there are situations where this is valuable to us, we feel. So, yes we get Search Console data for our own websites, but we don't get it for competitors, and it's different. It doesn't give us the full breadth of what's going on in a SERP, that you might get from some other tools.”
We get questions from clients like, "We feel like we've been impacted by update X, and if we weren't rank tracking, it's very hard for us to go back and debug that." And so I asked John “What would your recommendation be to consultants or webmasters in those situations?”
(Yes, I did point out at this point that we’d happily consume an API).
Ultimately, John acknowledged that “maybe there are ways that [Google] can give you more information on what we think is happening” but I felt like I could have done a better job of pushing on the need for this kind of data on competitive activity, and on the market as a whole (especially when there is a Google update). It’s perhaps unsurprising that I couldn’t dig deeper than the official line here, nor could I have expected to get a new product update about a whole new kind of competitive insight data, but I remain a bit unsatisfied with Google’s perspective. I feel like tools that aggregate the shifts in the SERPs when Google changes their algorithm, and tools that let us understand the SERPs where our sites are appearing, are both valuable, and that Google is fixated on the ToS without acknowledging the ways this data is needed.
Are there really strong advocates for publishers inside Google?
John acknowledged being the voice of the webmaster in many conversations about search quality inside Google, but he also claimed that the engineering teams understand and care about publishers too:
I would have liked to have pushed a little harder on the changing “deal” for webmasters as I do think that some of the innovations that result in fewer clicks through to websites are fundamentally changing that. In the early days, there was an implicit deal that Google could copy and cache webmasters’ copyrighted content in return for driving traffic to them, and that this was a socially good deal. It even got tested in court [Wikipedia is the best link I’ve found for that].
When the copying extends so far as to remove the need for the searcher to click through, that deal is changed. John managed to answer this cleverly by talking about buying direct from the SERPs:
I should have pushed harder on the publisher examples - I’m reminded of this fantastic tweet from 2014. At least I know I still have plenty more to do.
So. Thank you John for coming to SearchLove, and for being as open with us as you were, and thank you to everyone behind the scenes who made all this possible.
Finally: to you, the reader - what do you still want to hear from Google? What should I dig deeper into and try to get answers for you about next time? Drop a comment below or drop me a line on Twitter.