NOTE: This story has been revised from when it was originally published in October 2015 to reflect the latest information.
Yesterday, news emerged that Google was using a machine-learning artificial intelligence system called “RankBrain” to help sort through its search results. Wondering how that works and fits in with Google’s overall ranking system? Here’s what we know about RankBrain.
The information covered below comes from three original sources and has been updated over time, with notes where updates have happened. Here are those sources:
First is the Bloomberg story that broke the news about RankBrain yesterday (see also our write-up of it). Second is additional information that Google has now provided directly to Search Engine Land. Third is our own knowledge and best assumptions in places where Google isn’t providing answers. We’ll make clear where these sources are used, when deemed necessary, apart from general background information.
RankBrain is Google’s name for a machine-learning artificial intelligence system that’s used to help process its search results, as was reported by Bloomberg and also confirmed to us by Google.
Machine learning is where a computer teaches itself how to do something, rather than being taught by humans or following detailed programming.
True artificial intelligence, or AI for short, is where a computer can be as smart as a human being, at least in the sense of acquiring knowledge both from being taught and from building on what it knows and making new connections.
True AI exists only in science fiction novels, of course. In practice, AI is used to refer to computer systems that are designed to learn and make connections.
How’s AI different from machine learning? In terms of RankBrain, it seems to us they’re fairly synonymous. You may hear them both used interchangeably, or you may hear machine learning used to describe the type of artificial intelligence approach being employed.
No. RankBrain is part of Google’s overall search “algorithm,” a computer program that’s used to sort through the billions of pages it knows about and find the ones deemed most relevant for particular queries.
It’s called Hummingbird, as we reported in the past. For years, the overall algorithm didn’t have a formal name. But in the middle of 2013, Google overhauled that algorithm and gave it a name, Hummingbird.
That’s our understanding. Hummingbird is the overall search algorithm, just like a car has an overall engine in it. The engine itself may be made up of various parts, such as an oil filter, a fuel pump, a radiator and so on. In the same way, Hummingbird encompasses various parts, with RankBrain being one of the newest.
In particular, we know RankBrain is part of the overall Hummingbird algorithm because the Bloomberg article makes clear that RankBrain doesn’t handle all searches, as only the overall algorithm would.
Hummingbird also contains other parts with names familiar to those in the SEO space, such as Panda, Penguin and Payday, designed to fight spam; Pigeon, designed to improve local results; Top Heavy, designed to demote ad-heavy pages; Mobile Friendly, designed to reward mobile-friendly pages; and Pirate, designed to fight copyright infringement.
PageRank is part of the overall Hummingbird algorithm that covers a specific way of giving pages credit based on the links from other pages pointing at them.
PageRank is special because it’s the first name that Google ever gave to one of the parts of its ranking algorithm, way back at the time the search engine began, in 1998.
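For readers who want to see the link-credit idea in action, here is a minimal sketch of the publicly documented PageRank calculation, run on a made-up four-page web. The `pagerank` function name, the damping value of 0.85 and the toy link data are our own assumptions for illustration. This is not Google's production code, which has never been published.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by repeated redistribution of link credit.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal credit

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its credit evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its credit to everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, value in sorted(pagerank(toy_web).items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```

In this toy web, the page with the most inbound links ends up with the most credit, and the page it links to inherits a large share of that credit.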
Signals are things Google uses to help determine how to rank webpages. For example, it will read the words on a webpage, so words are a signal. If some words are in bold, that might be another signal that’s noted. The calculations used as part of PageRank give a page a PageRank score that’s used as a signal. If a page is noted as being mobile-friendly, that’s another signal that’s registered.
All these signals get processed by various parts within the Hummingbird algorithm to figure out which pages Google shows in response to various searches.
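Google has never published how its signals are combined, so the following is only a toy sketch of the general idea of turning several signals into one ranking. The `Page` type, the three signal names and the weights are all hypothetical, chosen to mirror the examples above (words, a PageRank score, mobile-friendliness).

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    pagerank: float        # assumed to be precomputed, between 0 and 1
    mobile_friendly: bool


# Hypothetical weights; the real relative importance of signals is not public.
HYPOTHETICAL_WEIGHTS = {"word_match": 0.5, "pagerank": 0.4, "mobile_friendly": 0.1}


def signals(page, query):
    """Turn a page and a query into a dictionary of numeric signals."""
    query_words = query.lower().split()
    page_words = set(page.text.lower().split())
    matched = sum(1 for word in query_words if word in page_words)
    return {
        "word_match": matched / len(query_words) if query_words else 0.0,
        "pagerank": page.pagerank,
        "mobile_friendly": 1.0 if page.mobile_friendly else 0.0,
    }


def score(page, query):
    """Combine the signals into a single number using a weighted sum."""
    values = signals(page, query)
    return sum(HYPOTHETICAL_WEIGHTS[name] * value for name, value in values.items())


def rank(pages, query):
    """Return pages ordered from highest to lowest score for the query."""
    return sorted(pages, key=lambda page: score(page, query), reverse=True)
```

A weighted sum is simply the easiest combination to read. Nothing Google has said indicates its systems work this way.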
Google has fairly consistently spoken of having more than 200 major ranking signals that, in turn, might have up to 10,000 variations or sub-signals. It more typically just says “hundreds” of factors, as it did in yesterday’s Bloomberg article.
If you want a more visual guide to ranking signals, see our Periodic Table Of SEO Success Factors:
It’s a pretty good guide, we think, to general things that search engines like Google use to help rank webpages.
That’s right. From out of nowhere, this new system has become what Google says is the third-most important factor for ranking webpages. From the Bloomberg article:
RankBrain is one of the “hundreds” of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked, Corrado said. In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query, he said.
When this story was originally written, Google wouldn’t tell us. Our assumption was this:
My personal guess is that links remain the most important signal, the way that Google counts up those links in the form of votes. It’s also a terribly aging system, as I’ve covered in my Links: The Broken “Ballot Box” Used By Google & Bing article from the past.
As for the second-most important signal, I’d guess that would be “words,” where words would encompass everything from the words on the page to how Google’s interpreting the words people enter into the search box outside of RankBrain analysis.
That turned out to be pretty much right. In March 2016, Google revealed the first two factors were content and links. Or links and content, because it wouldn’t say which was first. For more, see our article:
From emailing with Google, I gather RankBrain is mainly used as a way to interpret the searches that people submit to find pages that might not have the exact words that were searched for.
Yes, Google has found pages beyond the exact terms someone enters for a very long time. For example, years and years ago, if you’d entered something like “shoe,” Google might not have found pages that said “shoes,” because those are technically two different words. But “stemming” allowed Google to get smarter, to understand that shoes is a variation of shoe, just like “running” is a variation of “run.”
Google also got synonym smarts, so that if you searched for “sneakers,” it might understand that you also meant “running shoes.” It even gained some conceptual smarts, to understand that there are pages about “Apple” the technology company versus “apple” the fruit.
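To make stemming and synonyms concrete, here is a deliberately naive sketch. The suffix rules in `naive_stem` and the one-entry `HYPOTHETICAL_SYNONYMS` list are our own simplifications, hand-written in the same way the human-built lists described here would be. Real stemmers and Google's own systems are far more thorough.

```python
def naive_stem(word):
    """Reduce a word to a rough root by stripping a few common endings."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Collapse a doubled final consonant, so "runn" becomes "run".
        if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


# Hypothetical, hand-made synonym list, stored in stemmed form.
HYPOTHETICAL_SYNONYMS = {"sneaker": ["run shoe"]}


def expand_query(query):
    """Return the stemmed query plus any variants produced by synonyms."""
    stems = [naive_stem(word) for word in query.split()]
    variants = {" ".join(stems)}
    for stem in stems:
        for synonym in HYPOTHETICAL_SYNONYMS.get(stem, []):
            variants.add(" ".join(synonym if s == stem else s for s in stems))
    return variants


print(naive_stem("shoes"), naive_stem("running"))
print(expand_query("sneakers"))
```

With these rules, a search for "sneakers" and a search for "running shoes" end up sharing a common stemmed form, which is the behavior the two paragraphs above describe.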
The Knowledge Graph, launched in 2012, was a way that Google grew even smarter about connections between words. More important, it learned how to search for “things not strings,” as Google has described it.
Strings means searching just for strings of letters, such as pages that match the spelling of “Obama.” Things means that instead, Google understands when someone searches for “Obama,” they probably mean US President Barack Obama, an actual person with connections to other people, places and things.
The Knowledge Graph is a database of facts about things in the world and the relationships between them. It’s why you can do a search like “when was the wife of obama born” and get an answer about Michelle Obama as below, without ever using her name:
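As a rough illustration of "things not strings," a fact database can be sketched as a lookup table of entities and relationships. The `FACTS` and `ALIASES` tables and the `follow` function below are our own hypothetical names and structure, not Google's actual Knowledge Graph format.

```python
# Each fact links an entity and a relationship to another entity or value.
FACTS = {
    ("Barack Obama", "spouse"): "Michelle Obama",
    ("Michelle Obama", "born"): "January 17, 1964",
}

# Map search strings to the entity ("thing") they most likely refer to.
ALIASES = {"obama": "Barack Obama"}


def resolve(text):
    """Turn a typed string into a known entity name, if we have one."""
    return ALIASES.get(text.lower(), text)


def follow(start, *relations):
    """Start at an entity and hop along each relationship in turn."""
    entity = resolve(start)
    for relation in relations:
        entity = FACTS[(entity, relation)]
    return entity


# "when was the wife of obama born" becomes two hops: spouse, then born.
print(follow("obama", "spouse", "born"))
```

The query never mentions Michelle Obama by name. The answer comes from following the connections between things.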
The methods Google already uses to refine queries generally all flow back to some human being somewhere doing work, either having created stemming lists or synonym lists or making database connections between things. Sure, there’s some automation involved. But largely, it depends on human work.
The problem is that Google processes three billion searches per day. In 2007, Google said that 20 percent to 25 percent of those queries had never been seen before. In 2013, it brought that number down to 15 percent, which was used again in yesterday’s Bloomberg article and which Google reconfirmed to us. But 15 percent of three billion is still a huge number of queries never entered by any human searcher — 450 million per day.
Among those can be complex, multi-word queries, also called “long-tail” queries. RankBrain is designed to help better interpret those queries and effectively translate them, behind the scenes in a way, to find the best pages for the searcher.
As Google told us, it can see patterns between seemingly unconnected complex searches to understand how they’re actually similar to each other. This learning, in turn, allows it to better understand future complex searches and whether they’re related to particular topics. Most important, from what Google told us, it can then associate these groups of searches with results that it thinks searchers will like the most.
Google didn’t provide examples of groups of searches or give details on how RankBrain guesses at the best pages. Our guess on the latter is that if RankBrain can translate an ambiguous search into something more specific, Google can then bring back better answers.
While Google didn’t give groups of searches, the Bloomberg article did have a single example of a search where RankBrain is supposedly helping. Here it is:
What’s the title of the consumer at the highest level of a food chain
To a layperson like myself, “consumer” sounds like a reference to someone who buys something. However, it’s also a scientific term for something that consumes food. There are also levels of consumers in a food chain. That consumer at the highest level? The title — the name — is “predator.”
Entering that query into Google provides good answers, even though the query itself sounds pretty odd:
Now consider how similar the results are for a search like “top level of the food chain,” as shown below:
Imagine that RankBrain is connecting that original long and complicated query to this much shorter one, which is probably searched for more often. It understands that they are very similar. As a result, Google can leverage all it knows about getting answers for the more common query to help improve what it provides for the uncommon one.
Let me stress that I don’t know that RankBrain is connecting these two searches. I only know that Google gave the first example. This is simply an illustration of how RankBrain may be used to connect an uncommon search to a common one as a way of improving things.
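In the same speculative spirit, here is a minimal sketch of matching an uncommon query to the closest common one. It uses plain word overlap (Jaccard similarity), which is a stand-in we chose for simplicity and certainly not RankBrain's actual method. The `STOPWORDS` list and the list of "known" queries are hypothetical.

```python
# Hypothetical list of filler words to ignore when comparing queries.
STOPWORDS = {"what's", "what", "the", "of", "a", "at", "is", "in", "to"}


def content_words(query):
    """Lowercase the query, trim punctuation and drop filler words."""
    words = (word.strip("?.,!\"'").lower() for word in query.split())
    return {word for word in words if word and word not in STOPWORDS}


def jaccard(a, b):
    """Share of words two sets have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_known_query(new_query, known_queries):
    """Pick the known query whose words overlap most with the new one."""
    new_words = content_words(new_query)
    return max(known_queries, key=lambda q: jaccard(new_words, content_words(q)))


known = [
    "top level of the food chain",
    "how many tablespoons in a cup",
    "running shoes",
]
rare = "What's the title of the consumer at the highest level of a food chain"
print(closest_known_query(rare, known))
```

Word overlap only works when the two queries share vocabulary. The appeal of a learned system is that it could connect queries that share meaning but few or no words.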
Back in 2005, Microsoft started using its own machine-learning system, called RankNet, as part of what became its Bing search engine of today. In fact, the chief researcher and creator of RankNet was recently honored. But over the years, Microsoft has barely talked about RankNet.
You can bet that will likely change. It’s also interesting that when I put the search above into Bing, given as an example of how great Google’s RankBrain is, Bing gave me good results, including one listing that Google also returned:
One query doesn’t mean that Bing’s RankNet is as good as Google’s RankBrain or vice versa. Unfortunately, it’s really difficult to come up with a list of queries to do this type of comparison.
Google did give us one fresh example: “How many tablespoons in a cup?” Google said that RankBrain favored different results in Australia versus the United States for that query because the measurements in each country are different, despite the similar names.
I tried to test this by searching at Google.com versus Google Australia. I didn’t see much difference, myself. Even without RankBrain, the results would often be different in this way just because of the “old-fashioned” means of favoring pages from known Australian sites for those searchers using Google Australia.
Despite my two examples above being less than compelling as testimony to the greatness of RankBrain, I really do believe that it probably is making a big impact, as Google is claiming. The company is fairly conservative with what goes into its ranking algorithm. It does small tests all the time. But it only launches big changes when it has a great degree of confidence.
Integrating RankBrain, to the degree that it’s supposedly the third-most important signal, is a huge change. It’s not one that I think Google would do unless it really believed it was helping.
Google told us that there was a gradual rollout of RankBrain in early 2015 and that it’s been fully live and global for a few months now.
In October 2015, Google told Bloomberg that a “very large fraction” of the 15 percent of queries it has never seen before were processed by RankBrain. In short, RankBrain handled 15 percent of queries or less.
In June 2016, news emerged that RankBrain was being used for every query that Google handles. See our story about that:
All learning that RankBrain does is offline, Google told us. It’s given batches of historical searches and learns to make predictions from these.
Those predictions are tested, and if proven good, then the latest version of RankBrain goes live. Then the learn-offline-and-test cycle is repeated.
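The learn-offline-and-test cycle can be sketched in a few lines. Everything here is hypothetical: the toy "model" just remembers the most common preferred result for each past query, and the `train`, `accuracy` and `maybe_promote` names, as well as the idea of pairing each historical search with a preferred result, are our assumptions. Google hasn't described what RankBrain's training data or tests look like.

```python
from collections import Counter


def train(history):
    """Learn offline from a batch of (query, preferred_result) pairs."""
    counts = {}
    for query, preferred in history:
        counts.setdefault(query, Counter())[preferred] += 1
    # The toy model predicts the most common preferred result per query.
    return {query: c.most_common(1)[0][0] for query, c in counts.items()}


def accuracy(model, held_out):
    """Fraction of held-out searches where the model's prediction is right."""
    hits = sum(1 for query, preferred in held_out if model.get(query) == preferred)
    return hits / len(held_out) if held_out else 0.0


def maybe_promote(live_model, batch, held_out):
    """Train a candidate offline and go live only if it tests better."""
    candidate = train(batch)
    if accuracy(candidate, held_out) > accuracy(live_model, held_out):
        return candidate
    return live_model
```

Calling `maybe_promote` once per new batch repeats the cycle: the live model changes only when a freshly trained candidate proves better on the test set.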
Typically, how a query is refined — be it through stemming, synonyms or now RankBrain — has not been considered a ranking factor or signal.
Signals are typically factors that are tied to content, such as the words on a page, the links pointing at a page, whether a page is on a secure server and so on. They can also be tied to a user, such as where a searcher is located or their search and browsing history.
So when Google talks about RankBrain as the third-most important signal, does it really mean as a ranking signal? Yes. Google reconfirmed to us that there is a component where RankBrain is directly contributing somehow to whether a page ranks.
How exactly? Is there some type of “RankBrain score” that might assess quality? Perhaps, but it seems much more likely that RankBrain is somehow helping Google better classify pages based on the content they contain. RankBrain might be able to better summarize what a page is about than Google’s existing systems have done.
Or not. Google isn’t saying anything other than there’s a ranking component involved.
Google told us people who want to learn about word “vectors” — the way words and phrases can be mathematically connected — should check out this blog post, which talks about how the system (which wasn’t named RankBrain in the post) learned the concept of capital cities of countries just by scanning news articles:
There’s a longer research paper this is based on here. You can even play with your own machine learning project using Google’s word2vec tool. In addition, Google has an entire area with its AI and machine learning papers, as does Microsoft.
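To give a feel for what "mathematically connected" means, here is a toy sketch of the capital-city relationship using tiny hand-made vectors. Real word2vec vectors are learned from large amounts of text and have hundreds of dimensions. The four-dimensional `TOY_VECTORS` below are invented by us purely so the arithmetic is easy to follow, and the `analogy` helper is our own.

```python
import math

# Hand-made vectors: [is a capital, France-related, Italy-related, Japan-related]
TOY_VECTORS = {
    "France": [0.0, 1.0, 0.0, 0.0],
    "Paris":  [1.0, 1.0, 0.0, 0.0],
    "Italy":  [0.0, 0.0, 1.0, 0.0],
    "Rome":   [1.0, 0.0, 1.0, 0.0],
    "Japan":  [0.0, 0.0, 0.0, 1.0],
    "Tokyo":  [1.0, 0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with vector arithmetic: b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (word for word in vectors if word not in (a, b, c))
    return max(candidates, key=lambda word: cosine(target, vectors[word]))


# France is to Paris as Italy is to ...?
print(analogy("France", "Paris", "Italy", TOY_VECTORS))
```

Subtracting "France" from "Paris" leaves something like the idea of "capital of," and adding "Italy" lands nearest to "Rome." In the system Google describes, that relationship was learned from news articles rather than written in by hand as it is here.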
Also be sure to see our article, How Machine Learning Works, As Explained By Google.