The SEO technology space could benefit tremendously from the establishment of technical standards. Implementation of Google’s own specifications is inconsistent within our tools and can lead less-experienced SEOs to believe their sites are in better shape than they are.
In the same way that the W3C rallied around the definition of protocol standards in 1994 and the Web Standards Project (WaSP) standardized coding practices in 1998, it’s our turn to button up our software and get prepared for what’s coming next.
Stop me if you’ve heard this one. On December 4, I received an email from DeepCrawl telling me that my account was out of credits. That didn’t make any sense, though, because my billing cycle had just restarted a few days prior — and, frankly, we haven’t really used the tool much since October, as you can see in the screenshot below. I should still have a million credits.
Logging in, I remembered how much I prefer other tools now. Noting the advancements that competitors like On-Page.org and Botify have made in recent months, I’ve found myself annoyed with my current subscription.
The only reason I still have an account is because historical client data is locked in the platform. Sure, you can export a variety of .CSVs, but then what? There’s no easy way to move my historical data from DeepCrawl to On-Page or Botify.
That is because the SEO tools industry has no technical standards. Every tool has a wildly different approach to how and what they crawl, as well as how data is stored and ultimately exported.
As SEO practitioners, a lot of what we do is normalizing that data across these disparate sources before we can get to the meat of our analyses. (That is, unless you take everything the tools show you at face value.) One might counter that many other disciplines require you to do the same, like market research, but then you’d be ignoring the fact that these are all just different tools storing the same data in different ways.
As for migrating between platforms, only the enterprise-level providers, such as Searchmetrics, Linkdex, SEOClarity, Conductor and BrightEdge, have systems in place for migration between each other — and even that still requires customized data imports to make it happen.
Every industry has some sort of non-profit governing body that sets the standard. Specific to the web, we have five main governing bodies:
Yet there is no governing body for SEO software in that way. This means that SEO tools are essentially the Internet Explorer of marketing technology, deciding which standards and features they will and will not support — seemingly, at times, without regard for the larger landscape. Harsh, but true.
If you dig into certain tools, you’ll find that they often do not consider scenarios for which Google has issued clear guidelines. So these tools may not be providing a complete picture of why the site is (or is not) performing.
Generally, the development of standards occurs when an organization or group of organizations gets together to decide those standards. If the standard is ultimately deemed viable and software companies move forward with implementation, users tend to gravitate toward that standard and vote with their wallets.
So what’s preventing the SEO tools industry from getting together and issuing technical standards? A few things…
The establishment of standards benefits the SEO community, as well as the clients and sites that we work on. There is really no benefit to the tool providers themselves, since it would require them to make changes that are otherwise not on their roadmaps (or technical changes they have decided against for other reasons). It also sets them up to lose customers by making it easier to move between platforms.
Ultimately, the value of technical standards for SEO tools comes down to better capabilities, better user experience and encouraging more competition around creative features. But more specifically, it helps with the following:
So where does the standardization process begin? What needs to be consistent across platforms in order for SEO tools to meet these needs? That’s up for debate, of course, but here are my suggestions:
Ideally, there would be a common understanding of how all the different link metrics in the space translate to one another. The technical hang-up here is twofold.

One, each provider uses its own proprietary formula — an estimation that follows, and then diverges from, the original PageRank algorithm — and none of these formulae are public. Two, each provider crawls a different segment of the web.
The first problem would become irrelevant if all the link providers crawled the Common Crawl and publicized the resulting data.
The Common Crawl is a public archive whose latest iteration features 1.72 billion pages. Anyone can download and process it as a means of web analysis. (In the past, I led projects where we used the Common Crawl as a corpus to extract influencer data and to identify broken link opportunities. But I digress.)
If Moz, Majestic and Ahrefs publicly processed the Common Crawl, they could all provide each other’s metrics or, more realistically, users could convert Ahrefs and Majestic metrics into the more widely understood Moz metrics themselves.
One caveat is that Moz now provides seed URL lists to the Common Crawl, and I’m unclear on whether that introduces bias. I suspect not, because in this scenario all the link indices would be limited to crawling just the Common Crawl URLs.
While this open link metrics idea is likely a pipe dream, what may be more realistic and valuable is the establishment of a new set of provider-agnostic metrics that all link indices must offer.
Sure, they all give us the number of linking root domains and total number of links, but new quality measures that can tie all the datasets together after you’ve de-duplicated all the links would make the collective data infinitely more usable.
Google’s crawling capabilities have come a long way. Yet to my knowledge, aside from Screaming Frog, SEO tools still crawl the way they always have: they perform their analysis on the downloaded HTML without rendering the page.
Under the Gateway specification, crawling tools would be required to present you with the option of how you’d like to crawl rather than only letting you specify your user agent.
Under the hood, these crawling tools would be required to use Headless Chromium or headless QtWebKit (PhantomJS) in addition to their text-driven crawlers, with the goal of emulating Google’s experience even more closely.
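The gap between the two crawl modes is easy to demonstrate. In the sketch below, a link injected by JavaScript is invisible to a text-only parser of the raw HTML, while a rendering crawler (one driving Headless Chromium, say) would see it in the post-execution DOM. The "rendered" string is hand-written here to stand in for that DOM:

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Count <a> tags the way a text-only crawler would see them."""
    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1

# Raw HTML: the link only exists as a string inside a script.
RAW_HTML = '<body><script>document.write(\'<a href="/promo">Promo</a>\')</script></body>'
# What a rendering crawler would see after the script runs (simulated).
RENDERED_DOM = '<body><a href="/promo">Promo</a></body>'

def count_links(html):
    parser = LinkCounter()
    parser.feed(html)
    return parser.links

print(count_links(RAW_HTML))      # 0 -- the text-only crawl misses the link
print(count_links(RENDERED_DOM))  # 1 -- the rendered view exposes it
```

A crawler that only ever sees the first view can't tell you how Google experiences the second.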
Whatever the crawl provides, the specification should require that all crawl providers deliver columns in a standard order and export in the same format, potentially called a .CDF file. This format would define the minimum set of columns that must be included in these exports, and in what order.
However, we would not want to limit a tool provider’s ability to deliver something more, so the export file could include additional columns of data beyond the minimum. All tools would simply be required to import at least the standard set.
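Here is one way the idea could work in practice. The column names are my own guesses at a plausible minimum, not an actual spec; the point is that required columns come first in a fixed order, providers may append extras, and importers read at least the required prefix:

```python
import csv
import io

# Hypothetical minimum column set for a .CDF export (illustrative names).
REQUIRED_COLUMNS = ["url", "status_code", "title", "canonical", "indexable"]

def export_cdf(rows, extra_columns=()):
    """Write rows with the required columns first, provider extras after."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    header = REQUIRED_COLUMNS + list(extra_columns)
    writer.writerow(header)
    for row in rows:
        writer.writerow([row.get(col, "") for col in header])
    return buf.getvalue()

def import_cdf(text):
    """Read the required prefix, ignoring provider-specific extra columns."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header[:len(REQUIRED_COLUMNS)] != REQUIRED_COLUMNS:
        raise ValueError("not a valid CDF export")
    # zip() silently drops the extra columns beyond the required set.
    return [dict(zip(REQUIRED_COLUMNS, row)) for row in reader]

data = [{"url": "https://example.com/", "status_code": "200", "title": "Home",
         "canonical": "https://example.com/", "indexable": "true",
         "vendor_score": "87"}]
text = export_cdf(data, extra_columns=["vendor_score"])
print(import_cdf(text)[0]["url"])  # https://example.com/
```

Any tool could round-trip another tool's export this way, while still shipping its own proprietary columns on top.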
Personally, I believe we need to rethink rankings as an industry. Rankings report on a context that doesn’t truly exist in the wild and ignore the specific contexts of real users. The future of search is more and more about those specific user contexts and how dramatically they influence the results.
In fact, I’d propose that rankings should be open and available to everyone for free. Since Google is not going to provide that, it’d be up to a group of folks to make it happen.
We’re all stealing rankings from Google through means that inflate search volume, and each tool has its own methodology. What if, instead, there were a centralized data store where rankings were pulled via distributed means or sophisticated botnets, giving everyone access to full SERP data? The tool providers would then be challenged to deliver enhancements that make that data more valuable.
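A shared store would also need a shared record shape. The sketch below is a hedged guess at what one observation in an open rankings store might look like; the field names are my own invention, meant only to show that the full SERP context (locale, device, result types) would have to travel with each ranking:

```python
import json

# Hypothetical record for one SERP observation in a shared, open store.
serp_record = {
    "keyword": "seo tools",
    "locale": "en-US",
    "device": "mobile",
    "collected_at": "2016-12-04T00:00:00Z",
    "results": [
        {"position": 1, "url": "https://example.com/", "type": "organic"},
        {"position": 2, "url": "https://example.org/", "type": "featured_snippet"},
    ],
}

# A plain serializable format means any provider can layer value on top.
serialized = json.dumps(serp_record)
assert json.loads(serialized) == serp_record
```

The competitive layer — trend analysis, alerts, visualizations — would then sit above this shared substrate rather than around proprietary scrapes.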
STAT used to offer a Codex which gave free rankings on over 200,000 keywords. I believe that was a big step in the right direction toward my ideal. I also believe STAT is a great example of a company that enhances the data and allows you the ability to further customize those enhancements.
Nonetheless, I would love to see the minimum specification for rank tracking from all providers account for:
Despite the fact that Google moved from strings to things years ago, there are still people examining search through the lens of keyword density and H1 tag targeting. Google has announced that entity analysis is where their understanding of a query begins.
The following image illustrates how they approach this. In the example, they break the query, “Who was the US President when the Angels won the World Series?” into the entities US President, Angels and World Series, then systematically improve their understanding of the concepts until they can link their relationship and solve the problem.
SEO tools are not consistently at this level of sophistication for content analysis. NLP, TF*IDF and LDA tools have replaced the concept of keyword density, but most crawling tools do not weigh these methods in their examinations of pages.
The minimum specification of a crawling tool should be that it extracts entities and computes topic modeling scores. A primary barrier to this happening in the case of TF*IDF is the availability of rankings, as the calculation requires a review of other ranking documents, but the open rankings initiative could support that effort.
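For readers who haven't worked with it directly, here is a minimal TF*IDF computation over a toy corpus, showing the kind of scoring that should replace keyword density. As noted above, a real calculation needs the other ranking documents as its corpus, which is exactly what an open rankings store would supply:

```python
import math
from collections import Counter

# Toy corpus standing in for a set of ranking documents.
corpus = [
    "seo tools crawl pages",
    "seo tools need standards",
    "entities replace keyword density",
]
docs = [doc.split() for doc in corpus]

def tf_idf(term, doc_tokens, all_docs):
    """Term frequency in one doc, weighted by rarity across the corpus."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for d in all_docs if term in d)
    idf = math.log(len(all_docs) / df) if df else 0.0
    return tf * idf

# "standards" appears in only one of three docs, so in that doc it scores
# higher than the ubiquitous "seo", even though both occur once there.
print(tf_idf("standards", docs[1], docs))
print(tf_idf("seo", docs[1], docs))
```

A crawler that computed scores like these per page, against the documents actually ranking for the target query, would say far more about content quality than any density percentage.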
Naturally, these are my opinions and, taken another way, this article could be misconstrued as my feature request list for the SEO tools industry. That’s exactly what it should not be.
Rather, this should be a collaborative effort featuring the best and brightest in the space to establish a standard that grows with the needs of modern SEO and the ever-changing capabilities of search engines.
The tool providers could get together to develop standards the same way the search engines came together to develop Schema.org. However, the lack of value for the tool providers makes that unlikely. Perhaps a group of agencies or the search industry media could get together and make this happen. These folks are more objective and don’t have a vested interest in those companies themselves.
Or someone could just start this and see who ends up contributing.
All that said, I have created a draft, called the Gateway Specification, following a format similar to the W3C HTML specification on GitHub. Although picking Git to manage this raises a bit of a barrier to entry, I’ve decided it’s the better way to start: a specification of this sort will need to be discussed in depth, and GitHub provides the facilities to do so.
To get involved, you’ll need to fork the repository, make your edits or additions to the document, and then submit a pull request. All of those steps are outlined here. Please submit your pull requests, and let’s get the standards party started!
The post How and why the SEO tools industry should develop technical standards appeared first on Search Engine Land.