Posted by BritneyMuller
Log file analysis can provide some of the most detailed insights about what Googlebot is doing on your site, but it can be an intimidating subject. In this week's Whiteboard Friday, Britney Muller breaks down log file analysis to make it a little more accessible to SEOs everywhere.
Hey, Moz fans. Welcome to another edition of Whiteboard Friday. Today we're going over all things log file analysis, which is so incredibly important because it really tells you the ins and outs of what Googlebot is doing on your sites.
So I'm going to walk you through the three primary areas, the first being the types of logs that you might see from a particular site, what that looks like, what that information means. The second being how to analyze that data and how to get insights, and then the third being how to use that to optimize your pages and your site.
For a primer on what log file analysis is and its application in SEO, check out our article: How to Use Server Log Analysis for Technical SEO
So let's get right into it. There are three primary types of logs, the primary one being Apache. But you'll also see W3C and Elastic Load Balancing logs, which you might encounter a lot with setups like Kibana. You'll also likely come across some custom log files. For larger sites, that's not uncommon. I know Moz has a custom log file system. Fastly is a custom type of setup. So just be aware that those are out there.
So what are you going to see in these logs? The data that comes in is primarily in these highlighted fields here.

So you will hopefully, for sure, see:

- The server IP
- The timestamp (date and time)
- The URL requested
- The HTTP status code
- The user agent

Method (GET/POST), time taken, client IP, and the referrer are sometimes included. Log files traditionally house all data, all visits from individuals and traffic, but we want to analyze the Googlebot traffic. So what does this look like? It's kind of like glibbery gloop.
It's a word I just made up, and it just looks like that. It's just like bleh. What is that? It looks crazy. It's a new language. But essentially you'll likely see that IP address, the timestamp, which will commonly look like that, the method (GET/POST), which I don't completely understand or necessarily need to use in some of the analysis, but it's good to be aware of, the URL requested, and that status code.
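To make that "glibbery gloop" a little more concrete, here is a minimal sketch of pulling those fields out of one line in Apache's common/combined log format. The sample line and the regex field names are made up for illustration; real log lines vary by server configuration.

```python
import re

# Rough pattern for one Apache combined-format log line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '                  # client IP (identd/user usually "-")
    r'\[(?P<timestamp>[^\]]+)\] '            # timestamp, e.g. 10/Oct/2023:13:55:36 -0700
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '   # request line: method, URL, protocol
    r'(?P<status>\d{3}) (?P<size>\S+) '      # status code and response size
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'  # referrer and user agent
)

# A made-up example line for illustration.
line = ('66.249.66.1 - - [10/Oct/2023:13:55:36 -0700] '
        '"GET /blog/seo-tips HTTP/1.1" 200 5120 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

match = LOG_PATTERN.match(line)
if match:
    entry = match.groupdict()
    print(entry["ip"], entry["method"], entry["url"], entry["status"])
```

Once each line is split into named fields like this, all of the analysis below becomes straightforward counting.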
So what are you going to do with that data? How do we use it? There are a number of tools that are really great for doing some of the heavy lifting for you. Screaming Frog Log File Analyser is great. I've used it a lot. I really, really like it. But you have to have your log files in a specific format for it to use them.
Splunk is also a great resource, as is Sumo Logic, and I know there are a bunch of others. If you're working with really large sites, like I have in the past, you're going to run into problems here because the logs aren't going to be in a common format. So what you can do is manually do some of this yourself, which I know sounds a little bit crazy.
But hang in there. Trust me, it's fun and super interesting. So what I've done in the past is import a CSV log file into Excel and use the Text Import Wizard, where you can basically specify what the delimiters are for this craziness. So whether it's a space or a comma or a quote, you can break those up so that each value lives within its own column. I wouldn't worry about having extra blank columns; you can separate those out. From there, what you would do is just create pivot tables. I can link to a resource on how to easily do that.
But essentially what you can look at in Excel is: Okay, what are the top pages that Googlebot hits by frequency? What are those top pages by the number of times it's requested?
You can also look at the top folder requests, which is really interesting and really important. On top of that, you can also look into: What are the most common Googlebot types that are hitting your site? Is it Googlebot mobile? Is it Googlebot Images? Are they hitting the correct resources? Super important. You can also do a pivot table with status codes and look at that. I like to apply those status codes to the top pages and top folders reports. So now you're getting some insights into: Okay, how did some of these top pages resolve? What are the top folders looking like?
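If you'd rather script those pivot tables than build them in Excel, the same counts fall out of a few lines of Python. This is a rough sketch assuming you've already parsed each log line into a dict; the `entries` list here is made-up sample data for illustration.

```python
from collections import Counter

# Hypothetical parsed Googlebot entries, for illustration only.
entries = [
    {"url": "/blog/seo-tips", "status": "200", "user_agent": "Googlebot/2.1"},
    {"url": "/blog/seo-tips", "status": "200", "user_agent": "Googlebot-Image/1.0"},
    {"url": "/blog/log-files", "status": "301", "user_agent": "Googlebot/2.1"},
    {"url": "/products/widget", "status": "404", "user_agent": "Googlebot/2.1"},
]

# Top pages by request frequency
top_pages = Counter(e["url"] for e in entries)

# Top folder requests (first path segment)
top_folders = Counter("/" + e["url"].split("/")[1] for e in entries)

# Status codes applied to pages, and which Googlebot types are hitting the site
status_by_page = Counter((e["url"], e["status"]) for e in entries)
bot_types = Counter(e["user_agent"] for e in entries)

print(top_pages.most_common(3))   # pages Googlebot hits most often
print(top_folders.most_common(3))
```

Each `Counter` here plays the role of one pivot table: URLs by frequency, folders by frequency, and (URL, status code) pairs so you can see how your top pages are resolving.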
You can also do that for Googlebot IPs. This is the best hack I have found with log file analysis. I will create a pivot table just with Googlebot IPs, this right here. Sometimes it's a bunch of them, but I'll get all the unique ones, and then I can go to Terminal, which is available on most standard computers.
I tried to draw it. It looks like that. But all you do is type in "host" and then put in that IP address. You run it in your terminal with this IP address, and you will see it resolve to a google.com hostname. That verifies that it's indeed Googlebot and not some other crawler spoofing Google. That's something these tools tend to take care of automatically, but there are ways to do it manually too, which is just good to be aware of.
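If you want to script that `host` check instead of running it by hand, here is a sketch of the same verification in Python. Google's published guidance is to reverse-resolve the IP, confirm the hostname ends in googlebot.com or google.com, then forward-resolve it back to the same IP; the function names here are my own, not from any particular tool.

```python
import socket

def looks_like_google_host(hostname: str) -> bool:
    # Per Google's verification guidance, the reverse-DNS hostname
    # should end in googlebot.com or google.com.
    return hostname.endswith((".googlebot.com", ".google.com"))

def is_verified_googlebot(ip: str) -> bool:
    """Roughly what running `host <ip>` and eyeballing the result does:
    reverse-resolve the IP, check the domain, then forward-resolve
    the hostname to confirm it maps back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)          # reverse lookup
        if not looks_like_google_host(hostname):
            return False
        return ip in socket.gethostbyname_ex(hostname)[2]  # forward lookup
    except OSError:
        return False
```

The forward-confirm step matters: a spoofer can't control the reverse DNS for Google's IP ranges, so an IP that passes both lookups is genuinely Googlebot.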
All right, so how do you optimize for this data and really start to enhance your crawl budget? When I say "crawl budget," I primarily just mean the number of times Googlebot is coming to your site and the number of pages it typically crawls. So what does that crawl budget look like, and how can you make it more efficient?
Lastly, it's really helpful to connect the crawl data with some of this data. So if you're using something like Screaming Frog or DeepCrawl, they allow these integrations with different server log files, and it gives you more insight. From there, you just want to reevaluate. So you want to kind of continue this cycle over and over again.
You want to look at what's going on, have some of your efforts worked, is it being cleaned up, and go from there. So I hope this helps. I know it was a lot, but I want it to be sort of a broad overview of log file analysis. I look forward to all of your questions and comments below. I will see you again soon on another Whiteboard Friday. Thanks.