There you are. A big redesign or CMS migration is looming, and you’re ready to unleash a crawl on the new site in a test environment. You fire up your favorite crawling tool and trigger the crawl… and it lasts all of three seconds.
Yes, there’s an obvious problem. The staging server sits behind some type of security measure that prevents you from crawling it freely. Sweat begins to form on your brow as you wonder how you’ll get the crawl done.
At this point, you could choose to manually check all the pages, but you might end up in a padded room whispering something about header response codes taking over the world.
Or you could continue to click “crawl” and repeatedly crawl a login page, but that won’t help either. OR you could snap out of it and figure out a way to crawl the site in staging, which would enable you to analyze the crawl data and save SEO. Yes, that’s the ticket.
Some of you might be saying, “Hey, this is easy to get around!” Well, sometimes it isn’t. I’ve helped a number of clients whose staging setups were genuinely hard to access and crawl, and in those situations you might need to use alternative methods.
Below, I’ll cover five methods for crawling a staging server ranging from using basic authentication to VPN access to creating custom user agents. I’ll end with some key takeaways and tips. Let’s begin!
If the staging server uses basic authentication, you’ll be happy to know that the top crawling tools support this method when setting up a crawl.
For example, my favorite crawling tools are DeepCrawl (where I’m on the customer advisory board) and Screaming Frog. Both let you enter login credentials so you can crawl away.
Handling Basic Authentication in DeepCrawl:
Selecting the “Request Authentication” setting in Screaming Frog:
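Under the hood, basic authentication is simple: the client sends an `Authorization` header containing the Base64-encoded string `username:password` (as defined in RFC 7617). The minimal Python sketch below, using only the standard library, shows roughly what a crawler sends once you enter those credentials. The staging URL, username, and password are hypothetical placeholders.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_staging_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Return a request carrying basic-auth credentials for a staging URL.

    Pass the result to urllib.request.urlopen() to actually fetch the page.
    """
    return urllib.request.Request(
        url,
        headers={"Authorization": basic_auth_header(username, password)},
    )


if __name__ == "__main__":
    # Hypothetical staging host and credentials, for illustration only.
    req = build_staging_request("https://staging.example.com/", "seo-crawler", "s3cret")
    print(req.get_header("Authorization"))
```

Because the credentials travel in a header on every request, this works only when the staging server challenges for them directly; it does nothing for setups that bounce you through a separate login page.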
I’ve had some clients that keep their staging servers behind a firewall (on their company network and not publicly available). In situations like that, I’ve sometimes been given VPN access so I could crawl the server. Once connected via VPN, I could crawl away with any tool running locally (on the systems in my office).
The upside is that you can crawl staging with local tools. The downside is that you probably can’t use enterprise-level crawlers that run outside your network, which can matter a lot for a large-scale website.
Accessing a staging server via VPN:
I’ve also had some clients that used a staging platform that redirected all users to a common login page, which then redirected them back to the specific staging server they wanted to access. Unfortunately, many of the tools that support basic or digest authentication won’t work here, because the redirect throws a wrench into the situation.
Instead, you could ask that the platform whitelist your IP address for the staging server you’re trying to access. Your client would simply grant your specific IP address access to that staging server for a short period (for example, a day or a few days) while continuing to block all other IPs.
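Conceptually, a time-limited IP whitelist comes down to two checks: is the request coming from an approved address, and is the access window still open? The Python sketch below illustrates that logic with the standard `ipaddress` module. It is a hypothetical illustration, not how any particular staging platform implements whitelisting (in practice this is often configured in a firewall, load balancer, or web server), and the addresses are documentation-range placeholders.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AllowlistEntry:
    """One whitelisted address or range, valid until `expires_at` (UTC)."""
    network: ipaddress.IPv4Network | ipaddress.IPv6Network
    expires_at: datetime


def is_ip_allowed(client_ip: str, allowlist: list[AllowlistEntry], now: datetime) -> bool:
    """Return True if client_ip falls inside an entry whose access window is still open.

    Every other address (or an expired entry) is rejected, mirroring the
    "let one IP in for a few days, block everyone else" approach.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in entry.network and now < entry.expires_at for entry in allowlist)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical: the crawler's IP, whitelisted for two days.
    allowlist = [AllowlistEntry(ipaddress.ip_network("203.0.113.10/32"), now + timedelta(days=2))]
    print(is_ip_allowed("203.0.113.10", allowlist, now))  # True
    print(is_ip_allowed("198.51.100.7", allowlist, now))  # False: not whitelisted
    print(is_ip_allowed("203.0.113.10", allowlist, now + timedelta(days=3)))  # False: expired
```

Storing an expiry alongside each entry keeps the "short period of time" explicit, so access closes on its own even if nobody remembers to remove the entry.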
You’ve heard of Googlebot and Bingbot, but have you heard of GSQiBot? That’s one of the custom user agents I’ve set up for client crawls. The top crawling tools let you create a custom user agent that you can share with your clients.
They can then whitelist that specific user agent while blocking all other access. It’s similar to the IP address method, but it whitelists a user agent rather than an IP address.
Setting up a custom user agent in DeepCrawl:
Setting up a custom user agent in Screaming Frog:
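Here is a rough Python sketch of both halves of the user-agent approach: the crawler sends a custom `User-Agent` header, and the staging server lets through only requests whose user agent contains the agreed token. `GSQiBot` comes from the example above, but the full user-agent string, the URL, and the helper names are hypothetical. Keep in mind that a user agent is just a header any client can set, so this is weaker protection than an IP whitelist.

```python
from __future__ import annotations

import urllib.request

# Hypothetical full user-agent string built around the token agreed with the client.
CRAWLER_USER_AGENT = "Mozilla/5.0 (compatible; GSQiBot/1.0)"


def build_crawler_request(url: str) -> urllib.request.Request:
    """Crawler side: attach the custom user agent to every request."""
    return urllib.request.Request(url, headers={"User-Agent": CRAWLER_USER_AGENT})


def is_user_agent_allowed(user_agent: str | None, allowed_tokens: set[str]) -> bool:
    """Server side: allow the request only if its user agent contains a whitelisted token."""
    if not user_agent:
        return False
    return any(token in user_agent for token in allowed_tokens)


if __name__ == "__main__":
    req = build_crawler_request("https://staging.example.com/")
    # urllib normalizes header names, so "User-Agent" is stored as "User-agent".
    ua = req.get_header("User-agent")
    print(ua)
    print(is_user_agent_allowed(ua, {"GSQiBot"}))  # True
    print(is_user_agent_allowed("Mozilla/5.0 (Windows NT 10.0)", {"GSQiBot"}))  # False
```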
Yes, the fifth method really is to go old-school. In certain situations, I’ve had to visit clients “in real life.” Whoa, the horror!
If staging is not accessible from the outside, and your client will not open up access for some reason, then you might have to go visit their office.
Once you’re there, you can crawl away from within their network. This obviously has some geographic constraints, but I’ve done it for clients located in the Northeast (I’m in Princeton, NJ).
Now that I’ve covered five different ways to crawl a staging server, I’ll provide some key takeaways and tips based on my experience helping clients.
As I explained earlier, it’s critically important to crawl staging before key changes are pushed to production. You could very well uncover technical SEO problems during the crawl that would cause serious issues if pushed live.
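As a rough illustration of the kind of problem a staging crawl surfaces, the Python sketch below checks the HTTP status code for a list of staging URLs and flags anything other than a 200. Real crawlers check far more than this. The `flag_problem_urls` and `fetch_status` helpers and the URLs are hypothetical; the fetcher is passed in so you can swap in whatever authenticated request your staging setup needs (for example, one carrying basic-auth credentials or a custom user agent).

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable, Iterable


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes are reported as-is."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_opener = urllib.request.build_opener(_NoRedirect)


def fetch_status(url: str) -> int:
    """Fetch a URL (without following redirects) and return its HTTP status code."""
    try:
        with _opener.open(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 3xx (because redirects are not followed), 4xx, and 5xx responses land here.
        return err.code


def flag_problem_urls(urls: Iterable[str], fetch: Callable[[str], int]) -> dict[str, int]:
    """Return {url: status} for every URL that did not respond with 200 OK."""
    problems: dict[str, int] = {}
    for url in urls:
        status = fetch(url)
        if status != 200:
            problems[url] = status
    return problems


if __name__ == "__main__":
    # Hypothetical results standing in for a real staging crawl (no network needed).
    fake = {
        "https://staging.example.com/": 200,
        "https://staging.example.com/old-page": 404,
        "https://staging.example.com/category": 302,
    }
    print(flag_problem_urls(fake, lambda url: fake[url]))
```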
My recommendation is to gain access to staging at all costs. The good news is that there are several methods you can choose from, as I documented above. Work with your client, and with their dev team, to gain access. That’s how you win. Now crawl away.
The post 5 Ways To Crawl A Staging Server Before Important Site Changes Go Live (To Save SEO) appeared first on Search Engine Land.