Five Steps To Clean Up Your Links Like A Techie

You’ve been in business for many years. You may have done it all on your own, or you may have enlisted the aid of unscrupulous SEO “services” to build links for you. Either way, you may be wondering whether you need to conduct a link cleanup. This column contains a five-step process outlining how to determine if you have a problem, how to analyze it, and then how to clean it up.

But first:

Unnatural Links Warning = Problem

If you get this message, you have a problem. Skip to step two.

An Unnatural Link Warning from Google Webmaster Tools

Step One: Check With Google

Obviously, you can’t email Google and ask them if you have some lousy links. But you can get a clue from the Google Webmaster Tools link list. To access it in the new navigation, go here:

Webmaster Tools Links List - New Navigation

Before you download anything, look at the numbers:

Sample of “Links to Your Site”

Against a total of 199k links, 76k is a large share (about 38%). If you have a distribution like this, you may have some cleanup to do.

Also on this screen, look over the list of “Your most linked content.” If a large proportion of the links are pointing directly to your homepage, you may have a problem.

Finally, click on the More >> button below “Your most linked content.” If the number of source domains seems low for the number of links to a particular piece of content, you probably have what are referred to as “run of site” links. Those are often the source of penalties because they don’t really add any value, and are probably in either an ad, a blogroll, a list of links, or the footer.

Look at how many links you have by domain to spot run-of-site links.
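If you'd rather not eyeball the numbers, the same check can be roughed out in a few lines of Python. This is only a sketch: the `LinkedPage` fields mirror the columns in “Your most linked content,” but the names, the sample figures other than 76k and 199k, and the threshold of 100 links per source domain are all assumptions for illustration, not rules from Google.

```python
from typing import NamedTuple


class LinkedPage(NamedTuple):
    """One row of "Your most linked content" (field names are hypothetical)."""
    page: str
    links: int
    source_domains: int


def share_of_total(part: int, total: int) -> float:
    """Fraction of all reported links that one entry accounts for."""
    return part / total


def links_per_domain(row: LinkedPage) -> float:
    """Average number of links each source domain contributes to a page."""
    return row.links / max(row.source_domains, 1)


def flag_run_of_site(rows, threshold=100.0):
    """Return pages whose links-per-domain ratio looks like run-of-site links.

    The threshold is an arbitrary starting point; tune it by judgment.
    """
    return [r.page for r in rows if links_per_domain(r) >= threshold]


# Made-up sample rows, just to show the shape of the data.
rows = [
    LinkedPage("/", 120_000, 300),
    LinkedPage("/blog/post", 450, 90),
]

print(round(share_of_total(76_000, 199_000), 2))
print(flag_run_of_site(rows))
```

Anything this flags is only a candidate for a closer look, for the same “may” and “might” reasons that apply to the manual check.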

Notice that I use amusing words like “may” and “might” here. That’s because there are no hard and fast rules dictating which links are “bad” and which aren’t; most of it is a matter of judgment. While the one circled above does look suspicious, what you’d eventually discover (later in the process) is that it is an affiliate link formatted as a 302 redirect — therefore, it does not pass PageRank.

Step Two: Gather Information

Okay, let’s assume you’ve determined that you most likely have a problem, or you’ve received a message that says you do. Where do you begin? The answer, from Google’s own blog, is to start with the link lists in Google Webmaster Tools. Begin by heading to your list of “Who links the most” in Webmaster Tools and downloading the following two reports:

[Screenshot: download options for the two link reports]

Combine them in Excel, sort by Col A ascending and de-duplicate them (note that you’ll have to uncheck “First discovered” since that is only available on one of the reports):

[Screenshot: Excel’s Remove Duplicates dialog]
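If Excel isn't your thing, the combine, sort, and de-duplicate step might look something like this in Python. It's a minimal sketch that assumes both exports are CSV files with a URL column named `Links`; that column name and the sample rows are assumptions, so check them against your actual downloads.

```python
import csv
import io


def combine_and_dedupe(*csv_texts, url_column="Links"):
    """Merge several CSV exports and return the unique URLs, sorted ascending.

    Extra columns such as "First discovered" are ignored, which is the
    scripted equivalent of unchecking them in the Remove Duplicates dialog.
    """
    seen = set()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            url = (row.get(url_column) or "").strip()
            if url:
                seen.add(url)
    return sorted(seen)


# Made-up stand-ins for the two downloaded reports.
sample_links = "Links\nhttp://b.example/page\nhttp://a.example/post\n"
latest_links = (
    "Links,First discovered\n"
    "http://a.example/post,2013-05-01\n"
    "http://c.example/,2013-05-02\n"
)

for url in combine_and_dedupe(sample_links, latest_links):
    print(url)
```

For real files, read each one with `open(path, newline="").read()` and pass the text in.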

Now that you have a deduplicated list, there are two things you should do:

  1. Run all of the links through a header checker, like Xenu’s Link Sleuth or Screaming Frog. Filter out every link whose response is anything other than 200 or 301 and put those in another list. Don’t delete this list; you’ll need it later.
  2. If you have the means, run the list through a script that checks for “nofollow” code on the link or in the meta robots tag. Put these in another list as well.

Now, you should be left with a smaller list that includes followed links with 200 and 301 status codes. These you will have to check manually.
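The status-code split described above can be sketched as a small function. This assumes you've already exported a URL-to-status mapping from your header checker; the `checked` dictionary below is made-up sample data, and the function name is just for illustration.

```python
# Status codes that stay on the manual-review list, per the step above.
REVIEWABLE = {200, 301}


def partition_by_status(status_by_url):
    """Split URLs into (to_review, set_aside) by HTTP status code.

    Nothing is thrown away: the set-aside list is kept for later.
    """
    to_review, set_aside = [], []
    for url, status in sorted(status_by_url.items()):
        bucket = to_review if status in REVIEWABLE else set_aside
        bucket.append((url, status))
    return to_review, set_aside


# Hypothetical header-checker output.
checked = {
    "http://a.example/post": 200,
    "http://b.example/old": 404,
    "http://c.example/moved": 301,
    "http://d.example/aff": 302,
}

to_review, set_aside = partition_by_status(checked)
print("Review manually:", to_review)
print("Set aside:", set_aside)
```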

Step Three: Check Links Manually

  1. Load the page in a Web browser:
    • If the page does not load (i.e., you get a 404/Not Found error), note it and move on. Double-check that the header checker you used works properly, but know that a few days to a week can pass between when you run the lists through the header checker and when you start manually reviewing links. More links may become obsolete during this time.
    • If the page does not load any content, but you do not get an error, you still need to check it. Sometimes links are hidden with same color text, inside invisible frames, or placed off the page via CSS.
  2. View the source of the page. In most browsers, you can right-click and choose the view-source option, or press CTRL+U.
  3. Do a find/search for your domain name. Leave off the www in case the link is without the www.
    • If it is not found, mark the link as “Link Removed.”
    • If it is found, check to see if it’s a decent link.
  4. First, search the page for “nofollow.”
    • If it is returned inside the <head> tag, you can mark the link as “No Follow” and move on.
    • If it is returned in the body tag, check to see if it is in the same <a> tag as one of your links.
      • If it is, mark the link “No Follow” and move on.
  5. If the link is on the page, and it is not nofollowed, note where the link appears and look at it in the HTML page.
  6. If any of the following are true, mark it “Disavow.”
    • The link is with a collection of unrelated links.
    • The link is on the right or left sidebar or the footer (not in the main content).
    • The link is in the comments.
    • The link is in the source but does not appear on the rendered page (meaning it’s hidden).
    • The page has a link anywhere to “submit a link” or “submit an article” or something similar.
    • The page looks like gibberish, spam, or like it was created for the sole purpose of SEO (it might use keywords really heavily, it might list a site’s PR, or say it is SEO friendly).
  7. Finally, if the page passes all of these tests, do the “smell” test.
    • Does the link add value to the site’s visitors? It’s probably ok.
    • Does the link seem like it was shoehorned into an article about something else? It’s probably not ok.
    • Does the link seem like it was included due to a paid arrangement? It’s probably not ok.
  8. Mark the link Disavow or Ok, provide a reason based on one of the above, and move on. Don’t skip the reason; you may find yourself doing this a second and even third time after Google’s response, so you don’t want to have to re-check anything!
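The mechanical parts of this checklist (steps 3 and 4) can be partly scripted. Here's a rough sketch using Python's standard `html.parser`; the class name, function name, and the “Check Manually” label are my own inventions for illustration. It only looks for your domain in `<a href>` attributes and for nofollow in the meta robots tag or the link's `rel` attribute. It can't tell you whether a link is hidden, in a footer, or smells paid, so steps 5 through 7 still need your eyes.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collects links to one domain and notes page-level nofollow."""

    def __init__(self, domain):
        super().__init__()
        self.domain = domain.lower()  # pass it without "www."
        self.page_nofollow = False
        self.links = []  # list of (href, has_rel_nofollow)

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            if "nofollow" in attr.get("content", "").lower():
                self.page_nofollow = True
        elif tag == "a":
            href = attr.get("href", "")
            if self.domain and self.domain in href.lower():
                rel_tokens = attr.get("rel", "").lower().split()
                self.links.append((href, "nofollow" in rel_tokens))


def classify(html, domain):
    """Return "Link Removed", "No Follow", or "Check Manually"."""
    audit = LinkAudit(domain)
    audit.feed(html)
    if not audit.links:
        return "Link Removed"
    if audit.page_nofollow or all(nofollow for _, nofollow in audit.links):
        return "No Follow"
    return "Check Manually"


page = '<p>Read <a href="http://www.example.com/widgets">this</a>.</p>'
print(classify(page, "example.com"))
```

Feed it the page source you would otherwise view with CTRL+U. One caveat of this sketch: a page with one followed and one nofollowed link to you is treated as followed and comes back as “Check Manually,” which errs on the side of a human look.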

Step Four: Domains Or Links?

Now for the next step, and this one gets confusing. You’ve collected and checked a list of all the links that Google reported in those downloads, but you’re not done. You still have two things to do:

  1. Download the list of all domains.
  2. Check these domains against your existing list.

When you download all the domains, you’ll notice that you only get a list of base URLs, like website.com. To check these against the list you’ve already made, add a new column in your spreadsheet labeled “Base Domain.”

Open up a new worksheet (this is important) and copy the list of links into it. Select [Data], [Text to Columns], [Delimited] and then make the delimiter a [/]. This will leave you with a column of base domains that you just need to clean up (possibly using find and replace to strip out “www.”) and then paste back into the “Base Domain” column. As long as you don’t sort or delete anything, the list will match up exactly with your list of links.

Crash Course on Delimiting
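The same “Base Domain” column can be produced without Text to Columns. Below is a small sketch using `urllib.parse` from the Python standard library; `base_domain` is a hypothetical helper name. Like the Excel approach, it just takes the host and strips a leading “www.”, so a subdomain such as blog.website.com stays as it is.

```python
from urllib.parse import urlsplit


def base_domain(url):
    """Return the host part of a URL, lowercased, without a leading "www."."""
    url = url.strip()
    # urlsplit only recognizes the host when a scheme or "//" is present.
    if not url.startswith(("http://", "https://", "//")):
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


links = [
    "http://www.website.com/dir/page.html",
    "https://blog.website.com/",
    "website.com/page",
]
for link in links:
    print(link, "->", base_domain(link))
```

Because the output list keeps the same order as the input, it lines up row for row with your list of links, just like the pasted Excel column.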

Now, go to the very last record in your list of links. Underneath it, paste the list of domains from Google.

Sort by Col A descending, remove duplicates, and you’ll be left with a list of domains that aren’t already represented in your list of links. Check these the same way you did the links in Steps Two and Three.
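If the paste-sort-dedupe dance feels error-prone, the comparison boils down to a set difference. This is a sketch with made-up domain lists and a hypothetical function name; it assumes you already have the “Base Domain” values from your link list and the domain list downloaded from Google.

```python
def normalize(domain):
    """Lowercase a domain and strip a leading "www." so both lists compare evenly."""
    domain = domain.strip().lower()
    return domain[4:] if domain.startswith("www.") else domain


def domains_not_yet_checked(google_domains, checked_domains):
    """Domains Google reports that have no link in the list already reviewed."""
    already_seen = {normalize(d) for d in checked_domains}
    reported = {normalize(d) for d in google_domains}
    return sorted(reported - already_seen)


# Made-up sample data.
from_google = ["website.com", "www.another.org", "third.net"]
from_link_list = ["Website.com", "third.net"]

print(domains_not_yet_checked(from_google, from_link_list))
```

Whatever comes out of this still needs the header check and manual review from Steps Two and Three.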

Step Five: Clean Up & Take Stock

When you finish this process, you should have four main lists:

  1. Link Removed: The link was either removed from the site and the site is still around, or the page the link was on was removed, but the domain is still working. If you’ve contacted someone and successfully had them remove a link, you should list it here.
  2. Domain Removed: The domain either doesn’t exist anymore or it has nothing on it.
  3. No Follow: The link or the page the link was on has been nofollowed, or there is a 302 redirect between the link and your website.
  4. Disavow: These are the links you were not able to remove, but don’t want counting against you.

There are many services out there that will help you get links removed, which is what Google says they want you to do. But in most cases, reaching out to webmasters and asking them to remove the link is a fool’s errand. It’s extremely time-consuming and often unsuccessful. Most sites where a webmaster would actually respond to you are ones where you want to keep your link, perhaps asking only that they add a nofollow.

If you know of directory submissions you can remove or paid links you can stop paying for or add nofollows to, you should absolutely do that. But, most webmasters don’t have that option. In addition, most of those services make you pay by the link, so going through this effort first will save you money based on the number of links that need to be checked.

Next time, I’ll show you how to properly format a reconsideration request and a disavow report. There are bound to be more-qualified link experts who take a different approach, but this is how a self-proclaimed techie approaches the problem. Best of luck, and leave your ideas and feedback in the comments!