In the Local SEO biz, we spend a lot of time fixing duplicate business listings. Duplicate records of your business scattered throughout the Local Search ecosystem can cause a variety of problems: they can mess with your Google Local rankings, cost you business, and may even make you feel like you’re experiencing a zombie apocalypse.
The problem is that no matter what we do, dupes often come back from the dead to haunt us. Why is that?
TL;DR Warning: Time to eat your local data vegetables.
Your typical big local search publisher (like YP.com) aggregates business listings data from a variety of sources — often as many as 50!
These sources are varied, including data aggregators like InfoGroup and Neustar Localeze, government entities like the IRS, web crawling data, user-submitted info and many more.
Each source has its own data hygiene issues. InfoGroup may have three separate listings for a business with different phone numbers and addresses. The IRS might have two slightly different business names for the same business. Publishers use a Matching algorithm to try to merge this data, but across a few billion records, inevitably Skynet Local misses a lot of these issues, which often leads to an SEO extinction event.
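To see how a matcher can miss these, here is a minimal sketch in Python. It is not any publisher's actual algorithm; the match key (normalized name plus phone digits), the source names and the sample records are all made up for illustration.

```python
import re
from collections import defaultdict

def match_key(record):
    """Build a naive match key from the normalized name and the phone digits."""
    name = re.sub(r"[^a-z0-9]", "", record["name"].lower())
    phone = re.sub(r"\D", "", record["phone"])
    return (name, phone)

def group_records(records):
    """Group records that share a match key; each group becomes one listing."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())

# Hypothetical records for the same real-world business from three feeds.
records = [
    {"source": "aggregator_a", "name": "Joe's Pizza", "phone": "(555) 010-1234"},
    {"source": "aggregator_b", "name": "Joes Pizza", "phone": "555-010-1234"},
    {"source": "gov_feed", "name": "Joe's Pizza LLC", "phone": "555.010.1234"},
]

listings = group_records(records)
print(len(listings))  # 2 listings for 1 business: the "LLC" variant did not match
```

Normalization is enough to merge the first two records, but the slightly different name in the third slips through and becomes a second listing. Multiply that by a few billion records and you get the picture.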
To compensate for problems with the matching algorithm, publishers use a process called Conflation, where different data elements are ranked by source.
For example, a user-submitted phone number might not be as trusted as a phone number sourced from the IRS.
The conflation process leads to a business record where each piece of data may be from a different source (e.g., the business name is from the IRS, the address from InfoGroup, the phone number from web crawling, etc.).
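Here is a rough sketch of what field-level conflation could look like. The trust rankings in SOURCE_RANK are invented for this example (real publishers do not publish theirs), as are the function name and the sample data.

```python
# Hypothetical trust ranking per field: lower number = more trusted.
SOURCE_RANK = {
    "name": {"irs": 0, "infogroup": 1, "web_crawl": 2, "user": 3},
    "address": {"infogroup": 0, "irs": 1, "web_crawl": 2, "user": 3},
    "phone": {"web_crawl": 0, "irs": 1, "infogroup": 2, "user": 3},
}

def conflate(matched_records):
    """For each field, take the value from the most trusted source that has one."""
    result = {}
    for field, ranks in SOURCE_RANK.items():
        candidates = [
            r for r in matched_records
            if r.get(field) and r["source"] in ranks
        ]
        if not candidates:
            continue
        best = min(candidates, key=lambda r: ranks[r["source"]])
        result[field] = {"value": best[field], "source": best["source"]}
    return result

# Made-up records that the matcher decided are the same business.
matched = [
    {"source": "irs", "name": "Joe's Pizza LLC", "phone": "555-010-0000"},
    {"source": "infogroup", "name": "Joes Pizza", "address": "12 Main St",
     "phone": "555-010-1111"},
    {"source": "web_crawl", "phone": "555-010-1234"},
    {"source": "user", "address": "14 Main St", "phone": "555-010-9999"},
]

record = conflate(matched)
for field, chosen in record.items():
    print(field, "<-", chosen["source"], ":", chosen["value"])
```

Note that the user-submitted address and phone number lose out to higher-ranked sources, even if the user happens to be the only one who is right.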
This record then gets published in what’s called The View, the temporary set of data that appears on a publisher’s live site. The key word here is “temporary.” The next time the publisher reruns the process, all of the elements could change, including that updated address you just submitted.
The way most SEOs solve this problem is to manually squash the dupes at both the publishers and the data aggregators via claiming, merging and deletion tools (if they are even available).
Or they may try reporting the issue via a contact form and waiting for a team in the Philippines to deal with it when they take a break from comment spamming, er, link building. Many publishers punt on the issue and send the request downstream to the data aggregators to deal with it, which they may or may not do.
The problem with this approach is that you have to guess every source a publisher ingests. Miss just one, and the next time the publisher runs its matching process, the dupe gets re-created and starts appearing in Google again like a zombie looking for dinner.
And even if you guess all of the data sources for a particular publisher, if just one of those sources itself picks up a new data source with dupes, those dupes will flow back up to the publisher, and you could end up right back where you started.
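The zombie effect is easy to reproduce in a toy model. The sketch below assumes a publisher that rebuilds its view from scratch out of every source on each run; the source names, the records and the build_view function are all hypothetical.

```python
def build_view(sources):
    """Rebuild the live view from scratch: one listing per distinct (name, phone)."""
    view = set()
    for records in sources.values():
        for listing in records:
            view.add(listing)
    return view

GOOD = ("Joe's Pizza", "555-010-1234")
DUPE = ("Joe's Pizza", "555-010-9999")

# Made-up upstream feeds; two of them carry the dupe.
sources = {
    "aggregator_a": [GOOD, DUPE],
    "aggregator_b": [GOOD, DUPE],
    "web_crawl": [GOOD],
}

# Fix 1: squash the dupe at the publisher only.
view = build_view(sources)
view.discard(DUPE)
gone_for_now = DUPE not in view            # True, until the next run
view = build_view(sources)
back_after_rerun = DUPE in view            # True: the sources still have it

# Fix 2: clean up one source but miss another.
sources["aggregator_a"].remove(DUPE)
survives_partial_cleanup = DUPE in build_view(sources)   # True

# Fix 3: clean up every source.
sources["aggregator_b"].remove(DUPE)
dead_after_full_cleanup = DUPE not in build_view(sources)  # True

# ...until one source ingests a new feed of its own that contains the dupe.
sources["aggregator_b"].append(DUPE)
risen_again = DUPE in build_view(sources)  # True

print(gone_for_now, back_after_rerun, survives_partial_cleanup,
      dead_after_full_cleanup, risen_again)
```

In this model the dupe only stays dead while every upstream source is clean, which is exactly the state you cannot verify from the outside.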
So now you know why that dupe you just pick-axed in the skull keeps showing up and trying to rip your intestines out.
Fight the dead. Fear the living.