Many people are more afraid of duplicate content than they are of spammy links.
There are so many myths around duplicate content that people actually think it causes a penalty and that their pages will compete against each other and hurt their website. I see forum posts, Reddit threads, technical audits, tools, and even articles on SEO news websites that show people clearly don’t understand how Google treats duplicate content.
Google tried to kill off the myths around duplicate content years ago. Susan Moskwa posted on the Google Webmaster Central blog in 2008:
Let’s put this to bed once and for all, folks: There’s no such thing as a “duplicate content penalty.” At least, not in the way most people mean when they say that.
You can help your fellow webmasters by not perpetuating the myth of duplicate content penalties!
Sorry we failed you, Susan.
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.
People mistake Google’s handling of duplicate content for a penalty. Really, the duplicates are just being filtered out of the search results. You can see this for yourself by adding &filter=0 to the end of the search results URL, which removes the filtering.
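If you would rather not edit the URL by hand, here is a minimal Python sketch of the same trick. The helper name with_filter_disabled is made up for illustration, and the only thing taken from the text above is the filter=0 parameter itself; it simply rewrites the query string of whatever search URL you pass in.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_filter_disabled(search_url: str) -> str:
    """Return search_url with filter=0 set in its query string.

    Hypothetical helper for illustration; it only rewrites the URL
    and does not send any request.
    """
    parts = urlparse(search_url)
    # Keep every existing parameter except a previous "filter" value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    # Append filter=0, the parameter described in the article.
    params.append(("filter", "0"))
    return urlunparse(parts._replace(query=urlencode(params)))


print(with_filter_disabled("https://www.google.com/search?q=raleigh+seo+meetup"))
# prints https://www.google.com/search?q=raleigh+seo+meetup&filter=0
```

Paste the resulting URL into a browser and compare it with the normal results for the same query.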
Adding &filter=0 to the end of the search URL for “raleigh seo meetup” shows me the exact same page twice. I’m not saying Meetup has done a good job with this, since their canonical tags indicate that both versions (HTTP and HTTPS in this case) are correct. But I think it does show that the exact same page (or similar pages) can both be indexed, with only the most relevant one shown by default. It’s not that the pages are necessarily competing with each other or doing any harm to the website itself.
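To check for the kind of conflicting canonical tags described above on your own pages, a rough sketch like the following can help. It uses only Python’s standard-library HTML parser; the class and function names and the example.com URLs are placeholders I made up, and a real audit would fetch the live HTTP and HTTPS versions rather than use inline strings.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens; values may be None.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


def same_canonical(html_a: str, html_b: str) -> bool:
    """True only if both pages declare exactly one, identical canonical URL."""
    a, b = find_canonicals(html_a), find_canonicals(html_b)
    return len(a) == 1 and a == b


# Placeholder markup standing in for the HTTP and HTTPS versions of one page.
http_page = '<head><link rel="canonical" href="http://example.com/events/"></head>'
https_page = '<head><link rel="canonical" href="https://example.com/events/"></head>'

print(same_canonical(http_page, https_page))   # prints False: each version claims itself
print(same_canonical(https_page, https_page))  # prints True: one agreed canonical
```

A False result means the two versions each claim to be the canonical one, which leaves the choice to Google.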
According to Matt Cutts, 25 to 30 percent of the web is duplicate content. A recent study by Raven Tools based on data from their site auditor tool found a similar result, in that 29 percent of pages had duplicate content.
Many great posts have been published by Googlers. I’m going to give you a summary of the best parts, but I recommend reading over the posts as well.
Deftly dealing with duplicate content
Duplicate content due to scrapers
Google, duplicate content caused by URL parameters, and you
Duplicate content summit at SMX Advanced
Learn the impact of duplicate URLs
Duplicate content (Search Console Help)
The solution will depend on the particular situation:
There are some things that could actually cause problems, such as scraping/spam, but for the most part, problems are caused by the websites themselves. Don’t disallow duplicate pages in robots.txt, don’t nofollow them, don’t noindex them, and don’t point canonical tags from pages targeting longer-tail queries to overview-type pages. Do use the signals mentioned above for your particular issues to indicate how you want the content to be treated. Check out Google’s help section on duplicate content.
Myths about duplicate content penalties need to die. Audits and tools need to be corrected and misunderstandings cleared up, or this myth might be around for another 10 years. There are plenty of ways to consolidate signals across multiple pages, and even if you don’t use them, Google will try to consolidate the signals for you.