Can Robots.txt Be Used to Remove Duplicate Content?

  • Thread starter: CraigHardy
  • Start date: May 11, 2021
Do you have a duplicate content problem on your website that you suspect has caused a drop in rankings? Have you been searching the internet to learn more about this problem? Have you been reading things like, "There's no such thing as a 'Duplicate Content Penalty'"? From what I've learned and experienced, while there may be no such thing as a duplicate content penalty, having a substantial number of duplicate pages on your website sure can reduce its rankings.

I remember one instance in 2008 where I had to change the URLs of an entire site. All the page content stayed the same, but the URLs changed. A day or two after I put the change in place, with 301 redirects all set up, the rankings for the site plummeted to nearly nothing. That lasted for approximately two weeks, until they rebounded. The problem was that right after I made the change, Google crawled all the new URLs before it recrawled the old ones and found the 301 redirects. Google thought I had two versions of the website, with two URLs for every page. It didn't like that and dropped the rankings for the entire site. I'm talking from around 2,000 page views per day to 10. It was drastic.

Multiple URLs for the same content can be horrifically damaging to a website's rankings. We all know this. So really, the question is, what can we do about it?

The answer to this question really depends on your situation. Some folks suggest we use 301 redirects. Some suggest we use the canonical tag. Some suggest removing the duplicate content outright. I suggest using the robots.txt file to block the duplication, if feasible. Why do I give this suggestion? Let me tell you.

For the past two decades, I've tried multiple cures for the duplicate content illness and the only thing I've found to work is a good ol' hard block in the robots.txt file. So if you have the offending content in one directory and it's competing with content from another directory, go ahead and add one of the directories to this file to block it. I used to have a website that had product pages and then ancillary pages that contained the same information, but with larger product images. This was classic duplicate content. To deal with this, I added a canonical link element to each image page that pointed back to the original product page. Did the canonical element make any difference? No, it did not. After months of having these tags in place, none of the duplicate pages merged into the primary pages. I even checked many times inside Google Search Console to see which pages Google had chosen as the canonicals. It hadn't picked any of mine. Google didn't see the pages as duplicates and decided to keep them separate, which made the website's traffic suffer. After this, I added the noindex meta tag to the duplicate pages, which made matters worse. I'll never use that tag again. After decades of experimenting with it, I've never experienced a positive result.
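To make the directory block concrete, here's a rough sketch of what I mean, plus a quick way to sanity-check the rule before you upload it. The directory names (/large-images/ for the duplicates, /products/ for the originals) and the example.com URLs are made up for illustration, so swap in your own. The check uses Python's standard urllib.robotparser module, which follows the basic robots.txt rules but isn't guaranteed to match Google's own parser in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /large-images/ holds the duplicate pages,
# /products/ holds the originals that should stay crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /large-images/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up URLs: one original page and its larger-image duplicate.
    for url in (
        "https://example.com/products/widget.html",
        "https://example.com/large-images/widget.html",
    ):
        status = "crawlable" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(f"{url}: {status}")
```

The only part that goes in the actual robots.txt file is the two lines inside ROBOTS_TXT. The rest is just there so you can confirm the duplicate directory is blocked and the original pages aren't caught by accident.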

Some people say to give the canonical link element time to resolve. Yeah right. If Google doesn't see the page as a duplicate, even if there's duplicate content on it, it will simply ignore the canonical reference. The noindex tag works at hiding the page, but it does nothing to fix the ranking problem. The only thing I've ever done that fixed any type of ranking problem caused by on-site duplicate content has been to block the duplicate pages in the robots.txt file. It'll take some time for Google to pull the offending pages out of the index, but it will, and this does work. I've seen it happen many times. I know it goes against all the advice you're reading out there, but I can only speak from my own experiences.

And don't even get me going on 301 redirects. I hate those things.