What is it?
Duplicate content occurs when the same chunk of text – typically an article, page content or product description – can be found in more than one location on your website, or on multiple websites.
Specifically, duplicate content is particularly harmful for SEO when the same content is available at more than one Uniform Resource Locator (URL).
This is an important distinction: it covers not just content that occurs on different websites, but also content that appears on more than one page of the same website!
Does it matter?
In general, duplicate content is a bad deal for everyone – your website is likely to suffer in the rankings and the search engines end up providing poorer quality results. It is widely regarded in the SEO industry as Very Bad News.
From an SEO perspective, there are two main issues with duplicate content.
- The search engine must make a decision about which instance of the content to return as the best or original result.
- The ‘esteem’ (shorthand for all the metrics used to rank the page) that the search engine has attributed to that content will be divided amongst the various versions of the page.
Forms it can take
Duplicate content can arise for all kinds of reasons, and it can sometimes be hard to spot.
- You may simply have the same content on two pages, at someone’s insistence. This is a very overt example of duplicate content.
- You might have a couple of very similar products that use the manufacturer’s product description (which is also present on the manufacturer’s site).
- You might have something systemic, such as a “printer friendly version” option available on every page, which leads to a duplicate of the page at a different URL.
- You might refer to your contact page in your navigation menu with a trailing slash, but in your page content you’ve used the URL without a trailing slash. (Similar issues can occur with capitalisation of URLs.)
- You might have configured your site so that http and https versions of the page resolve. (Similar issues can occur with www. and non-www addresses.)
All of these would constitute a duplicate content issue.
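The URL-based cases above (trailing slashes, capitalisation, http/https and www/non-www) can be spotted with a few lines of code. What follows is a minimal sketch rather than a definitive tool: the function names `normalise` and `group_variants` and the example.com addresses are made up for illustration, and it assumes your server treats paths as case-insensitive, which is not true of every setup.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalise(url: str) -> str:
    """Collapse common URL variants into one key, for comparison only."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same site
    # Assumption: paths are case-insensitive and a trailing slash is optional
    path = parts.path.lower().rstrip("/") or "/"
    # Force one scheme so http and https compare equal; drop the fragment
    return urlunsplit(("https", host, path, parts.query, ""))


def group_variants(urls):
    """Return only those groups where several URLs collapse to the same key."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    crawled = [
        "http://www.example.com/contact/",
        "https://example.com/contact",
        "https://example.com/Contact",
        "https://example.com/about",
    ]
    for key, variants in group_variants(crawled).items():
        print(key, "<-", variants)
```

Any group this returns is a set of addresses that probably serve the same page, so each one is worth checking by hand before you decide how to fix it.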
What’s the solution?
There are lots of different solutions for the different incarnations of this issue, but these are the main ones.
- Remove the duplicate pages (don’t forget to redirect the old URLs to the ‘real’ page)
- In the case of systemic duplicate content issues, such as trailing slashes, https/http and www/non-www, use sitewide redirects to force one particular format of URLs
- Rewrite the offending content
- Use a rel=canonical tag on all offending pages to explicitly canonicalise one page as the ‘official’ page
- Use a noindex tag to keep the duplicate page out of Google’s index (not ideal if the page has some history, as you are not consolidating any of the ‘esteem’ we alluded to earlier)
- Use Screaming Frog to pick through your site’s URLs and spot duplicate pages.
This last bullet is one of many tests you can run on your website, using free tools. Download our Monthly Website Checklist to get started...
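To make the sitewide redirect and rel=canonical options above a little more concrete, here is a rough sketch of the logic involved. It is not tied to any particular server or CMS: the names `preferred_url`, `redirect_for` and `canonical_tag` are hypothetical, as is the choice of https, the www.example.com host and the no-trailing-slash format. It also assumes the whole site lives on a single host.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: one site, https preferred, www preferred,
# and no trailing slash on paths.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"


def preferred_url(url: str) -> str:
    """Rewrite a URL into the single format the site has chosen to enforce."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


def redirect_for(url: str):
    """Return (301, target) if the URL is not in the preferred format, else None."""
    target = preferred_url(url)
    return None if target == url else (301, target)


def canonical_tag(url: str) -> str:
    """Build a rel=canonical link element pointing at the preferred URL."""
    href = escape(preferred_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(redirect_for("http://example.com/contact/"))
    print(redirect_for("https://www.example.com/contact"))
    print(canonical_tag("http://example.com/contact/"))
```

A permanent (301) redirect suits the systemic cases, because the duplicate address stops resolving altogether. The canonical tag is for pages that need to stay reachable, such as a printer-friendly version, although in that case you would map the duplicate to its ‘official’ page yourself rather than rely on the URL format alone.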