It’s easy to be fooled into thinking SEO is just about link building or ranking first for specific keywords.
While those are important factors, and staying up on best practices is essential, resolving duplicate content issues should be your top priority.
Often, the hidden cause of lost rankings and decreasing traffic isn’t that someone else is better at link building or keyword optimization.
Instead, the problem lies in finding and fixing issues on our own sites that prevent searchers from finding us.
And when it comes to duplicate content, the devil is in the details: finding it and fixing the problems it causes. Luckily, you have control over your website, so you have the power to fix it. That’s precisely what I’m covering today.
Duplicate content simply refers to identical chunks of content on different web pages. If it’s a sentence or a phrase, it’s not usually an issue. After all, there are only so many ways to say, “Contact us about our services.”
What’s more, if you frequently write about similar topics, you most likely have some common phrases.
For instance, on my blog, I talk extensively about brand messaging, my framework for copywriting and content creation, and marketing strategy.
And if you read several articles from me — whether on my blog or a guest post — you’ll find several places where I may repeat my explanations of my approach.
That kind of repetition isn’t what causes problems, unless you use deceptive tactics to move your site up in the search rankings, which can raise a red flag under Google’s spam policies.
However, when you have entire articles, pages, or sections repeated word for word, or when multiple page versions are indexed, it can be challenging for Google and other search engines to know which articles to prioritize.
And because search engines rarely show duplicate pieces of content, they choose the best version for each search, which may differ from the page you most want to drive traffic to.
It’s important to note that people think of duplicate content in two ways — internal and external.
In this article, I’m primarily focusing on internal duplicate content, not content plagiarized elsewhere on the internet, which is a growing concern with the rise of AI.
However, in the tools section, I’ll also show you how to find and resolve duplicate content issues from across the web.
In my experience, one of the biggest problems with duplicate content lies in even knowing that it exists on your site in the first place. When we create articles or products for our sites, it’s easy to hit publish and move on.
Sure, I might come back and update the article or make an edit or two, but by and large, with everything else going on, it’s not always top of mind to regularly check for duplicate content. But it should be.
It’s a case of “you don’t know what you don’t know,” and as I always say, it’s what you don’t know that causes you the most significant problems. If you’re unaware of a problem, it’s impossible to fix it.
As you start checking for duplicate content regularly, it’s essential to understand what causes it in the first place and how to fix it.
According to Beth Bovie from Revelo, “Duplicate content errors may come in the form of duplicate title tags and meta descriptions, as well as content within an article. After auditing one client’s site, I found they had over 800 duplicate content errors.”
With that in mind, let’s dive into some of the biggest offenders for causing duplicate content.
URLs often contain additional parameters, either because of how visits are tracked (marketing campaign IDs, analytics IDs) or because a website’s CMS adds its own custom parameters.
I see this a lot when email marketing software or social media platforms append tracking to links as people click through to a site.
For example, the following URLs could all lead to the same page:
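As a hypothetical illustration (the example.com URLs below are placeholders), all of these addresses could resolve to the same underlying page:

```
https://www.example.com/blog/post
https://www.example.com/blog/post?utm_source=newsletter&utm_campaign=spring
https://www.example.com/blog/post?fbclid=AbC123xyz
```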
Often, a web page will have an option to produce a printer-friendly version of that page. I often see these links leading to duplicate content issues on websites I visit.
For example, the following URLs would lead to the same page:
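The URLs below are hypothetical placeholders, but the pattern is typical of print-friendly duplicates:

```
https://www.example.com/services
https://www.example.com/services?print=true
https://www.example.com/print/services
```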
Sites may often track a user’s session across their website so they can tailor content. While this happens across many different industries, I see this frequently in e-commerce sites like Amazon.
In the example below, Amazon personalizes content to remind me of my recent searches.
And just as Amazon stores my cart until I buy, remove, or save items for later, other e-commerce stores remember what I added to my shopping cart the last time I visited.
When this happens, the site often appends session IDs to the URL, which causes duplicate versions of a page to exist. And although Amazon likely has solutions for this issue, smaller e-commerce businesses may not.
The example below illustrates what URLs leading to the same page might look like:
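Here’s a hypothetical sketch of what session-ID duplicates can look like (the parameter names vary by platform):

```
https://www.example.com/shop
https://www.example.com/shop?sessionid=12345abc
https://www.example.com/shop?sid=98765xyz
```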
4. Repetitive Product Descriptions
E-commerce sites, in particular, can have a lot of repetitive product descriptions, whether due to having similar products or products that live in multiple categories.
The example below illustrates something I’ve seen on many retail sites, where a specific product is often found in multiple groupings.
While the product page and its content remain the same, you find it with three different URLs—one featuring a collection, one featuring best sellers, and another featuring items on clearance.
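As a hypothetical illustration, those three URLs might look something like this:

```
https://www.example.com/collections/summer/blue-t-shirt
https://www.example.com/collections/best-sellers/blue-t-shirt
https://www.example.com/collections/clearance/blue-t-shirt
```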
Jarik Oosting of SmartRanking sees this often, agreeing that this type of duplicate content “jeopardizes SEO rankings and creates confusion among users.”
While the most obvious consequence of duplicate content is decreased searchability, the same content can have other repercussions, too.
If a search engine doesn’t know what page to index, it might index all or none of the pages, creating problems for your searchability.
When search engines encounter multiple versions of a page, they might struggle to assign link authority accurately, which can decrease the authority of your content and entire site.
When search engines don’t know which version of a page to rank for a relevant search query, not just one but all of your relevant pages might appear further down in the results.
Additionally, if users go to the wrong link, they may not get the exact information they’re looking for.
When search engines are confused by duplicate content, they may display older — possibly outdated — content to your audience, which can decrease consumer trust in your brand.
Of all the SEO experts I connected with, I found Anatolii Ulitovskyi of Unmiss had the best summary of these problems.
Ulitovskyi explained that duplicate content ultimately “confuses search engines and dilutes the authority of your website, leading to lower rankings, decreased organic traffic, and a negative impact on user experience.”
When I connected with Kyle Roof, founder of High Voltage SEO, he said, “I’ve found that a combination of automated tools and manual checks works best to identify and rectify duplicate content.
Being familiar with your CMS can also offer insights into potential sources of duplication.”
However, if you’re wondering how to check for duplicate content, there are several tools you can use for your site. Here are three of the best out there.
Siteliner scans entire websites to identify duplicate content, broken links, and other issues.
Using it is almost a no-brainer. It’s as easy as typing in your site URL, waiting a few minutes, and then getting a comprehensive report that highlights specific areas of your site to fix.
And I find it very affordable. The freemium version gives you up to 250 pages free, and additional credits are $0.01 each.
What we like: “Siteliner is my go-to tool. It’s fast, offers a comparative analysis, and pinpoints the exact duplicated content segments,” says Ajay Porwal of DroidOwl.
In addition to affordability, I’ve found it incredibly easy to use. I love that Siteliner provides detailed information on duplicate content. It calls out your top issues and shows you how to fix them.
Best for: It’s great for beginners and pros alike — and it’s so easy to identify areas on your sites to fix.
Pro tip: After identifying your duplicate content and determining which pieces to fix first, take a look at the other areas of your site. The better your site experience, the better your site will perform in search results.
While not specifically a duplicate content tool, Google Search Console is relatively easy to set up and gives you great insights into the health and performance of your site.
By clicking on “Search Results” under “Performance,” you can see the most visited and clicked-on URLs. As you look through these, you can keep an eye out for any pages that have duplicate versions by watching for things like:
HTTP vs. HTTPS
WWW vs. no WWW
A trailing slash vs. none
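If you export your URLs from Search Console, a small script can help you spot these variants. Here’s a minimal Python sketch; the tracking-parameter list and the canonicalization choices (preferring HTTPS and the non-WWW host, dropping trailing slashes) are assumptions you’d adapt to your own site:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Common tracking parameters that create duplicate URL variants (adjust for your site)
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid", "sessionid"}

def normalize(url: str) -> str:
    """Collapse common duplicate-URL variants into one canonical form."""
    scheme, netloc, path, query, _ = urlsplit(url)
    scheme = "https"                      # prefer HTTPS over HTTP
    netloc = netloc.lower()
    if netloc.startswith("www."):         # prefer the non-WWW host
        netloc = netloc[4:]
    path = path.rstrip("/") or "/"        # drop any trailing slash
    params = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(params), ""))
```

Any two URLs that normalize to the same string are candidates for a canonical tag or a 301 redirect (both covered below).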
Additionally, Search Console gives you valuable data on site indexability, including if there are any reasons Google isn’t able to index pages.
What we like: I find it’s usable for people with even the most minimal tech understanding, making it one of my favorite tools for improving website searchability.
I also particularly love the ability to see which search terms people use most to find your site so you can prioritize updating the posts that speak directly to them.
Best for: Google Search Console is great for people who likely have minimal duplicate content or don’t publish a ton of new content on a regular basis. It’s also fantastic for people who are tiptoeing into the world of duplicate content.
Pro tip: Watch your email every month for Google Search Console updates on your site, and use this as a reminder to prioritize a few new site improvements.
You can download the Screaming Frog web crawler and use it to crawl 500 pages for free. This application lets you do a lot of different things, including finding duplicate content problems.
Many of the SEO experts I connected with, including Janis Thies of SEOlutions, recommend Screaming Frog. Thies says, “It’s by far the best tool for a complete crawl and overview of your technical data.”
What we like: Screaming Frog is incredibly comprehensive. Here are some of the ways it works.
You can find duplicate page titles by simply clicking on the tab “Page Titles” or “Meta Description” and filtering for “Duplicate.”
You can also find pages that have multiple URL versions by simply clicking on the “URL” tab and sorting by “Duplicate.”
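Under the hood, that kind of check boils down to grouping pages by title. Here’s a rough Python sketch of the idea (not Screaming Frog’s actual implementation), assuming you’ve already exported a list of (URL, title) pairs from a crawl:

```python
from collections import defaultdict

def find_duplicate_titles(pages):
    """Group crawled pages by their <title> text and return
    any title shared by two or more URLs."""
    by_title = defaultdict(list)
    for url, title in pages:
        by_title[title.strip().lower()].append(url)  # light normalization
    return {t: urls for t, urls in by_title.items() if len(urls) > 1}
```

For example, pages at `/a` and `/b` that both use the title “Home” would be reported together as one duplicate group.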
Best for: This is probably one of the best solutions out there. It’s ideal for people with a little more technical know-how who know what to do with the duplicate content they find.
Pro tip: For a complete guide on all the different things you can do with Screaming Frog, check out this post from SeerInteractive.
Earlier, we touched on the difference between internal and external duplicate content. With that in mind, here are our favorite tools for checking for duplicate content outside of your site.
Grammarly is known for helping people write clearly and concisely.
Many people aren’t aware that its Business plan (currently $12/month with an annual plan) also features a plagiarism checker to confirm that your content doesn’t appear elsewhere on the web.
Simply click the option at the bottom of the right-hand toolbar (if you’re using the web version).
What we love: If you’re already using Grammarly, it’s an easy way to make sure that content doesn’t appear elsewhere on the internet.
Best for: Grammarly’s plagiarism checker is great for content that you’re about to publish, but you can also go back to previously published content, paste it into the tracker, and get a sense of any outside duplications.
Pro tip: Their plagiarism detector sometimes flags things that aren’t actually duplicated content. So take it with a grain of salt and look for big-picture items.
In the below example, the checker flagged one phrase in my article on brand messaging. The phrase? “Instead of forcing a square peg into a round hole.”
Not only is that a common phrase, but the site it referenced was about a recently passed bill in Illinois.
All in all, it’s good news. I know that the press release referenced is in no way duplicating my content (or vice versa).
CopyScape is my favorite external duplicate content checker. It’s as easy as dropping your URLs into the search box and finding out if and where any duplicate content exists.
But what it can do goes even deeper.
CopyScape Premium allows you to upload or paste entire articles to find out where else your content may have been shared (and indexed) for roughly $0.01/100 words.
Plus, they have a tool—CopySentry—that will check specific pages every week or day, depending on the options you choose.
What we love: CopyScape is easy to use, inexpensive, and up-to-date. It’s less likely to pull random phrases and more likely to identify social media shares.
Best for: I find that CopyScape is one of the best tools out there for ensuring that no one has lifted your content. The most technical knowledge you need is copy and paste to start checking your content.
Pro tip: Run anything you’re about to publish through CopyScape to ensure that you haven’t accidentally lifted a phrase in your research or excitement about a source!
By now, I’ve shown you how duplicate content can impact your organic traffic and web rankings. But, as James Maxfield of Dark Horse explains, most “duplicate content work is just housekeeping, tidying up your site to improve how Google crawls and indexes it.”
With that in mind, now it’s time to show you that it’s also something that you can easily fix. Here are four ways you can start “tidying” things up.
Using the canonical tag, you can tell search engines what version of a page you want to return for relevant search queries. The canonical tag is found in the header of a web page.
The canonical tag is the best approach when you want to have multiple versions of a page available to users. If you’re using the HubSpot COS, this will be taken care of automatically, so no manual labor is required.
If you’re not using HubSpot, you’ll need to go into the <head> section of the primary page and add <link rel="canonical" href="https://www.example.com/main-url-goes-here" />.
Even though the URL in the browser bar might read https://www.example.com/main-url-goes-here?source=email-TN1.0-S-sendtosite, the canonical tag ensures that the primary page gets the authority.
A 301 redirect will redirect all legacy pages to a new URL. It tells Google to pass all the link authority from these pages to the new URL and to rank that URL for relevant search queries.
The 301 redirect is the best option when you don’t have any need for multiple versions of a page to be available.
If you’re using WordPress, there are several plugins that can help you set up redirects.
My favorite is simply called 301 Redirects.
Setting up a redirect with this plugin is as easy as typing in the URL you want to redirect away from and the one you want people to go to instead.
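If you’d rather not use a plugin and your site runs on Apache (as many WordPress installs do), you can add the redirect directly to your .htaccess file; the paths below are placeholders:

```
Redirect 301 /old-page/ https://www.example.com/new-page/
```

This behaves the same way as the plugin: visitors and search engines landing on /old-page/ are sent permanently to the new URL, along with the old page’s link authority.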
You can use meta tags to tell search engines not to index a particular page.
<meta name="robots" content="noindex, nofollow">
Meta tags work best when you want that page to be available to the user but not indexed, e.g., terms and conditions.
As a HubSpot user, it’s easy to add noindex tags. Here’s a quick overview.
If the page with duplicate content is causing you massive headaches and you can’t resolve it easily or quickly enough using the other methods, you can use Google Search Console to request that Google remove content from its search.
Go into Search Console > Indexing > Removals.
From there, click New Request and choose “Temporarily Remove URL” if you want something removed for around six months.
Alternatively, if you’re changing the content on a page and want to be sure that the current snippet is cleared until the next crawl, choose the “Clear Cached URL” option.
Generally speaking, most people and businesses won’t need this.
That said, there may be times when you need to make sure content isn’t appearing in search any longer, and I find that people gain peace of mind knowing that there’s an option in their back pockets.
When resolving my duplicate content issues, I’ve found that the best offense is a good defense.
George Bates of Limelight Digital agrees, saying, “We’ve found that by adopting a holistic approach that combines automated tools with manual audits, we can more effectively locate and resolve duplicate content issues.”
How often should you check in on your website health, including duplicate content? Every SEO and web expert will likely give you a different answer.
My response is that it depends — where your site is hosted, how much content you have, and how frequently you publish new content.
At a minimum, I’d recommend reviewing any automated reports at least once a month and then doing a more detailed analysis periodically.
That can include the manual checks and tools covered above.
And, if you find that other sites are regularly lifting your content, it’s possible to disable the “Copy text” function when people right-click on your site, which can make it significantly more difficult for them to plagiarize your content.
The bottom line is that duplicate content is a real problem for sites, but one that can be easily solved using the advice above.
If you want to learn more about duplicate content, watch this video series from the SEO experts at Dejan SEO on how you can fix it for your site.
And if you’re looking for more SEO tips, check out this article from HubSpot’s own SEO experts.