By Michael P. Ehline, Esq. – As an attorney, you should already have a basic understanding of copyright law. As a member of the Circle of Legal Trust, you probably also know how duplicate content on low-quality sites can dilute your brand and hurt your rankings. But I wanted to write a treatise on this issue for the lawyers who don't quite get it, so here goes. Search engine optimization matters, and this tip can make a huge difference in rankings, because search engine guidelines neither permit nor reward duplicate content.
Google and other search engines do not want to index duplicate pages on your website. Doing so wastes room in their databases and uses up bandwidth, which slows down their crawling and indexing. In the end this benefits searchers, since otherwise the same duplicate content would appear on the search results page over and over.
Search Engine Concerns about Duplicate Content
There are many reasons search engines, including Google, care about duplicate content. Removing these pages lets a search engine focus on indexing the better pages on a website. Spammers create millions of useless web pages that fill up search engine databases, and the search engine with quality results is the one you will want to use.
Google and Identifying Duplicate Content
Is there a difference between what you believe is duplicate content and what Google and other search engines consider duplicate? Google uses a systematic method to determine what is duplicate and what is not. It starts by looking for key identifiers, which prompt it to look deeper into the pages that may be duplicates. Exactly how Google does this is unknown, but based on published documents, Google begins by finding similar URLs and similar or identical title tags. When it finds those, it uses other techniques to check and compare the pages. If you want to learn more about how Google finds and identifies duplicate content, there are a few resources you can use, including:
- The Stanford paper "Detecting Near-Duplicates for Web Crawling," which describes Google's process for detecting near-duplicate content.
- Google patents that discuss duplicate content; to find them, search Google for these titles:
- Detecting duplicate and near-duplicate files.
- Methods and apparatus for estimating similarity.
How to Find Duplicate Content on Your Web Site
There are several ways to identify duplicate content on your website. One is simply to search for it using Google or your favorite search engine, including Yahoo and MSN. In Google, try the "site:domain.com" command, replacing domain.com with your own domain name.
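For example, assuming a hypothetical example.com domain, queries like these can surface your indexed pages and exact-match copies:

```
site:example.com
site:example.com intitle:"your page title"
"an exact sentence copied from one of your pages"
```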
When looking at the search results, you should see unique title tags; any pages sharing the same title tag will show up in the results. Then look at the number of pages Google has indexed, along with the counts for Yahoo and MSN. The number of indexed pages will vary from search engine to search engine, and if there is a large difference, you will want to investigate why. As a rule, Google will show fewer indexed pages than the other search engines, since it tends to remove the duplicate pages it finds. You may be able to learn more from Google's Webmaster Tools; Yahoo offers a similar service for webmasters that can provide additional information.
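The title-tag check above can also be automated. Here is a minimal sketch that fetches a list of pages and groups any that share a title; the URL list is a placeholder you would fill with your own pages, and the regex-based title extraction is a simplification rather than a full HTML parser:

```python
# Sketch: flag pages that share a <title> tag, a common sign of
# duplicate content. URLs below are hypothetical placeholders.
import re
from collections import defaultdict
from urllib.request import urlopen

def extract_title(html):
    """Return the contents of the first <title> tag, or None."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

def find_duplicate_titles(urls):
    """Group URLs by title; any group with 2+ pages is suspect."""
    by_title = defaultdict(list)
    for url in urls:
        html = urlopen(url).read().decode("utf-8", errors="replace")
        title = extract_title(html)
        if title:
            by_title[title].append(url)
    return {t: pages for t, pages in by_title.items() if len(pages) > 1}
```

Any group this returns with two or more URLs is worth a manual look before you decide to redirect or noindex anything.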
Other Tactics You Can Use
There are some other things that you can do, and they include:
- Search for similar title tags, related meta description tags, and meta keyword tags. Google Webmaster Tools and Yahoo Site Explorer will both provide this information if you have verified your site with them.
- Mirror websites: first determine how many websites you own and whether each domain name redirects to the main website with a 301 Permanent Redirect. While not directly related to finding duplicate content, this exercise also gives you a list of all your domain names and a way to keep track of expiration dates.
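As a sketch of that redirect, here is what a 301 rule might look like in an Apache .htaccess file on the secondary domain, assuming mod_rewrite is enabled; old-domain.com and main-domain.com are placeholder names:

```apache
# .htaccess on the secondary domain: send every request to the
# main site with a 301 Permanent Redirect (placeholder domains).
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?old-domain\.com$ [NC]
RewriteRule ^(.*)$ https://www.main-domain.com/$1 [R=301,L]
```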
- Web pages that are "light on indexable content," with only a few paragraphs of unique text and not a lot more, might be considered duplicate pages.
An example is an eCommerce website with short product descriptions, which are characteristically treated as duplicate pages. This happens even when the products have different photos, colors, sizes, and other features, because search engines compare each web page as a whole against the others. The only way around this is to add more text to the product descriptions, hiring a writer if necessary.
- Duplicate pages commonly appear as "print" versions of an article or page, or as "email to a friend" pages. What can be done is to use the noindex meta tag and the robots.txt file to disallow indexing of one of the versions of the page.
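As a sketch, assuming the print versions live under a /print/ path (a placeholder), the noindex tag goes in the head of the print version:

```html
<!-- Hypothetical "print" version of a page: tell crawlers not to
     index this copy, but still follow its links. -->
<meta name="robots" content="noindex, follow">
```

and the robots.txt entry to keep crawlers away from that path would be:

```
User-agent: *
Disallow: /print/
```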
- Take care with blog archives, themes, and templates, especially WordPress themes, which are notorious for creating duplicate content on websites. One remedy is to remove the internal links in the WordPress theme; the robots.txt file and the noindex tag can also be used to keep the content out of the index. When there are no links to the archives, the search engine usually will not find those duplicate pages.
- Canonicalization can be fixed by choosing one version of your website address, either www.mydomain.com or mydomain.com, since not doing so can mean having duplicate versions of your website. Once this is decided, set up a 301 Permanent Redirect from one version of the website to the other.
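A minimal Apache sketch of that canonical redirect, assuming mod_rewrite and the placeholder name mydomain.com, with www chosen as the preferred version:

```apache
# .htaccess: redirect the bare domain to the www version so only
# one hostname gets indexed (mydomain.com is a placeholder).
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
RewriteRule ^(.*)$ https://www.mydomain.com/$1 [R=301,L]
```

The same pattern works in reverse if you prefer the bare domain.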
- When looking for multiple URLs with the same content, look for "session IDs" and remove them. The choice is between removing the session IDs and rewriting the URLs.
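Stripping a session parameter can be sketched like this; "sessionid" is a placeholder parameter name, since the actual name depends on your site's software:

```python
# Sketch: strip a session-ID query parameter from URLs so the same
# page does not appear under many different addresses.
# "sessionid" is a hypothetical placeholder parameter name.
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def strip_session_id(url, param="sessionid"):
    """Return url with the given query parameter removed."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k.lower() != param]
    return urlunparse(parts._replace(query=urlencode(query)))
```

In practice the same cleanup is often done with a server-side rewrite rule rather than in application code.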
What Happens When You Find a Domain Name Causing the Problem
If you find a domain name causing duplicate content, there are some things you need to do to correct it.
- Redirect the domain names causing duplicate content to the main domain name with a 301 Permanent Redirect.
- When the duplicate content is on your own website, disallow it using the robots.txt file. You can also delete the content or redirect it with a 301 redirect.
- When the content is acceptable (a print version, for example), use the robots.txt file and the noindex meta tag to tell search engine spiders to ignore the duplicate content.
- Determine whether it is harming your business and your search engine rankings. Search for the title tag, or for a sentence from the page in quotes, to see how your page appears in the search results. You should show up as the original source of the content; if you do not, you may want to have the copied content removed.
- When someone has copied your website's content, or even part of it, there are things you can do.
- Search for the content, or a sentence from the page, in quotes.
- When hiring a writer, check the content on Copyscape.com to ensure it is unique. You can also use Copyscape to check your own website and pages to see what turns up. There is also a free tool I discovered that helps in a similar way, at Dmca.com. Here is the link: http://www.dmca.com/takedowns.aspx (the tool is in the upper right).
- Set up alerts in Google and Yahoo for a phrase or sentence from your website. That way, if someone copies your content, you will be notified and can take immediate action.
Copied Content—What to Do
First determine whether the copied content is harming your business, or whether it comes from syndication of your blog, which would make the copying acceptable.
- Determine whether you are getting credit for the content: does the copy link back to your website? If it does, it should not be a problem. Also check whether the content was indexed on your site first.
- If you rank first for the title of the page, you are probably fine, and you can then decide whether it is worth the time to have the content removed from the other person's website.
- If you determine the content should be removed, first ask the website owner to remove it. If they refuse, ignore you, or are unpleasant about the situation, consider filing a DMCA request.
- Filing a DMCA request with their web host and with the search engines can get the content removed from their website and from the search results.
- There are DMCA templates you can use for the letter you will need to send, which can be helpful.
There are ways to protect your content, and they begin with prevention. The first step, if you live in the United States, is to register a copyright with the U.S. government; other countries have their own registration procedures.
- The current cost of registering a copyright is $35, and it can be done through the online application at www.copyright.gov; the necessary paperwork can also be filed on paper with the U.S. Copyright Office.
- If you don’t have the time, or are not fond of writing, consider hiring a writer to add more content to your website. When you do hire a writer, check their content on Copyscape, to ensure it is unique. In some cases, site owners think it is easier and cheaper to copy someone else’s content from their website. This isn’t true, in the end, it will cost much more, than hiring a writer.
- Try to ensure any new content is crawled and indexed on your site first. Place links to the new pages on your home page; they do not have to be permanent, but they will get the pages crawled within a few days.
- The links can then be removed; the idea is simply to get the content indexed. Submitting the content to social media sites is another way to force a quick crawl.
In the end, duplicate content can cause your search engine rankings to bottom out, since search engines like Google commonly remove pages from their index when they suspect duplication. Begin by identifying any duplicate content on your website and removing or redirecting it; such content can also be handled with a noindex meta tag telling the search engine to ignore the page. The next issue is dealing with people who copy your content, which must be done effectively and promptly. Putting it off can only hurt your own pages.