Article Summary: People have been arguing over duplicate content for a considerable time now, but just how do we define duplicate content and is it really important?
(c) Don Saunders
The argument over just what duplicate content is and whether duplicate content matters has been underway for a long time now and there is no sign that it is going away. So just how do we define duplicate content and should we worry about it?
The generally accepted view is that duplicate content does matter. One well-known SEO expert recently wrote an article opposing this view, but even a quick look at the mass of material published on the topic will show that this is very much a minority position.
If we agree that duplicate content is important, then how should we define duplicate content? For example, if I compose an article for submission to an article directory and then re-work that same article for submission to a second directory how will the search engines examine my two articles and decide whether they contain duplicate content? The truth is that we don't know, but here is this webmaster's view.
When checking for duplicate content was first introduced by the search engines it was very much a matter of comparing one web page as a whole with another and there was no attempt to start dissecting the pages and comparing individual page elements. In those days you could make use of identical content and merely add an introductory and concluding paragraph to one of the pages and that would be enough to escape any duplicate content penalty. Sadly for many webmasters those days are long gone.
Nowadays, the search engines dissect the two pages and examine individual elements, and it is here that we find the core of today's argument. Most experts agree that attention is concentrated on the main content of a page rather than on its structure. Many website designers build their pages from templates which define the structure of each page, including such things as headers, footers and menus. This is generally believed to be acceptable, and the search engines do not view it as duplicate content. What the search engines are examining is the central content contained within the body of the page. But just how do they compare this page content?
Some people believe that this examination is carried out at 'block' level (checking individual paragraphs or sentences), but other people contend that filters look for phrases or even individual words. None of us really knows of course but it would seem reasonable to conclude that the likeliest basis for comparison would be to make use of either phrase or sentence matching.
Sentence matching is relatively clear-cut and simply means breaking both pages down into chunks on the basis of the page's punctuation. Take a look, for example, at this sentence:
It is relatively easy to get a good deal on a computer, providing you know how to haggle.
This would be viewed as either one single sentence or two sentences, depending upon whether you use the strict definition of a full-stop as being the end of a sentence or choose to adopt a flexible approach which would make use of other punctuation marks, like commas.
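To make the two readings concrete, here is a minimal sketch of punctuation-based chunking. The function name split_chunks and the exact sets of punctuation marks are assumptions made for illustration; nobody outside the search engines knows what rules they actually apply.

```python
import re

def split_chunks(text, flexible=False):
    """Split text into chunks on punctuation.

    Strict mode treats only sentence-ending marks (. ! ?) as boundaries.
    Flexible mode also breaks on commas, semicolons and colons.
    Both sets of marks are assumptions chosen for this illustration.
    """
    pattern = r"[.!?,;:]+" if flexible else r"[.!?]+"
    # Drop the empty strings that re.split leaves after trailing punctuation.
    return [chunk.strip() for chunk in re.split(pattern, text) if chunk.strip()]

sentence = ("It is relatively easy to get a good deal on a computer, "
            "providing you know how to haggle.")

strict_chunks = split_chunks(sentence)
flexible_chunks = split_chunks(sentence, flexible=True)

print(len(strict_chunks))    # 1
print(len(flexible_chunks))  # 2
```

Under the strict rule the example stays whole, while the flexible rule breaks it at the comma, so the same page could yield quite different chunks for comparison.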
Matching at the phrase level is a little bit more complex. What is the definition of a phrase? Should it be made up of 2 words or 3 words or 4 words or more?
Let us assume for a moment that we are going to define a phrase as 3 words. In this case the phrases below would all be viewed as duplicate content if they were to appear on two pages which were being checked:
Take a look
Did you know
In the end
In those days
Day to day
These five phrases are all regular everyday phrases which could be used on pages about building a greenhouse, learning to swim, search engine optimization or anything else you care to mention. Now there are some people who would say that the search engines do compare pages down to this level. To illustrate this, when I questioned the staff of one particular duplicate checker (DupeCop) about the basis on which they checked for duplicate content they said:
"DupeCop compares both individual words and 3-word phrases. It also ignores all punctuation and scans across sentences"
I was not surprised therefore that when I ran several articles through this software (comparing articles on the subject of Christmas decorations against articles about gun dogs) I found they had an average of 25% of duplicate content!
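A rough sketch shows why unrelated articles can score so highly under 3-word phrase matching. This is not DupeCop's actual code, which I have not seen; it only follows the description quoted above (ignore punctuation, scan across sentences, compare 3-word phrases). The function names and the two sample texts are invented for the illustration.

```python
import re

def words(text):
    # Lowercase the text and keep only word characters, so punctuation
    # is ignored and phrases can run across sentence boundaries.
    return re.findall(r"[a-z0-9']+", text.lower())

def phrases(text, n=3):
    # Every run of n consecutive words, collected as a set of tuples.
    w = words(text)
    return {tuple(w[i:i + n]) for i in range(len(w) - n + 1)}

def shared_phrase_ratio(a, b, n=3):
    # Fraction of the n-word phrases in text a that also occur in text b.
    pa, pb = phrases(a, n), phrases(b, n)
    if not pa:
        return 0.0
    return len(pa & pb) / len(pa)

page_a = ("Take a look at these Christmas decorations. "
          "In the end, they are easy to make.")
page_b = ("Take a look at this gun dog. "
          "In the end, training one takes patience.")

shared = phrases(page_a) & phrases(page_b)
for phrase in sorted(shared):
    print(" ".join(phrase))
print(round(shared_phrase_ratio(page_a, page_b), 2))
```

The two sample texts have nothing in common as far as subject matter goes, yet everyday phrases such as "take a look" and "in the end" are enough to register as shared content. Raising n to 4, 5 or 6 makes such accidental matches rarer.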
Against this background, I think that it would be ridiculous for the search engines to have their filters set at this level. So how low would the filters be set? At 4, 5, 6 words? To be honest, your guess is as good as mine.
Over the last few years I have published literally hundreds of articles and have watched the results in terms of duplicate content penalties, as far as anyone can do so. On the basis of my own experience I believe that filtering is not carried out right down to the level of short phrases but probably ends at the sentence level. Thus, as long as you change content down to this level, you should have no problem in avoiding the filters. Indeed, even if one or two sentences are duplicated you will be okay.
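If you want to check a re-worked article against the original at the sentence level before submitting it, something along these lines would do the job. It is only a sketch based on my own assumption that a sentence ends at a full stop, question mark or exclamation mark; the function names and sample texts are made up, and it says nothing about how any search engine really works.

```python
import re

def sentences(text):
    # Split on sentence-ending punctuation, then normalise case and
    # spacing so trivial differences do not hide a duplicated sentence.
    parts = re.split(r"[.!?]+", text)
    return {" ".join(p.lower().split()) for p in parts if p.strip()}

def duplicated_sentences(original, rewrite):
    # Sentences that appear in both versions of the article.
    return sentences(original) & sentences(rewrite)

original = ("Greenhouses need light. Choose a sunny spot. "
            "Water the plants daily.")
rewrite = ("A greenhouse must have plenty of light. Choose a sunny spot. "
           "Water every day.")

dupes = duplicated_sentences(original, rewrite)
print(dupes)  # {'choose a sunny spot'}
```

Here one sentence out of three has survived the re-write unchanged, which on my reckoning would still be acceptable.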
Article Source: http://www.upublish.info
About the Author:
Don Saunders
WebMarketingCentre.com provides information on article writing and article submission and is also an article directory where you can pick up a free online article for your website or ezine and to which you can submit articles on a wide variety of topics including article marketing and much more.
Keywords: Don Saunders, articles, article writing, content, duplicate content, re-writing articles, unique content, Dupecop
**NOTE** - Don Saunders has claimed original rights on the article "What Is Duplicate Content And Should You Worry About It?" ... if there is a dispute on the originality of this article ... please contact us via our Contact Form and supply our staff with the appropriate details of dispute.
