Sam's Remove & Prevent duplicate content with the canonical tag V1.3.2
This script dynamically reads the query string used to call the page, removes any parameters that cause (or may cause) duplicate content issues with search engines, and generates a canonical tag with the modified query string. Search engines will effectively interpret this as a 301 redirect; there is no effect on the page or URL for your visitors.
- Produces a correctly formatted canonical tag as per your current url (seo or otherwise).
- Search engines will modify their index according to the tag, removing any duplicates.
- Any ranking spread caused by the duplicates will be consolidated onto the correct page.
- Any unlisted query string params remain unaffected, apart from ensuring the string is correctly formatted.
- Very simple, short code addition.
- Will also optionally remove any duplicate content issues with SSL pages.
- Any instances of the osCsid param will be removed.
- Default removed params are: currency, language, page, sort, ref, affiliate_banner_id & osCsid.
If you have an existing install of a meta tag contribution, the install only requires the addition of a single line of code and one added function.
Purpose
If Google etc. can find multiple paths to a single product, it's likely that the URL for that page will vary according to the path taken. The search engines will then consider each URL a separate page, but because the content is the same as another page with a different URL, your site will be penalized for having duplicate content.
This situation arises in osC due to paging, sorting, multiple currencies & maybe even multiple languages. If you have any links to your site with the osCsid attached, you have the same problem plus other issues; you should enable Prevent Spider Sessions and update spiders.txt.
This add-on prevents the problem, and removes it if it already exists, by adding a new tag called a canonical to your page's <head> section. The search engines see this tag & will use its URL in preference to the page's actual URL. Any parameters that have been defined as creating the issue are removed from the query string appearing in that link.
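The idea described above can be sketched in a few lines of PHP. This is only an illustration under assumptions, not the add-on's actual code: the function name sketch_canonical_tag and its arguments are hypothetical, and it relies on http_build_query and array_diff_key, which need a newer PHP 5 than the oldest versions the add-on was tested on.

```php
<?php
// Hypothetical sketch: these names are illustrative, not the add-on's own code.

/**
 * Build a canonical <link> tag for $url with the listed params removed.
 */
function sketch_canonical_tag($url, array $remove, $xhtml = false)
{
    $parts  = parse_url($url);
    $params = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
    }

    // Drop every parameter named in $remove; all others are kept.
    $kept = array_diff_key($params, array_flip($remove));

    $canonical = $parts['scheme'] . '://' . $parts['host'] . $parts['path'];
    if (!empty($kept)) {
        $canonical .= '?' . http_build_query($kept);
    }

    // XHTML doctypes need a self-closing tag.
    $close = $xhtml ? ' />' : '>';
    return '<link rel="canonical" href="' . htmlspecialchars($canonical) . '"' . $close;
}

echo sketch_canonical_tag(
    'http://my-domain.com/product_info.php?products_id=12&currency=USD&osCsid=abc123',
    array('currency', 'language', 'page', 'sort', 'ref', 'affiliate_banner_id', 'osCsid')
);
```

Here the currency and osCsid params are dropped while products_id is kept, so every variant of the product URL points the search engines at the same address.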
Version History
Version 1.0
- 12.2009 - Initial version.
Version 1.1
- 12.2009 - Expanded server support, with many thanks to the code provided by Robert Fisher (FWR Media).
Added controls for XHTML content & SSL duplicated pages.
Version 1.2
- 01.2010 - Modified code to make adding parameters for removal easier; operation is exactly the same & nothing else is affected.
Version 1.3
- Modified code to allow removal of params in specific pages. Code & idea by hobbynet.
- Modified code to allow optional removal of index.php from the URI.
- Added partial compatibility with 'search-engine-friendly-url (still in development)'. Note that issues could arise, e.g. if a param is set but has no value.
Version 1.3.1
- Bug Fix: Added test for empty page array.
Version 1.3.2
- Modified regex to allow for url encoded chars within param values.
Support, questions - issues - etc
Please post any queries or issues in the support thread found here.
Installation
Please read through all the instructions first & make sure you understand them. If you're unsure about anything, please read the tips given here.
If you have any errors or problems following your install, please also read the tips given here before you rush to post your problem.
That's it, you're done!
Usage
If you use an XHTML doctype, change the $xhtml = false parameter to $xhtml = true; the closing tag will then be /> instead of >.
If you have no 'catalog' pages on a secure server (i.e. only account etc. pages are there), set the 'SSL' parameter to 'NONSSL'. Then, if anyone has linked to any catalog pages as https in error, the search engines will still index them as non-SSL, so there is no duplicate. Account etc. pages will not receive a tag, as they should not be indexable, i.e. they should be blocked by your 'robots.txt'.
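As a rough illustration of the 'NONSSL' option, the sketch below forces an https address back to http before it would be written into the tag. The helper name sketch_force_scheme is hypothetical and the add-on's real handling of the 'SSL' / 'NONSSL' value may differ; treat this as an assumption about the intent, not as the shipped code.

```php
<?php
// Hypothetical helper: illustrates the intent of the 'NONSSL' setting only.

/**
 * With 'NONSSL', rewrite an https:// prefix to http://.
 * With 'SSL' (or anything else), return the URL unchanged.
 */
function sketch_force_scheme($url, $connection = 'SSL')
{
    if ($connection === 'NONSSL') {
        return preg_replace('#^https://#i', 'http://', $url);
    }
    return $url;
}

// A catalog page wrongly linked as https is canonicalised to its http address.
echo sketch_force_scheme('https://my-domain.com/product_info.php?products_id=12', 'NONSSL');
```

With the setting left at 'SSL' the URL passes through untouched, so secure and non-secure catalog pages each keep their own scheme.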
If you wish to add to or change the removed params, edit the line:
    $remove_array = array( 'currency','language','main_page','page','sort','ref','affiliate_banner_id','max');
You must ensure you keep to the existing format, i.e. ,'my tag'
To remove page-specific params, add them to the $page_remove_array, using the same format as above. As supplied, manufacturers_id & cPath will be removed for product_info.php only, with no specific removals for index.php.
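One way the global and page-specific lists could combine is sketched below. The layout of $page_remove_array shown here (script name as key, list of params as value) and the function sketch_params_to_remove are assumptions for illustration; check the array as it appears in your installed code before copying the shape.

```php
<?php
// Hypothetical sketch: the array layout and function name are assumptions.

// Params removed on every page.
$remove_array = array('currency', 'language', 'main_page', 'page', 'sort', 'ref', 'affiliate_banner_id', 'max');

// Extra params removed only on the named page.
$page_remove_array = array(
    'index.php'        => array(),
    'product_info.php' => array('manufacturers_id', 'cPath'),
);

/**
 * Return the full list of params to strip for the given script name.
 */
function sketch_params_to_remove($page, array $global, array $per_page)
{
    $extra = isset($per_page[$page]) ? $per_page[$page] : array();
    return array_merge($global, $extra);
}

print_r(sketch_params_to_remove('product_info.php', $remove_array, $page_remove_array));
```

Under this layout cPath is stripped on product_info.php but kept on index.php, where it is needed to identify the category.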
It's possible that you will get a duplicate content issue with my-domain.com & my-domain.com/index.php, as these are one & the same page. You can use .htaccess to deal with this:
    RewriteCond %{THE_REQUEST} !^POST
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.*)index\.php(.*)\ HTTP/ [NC]
    RewriteRule ^index\.php(.*)$ http:\/\/pivht-supplements.com/$1 [R=301,L]
That is normally sufficient; however, on some servers there can still be an issue. If so, enable removal of index.php from the URI by setting $rem_index to true in the code.
-----
Tested on PHP 4 & 5, SQL 4 & 5, osC 2.2 ms2, rc1 & rc2a and is register_globals off compatible.
Note: you will not see anything change on the page. The tag goes in the page's head section, so only the search engines see it; you will need to view your page's generated source to see it.
Support thread will be found here.