Below is a URL with a PHP session ID (the PHPSESSID parameter) in it.
Bad URL: http://example.com/index.php?PHPSESSID=242c489fb4a1bc79f5cf365988167e4d
Now, I do not know about you, but the above URL is not very pretty to me, and for most users it says very little about what the page is about. It truly does look ugly. Now what if you also had a second URL for the same page, created by a SEF (search engine friendly) plugin of some sort, and both were being indexed by search engines? The second one would look like the one below.
Pretty URL: http://example.com/article/useful-article.html
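Simply add the following rule to your robots.txt. The Disallow line has to sit under a User-agent line; User-agent: * applies it to all crawlers.
User-agent: *
Disallow: /*?
This tells the search engines that support the * wildcard (the major ones do) not to crawl any URL that contains a question mark, so nothing with a query string, session ID included, gets indexed. Your pretty URL will still be indexed.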
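To see exactly what Disallow: /*? matches, here is a minimal Python sketch of the wildcard matching that Google and Bing describe for robots.txt (and that RFC 9309 later standardised): * matches any run of characters, a trailing $ anchors the end of the URL, and a rule matches as a prefix of the path plus query string. It is an illustration under those assumptions, not any search engine's actual code, and the function names are made up for this example. Python's own urllib.robotparser is not used because it does not implement this wildcard syntax.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex (sketch)."""
    # A trailing '$' anchors the rule to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; every other character is literal.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow: str) -> bool:
    # re.match anchors only at the start, which gives the prefix matching
    # that robots.txt rules use.
    return robots_pattern_to_regex(disallow).match(path_and_query) is not None


SESSION_ID = "PHPSESSID=242c489fb4a1bc79f5cf365988167e4d"
for path in (
    "/article/useful-article.html",
    "/index.php?" + SESSION_ID,
    "/article/useful-article.html?" + SESSION_ID,
):
    verdict = "blocked" if is_disallowed(path, "/*?") else "crawlable"
    print(f"{verdict:9} {path}")
```

Running it, only the pretty URL comes out as crawlable; both session-ID variants, including the one with .html before the question mark, are blocked by the same rule.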
In some situations where you use a suffix such as .html at the end of URLs, you can end up with a session ID appended after the .html portion. This is very distracting and confusing for most users.
An even uglier URL:
http://example.com/article/useful-article.html?PHPSESSID=242c489fb4a1bc79f5cf365988167e4d
Again, the same rule in the robots.txt file takes care of it, because this URL also contains a question mark.
With all the advances in search technology, it still baffles me that in 2010, the time of publishing this post, most search engines still do not automatically canonicalize URLs when they find duplicate content, yet those same search engines are almost on the brink of artificial intelligence themselves. This is a very quick and dirty robots.txt hack, but it does work as a permanent solution to the problem. Adding extensions or modules to your content management system or scripts just slows down your Website and adds unnecessary bloat, I think. What we have found is that this method gives you one single URL per content item, which search engines cannot possibly get confused by.
After this you may want to remove the old URLs from the search engines. For older Websites you need to be more careful: redirect users from the old URLs to the new ones with a 301 redirect in your .htaccess file.
But the result can make your Website look something like this.
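Here's an example of how to do the 301 redirect in .htaccess. Apache's Redirect directive only matches the URL path and never sees the query string, so Redirect permanent /example.php?catid=123 will not match anything. Use mod_rewrite instead (with RewriteEngine on); the trailing ? on the target drops the old query string from the new URL:
RewriteCond %{QUERY_STRING} ^catid=123$
RewriteRule ^example\.php$ http://example.com/example.html? [R=301,L]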
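The key detail when redirecting old dynamic URLs is that the query string (catid=123) travels separately from the path (/example.php). The minimal Python sketch below models that with a lookup keyed on both parts; the REDIRECTS table and redirect_target function are hypothetical, and you would fill the table from your own CMS's old URLs. It only illustrates the mapping logic, it does not replace the .htaccess rule.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical redirect table: (old path, old query string) -> new pretty URL.
REDIRECTS = {
    ("/example.php", "catid=123"): "http://example.com/example.html",
}


def redirect_target(old_url: str) -> Optional[str]:
    """Return the 301 target for an old URL, or None if it has no mapping."""
    parts = urlsplit(old_url)
    # Path and query are looked up separately, just as Apache keeps the
    # query string out of the path that Redirect and RewriteRule match on.
    return REDIRECTS.get((parts.path, parts.query))


print(redirect_target("http://example.com/example.php?catid=123"))
print(redirect_target("http://example.com/example.php?catid=999"))
```

Only the exact path and query combination in the table gets a target; any other catid value falls through to None, which is why each old URL needs its own rule.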
Should you wish to remove the old content from the search engines, you can request a URL removal inside your Webmaster account at most of the major search engine providers. You probably will not want the old URLs around any more, particularly on a very new Website, which is who this tutorial is aimed at.
The relevant Webmaster accounts to request URL Removal are below:
Google: http://google.com/webmaster
Yahoo: http://siteexplorer.search.yahoo.com/
Bing: http://www.bing.com/webmaster
To WWW or Not to WWW.
Now we move on to a topic I honestly detest debating. Why, you might ask? Because this conversation has just about as many lovers of WWW as lovers of no WWW, both with very relevant arguments.
The yes-WWW fans are usually those with old Websites or old habits.
The no-WWW fans are more the performance-focused Apache freaks like me, who realize that most people are either in a hurry or just lazy (like me) and leave off the www when typing an address. I choose not to make Apache pay for human mistakes wherever possible.
The reality, however, is that if a Website started its life on www.example.com, you should keep it that way. If it did not, it still boils down to personal preference. I just choose not to use WWW because of the performance hit of Apache having to serve an additional redirect.
Here is how to convert from NO WWW to WWW with .htaccess
RewriteEngine on
# 301 redirect the bare domain to 'www.'
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
And here is how to convert from WWW to NO WWW with .htaccess
RewriteEngine on
# 301 redirect to domain without 'www.'
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
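Both rule sets make the same decision, just in opposite directions: if the request arrived on the host you are moving away from, answer with a 301 to the same path on the host you picked. The Python sketch below models that decision; host_redirect and its parameters are hypothetical names for this illustration, and it assumes the Host header carries no port number.

```python
from typing import Optional


def host_redirect(host: str, path: str, from_host: str, to_host: str) -> Optional[str]:
    """Return the 301 Location for a request, or None to serve it normally.

    host      -- the request's Host header (assumed to carry no port)
    path      -- path plus any query string; Apache also carries the query
                 string over to the new URL by default
    from_host -- the host being redirected away from (the RewriteCond)
    to_host   -- the host you picked as canonical (the RewriteRule target)
    """
    # [NC] in the RewriteCond makes the host comparison case-insensitive.
    if host.lower() != from_host.lower():
        return None
    return f"http://{to_host}{path}"


# NO WWW -> WWW
print(host_redirect("example.com", "/article/useful-article.html", "example.com", "www.example.com"))
# WWW -> NO WWW
print(host_redirect("WWW.example.com", "/", "www.example.com", "example.com"))
# Already on the chosen host: no redirect
print(host_redirect("example.com", "/", "www.example.com", "example.com"))
```

Whichever side you pick, every request on the other host costs one extra round trip, which is the performance hit mentioned above.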
But remember, in the end this is all down to personal preference; just pick a side and stick to it.
If you have any other suggestions for removing duplicate content, please leave a comment and let others know about them.

