
Tuesday, May 12, 2009

How to Remove Duplicate Content URLs from All Search Engines

Removing duplicate content from all search engines can be a tedious uphill battle for many a Webmaster. So in today's post I am going to discuss a few of my personal favorite ways to do it.

Below is a URL with a PHPSESSID in it.

Bad URL: http://example.com/index.php?PHPSESSID=242c489fb4a1bc79f5cf365988167e4d
Now I do not know about you, but the above URL is not very pretty to me, and for most users it says little about what the page is about. Now what if a second URL for the same page, created by a SEF (search engine friendly) plugin of sorts, exists and both are being indexed by search engines? The other one looks like the one below.

Pretty url: http://example.com/article/useful-article.html

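Simply add the following disallow to your robots.txt, under the User-agent line it should apply to:
Disallow: /*?
This tells crawlers that support the * wildcard (the major search engines do) not to fetch any URL that contains a question mark, so the session ID versions are no longer crawled. Your pretty URL will still be crawled and indexed.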
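To see which URLs the rule catches, here is a minimal Python sketch of wildcard matching as the major crawlers document it: * matches any run of characters, a trailing $ anchors the end, and a rule always matches from the start of the path. The function names rule_to_regex and is_disallowed are made up for this illustration, and real crawlers may differ in the details.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regex.

    '*' becomes '.*', a trailing '$' anchors the end of the URL,
    and every other character is matched literally.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url, rule="/*?"):
    """Return True if the path (plus query string) of url matches the rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match only matches at the start, like a robots.txt rule does
    return rule_to_regex(rule).match(target) is not None


if __name__ == "__main__":
    for url in (
        "http://example.com/index.php?PHPSESSID=242c489fb4a1bc79f5cf365988167e4d",
        "http://example.com/article/useful-article.html",
    ):
        print(url, "->", "blocked" if is_disallowed(url) else "allowed")
```

Note that Python's own urllib.robotparser does not understand the * wildcard, which is why the sketch does the matching by hand.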

In some situations, where you use a suffix such as .html at the end of your URLs, you can end up with a session ID appended after the .html portion. This is distracting and confusing for most users.

An even uglier url:
http://example.com/article/useful-article.html?PHPSESSID=242c489fb4a1bc79f5cf365988167e4d
Again use the same rule in the robots.txt file.
Disallow: /*?
With all the advances in search technology, it still baffles me that in 2010, at the time of publishing this post, most search engines still do not automatically canonicalize URLs when they find duplicate content. Yet those same search engines are almost on the brink of artificial intelligence themselves. This is a very quick and dirty robots.txt hack, but it does work as a permanent solution to the problem. Adding extensions or modules to your content management system or scripts just slows down your Website and adds unnecessary bloat, I think. What we have found is that this method gives you one single crawlable URL per content item, so search engines cannot get confused.
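For anyone wondering what "canonicalization" would actually mean for the session ID URLs above, here is a rough Python sketch. It is only an illustration of the idea, not something any search engine is known to run, and the function name strip_session_id is made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="PHPSESSID"):
    """Return url with the session parameter removed from its query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    # An empty query makes urlunsplit leave out the '?' entirely
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    ugly = ("http://example.com/article/useful-article.html"
            "?PHPSESSID=242c489fb4a1bc79f5cf365988167e4d")
    print(strip_session_id(ugly))
```

Both the ugly URL and the pretty one reduce to the same string, which is all a crawler would need to treat them as one page.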

You may want to remove the old URLs from the search engines after this. For an older Website you need to be more careful and redirect users from the old URLs to the new ones with a 301 redirect in your .htaccess file.

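Here's an example of how to do the 301 redirect in .htaccess. The Redirect directive from mod_alias only matches the URL path and never the query string, so an old URL such as /example.php?catid=123 needs mod_rewrite (with RewriteEngine on) instead. The trailing ? on the target drops the old query string from the new URL.

RewriteCond %{QUERY_STRING} ^catid=123$
RewriteRule ^example\.php$ http://example.com/example.html? [R=301,L]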
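If you have more than a handful of old URLs, writing each rule by hand gets tedious. Here is a small Python sketch that prints mod_rewrite rules from a mapping of old URL to new URL. The function name htaccess_redirects and the mapping are made up for this example, and it assumes the rules go in the .htaccess file in the document root (where Apache strips the leading slash before matching) and that the new URLs carry no query string of their own.

```python
import re
from urllib.parse import urlsplit


def htaccess_redirects(mapping):
    """Build mod_rewrite 301 rules from a dict of old URL -> new URL."""
    lines = ["RewriteEngine on"]
    for old, new in mapping.items():
        parts = urlsplit(old)
        # In per-directory (.htaccess) context the pattern has no leading slash
        path = re.escape(parts.path.lstrip("/"))
        if parts.query:
            # The query string can only be tested with a RewriteCond
            query = re.escape(parts.query)
            lines.append("RewriteCond %{QUERY_STRING} ^" + query + "$")
        # The trailing '?' drops the old query string from the target
        lines.append("RewriteRule ^" + path + "$ " + new + "? [R=301,L]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(htaccess_redirects({
        "/example.php?catid=123": "http://example.com/example.html",
    }))
```

Check the generated rules on a test copy of the site before putting them live, since one bad pattern in .htaccess can redirect far more than you meant it to.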

Should you wish to remove the old content from the search engines, you can request a URL removal inside your Webmaster account at most of the major search engine providers. You probably will not want the old URLs around any more, particularly if it is still a very new Website, which is who this tutorial is aimed at.
The relevant Webmaster accounts to request URL removal are below:

Google: http://google.com/webmaster
Yahoo: http://siteexplorer.search.yahoo.com/
Bing: http://www.bing.com/webmaster


To WWW or Not to WWW.


Now we move on to a section that I honestly detest debating. Why, you might ask? Because this conversation has just about as many lovers of the WWW as lovers of the no WWW, both with very relevant arguments.

The yes WWW fans are usually those with old Websites or old habits.
Pick one?

The no WWW fans are more Apache performance focused freaks like me, who realize that most people are either in a hurry or just lazy (like me) and skip typing the www. I choose not to make Apache pay for human mistakes wherever possible.

The reality, however, is that if a Website started its life on www.example.com you should keep it that way. If it did not, it still boils down to personal preference. I just choose not to add the www because of the performance hit of Apache handling an additional redirect request.

Here is how to convert from NO WWW to WWW with .htaccess


RewriteEngine on

# 301 redirect the bare domain to 'www.'
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]


and

Here is how to convert from WWW to NO WWW with .htaccess


RewriteEngine on

# 301 redirect 'www.' to the bare domain
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
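If the two rule sets above look like line noise, the logic behind them can be sketched in a few lines of Python. This is only a model of what the rewrite rules decide, not a replacement for them. The function name canonical_redirect is made up for the example, and the sketch ignores ports and login details in the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, want_www):
    """Return the 301 target for url, or None if the host is already canonical."""
    parts = urlsplit(url)
    # hostname is already lowercased, which mirrors the [NC] flag
    host = parts.hostname or ""
    bare = host[4:] if host.startswith("www.") else host
    target_host = "www." + bare if want_www else bare
    if host == target_host:
        return None  # nothing to do, so no extra redirect request
    # Path and query string are carried over, like the $1 in the RewriteRule
    return urlunsplit(parts._replace(netloc=target_host))


if __name__ == "__main__":
    print(canonical_redirect("http://example.com/article/useful-article.html", True))
    print(canonical_redirect("http://www.example.com/article/useful-article.html", False))
```

Whichever side you pick, a request that already uses the canonical host returns None here, which is the case where Apache serves the page directly without the extra round trip.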


But remember, in the end this is all up to personal preference. Just pick a side and stick to it.

If you have any other suggestions for removing duplicate content, please leave a comment and let others know about it.

Tuesday, February 3, 2009

Iburst USB modem installation tutorial on Ubuntu 8.04 and 8.10


This post was created as a guide to help others compile, install and configure an Iburst USB "Array Com" / "Kyocera" modem in Ubuntu 8.04 Hardy Heron and 8.10 Intrepid Ibex, both desktop editions.


First, a special thanks goes to Nick Carroll for his original guide, which inspired this tutorial and is a good read for anyone looking to install the PCMCIA Iburst modem in Ubuntu 7.10. Nick, I give you almost all credit for what I am about to write and take nothing away from you. Thanks again for all the help! I just wish I had found your post sooner. But here it is, for people living in the Republic of South Africa like myself to find a little easier.

Okay, to begin, I want to ask that you do not plug in your modem at this point!
Everything in green is what's happening in the terminal and bolded is what you should be doing.

Step 1.

Download Ibdriver from http://sourceforge.net/projects/ibdriver
The latest driver at the time of writing is
linux-2.6.ibdriver-1.3 - ibdriver-1.3.4



Step 2.

Next download the Roaring Penguin PPPoE dialler from www.roaringpenguin.com/products/pppoe


Step 3.

Remember to be in the directory you downloaded the files to when attempting this. Now it is time to extract the driver:
tar -zxvf ibdriver-1.3.4-linux-2.6.24.tar.gz



Step 4.

Next we extract the RP-PPPoE dialler. This script just happens to be magic for all those looking to set up PPPoE in Debian/Ubuntu, period.
tar -zxvf rp-pppoe-3.10.tar.gz


Step 5.

cd ibdriver-1.3.4-linux-2.6.24

Feel free to use vim or pico instead of nano. It's just my preferred editor, so use whichever makes you feel more comfortable.

nano ib-net.c

Look in ib-net.c for SET_MODULE_OWNER(netdev); and delete the line. This is a deprecated call that is no longer available on later kernels.

Quick hint: it's on line 509, so scrolling right down to the bottom of the file and then up a little should make the process a little less painful.

Press CTRL+X, then Y to confirm the save, then Enter to keep the file name.
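If you would rather not hunt for the line by hand, the same edit can be scripted. The following is a minimal Python sketch that assumes the call sits on a line of its own, as it does in ib-net.c. The function names drop_set_module_owner and patch_file are made up for this example.

```python
def drop_set_module_owner(source):
    """Return the C source text without any line calling SET_MODULE_OWNER."""
    kept = [
        line
        for line in source.splitlines(keepends=True)
        if "SET_MODULE_OWNER(" not in line
    ]
    return "".join(kept)


def patch_file(path="ib-net.c"):
    """Rewrite the file in place, keeping a .orig copy of the untouched source."""
    with open(path) as handle:
        original = handle.read()
    with open(path + ".orig", "w") as backup:
        backup.write(original)
    with open(path, "w") as handle:
        handle.write(drop_set_module_owner(original))
```

Run patch_file() from inside the ibdriver-1.3.4-linux-2.6.24 directory. The .orig copy is there so you can compare the two files if make still complains.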


Step 6.

Now type: make
If everything worked out, you can now install.
Type: sudo make install

Step 7.

Now might be a good time to tell you that libc6 is required for this, but if you installed Ubuntu without removing too many packages it should already be on your system.

cd ..
cd rp-pppoe-3.10
sudo ./go



Step 8.


>>> Enter your PPPoE user name: your.iburst.username@iburst.co.za
>>> Enter the Ethernet interface connected to the DSL modem: ib0
>>> Enter the demand value: no
>>> Enter the DNS information here: 208.67.220.220
>>> Enter the Secondary DNS information here: 208.67.222.222

We enter the OpenDNS servers above instead of the Iburst DNS servers. Why? Well, Willie Venter has some good points as to why we use them here.


>>> Please enter your PPPoE password: your.password.here
>>> Please re-enter your PPPoE password: your.password.here.again
>>> Choose Firewall settings: 2

Choose 2 above if you have more than one machine using the gateway.
This will save you some time and effort in the future.

Step 9.

You must now restart the machine and plug the iBurst USB modem in.
It should start the connection automatically on boot.

If you are having problems, you can verify the installation using:
lsmod | grep ib_

The output should look something like this:

ib_usb 13956 0
ib_net 14344 1 ib_usb
usbcore 149360 5 ib_usb,isp1760,usbhid,uhci_hcd
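If you want to check this from a script rather than by eye, here is a small Python sketch that reads lsmod output and reports which of the two driver modules shown above are missing. The function names loaded_modules, missing_iburst_modules and check_live are made up for this example.

```python
import subprocess


def loaded_modules(lsmod_output):
    """Return the set of module names from the first column of lsmod output."""
    return {line.split()[0] for line in lsmod_output.splitlines() if line.split()}


def missing_iburst_modules(lsmod_output, required=("ib_usb", "ib_net")):
    """Return the required driver modules that are not currently loaded."""
    loaded = loaded_modules(lsmod_output)
    return [name for name in required if name not in loaded]


def check_live():
    """Run lsmod on this machine and print the result of the check."""
    result = subprocess.run(["lsmod"], stdout=subprocess.PIPE, universal_newlines=True)
    missing = missing_iburst_modules(result.stdout)
    print("missing: " + ", ".join(missing) if missing else "ib_usb and ib_net are loaded")
```

Call check_live() after the reboot in Step 9. If a module is reported missing, go back and check that make and sudo make install finished without errors.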

To start, stop or view the status of the interface use these commands:

sudo pppoe-start
sudo pppoe-stop
sudo pppoe-status


Please note that Hosting Habitat is in no way affiliated with Iburst, nor does it endorse any of their products. We also in no way guarantee that any of this will work for you, and we will not provide any technical support if the above product or installation fails. I just thought this might help you, as I use Iburst on my laptop when I'm on the road or at a meeting. Good luck!

Before I go: users of 8.10 may experience a problem with Firefox where it keeps reverting to offline mode. Look here for the workaround.