Knowledge Spot

Share knowledge with us…

Archive for November, 2008

Introduction to Robots.txt

Posted by Gajendra on November 27, 2008

What is robots.txt?

When a search engine crawler visits your site, it first looks for a special file called robots.txt. This file tells the search engine spider which Web pages of your site should be indexed and which should be ignored.

The robots.txt file is a simple text file (no HTML) that must be placed in the root directory of your site, i.e.

http://www.yourwebsite.com/robots.txt

Creating a robots.txt file

Because robots.txt is a plain text file, you can create it in any simple text editor (such as Notepad). The content of a robots.txt file consists of so-called “records”.

A record contains the instructions for a specific search engine. Each record consists of two fields: a User-agent line and one or more Disallow lines. For example:

User-agent: googlebot
Disallow: /cgi-bin

This robots.txt file would allow the “googlebot”, which is the search engine spider of Google, to retrieve every page from your site except for files from the “cgi-bin” directory. All files in the “cgi-bin” directory will be ignored by googlebot.

The Disallow command matches by prefix, much like a trailing wildcard: it applies to every path that begins with the value you enter. If you enter

User-agent: googlebot
Disallow: /support

both “/support.html” and “/support/index.html”, as well as all other files in the “support” directory, would not be indexed by googlebot.
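The prefix matching described above can be checked with Python's standard urllib.robotparser module. This is only a sketch: it parses the two-line record from the example in memory, and the helper name is_allowed, the path “/about.html” and the bot name “otherbot” are made up for illustration. Python's parser is one implementation of the rules, so individual search engine spiders may behave differently.

```python
from urllib.robotparser import RobotFileParser

# The record from the example above, parsed from a string instead of a URL.
RULES = """\
User-agent: googlebot
Disallow: /support
"""


def is_allowed(user_agent: str, path: str, rules: str = RULES) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Both /support paths share the "/support" prefix; /about.html does not.
    for path in ("/support.html", "/support/index.html", "/about.html"):
        print(path, is_allowed("googlebot", path))
    # A spider that no record mentions is not restricted by this file.
    print("otherbot", is_allowed("otherbot", "/support.html"))
```

Because the record names only googlebot, a spider with a different name is not affected by it.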

If you leave the Disallow line blank, you’re telling the search engine that all files may be indexed. In any case, you must enter a Disallow line for every User-agent record.

If you want to give all search engine spiders the same rights, use the following robots.txt content:

User-agent: *
Disallow: /cgi-bin

Where can I find user agent names?

You can find user agent names in your log files by checking for requests to robots.txt. Most often, all search engine spiders should be given the same rights. In that case, use “User-agent: *” as mentioned above.
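As a rough sketch of the log check described above, the following Python snippet counts the user agents that requested robots.txt. It assumes the widely used “combined” access log format, in which the user agent is the last quoted field; your server may log differently. The function name and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Matches a GET or HEAD request for /robots.txt and captures the last quoted
# field on the line, which is the user agent in the "combined" log format
# (an assumption about how the server writes its logs).
ROBOTS_REQUEST = re.compile(
    r'"(?:GET|HEAD) /robots\.txt(?:\?\S*)? [^"]*".*"(?P<agent>[^"]*)"\s*$'
)


def robots_txt_agents(log_lines):
    """Count the user agents that requested /robots.txt."""
    counts = Counter()
    for line in log_lines:
        match = ROBOTS_REQUEST.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


# Invented sample lines, not taken from a real log.
SAMPLE_LOG = [
    '192.0.2.1 - - [27/Nov/2008:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 45 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [27/Nov/2008:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.9 - - [27/Nov/2008:10:01:00 +0000] "GET /robots.txt HTTP/1.0" 200 45 "-" "ExampleBot/1.0"',
]

if __name__ == "__main__":
    for agent, hits in robots_txt_agents(SAMPLE_LOG).items():
        print(hits, agent)
```

The name to use in a User-agent line is usually the first token of the logged string, such as “Googlebot”.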

Don’ts

If you don’t format your robots.txt file properly, some or all files of your Web site might not get indexed by search engines. To avoid this, do the following:

  1. Don’t use comments in the robots.txt file. Although comments are allowed in a robots.txt file, they might confuse some search engine spiders.

    “Disallow: support # Don’t index the support directory” might be misinterpreted as “Disallow: support#Don’t index the support directory”.

  2. Don’t use white space at the beginning of a line. For example, don’t write (note the leading spaces)

            User-agent: *
            Disallow: /support

    but

    User-agent: *
    Disallow: /support

  3. Don’t change the order of the commands. For your robots.txt file to work, the User-agent line must come before its Disallow lines. Don’t write

    Disallow: /support
    User-agent: *

    but

    User-agent: *
    Disallow: /support

  4. Don’t use more than one directory in a Disallow line. Do not use the following

    User-agent: *
    Disallow: /support /cgi-bin /images/

    Search engine spiders cannot understand that format. The correct syntax for this is

    User-agent: *
    Disallow: /support
    Disallow: /cgi-bin
    Disallow: /images/

  5. Be sure to use the right case. File names on many servers are case sensitive. If the name of your directory is “Support”, don’t write “support” in the robots.txt file.
  6. Don’t list all files. If you want a search engine spider to ignore all files in a particular directory, you don’t have to list them one by one. For example:

    User-agent: *
    Disallow: /support/orders.html
    Disallow: /support/technical.html
    Disallow: /support/helpdesk.html
    Disallow: /support/index.html

    You can replace this with

    User-agent: *
    Disallow: /support

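  7. There is no “Allow” command in the original robots.txt standard. Some spiders, including googlebot, understand “Allow” as an extension, but others may ignore it, so don’t rely on it in your robots.txt file. Only mention files and directories that you don’t want to be indexed. All other files can be indexed automatically if they are linked on your site.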
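Several of the pitfalls listed above can be caught automatically. The following Python function is a minimal sketch of such a check, not a complete validator: the function name and the wording of its messages are made up, and it only looks for leading white space, comments, a Disallow line that comes before any User-agent line, several paths in one Disallow line, and the use of “Allow”.

```python
def lint_robots_txt(text):
    """Return (line_number, message) pairs for common robots.txt pitfalls."""
    problems = []
    seen_user_agent = False
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            seen_user_agent = False  # a blank line ends the current record
            continue
        if line[0] in " \t":
            problems.append((number, "line starts with white space"))
        if "#" in line:
            problems.append((number, "line contains a comment"))
        field, _, value = line.strip().partition(":")
        field = field.strip().lower()
        value = value.split("#", 1)[0].strip()  # ignore any trailing comment
        if field == "user-agent":
            seen_user_agent = True
        elif field == "disallow":
            if not seen_user_agent:
                problems.append((number, "Disallow appears before User-agent"))
            if len(value.split()) > 1:
                problems.append((number, "more than one path in a Disallow line"))
        elif field == "allow":
            problems.append((number, "Allow is not understood by every spider"))
    return problems


if __name__ == "__main__":
    sample = "Disallow: /support\nUser-agent: *\n  Disallow: /support /cgi-bin /images/\n"
    for number, message in lint_robots_txt(sample):
        print(f"line {number}: {message}")
```

The check for the right case (point 5) is left out because it needs access to the real directory names on the server.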

Tips and tricks:

1. How to allow all search engine spiders to index all files

    Use the following content for your robots.txt file if you want to allow all search engine spiders to index all files of your Web site:

    User-agent: *
    Disallow:

2. How to prevent all spiders from indexing any file

    If you don’t want search engines to index any file of your Web site, use the following:

    User-agent: *
    Disallow: /
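Both tips can be tried out with Python's standard urllib.robotparser module before the file goes live. This is a sketch under assumptions: the helper name spider_may_fetch, the bot name “anybot” and the test path are invented, and Python's parser may not match every spider's behaviour exactly.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"   # tip 1: empty Disallow line
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # tip 2: Disallow the root


def spider_may_fetch(rules, user_agent, path):
    """Parse rules given as a string and ask whether path may be fetched."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print("allow all:", spider_may_fetch(ALLOW_ALL, "anybot", "/support/index.html"))
    print("block all:", spider_may_fetch(BLOCK_ALL, "anybot", "/support/index.html"))
```

The only difference between the two files is the single “/” after Disallow, so it is worth checking which one you have uploaded.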

Posted in SEO Updates

How to Optimize Dynamic Websites for Better Search Engine Rankings

Posted by rkum on November 26, 2008

There is a misconception that dynamic websites are not search engine friendly and cannot achieve good positions in major search engines. This is wrong: dynamic websites can achieve better and more controlled positions in search engines than static websites.

What is a dynamic website?

A dynamic website is a database-driven website in which parts of the content are generated by server-side programs (the middle tier).
A dynamic webpage doesn’t physically exist as a file or document on the hosting server; it is generated only when a request for that webpage arrives. The request contains parameters, user identities, date and time, context, etc.

Problems Search Engines Have with Dynamic Websites

It is true that search engines are not good at reading dynamic web pages, but there is a solution for every problem. First you need to understand why search engines are unable to read dynamically generated websites. What stops them from reading dynamic web pages?

  1. A dynamic webpage doesn’t physically exist on the server

  2. A dynamic website has complex URLs such as “http://www.asif-iqbal.com?name=value&blabla%blabla@session_id@2226897&blabla=77”

  3. Search engine bots/crawlers usually have difficulty reading these characters in URLs: “?”, “=”, “@”, “%”, “$”, “*”, “&”, “!”

  4. Search engines usually consider a dynamic website to be a group of never-ending links

  5. Search engine bots/crawlers might get stuck in an infinite loop, especially if the dynamic webpage has a session ID
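To see which of your own URLs show the problems listed above, a small script can flag them. The following Python sketch checks a URL for the characters named in point 3 and for a session ID parameter. The function name and the list of session parameter names are assumptions made for illustration, so adjust them to whatever your site actually uses.

```python
from urllib.parse import parse_qsl, urlsplit

# Characters named in the list above as difficult for crawlers.
RISKY_CHARACTERS = set("?=@%$*&!")
# Hypothetical parameter names that often carry a session ID.
SESSION_PARAMETERS = {"session_id", "sessionid", "sid", "phpsessid", "jsessionid"}


def crawl_warnings(url):
    """Return a list of warnings for one URL, based on the problems above."""
    warnings = []
    parts = urlsplit(url)
    # Only look at the part of the URL that follows the host name.
    tail = parts.path
    if parts.query:
        tail += "?" + parts.query
    found = sorted(RISKY_CHARACTERS.intersection(tail))
    if found:
        warnings.append("risky characters: " + " ".join(found))
    names = {name.lower() for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    if names & SESSION_PARAMETERS:
        warnings.append("session ID in URL")
    return warnings


if __name__ == "__main__":
    for url in (
        "http://www.example.com/products/shoes.html",
        "http://www.example.com/shop?item=42&session_id=2226897",
    ):
        print(url, crawl_warnings(url))
```

A URL that produces no warnings is not guaranteed to be indexed; the check only covers the points from the list.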

Tips to Optimize Dynamic Websites

Now you know what prevents search engine bots/crawlers from indexing your website. What you need to know next is how to keep your valuable website indexed by search engines: the more of your web pages are indexed, the better your website will impress search engines.

  1. Create an HTML sitemap with 100 text links or fewer. If you have more than 100 links, break the sitemap into more than one web page

  2. A Google Sitemap will also be an advantage, especially if your website is big and dynamic

  3. Get inbound links deep into your website from other relevant websites such as directories, classified directories, and vertical industry portals

  4. Make dynamic web pages look like static ones with the help of URL rewriting techniques

  5. You can use plug-in applications that change your existing dynamic URLs into static ones; plenty of such applications are available, especially for shopping carts

  6. Avoid using session IDs in the URL, especially when the user has not logged in

  7. If you do need to include parameters, limit them to two and limit the number of characters per parameter to ten or fewer

  8. If you have a small dynamic website and enough time, you can apply this technique: go through your website page by page, right-click each page, copy the source code and create a new static page with an .htm or .html extension
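Two of the tips above, the 100-link limit for HTML sitemaps and the parameter limits, are easy to check with a few lines of Python. This is a sketch under assumptions: the function names are invented, and the ten-character limit is applied to both the name and the value of each parameter, which is only one possible reading of the tip.

```python
from urllib.parse import parse_qsl, urlsplit

MAX_LINKS_PER_PAGE = 100   # tip 1
MAX_PARAMETERS = 2         # tip 7
MAX_PARAMETER_LENGTH = 10  # tip 7, applied here to names and values alike


def split_sitemap(links, page_size=MAX_LINKS_PER_PAGE):
    """Split a list of links into sitemap pages of at most page_size links."""
    return [links[i:i + page_size] for i in range(0, len(links), page_size)]


def within_parameter_limits(url):
    """Check the query string of a URL against the limits from tip 7."""
    parameters = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    if len(parameters) > MAX_PARAMETERS:
        return False
    return all(
        len(name) <= MAX_PARAMETER_LENGTH and len(value) <= MAX_PARAMETER_LENGTH
        for name, value in parameters
    )


if __name__ == "__main__":
    links = [f"/page-{n}.html" for n in range(250)]
    print([len(page) for page in split_sitemap(links)])
    print(within_parameter_limits("http://www.example.com/shop?cat=shoes&id=42"))
    print(within_parameter_limits("http://www.example.com/shop?a=1&b=2&c=3"))
```

Each list returned by split_sitemap can then be written out as one HTML sitemap page.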

Posted in SEO Updates

Windows Live Search rumoured to be rebranded

Posted by sunilkumar90 on November 25, 2008

Microsoft is rumoured to be rebranding its Windows Live Search, according to various reports.

The computing giant is expected to rebrand the service to Kumo.com, with LiveSide confirming the company has bought the domain name and is directing internal traffic there.

Writing on its blog, LiveSide said: “While Microsoft employees have admitted publicly that there are branding issues around Live Search, we’re not quite ready to stick our heads above the parapet and say that Kumo will be the new brand name.”

Citing a source within the company, a TechCrunch report found the rebranded site is expected to launch in early 2009 – something search engine optimisation specialists may have time to plan for.

The website also adds that the brand name could still change.

In other news, Microsoft’s senior program manager of search engine optimisation has been interviewed by a magazine.

Duane Forrester said that, even with the current economic climate, digital website marketing continues to hold up.

He added that the methods are adapting to the changing marketplace – with online marketing leading the way as it is easily editable.

Posted in General, SEO Updates