Apr
03

Search Engine Crawlers – Treat & Know them like a Friend


What Are Search Engine Crawlers?



Here is the definition of a Web crawler, directly from Wikipedia:

A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, and worms[1] or Web spider and Web robot.

This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).

A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
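To make the seeds and crawl frontier idea concrete, here is a minimal sketch in Python. It is not how any real search engine does it: the tiny in-memory "web" (FAKE_WEB), the function names, and the breadth-first visiting policy are all my own assumptions for illustration. A real crawler would download each page and pull the links out of the HTML.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would fetch the page over HTTP and parse out its hyperlinks.
FAKE_WEB = {
    "http://www.sample.com/": [
        "http://www.sample.com/about.html",
        "http://www.sample.com/blog/",
    ],
    "http://www.sample.com/about.html": ["http://www.sample.com/"],
    "http://www.sample.com/blog/": [
        "http://www.sample.com/blog/post1.html",
        "http://www.sample.com/about.html",
    ],
    "http://www.sample.com/blog/post1.html": [],
}


def fake_links(url):
    """Return the links on a page of the pretend web (empty if unknown)."""
    return FAKE_WEB.get(url, [])


def crawl(seeds, get_links, max_pages=100):
    """Visit URLs breadth-first, starting from the seed list."""
    frontier = deque(seeds)  # URLs waiting to be visited (the crawl frontier)
    seen = set(seeds)        # everything queued so far, so nothing is visited twice
    visited = []             # the order in which pages were visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl(["http://www.sample.com/"], fake_links)
for url in order:
    print(url)
```

The "seen" set is what keeps the crawler from going in circles when two pages link to each other, and max_pages is a simple stand-in for the "set of policies" mentioned above.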

In other words, the Web crawler is the tool that visits your site, often late at night, and traverses as much of it as possible, making that information available to search engines such as Google or Bing so they can find your content easily and rank it properly.  The crawler is your friend, so you want to make its visit as easy as possible.  If you don’t help the crawler, it will fall back on generic rules that may hurt your search ranking or leave parts of your site unsearchable. It may also waste time indexing supporting files, such as CSS files, that are meaningless to the purpose of your site.

Make the Robots Happy and Productive – Use “robots.txt” Files

If you believe in Christmas and Santa, then you know what I mean when I say you wouldn’t go to bed without leaving some cookies and milk to put Santa in a good mood while he visits your house.  A simple, standard file called “robots.txt” is what the robots or crawlers look for the moment they get to your site.  Imagine their disappointment if you didn’t even think of them by leaving a simple robots.txt file.

How the Robots Work

Seriously speaking, the crawler or robot is looking for objects on your site: files, folders, and web links.  Upon reaching the root of your site, the robot looks for the robots.txt file and uses it to understand your site and get the most out of the crawl.  Your robots.txt file helps the robot decide which content goes into the index and how your pages are arranged, so that people searching the internet can find them faster.  The crawler uses it to sort out which web pages, files, and folders may be indexed and which may not. Most web pages contain links to other pages, and a spider normally works through a page from top left to bottom right, similar to reading a book.
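Since the robot always looks for robots.txt at the root of the site, you can work out where it will look from any page address. Here is a small Python sketch of that step. The function name robots_txt_url and the sample page address are made up for illustration; only the standard library urllib.parse module is used.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the address where a crawler would look for robots.txt."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop the query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.sample.com/chat/room1.html?user=42"))
# prints http://www.sample.com/robots.txt
```

No matter how deep the page sits in your folders, the answer is the same file at the top of the site, which is why a robots.txt tucked away in a subfolder gets ignored.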


Making your robots.txt File

robots.txt is a plain text file, not HTML, and it is placed at the root of your web site.  There are books written on the subject of web crawlers and the use of robots.txt files, but here is a simple start:

Location & naming

1. Name it robots.txt, *not* robot.txt, Robot.Txt, or spider.txt.

2. Add rules to the text file, save it, and place a copy at the root of your web site.  Many sites describe the rule formats.

Example 1 – Disallow all robots for specific folders and files

Make a list of everything on your site you DON’T want robots/spiders to visit and put it in the file like this.  Note: You could replace the wildcard in the User-agent line with the names of specific robots that you want to ban.

# robots.txt for http://www.sample.com/

User-agent: *
Disallow: /chat/          # Online chat files
Disallow: /testsite/      # This is a test area
Disallow: /login.html     # This is an admin file
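If you want to double-check rules like these before uploading them, Python ships with a robots.txt reader in its standard library (urllib.robotparser). The sketch below is just one way to test the example above. The helper name is_allowed and the robot names "ExampleBot", "BadBot", and "OtherBot" are invented for illustration, and real crawlers may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The rules from Example 1, written out as a string.
RULES = """\
User-agent: *
Disallow: /chat/          # Online chat files
Disallow: /testsite/      # This is a test area
Disallow: /login.html     # This is an admin file
"""

# Variation from the note above: ban one specific (hypothetical) robot entirely.
BAN_ONE_ROBOT = """\
User-agent: BadBot
Disallow: /
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if the given robot may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(RULES, "ExampleBot", "http://www.sample.com/chat/room1.html"))  # False
print(is_allowed(RULES, "ExampleBot", "http://www.sample.com/index.html"))       # True
print(is_allowed(BAN_ONE_ROBOT, "BadBot", "http://www.sample.com/index.html"))   # False
print(is_allowed(BAN_ONE_ROBOT, "OtherBot", "http://www.sample.com/index.html")) # True
```

Keep in mind that robots.txt is a polite request. Well-behaved crawlers follow it, but it does not lock anything down, so it is no substitute for real security on something like a login page.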

Good Luck

I hope this gives you a basic understanding of the robots.txt file and how Web crawlers work.  The information here only scratches the surface of what you can do with robots.txt. There are tons of sites focusing on it entirely, so I won’t bother reinventing the wheel; I just wanted to get you started.

Sincerely,

The DisplacedGuy  (a.k.a. Rich Bianco)

P.S.  My daughter Heather is taking ownership of another blog called Otown411, as she wants to help the family situation with me being unemployed. She sees me working day and night and was willing to try to get Otown411 up and running. The site is targeted at people looking to visit or vacation in the Orlando area, and we will offer insider tips about getting the most out of the vacation, since we live here and know all the ins and outs.  It would be great if you could stop over and give her some motivation to stick with it.  She is like me and will be checking visitor stats constantly, which is the motivating part.  I’m banking on the time I’m spending being an investment…  fingers crossed.  If any of the sponsor sites on this site are appropriate, please consider visiting them.
