What Search Engine ‘Spiders’ Are And How They Work
Posted by Justin Harrison | Posted in Site Promotion | Posted on 16-03-2010
Tags: internet marketing, search engine marketing, search engine optimization, search marketing, SEO, seo services, Site Promotion
Search engine ‘spiders’ are robots that seek out webpages to display in search engines. Below we’ll discuss how they work and why they’re important.
Robots have much the same basic functionality as early browsers and, like those browsers, there are things they cannot do. Robots cannot get past password-protected areas. They do not understand frames, Flash movies, images, or JavaScript. They also cannot click the buttons on your website, so they may get stuck on JavaScript navigation or fail to index a dynamically generated URL. What a search engine robot does do is retrieve data: it finds information and links on the web.
A spider determines the content of your page by looking at the visible text, the HTML code, and the links. Based on the words it finds, the spider works out what the site is about, using a complex algorithm to decide what is and isn’t important. Spiders also collect links to follow later, which lets them hop from site to site. Since the web is made up of links between websites, robots use those links to make their way across it as they search.
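To make the idea of ‘reading’ a page more concrete, here is a minimal sketch in Python of how a robot might separate the title, meta tags, visible text, and links of a page. It uses only the standard library. The class name PageReader and the sample page are invented for illustration, and real spiders are far more elaborate than this.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects the title, meta tags, visible text, and links of one page."""

    SKIPPED = {"script", "style"}  # content this text-only reader ignores

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self.links = []
        self._stack = []  # open <title>, <script> or <style> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag == "title" or tag in self.SKIPPED:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current == "title":
            self.title += data.strip()
        elif current not in self.SKIPPED and data.strip():
            self.text_parts.append(data.strip())


# Hypothetical page used only for this example
SAMPLE_PAGE = """
<html><head>
<title>Garden Tools</title>
<meta name="description" content="Spades, rakes and hoes.">
<script>var hidden = "not visible";</script>
</head><body>
<h1>Our Tools</h1>
<p>We sell <a href="/spades.html">spades</a> and rakes.</p>
</body></html>
"""

reader = PageReader()
reader.feed(SAMPLE_PAGE)
visible_text = " ".join(reader.text_parts)
print(reader.title, reader.meta, reader.links, visible_text, sep="\n")
```

Notice that the script content never reaches the visible text, which mirrors the point that robots do not understand JavaScript.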
Links are collected from every page a robot visits and are then followed to other pages. This is how the robot gets around the World Wide Web: by following links from one place to another.
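The link-following behaviour can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the three pages in FAKE_WEB are made up and stand in for the real web, so nothing is downloaded, and the function name crawl is my own. A real robot would fetch pages over the network and apply many more rules.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gathers the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


# Hypothetical pages standing in for the web (no network access needed)
FAKE_WEB = {
    "http://a.example/": '<a href="/about.html">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about.html": '<a href="/">Home</a>',
    "http://b.example/": '<a href="http://a.example/about.html">A about</a>',
}


def crawl(start_url, fetch, max_pages=100):
    """Visit pages breadth-first, following links collected along the way."""
    queue = deque([start_url])  # links collected to follow later
    seen = {start_url}          # avoids visiting the same page twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # page could not be retrieved
            continue
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


order = crawl("http://a.example/", FAKE_WEB.get)
print(order)
```

Starting from a single page, the sketch reaches the second site purely by following a link, which is the ‘hop from site to site’ behaviour described above.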
To ensure that searchers get the most relevant results for their query, the search engine performs quick calculations. You can check your server logs, or the output of a log statistics program, to see which pages robots have visited and how often. Some robots are easy to identify, such as Google’s ‘Googlebot’, while less well-known ones, such as Inktomi’s ‘Slurp’, are harder to recognise. Some robots even appear to be human-operated browsers.
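If you would like to check your own logs, a rough sketch in Python might look like the following. It assumes your server writes the common ‘combined’ log format and that robots announce themselves in the user-agent field. The sample log lines, their IP addresses, and the short list of tokens are made up for illustration. Keep in mind that a user-agent can be faked, so this only catches robots that identify themselves honestly.

```python
import re
from collections import Counter

# Tokens to look for in the user-agent field (an illustrative, incomplete list)
ROBOT_TOKENS = ("Googlebot", "Slurp")

# Matches the request, status, size, referrer and user-agent of a combined-format line
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical log lines; the IP addresses are documentation-only examples
SAMPLE_LOG = [
    '192.0.2.10 - - [16/Mar/2010:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.11 - - [16/Mar/2010:10:05:00 +0000] "GET /about.html HTTP/1.0" 200 2048 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.12 - - [16/Mar/2010:10:06:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "http://example.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.6"',
    '192.0.2.10 - - [16/Mar/2010:10:07:00 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]


def robot_visits(lines):
    """Count visits per (robot, page) for robots that name themselves."""
    visits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the expected format
        agent = match.group("agent").lower()
        for token in ROBOT_TOKENS:
            if token.lower() in agent:
                visits[(token, match.group("path"))] += 1
                break
    return visits


visits = robot_visits(SAMPLE_LOG)
for (robot, path), count in sorted(visits.items()):
    print(robot, path, count)
```

The ordinary browser visit in the third line is ignored, so the totals show only which pages the named robots requested and how often.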
There may be robots that you do not want visiting your website, such as aggressive, bandwidth-grabbing robots. Being able to identify individual robots and count their visits is therefore useful, as is information on the undesirable ones. IP names and addresses of search engine robots are listed at the end of this article in a resources section.
Robots read the pages on your website by looking first at the text that is visible on the page, then at source code tags such as title tags and meta tags, and finally at the hyperlinks on the page. From the words and links it finds, the search engine robot can determine what your page is about. Each search engine has its own algorithm to determine what is important. Information is indexed and delivered to the search engine’s database according to how the search engine has set up its robot.
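As a rough picture of what ‘indexed’ can mean, here is a small Python sketch of an inverted index, a structure that maps each word to the pages containing it. This is a common textbook technique and not a description of how any particular search engine works. The page texts and the function names build_index and search are invented for the example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical text that a robot might have collected from two pages
PAGES = {
    "http://a.example/": "Garden spades and rakes",
    "http://b.example/": "Kitchen knives and garden shears",
}

index = build_index(PAGES)
print(search(index, "garden"))
print(search(index, "garden rakes"))
```

A real search engine would also rank the matching pages by importance, which is where each engine’s own algorithm comes in.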
The search engine sorts the information delivered to its database, which becomes part of the search engine and directory ranking process and allows it to display results. Databases are updated periodically, and robots revisit your site regularly to find any changes to your pages so that the latest information is available. How often a robot visits depends on how the search engine is set up, and this varies between search engines. If your website is down or experiencing a large amount of traffic, the robot may not be able to access the page it is trying to visit, and the page may not be re-indexed on that visit. How much this matters depends on how frequently the robot visits your site: it will return later to see whether the site has become accessible again.
Justin Harrison is an internationally recognised Internet Marketing Consultant who provides world class SEO Services to website owners. For more information visit: http://www.seorankings.co.za

