Search Engine Optimization

Web Search Engine

 

The term “Web Search Engine” refers to a search engine designed to search for information on the Internet. This information may consist of web pages, videos, images, documents and other types of files. Some search engines also search data available in databases, newsgroups or open directories. Unlike Web directories, which are maintained by human editors, a search engine operates either algorithmically or through a mixture of algorithmic and human input.

 

Internet Bot

 

Internet bots, also called “Web robots”, “WWW robots” or simply bots, are software applications that run automated tasks over the Internet. Bots usually perform tasks that are both simple and structurally repetitive, at a much higher rate than would ever be possible for a human. Their most common use is web spidering, in which an automated script fetches, files and analyzes information from web servers at many times the speed of a human. Many servers publish a file called robots.txt, which contains rules for the spidering of that server that robots are expected to obey.
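
As a rough illustration of how a bot might consult robots.txt rules before fetching a page, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the bot name "ExampleBot" and the example.com URLs are all made up for this example; a real bot would download the file from the server it intends to crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed bot name; the rules above apply to every bot.
print(parser.can_fetch("ExampleBot", "https://example.com/index.html"))         # True
print(parser.can_fetch("ExampleBot", "https://example.com/private/data.html"))  # False
```

Nothing forces a bot to run such a check. Obeying robots.txt is a convention that well-behaved robots follow voluntarily.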

 

In addition to these functions, bots are very often deployed where a response speed faster than that of humans is needed. Less commonly, they are used in situations where the emulation of human activity is required (chat bots, for example).

 

Internet bots are also used as content access and organization applications for media delivery. In these situations, the bots track content updates on host computers and deliver live streaming access to a logged-in, browser-based user.

 

Web Crawler

 

An automated script or program that browses the World Wide Web in an automated, methodical manner is called a Web crawler. Other, less frequently used names for web crawlers include automatic indexers, ants, worms and bots.

 

The activity performed by web crawlers is called web crawling or spidering. Many sites, particularly search engines, use spidering to provide up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine, which then indexes the downloaded pages in order to provide fast searches. Web crawlers are also used to automate maintenance tasks on a website, such as checking links or validating HTML code. They can also gather specific types of information from websites, such as e-mail addresses, a practice known as harvesting that is usually used for spam.

 

A web crawler is a type of web bot, or software agent. In general, it starts with a list of URLs to visit, called “the seeds”. As the crawler visits these URLs, it identifies all the hyperlinks in each page and adds them to the list of URLs to visit, called “the crawl frontier”. URLs from the frontier are then visited recursively according to a set of policies.
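
The seed and frontier mechanism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the small FAKE_WEB dictionary stands in for real pages so that no network access is needed, and the policy shown (breadth-first order, skip URLs already seen, stop after max_pages) is one simple choice among many, not the policy of any particular search engine.

```python
from collections import deque

# Hypothetical link graph standing in for the web:
# each URL maps to the hyperlinks found on that page.
FAKE_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
}


def crawl(seeds, fetch_links, max_pages=100):
    """Visit URLs breadth-first, starting from the seeds.

    fetch_links(url) must return the hyperlinks found on that page.
    """
    frontier = deque(seeds)   # the crawl frontier: URLs still to visit
    seen = set(seeds)         # every URL ever added to the frontier
    visited = []              # URLs in the order they were visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:   # never queue the same URL twice
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl(["http://a.example/"], lambda url: FAKE_WEB.get(url, []))
print(order)
# ['http://a.example/', 'http://b.example/', 'http://c.example/', 'http://d.example/']
```

Replacing the deque with a priority queue would let the crawler visit URLs in order of importance or freshness rather than in discovery order.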

 

Googlebot

 

The search bot used by Google is called Googlebot. Its task is to collect documents from the web in order to build a searchable index for the Google search engine.

 

A webmaster who wishes to restrict the information on their site that is available to Googlebot (or another well-behaved spider) can do so by adding the appropriate directives to a robots.txt file or by adding the meta tag <meta name="Googlebot" content="noindex"> to the pages concerned. Googlebot requests to web servers are discernible from their user-agent string (“Googlebot”).
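
To make the two mechanisms concrete, the following sketch shows how a spider might look for a noindex meta tag and how a server-side script might spot Googlebot by its user-agent string. It uses only Python's standard html.parser module. The function and class names are invented for this example, and treating the generic name "robots" as applying to every bot is an addition based on common convention rather than something stated above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries a noindex meta tag for a given bot."""

    def __init__(self, bot_name):
        super().__init__()
        self.bot_name = bot_name.lower()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        # Assumption: "robots" addresses all bots, bot_name only this one.
        if name in ("robots", self.bot_name):
            directives = {part.strip() for part in content.split(",")}
            if "noindex" in directives:
                self.noindex = True


def may_index(html, bot_name="googlebot"):
    """Return False if the page asks this bot not to index it."""
    parser = RobotsMetaParser(bot_name)
    parser.feed(html)
    return not parser.noindex


def looks_like_googlebot(user_agent):
    """Substring check only; a user-agent header can be set by any client."""
    return "googlebot" in user_agent.lower()


page = '<html><head><meta name="Googlebot" content="noindex"></head></html>'
print(may_index(page))                       # False
print(may_index(page, bot_name="otherbot"))  # True
print(looks_like_googlebot(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
))                                           # True
```

Because the user-agent string is supplied by the client, a match here only suggests that a request comes from Googlebot; it does not prove it.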

 

Googlebot itself has two versions, freshbot and deepbot. Deepbot is a deep crawler that tries to follow every link on the web and download as many pages as possible for the Google indexers; it completes this process about once a month. Freshbot crawls the web looking for fresh content, visiting frequently changing websites according to how often they change. Googlebot usually follows only SRC links and HREF links.

 

Googlebot also discovers pages by harvesting all of the links on every page it finds on the World Wide Web and then following these links to other web pages. To be crawled and indexed in this way, a new web page must be linked to from another known page on the Internet.
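
Link harvesting of this kind can be approximated with Python's standard html.parser and urllib.parse modules. The sketch below collects the HREF and SRC attribute values from a page and resolves them against the page's own address. It illustrates the general technique only and makes no claim about how Googlebot is implemented; the sample page and the site.example and other.example addresses are invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkHarvester(HTMLParser):
    """Collects href and src attribute values as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the address of the page.
                self.links.append(urljoin(self.base_url, value))


def harvest_links(html, base_url):
    """Return every HREF and SRC link on the page, in document order."""
    harvester = LinkHarvester(base_url)
    harvester.feed(html)
    return harvester.links


# Hypothetical page content used only for this example.
PAGE = (
    '<a href="/about.html">About</a>'
    '<img src="logo.png">'
    '<a href="http://other.example/">Other</a>'
)

links = harvest_links(PAGE, "http://site.example/home/index.html")
print(links)
# ['http://site.example/about.html',
#  'http://site.example/home/logo.png',
#  'http://other.example/']
```

Each harvested URL would then be added to the list of pages still to be visited, which is how a page linked from a known page eventually gets crawled.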

 

However, webmasters have often noted a problem with Googlebot: it can take up an enormous amount of bandwidth. This can cause websites to exceed their bandwidth limit and be taken down temporarily. The problem is especially troublesome for mirror sites, which may host many gigabytes of data. Google provides “Webmaster Tools”, which allow website owners to adjust the crawl rate.

 

 