About Internet Search Engines
Published: Friday, August 20, 2004
The internet contains a vast collection of information spread across remote web servers in every part of the world. The difficulty of locating the right information on the internet led to the creation of search technology: the internet search engine. A search engine provides links to relevant information based on your query. Popular internet search engines include Google, Yahoo, MSN, Lycos and Ask Jeeves. To understand the terminology and techniques for positioning your website pages for higher rankings, a basic knowledge of how a search engine works is essential.
Functions of Internet Search Engines
A search engine is software that is continually updated to take advantage of the latest technologies in order to provide better search results. Every search engine performs the same basic functions of collecting, organizing, indexing and serving results, but each does so in its own way, using algorithms and techniques that are closely guarded trade secrets. In short, the functions of a search engine can be categorized as follows (a small sketch of the indexing and serving steps appears after the list):
- Crawling the internet for web content.
- Indexing the web content.
- Storing the website contents.
- Search algorithms and results.
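To make the indexing and serving steps concrete, here is a minimal sketch in Python of an inverted index built over a tiny in-memory collection of pages. The page texts, URLs and the naive term-count ranking are illustrative assumptions for the example, not how any particular search engine actually works.

from collections import defaultdict

# A tiny, in-memory collection of "crawled" pages (illustrative only).
pages = {
    "http://example.com/a": "search engines crawl the web and index pages",
    "http://example.com/b": "an index maps words to the pages that contain them",
    "http://example.com/c": "crawlers follow links between pages on the web",
}

# Indexing: build an inverted index that maps each word to the URLs containing it.
inverted_index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        inverted_index[word].add(url)

def search(query):
    """Serve results: return URLs containing every query word,
    ranked by how often the query words appear (a naive score)."""
    words = query.lower().split()
    if not words:
        return []
    # Intersect the posting sets for all query words.
    results = set.intersection(*(inverted_index.get(w, set()) for w in words))
    # Naive ranking by total term frequency.
    return sorted(results,
                  key=lambda u: -sum(pages[u].lower().split().count(w) for w in words))

print(search("index pages"))   # pages a and b match; page c does not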
Crawling and Spidering the Web
Crawling is the process of following links on the web to different websites and gathering their contents for storage in the search engine's databases. A crawl can start afresh (for example, from a popular website containing lots of links, such as Yahoo) or from an existing, older index of websites. The crawler (also known as a web robot or web spider) is a software program that downloads web content (web pages, images, documents and other files) and then follows the hyperlinks within that content to download the linked pages. The linked content may be on the same site or on a different website.
Crawling continues until the crawler reaches a logical stop, such as a dead end with no further links or the set number of levels within a website's link structure. If a website is not linked from any other website on the internet, the crawler will be unable to locate it. Therefore, a new website with no links from other sites has to be submitted to each of the search engines for crawling.
An efficient crawler fetches from many websites at the same time, so that it can collect billions of pages as frequently as possible. Advanced search engines like Google crawl news and media sites more frequently (every hour or so) in order to deliver updated news and content in their search results. A crawler also avoids flooding a single website with a high volume of simultaneous requests; instead it spreads its requests over time so that the website does not crash. Usually search engines crawl only a few (three or four) levels deep from the homepage of a website. The term deep crawl denotes a crawler or spider that can index pages many levels deep; Google is an example of a deep crawler.
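As a rough illustration of the link-following, depth-limited behaviour described above, here is a minimal breadth-first crawler sketch in Python using only the standard library. The seed URL, depth limit and one-second delay are arbitrary assumptions for the example; a production crawler would also respect robots.txt, handle errors more carefully and fetch many sites in parallel.

import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect the href values of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_depth=3, delay=1.0):
    """Breadth-first crawl: follow links up to max_depth levels from the start page,
    waiting `delay` seconds between requests so a single site is not flooded."""
    seen = {start_url}
    queue = deque([(start_url, 0)])        # (url, depth) pairs
    while queue:
        url, depth = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except OSError:
            continue                        # skip pages that fail to download
        print(f"fetched depth={depth}: {url}")
        if depth < max_depth:
            parser = LinkParser()
            parser.feed(html)
            for link in parser.links:
                absolute = urljoin(url, link)    # resolve relative links
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    queue.append((absolute, depth + 1))
        time.sleep(delay)                   # simple politeness delay

# Example (hypothetical seed URL):
# crawl("http://somedomain.com/", max_depth=2)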
Crawlers or web robots follow guidelines specified for them by the website owner using the robots exclusion protocol (robots.txt). The robots.txt will specify the files or folders that the owner does not want the crawler to index in its database. Many search engine crawlers do not like unfriendly URLs, such as those generated by database driven websites. These website URLs contain parameters after the question mark (such as http://somedomain.com/article.php?cat=1&id=3). Search engines dislike such URLs because the website can overwhelm the crawler by using parameters to generate thousands of new web pages for indexing with similar content. Thus, crawlers often disregard the changes in the parameters as part of a new URL to spider.
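To give a flavour of how a crawler might honour robots.txt, the following sketch uses Python's standard urllib.robotparser module. The domain, the "ExampleBot" user agent and the disallowed paths are hypothetical.

from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt at http://somedomain.com/robots.txt might contain:
#
#   User-agent: *
#   Disallow: /private/
#
# A well-behaved crawler checks these rules before fetching any URL on the site.

robots = RobotFileParser()
robots.set_url("http://somedomain.com/robots.txt")
robots.read()                                    # download and parse the rules

for url in ("http://somedomain.com/index.html",
            "http://somedomain.com/private/data.html"):
    if robots.can_fetch("ExampleBot", url):      # "ExampleBot" is a made-up user agent
        print("allowed:", url)
    else:
        print("blocked by robots.txt:", url)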
Search engine friendly URLs are used to compensate for this problem: the dynamic parameters are hidden behind clean, static-looking paths that crawlers will readily follow.
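As a small, hypothetical illustration of the idea, a site can expose a clean path to crawlers and translate it internally into the parameterised URL. The /articles/<cat>/<id> pattern below is an assumption for the example, chosen to match the article.php URL mentioned above.

import re

# Hypothetical rewrite rule: expose /articles/<cat>/<id> to crawlers,
# but serve the content of the underlying dynamic script internally.
FRIENDLY = re.compile(r"^/articles/(\d+)/(\d+)$")

def rewrite(path):
    """Map a search engine friendly path to the equivalent dynamic URL."""
    match = FRIENDLY.match(path)
    if match:
        cat, article_id = match.groups()
        return f"/article.php?cat={cat}&id={article_id}"
    return path                       # anything else is passed through unchanged

print(rewrite("/articles/1/3"))       # -> /article.php?cat=1&id=3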