Google (verb): to search for information about someone or something on the Internet using the search engine Google. “Google” has become a common term for searching the web. We are all familiar with Google and its search engine, and most of us have plenty of experience finding things on the internet with it. But how many of us know how Google finds the right answer for a search query? A great deal of work goes on behind Google’s search engine. Let’s peep into the search giant’s engine room.
A web crawler, sometimes known as a web spider, web robot, or simply a bot, is a program (most often an automated script) that browses the internet in an automated manner. This process is referred to as web crawling or spidering. Google sends out its web crawler, Googlebot, to crawl webpages automatically. The crawler crawls and re-crawls websites on a regular schedule, and the crawling interval depends on how quickly the content changes. A site that adds content frequently, such as a news website, is crawled more often than a company’s static page. There are more than 130 trillion individual pages on the web, and the number is growing rapidly. The bot goes from link to link and brings data about those webpages back to Google’s servers.
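The link-to-link traversal described above can be sketched as a breadth-first walk over a queue of URLs. This is only a minimal illustration, not how Googlebot is actually built: the `TOY_WEB` dictionary, its URLs, and the `crawl` function are all made up for the example, and no real network requests are made.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and parse links from the HTML.
TOY_WEB = {
    "a.example/": ("home page", ["a.example/news", "b.example/"]),
    "a.example/news": ("daily news", ["a.example/"]),
    "b.example/": ("company page", ["a.example/news", "c.example/"]),
    "c.example/": ("contact us", []),
}


def crawl(seed, web):
    """Breadth-first crawl: visit every reachable page once, in queue order."""
    frontier = deque([seed])  # URLs waiting to be fetched
    seen = {seed}             # URLs already queued, to avoid loops
    fetched = {}              # URL -> page text brought back by the crawler
    while frontier:
        url = frontier.popleft()
        text, links = web[url]
        fetched[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


pages = crawl("a.example/", TOY_WEB)
print(list(pages))
# ['a.example/', 'a.example/news', 'b.example/', 'c.example/']
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as the home page and the news page do here.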
Googlebot follows links from page to page, extracts the links it finds, and puts them into a queue to crawl later. Website owners can choose whether their sites are crawled or not. If they want to keep search engines away from their site, they add rules to the robots.txt file. With robots.txt, site owners can choose not to be crawled by Googlebot, or they can provide more specific instructions about how to process pages on their sites.
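To make the robots.txt idea concrete, here is a small sketch using `urllib.robotparser` from Python’s standard library. The rules and the `example.com` URLs are invented for illustration. The sample file lets Googlebot crawl everything except `/private/` and blocks all other bots entirely.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without fetching anything

# Googlebot may crawl ordinary pages but not anything under /private/.
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))      # True
print(parser.can_fetch("Googlebot", "https://example.com/private/a.html"))  # False

# Any other bot falls under the "*" rule and is blocked from the whole site.
print(parser.can_fetch("OtherBot", "https://example.com/index.html"))       # False
```

A polite crawler would run a check like `can_fetch` before requesting each URL and skip the ones that are disallowed.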
Google then sorts the pages by their content and other factors. All the crawled pages are stored in Google’s index database, which can contain all the search terms associated with each page. The Google index is now over 100 million gigabytes (about 100 petabytes) in size, and Google spent over one million computing hours to build it.
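A common way to store “all the search terms associated with a page” is an inverted index, which maps each term to the pages that contain it. The sketch below assumes that structure purely for illustration; the source does not say how Google’s index is laid out, and the `PAGES` data, `tokenize`, and `build_index` are hypothetical names.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text.
PAGES = {
    "a.example/news": "Daily news about search engines",
    "b.example/": "Our company builds search tools",
    "c.example/": "Contact the company",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: term -> set of URLs containing that term."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


index = build_index(PAGES)
print(sorted(index["search"]))   # ['a.example/news', 'b.example/']
print(sorted(index["company"]))  # ['b.example/', 'c.example/']
```

Because the index is keyed by term, looking up a word does not require rescanning every page, which is what makes searching a very large collection practical.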
When a user types a term into the Google search box, the query is matched against the index to find all the webpages in which the query terms appear. Based on its understanding of the query, Google pulls the relevant documents out of the index. At this stage a short description (snippet) of each page is also generated to show in the results.
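The matching step can be illustrated with a toy example that returns the pages containing every query term and then cuts a short excerpt to show as a description. This is a simplified sketch under stated assumptions: the `DOCS` data and the `match` and `snippet` helpers are hypothetical, the matching is plain term lookup, and real query understanding goes far beyond it.

```python
import re

# Hypothetical indexed documents: URL -> page text.
DOCS = {
    "a.example/news": "Daily news about search engines and web crawlers.",
    "b.example/": "Our company builds search tools for small teams.",
    "c.example/": "Contact the company by email or phone.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Toy index: term -> set of URLs containing it.
INDEX = {}
for url, text in DOCS.items():
    for term in tokenize(text):
        INDEX.setdefault(term, set()).add(url)


def match(query):
    """Return URLs containing every query term (sorted for stable output)."""
    postings = [INDEX.get(term, set()) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def snippet(url, query, width=30):
    """Return a short excerpt around the first query term found in the page."""
    text = DOCS[url]
    lowered = text.lower()
    for term in tokenize(query):
        pos = lowered.find(term)  # simplification: plain substring search
        if pos != -1:
            start = max(0, pos - width)
            return text[start:pos + len(term) + width].strip()
    return text[:2 * width]


for hit in match("company search"):
    print(hit, "->", snippet(hit, "company search"))
```

Only `b.example/` contains both “company” and “search”, so it is the single result for this query.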
Google then ranks the results by relevance. To do that it considers more than 200 ranking factors, including site and page quality, linking, freshness, and Google’s own PageRank. The results are then filtered for SafeSearch, which checks whether they are suitable for public viewing. Google’s algorithm changes constantly, as the Google search lab keeps working to improve the results.
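Of the factors listed, PageRank is the one with a well-known textbook formulation: a page is important if important pages link to it. The sketch below is the classic iterative version with a damping factor of 0.85, shown only to give a feel for the idea. It is not Google’s production ranking, and the `LINKS` graph and function name are invented for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph page `c` ends up on top because three other pages link to it, while `d`, which nobody links to, keeps only the baseline share.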
After the results are ordered by these factors, the results page is laid out according to the latest design and displayed. All of this happens in about 1/8th of a second (125 milliseconds).
You can view the complete story in animated form here. Here is the video from Google explaining how Google works!