Monday, February 5, 2007

How to create your own Search Engine - using PHP and MySQL

Create your own Search Engine

I decided: why not set up my own search engine crawler and crawl my own website? And crawl all my competitors' websites? And check what I am missing on Google PageRank? And have my own search engine?

Here is what I did: I installed php-crawler, available at http://astellar.com/opensource/php-crawler/.

It is really easy to install: just download it and upload it to your server. You only have to create one database table and set up the config file, and that is all. Up and running.

I found that php-crawler had one problem when it came across a JPG image or a PDF file. To work around it, insert the marked code below into the file crawler/crawler.php, between the line that prints the URL being crawled and the call to fetch_URL():

print "Crawling: [" . $URL_info["depth"] . "] {$URL}";

### this is the new code you should insert ###

// Skip PDF, JPG and GIF files: compare the last three characters of the URL, ignoring case.
$url22 = strtolower(substr($URL, -3));
if (in_array($url22, array("pdf", "jpg", "gif")))
{
    drop_url_from_db($URL_info["id"]);
    print " - FAILED/REMOVED.\n";
    continue;
}

### here ends the new code you should insert ###

$page = fetch_URL($URL);
if ($page === false)
{
drop_url_from_db($URL_info["id"]);
print " - FAILED/REMOVED.\n";
continue;
}


And that is all: your crawler will be up and running, feeding your new search engine. You can check out the one I made at http://www.marineparts.ws/.
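
The three-character check above misses URLs that end in a query string (for example manual.pdf?download=1). As a possible refinement, here is a minimal sketch of a helper that looks only at the path part of the URL. The function name should_skip_url() and its parameter are my own assumptions and are not part of php-crawler; the default extension list is the same one used in the patch above.

```php
<?php
// Hypothetical helper (not part of php-crawler): returns true when the URL's
// path ends in an extension the crawler should not try to index.
function should_skip_url($url, $skipExtensions = array("pdf", "jpg", "gif"))
{
    // parse_url() separates the path from any query string or fragment.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === "") {
        return false; // no usable path, so nothing to judge by extension
    }

    // pathinfo() returns the text after the last dot in the file name.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return in_array($extension, $skipExtensions, true);
}
```

Inside crawler.php, the condition of the inserted if statement could then become should_skip_url($URL), and longer extensions such as "jpeg" could be added to the list without changing the logic.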


How to populate your Search Engine with a lot of websites?

One strategy you could use to populate your search engine is the one I am using. I have one major website that receives a lot of traffic, so I wrote a PHP script that reads the referer, which tells me where my visitors are coming from.

The script ignores my own domain name, and it also ignores the domain names of Google, Yahoo, MSN, and the other major search engines.

If the visitor is coming from a new website, I go and crawl it. This way, the more traffic I receive, the more websites I am able to crawl.
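
The referer filtering described above could look roughly like the sketch below. This is not my production script, just an illustration: the function name referer_host_to_crawl(), the placeholder domain example.com, and the short list of search engines are all assumptions, and the step that hands the new site to the crawler is left as a comment because it depends on how your crawler stores its URLs.

```php
<?php
// Hypothetical helper: decide whether a referring URL points at a site worth
// crawling. Returns the referring host, or null when it should be ignored.
function referer_host_to_crawl($referer, $ownDomain, $ignoredDomains)
{
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host) || $host === "") {
        return null; // empty or malformed referer
    }
    $host = strtolower($host);

    // Skip our own domain and the listed search engines, including subdomains.
    $blocked = array_merge(array($ownDomain), $ignoredDomains);
    foreach ($blocked as $domain) {
        $domain = strtolower($domain);
        $suffix = "." . $domain;
        if ($host === $domain || substr($host, -strlen($suffix)) === $suffix) {
            return null;
        }
    }

    return $host;
}

// Example wiring; the domain names below are placeholders.
$referer = isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "";
$ignored = array("google.com", "yahoo.com", "msn.com");
$newHost = referer_host_to_crawl($referer, "example.com", $ignored);
if ($newHost !== null) {
    // Queue "http://{$newHost}/" for the crawler here, for example by adding
    // it to the crawler's URL table if it is not already present.
}
```

Matching on the domain suffix means www.google.com is ignored along with google.com, while an unrelated host such as notgoogle.com is still accepted. Country-specific search engine domains would need their own entries in the list.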


Ricardo Guimaraes
http://www.avatarinteractive.com


I would like to hear your feedback on this post. Thanks.