Building a search engine

Posted: Mon Aug 24, 2009 8:49 am
by dalkin
I'm not even sure this is the right place to post, but within the OnRev hosting setup, is there any way to build a search engine? If the answer is "Yes", is there a published project that would give me a head start?

Regards to all.

Posted: Mon Aug 24, 2009 11:12 am
by BvG
It's possible. However, to me the question is more "why???". Making a search engine is not only hard work; sooner or later you need a dedicated server (or even a dozen server farms). Does Google suck that much in your eyes?

Posted: Tue Aug 25, 2009 1:18 am
by dalkin
Hi. It's not so much that Google sucks; it's more that they charge $100 per site. It doesn't take many sites to eat up a big chunk of profit.

Posted: Fri Nov 27, 2009 6:20 pm
by mcgrath3
Google business search costs $100.

But site:yoursitehere.com yoursearchhere -dontsearchthis works for free.

site:lazyriversoftware.com software -quartz

So a simple search function that combines "site:" & "the site name" with the search terms and the excluded terms should be easy to write, and it still makes use of Google's hard-working algorithms.
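Tom's suggestion can be sketched as a small URL builder. This is a minimal sketch in Python rather than Rev script. The function name `site_search_url` is hypothetical, and the sketch assumes the standard Google results URL `https://www.google.com/search?q=`, with the whole query in the `q` parameter. A site's search box would then send the visitor's browser to the returned URL.

```python
from typing import Sequence
from urllib.parse import urlencode


def site_search_url(site: str, terms: str, exclude: Sequence[str] = ()) -> str:
    """Build a Google search URL restricted to one site (hypothetical helper)."""
    parts = [f"site:{site}", terms]
    # Prefix each excluded word with '-' so Google drops results containing it.
    parts += [f"-{word}" for word in exclude]
    query = " ".join(p for p in parts if p)
    # urlencode escapes ':' as %3A and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("lazyriversoftware.com", "software", ["quartz"]))
# → https://www.google.com/search?q=site%3Alazyriversoftware.com+software+-quartz
```

The excluded words are passed as a list so each one gets its own `-` prefix, matching the `-dontsearchthis` form in the post.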


HTH

Tom

Posted: Mon Dec 14, 2009 7:35 am
by sturgis
I doubt it's what you'd want, but this is interesting: http://www.google.com/enterprise/search/gsa.html (the Google Search Appliance), and of course there's also the Mini.

Also, while I'm sure it's possible to roll your own Rev crawler, there are already open-source solutions that might do what you need. I've never used them myself, but there seem to be quite a few PHP-based crawlers out there. If nothing else, if you decide to roll your own, the PHP scripts might give you some insight.
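No particular PHP crawler is named here, but the core loop such scripts run is simple. This is a rough sketch in Python, not the design of any specific crawler. It does a breadth-first walk from one start page, pulls out `<a href>` links, and follows only links on the same host. The names `crawl` and `LinkExtractor` are hypothetical. `fetch` is passed in as a function so the sketch can be tried without a network.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 100) -> List[str]:
    """Breadth-first crawl that stays on start_url's host; returns URLs fetched."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates are caught.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# A fake "site" stored in a dict stands in for real HTTP requests.
pages = {
    "http://example.com/": '<a href="/a">A</a> <a href="http://other.com/">x</a>',
    "http://example.com/a": '<a href="/">home</a> <a href="b#top">B</a>',
    "http://example.com/b": "no links here",
}
print(crawl("http://example.com/", pages.__getitem__))
```

For real pages, `fetch` could wrap `urllib.request.urlopen(url).read().decode("utf-8", "replace")`. A polite crawler should also honor the site's robots.txt (Python's `urllib.robotparser` can read it) and pause between requests.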

Posted: Wed Jan 20, 2010 8:20 am
by calieigh
Hi.
Could you give me a list of some PHP-based crawlers that have worked well in your experience?

Posted: Wed Jan 20, 2010 4:30 pm
by sturgis
I've never actually used any of them; I just googled to see what was out there. You might look on Dev Shed and HotScripts, as well as googling for "php search engine" and "php crawler". Then experiment with what you find to see if anything matches your needs. The last time I used a home-grown search engine, it was for a relatively small local site and used Ingres as a back end. I don't even remember the name of the engine itself or what language it was in. It's been 12 years or so since then, and I'm sure things have improved greatly in the meantime.

Edit: My mistake, it used GDBM. Here's a link to the search engine: http://harvest.sourceforge.net/harvest/doc/index.html From what I recall, setup was a real bear, but once it was working, it was really good. As I said above, though, it's been over 12 years, so my memory is almost nil at this point. Hard to remember breakfast, much less that far back.

Posted: Wed Jan 20, 2010 5:39 pm
by FourthWorld
FWIW I use Atomz.com at my site. They have a free option which is quite useful for most sites with a reasonable number of pages.

I've built a couple of search engines for desktop apps, and as BvG says, it's a lot of work. In my case it was necessary because we have unusual data that needs to be handled in unusual ways, but I wouldn't recommend writing one from scratch unless you absolutely need to; the time required is often better spent on other features.

That said, if you have unique needs that can only be addressed by a custom solution, it may be helpful to keep this old programmer's adage in mind: "Show me your data structures and I'll show you your algorithm".

Decide up front what you need your search engine to accomplish, then design the data structures you'll need to make that happen. Once you have that figured out, you'll be in a good position to work through the tedious details of indexing and retrieving that data store.
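Here is one illustration of "data structures first." The classic structure behind most text search is an inverted index, which maps each term to the set of documents that contain it. Once that structure exists, a boolean AND query is just a set intersection. This is a minimal sketch in Python. The class name, the lowercase alphanumeric tokenizer, and the sample pages are all assumptions for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of document IDs that contain it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return IDs of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Start from the rarest term so the running intersection stays small.
        terms.sort(key=lambda t: len(self.postings.get(t, ())))
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("home.html", "Welcome to our software store")
index.add("plugins.html", "Quartz plugins for graphics software")
index.add("news.html", "Store news and updates")
print(sorted(index.search("software")))  # → ['home.html', 'plugins.html']
print(index.search("software store"))    # → {'home.html'}
```

Storing only sets of document IDs supports yes/no matching. To rank results or answer phrase queries, you would extend each posting with term counts or word positions, which is exactly the kind of decision worth making before writing the indexer.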

This article gives some background on the Google engine and may suggest some good ideas for your own:
http://infolab.stanford.edu/~backrub/google.html
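The paper introduces PageRank, which scores a page by the links pointing at it. It defines PR(A) = (1-d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where C(T) is the number of links going out of page T, and notes that d is usually set to 0.85. This Python sketch is a simplified power-iteration version, not Google's implementation. It uses the normalized (1-d)/N variant so the scores sum to 1. Spreading a link-less page's score evenly over all pages is an assumption of this sketch.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], d: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Score held by pages with no outgoing links is shared by every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each outgoing link passes on an equal share of the page's score.
                share = d * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]})
print(max(ranks, key=ranks.get))  # → b (linked to by both a and c)
```

For a site-only search engine, link-based scores like this matter less than text relevance, but the same idea can help order results when many pages match.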