Beginning from version 3.0.0 mnoGoSearch can index SQL database text fields -
the so called  htdb: virtual URL scheme.

Using htdb:/ virtual scheme you can build full text index of your SQL
tables as well as index your database driven WWW server.

Please note that currently mnoGoSearch can index only those 
tables that  are in the same database with mnoGoSearch tables. 
MySQL users may specify database in the query though. 
Also you must have PRIMARY key on the table you want to index.


HTDB indexer.conf commands
--------------------------

Two indexer.conf commands provide HTDB. There are HTDBList and HTDBDoc. 

HTDBList is SQL query to generate list of all URLs which correspond to
records in the table using PRIMARY key field. You may use either absolute
or relative URLs in HTDBList command:

  For example:

HTDBList SELECT concat('htdb:/',id) FROM messages

  or

HTDBList SELECT id FROM messages


HTDBDoc is a query to get only certain record from database 
using PRIMARY key value.

HTDBList SQL query is used for all URLs which end with '/' sign. For other
URLs SQL query given in HTDBDoc is used.

Note that HTDBDoc query must return FULL HTTP response with headers. 
So, you can build very flexible indexing system giving different HTTP 
status in query. Take a look at HTTP-codes.txt of documentation to understand 
indexer behavor when it gets different HTTP status.

If there is no result of HTDBDoc or query does return several records,
HTDB retrieval system generates "HTTP 404 Not Found". This may
happen at reindex time if record was deleted from your table since last
reindexing. You may use "DeleteBad yes" to delete such records from 
mnoGoSearch tables as well.

You may use several HTDBDoc/List commands in one indexer.conf with
corresponding Server commands.


HTDB variables
--------------

You may use PATH parts of URL as parameters of both HTDBList and HTDBDoc 
SQL queries. All parts are to be used as $1, $2,  ... $n, where number
is the number of PATH part:

htdb:/part1/part2/part3/part4/part5
       $1    $2    $3    $4    $5

   For example,you have this indexer.conf command:

HTDBList SELECT id FROM catalog WHERE category='$1'

When htdb:/cars/  URL is indexed, $1 will be replaced with 'cars':

         SELECT id FROM catalog WHERE category='cars'


You may use long URLs to provide several parameters to both HTDBList 
and HTDBDoc queries. For example, htdb:/path1/path2/path3/path4/id with query:

HTDBList SELECT id FROM table WHERE field1='$1' AND field2='$2' and field3='$3'

This query will generate the following URLs:

htdb:/path1/path2/path3/path4/id1
...
htdb:/path1/path2/path3/path4/idN

for all values of the field "id" which are in HTDBList output.




Creating full text index
------------------------
Using htdb:/ scheme you can create full text index and use it further in
your application. Lets imagine you have a big SQL table which stores
for example webboard messages in plain text format. You also want to
build an application with messages search facility. Lets say messages are 
stored in "messages" table with two fields "id" and "msg". "id" is an integer 
primary key and "msg" big text field contains messages themself. Using usual
SQL LIKE search may take long time to answer:

SELECT id,message FROM message WHERE message LIKE '%someword%'

Using mnoGoSearch htdb: scheme you have a possibility to create 
full text index on "message" table. Install mnoGoSearch in usual order.
Then edit your indexer.conf:


DBAddr mysql://foo:bar@localhost/database/
DBMode single

HTDBList SELECT id FROM messages

HTDBDoc SELECT concat(\
'HTTP/1.0 200 OK\\r\\n',\
'Content-type: text/plain\\r\\n',\
'\\r\\n',\
msg) \
FROM messages WHERE id='$1'

Server htdb:/


After start indexer will insert 'htdb:/' URL into database and will
run an SQL query given in HTDBList. It will produce 1,2,3, ..., N values
in result. Those values will be considered as links relative to 'htdb:/' 
URL. A list of new URLs in the form htdb:/1, htdb:/2, ... , htdb:/N
will be added into database. Then HTDBDoc SQL query will be executed
for each new URL. HTDBDoc will produce HTTP document for each document
in the form:

HTTP/1.0 200 OK
Content-Type: text/plain

<some text from 'message' field here>


 This document will be used to create full text index using words
from 'message' fields. Words will be stored in 'dict' table assuming
that we are using 'single' storage mode.

 After indexing you can use mnoGoSearch tables to perform search:

SELECT url.url 
FROM url,dict 
WHERE dict.url_id=url.rec_id 
AND dict.word='someword';


 As far as mnoGoSearch 'dict' table has an index on 'word' field
this query will be executed much faster than queries which use
SQL LIKE search on 'messages' table.

  You can also use several words in search:

SELECT url.url, count(*) as c 
FROM url,dict
WHERE dict.url_id=url.rec_id 
AND dict.word IN ('some','word')
GROUP BY url.url
ORDER BY c DESC;


  Both queries will return 'htdb:/XXX' values in url.url field.
Then your application has to cat leading 'htdb:/' from those values
to get PRIMARY key values of your 'messages' table.



Indexing SQL database driven web server
---------------------------------------

  You can also use htdb:/ scheme to index your database driven WWW server.
It allows to create indexes without having to invoke your web server while
indexing. So, it is much faster and requires less CPU resources when dicrect
indexing from WWW server. 

 The main idea of indexing database driven web server is to build full text 
index in usual order. The only thing is that search must produce real URLs
instead of URLs in 'htdb:/...' form. This can be achieved using mnoGoSearch
aliasing tools.


Take a look at sample indexer.conf in doc/samples/htdb.conf
It is an indexer.conf used to index our webboad at http://search.mnogo.ru/

HTDBList command genegates URLs in the form:

   http://search.mnogo.ru/board/message.php?id=XXX

  where XXX is a "messages" table primary key values.

For each primary key value HTDBDoc command generates text/html document 
with HTTP headers and content like this:

<HTML>
<HEAD>
<TITLE> ... subject field here .... </TITLE>
<META NAME="Description" Content=" ... author here ...">
</HEAD>
<BODY> ... message text here ... </BODY>


At the end of doc/samples/htdb.conf we wrote three commands:

Server htdb:/
Realm  http://search.mnogo.ru/board/message.php?id=*
Alias  http://search.mnogo.ru/board/message.php?id=  htdb:/


First command says indexer to execute HTDBList query which will
generate a list of messages in the form:

    http://search.mnogo.ru/board/message.php?id=XXX

Second command allow indexer to accept such message URLs using string
match with '*' wildcard at the end.

Third command replaces "http://search.mnogo.ru/board/message.php?id=" 
substring in URL with "htdb:/" when indexer retrive documents with messages.
It means that "http://mysearch.udm.net/board/message.php?id=xxx" URLs will be
shown in search result, but "htdb:/xxx" URL will be indexed instead,
where xxx is the PRIMARY key value, the ID of record in "messages" table.


