Product | FastTextSearch/IB 1.4 from 18 Mar 2001 |
Description | Full text search engine for Interbase. |
Author | Copyright (c) 1998-2001, SoftLab MIL-TEC Ltd Web: http://www.softcomplete.com Email: support@softcomplete.com |
Status | Shareware $50. Online registration is available. |
Limitation | not index each even word (2,4,...) |
SOFTLAB EVALUATION LICENSE AGREEMENT 1.GRANT OF LICENSE.
2. WARRANTY AND LIMITATION OF LIABILITY.
|
FastTextSearch/IB is unique solution for the searching textual information in the Interbase database. Allow indexing an unlimited amount string and textual memo fields. Implements some idea from text retrivial systems - stop-word list, word stemming. Server side quick searching (without scan of all record). Possibility of use SoundEx for query. Independence from client software.
Setup complete ... or how to install FastTextSearch/IB.
Install of FastTextSearch is very simple. You must place this files in "interbase\lib" folder.
FileName
|
Short description
|
xFTS.dll | FTS core |
xFTS.ini | FTS settings |
xLibUDF.dll | string function library |
xFTS_skip.lst | stop-word list |
xFTS_tkn.lst | list of unreducible words |
stem_eng.dll | english stemming engine |
xFTS_RTF.dll | Blob RTF filter |
xFTS.lic | license file. only in registered version |
Step by step ... or how to add FastTextSearch/IB for existing database.
Open file "script\FTSCore.sql" in notepad. Change database name, user name and password to required. Execute script. FTS core will be add to your database.
By
example, we add FTS to EMPLOYEE.GDB (from Interbase examples). For add FTS capabilities
in some table we must select fields for indexing and define 3 triggers. We will
be working with CUSTOMERS table. First step - add field for FTS object identifier.
alter table CUSTOMER add
FTS_ID INTEGER;
create index CUSTOMER_IDX_FTS on CUSTOMER(FTS_ID);
Second step - define next triggers:
create trigger t_bi_customer_fts for CUSTOMER active before insert position 0 as
declare variable Parser integer;
declare variable i integer;
begin
if (exists(select * from TS$OPT where Enable>0)) then begin
Parser=Parser_Create();
i=Parser_Add(Parser,new.CUSTOMER);
i=Parser_Add(Parser,new.CONTACT_FIRST);
i=Parser_Add(Parser,new.CONTACT_LAST);
i=Parser_Add(Parser,new.ADDRESS_LINE1);
i=Parser_Add(Parser,new.ADDRESS_LINE2);
i=Parser_Add(Parser,new.CITY);
i=Parser_Add(Parser,new.COUNTRY);
execute procedure TS$UPDATE(new.FTS_ID, 'CUSTOMER', :Parser)
RETURNING_VALUES new.FTS_ID;
i=Parser_Free(Parser);
end
endcreate trigger t_bu_customer_fts for CUSTOMER active before update position 0 as
/* same as trigger t_bi_customer_fts */
declare variable Parser integer;
declare variable i integer;
begin
if (exists(select * from TS$OPT where Enable>0)) then begin
Parser=Parser_Create();
i=Parser_Add(Parser,new.CUSTOMER);
i=Parser_Add(Parser,new.CONTACT_FIRST);
i=Parser_Add(Parser,new.CONTACT_LAST);
i=Parser_Add(Parser,new.ADDRESS_LINE1);
i=Parser_Add(Parser,new.ADDRESS_LINE2);
i=Parser_Add(Parser,new.CITY);
i=Parser_Add(Parser,new.COUNTRY);
execute procedure TS$UPDATE(old.FTS_ID, 'CUSTOMER', :Parser)
RETURNING_VALUES new.FTS_ID;
i=Parser_Free(Parser);
end
endcreate trigger t_bd_customer_fts for CUSTOMER active before delete position 0 as
begin
delete from TS$OBJ where ID = old.FTS_ID;
end
Third step (optional) - build FTS index, if table contains data. For this necessary to execute trigger "Before Update" for each record in table.
update CUSTOMER set CUSTOMER=CUSTOMER;
(click here for see performance this query)
All done ! Test FTS index:
select *
from TS$Select_OR("customer","tech corp",0) TS
left join customer c on c.fts_id = ts.obj_id
This query select all customer, which contain words "tech" or "corp":
Attention: FastTextSearch/IB break all string and text memo to words, and built index, which contain this word. Searching for index is realized on coinciding beginning of word with sample. Hereupon, records, contain word "incorporated", will not be found at searching for word "corp".
Second test:
select *
from TS$Select_And("customer","hong kong",0) TS
left join customer c on c.fts_id = ts.obj_id
This query select all customer, which contain words "hong" and "kong":
SoundEX test:
select *
from TS$Select_OR("customer","jon",1) TS
left join customer c on c.fts_id = ts.obj_id
Select all customer, which contain words, with SoundEX equal to SoundEX("jon") :
Quick and fast ... or analysis of performance.
Query
|
Performance
|
"OR" Query with 2 word select * |
![]() |
"AND" Query with 2 word select * |
![]() |
"OR" SoundEX Query with 1 word select * |
![]() |
Brought diagrams show that only indexed access to data is used.
Some tips for increasing performance:
You don't speak english ? ... or international settings.
This is the sample xFTS.ini file for work with russian and english language (file xfts.rus):
Parameter
|
Description
|
MinWordLen | Disable indexing for short words (by example "at", "is", "or", etc.) |
WordSet | Define set of chars, from which consist words. All rest symbols will be perceive as separators of words |
NationSet Translate |
Is kept table for transformation of word in english equivalent (char by char), on which, subsequently, is calculate function SoundEX |
[Filters] | Additional filters, which apply for any indexing word. In next section we show, how to write your own filter. |
Find possible all !... more powers.
Indexing blob fields.
Use function Parser_AddBlob. By example, in table JOB from EMPLOYEE.GDB we built FTS index on fields JOB_COUNTRY varchar(15), JOB_TITLE varchar(25) and JOB_REQUIREMENT BLOB:
create trigger t_bi_job_fts for JOB active before insert position 0 as
declare variable Parser integer;
declare variable i integer;
begin
if (exists(select * from TS$OPT where Enable>0)) then begin
Parser=Parser_Create();
i=Parser_Add(Parser,new.JOB_COUNTRY);
i=Parser_Add(Parser,new.JOB_TITLE);
i=Parser_AddBlob(Parser,new.JOB_REQUIREMENT);
execute procedure TS$UPDATE(new.FTS_ID,'JOB',:Parser)
RETURNING_VALUES new.FTS_ID;
i=Parser_Free(Parser);
end
endIndexing blob fields, contains text in specific format.
If BLOB field contain RTF text then use function Parser_AddBlobRTF from xFTS_RTF.dll. Source code for this dll found in folder "RtfSrc". In future we planned add support for HTML.
Indexing unreducible words.
If it is necessary in the forced order to put in an index some words (irrespective
of their length and outcome of application of filters), they should be enumerated
in the file xFTS_tkn.lst. For example, if we want indexing field STATE_PROVINCE
in CUSTOMER table we should in the file xFTS_tkn.lst add words "CA",
"TX", "MA", "ON", "HI". Otherwise they
will not hit in an index as their length less minimum valid word length.
Note: unreducible words can contain non-word char. By example: "B+W",
"ABC!", "http://".
Write your own filter.
You can write filter in any programming language, which allows to create Win32 dll module. You is necessary export one function:
procedure ProcessWord(WordBuf: PChar); stdcall;
When calling this functions, word will be place in buffer with size 64 bytes. You are to process word, and return result in this the most buffer. If word is not index then simple assign WordBuf^:=#0.
Performance results were obtained with the help QuickDesk.