Friday, February 24, 2012

Contains query returning false hits

Hi
I cant figure out why a query returns false results.

It is basically:

select *
from sometable
where somecolumn=1
and contains(someothercolumn, 'arti-b')

This query takes forever to process and returns about 100 false hits for every row where someothercolumn actually contains the string 'arti-b'.
Note, I don't get the full set of rows where somecolumn=1, so there is some filtering from the contains clause.

If I use the same query, searching for just 'arti', it works fine. 'whatever-b' seems to work as well, as long as 'whatever' != 'arti'.

Does anyone know what causes this?

SQL server Enterprise edition
version 8 (SP4)
Language: US english
Collation: Finnish_swedish.

I believe that it is the collation of the someother column that is causing the problem. See Book on line “Collation Precedence”. You may have better luck with LIKE ‘%arti-b%’.

|||Well the column and table should have default collation.
Might this be an issue about different settings in sqlserver and mssearch?

LIKE '%arti-b%' is definitly not an option. This is over a million lines of text data!
Could someone give me an example of how to perform a proper contains-query for this string?|||

Hi!

I just saw this question today. Is this still unsolved?

If so, then I will assign this particular issue to one of the developers so they can find out what is the specific issue with this 'arti'. In which language is your text data? (finish-swedish?), there could be an issue with the wordbreaker used as well (but this is just a quick idea).

I will answer you as soon we can, probably from 3rd January on.

Thanks!

Fernando Azpeitia

|||

Hi

Yes, it is still unsolved.

The text data is in swedish. Our particular problem is that the data is stock market orientated. Hence the frequent full-text search for non-ortodox words. ("arti-b" is a stock ticker)

If it is not possible to configure mssearch not to treat hyphens as word breakers, I'd appriciate any help to handle such searchs as good as possible, with acceptable performance.

Thanks for your time.

/Gustav

No comments:

Post a Comment