Saturday, February 25, 2012

CONTAINSTABLE and wildcard inconsistency

I have a table in Sql Server 2000 with full text indexing setup on a column
called 'contents' with a datatype of Text.
The column contains HTML, and I want to search for a particular link eg.
<a href="http://links.10026.com/?link=/hm/default.aspx?i=40559#secure">
...by using the following phrase:
/hm/default.aspx?i=40559
...in a CONTAINSTABLE query.
I need to find the phrase within the html. This is tricky because
containstable only allows a wildcard at the end of the phrase, ie I can't
search for '*/hm/default.aspx?i=40559*' with an asterisk at each end.
BUT, when I try the following query on my development server, it works
without any asterisks:
SELECT item_id, contents, a.RANK FROM
CONTAINSTABLE(item, contents, '/hm/default.aspx?i=40559') as a, item b
WHERE (a.[KEY] = b.item_id AND path LIKE '%10646%')
The row is returned but I don't understand why. I thought the full text
would try to find the phrase on its own (with no html surrounding it)? On my
live server, it doesn't work (also Sql 2000). The data and the FT catalogs
and the word-breaker are the same on both machines. What else could cause
this difference?
Any help greatly appreciated.
Ed
This could be a version issue, what are the results of select @.@.version from
both servers.
On my machine, using us_english, I get <a
href="/hm/default.aspx?i=40559#secure">
indexed and queried as a, href, hm, default, aspx, i, 40559, and secure.
I would also check the noise word lists as Daniel suggests to make sure they
are identical.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"edwaldo" <edwaldo@.discussions.microsoft.com> wrote in message
news:C93D95AE-3296-42CD-B42E-10FCC841369F@.microsoft.com...
>I have a table in Sql Server 2000 with full text indexing setup on a column
> called 'contents' with a datatype of Text.
> The column contains HTML, and I want to search for a particular link eg.
> <a href="http://links.10026.com/?link=/hm/default.aspx?i=40559#secure">
> ..by using the following phrase:
> /hm/default.aspx?i=40559
> ..in a CONTAINSTABLE query.
> I need to find the phrase within the html. This is tricky because
> containstable only allows a wildcard at the end of the phrase, ie I can't
> search for '*/hm/default.aspx?i=40559*' with an asterisk at each end.
> BUT, when I try the following query on my development server, it works
> without any asterisks:
> SELECT item_id, contents, a.RANK FROM
> CONTAINSTABLE(item, contents, '/hm/default.aspx?i=40559') as a, item b
> WHERE (a.[KEY] = b.item_id AND path LIKE '%10646%')
> The row is returned but I don't understand why. I thought the full text
> would try to find the phrase on its own (with no html surrounding it)? On
> my
> live server, it doesn't work (also Sql 2000). The data and the FT
> catalogs
> and the word-breaker are the same on both machines. What else could cause
> this difference?
> Any help greatly appreciated.
> Ed
>
|||Hilary/Daniel,
Thank you both for your help with this.
To reiterate, on server A the containtable does return the row. Version is
Microsoft SQL Server 2000 - 8.00.194 (Intel X86) Developer Edition on
Windows NT 5.1 (Build 2600: Service Pack 1)
On server B it doesn't. Version is
Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Standard Edition on
Windows NT 5.0 (Build 2195: Service Pack 4)
I tried splitting the phrase as you suggested:
CONTAINSTABLE(item, contents, '"hm default aspx i 40559"')
...works on server A, not on server B. Which is a shame because this would
be easy to implement.
On both machines the data is the same. On both machines I cleared the
contents of the noise files - noise.dat and noise.eng are both empty. I then
created the FT catalogs using the English (United Kingdom) word breaker. I
have repopulated each several times, and a simple LIKE query returns the row
on both servers.
I appreciate why it shouldn't work on server B, but I'm intrigued as to why
it works on server A at all!
Thanks again.
Ed
"Hilary Cotter" wrote:

> This could be a version issue, what are the results of select @.@.version from
> both servers.
> On my machine, using us_english, I get <a
> href="http://links.10026.com/?link=/hm/default.aspx?i=40559#secure">
> indexed and queried as a, href, hm, default, aspx, i, 40559, and secure.
> I would also check the noise word lists as Daniel suggests to make sure they
> are identical.
>
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
>
> "edwaldo" <edwaldo@.discussions.microsoft.com> wrote in message
> news:C93D95AE-3296-42CD-B42E-10FCC841369F@.microsoft.com...
>
>
|||The problem is that you are using different word breakers. Apply Sp4 on both
SQL Server versions to get consistent behavior.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"edwaldo" <edwaldo@.discussions.microsoft.com> wrote in message
news:6F1AE47C-DD30-4AF5-AC45-9F61C73710E5@.microsoft.com...[vbcol=seagreen]
> Hilary/Daniel,
> Thank you both for your help with this.
> To reiterate, on server A the containtable does return the row. Version
> is
> Microsoft SQL Server 2000 - 8.00.194 (Intel X86) Developer Edition on
> Windows NT 5.1 (Build 2600: Service Pack 1)
> On server B it doesn't. Version is
> Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Standard Edition on
> Windows NT 5.0 (Build 2195: Service Pack 4)
> I tried splitting the phrase as you suggested:
> CONTAINSTABLE(item, contents, '"hm default aspx i 40559"')
> ..works on server A, not on server B. Which is a shame because this would
> be easy to implement.
> On both machines the data is the same. On both machines I cleared the
> contents of the noise files - noise.dat and noise.eng are both empty. I
> then
> created the FT catalogs using the English (United Kingdom) word breaker.
> I
> have repopulated each several times, and a simple LIKE query returns the
> row
> on both servers.
> I appreciate why it shouldn't work on server B, but I'm intrigued as to
> why
> it works on server A at all!
> Thanks again.
> Ed
>
> "Hilary Cotter" wrote:

No comments:

Post a Comment