Friday, February 24, 2012

CONTAINS and Monetary values stored in text data type

I have data (pages of reports) stored in the text data type. The table w/
this data has a full text index on it, and I am trying to query for monetary
values such 123.45 using the CONTAINS predicate.
Ex.
select *
from tablename
where contains(document_text, '123.45')
The problem is that the query returns "too many" rows. In other words, the
query is finding the exact matches to 123.45 but also returns other records
that do not seem to match. I get the same results when I use a like search
on the text field as well (column like '%123.45%').
My goal is to return exact matches, only, to the monetary search value.
Thanks in advance.
-Eric
Eric,
Could you post the full output of -- SELECT @.@.version -- as well as your
exact FTS query (contains or freetext) and an example of the actual content?
This information is good for troubleshooting SQL FTS issues such as this.
When you say that the FTS query returns other records that do not seem to
match, do the other rows have similar numeric values and if so, could you
provide an example of this content?
Thanks,
John
"Eric Keeney" <Eric Keeney@.discussions.microsoft.com> wrote in message
news:F8D7C16C-8322-4CB8-B0D3-418CD23F739A@.microsoft.com...
> I have data (pages of reports) stored in the text data type. The table w/
> this data has a full text index on it, and I am trying to query for
monetary
> values such 123.45 using the CONTAINS predicate.
> Ex.
> select *
> from tablename
> where contains(document_text, '123.45')
> The problem is that the query returns "too many" rows. In other words,
the
> query is finding the exact matches to 123.45 but also returns other
records
> that do not seem to match. I get the same results when I use a like
search
> on the text field as well (column like '%123.45%').
> My goal is to return exact matches, only, to the monetary search value.
> Thanks in advance.
> -Eric
|||The problem is that 123.45 is indexed as 123 and 45 if 1,2,3,4, and 5 are
not in your noise word list.
When you query it would be searched on as 123 45 and would match with
123.45, 123 45, $123.45, 123!45, 123$45. etc.
This behavior is identical for the English and the neutral word breaker.
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Eric Keeney" <Eric Keeney@.discussions.microsoft.com> wrote in message
news:F8D7C16C-8322-4CB8-B0D3-418CD23F739A@.microsoft.com...
> I have data (pages of reports) stored in the text data type. The table w/
> this data has a full text index on it, and I am trying to query for
monetary
> values such 123.45 using the CONTAINS predicate.
> Ex.
> select *
> from tablename
> where contains(document_text, '123.45')
> The problem is that the query returns "too many" rows. In other words,
the
> query is finding the exact matches to 123.45 but also returns other
records
> that do not seem to match. I get the same results when I use a like
search
> on the text field as well (column like '%123.45%').
> My goal is to return exact matches, only, to the monetary search value.
> Thanks in advance.
> -Eric
|||select @.@.version:
Microsoft SQL Server 2000 - 8.00.818 (Intel X86)
May 31 2003 16:08:15
Copyright (c) 1988-2003 Microsoft Corporation
Standard Edition on Windows NT 5.0 (Build 2195: Service Pack 4)
FTS query:
select *
from tablename
where contains(document_text, '1278.85')
I cannot post any portion of the data, it is confiential info. But I'll
give it a go at describing it. Essentially, each page has a report header
about 3 lines long that is text and date information. Then each page has
column headers, then the rest of the page has monetary values, a lot of ".00"
and occasionally an actual meaningful value such as 123.45.
The example query above finds 4 occurrences of 1278.85 and two other
records. The two other records have values of 1085.74 which is the closest I
can find to 1278.85.
Thanks for taking the time to look at this.
-Eric
"John Kane" wrote:

> Eric,
> Could you post the full output of -- SELECT @.@.version -- as well as your
> exact FTS query (contains or freetext) and an example of the actual content?
> This information is good for troubleshooting SQL FTS issues such as this.
> When you say that the FTS query returns other records that do not seem to
> match, do the other rows have similar numeric values and if so, could you
> provide an example of this content?
> Thanks,
> John
>
>
> "Eric Keeney" <Eric Keeney@.discussions.microsoft.com> wrote in message
> news:F8D7C16C-8322-4CB8-B0D3-418CD23F739A@.microsoft.com...
> monetary
> the
> records
> search
>
>
|||Is there any way to get the word breaker to consider 123.45 as a whole word?
"Hilary Cotter" wrote:

> The problem is that 123.45 is indexed as 123 and 45 if 1,2,3,4, and 5 are
> not in your noise word list.
> When you query it would be searched on as 123 45 and would match with
> 123.45, 123 45, $123.45, 123!45, 123$45. etc.
> This behavior is identical for the English and the neutral word breaker.
> --
> Hilary Cotter
> Looking for a book on SQL Server replication?
> http://www.nwsu.com/0974973602.html
>
> "Eric Keeney" <Eric Keeney@.discussions.microsoft.com> wrote in message
> news:F8D7C16C-8322-4CB8-B0D3-418CD23F739A@.microsoft.com...
> monetary
> the
> records
> search
>
>
|||add a column to your table the same data type as the column you are
currently full text indexing.
Replace all occurences of . that occur and represent the decimal point with
a token you don't search on, ie XXX.
Then when FTI this new column. When you are searching replace all occurences
of . that occur and represent the decimal point with XXX, it a search on
123.45 will be changed into a search on 123XXX45.
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Eric Keeney" <EricKeeney@.discussions.microsoft.com> wrote in message
news:A3CB8D09-C53C-415C-AA92-55E305EC62AA@.microsoft.com...
> Is there any way to get the word breaker to consider 123.45 as a whole
word?[vbcol=seagreen]
> "Hilary Cotter" wrote:
are[vbcol=seagreen]
table w/[vbcol=seagreen]
words,[vbcol=seagreen]
value.[vbcol=seagreen]
|||Thanks Hilary, interesting idea. I will try that if I have to. So, do you
know if it is possible to configure the word-breaker? My default full-text
language is English US.
What would happen if I included the decimal "." in the noise-word list?
"Hilary Cotter" wrote:

> add a column to your table the same data type as the column you are
> currently full text indexing.
> Replace all occurences of . that occur and represent the decimal point with
> a token you don't search on, ie XXX.
> Then when FTI this new column. When you are searching replace all occurences
> of . that occur and represent the decimal point with XXX, it a search on
> 123.45 will be changed into a search on 123XXX45.
> --
> Hilary Cotter
> Looking for a book on SQL Server replication?
> http://www.nwsu.com/0974973602.html
>
> "Eric Keeney" <EricKeeney@.discussions.microsoft.com> wrote in message
> news:A3CB8D09-C53C-415C-AA92-55E305EC62AA@.microsoft.com...
> word?
> are
> table w/
> words,
> value.
>
>
|||Eric,
Unfortunately, it's not possible to *configure* the wordbreaker and adding
any punctuation (other than $ that is already present) to the noise word
files will have no affect, regardless of the OS platform. :-(.
Regards,
John
"Eric Keeney" <EricKeeney@.discussions.microsoft.com> wrote in message
news:9860F962-2EA5-4690-928A-518B91318E40@.microsoft.com...
> Thanks Hilary, interesting idea. I will try that if I have to. So, do
you
> know if it is possible to configure the word-breaker? My default
full-text[vbcol=seagreen]
> language is English US.
> What would happen if I included the decimal "." in the noise-word list?
> "Hilary Cotter" wrote:
with[vbcol=seagreen]
occurences[vbcol=seagreen]
5[vbcol=seagreen]
with[vbcol=seagreen]
breaker.[vbcol=seagreen]
message[vbcol=seagreen]
for[vbcol=seagreen]
other[vbcol=seagreen]
like[vbcol=seagreen]

No comments:

Post a Comment