To quote the MS SQL Server 2000's online book on CONTAINS:
"If the text and asterisk are not delimited by double quotation marks, as in
CONTAINS (column, 'text*'), full-text search considers the asterisk as a
character and will search for exact matches to text*."
Can anybody confirm the above to be true ?
It is my experience that MS documentation in general - and in particular
their examples - contains a lot of errors, but here they specifically
mention that is should be possible to full-text search for 'text*' not
counting the asterisk as a wildcard. When I try to full-text search for
'text*' using CONTAINS the result includes records containing 'text' without
the asterisk.
Best regards
Johnny Nielsen
This is correct for a single search argument in your search phrase. The
behavior changes if you have multiple search arguements in your search
phrase.
With double quotation marks wrapping around your search phrase it will
wildcard, so results will match the prefix or beginnings (what comes before
the *), but will have different suffixes or endings, and it will match with
words or tokens in your table that has an * at the end. The * will be
applied to all search arguments.
so a search on "al anon*" will match with al anon, alcoholics anonymous,
allan anont, but not and mike anon*.
If you search without the double quotation marks and you have multiple
search arguments
ie
select *from authors where contains(*,'al anon*')
your query will bomb with
Server: Msg 7631, Level 15, State 1, Line 1
Syntax error occurred near 'anon*'. Expected ''''' in search condition 'al
anon*'.
If you query on a single word or token with no double quotation marks you
will get not wildcarding, but will get an exact match to the anon, and
anon*.
If you query on a single word or token and wrap it in double quotation marks
you will get wildcarding and it will match with anon and anon*.
So for a single search word or token in your search phrase you need to wrap
your search phrase in double quotes to get wildcarding.
For multiple search arguments, words or tokens in your search phrase you
need to wrap your search phrase in double quotes to get wildcarding, but all
search arguments will be wildcarded and you will get matches to tokens or
words in your table ending with the * and matching the search argument
exactly.
In FreeText the * will be ignored, but will be return hits to matches to the
search arguments which are stemmed or unstemmed, and end in an * and don't
end with the *.
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE.THIStrap402@.megabit.dk> wrote in message
news:unzSUC6dEHA.1644@.tk2msftngp13.phx.gbl...
> To quote the MS SQL Server 2000's online book on CONTAINS:
> "If the text and asterisk are not delimited by double quotation marks, as
in
> CONTAINS (column, 'text*'), full-text search considers the asterisk as a
> character and will search for exact matches to text*."
> Can anybody confirm the above to be true ?
> It is my experience that MS documentation in general - and in particular
> their examples - contains a lot of errors, but here they specifically
> mention that is should be possible to full-text search for 'text*' not
> counting the asterisk as a wildcard. When I try to full-text search for
> 'text*' using CONTAINS the result includes records containing 'text'
without
> the asterisk.
>
> Best regards
> Johnny Nielsen
>
|||"Hilary Cotter" <hilaryk@.att.net> wrote in message
news:Oz0AhW8dEHA.3664@.TK2MSFTNGP12.phx.gbl...
> This is correct for a single search argument in your search phrase.
So, just to be sure I understand you correctly - if you perform a SELECT *
FROM SOMETABLE WHERE CONTAINS(*,'test*') you will only match words
containing the five characters 'test*' and not the four-character word
'test' ?
On my system CONTAINS(*,'test*') and CONTAINS(*,' "test*" ') seems to return
the same results (it treats the asterisk in 'test*' as a wildcard which is
not what I want).
> The behavior changes if you have multiple search arguements in your search
phrase.
Can the behavior of the single-quoted simple term 'test*' change by joining
it to other arguments ?
I know the meaning of double-quotes in prefix terms, but I'm only interested
in the single-quoted non-prefix cases (I'm writing a search-condition parser
which is why the specific workings are important to me).
> With double quotation marks wrapping around your search phrase it will
> wildcard, so results will match the prefix or beginnings (what comes
before
> the *), but will have different suffixes or endings, and it will match
with
> words or tokens in your table that has an * at the end. The * will be
> applied to all search arguments.
> so a search on "al anon*" will match with al anon, alcoholics anonymous,
> allan anont, but not and mike anon*.
> If you search without the double quotation marks and you have multiple
> search arguments
> ie
> select *from authors where contains(*,'al anon*')
> your query will bomb with
> Server: Msg 7631, Level 15, State 1, Line 1
> Syntax error occurred near 'anon*'. Expected ''''' in search condition 'al
> anon*'.
> If you query on a single word or token with no double quotation marks you
> will get not wildcarding, but will get an exact match to the anon, and
> anon*.
> If you query on a single word or token and wrap it in double quotation
marks
> you will get wildcarding and it will match with anon and anon*.
> So for a single search word or token in your search phrase you need to
wrap
> your search phrase in double quotes to get wildcarding.
> For multiple search arguments, words or tokens in your search phrase you
> need to wrap your search phrase in double quotes to get wildcarding, but
all
> search arguments will be wildcarded and you will get matches to tokens or
> words in your table ending with the * and matching the search argument
> exactly.
> In FreeText the * will be ignored, but will be return hits to matches to
the[vbcol=seagreen]
> search arguments which are stemmed or unstemmed, and end in an * and don't
> end with the *.
>
> --
> Hilary Cotter
> Looking for a book on SQL Server replication?
> http://www.nwsu.com/0974973602.html
>
> "Johnny Nielsen" <DONT.WRITE.THIStrap402@.megabit.dk> wrote in message
> news:unzSUC6dEHA.1644@.tk2msftngp13.phx.gbl...
as
> in
> without
>
Best regards
Johnny Nielsen
|||No, I am saying Select * from tablename where contains(*,'test*') will
return hits to test and test*.
With Select * from tablename where contains(*,'"test*"') I will get hits to
test, test*, and tested as it is now wildcarding
You can't really change the behavior of the wildcarding by joining it to
other clauses, unless perhaps you do a join to a like that looks like this
like '% test %'
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE.THIStrap402@.megabit.dk> wrote in message
news:OXHRNu$dEHA.3864@.TK2MSFTNGP10.phx.gbl...
> "Hilary Cotter" <hilaryk@.att.net> wrote in message
> news:Oz0AhW8dEHA.3664@.TK2MSFTNGP12.phx.gbl...
> So, just to be sure I understand you correctly - if you perform a SELECT *
> FROM SOMETABLE WHERE CONTAINS(*,'test*') you will only match words
> containing the five characters 'test*' and not the four-character word
> 'test' ?
> On my system CONTAINS(*,'test*') and CONTAINS(*,' "test*" ') seems to
return[vbcol=seagreen]
> the same results (it treats the asterisk in 'test*' as a wildcard which is
> not what I want).
>
search
> phrase.
>
> Can the behavior of the single-quoted simple term 'test*' change by
joining
> it to other arguments ?
> I know the meaning of double-quotes in prefix terms, but I'm only
interested
> in the single-quoted non-prefix cases (I'm writing a search-condition
parser[vbcol=seagreen]
> which is why the specific workings are important to me).
>
> before
> with
'al[vbcol=seagreen]
you[vbcol=seagreen]
> marks
> wrap
> all
or[vbcol=seagreen]
> the
don't[vbcol=seagreen]
> as
a[vbcol=seagreen]
particular[vbcol=seagreen]
for
>
> Best regards
> Johnny Nielsen
>
>
|||"Hilary Cotter" <hilaryk@.att.net> wrote in message
news:%23I4gCIAeEHA.3132@.TK2MSFTNGP11.phx.gbl...
> No, I am saying Select * from tablename where contains(*,'test*') will
> return hits to test and test*.
I'm sorry Hillary, but now I'm really confused. According to the MS
documentation that you seem to agree upon the second asterisk in
CONTAINS(*,'test*') isn't acting as a wildcard but is a part of the word
searched for.
Then how come it will match the word test without an appended asterisk ?
That would be similar to CONTAINS(*,'server') matching the word serve
without the appended 'r'.
> With Select * from tablename where contains(*,'"test*"') I will get hits
to
> test, test*, and tested as it is now wildcarding
I agree and I have now problem understanding the functionality of the
double-quotes - they seem to work as expected :-)
It is only the functionality of the single-quoted cases that troubles me.
> You can't really change the behavior of the wildcarding by joining it to
> other clauses, unless perhaps you do a join to a like that looks like this
> like '% test %'
OK, I just tried to find out what you meant by "the behavior changes if you
have multiple search arguements in your search phrase" .. you didn't mean
that the behavior of the single-quoted term then.
PS: This has nothing to do with the current subject, but do you happen to
know the valid range of full-text searchable unicode characters ? I can't
seem to find a clear definition in the MS documentation .. not even a
definition of 'punctuation' symbols.
Best regards
Johnny Nielsen
[vbcol=seagreen]
> --
> Hilary Cotter
> Looking for a book on SQL Server replication?
> http://www.nwsu.com/0974973602.html
>
> "Johnny Nielsen" <DONT.WRITE.THIStrap402@.megabit.dk> wrote in message
> news:OXHRNu$dEHA.3864@.TK2MSFTNGP10.phx.gbl...
*[vbcol=seagreen]
> return
is[vbcol=seagreen]
> search
> joining
> interested
> parser
anonymous,[vbcol=seagreen]
> 'al
> you
you[vbcol=seagreen]
but[vbcol=seagreen]
> or
to[vbcol=seagreen]
> don't
marks,[vbcol=seagreen]
as[vbcol=seagreen]
> a
> particular
specifically[vbcol=seagreen]
not
> for
>
|||Yes that is correct. The subtlety is that non alphanumeric characters with
some exceptions are treated as white space.
So test and test* are both indexed as test. So a search on test will return
hits to test and test*.
All text (even ascii) is indexed as its unicode equivalent. Its punctuation
marks which are treated as white space and you get this problem with.
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE.THIStrap402@.megabit.dk> wrote in message
news:u5Jc8EBeEHA.1356@.TK2MSFTNGP09.phx.gbl...[vbcol=seagreen]
> "Hilary Cotter" <hilaryk@.att.net> wrote in message
> news:%23I4gCIAeEHA.3132@.TK2MSFTNGP11.phx.gbl...
> I'm sorry Hillary, but now I'm really confused. According to the MS
> documentation that you seem to agree upon the second asterisk in
> CONTAINS(*,'test*') isn't acting as a wildcard but is a part of the word
> searched for.
> Then how come it will match the word test without an appended asterisk ?
> That would be similar to CONTAINS(*,'server') matching the word serve
> without the appended 'r'.
>
> to
> I agree and I have now problem understanding the functionality of the
> double-quotes - they seem to work as expected :-)
> It is only the functionality of the single-quoted cases that troubles me.
>
this
> OK, I just tried to find out what you meant by "the behavior changes if
you[vbcol=seagreen]
> have multiple search arguements in your search phrase" .. you didn't mean
> that the behavior of the single-quoted term then.
>
> PS: This has nothing to do with the current subject, but do you happen to
> know the valid range of full-text searchable unicode characters ? I can't
> seem to find a clear definition in the MS documentation .. not even a
> definition of 'punctuation' symbols.
>
> Best regards
> Johnny Nielsen
>
SELECT[vbcol=seagreen]
> *
which[vbcol=seagreen]
> is
will[vbcol=seagreen]
match[vbcol=seagreen]
be[vbcol=seagreen]
> anonymous,
multiple[vbcol=seagreen]
condition[vbcol=seagreen]
marks[vbcol=seagreen]
and[vbcol=seagreen]
quotation[vbcol=seagreen]
to[vbcol=seagreen]
> you
> but
tokens[vbcol=seagreen]
argument[vbcol=seagreen]
matches[vbcol=seagreen]
> to
message[vbcol=seagreen]
> marks,
asterisk[vbcol=seagreen]
> as
> specifically
> not
search[vbcol=seagreen]
'text'
>
|||"Hilary Cotter" <hilaryk@.att.net> wrote in message
news:%23itsZkJeEHA.3512@.TK2MSFTNGP12.phx.gbl...
> Yes that is correct. The subtlety is that non alphanumeric characters with
> some exceptions are treated as white space.
...but as I quoted from the MS online books the asterisk is *not* treated as
white space when used inside single-quotes - that's my point. To repeat the
MS statement from the online books, that I originally questioned: [If the
text and asterisk are not delimited by double quotation marks, as in
CONTAINS (column, 'text*'), full-text search considers the asterisk as a
character and will search for exact matches to text*.]
> So test and test* are both indexed as test. So a search on test will
return
> hits to test and test*.
Well, according to MS the asterisk in single-quotes is treated as a
character and therefore test and test* should be considered two different
words. That is, contains(*,'test*') should never match the word test.
> All text (even ascii) is indexed as its unicode equivalent. Its
punctuation
> marks which are treated as white space and you get this problem with.
Sadly this is rather an additional problem. Building a parser is hard job
when the precise definition of what is considered punctuation symbols and
indexable characters isn't available to the public :-(
Best regards
Johnny Nielsen
[vbcol=seagreen]
> --
> Hilary Cotter
> Looking for a book on SQL Server replication?
> http://www.nwsu.com/0974973602.html
>
> "Johnny Nielsen" <DONT.WRITE.THIStrap402@.megabit.dk> wrote in message
> news:u5Jc8EBeEHA.1356@.TK2MSFTNGP09.phx.gbl...
hits[vbcol=seagreen]
me.[vbcol=seagreen]
to[vbcol=seagreen]
> this
> you
mean[vbcol=seagreen]
to[vbcol=seagreen]
can't[vbcol=seagreen]
phrase.[vbcol=seagreen]
> SELECT
word[vbcol=seagreen]
to[vbcol=seagreen]
> which
your[vbcol=seagreen]
search-condition[vbcol=seagreen]
> will
comes[vbcol=seagreen]
> match
> be
> multiple
> condition
> marks
> and
> quotation
need[vbcol=seagreen]
> to
phrase[vbcol=seagreen]
wildcarding,[vbcol=seagreen]
> tokens
> argument
> matches
and[vbcol=seagreen]
> message
> asterisk
'text*'
> search
> 'text'
>
|||According to my tests with the neutral and US Englishword breaker the * is
treated as white space when you have a single search argument and you
wildcard it, and you wrap it in single quotes.
In other words a search on test will return the same hits on a search on
test*.
If you wrap the phrase with double quotes you will get wildcarding, but it
will still match with test*.
Think of it this way. While indexing when MSSearch sees non white space, non
alphanumeric characters it strips them out and indexes the alphanumeric
characters.
So test* is indexed as test and is indistinguishable from test.
When you search on test* the * is stripped off and the search is performed
on test.
Confusing I know.
HTH
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE.THIStrap402@.megabit.dk> wrote in message
news:%236Y8P9KeEHA.2812@.tk2msftngp13.phx.gbl...[vbcol=seagreen]
> "Hilary Cotter" <hilaryk@.att.net> wrote in message
> news:%23itsZkJeEHA.3512@.TK2MSFTNGP12.phx.gbl...
with
> ..but as I quoted from the MS online books the asterisk is *not* treated
as
> white space when used inside single-quotes - that's my point. To repeat
the[vbcol=seagreen]
> MS statement from the online books, that I originally questioned: [If the
> text and asterisk are not delimited by double quotation marks, as in
> CONTAINS (column, 'text*'), full-text search considers the asterisk as a
> character and will search for exact matches to text*.]
>
> return
> Well, according to MS the asterisk in single-quotes is treated as a
> character and therefore test and test* should be considered two different
> words. That is, contains(*,'test*') should never match the word test.
>
> punctuation
> Sadly this is rather an additional problem. Building a parser is hard job
> when the precise definition of what is considered punctuation symbols and
> indexable characters isn't available to the public :-(
>
> Best regards
> Johnny Nielsen
will[vbcol=seagreen]
word[vbcol=seagreen]
?[vbcol=seagreen]
> hits
> me.
it[vbcol=seagreen]
> to
like[vbcol=seagreen]
if[vbcol=seagreen]
> mean
> to
> can't
message[vbcol=seagreen]
> phrase.
> word
> to
> your
by[vbcol=seagreen]
> search-condition
it[vbcol=seagreen]
> comes
will[vbcol=seagreen]
anon,[vbcol=seagreen]
> need
> phrase
> wildcarding,
> and
quotation
> 'text*'
>
|||OK, thanks for spending your time Hilary.
Having my observations confirmed I now know how not to proceed :-)
I really should stop trusting those MS examples...
Best regards
Johnny*Nielsen
"Hilary Cotter" <hilaryk@.att.net> wrote in message
news:O$0froMeEHA.3512@.TK2MSFTNGP12.phx.gbl...
> According to my tests with the neutral and US Englishword breaker the * is
> treated as white space when you have a single search argument and you
> wildcard it, and you wrap it in single quotes.
> In other words a search on test will return the same hits on a search on
> test*.
> If you wrap the phrase with double quotes you will get wildcarding, but it
> will still match with test*.
> Think of it this way. While indexing when MSSearch sees non white space,
non[vbcol=seagreen]
> alphanumeric characters it strips them out and indexes the alphanumeric
> characters.
> So test* is indexed as test and is indistinguishable from test.
> When you search on test* the * is stripped off and the search is performed
> on test.
> Confusing I know.
> HTH
> --
> Hilary Cotter
> Looking for a book on SQL Server replication?
> http://www.nwsu.com/0974973602.html
>
> "Johnny Nielsen" <DONT.WRITE.THIStrap402@.megabit.dk> wrote in message
> news:%236Y8P9KeEHA.2812@.tk2msftngp13.phx.gbl...
> with
> as
> the
the[vbcol=seagreen]
different[vbcol=seagreen]
job[vbcol=seagreen]
and[vbcol=seagreen]
> will
> word
asterisk[vbcol=seagreen]
> ?
serve[vbcol=seagreen]
get[vbcol=seagreen]
the[vbcol=seagreen]
troubles[vbcol=seagreen]
> it
> like
> if
happen[vbcol=seagreen]
a[vbcol=seagreen]
> message
a[vbcol=seagreen]
words[vbcol=seagreen]
four-character[vbcol=seagreen]
seems[vbcol=seagreen]
wildcard[vbcol=seagreen]
> by
only[vbcol=seagreen]
> it
will[vbcol=seagreen]
> will
quotation[vbcol=seagreen]
> anon,
anon*.[vbcol=seagreen]
to[vbcol=seagreen]
*[vbcol=seagreen]
> quotation
in[vbcol=seagreen]
containing
>
|||just to clarify further you get the same result searching on test* as you
get searching on test!, or test#.
Here is another stupid SQL FTS trick
try this
select * from tablename where contains(*,'#test*$')
or
select * from tablename where contains(*,'$test*$')
or
select * from tablename where contains(*,'"test*"')
Whatever delimiter you use it still works, even if you mix the delimiter
types.
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE.THIStrap402@.megabit.dk> wrote in message
news:uqCdXGNeEHA.3212@.TK2MSFTNGP10.phx.gbl...[vbcol=seagreen]
> OK, thanks for spending your time Hilary.
> Having my observations confirmed I now know how not to proceed :-)
> I really should stop trusting those MS examples...
>
> Best regards
> Johnny*Nielsen
>
> "Hilary Cotter" <hilaryk@.att.net> wrote in message
> news:O$0froMeEHA.3512@.TK2MSFTNGP12.phx.gbl...
is[vbcol=seagreen]
it[vbcol=seagreen]
> non
performed[vbcol=seagreen]
characters[vbcol=seagreen]
treated[vbcol=seagreen]
repeat[vbcol=seagreen]
> the
a[vbcol=seagreen]
> different
with.[vbcol=seagreen]
> job
> and
message[vbcol=seagreen]
contains(*,'test*')[vbcol=seagreen]
MS[vbcol=seagreen]
the[vbcol=seagreen]
> asterisk
> serve
> get
> the
> troubles
joining[vbcol=seagreen]
changes[vbcol=seagreen]
didn't[vbcol=seagreen]
> happen
I[vbcol=seagreen]
even[vbcol=seagreen]
> a
perform[vbcol=seagreen]
> a
> words
> four-character
> seems
> wildcard
in[vbcol=seagreen]
change[vbcol=seagreen]
> only
phrase[vbcol=seagreen]
(what[vbcol=seagreen]
> will
*[vbcol=seagreen]
alcoholics[vbcol=seagreen]
have[vbcol=seagreen]
> quotation
> anon*.
you[vbcol=seagreen]
search[vbcol=seagreen]
> to
search[vbcol=seagreen]
to[vbcol=seagreen]
an[vbcol=seagreen]
> *
in[vbcol=seagreen]
> in
full-text
> containing
>
No comments:
Post a Comment