Saturday, February 25, 2012

containstable, top_n_rank, and additional where clause combination causes unexpected resul

Hello,
Something strange happens if I try and do the following:
If I use the containstable function, with top_n_rank, along with an
extra "and where" clause, then the top_n_rank does not seem to return
the correct number of rows.
Here is an example to explain my point:
select *, tempidentity = IDENTITY (INT) into #tempt
from Store_BasicSearchableShelves,
containstable(Store_BasicSearchableShelves, ShelfName, @.sSearchText,
@.nextitemsrecpointer) tblSearchResults
where [key] = Store_BasicSearchableShelves.ShelfId
and ParentAisleId = @.PID
order by rank desc
If @.nextitemsrecpointer is equal to 20, then the stored proc only
returns 14 rows. If I remove the additional "and ParentAisleId =
@.PID" where clause, then the stored proc returns 20 rows, which is
correct. If I begin to edit the value of @.nextitemsrecpointer for
experimentational purposes to a higher value such as 30 or 40, then
the stored proc returns more rows - 18 and 25 respectively.
The additional "and ParentAisleId = @.PID" seems to be affecting the
way that top_n_rank is behaving.
Can anyone please provide me with a work around for this?
Thank you,
Regards, dnw.
Your problem is that when you limit your result set that is returned from
MSSearch further rows are removed by the "and ParentAisleId = @.PID" clause.
The approaches to this problem are 1) knowing in advance the number of rows
which are returned by MSSearch for this query and entering this value for
@.nextitemrecpointer, 2) partitioning your table into multiple partitioned
tables one for each PartentAisleID value so your query would end up looking
something like this:
if @.ParentAisleID=1
begin
select *, tempidentity = IDENTITY (INT) into #tempt
from Store_BasicSearchableShelves_1,
containstable(Store_BasicSearchableShelves, ShelfName, @.sSearchText,
@.nextitemsrecpointer) tblSearchResults
where [key] = Store_BasicSearchableShelves.ShelfId
order by rank desc
end
if @.ParentAisleID=2
begin
select *, tempidentity = IDENTITY (INT) into #tempt
from Store_BasicSearchableShelves_2,
containstable(Store_BasicSearchableShelves, ShelfName, @.sSearchText,
@.nextitemsrecpointer) tblSearchResults
where [key] = Store_BasicSearchableShelves.ShelfId
order by rank desc
end
3) picking a larger number which will guarantee a larger number of hits that
when filtered by @.ParentAisleID will yield at least @.nextitemsrecpointer
hits. You have to be careful here as you don't want to pick too large a
number to guarantee hits.
In large search applications partitioning in frequently used and it does
work very well. They will often have a seperate catalog for each of the
partitioned tables and you will then get a seperate threads for each catalog
which will improve your overall querying and indexing.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
"Dot net work" <dotnw@.hotmail.com> wrote in message
news:77b8c5a9.0410280048.698a85c7@.posting.google.c om...
> Hello,
> Something strange happens if I try and do the following:
> If I use the containstable function, with top_n_rank, along with an
> extra "and where" clause, then the top_n_rank does not seem to return
> the correct number of rows.
> Here is an example to explain my point:
> select *, tempidentity = IDENTITY (INT) into #tempt
> from Store_BasicSearchableShelves,
> containstable(Store_BasicSearchableShelves, ShelfName, @.sSearchText,
> @.nextitemsrecpointer) tblSearchResults
> where [key] = Store_BasicSearchableShelves.ShelfId
> and ParentAisleId = @.PID
> order by rank desc
>
> If @.nextitemsrecpointer is equal to 20, then the stored proc only
> returns 14 rows. If I remove the additional "and ParentAisleId =
> @.PID" where clause, then the stored proc returns 20 rows, which is
> correct. If I begin to edit the value of @.nextitemsrecpointer for
> experimentational purposes to a higher value such as 30 or 40, then
> the stored proc returns more rows - 18 and 25 respectively.
> The additional "and ParentAisleId = @.PID" seems to be affecting the
> way that top_n_rank is behaving.
> Can anyone please provide me with a work around for this?
> Thank you,
> Regards, dnw.
|||DNW,
Could you post the SQL Server and OS platform information from -- SELECT
@.@.version -- as well as a row count from your table
Store_BasicSearchableShelves? As all of this information is important in
first understanding your environment before making recommendations as well
as understanding the existing RANK values that are returned from your query.
The simple answer to your as why you query is not returning the expected
number of rows when using Top_N_Rank and with an additional WHERE clause is
that all of the WHERE clause parameters are applied AFTER the MSSearch
service returns the Top_N_Rank (not Top_N_Row... but "top_n_by_RANK").
Additionally, and depending upon the number of rows in your table, the
actual number or "top" values for RANK may be not what you expect as in
order to calculate rank, a statically large number of rows need to be
present.
You should also review the following KB article on the use and cautions of
using Top_N_Rank: 240833 (Q240833) "FIX: Full-Text Search Performance
Improved via Support for TOP" at
http://support.microsoft.com//defaul...b;EN-US;240833
Regards,
John
"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:#4HNUAOvEHA.2540@.TK2MSFTNGP09.phx.gbl...
> Your problem is that when you limit your result set that is returned from
> MSSearch further rows are removed by the "and ParentAisleId = @.PID"
clause.
> The approaches to this problem are 1) knowing in advance the number of
rows
> which are returned by MSSearch for this query and entering this value for
> @.nextitemrecpointer, 2) partitioning your table into multiple partitioned
> tables one for each PartentAisleID value so your query would end up
looking
> something like this:
> if @.ParentAisleID=1
> begin
> select *, tempidentity = IDENTITY (INT) into #tempt
> from Store_BasicSearchableShelves_1,
> containstable(Store_BasicSearchableShelves, ShelfName, @.sSearchText,
> @.nextitemsrecpointer) tblSearchResults
> where [key] = Store_BasicSearchableShelves.ShelfId
> order by rank desc
> end
> if @.ParentAisleID=2
> begin
> select *, tempidentity = IDENTITY (INT) into #tempt
> from Store_BasicSearchableShelves_2,
> containstable(Store_BasicSearchableShelves, ShelfName, @.sSearchText,
> @.nextitemsrecpointer) tblSearchResults
> where [key] = Store_BasicSearchableShelves.ShelfId
> order by rank desc
> end
> 3) picking a larger number which will guarantee a larger number of hits
that
> when filtered by @.ParentAisleID will yield at least @.nextitemsrecpointer
> hits. You have to be careful here as you don't want to pick too large a
> number to guarantee hits.
> In large search applications partitioning in frequently used and it does
> work very well. They will often have a seperate catalog for each of the
> partitioned tables and you will then get a seperate threads for each
catalog
> which will improve your overall querying and indexing.
>
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
>
> "Dot net work" <dotnw@.hotmail.com> wrote in message
> news:77b8c5a9.0410280048.698a85c7@.posting.google.c om...
>
|||Thanks a lot for your advice.
-dnw.
"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message news:<#4HNUAOvEHA.2540@.TK2MSFTNGP09.phx.gbl>...[vbcol=seagreen]
> Your problem is that when you limit your result set that is returned from
> MSSearch further rows are removed by the "and ParentAisleId = @.PID" clause.
> The approaches to this problem are 1) knowing in advance the number of rows
> which are returned by MSSearch for this query and entering this value for
> @.nextitemrecpointer, 2) partitioning your table into multiple partitioned
> tables one for each PartentAisleID value so your query would end up looking
> something like this:
> if @.ParentAisleID=1
> begin
> select *, tempidentity = IDENTITY (INT) into #tempt
> from Store_BasicSearchableShelves_1,
> containstable(Store_BasicSearchableShelves, ShelfName, @.sSearchText,
> @.nextitemsrecpointer) tblSearchResults
> where [key] = Store_BasicSearchableShelves.ShelfId
> order by rank desc
> end
> if @.ParentAisleID=2
> begin
> select *, tempidentity = IDENTITY (INT) into #tempt
> from Store_BasicSearchableShelves_2,
> containstable(Store_BasicSearchableShelves, ShelfName, @.sSearchText,
> @.nextitemsrecpointer) tblSearchResults
> where [key] = Store_BasicSearchableShelves.ShelfId
> order by rank desc
> end
> 3) picking a larger number which will guarantee a larger number of hits that
> when filtered by @.ParentAisleID will yield at least @.nextitemsrecpointer
> hits. You have to be careful here as you don't want to pick too large a
> number to guarantee hits.
> In large search applications partitioning in frequently used and it does
> work very well. They will often have a seperate catalog for each of the
> partitioned tables and you will then get a seperate threads for each catalog
> which will improve your overall querying and indexing.
>
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
>
> "Dot net work" <dotnw@.hotmail.com> wrote in message
> news:77b8c5a9.0410280048.698a85c7@.posting.google.c om...
|||Hi John,
[vbcol=seagreen]
SELECT
@.@.version
Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002
14:22:05 Copyright (c) 1988-2003 Microsoft Corporation Developer
Edition on Windows NT 5.0 (Build 2195: Service Pack 4)
[vbcol=seagreen]
At the moment, only 39.
[vbcol=seagreen]
expected
number of rows when using Top_N_Rank and with an additional WHERE
clause is
that all of the WHERE clause parameters are applied AFTER the MSSearch
service returns the Top_N_Rank (not Top_N_Row... but "top_n_by_RANK").
As I am a newbie, please can you explain the difference between those
3 things please - Top_N_Rank, Top_N_Row and top_n_by_RANK. Thanks.
Thank you,
Regards, dnw.
"John Kane" <jt-kane@.comcast.net> wrote in message news:<#iLXshRvEHA.3840@.TK2MSFTNGP12.phx.gbl>...[vbcol=seagreen]
> DNW,
> Could you post the SQL Server and OS platform information from -- SELECT
> @.@.version -- as well as a row count from your table
> Store_BasicSearchableShelves? As all of this information is important in
> first understanding your environment before making recommendations as well
> as understanding the existing RANK values that are returned from your query.
> The simple answer to your as why you query is not returning the expected
> number of rows when using Top_N_Rank and with an additional WHERE clause is
> that all of the WHERE clause parameters are applied AFTER the MSSearch
> service returns the Top_N_Rank (not Top_N_Row... but "top_n_by_RANK").
> Additionally, and depending upon the number of rows in your table, the
> actual number or "top" values for RANK may be not what you expect as in
> order to calculate rank, a statically large number of rows need to be
> present.
> You should also review the following KB article on the use and cautions of
> using Top_N_Rank: 240833 (Q240833) "FIX: Full-Text Search Performance
> Improved via Support for TOP" at
> http://support.microsoft.com//defaul...b;EN-US;240833
> Regards,
> John
>
>
> "Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
> news:#4HNUAOvEHA.2540@.TK2MSFTNGP09.phx.gbl...
> clause.
> rows
> looking
> that
> catalog
|||Thanks, DNW,
While the version (SQL & OS platform) are less important for your questions,
overall the OS platform is most important for understanding expected FTS
query results when searching on specific words &/or punctuation characters
due to OS-specific wordbreaker issues, see
http://groups.google.com/groups?q=langwrbk+infosoft for details.
However, in this case the row count is the most important factor, especially
when used with Top_N_Rank. I'd recommend that you review SQL Server 2000 BOL
title "Full-Text Search Recommendations" and the next to last paragraph on
RANK for a better understanding of how RANK is calculated in SQL Sever 2000.
Rank needs a "statistically significant" number of rows (and therefore
number of unique non-noise words) in order to be useful and 39 rows is not a
"statistically significant" number of rows. While it may depend upon the
number of unique non-noise words, the number of rows is important as well,
and at least 10,000+ rows are generally the recommended number of rows to
start using RANK and you won't need Top_N_Rank for performance reasons
until at least 1 million rows.
As for "Top_N_Rank, Top_N_Row and top_n_by_RANK", there is only Top_N_Rank,
the other two were only metaphors that I used in my explanation as while
Top_N_Rank does limit the number of rows returned it is in fact a limit for
N (some number) of rows returned by RANK and not explicitly a row limiter as
is Top. Sorry, for the confusion, but with only 39 rows, I'd recommend that
you do not use Top_N_Rank as it was added as a fix in SQL Server 7.0 (and
included in SQL Server 2000) to improve the FTS query performance when used
against very large (1 to 2+ million) row tables that can generate large FT
Catalogs. See KB article 240833 (Q240833) for more info.
Again, thanks for providing the @.@.version as well as the row count info!
John
"Dot net work" <dotnw@.hotmail.com> wrote in message
news:77b8c5a9.0410290622.8147ced@.posting.google.co m...
> Hi John,
> SELECT
> @.@.version
> Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002
> 14:22:05 Copyright (c) 1988-2003 Microsoft Corporation Developer
> Edition on Windows NT 5.0 (Build 2195: Service Pack 4)
>
> At the moment, only 39.
> expected
> number of rows when using Top_N_Rank and with an additional WHERE
> clause is
> that all of the WHERE clause parameters are applied AFTER the MSSearch
> service returns the Top_N_Rank (not Top_N_Row... but "top_n_by_RANK").
> As I am a newbie, please can you explain the difference between those
> 3 things please - Top_N_Rank, Top_N_Row and top_n_by_RANK. Thanks.
> Thank you,
> Regards, dnw.
>
> "John Kane" <jt-kane@.comcast.net> wrote in message
news:<#iLXshRvEHA.3840@.TK2MSFTNGP12.phx.gbl>...[vbcol=seagreen]
well[vbcol=seagreen]
query.[vbcol=seagreen]
is[vbcol=seagreen]
of[vbcol=seagreen]
from[vbcol=seagreen]
for[vbcol=seagreen]
partitioned[vbcol=seagreen]
hits[vbcol=seagreen]
@.nextitemsrecpointer[vbcol=seagreen]
a[vbcol=seagreen]
does[vbcol=seagreen]
the[vbcol=seagreen]
return[vbcol=seagreen]
|||That info was really interesting! Thanks a lot.
-dnw.
"John Kane" <jt-kane@.comcast.net> wrote in message news:<emMzmVdvEHA.2200@.TK2MSFTNGP11.phx.gbl>...[vbcol=seagreen]
> Thanks, DNW,
> While the version (SQL & OS platform) are less important for your questions,
> overall the OS platform is most important for understanding expected FTS
> query results when searching on specific words &/or punctuation characters
> due to OS-specific wordbreaker issues, see
> http://groups.google.com/groups?q=langwrbk+infosoft for details.
> However, in this case the row count is the most important factor, especially
> when used with Top_N_Rank. I'd recommend that you review SQL Server 2000 BOL
> title "Full-Text Search Recommendations" and the next to last paragraph on
> RANK for a better understanding of how RANK is calculated in SQL Sever 2000.
> Rank needs a "statistically significant" number of rows (and therefore
> number of unique non-noise words) in order to be useful and 39 rows is not a
> "statistically significant" number of rows. While it may depend upon the
> number of unique non-noise words, the number of rows is important as well,
> and at least 10,000+ rows are generally the recommended number of rows to
> start using RANK and you won't need Top_N_Rank for performance reasons
> until at least 1 million rows.
> As for "Top_N_Rank, Top_N_Row and top_n_by_RANK", there is only Top_N_Rank,
> the other two were only metaphors that I used in my explanation as while
> Top_N_Rank does limit the number of rows returned it is in fact a limit for
> N (some number) of rows returned by RANK and not explicitly a row limiter as
> is Top. Sorry, for the confusion, but with only 39 rows, I'd recommend that
> you do not use Top_N_Rank as it was added as a fix in SQL Server 7.0 (and
> included in SQL Server 2000) to improve the FTS query performance when used
> against very large (1 to 2+ million) row tables that can generate large FT
> Catalogs. See KB article 240833 (Q240833) for more info.
>
> Again, thanks for providing the @.@.version as well as the row count info!
> John
>
> "Dot net work" <dotnw@.hotmail.com> wrote in message
> news:77b8c5a9.0410290622.8147ced@.posting.google.co m...
> news:<#iLXshRvEHA.3840@.TK2MSFTNGP12.phx.gbl>...
> well
> query.
> is
> of
> from
> clause.
> rows
> for
> partitioned
> looking
> hits
> that
> @.nextitemsrecpointer
> a
> does
> the
> catalog
> return

No comments:

Post a Comment