Consolidation Query: controlling temdb size

We have a create table query against 2 large tables (about 12 gig) that
balloons our tempdb to about 17 gigs. Is there a way to constrain the growt
h
of tempdb by turning off logging of some operations. The query is fairly
clean and the join and group by clauses are fully covered by indexes.There is no way to turn off logging. Is it the log file that is ballooning
or the data file? If it's the data file (which I suspect is most of it)
then logging has nothing to do with it anyway. You can pretty much expect a
lot of activity in TempDB when you have that much data that you are joining
and especially grouping by. It has to keep the intermediate results
somewhere while it groups them and that is tempdb. Maybe if you post the DDL
for the tables involved and the actual query someone can suggest something.
Andrew J. Kelly SQL MVP
"Consultant Mark" <Consultant Mark@.discussions.microsoft.com> wrote in
message news:85E3C12A-EB74-4489-916A-7E15FAB04AA4@.microsoft.com...
> We have a create table query against 2 large tables (about 12 gig) that
> balloons our tempdb to about 17 gigs. Is there a way to constrain the
> growth
> of tempdb by turning off logging of some operations. The query is fairly
> clean and the join and group by clauses are fully covered by indexes.|||Andrew,
Thanks for your reply. The problem is the tempdb file, not the data file.
Using table hints to force use of appropriate indexes and a 'merge' join hin
t
the query now has the same 1hr 50min time but only brought temdb to 9 gig
(rather than 17gig). Below is the DDL for the table and query. By the way
the accnt table is 1.1 million rows and the proft table is 25 million rows.
___
CREATE TABLE [dbo].[accnt] (
[FIRM_ID] [char] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[ACCT_NO] [char] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[DATABASE_DATE] [smalldatetime] NULL ,
[REP] [char] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[BRANCHLABEL] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[REGIONLABEL] [char] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[BAL_OTHR] [money] NOT NULL ,
[CNT_OTHR] [int] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[proft] (
[FIRM_ID] [char] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[ACCT_NO] [char] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[DATABASE_DATE] [smalldatetime] NULL ,
[HOUSE_REVENUE] [money] NOT NULL ,
[TRADING_REVENUE] [money] NOT NULL ,
[TOTAL_EXPENSE] [money] NOT NULL ,
[CLIENT_PROFITABILITY] [money] NOT NULL ,
[HasParent] [bit] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[accnt] WITH NOCHECK ADD
CONSTRAINT [PK_firm_acct] PRIMARY KEY CLUSTERED
([FIRM_ID],[ACCT_NO]) ON [PRIMARY]
GO
CREATE INDEX [FIRM_ID_ind] ON [dbo].[accnt]([FIRM_ID]) ON [PRIMARY]
GO
CREATE INDEX [ACCT_NO_ind] ON [dbo].[accnt]([ACCT_NO]) ON [PRIMARY]
GO
CREATE INDEX [DATABASE_DATE_ind] ON [dbo].[accnt]([DATABASE_DATE]) ON
[PRIMARY]
GO
CREATE INDEX [REP_ind] ON [dbo].[accnt]([REP]) ON [PRIMARY]
GO
CREATE INDEX [BRANCHLABEL_ind] ON [dbo].[accnt]([BRANCHLABEL]) ON [PRIMARY]
GO
CREATE INDEX [REGIONLABEL_ind] ON [dbo].[accnt]([REGIONLABEL]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[proft] ADD
CONSTRAINT [HasParentDefault] DEFAULT (0) FOR [HasParent]
GO
CREATE INDEX [HasParent_ind] ON [dbo].[proft]([HasParent]) ON [PRIMARY]
GO
CREATE INDEX [FIRM_ID_ind] ON [dbo].[proft]([FIRM_ID]) ON [PRIMARY]
GO
CREATE INDEX [ACCT_NO_ind] ON [dbo].[proft]([ACCT_NO]) ON [PRIMARY]
GO
CREATE INDEX [FIRMACCT_ind] ON [dbo].[proft]([FIRM_ID], [ACCT_NO]) ON
[PRIMARY]
GO
ALTER TABLE [dbo].[proft] ADD
CONSTRAINT [FK_proft_accnt] FOREIGN KEY ([FIRM_ID],[ACCT_NO])
REFERENCES [dbo].[accnt] ([FIRM_ID],[ACCT_NO])
GO
SELECT
accnt.BRANCHLABEL,
accnt.REP,
proft.DATABASE_DATE,
accnt.REGIONLABEL,
COUNT(*) AS ACCOUNTS,
SUM(proft.HOUSE_REVENUE) AS HOUSE_REVENUE,
SUM(proft.TRADING_REVENUE) AS TRADING_REVENUE,
SUM(proft.TOTAL_EXPENSE) AS TOTAL_EXPENSE,
SUM(proft.CLIENT_PROFITABILITY) AS CLIENT_PROFITABILITY,
SUM(CAST(0 AS money)) AS OTHER_REVENUE
INTO RepMonth
FROM accnt WITH (INDEX (pk_firm_acct))
INNER MERGE JOIN proft WITH (INDEX (FIRMACCT_ind))
ON accnt.FIRM_ID = proft.FIRM_ID AND accnt.ACCT_NO = proft.ACCT_NO
GROUP BY accnt.REGIONLABEL, accnt.BRANCHLABEL, accnt.REP, proft.DATABASE_DAT
E
ORDER BY accnt.BRANCHLABEL, accnt.REP, proft.DATABASE_DATE, accnt.REGIONLABE
L
___
"Andrew J. Kelly" wrote:

> There is no way to turn off logging. Is it the log file that is balloonin
g
> or the data file? If it's the data file (which I suspect is most of it)
> then logging has nothing to do with it anyway. You can pretty much expect
a
> lot of activity in TempDB when you have that much data that you are joinin
g
> and especially grouping by. It has to keep the intermediate results
> somewhere while it groups them and that is tempdb. Maybe if you post the D
DL
> for the tables involved and the actual query someone can suggest something
.
> --
> Andrew J. Kelly SQL MVP
>
> "Consultant Mark" <Consultant Mark@.discussions.microsoft.com> wrote in
> message news:85E3C12A-EB74-4489-916A-7E15FAB04AA4@.microsoft.com...
>
>|||OK well each database including tempdb has at least one data file and one
log file. The data file normally has a .mdf extension and the log file has
a .ldf extension. Which one of these for TempDB is growing to 17GB? A
couple of comments from what I see here. You don't have a primary Key
defined on the Proft table and you don't have a clustered index on that
table either. Both of which are very important. I suggest you drop the
nonclustered index on Firm_ID, ACCT_NO and create a clustered index on
Firm_ID, ACCT_NO instead. That will allow the two tables to be joined in a
true merge fashion using the clustered indexes and should dramatically cut
down on the time to do this. You might even consider making the clustered
indexes on both these tables with ACCT_NO as the first column instead of
Firm_ID. The reason being that this is more selective but this depends a
lot on how you access these tables. Do you look for Acc_No or Firm_ID more?
In either case there is no need to have a non-clustered index on Firm_ID if
the clustered index already has Firm_ID as the first column in the index
expression. Also why does the Order By have the same columns as the group by
but in a different order? It may help to have them the same so the engine
does not potentially have to order twice.
Andrew J. Kelly SQL MVP
"Consultant Mark" <ConsultantMark@.discussions.microsoft.com> wrote in
message news:95D8D214-FFA2-4C49-A6D9-64AE8CDDF645@.microsoft.com...
> Andrew,
> Thanks for your reply. The problem is the tempdb file, not the data file.
> Using table hints to force use of appropriate indexes and a 'merge' join
> hint
> the query now has the same 1hr 50min time but only brought temdb to 9 gig
> (rather than 17gig). Below is the DDL for the table and query. By the
> way
> the accnt table is 1.1 million rows and the proft table is 25 million
> rows.
> ___
> CREATE TABLE [dbo].[accnt] (
> [FIRM_ID] [char] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
> [ACCT_NO] [char] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
> [DATABASE_DATE] [smalldatetime] NULL ,
> [REP] [char] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
> [BRANCHLABEL] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
> [REGIONLABEL] [char] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
> [BAL_OTHR] [money] NOT NULL ,
> [CNT_OTHR] [int] NOT NULL
> ) ON [PRIMARY]
> GO
> CREATE TABLE [dbo].[proft] (
> [FIRM_ID] [char] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
> [ACCT_NO] [char] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
> [DATABASE_DATE] [smalldatetime] NULL ,
> [HOUSE_REVENUE] [money] NOT NULL ,
> [TRADING_REVENUE] [money] NOT NULL ,
> [TOTAL_EXPENSE] [money] NOT NULL ,
> [CLIENT_PROFITABILITY] [money] NOT NULL ,
> [HasParent] [bit] NOT NULL
> ) ON [PRIMARY]
> GO
>
> ALTER TABLE [dbo].[accnt] WITH NOCHECK ADD
> CONSTRAINT [PK_firm_acct] PRIMARY KEY CLUSTERED
> ([FIRM_ID],[ACCT_NO]) ON [PRIMARY]
> GO
> CREATE INDEX [FIRM_ID_ind] ON [dbo].[accnt]([FIRM_ID]) ON [PRIMARY]
> GO
> CREATE INDEX [ACCT_NO_ind] ON [dbo].[accnt]([ACCT_NO]) ON [PRIMARY]
> GO
> CREATE INDEX [DATABASE_DATE_ind] ON [dbo].[accnt]([DATABASE_DATE]) ON
> [PRIMARY]
> GO
> CREATE INDEX [REP_ind] ON [dbo].[accnt]([REP]) ON [PRIMARY]
> GO
> CREATE INDEX [BRANCHLABEL_ind] ON [dbo].[accnt]([BRANCHLABEL]) ON
> [PRIMARY]
> GO
> CREATE INDEX [REGIONLABEL_ind] ON [dbo].[accnt]([REGIONLABEL]) ON
> [PRIMARY]
> GO
> ALTER TABLE [dbo].[proft] ADD
> CONSTRAINT [HasParentDefault] DEFAULT (0) FOR [HasParent]
> GO
> CREATE INDEX [HasParent_ind] ON [dbo].[proft]([HasParent]) ON [PRIMARY]
> GO
> CREATE INDEX [FIRM_ID_ind] ON [dbo].[proft]([FIRM_ID]) ON [PRIMARY]
> GO
> CREATE INDEX [ACCT_NO_ind] ON [dbo].[proft]([ACCT_NO]) ON [PRIMARY]
> GO
> CREATE INDEX [FIRMACCT_ind] ON [dbo].[proft]([FIRM_ID], [ACCT_NO]) ON
> [PRIMARY]
> GO
> ALTER TABLE [dbo].[proft] ADD
> CONSTRAINT [FK_proft_accnt] FOREIGN KEY ([FIRM_ID],[ACCT_NO])
> REFERENCES [dbo].[accnt] ([FIRM_ID],[ACCT_NO])
> GO
> SELECT
> accnt.BRANCHLABEL,
> accnt.REP,
> proft.DATABASE_DATE,
> accnt.REGIONLABEL,
> COUNT(*) AS ACCOUNTS,
> SUM(proft.HOUSE_REVENUE) AS HOUSE_REVENUE,
> SUM(proft.TRADING_REVENUE) AS TRADING_REVENUE,
> SUM(proft.TOTAL_EXPENSE) AS TOTAL_EXPENSE,
> SUM(proft.CLIENT_PROFITABILITY) AS CLIENT_PROFITABILITY,
> SUM(CAST(0 AS money)) AS OTHER_REVENUE
> INTO RepMonth
> FROM accnt WITH (INDEX (pk_firm_acct))
> INNER MERGE JOIN proft WITH (INDEX (FIRMACCT_ind))
> ON accnt.FIRM_ID = proft.FIRM_ID AND accnt.ACCT_NO = proft.ACCT_NO
> GROUP BY accnt.REGIONLABEL, accnt.BRANCHLABEL, accnt.REP,
> proft.DATABASE_DATE
> ORDER BY accnt.BRANCHLABEL, accnt.REP, proft.DATABASE_DATE,
> accnt.REGIONLABEL
> ___
> "Andrew J. Kelly" wrote:
>|||Andrew,
Thanks. Not quite sure why the order and group by statements have different
field order, but that's easy to fix.
The mdf for tempdb is the one growing.
We will try variations of your good suggestions. Firm_id is baggage, part
of the account number and in this implementation of the database always
containing the same value, so we will flip that around.
Monthly we use DTS to add 1 million records to proft and drop and replace
all rows in the accnt table. Because the proft is so large I drop and
rebuild all indexes (loading the data in between), so if I have a primary ke
y
I have to deal with all the dependencies, constraints, etc. Proft by the wa
y
is unique on acct_no, firm_id, database_date, so its primary key would need
all three but this field order should be ok.
With the foreign key constraint connecting proft into accnt I assumed the db
engine would recognize what is supposed to be happening.
"Andrew J. Kelly" wrote:

> OK well each database including tempdb has at least one data file and one
> log file. The data file normally has a .mdf extension and the log file ha
s
> a .ldf extension. Which one of these for TempDB is growing to 17GB? A
> couple of comments from what I see here. You don't have a primary Key
> defined on the Proft table and you don't have a clustered index on that
> table either. Both of which are very important. I suggest you drop the
> nonclustered index on Firm_ID, ACCT_NO and create a clustered index on
> Firm_ID, ACCT_NO instead. That will allow the two tables to be joined in
a
> true merge fashion using the clustered indexes and should dramatically cu
t
> down on the time to do this. You might even consider making the clustered
> indexes on both these tables with ACCT_NO as the first column instead of
> Firm_ID. The reason being that this is more selective but this depends a
> lot on how you access these tables. Do you look for Acc_No or Firm_ID mor
e?
> In either case there is no need to have a non-clustered index on Firm_ID i
f
> the clustered index already has Firm_ID as the first column in the index
> expression. Also why does the Order By have the same columns as the group
by
> but in a different order? It may help to have them the same so the engine
> does not potentially have to order twice.
> --
> Andrew J. Kelly SQL MVP
>
> "Consultant Mark" <ConsultantMark@.discussions.microsoft.com> wrote in
> message news:95D8D214-FFA2-4C49-A6D9-64AE8CDDF645@.microsoft.com...
>
>|||If the data file keeps growing back to that size then you should leave it
there. Growing a data file is very expensive. There is no penalty for too
much free space but a big one for too little. If Firm_ID is always the same
value I would argue that you don't have it in the Clustered index at all.
The column(s) of the clustered index are appended to the end of all the
nonclustered indexes so you are propagating 4 bytes per row * every
nonclustered index. Let us know how it works out.
Andrew J. Kelly SQL MVP
"Consultant Mark" <ConsultantMark@.discussions.microsoft.com> wrote in
message news:36502EBA-6CFA-4F94-92B8-1402769CE178@.microsoft.com...
> Andrew,
> Thanks. Not quite sure why the order and group by statements have
> different
> field order, but that's easy to fix.
> The mdf for tempdb is the one growing.
> We will try variations of your good suggestions. Firm_id is baggage, part
> of the account number and in this implementation of the database always
> containing the same value, so we will flip that around.
> Monthly we use DTS to add 1 million records to proft and drop and replace
> all rows in the accnt table. Because the proft is so large I drop and
> rebuild all indexes (loading the data in between), so if I have a primary
> key
> I have to deal with all the dependencies, constraints, etc. Proft by the
> way
> is unique on acct_no, firm_id, database_date, so its primary key would
> need
> all three but this field order should be ok.
> With the foreign key constraint connecting proft into accnt I assumed the
> db
> engine would recognize what is supposed to be happening.
> "Andrew J. Kelly" wrote:
>

Monday, March 19, 2012

controlling temdb size

No comments:

Post a Comment

Consolidation Query

Blog Archive

About Me