Consolidation Query: Constraints?

I am importing data from a .csv. I have noticed that it copies this
information over and over when I import the data. I only want it to update
anything that is different and add any new lines in the .csv file. I have
looked at setting up a primary key but this won't work.
ProjectID, Phase, Unit,Tract, Release, UnitPlan, UnitOpt
As long as one of these fields is different I want it to allow it to be
entered. My problem is some of them may be null. Which is ok.
KB5IR,10,405,,,4516,,
KB5IR,10,406,,4516,REV,
I am not sure how to go about doing this!
Message posted via webservertalk.com
http://www.webservertalk.com/Uwe/Forum...amming/200603/1brenenger via webservertalk.com wrote:
> I am importing data from a .csv. I have noticed that it copies this
> information over and over when I import the data. I only want it to update
> anything that is different and add any new lines in the .csv file. I have
> looked at setting up a primary key but this won't work.
> ProjectID, Phase, Unit,Tract, Release, UnitPlan, UnitOpt
> As long as one of these fields is different I want it to allow it to be
> entered. My problem is some of them may be null. Which is ok.
> KB5IR,10,405,,,4516,,
> KB5IR,10,406,,4516,REV,
> I am not sure how to go about doing this!
> --
> Message posted via webservertalk.com
> http://www.webservertalk.com/Uwe/Forum...amming/200603/1
Do you have an existing data model or are you designing one? Don't make
the mistake of trying to mimic a file in a table. Relational design
principles and file storage are not the same!
Assuming you have or create a suitably normalized data model there are
basically 2 possible approaches:
1. Transform the file as you load it to the normalized schema (using
DTS or Integration Services or some other tool for example)
2. Load the file as-is to an intermediate "staging" table and then
transform the data to your real data model using SQL. The staging table
isn't used for anything other than the load process and the data is
usually deleted some time afterwards.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||Here are a couple of options to look at-
use of a unique index with the ignore dupe option (it will just not enter
the data for duplicate insert attempts) . This would not work with the
updates you mention however. Another option would be to import into a load
table then use a procedure containing the merge logic to update/insert as
appropriate.
HTH
--Tony
"brenenger via webservertalk.com" wrote:

> I am importing data from a .csv. I have noticed that it copies this
> information over and over when I import the data. I only want it to update
> anything that is different and add any new lines in the .csv file. I have
> looked at setting up a primary key but this won't work.
> ProjectID, Phase, Unit,Tract, Release, UnitPlan, UnitOpt
> As long as one of these fields is different I want it to allow it to be
> entered. My problem is some of them may be null. Which is ok.
> KB5IR,10,405,,,4516,,
> KB5IR,10,406,,4516,REV,
> I am not sure how to go about doing this!
> --
> Message posted via webservertalk.com
> http://www.webservertalk.com/Uwe/Forum...amming/200603/1
>|||Hi everyone. I am sorry about this. I am VERY new to SQL and I am having a
hard time trying to figure out what to do. I got Access configured to link t
o
SQL Server and I think I'll be able to get the rest of this figured out. My
only problem now is the import. I need to UPDATE/INSERT the .csv information
.
Here is the transformation that I got help creating.
Function Main()
IF DTSSource("Col001") <>0 Then
' add a 0 to the front, and keep only the 6 right characters.
'21706 -> 021706 -> 021706
'100106 -> 0100106 -> 100106
str = DTSSource("Col001")
iMo = CInt( Mid( str, 1, 1) )
iDay = CInt( Mid( str, 2, 2) )
iYear = CInt( Mid( str, 4, 2) )
DTSDestination("DelDate") = DateSerial( iYear, iMo, iDay)
Else
'This will not return a value for the date
'which means it will be NULL if the col001 = "0"
'If you want to specify a date then uncomment the next line
'DTSDestination("DelDate") = "19000101"
End If
DTSDestination("ProjectID") = DTSSource("Col002")
DTSDestination("Phase") = DTSSource("Col003")
DTSDestination("Unit") = DTSSource("Col004")
If DTSSource("Col005") = Null Then
DTSDestination ("Tract") = " "
Else
End If
If DTSSource("Col006") = Null Then
DTSDestination ("Release") = " "
Else
End IF
DTSDestination("UnitPlan") = DTSSource("Col007")
DTSDestination("UnitOpt") = DTSSource("Col008")
DTSDestination("POComp") = DTSSource("Col009")
DTSDestination("PrjFrm") = DTSSource("Col010")
DTSDestination("OrderNo") = DTSSource("Col011")
DTSDestination("OrderStat") = DTSSource("Col012")
DTSDestination("Boxes") = DTSSource("Col013")
Main = DTSTransformStat_OK
End Function
I cannot create a PK because I can have null values. As long as ProjectID,
Phase, Unit, Tract, Release, UnitPlan, UnitOpt are unique all together it
should be ok. Here is an example of what my .csv looks like. I need to be
able to import this periodically. I would like for it to update anything tha
t
has changed and insert anything that is missing. The way it is now, it just
adds the import to the table. So I could have this stuff listed over and ove
r.
Which I don't want.
21706 KBMP NEW 200 2031 Y Y 64149 SCHED 17
21706 KBMP NEW 201 2031 Y Y 64150 SCHED 8
21706 KBMP NEW 201 2031 OPT04 Y Y 64151 SCHED 13
21706 KBMP NEW 201 2031 OPT136 Y Y 64151 SCHED 0
21706 KBMP NEW 201 2031 OPT142 Y Y 64151 SCHED 0
21706 KBMP NEW 201 2031 OPT143 Y Y 64151 SCHED 0
21706 KBMP NEW 201 2031 OPT144 Y Y 64151 SCHED 0
Message posted via webservertalk.com
http://www.webservertalk.com/Uwe/Forum...amming/200603/1|||brenenger via webservertalk.com wrote:
> Hi everyone. I am sorry about this. I am VERY new to SQL and I am having a
> hard time trying to figure out what to do. I got Access configured to link
to
> SQL Server and I think I'll be able to get the rest of this figured out. M
y
> only problem now is the import. I need to UPDATE/INSERT the .csv informati
on.
> Here is the transformation that I got help creating.
> Function Main()
> IF DTSSource("Col001") <>0 Then
> ' add a 0 to the front, and keep only the 6 right characters.
> '21706 -> 021706 -> 021706
> '100106 -> 0100106 -> 100106
> str = DTSSource("Col001")
> iMo = CInt( Mid( str, 1, 1) )
> iDay = CInt( Mid( str, 2, 2) )
> iYear = CInt( Mid( str, 4, 2) )
> DTSDestination("DelDate") = DateSerial( iYear, iMo, iDay)
> Else
> 'This will not return a value for the date
> 'which means it will be NULL if the col001 = "0"
> 'If you want to specify a date then uncomment the next line
> 'DTSDestination("DelDate") = "19000101"
> End If
>
> DTSDestination("ProjectID") = DTSSource("Col002")
> DTSDestination("Phase") = DTSSource("Col003")
> DTSDestination("Unit") = DTSSource("Col004")
> If DTSSource("Col005") = Null Then
> DTSDestination ("Tract") = " "
> Else
> End If
> If DTSSource("Col006") = Null Then
> DTSDestination ("Release") = " "
> Else
> End IF
> DTSDestination("UnitPlan") = DTSSource("Col007")
> DTSDestination("UnitOpt") = DTSSource("Col008")
> DTSDestination("POComp") = DTSSource("Col009")
> DTSDestination("PrjFrm") = DTSSource("Col010")
> DTSDestination("OrderNo") = DTSSource("Col011")
> DTSDestination("OrderStat") = DTSSource("Col012")
> DTSDestination("Boxes") = DTSSource("Col013")
> Main = DTSTransformStat_OK
> End Function
> I cannot create a PK because I can have null values. As long as ProjectID,
> Phase, Unit, Tract, Release, UnitPlan, UnitOpt are unique all together it
> should be ok. Here is an example of what my .csv looks like. I need to be
> able to import this periodically. I would like for it to update anything t
hat
> has changed and insert anything that is missing. The way it is now, it jus
t
> adds the import to the table. So I could have this stuff listed over and o
ver.
> Which I don't want.
>
> 21706 KBMP NEW 200 2031 Y Y 64149 SCHED 17
> 21706 KBMP NEW 201 2031 Y Y 64150 SCHED 8
> 21706 KBMP NEW 201 2031 OPT04 Y Y 64151 SCHED 13
> 21706 KBMP NEW 201 2031 OPT136 Y Y 64151 SCHED 0
> 21706 KBMP NEW 201 2031 OPT142 Y Y 64151 SCHED 0
> 21706 KBMP NEW 201 2031 OPT143 Y Y 64151 SCHED 0
> 21706 KBMP NEW 201 2031 OPT144 Y Y 64151 SCHED 0
> --
> Message posted via webservertalk.com
> http://www.webservertalk.com/Uwe/Forum...amming/200603/1
I think the easiest approach will be to load this data to a temporary
staging table that matches the file structure. Then use UPDATE and
INSERT statements to load the data into actual table(s) in your
database. In the staging table you can use a file name, date and/or row
number as the key.
You will find it VERY hard and maybe even impossible to maintain the
integrity of the changing data unless you first implement the correct
data model with keys in each table. I can't help you to do that just
based on a list of column names and a snapshot of your data file. The
data model should be based on your business rules and knowledge of your
business environment. There is little point in basing your data model
on the format that happens to have been used in this file.
If you don't already know about relational design principles like
Normalization and the normal forms then you should master those
concepts before you attempt a final design.
On the other hand if you don't care about design right now and just
want to see some data in a table then you could create a unique index
on all columns using the IGNORE_DUP_KEY option. That will eliminate any
duplicates but won't help you any further than that:
CREATE UNIQUE NONCLUSTERED INDEX idx_tbl
ON tbl (col1, col2, col3)
WITH IGNORE_DUP_KEY ;
IMPORTANT: I do NOT recommend this for a live production environment.
Hope this helps.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||Ok. I am importing a .csv file in it's own table. I want this table to be
overwritten each time it is updated. I am using DTS to import. Once this
table is populated, I need to run a Stored Procedure that will look at that
table and tblUnitImport. This is the code I have so far, I am not sure if I
am doing this right. If the row is not listed in tblUnitImport then I need t
o
INSERT it, otherwise I need to update the DELDATE and/or ORDERSTAT field if
it is different in m#ds001o1. Am I on the right track?
BEGIN INSERT INTO tblUnitImport
(Deldate, ProjectID, Phase, Unit, Tract,
Release, UnitPlan, UnitOpt, POComp, PrjFrm, OrderNo, OrderStat, Boxes)
SELECT Col001, Col002, Col003,Col004, Col005, Col006,
Col007, Col008, Col009, Col010, Col011, Col012, Col013
FROM m#ds001o1 AS M
WHERE NOT EXISTS
(SELECT *
FROM tblUnitImport
WHERE tblUnitImport.
ProjectId = m#ds001o1.Col002
tblUnitImport.
Phase = m#ds001o1.Col003
tblUnitImport.
Unit = m#ds001o1.Col004
tblUnitImport.
Tract = m#ds001o1.Col005
tblUnitImport.
Release = m#ds001o1.Col006
tblUnitImport.
UnitPlan = m#ds001o1.Col007
tblUnitImport.
UnitOpt = m#ds001o1.Col008);
UPDATE tblUnitImport
SET col1 =
(SELECT
col001, col012
FROM
m#ds001o1
WHERE
tblUnitImport.ProjectId = m#ds001o1.Col002
tblUnitImport.Phase = m#ds001o1.Col003
tblUnitImport.Unit = m#ds001o1.Col004
tblUnitImport.Tract = m#ds001o1.Col005
tblUnitImport.Release = m#ds001o1.Col006
tblUnitImport.UnitPlan = m#ds001o1.Col007
tblUnitImport.UnitOpt = m#ds001o1.Col008)
WHERE EXISTS
(SELECT
*
FROM
Foobar
WHERE
Foobar.keycol = Merge_table.keycol); END;
Message posted via http://www.webservertalk.com

Tuesday, February 14, 2012

Constraints?

No comments:

Post a Comment

Consolidation Query

Blog Archive

About Me