where there's a primary key constraint violation?
I've got a case where daily I insert >500k rows of daily data, where
the data is date and time stamped. So, for example, I have an insert
statement with constraint: WHERE date >= '5/20/05' AND date <
'5/21/05'. That takes care of one day's data (5/20).
However, the next day's data (5/21) will still have some time stamps
from the previous day. Therefore the statement needs to be something
like WHERE date >= '5/20/05' AND date <= '5/21/05'. The 5/20 data is
already loaded but I need to take the 5/21 data which just happens to
contain just a few rows of data marked 5/20 and insert it without
generating a primary key error from all the other 5/20 rows that are
already inserted.
-DaveINSERT INTO TargetTable (key_col, ...)
SELECT S.key_col, ...
FROM SourceTable AS S
LEFT JOIN TargetTable AS T
ON S.key_col = T.key_col
WHERE T.key_col IS NULL
AND S.date >= '20050520' AND date < '20050522'
--
David Portas
SQL Server MVP
--|||The easy way is to limit the insert query to 11:59 59 of the previous
day. Then you tell your users, "this report contains all the data from
yesterday" In fact, if you're doing a report of some kind, this is
really the best way to do it because otherwise, you have incomplete
(and therefore bad) data for the current day.
Another way is to delete yesterday's data right before you run the
insert.|||Should the join run very slowly? If I do the insert with a standard
insert query it takes about 7 minutes. With the join query it runs and
doesn't seem to be able to finish. If I run the query on dates with no
data it finishes ok. Is my join incorrect since I can't use S.keyrow?
insert into final(keyRow, cell, recordDate, high_set )
SELECT CONVERT(CHAR(16),dateadd(hh,datepart(hh, .access_time),
S.record_date),20)+'|'+CONVERT(CHAR(3), S.bts_id)+'-'+CONVERT(CHAR(1),
S.sector_id)+'-'+CONVERT(CHAR(3), S.carrier_id) AS keyRow,
(CONVERT(CHAR(3), S.bts_id)+'-'+CONVERT(CHAR(1),
S.sector_id)+'-'+CONVERT(CHAR(3), S.carrier_id)) AS cell,
CONVERT(CHAR(16),dateadd(hh,datepart(hh, S.access_time),
S.record_date),20)as recordDate, SUM(S.high_set_int) AS high_set
from SourceTable AS S
LEFT JOIN TargetTable AS T
ON keyRow = T.keyRow
WHERE T.keyRow IS NULL
AND S.record_date >= '5/06/2005' AND S.record_date < '5/07/2005' AND
convert (char(8), S.access_time,108) != '00:00:00'
GROUP BY CONVERT(CHAR(16),dateadd(hh,datepart(hh, S.access_time),
S.record_date),20),
CONVERT(CHAR(16),dateadd(hh,datepart(hh, S.access_time),
S.record_date),20)+'|'+
CONVERT(CHAR(3), S.bts_id)+'-'+CONVERT(CHAR(1),
S.sector_id)+'-'+CONVERT(CHAR(3), S.carrier_id)
ORDER BY CONVERT(CHAR(16),dateadd(hh,datepart(hh, S.access_time),
S.record_date),20),
CONVERT(CHAR(16),dateadd(hh,datepart(hh, S.access_time),
S.record_date),20)+'|'+CONVERT(CHAR(3), S.bts_id)+'-'+
CONVERT(CHAR(1), S.sector_id)+'-'+CONVERT(CHAR(3), S.carrier_id),
S.cell|||
christopher.secord@.gmail.com wrote:
> The easy way is to limit the insert query to 11:59 59 of the previous
> day. Then you tell your users, "this report contains all the data from
> yesterday" In fact, if you're doing a report of some kind, this is
> really the best way to do it because otherwise, you have incomplete
> (and therefore bad) data for the current day.
Yes, I agree but the way the data is generated results in "today's"
data flat file containing some of yesterday's data. So although 99% of
yesterday's data is already in the db, the last little bit needs be
added for completeness. It's not that the nearly all users can't use
the 99% data for their purposes but still the missing 1% needs to be
added for later complete, accurate reports.
> Another way is to delete yesterday's data right before you run the
> insert.
This puts the problem back 1 day because I would still need to add
yesterday's data which is in its own flat file which contains data from
the day before yesterday.
-David|||David Portas wrote:
> INSERT INTO TargetTable (key_col, ...)
> SELECT S.key_col, ...
> FROM SourceTable AS S
> LEFT JOIN TargetTable AS T
> ON S.key_col = T.key_col
> WHERE T.key_col IS NULL
> AND S.date >= '20050520' AND date < '20050522'
I'm thinking maybe the best thing to do is add another column to my
table that uniquely identifies the data from a particular day. Some of
the data from the particular flat file will be from the day before but
it won't matter because I'll use the new field in the where criteria
instead of the actual record dates.
Also thought about using NOT EXISTS somehow.
-Dave|||Make sure you have indexes on the columns that are being joined.|||(wireless200@.yahoo.com) writes:
> Should the join run very slowly? If I do the insert with a standard
> insert query it takes about 7 minutes. With the join query it runs and
> doesn't seem to be able to finish. If I run the query on dates with no
> data it finishes ok. Is my join incorrect since I can't use S.keyrow?
I don't understand that last question. What do you mean, you cannot
use S.keyrow?
A clustered index on S.record_date would be a good thing.
I would also replace the LEFT JOIN with NOT EXISTS. Not because this
is faster, but because expresses what you mean.
Does the target table have an IDENTITY column? Else there is no reason at
all to have the ORDER BY clause. Removing that could also gain some
performance.
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
No comments:
Post a Comment