I have 2 tables. The following are stripped-down versions of these tables.
TableA
Id <pk> incrementing
Name varchar(50)
TableB
TableAId <pk> non incrementing
Name varchar(50)
Now these tables have a relationship to each other.
Scenario
User 1 comes to my site and does some actions (in this case, adds rows to Table A), so I use SqlBulkCopy to insert all this data into Table A.
However, I also need to add the data to Table B, but I don't know the newly created Ids from Table A, as SqlBulkCopy won't return them.
So I am thinking of having a stored procedure that finds all the Ids that don't exist in Table B and then inserts them.
INSERT INTO TableB (TableAId, Name)
SELECT Id, Name FROM TableA AS tableA
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.TableAId = tableA.Id)
However, this comes with a problem: a user can delete rows from Table B at any time. If a user deletes a row and then another user (or even the same user) later does something to Table A, my stored procedure will bring that deleted row back into Table B, since it still exists in Table A but not in Table B and therefore satisfies the condition.
So is there a better way of dealing with two tables that need to be updated when using bulk insert?
SqlBulkCopy complicates this, so I'd consider using a staging table and an OUTPUT clause.
Example, in a mixture of client pseudocode and SQL:
create SQLConnection
Create #temptable
Bulkcopy to #temptable
Call proc on same SQLConnection
proc:
INSERT tableA (..)
OUTPUT INSERTED.key, .. INTO TableB
SELECT .. FROM #temptable
close connection
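Fleshed out a little with the column names from the question, the proc body might look like the sketch below. Because OUTPUT ... INTO cannot target a table that takes part in a foreign key relationship, this version captures the new keys in a table variable first (the proc and variable names are made up for illustration):

CREATE PROCEDURE dbo.CopyStagedRows
AS
BEGIN
    SET NOCOUNT ON;

    -- holds the identity values generated by the insert into TableA
    DECLARE @new TABLE (Id int, Name varchar(50));

    BEGIN TRAN;

    INSERT INTO TableA (Name)
    OUTPUT INSERTED.Id, INSERTED.Name INTO @new (Id, Name)
    SELECT Name FROM #temptable;

    INSERT INTO TableB (TableAId, Name)
    SELECT Id, Name FROM @new;

    COMMIT TRAN;
END;

The proc can see #temptable because it runs on the same connection that created and bulk-copied into it.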
Notes:
temptable will be local to the connection, so it is isolated from other sessions
the writes to A and B will be atomic
overlapping or later writes don't depend on what subsequently happens to A and B
emphasising the last point, A and B will only ever be populated from the set of rows in #temptable
Alternative:
Add another column to A and B called sessionid and use that to identify row batches.
One option would be to use SQL Server's OUTPUT clause:
INSERT YourTable (name)
OUTPUT INSERTED.*
VALUES ('NewName')
This will return the id and name of the inserted rows to the client, so you can use them in the insert for the second table.
Just as an alternative solution you could use database triggers to update the second table.
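If you go that route, a minimal sketch of such a trigger on Table A (the trigger name is made up for illustration) could look like this; keep in mind that SqlBulkCopy only fires triggers when SqlBulkCopyOptions.FireTriggers is specified:

CREATE TRIGGER trg_TableA_Insert ON TableA
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- copy the newly generated ids straight into TableB
    INSERT INTO TableB (TableAId, Name)
    SELECT i.Id, i.Name
    FROM inserted i
    WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.TableAId = i.Id);
END;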
I have 4 tables:
Table A:
LogID (unique identifier),
UserID (bigint),
LogDate (date/time),
LogEventID (int),
IPID (varchar(36)),
UserAgentID (varchar(36))
Table B:
IPID (unique identifier),
IPAddress (varchar(255))
Table C:
UserAgentID (unique identifier),
UserAgent (varchar(255))
Table D:
LogEventID (int),
LogEvent (varchar(255))
I am trying to write to Table A but need to check that Table B, Table C and Table D contain the data I need to link to. If they don't, I need to create it. Some of the tables may already contain data; sometimes none of them do.
Pretty much everything, really struggling
First, you do an INSERT into Tables B, C and D with a WHERE NOT EXISTS check.
Example:
INSERT INTO TableB (IPID, IPAddress)
SELECT @IPID, @IPAddress
WHERE NOT EXISTS
(
SELECT *
FROM TableB x
WHERE x.IPID = @IPID
)
Then you insert into Table A:
INSERT INTO TableA ( . . . )
SELECT . . .
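Put together, and with hypothetical parameter names that are not from the original post, the whole sequence might look something like this sketch:

CREATE PROCEDURE dbo.InsertLogEntry
    @LogID       uniqueidentifier,
    @UserID      bigint,
    @LogDate     datetime,
    @LogEventID  int,
    @LogEvent    varchar(255),
    @IPID        varchar(36),
    @IPAddress   varchar(255),
    @UserAgentID varchar(36),
    @UserAgent   varchar(255)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRAN;

    -- create the lookup rows only when they are missing
    INSERT INTO TableB (IPID, IPAddress)
    SELECT @IPID, @IPAddress
    WHERE NOT EXISTS (SELECT 1 FROM TableB x WHERE x.IPID = @IPID);

    INSERT INTO TableC (UserAgentID, UserAgent)
    SELECT @UserAgentID, @UserAgent
    WHERE NOT EXISTS (SELECT 1 FROM TableC x WHERE x.UserAgentID = @UserAgentID);

    INSERT INTO TableD (LogEventID, LogEvent)
    SELECT @LogEventID, @LogEvent
    WHERE NOT EXISTS (SELECT 1 FROM TableD x WHERE x.LogEventID = @LogEventID);

    -- then the log row itself
    INSERT INTO TableA (LogID, UserID, LogDate, LogEventID, IPID, UserAgentID)
    VALUES (@LogID, @UserID, @LogDate, @LogEventID, @IPID, @UserAgentID);

    COMMIT TRAN;
END;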
SQL Server doesn't let you modify multiple tables in a single statement, so this can't be done in one statement.
What can you do?
You can wrap the multiple statements in a single transaction, if your goal is to modify the database only once.
You can write the multiple statements in a stored procedure.
What you really probably want is a view with insert triggers on the view. You can define a view that is the join of the tables with the values from the reference tables. An insert trigger can then check if the values exist and replace them with the appropriate ids. Or, insert into the appropriate table.
The third option does exactly what you want. I find that it is a bit of trouble to maintain triggers, so for an application, I would prefer wrapping the logic in a stored procedure.
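As a rough sketch of that third option, assuming the four-table schema from the question: the view and trigger names are invented, new IPID/UserAgentID values are generated with NEWID(), TableD is assumed to be already populated (its key is a plain int and would need its own key-assignment scheme), and LogID is supplied by the caller.

CREATE VIEW dbo.LogEntries
AS
SELECT a.LogID, a.UserID, a.LogDate, d.LogEvent, b.IPAddress, c.UserAgent
FROM TableA a
JOIN TableB b ON b.IPID = a.IPID
JOIN TableC c ON c.UserAgentID = a.UserAgentID
JOIN TableD d ON d.LogEventID = a.LogEventID;
GO

CREATE TRIGGER dbo.trg_LogEntries_Insert ON dbo.LogEntries
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- create any missing IP and user-agent lookup rows first
    INSERT INTO TableB (IPID, IPAddress)
    SELECT NEWID(), src.IPAddress
    FROM (SELECT DISTINCT IPAddress FROM inserted) src
    WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.IPAddress = src.IPAddress);

    INSERT INTO TableC (UserAgentID, UserAgent)
    SELECT NEWID(), src.UserAgent
    FROM (SELECT DISTINCT UserAgent FROM inserted) src
    WHERE NOT EXISTS (SELECT 1 FROM TableC c WHERE c.UserAgent = src.UserAgent);

    -- then write TableA, translating the plain values back to their ids
    INSERT INTO TableA (LogID, UserID, LogDate, LogEventID, IPID, UserAgentID)
    SELECT i.LogID, i.UserID, i.LogDate, d.LogEventID,
           CONVERT(varchar(36), b.IPID), CONVERT(varchar(36), c.UserAgentID)
    FROM inserted i
    JOIN TableD d ON d.LogEvent = i.LogEvent
    JOIN TableB b ON b.IPAddress = i.IPAddress
    JOIN TableC c ON c.UserAgent = i.UserAgent;
END;

The CONVERTs are only there because the question declares IPID and UserAgentID as unique identifiers in the lookup tables but as varchar(36) in Table A.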
I currently have a stored procedure that performs bulk insert into a table named "TomorrowPatients" from a .csv file. When performing the bulk insert, I need to determine if the record being added already exists within the table and if so DO NOT add the record. If the record does not exist then I need to APPEND it to the table. What is the most efficient way to go about this? Any help will be greatly appreciated.
EDIT: I have created a temp table called "TomorrowPatients_Temp". I am trying to use this table to determine which records to insert.
Insert the whole data set into a temporary table, say #TempTable. Then use the following code:
INSERT INTO TomorrowPatients
SELECT TT.* FROM #TempTable TT
LEFT JOIN TomorrowPatients TP ON TT.PatientId = TP.PatientId
AND TT.PatientName = TP.PatientName
AND TT.PatientSSN = TP.PatientSSN
WHERE TP.PatientId IS NULL
where PatientId is your primary key for the TomorrowPatients table.
Do NOT add the "RoomNo" column to the LEFT JOIN (i.e. TT.RoomNo = TP.RoomNo). That way, even if the room number changes, the row won't be inserted again as new data, since we have joined only on patient-specific data.
I have a table that looks like the one below:
Every time the user loans a book, a new record is inserted.
The data in this table is derived or taken from another table which has no dates.
I need to update this table based on the records in the other table, meaning I only need to update what has changed.
Example: let's say the user returns the book Starship Troopers, and the book's return flag is set to Yes.
How do I update just that column?
What I have tried:
I tried using the MERGE Statement but it works only with unique rows of data, meaning you get an error if the same ID appears more than once.
I also tried using a basic UPDATE Statement and a JOIN but that's not going well.
I am asking because I have run out of ideas.
Thanks for reading
If you need to update BooksReturn in the target table based on the same column in the source table:
UPDATE t
SET t.booksreturn = s.booksreturn
FROM target t JOIN source s
ON t.userid = s.userid
AND t.booksloaned = s.booksloaned
Here is a SQLFiddle demo.
You can do this with simple UPDATE and INSERT statements.
There are two tables, A and B.
From B you want to insert data into A if it doesn't exist; otherwise you update that data.
First, insert into a temp table:
SELECT *
INTO #MYTEMP
FROM B
WHERE BOOKSLOANED NOT IN (SELECT BOOKSLOANED
FROM A)
Second, check the data and insert into A:
INSERT INTO A
SELECT *
FROM #MYTEMP
Finally, write one simple UPDATE statement that updates the existing rows of A from B, so any changes are reflected and unchanged data stays as it is.
You can also update from the #MYTEMP table.
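A sketch of that final update, reusing the userid/booksloaned/booksreturn column names from the earlier answer (the real column list depends on your table):

UPDATE a
SET a.booksreturn = b.booksreturn
FROM A a
JOIN B b
  ON a.userid = b.userid
 AND a.booksloaned = b.booksloaned

Rows that did not change are simply rewritten with the same values, so nothing is lost.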
Sybase db tables do not have a concept of self-updating row numbers. However, for one of the modules, I require a row number corresponding to each row in the database such that max(Column) would always tell me the number of rows in the table.
I thought I'd introduce an int column and keep updating it to keep track of the row number. However, I'm having problems updating this column in the case of deletes. What SQL should I use in a delete trigger to maintain this column?
You can easily assign a unique number to each row by using an identity column. The identity can be a numeric or an integer (in ASE12+).
This will almost do what you require. There are certain circumstances in which you will get a gap in the identity sequence. (These are called "identity gaps", the best discussion on them is here). Also deletes will cause gaps in the sequence as you've identified.
Why do you need to use max(col) to get the number of rows in the table, when you could just use count(*)? If you're trying to get the last row from the table, then you can do
select * from table where column = (select max(column) from table).
Regarding the delete trigger to update a manually managed column, I think this would be a potential source of deadlocks and many performance issues. Imagine you have 1 million rows in your table and you delete row 1: that's 999,999 rows you now have to update to subtract 1 from the id.
Delete trigger
CREATE TRIGGER tigger ON myTable FOR DELETE
AS
-- shift each remaining row number down by the number of deleted rows
-- that had a smaller id
update myTable
set id = id - (select count(*) from deleted d where d.id < t.id)
from myTable t
To avoid locking problems
You could add an extra table (which joins to your primary table) like this:
CREATE TABLE rowCounter
(id int, -- foreign key to main table
rownum int)
... and use the rownum field from this table.
If you put the delete trigger on this table then you would hugely reduce the potential for locking problems.
Approximate solution?
Does the table need to keep its rownumbers up to date all the time?
If not, you could have a job which runs every minute or so, which checks for gaps in the rownum, and does an update.
Question: do the rownumbers have to reflect the order in which rows were inserted?
If not, you could do far fewer updates, but only updating the most recent rows, "moving" them into gaps.
Leave a comment if you would like me to post any SQL for these ideas.
I'm not sure why you would want to do this. You could experiment with using temporary tables and "select into" with an Identity column like below.
create table test
(
col1 int,
col2 varchar(3)
)
insert into test values (100, "abc")
insert into test values (111, "def")
insert into test values (222, "ghi")
insert into test values (300, "jkl")
insert into test values (400, "mno")
select rank = identity(10), col1 into #t1 from Test
select * from #t1
delete from test where col2="ghi"
select rank = identity(10), col1 into #t2 from Test
select * from #t2
drop table test
drop table #t1
drop table #t2
This would give you a dynamic id (of sorts)
I have a web application that uses a fairly large table (millions of rows, about 30 columns). Let's call that TableA. Among the 30 columns, this table has a primary key named "id", and another column named "campaignID".
As part of the application, users are able to upload new sets of data pertaining to new "campaigns".
These data sets have the same structure as TableA, but typically only about 10,000-20,000 rows.
Every row in a new data set will have a unique "id", but they'll all share the same campaignID. In other words, the user is loading the complete data for a new "campaign", so all 10,000 rows have the same "campaignID".
Usually, users are uploading data for a NEW campaign, so there are no rows in TableA with the same campaignID. Since the "id" is unique to each campaign, the id of every row of new data will be unique in TableA.
However, in the rare case where a user tries to load a new set of rows for a "campaign" that's already in the database, the requirement was to remove all the old rows for that campaign from TableA first, and then insert the new rows from the new data set.
So, my stored procedure was simple:
BULK INSERT the new data into a temporary table (#tableB)
Delete any existing rows in TableA with the same campaignID
INSERT INTO Table A ([columns]) SELECT [columns] from #TableB
Drop #TableB
This worked just fine.
But the new requirement is to give users 3 options when they upload new data for handling "duplicates" - instances where the user is uploading data for a campaign that's already in TableA.
Option 1: Remove ALL data in TableA with the same campaignID, then insert all the new data from #TableB. (This is the old behavior; with this option, there will never be duplicates.)
Option 2: If a row in #TableB has the same id as a row in TableA, then update that row in TableA with the row from #TableB. (Effectively, this "replaces" the old data with the new data.)
Option 3: If a row in #TableB has the same id as a row in TableA, then ignore that row in #TableB. (Essentially, this preserves the original data and ignores the new data.)
A user doesn't get to choose this on a row-by-row basis. She chooses how the data will be merged, and this logic is applied to the entire data set.
In a similar application I worked on that used MySQL, I used the "LOAD DATA INFILE" function, with the "REPLACE" or "IGNORE" option. But I don't know how to do this with SQL Server/T-SQL.
Any solution needs to be efficient enough to handle the fact that TableA has millions of rows, and #TableB (the new data set) may have 10k-20k rows.
I googled for something like a "Merge" command (something that seems to be supported for SQL Server 2008), but I only have access to SQL Server 2005.
In rough pseudocode, I need something like this:
If user selects option 1:
[I'm all set here - I have this working]
If user selects option 2 (replace):
merge into TableA as Target
using #TableB as Source
on TableA.id=#TableB.id
when matched then
update row in TableA with row from #TableB
when not matched then
insert row from #TableB into TableA
If user selects option 3 (preserve):
merge into TableA as Target
using #TableB as Source
on TableA.id=#TableB.id
when matched then
do nothing
when not matched then
insert row from #TableB into TableA
How about this?
option 2:
begin tran;
delete from tablea where exists (select 1 from tableb where tablea.id=tableb.id);
insert into tablea select * from tableb;
commit tran;
option 3:
begin tran;
delete from tableb where exists (select 1 from tablea where tablea.id=tableb.id);
insert into tablea select * from tableb;
commit tran;
As for performance, so long as the id field(s) in tablea (the big table) are indexed, you should be fine.
Why are you using upserts when he says he wanted a MERGE? MERGE in SQL 2008 is faster and more efficient.
I would let the merge handle the differences.
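For readers on SQL Server 2008 or later, option 2 as a single MERGE might look like the sketch below; the non-key columns are placeholders standing in for the table's real 30 columns, and option 3 (preserve) would be the same statement with the WHEN MATCHED branch removed:

MERGE TableA AS target
USING #TableB AS source
   ON target.id = source.id
WHEN MATCHED THEN
    UPDATE SET campaignID = source.campaignID,
               col1 = source.col1  -- ...and so on for the remaining columns
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, campaignID, col1)  -- ...and so on for the remaining columns
    VALUES (source.id, source.campaignID, source.col1);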