Sql exists copy - sql

I have a mssql stored procedure question for you experts:
I have two tables. [table1] receives new entries all the time, and I need to copy its content into [table2].
I need to check whether each row already exists in [table2]. If it does, just update the update timestamp in [table2]; otherwise insert the row into [table2].
The tables can be rather big, about 100k entries, so which is the fastest way to do this?
It should be noted that this is a simplified description, since some more data handling happens when copying new content from [Table1] to [Table2].
So to sum up:
If a row exists in both [Table1] and [Table2], update the timestamp of the row in [Table2]; otherwise insert a new record with the content into [Table2].

If you have SQL Server 2008, it has a MERGE command that can do an insert or update as appropriate.

This works across all versions of SQL Server. MERGE does the same in SQL Server 2008.
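-- Step 1: refresh the timestamp on rows that already exist in table2
UPDATE table2
SET timestampcolumn = GETDATE() -- or whatever value you need
WHERE EXISTS (SELECT *
              FROM table1
              WHERE table1.[key] = table2.[key]);

-- Step 2: insert the rows that are not in table2 yet
INSERT table2 (col1, col2, col3...)
SELECT col1, col2, col3...
FROM table1
WHERE NOT EXISTS (SELECT *
                  FROM table2
                  WHERE table2.[key] = table1.[key]);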
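The update-then-insert pattern above can be tried out end to end with a small script. This is only a sketch: it uses Python's built-in sqlite3 module in place of SQL Server so that it runs without a server, and the column names id, payload and updated_at, as well as the function names, are made up for the illustration.

```python
import sqlite3


def sync_tables(conn, stamp):
    """Refresh timestamps for rows already in table2, then insert the rest."""
    cur = conn.cursor()
    # Step 1: rows present in both tables only get a new timestamp.
    cur.execute(
        """
        UPDATE table2
        SET updated_at = ?
        WHERE EXISTS (SELECT 1 FROM table1 WHERE table1.id = table2.id)
        """,
        (stamp,),
    )
    updated = cur.rowcount
    # Step 2: rows that are missing from table2 are copied over.
    cur.execute(
        """
        INSERT INTO table2 (id, payload, updated_at)
        SELECT id, payload, ?
        FROM table1
        WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table2.id = table1.id)
        """,
        (stamp,),
    )
    inserted = cur.rowcount
    conn.commit()
    return updated, inserted


def demo():
    # Hypothetical sample data: id 1 exists in both tables, ids 2 and 3 are new.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE table2 (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT);
        INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO table2 VALUES (1, 'a', 'old'), (9, 'z', 'old');
        """
    )
    counts = sync_tables(conn, "new")
    rows = conn.execute(
        "SELECT id, payload, updated_at FROM table2 ORDER BY id"
    ).fetchall()
    return counts, rows
```

Running the UPDATE before the INSERT matters: in the other order, the freshly inserted rows would also match the EXISTS test and be touched a second time.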

Given that this sounds like an ETL process, have you considered using SQL Server Integration Services?
If you are planning on exporting, loading, or processing lots of data, then this is the way to go in my view. You also have the added advantage of being able to run multiple threads in parallel, plus more options to tweak your data throughput, server memory utilisation, etc.

Related

Importing sql table (t1) from one db (db1)to another (db2) (t2)and then use t2 to update values of a table in db2

I need a query to import the data of a table into a different table in a different database. Then, using the new table, I need to update another table in the same database.
As you tagged SSMS, I'm assuming you're using SQL Server. This answer may apply to other databases, but you'd need to check.
How to refer to other databases
Assuming your database is on the same server, and the table is in the 'dbo' schema, then you can refer to the other database/table as follows
SELECT *
FROM [db1].[dbo].[t1]
If it's on a different server (but has been set up as a linked server) you can do something similar
SELECT *
FROM [servername].[db1].[dbo].[t1]
Making a copy of the data
To save it as a new table within your current database, you could either create a copy of the table's structure in the current database and use an INSERT INTO command, or instead let SQL Server make the table for you by using the 'INTO' clause in the SELECT e.g.,
SELECT *
INTO [db2].[dbo].[t2] -- Or just simply [t2] if you're running this from that database and are happy with [dbo]
FROM [db1].[dbo].[t1]
The above copies the data from the t1 table in db1 to a new table t2 in db2.
You can then use the data in t2 as you normally would use a table.
If you're running this from within db1, you will always need to fully qualify the tables in db2 (and/or alias them), as below.
UPDATE t3
SET xfield = t2.zfield
FROM [db2].[dbo].[t3] AS t3
INNER JOIN [db2].[dbo].[t2] AS t2 ON t3.id = t2.id
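The copy-then-update flow above can be rehearsed without a SQL Server instance. The following is a rough sketch using Python's sqlite3 module, with two attached in-memory databases playing the roles of db1 and db2. SQLite has no SELECT ... INTO, so CREATE TABLE ... AS SELECT stands in for it, and a correlated subquery stands in for UPDATE ... FROM ... JOIN; the sample rows are invented.

```python
import sqlite3


def copy_and_update():
    conn = sqlite3.connect(":memory:")
    # Two attached in-memory databases play the roles of db1 and db2.
    conn.execute("ATTACH DATABASE ':memory:' AS db1")
    conn.execute("ATTACH DATABASE ':memory:' AS db2")
    conn.execute("CREATE TABLE db1.t1 (id INTEGER, zfield TEXT)")
    conn.execute("CREATE TABLE db2.t3 (id INTEGER, xfield TEXT)")
    conn.executemany("INSERT INTO db1.t1 VALUES (?, ?)", [(1, "one"), (2, "two")])
    conn.executemany(
        "INSERT INTO db2.t3 VALUES (?, ?)", [(1, None), (2, None), (3, "keep")]
    )
    conn.commit()

    # Copy step: SQLite's counterpart of SELECT * INTO [db2].[dbo].[t2] FROM ...
    conn.execute("CREATE TABLE db2.t2 AS SELECT * FROM db1.t1")

    # Update step: only rows of t3 that have a match in t2 are changed.
    conn.execute(
        """
        UPDATE db2.t3
        SET xfield = (SELECT t2.zfield FROM db2.t2 AS t2 WHERE t2.id = t3.id)
        WHERE EXISTS (SELECT 1 FROM db2.t2 AS t2 WHERE t2.id = t3.id)
        """
    )
    conn.commit()
    return conn.execute("SELECT id, xfield FROM db2.t3 ORDER BY id").fetchall()
```

The WHERE EXISTS guard keeps unmatched rows (id 3 here) untouched, which is what the INNER JOIN does in the T-SQL version.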

SQL request from 2 different databases with same structure

I have two very similar databases with exactly the same structure (one is a backup of the other, and some values have changed since, so the data is similar but not identical).
What I would like to do is take a value from database 2 and update database 1 with it (so that I can later back up some data for a certain user without having to do it all manually or back up everyone).
UPDATE s
SET
t1.column=t2.column
FROM database1.table1 t1
JOIN database2.table1 t2
WHERE t1.table2='test'
AND t2.table2='test'
I tried something like this but it didn't work. Both databases are on the same server, next to each other, and their names are different, so I wanted to know whether what I am trying to do is possible.
If the table's name is Test, try this query for the update (in place of Key, use the table's primary key column):
UPDATE t1
SET t1.[Column] = t2.[Column]
FROM database1.dbo.test AS t1
JOIN database2.dbo.test AS t2
    ON t2.[Key] = t1.[Key]
Or
UPDATE t1
SET t1.level = t2.level
FROM pereger.dbo.characters AS t1
JOIN peregercopy.dbo.characters AS t2
    ON t2.characterId = t1.characterId
WHERE t2.characterId = 5

Avoid Duplicates with INSERT INTO TABLE VALUES from csv file

I have a .csv file with 600 million plus rows. I need to upload this into a database. It will have 3 columns assigned as primary keys.
I use pandas to read the file in chunks of 1000 lines.
At each chunk iteration I use the following syntax with pyodbc in Python to upload the data in chunks of 1000 lines:
INSERT INTO db_name.dbo.table_name("col1", "col2", "col3", "col4")
VALUES (?,?,?,?)
cursor.executemany(query, df.values.tolist())
Unfortunately, there are apparently some duplicate rows present. When the duplicate row is encountered the uploading stops with an error from SQL Server.
Question: how can I upload data such that whenever a duplicate is encountered instead of stopping it will just skip that line and upload the rest? I found some questions and answers on insert into table from another table, or insert into table from variables declared, but nothing on reading from a file and using insert into table col_names values() command.
Based on those answers one idea might be:
At each iteration of chunks:
Upload to a temp table
Do the insertion from the temp table into the final table
Delete the rows in the temp table
However, with such a large file each second counts, and I was looking for an answer with better efficiency.
I also tried to deal with duplicates using python, however, since the file is too large to fit into the memory I could not find a way to do that.
Question 2: if I were to use bulk insert, how would I skip over the duplicates?
Thank you
You can try to use a CTE and an INSERT ... SELECT ... WHERE NOT EXISTS.
WITH cte
AS
(
SELECT ? col1,
? col2,
? col3,
? col4
)
INSERT INTO db_name.dbo.table_name
(col1,
col2,
col3,
col4)
SELECT col1,
col2,
col3,
col4
FROM cte
WHERE NOT EXISTS (SELECT *
FROM db_name.dbo.table_name
WHERE table_name.col1 = cte.col1
AND table_name.col2 = cte.col2
AND table_name.col3 = cte.col3
AND table_name.col4 = cte.col4);
Remove any of the table_name.col<n> = cte.col<n> conditions whose column isn't part of the primary key.
I would always load into a temporary load table first, which doesn't have any unique or PK constraint on those columns. This way you can always see that the whole file has loaded, which is an invaluable check in any ETL work, and for any other easy analysis of the source data.
After that, use an insert such as the one suggested in an earlier answer or, if you know that the target table is empty, simply
INSERT INTO db_name.dbo.table_name(col1,col2,col3,col4)
SELECT distinct col1,col2,col3,col4 from load_table
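The load-table approach can be sketched as a small script. It is only an illustration under stated assumptions: Python's sqlite3 module stands in for SQL Server and pyodbc, the rows come from an in-memory list rather than a csv file, and the names load_table, target_table, chunks and load_without_duplicates are hypothetical.

```python
import sqlite3
from itertools import islice


def chunks(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_without_duplicates(conn, rows, chunk_size=1000):
    cur = conn.cursor()
    # Stage everything first; load_table has no PK, so nothing is rejected.
    for batch in chunks(rows, chunk_size):
        cur.executemany(
            "INSERT INTO load_table (col1, col2, col3, col4) VALUES (?, ?, ?, ?)",
            batch,
        )
    staged = cur.execute("SELECT COUNT(*) FROM load_table").fetchone()[0]
    # Move only distinct rows whose key is not in the target yet.
    # Rows that share a key but differ in col4 would still violate the PK.
    cur.execute(
        """
        INSERT INTO target_table (col1, col2, col3, col4)
        SELECT DISTINCT col1, col2, col3, col4
        FROM load_table AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM target_table AS t
            WHERE t.col1 = s.col1 AND t.col2 = s.col2 AND t.col3 = s.col3
        )
        """
    )
    inserted = cur.rowcount
    cur.execute("DELETE FROM load_table")
    conn.commit()
    return staged, inserted


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE load_table (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 TEXT);
        CREATE TABLE target_table (
            col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 TEXT,
            PRIMARY KEY (col1, col2, col3)
        );
        INSERT INTO target_table VALUES (1, 1, 1, 'x');
        """
    )
    rows = [(1, 1, 1, "x"), (2, 2, 2, "y"), (2, 2, 2, "y"), (3, 3, 3, "z")]
    result = load_without_duplicates(conn, rows, chunk_size=2)
    total = conn.execute("SELECT COUNT(*) FROM target_table").fetchone()[0]
    return result, total
```

Comparing the staged count with the number of lines in the source file gives the "whole file has loaded" check mentioned above before anything touches the real table.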
The best approach is to use a temporary table and execute a MERGE-INSERT statement. You can do something like this (not tested):
CREATE TABLE #MyTempTable (col1 VARCHAR(50), col2, col3...);
INSERT INTO #MyTempTable(col1, col2, col3, col4)
VALUES (?,?,?,?)
CREATE CLUSTERED INDEX ix_tempCol1 ON #MyTempTable (col1);
MERGE INTO db_name.dbo.table_name AS TARGET
USING #MyTempTable AS SOURCE ON TARGET.COL1 = SOURCE.COL1 AND TARGET.COL2 = SOURCE.COL2 ...
WHEN NOT MATCHED THEN
INSERT(col1, col2, col3, col4)
VALUES(source.col1, source.col2, source.col3, source.col4);
You need to consider the best indexes for your temporary table to make the MERGE faster. With the statement WHEN NOT MATCHED you avoid duplicates depending on the ON clause.
SQL Server Integration Services offers one method that can read data from a source (via a Dataflow task), then remove duplicates using its Sort transformation (which has a checkbox to remove duplicates).
https://www.mssqltips.com/sqlservertip/3036/removing-duplicates-rows-with-ssis-sort-transformation/
Of course the data has to be sorted, and 600 million+ rows isn't going to be fast.
If you want to use pure SQL Server then you need a staging table (without a pk constraint). After importing your data into Staging, you would insert into your target table using filtering for the composite PK combination. For example,
Insert into dbo.RealTable (KeyCol1, KeyCol2, KeyCol3, Col4)
Select Col1, Col2, Col3, Col4
from dbo.Staging S
where not exists (Select *
from dbo.RealTable RT
where RT.KeyCol1 = S.Col1
AND RT.KeyCol2 = S.Col2
AND RT.KeyCol3 = S.Col3
)
In theory you could also use the set operator EXCEPT, since it returns the distinct rows of the first query that do not appear in the second. For example:
INSERT INTO RealTable
SELECT * FROM Staging
EXCEPT
SELECT * FROM RealTable
This would insert the distinct rows from Staging that don't already exist in RealTable. It compares whole rows rather than just the composite PK, so a row whose key already exists but whose other columns differ is still selected; an insert error would therefore indicate that the csv assigns different values to the same composite key.
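Both behaviours of the EXCEPT variant, skipping exact duplicates and failing on a key that reappears with different values, can be seen in a small sketch. It again uses Python's sqlite3 module as a stand-in for SQL Server; the two-column tables and the helper names are invented for the example.

```python
import sqlite3

INSERT_NEW = """
INSERT INTO real_table
SELECT * FROM staging
EXCEPT
SELECT * FROM real_table
"""


def make_db(staging_rows):
    """Create a target holding (1, 'a') and a constraint-free staging table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE real_table (k INTEGER PRIMARY KEY, v TEXT)")
    conn.execute("CREATE TABLE staging (k INTEGER, v TEXT)")
    conn.execute("INSERT INTO real_table VALUES (1, 'a')")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", staging_rows)
    conn.commit()
    return conn


def insert_new_rows(conn):
    """Return the target rows after the insert, or None on a key conflict."""
    try:
        conn.execute(INSERT_NEW)
    except sqlite3.IntegrityError:
        # Same key, different values: EXCEPT kept the row, the PK rejected it.
        conn.rollback()
        return None
    conn.commit()
    return conn.execute("SELECT k, v FROM real_table ORDER BY k").fetchall()
```

With staging rows that are exact duplicates of each other or of the target, the insert goes through; with a staging row such as (1, 'changed'), the primary key raises an error, which is the signal described above.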

How to manually move rows from one database to the next?

I have two databases on the same server. One db is newer than the other and has had its schema modified quite a bit. I want to transfer data from a table in the old db to a table in the new db but I need total control over the process so I can mold the old data to fit the new schema.
[NewDB].[dbo].[Aliases]
[OldDB].[Terminal].[Alias]
I'm not very adept at SQL yet. Is there a way I can loop through all the records in the old table and then on each iteration of the loop craft a custom insert statement for the new table?
Try an INSERT SELECT statement.
INSERT INTO [NewDB].[dbo].[Aliases]
SELECT columns
FROM [OldDB].[Terminal].[Alias]
http://msdn.microsoft.com/en-us/library/ms174335(v=sql.100).aspx - derived table section
http://msdn.microsoft.com/en-us/library/ms189499(v=sql.100).aspx
Do it the same way as you would if both tables were in the same database. Just fully qualify the names of the tables. Example:
INSERT INTO [NewDB].[dbo].[Aliases] (col1, col2, col3)
SELECT LEFT(col1,3), col2, col3 FROM [OldDB].[Terminal].[Alias]
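To illustrate that no row-by-row loop is needed, here is a minimal sketch of the same set-based INSERT ... SELECT with a transformation in the SELECT list. It runs on Python's sqlite3 module rather than SQL Server, so the attached databases olddb and newdb, the table layouts, and substr(col1, 1, 3) (SQLite's counterpart of LEFT(col1,3)) are all assumptions made for the example.

```python
import sqlite3


def migrate_aliases():
    conn = sqlite3.connect(":memory:")
    # Attached in-memory databases stand in for [OldDB] and [NewDB].
    conn.execute("ATTACH DATABASE ':memory:' AS olddb")
    conn.execute("ATTACH DATABASE ':memory:' AS newdb")
    conn.execute("CREATE TABLE olddb.alias (col1 TEXT, col2 TEXT, col3 INTEGER)")
    conn.execute("CREATE TABLE newdb.aliases (col1 TEXT, col2 TEXT, col3 INTEGER)")
    conn.executemany(
        "INSERT INTO olddb.alias VALUES (?, ?, ?)",
        [("ALPHA", "x", 1), ("BE", "y", 2)],
    )
    # One statement reshapes every row on the way into the new table.
    conn.execute(
        """
        INSERT INTO newdb.aliases (col1, col2, col3)
        SELECT substr(col1, 1, 3), col2, col3
        FROM olddb.alias
        """
    )
    conn.commit()
    return conn.execute(
        "SELECT col1, col2, col3 FROM newdb.aliases ORDER BY col3"
    ).fetchall()
```

Any per-row "moulding" of the old data (trimming, casting, CASE expressions) goes into the SELECT list in the same way.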
Depending on how much data you wish to transfer, you might want to consider using BULK INSERT, which loads from a data file: http://msdn.microsoft.com/en-us/library/ms188365.aspx
INSERT INTO NEWDB..TABLENAME( *fieldlist* )
SELECT *fieldlist* FROM OLDDB..TABLENAME

SQL Server Generate Script To Fill Tables With Data From Other Database?

Let's say I have two databases with identical tables, but one database's tables contains data while the other doesn't. Is there a way in SQL Server to generate a script to fill the empty tables with data from the full tables?
If the tables are identical and don't use an IDENTITY column, it is quite easy.
You would do something like this:
INSERT INTO TableB
SELECT * FROM TableA
Again, only for identical table structures, otherwise you have to change the SELECT * to the correct columns and perform any conversions that are necessary.
And, to add to @WilliamD's answer, if there is an IDENTITY column you can use a variation of the INSERT statement.
Assuming you have two columns (Col1 and Col2, with Col1 having IDENTITY property) in the tables, you can do the following:
SET IDENTITY_INSERT TableB ON
INSERT INTO TableB (col1, col2)
SELECT col1, col2 FROM TableA
SET IDENTITY_INSERT TableB OFF
It's necessary to list the columns in this situation.