I am connecting two databases for data migration. I want to check whether a record from the table of the first database exists in the second database.
I.e. from the source database user table I want to migrate data to destination database user table.
How do I write a query using IF NOT EXISTS?
insert into myTable
select * from myOldTable ot
where NOT EXISTS (select 1 from mytable t where t.ID = ot.ID)
You might be better writing it as a join
insert into myTable
select ot.*
from myOldTable ot
LEFT JOIN myTable t
ON ot.ID = t.ID
WHERE t.ID IS NULL
Or, depending on your database, a MERGE might be better; there are lots of options.
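To try the NOT EXISTS pattern without a server, here is a minimal sketch using Python's built-in sqlite3 module. The `name` column and the sample rows are assumptions made up for illustration; only the table names come from the answer above.

```python
import sqlite3


def migrate_missing_rows(conn):
    """Copy rows from myOldTable into myTable, skipping IDs already present."""
    conn.execute(
        """
        INSERT INTO myTable (ID, name)
        SELECT ot.ID, ot.name
        FROM myOldTable ot
        WHERE NOT EXISTS (SELECT 1 FROM myTable t WHERE t.ID = ot.ID)
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: only the ID column is implied by the original query.
    CREATE TABLE myOldTable (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE myTable    (ID INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO myOldTable VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO myTable    VALUES (2, 'bob');
    """
)

migrate_missing_rows(conn)
migrated_ids = [row[0] for row in conn.execute("SELECT ID FROM myTable ORDER BY ID")]

# Running the migration again should add nothing, because every ID now exists.
migrate_missing_rows(conn)
count_after_rerun = conn.execute("SELECT COUNT(*) FROM myTable").fetchone()[0]
```

Because the anti-join skips existing IDs, the statement can be re-run safely if a migration is interrupted.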
I find the following syntax easiest to read:
insert TargetTable
(col1, col2)
select col1, col2
from SourceTable as source
where not exists
(
select *
from TargetTable as duplicate
where source.col1 = duplicate.col1
and source.col2 = duplicate.col2
)
Normally you don't have to worry about concurrency during a data migration. If you do, you can specify locking hints like with (tablock) or a higher transaction isolation level. Or you can use merge as suggested, but that has a rather convoluted syntax.
SQL:2003 defines MERGE; otherwise you can do an INSERT INTO ... SELECT, and in the SELECT you should LEFT JOIN the destination table using the natural key in the ON predicate and then add a WHERE <column> IS NULL.
select * from db1.schema1.table1
intersect
select * from db2.schema2.table2
Related
As title says, I was wondering how I can calculate and set a variable inside a merge statement. If that is even possible.
Example:
MERGE TABLE_1 as target
USING TABLE_2 as source
ON (target.USER_ID = source.USER_ID)
WHEN NOT MATCHED THEN
INSERT (
USER_ID,
CURRENT_CALCULATION,
CURRENT_CALCULATION_VALUE )
VALUES (
source.USER_ID,
SET #CURRENT_CALCULATION = (select value from table3 where table3.USER_ID = source.USER_ID),
... REUSE #CURRENT_CALCULATION for other purposes ...
);
I have tried different kind of syntax but none seems to work.
I don't believe this is possible. Without a little more detail I can't be sure that this will work for your situation, but how about simply carrying out this logic before your MERGE statement? You could always dump everything into a temp table at the point you do all the calculations if what you're trying to avoid is hitting the same tables twice.
If you did that, you could simply use your temp table as the source for the merge - you may not even need to put it into a variable, as you might be able to include it as a column of the temp table.
Move the logic into the source part of the MERGE and reuse it:
MERGE TABLE_1 AS target
USING (SELECT t2.*,
t3.value
FROM TABLE_2 t2
LEFT JOIN table3 t3
ON t3.USER_ID = t2.USER_ID) AS source
ON ( target.USER_ID = source.USER_ID )
WHEN NOT MATCHED THEN
INSERT ( USER_ID,
CURRENT_CALCULATION,
CURRENT_CALCULATION_VALUE )
VALUES ( source.USER_ID,
source.value,
source.value + some logic );
This assumes a 1:1 relationship between TABLE_2 and table3, as implied by the correlated sub-query you used to find #CURRENT_CALCULATION.
Please consider the following:
use db1;
select * into #db1_tmp from mytable;
use db2;
select * into #db2_tmp from myothertable;
-- join temp table 1 and 2
select * from #db1_tmp a
left join db2_tmp b on where a.uid = b.uid;
This works, but SQL Server Management Studio is red-underlining #db1_tmp in the last query and therefore in all other statements that depend on this table.
Question: what is the proper way to access a temp table created in another database to prevent this underlining from happening? I tried db1.#db1_tmp but this does not work. I'm on SQL Server 2008.
Temp tables actually appear in their own database, TempDB. I think the root of your issue is the use statements. Try this instead:
select * into #db1_tmp from db1.dbo.mytable;
select * into #db2_tmp from db2.dbo.myothertable;
-- join temp table 1 and 2
select * from #db1_tmp a
left join #db2_tmp b on a.uid = b.uid;
But if this is the extent of what you're doing (creating the temp tables just to do a join across the databases), you can skip the temp tables altogether:
select * from db1.dbo.mytable a join db2.dbo.myothertable b on a.uid = b.uid;
Temp tables do not take a database reference when they are created, so they are always created in tempdb.
Only the source table is affected by "use" or by qualifying it as "dbname.dbo.mytable".
The red underlining comes from IntelliSense. Before execution the temp table is treated like a normal table, and it is flagged because the database context has changed.
Note: the last select query in the question has syntax errors.
It should be:
select * from #db1_tmp a left join #db2_tmp b on a.uid = b.uid;
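The idea of joining across databases directly, with no temp tables, can be tried out locally. The sketch below is only an analogy: it uses SQLite's ATTACH DATABASE in place of SQL Server's three-part names (`db1.dbo.mytable`), and the `name` and `email` columns are invented for the example.

```python
import sqlite3

# The main in-memory database plays the role of db1.
conn = sqlite3.connect(":memory:")
# Attach a second, separate in-memory database under the schema name db2.
conn.execute("ATTACH DATABASE ':memory:' AS db2")

conn.executescript(
    """
    CREATE TABLE main.mytable     (uid INTEGER, name TEXT);
    CREATE TABLE db2.myothertable (uid INTEGER, email TEXT);
    INSERT INTO main.mytable     VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO db2.myothertable VALUES (1, 'ann@example.com');
    """
)

# Qualify each table with its database name and join them in one statement.
rows = conn.execute(
    """
    SELECT a.uid, a.name, b.email
    FROM main.mytable a
    LEFT JOIN db2.myothertable b ON a.uid = b.uid
    ORDER BY a.uid
    """
).fetchall()
```

Rows from the first database that have no match in the second come back with NULL (None in Python) in the `email` column.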
I have a scenario where I would like to update multiple fields in multiple tables using just one instruction. I need a syntax to perform such operations on multiple databases (Oracle and MSSQL).
At the moment I am stuck at the following statement from MSSQL:
update table1
set table1.value = 'foo'
from table1 t1 join table2 t2 on t1.id = t2.tab1_id
where t1.id = 1234
I would like to update a field in t2 as well in the same statement.
Further I would like to perform the same Update(s) on Oracle.
EDIT: Seems like I can not update multiple tables in just one statement. Is there a syntax that works for Oracle and MSSQL when updating using a join?
Regards
"Seems like I can not update multiple Tables in just one statement. Is there a syntax that works for Oracle and MSSql when updating using a Join?"
I assume when you re-posed the question you want syntax that will work on both Oracle and SQL Server even though it will inevitably affect only one table.
Entry level SQL-92 Standard code is supported by both platforms, therefore the following 'scalar subqueries' SQL-92 code should work:
UPDATE table1
SET my_value = (
SELECT t2.tab1_id
FROM table2 t2
WHERE t2.tab1_id = table1.id
)
WHERE id = 1234
AND EXISTS (
SELECT *
FROM table2 t2
WHERE t2.tab1_id = table1.id
);
Note that while using the correlation name t1 for table1 is valid syntax according to the SQL-92 Standard, this will materialize a table, and the UPDATE will then target the materialized table 't1' and leave your base table 'table1' unaffected, which I assume is not the desired effect. While I'm fairly sure both Oracle and SQL Server are non-compliant in this regard and that in practice it would work as expected, there's no harm in being ultra cautious and sticking to the SQL-92 syntax by fully qualifying the target table.
Folk tend not to like the 'repeated' code in the above subqueries (even though the optimizer should be smart enough to evaluate it only once).
More recent versions of Oracle and SQL Server both support Standard SQL:2003 MERGE syntax, so you may be able to use something close to this:
MERGE INTO table1
USING (
SELECT t2.tab1_id
FROM table2 t2
) source
ON (id = source.tab1_id
AND id = 1234)
WHEN MATCHED THEN
UPDATE
SET my_value = source.tab1_id;
I just noticed your example is even simpler than I first thought and merely requires a simple subquery that should run on most SQL products e.g.
UPDATE table1
SET my_value = 'foo'
WHERE EXISTS (
SELECT *
FROM table2 t2
WHERE t2.tab1_id = table1.id
);
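As a quick check that the EXISTS form touches only rows with a match in table2, here is a small sketch run against SQLite through Python's sqlite3 module. SQLite stands in for Oracle or SQL Server purely for convenience, and the sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, my_value TEXT);
    CREATE TABLE table2 (tab1_id INTEGER);
    INSERT INTO table1 VALUES (1, 'old'), (2, 'old');
    INSERT INTO table2 VALUES (1);  -- only id 1 has a matching child row
    """
)

# Same shape as the UPDATE above: no join, no alias on the target table.
cur = conn.execute(
    """
    UPDATE table1
    SET my_value = 'foo'
    WHERE EXISTS (
        SELECT *
        FROM table2 t2
        WHERE t2.tab1_id = table1.id
    )
    """
)
updated = cur.rowcount
values = dict(conn.execute("SELECT id, my_value FROM table1"))
```

The table alias is written without AS here because Oracle does not accept AS before a table alias, while both forms are fine in SQL Server and SQLite.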
On Oracle, you can update only one table per statement, but you could consider using a trigger.
Hi to all you mighty SQLsuperheros out there..
Can anyone rescue me from imminent disaster and ruin?
I'm working with Microsoft Access SQL. I'd like to select records in one table (table1) that don't appear in another (table2) .. and then insert new records into table2 that are based on records in table1, as follows:
[table1]
file_index : filename
[table2]
file_index : celeb_name
I want to:
Select all records from table1 where [filename] is like aud
and whose corresponding [file_index] value does not
exist in table2 with field [celeb_name] = 'Audrey Hepburn'
With that selection I then want to insert a new record into [table2]
[file_index] = [table1].[file_index]
[celeb_name] = 'Audrey Hepburn'
There is a one to many relationship between [file_index] in [table1] and [table2]
One record in [table1], to many in [table2].
Many thanks
Will this do? Obviously add some square brackets and stuff. Not too into Access myself.
INSERT INTO table2 (file_index, celeb_name)
SELECT file_index, 'Audrey Hepburn'
FROM table1
WHERE filename LIKE '*aud*'
AND file_index NOT IN (SELECT DISTINCT file_index
FROM table2
WHERE celeb_name = 'Audrey Hepburn')
As I said in comments, NOT IN is not well-optimized by Jet/ACE and it's usually more efficient to use an OUTER JOIN. In this case, because you need to filter on the outer side of the join, you'll need a subquery:
INSERT INTO photos_by_celebrity ( ORIG_FILE_INDEX, celebrity_name )
SELECT tblOriginal_Files.ORIG_FILE_INDEX, 'Audrey Hepburn'
FROM tblOriginal_Files
LEFT JOIN (SELECT DISTINCT ORIG_FILE_INDEX
FROM photos_by_celebrity
WHERE celebrity_name = 'Audrey Hepburn') AS Photos
ON tblOriginal_Files.ORIG_FILE_INDEX = Photos.ORIG_FILE_INDEX
WHERE Photos.ORIG_FILE_INDEX Is Null;
(that may not be exactly right -- I'm terrible with writing SQL by hand, particularly getting the JOIN syntax right)
I must say, though, that I'm wondering if this will insert too many records (and the same reservation applies to the NOT IN version).
In the original question I'd modified my table and field names and inserted square brackets in to make it easier to read.
Below is the final SQL statement that worked in MS Access format. Awesome result, thanks again Tor!!
INSERT INTO photos_by_celebrity ( ORIG_FILE_INDEX, celebrity_name )
SELECT tblOriginal_Files.ORIG_FILE_INDEX, 'Audrey Hepburn' AS Expr1
FROM tblOriginal_Files
WHERE (((tblOriginal_Files.ORIG_FILE_INDEX) Not In (SELECT DISTINCT ORIG_FILE_INDEX
FROM photos_by_celebrity
WHERE celebrity_name = 'Audrey Hepburn')) AND ((tblOriginal_Files.ORIGINAL_FILE) Like "*aud*"));
You can use NOT EXISTS.
I think it is the best option in terms of performance. Correlate the subquery on file_index so that each table1 row is checked individually.
As follows:
INSERT INTO table2 (file_index, celeb_name)
SELECT file_index, 'Audrey Hepburn'
FROM table1
WHERE filename LIKE '*aud*'
AND NOT EXISTS (SELECT file_index
FROM table2
WHERE table2.file_index = table1.file_index
AND celeb_name = 'Audrey Hepburn')
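A NOT EXISTS subquery only answers the question per row if it is correlated with the outer table on file_index. The sketch below shows that correlated form on SQLite via Python's sqlite3 module. It is an illustration with invented filenames, not Access SQL: SQLite uses % as the LIKE wildcard where Access's default query mode uses *.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (file_index INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE table2 (file_index INTEGER, celeb_name TEXT);
    INSERT INTO table1 VALUES (1, 'aud_01.jpg'), (2, 'aud_02.jpg'), (3, 'cary_01.jpg');
    -- File 1 is already tagged with Audrey Hepburn; file 2 only with someone else.
    INSERT INTO table2 VALUES (1, 'Audrey Hepburn'), (2, 'Cary Grant');
    """
)

conn.execute(
    """
    INSERT INTO table2 (file_index, celeb_name)
    SELECT t1.file_index, 'Audrey Hepburn'
    FROM table1 t1
    WHERE t1.filename LIKE '%aud%'
      AND NOT EXISTS (SELECT 1
                      FROM table2 t2
                      WHERE t2.file_index = t1.file_index   -- the correlation
                        AND t2.celeb_name = 'Audrey Hepburn')
    """
)

audrey_files = [
    row[0]
    for row in conn.execute(
        "SELECT file_index FROM table2 "
        "WHERE celeb_name = 'Audrey Hepburn' ORDER BY file_index"
    )
]
```

Without the `t2.file_index = t1.file_index` condition, a single existing 'Audrey Hepburn' row anywhere in table2 would make NOT EXISTS false for every candidate, and nothing would be inserted.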
I have a query with five joins on some rather large tables (the largest table has 10 million records), and I want to know whether any rows exist. So far I've done this to check:
SELECT TOP 1 tbl.Id
FROM table tbl
INNER JOIN ... ON ... = ... (x5)
WHERE tbl.xxx = ...
Using this query, in a stored procedure takes 22 seconds and I would like it to be close to "instant". Is this even possible? What can I do to speed it up?
I got indexes on the fields that I'm joining on and the fields in the WHERE clause.
Any ideas?
Switch to the EXISTS predicate. In general I have found it to be faster than selecting TOP 1 etc.
So you could write something like this: IF EXISTS (SELECT * FROM table tbl INNER JOIN table tbl2 ...) ... do your stuff
Depending on your RDBMS you can check what parts of the query are taking a long time and which indexes are being used (so you can know they're being used properly).
In MSSQL, you can view a diagram of the execution plan of any query you submit.
In Oracle and MySQL you can use the EXPLAIN keyword (EXPLAIN PLAN FOR in Oracle) to get details about how the query is working.
But it might just be that 22 seconds is the best you can do with your query. We can't answer that, only the execution details provided by your RDBMS can. If you tell us which RDBMS you're using we can tell you how to find the information you need to see what the bottleneck is.
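To illustrate what checking the plan looks like in practice, here is a small sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. It is a stand-in for the MSSQL, Oracle, and MySQL tools mentioned above, not a substitute for them; the table and the index name `idx_tbl_xxx` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl (Id INTEGER PRIMARY KEY, xxx INTEGER);
    CREATE INDEX idx_tbl_xxx ON tbl (xxx);  -- hypothetical index on the filter column
    """
)

# Ask the engine how it would run the query instead of running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Id FROM tbl WHERE xxx = ?", (42,)
).fetchall()

# The last column of each plan row is a human-readable description of the step.
plan_text = " ".join(str(row[-1]) for row in plan)
uses_index = "idx_tbl_xxx" in plan_text
```

If the index name does not show up in the plan text, the engine is scanning the table instead, which is usually the first thing to look for in a slow existence check.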
4 options
Try COUNT(*) in place of TOP 1 tbl.id
An index per column may not be good enough: you may need to use composite indexes
Are you on SQL Server 2005? If so, you can find missing indexes. Or try the Database Tuning Advisor
Also, it's possible that you don't need 5 joins.
Assuming parent-child-grandchild etc, then grandchild rows can't exist without the parent rows (assuming you have foreign keys)
So your query could become
SELECT TOP 1
tbl.Id --or count(*)
FROM
grandchildtable tbl
INNER JOIN
anothertable ON ... = ...
WHERE
tbl.xxx = ...
Try EXISTS.
Either for the 5 tables or for the assumed hierarchy:
SELECT TOP 1 --or count(*)
tbl.Id
FROM
grandchildtable tbl
WHERE
tbl.xxx = ...
AND
EXISTS (SELECT *
FROM
anothertable T2
WHERE
tbl.key = T2.key /* AND T2 condition*/)
-- or
SELECT TOP 1 --or count(*)
tbl.Id
FROM
mytable tbl
WHERE
tbl.xxx = ...
AND
EXISTS (SELECT *
FROM
anothertable T2
WHERE
tbl.key = T2.key /* AND T2 condition*/)
AND
EXISTS (SELECT *
FROM
yetanothertable T3
WHERE
tbl.key = T3.key /* AND T3 condition*/)
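The EXISTS rewrite above can be wrapped as a yes/no check. The following is a minimal sketch on SQLite via Python's sqlite3 module; the column name `key_id` is a stand-in for the `key` placeholder in the answer, and the data is invented. It only shows the shape of the query and says nothing about how SQL Server would perform on 10 million rows.

```python
import sqlite3


def rows_exist(conn, xxx):
    """Return True if a mytable row with this xxx has matches in both child tables."""
    row = conn.execute(
        """
        SELECT EXISTS (
            SELECT 1
            FROM mytable tbl
            WHERE tbl.xxx = ?
              AND EXISTS (SELECT 1 FROM anothertable t2 WHERE t2.key_id = tbl.key_id)
              AND EXISTS (SELECT 1 FROM yetanothertable t3 WHERE t3.key_id = tbl.key_id)
        )
        """,
        (xxx,),
    ).fetchone()
    return bool(row[0])


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable         (Id INTEGER PRIMARY KEY, key_id INTEGER, xxx INTEGER);
    CREATE TABLE anothertable    (key_id INTEGER);
    CREATE TABLE yetanothertable (key_id INTEGER);
    INSERT INTO mytable VALUES (1, 10, 100), (2, 20, 200);
    INSERT INTO anothertable    VALUES (10), (20);
    INSERT INTO yetanothertable VALUES (10);  -- key 20 has no row here
    """
)

found_100 = rows_exist(conn, 100)
found_200 = rows_exist(conn, 200)
```

The query returns a single 0 or 1, so the engine is free to stop at the first qualifying row rather than materialising the joined result.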
Doing a filter early on your first select will help if you can do it; as you filter the data in the first instance all the joins will join on reduced data.
Select top 1 tbl1.id
From
(
Select top 1 * from
table tbl1
Where Key = Key
) tbl1
inner join ...
After that you will likely need to provide more of the query to understand how it works.
Maybe you could offload/cache this fact-finding mission. If it doesn't need to be done dynamically or at runtime, just cache the result into a much smaller table and then query that. Also, make sure all the tables you're querying have the appropriate clustered index. Granted, you may be using these tables for other types of queries, but for the absolute fastest way to go, you can tune all your clustered indexes for this one query.
Edit: Yes, what other people said. Measure, measure, measure! Your query plan estimate can show you what your bottleneck is.
Put the table with the most rows first in every join. If the WHERE clause has more than one condition, the order of the conditions is important: put first the condition that gives you the most rows.
Use filters very carefully when optimizing a query.