Optimize query that compares two tables with similar schema in different databases - sql

I have two different tables with similar schema in different database. What is the best way to compare records between these two tables. I need to find out-
records that exists in first table whose corresponding record does not exist in second table filtering records from the first table with some where clauses.
So far I have come with this SQL construct:
Select t1_col1, t1_ col2 from table1
where t1_col1=<condition> AND
t1_col2=<> AND
NOT EXISTS
(SELECT * FROM
table2
WHERE
t1_col1=t2_col1 AND
t1_col2=t2_col2)
Is there a better way to do this?
This above query seems fine but I suspect it is doing row by row comparison without evaluating the conditions in the first part of the query because the first part of the query will reduce the resultset very much. Is this happening?

Just use except keyword!!!
Select t1_col1, t1_ col2 from table1
where t1_col1=<condition> AND
t1_col2=<condition>
except
SELECT t2_col1, t2_ col2 FROM table2
It returns any distinct values from the query to the left of the EXCEPT operand that are not also returned from the right query.
For more information on MSDN

If the data in both table are expected to have the same primary key, you can use IN keyword to filter those are not found in the other table. This could be the simplest way.
If you are open to third party tools like Redgate Data Compare you can try it, it's a very nice tool. Visual Studio 2010 Ultimate edition also have this feature.

Related

Trying to use cursor on one database using select from another db

So I'm trying to wrap my head around cursors. I have task to transfer data from one database to another, but they have slightly diffrent schemas. Let's say I have TableOne (Id, Name, Gold) and TableTwo (Id, Name, Lvl). I want to take all records from TableTwo and insert it into TableOne, but it can be duplicated data on Name column. So if single record from TableTwo exist (on Name column comparison) in TableOne, I want to skip it, if don't - create record in TableOne with unique Id.
I was thinking about looping on each record in TableTwo, and for every record check if it's exist in TableOne. So, how do I make this check without making call to another database every time? I wanted first select all record from TableOne, save it into variable and in loop itself make check against this variable. Is this even possible in SQL? I'm not so familiar with SQL, some code sample would help a lot.
I'm using Microsoft SQL Server Management Studio if that matters. And of course, TableOne and TableTwo exists in diffrent databases.
Try this
Insert into table1(id,name,gold)
Select id,name,lvl from table2
Where table2.name not in(select t1.name from table1 t1)
If you want to add newId for every row you can try
Insert into table1(id,name,gold)
Select (select max(m.id) from table1 m) + row_number() over (order by t2.id) ,name,lvl from table2 t2
Where t2.name not in(select t1.name from table1 t1)
It is possible yes, but I would not recommend it. Looping (which is essentially what a cursor does) is usually not advisable in SQL when a set-based operation will do.
At a high level, you probably want to join the two tables together (the fact that they're in different databases shouldn't make a difference). You mention one table has duplicates. You can eliminate those in a number of ways such as using a group by or a row_number. Both approaches will require you understanding which rows you want to "pick" and which ones you want to "ignore". You could also do what another user posted in a comment where you do an existence check against the target table using a correlated subquery. That will essentially mean that if any rows exist in the target table that have duplicates you're trying to insert, none of those duplicates will be put in.
As far as cursors are concerned, to do something like this, you'd be doing essentially the same thing, except on each pass of the cursor you would be temporarily assigning and using variables instead of columns. This approach is sometimes called RBAR (for "Rob by Agonizing Row"). On every pass of the cursor or loop, it has to re-open the table, figure out what data it needs, then operate on it. Even if that's efficient and it's only pulling back one row, there's still lots of overhead to doing that query. So while, yes, you can force SQL to do what you've describe, the database engine already has an operation for this (joins) which does it far faster than any loop you could conceivably write

Difference between two tables, unknown fields

Is there a way in Access using SQL to get the difference between 2 tables?
I'm building an audit function and I want to return all records from table1 where a value (or values) doesn't match the corresponding record in table2. Primary keys will always match between the two tables. They will always contain the exact same number of fields, field names, and types, as each other. However, the number and name of those fields cannot be determined before the query is run.
Please also note, I am looking for an Access SQL solution. I know how to solve this with VBA.
Thanks,
There are several possibilities to compare fields with known names, but there is no way in SQL to access fields without knowing their name. Mostly becase SQL doesn't consider fields to have a specific order in a table.
So the only way to accomplish what you need in pure Access-SQL would be, if there was a SQL-Command for it (kind of like the * as placeholder for all fields). But there isn't. Microsoft Access SQL Reference.
What you COULD do is create an SQL-clause on the fly in VBA. (I know, you said you didn't want to do it in VBA - but this is doing it in SQL, but using VBA to create the SQL..).
Doing everything in VBA would probably take some time, but creating an SQL on the fly is very fast and you can optimize it to the specific table. Then executing the SQL is the fastest solution you can get.
Not sure without your table structure but you can probably get that done using NOT IN operator (OR) using WHERE NOT EXISTS like
select * from table1
where some_field not in (select some_other_field from table2);
(OR)
select * from table1 t1
where not exists (select 1 from table2 where some_other_field = t1.some_field);
SELECT A.*, B.* FROM A FULL JOIN B ON (A.C = B.C) WHERE A.C IS NULL OR B.C IS NULL;
IF you have tables A and B, both with colum C, here are the records, which are present in table A but not in B.To get all the differences with a single query, a full join must be used,like above

Using EXCEPT clause in PostgreSQL

I am trying to use the EXCEPT clause to retrieve data from table. I want to get all the rows from table1 except the one's that exist in table2.
As far I understand, the following would not work:
CREATE TABLE table1(pk_id int, fk_id_tbl2 int);
CREATE TABLE table2(pk_id int);
Select fk_id_tbl2
FROM table1
Except
Select pk_id
FROM table2
The only way I can use EXCEPT seems to be to select from the same tables or select columns that have the same column name from different tables.
Can someone please explain how best to use the explain clause?
Your query seems perfectly valid:
SELECT fk_id_tbl2 AS some_name
FROM table1
EXCEPT -- you may want to use EXCEPT ALL
SELECT pk_id
FROM table2;
Column names are irrelevant to the query. Only data types must match. The output column name of your query is fk_id_tbl2, just because it's the column name in the first SELECT. You can use any alias.
What's often overlooked: the subtle differences between EXCEPT (which folds duplicates) and EXCEPT ALL - which keeps all individual unmatched rows.
More explanation and other ways to do the same, some of them much more flexible:
Select rows which are not present in other table
Details for EXCEPT in the manual.

Optimizing an Oracle SQL query which uses IN clause extensively

I maintain an application where I am trying to optimize an Oracle SQL query wherein multiple IN clauses are used. This query is now a blocker as it hogs nearly 3 minutes of execution time and affects application performance severely.The query is called from Java code(JDBC) and looks like this :
Select disctinct col1,col2,col3,.. colN from Table1
where 1=1 and not(col1 in (idsetone1,idsetone2,... idsetoneN)) or
(col1 in(idsettwo1,idsettwo2,...idsettwoN))....
(col1 in(idsetN1,idsetN2,...idsetNN))
The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible. ID sets have grown over time with use of the application and currently they number more than 10,000 records.
How can I start with optimizing this query ?
I really doupt about "The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible." Of course you can join the tables, provided you got select privileges on it.
Anyway, let's assume it is not possible due to whatever reason. One solution could be to insert all entries first into a Nested Table and the use this one:
CREATE OR REPLACE TYPE NUMBER_TABLE_TYPE AS TABLE OF NUMBER;
Select disctinct col1,col2,col3,.. colN from Table1
where 1=1
and not (col1 NOT MEMBER OF (NUMBER_TABLE_TYPE(idsetone1,idsetone2,... idsetoneN))
OR
(col1 MEMBER OF NUMBER_TABLE_TYPE(idsettwo1,idsettwo2,...idsettwoN))
Regarding the max. number of elements Oracle Documentation says: Because a nested table does not have a declared size, you can put as many elements in the constructor as necessary.
I don't know how serious you can take this statement.
You should put all the items into one temporary table and to an explicit join:
Select your cols
from Table1
left join table_with_items
on table_with_items.id = Table1.col1
where table_with_items.id is null;
Also that distinct suggest a problem in your business logic or in the architecture of application. Why do you have duplicate ids? You should get rid of that distinct.

How to put more than 1million ID's using union All [duplicate]

I have comma delimited id's that I want to use in NOT IN clause..
I'm using oracle 11g.
select * from table where ID NOT IN (1,2,3,4,...,1001,1002,...)
results in
ORA-01795: maximum number of expressions in a list is 1000
I don't want to use temp table. am trying considering doing this
select * from table1 where ID NOT IN (1,2,3,4,…,1000) AND
ID NOT IN (1001,1002,…,2000)
Is there any other better workaround to this issue?
You said you don't want to, but: use a temporary table. That's the correct solution here.
Query parsing is expensive in Oracle, and that's what you'll get when you put thousands of identifiers into a giant blob of SQL. Also, there are ill-defined limits on query length that you're going to hit. Doing an anti-JOIN against a table, on the other hand... Oracle is good at that. Bulk loading data into a table, Oracle is good at that too. Use a temp table.
Limiting IN to a thousand entries is a sanity check. The fact that you're hitting it means you're trying to do something insane.
Jump out of the question, can you combine the SQL to get more than 1000 IDs with this SQL. That's the better way to simplify your SQLs.
It's insane.
But you can probably try to select from select:
SELECT * FROM
(SELECT * FROM table WHERE ID NOT IN (1,2,3,4,...,1000))
WHERE ID NOT IN (1001,1002,…,2000)
Make as many levels as you need.
Use MINUS, the opposite to `UNION
SELECT * FROM TABLE
MINUS
SELECT T.* FROM TABLE T,TABLE2 T2 WHERE T.ID = T2.ID
This represents registers on table T which id not in table2 t2