SQL - Two DISTINCTs performing very poorly

I've got two tables that each contain a column with the same name. I'm trying to find out which distinct values exist in Table1 but don't exist in Table2. For that I have two SELECTs:
SELECT DISTINCT Field
FROM Table1
SELECT DISTINCT Field
FROM Table2
Both SELECTs finish within 2 seconds and return about 10 rows each. But if I restructure the query to find the values that are missing from Table2, it takes several minutes to finish:
SELECT DISTINCT Field
FROM Table1
WHERE Field NOT IN
(
SELECT DISTINCT Field
FROM Table2
)
My temporary workaround is inserting the results of the second DISTINCT into a temporary table and comparing against that, but the performance still isn't great.
Does anyone know why this happens? My guess is that SQL Server keeps recalculating the second DISTINCT, but why would it? Shouldn't SQL Server optimize this somehow?

Not sure if this will improve performance, but I'd use EXCEPT:
SELECT Field
FROM Table1
EXCEPT
SELECT Field
FROM Table2
There is no need to use DISTINCT because EXCEPT is a set operator that removes duplicates.
EXCEPT returns distinct rows from the left input query that aren't output by the right input query.
The number and the order of the columns must be the same in all queries.
The data types must be compatible.
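A small runnable sketch of the EXCEPT rewrite, using SQLite from Python (the table and column names follow the question; the data is made up for illustration):

```python
import sqlite3

# Illustrative data: Table1 holds 'a' twice plus 'b' and 'c';
# Table2 holds 'b', 'c' and 'd'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Field TEXT);
CREATE TABLE Table2 (Field TEXT);
INSERT INTO Table1 VALUES ('a'), ('a'), ('b'), ('c');
INSERT INTO Table2 VALUES ('b'), ('c'), ('d');
""")

# EXCEPT already de-duplicates, so no DISTINCT is needed:
rows = conn.execute("""
    SELECT Field FROM Table1
    EXCEPT
    SELECT Field FROM Table2
""").fetchall()
print(rows)  # [('a',)] -- one row despite the duplicate 'a' in Table1
```

The duplicate 'a' collapses to a single output row, which is why the explicit DISTINCT can be dropped.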

Related

MSSQL - Question about how insert queries run

We have two tables we want to merge, say table1 and table2.
They have exactly the same columns and exactly the same purpose; the difference is that table2 has newer data.
We used a query with a LEFT JOIN to find the rows that are common between them and skip those rows while merging. The problem is this: both tables have 500M rows.
When we ran the query, it kept going on and on. For an hour it just kept running. We were certain this was because of the large number of rows.
But when we wanted to see how many rows had already been inserted into table2, we ran select count(*) from table2, and it gave us the exact same row count as when we started.
Our question is: is that how it's supposed to work? Do the rows get inserted all at once after all the matches have been found?
If you would like to read uncommitted data, then the count should be modified, like this:
select count(*) from table2 WITH (NOLOCK)
NOLOCK is over-used, but in this specific scenario, it might be handy.
Data is not inserted or updated row by row.
I have no idea how this relates to "select count(*) from table2 WITH (NOLOCK)".
The join is taking too long to produce the result set that feeds the insert operator, so no insert is happening because no result set is being produced.
The join query is taking too long because the LEFT JOIN condition produces a very, very high cardinality estimate,
so the join condition has to be fixed first.
For that, more information is needed: the table schemas, data types and lengths, existing indexes, and the requirements.

Count rows with column varbinary NOT NULL takes a lot of time

This query
SELECT COUNT(*)
FROM Table
WHERE [Column] IS NOT NULL
takes a lot of time. The table has 5000 rows, and the column is of type VARBINARY(MAX).
What can I do?
Your query needs to do a table scan on a column that can potentially be very large without any way to index it. There isn't much you can do to fix this without changing your approach.
One option is to split the table into two tables. The first table could have all the details you have now in it and the second table would have just the file. You can make this a 1-1 table to ensure data is not duplicated.
You would only add the binary data as needed into the second table. If it is not needed anymore, you simply delete the record. This will allow you to simply write a JOIN query to get the information you are looking for.
SELECT
COUNT(*)
FROM dbo.Table1
INNER JOIN dbo.Table2
ON Table1.Id = Table2.Id
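A hypothetical sketch of this split-table design in SQLite (the table names Details and Files and all data are my own, for illustration):

```python
import sqlite3

# Details holds the row metadata; Files is a 1-1 side table that only
# gets a row when a binary payload actually exists.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Details (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Files   (Id INTEGER PRIMARY KEY REFERENCES Details(Id),
                      Payload BLOB NOT NULL);
INSERT INTO Details VALUES (1, 'has file'), (2, 'no file'), (3, 'has file');
INSERT INTO Files   VALUES (1, x'DEADBEEF'), (3, x'CAFE');
""")

# Counting rows that have a payload is now a join on two integer
# primary keys; the blob column is never scanned:
(count,) = conn.execute("""
    SELECT COUNT(*)
    FROM Details INNER JOIN Files ON Details.Id = Files.Id
""").fetchone()
print(count)  # 2
```

Deleting the Files row when the payload is no longer needed keeps the count query cheap without touching Details.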

Using EXCEPT clause in PostgreSQL

I am trying to use the EXCEPT clause to retrieve data from a table. I want to get all the rows from table1 except the ones that exist in table2.
As far as I understand, the following would not work:
CREATE TABLE table1(pk_id int, fk_id_tbl2 int);
CREATE TABLE table2(pk_id int);
Select fk_id_tbl2
FROM table1
Except
Select pk_id
FROM table2
The only way I can use EXCEPT seems to be to select from the same tables or select columns that have the same column name from different tables.
Can someone please explain how best to use the EXCEPT clause?
Your query seems perfectly valid:
SELECT fk_id_tbl2 AS some_name
FROM table1
EXCEPT -- you may want to use EXCEPT ALL
SELECT pk_id
FROM table2;
Column names are irrelevant to the query. Only data types must match. The output column name of your query is fk_id_tbl2, just because it's the column name in the first SELECT. You can use any alias.
What's often overlooked: the subtle differences between EXCEPT (which folds duplicates) and EXCEPT ALL - which keeps all individual unmatched rows.
More explanation and other ways to do the same, some of them much more flexible:
Select rows which are not present in other table
Details for EXCEPT in the manual.

How to put more than 1 million IDs using UNION ALL [duplicate]

I have comma-delimited IDs that I want to use in a NOT IN clause.
I'm using Oracle 11g.
select * from table where ID NOT IN (1,2,3,4,...,1001,1002,...)
results in
ORA-01795: maximum number of expressions in a list is 1000
I don't want to use a temp table. I am considering doing this:
select * from table1 where ID NOT IN (1,2,3,4,...,1000) AND
ID NOT IN (1001,1002,...,2000)
Is there any other better workaround to this issue?
You said you don't want to, but: use a temporary table. That's the correct solution here.
Query parsing is expensive in Oracle, and that's what you'll get when you put thousands of identifiers into a giant blob of SQL. Also, there are ill-defined limits on query length that you're going to hit. Doing an anti-JOIN against a table, on the other hand... Oracle is good at that. Bulk loading data into a table, Oracle is good at that too. Use a temp table.
Limiting IN to a thousand entries is a sanity check. The fact that you're hitting it means you're trying to do something insane.
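A sketch of the temp-table anti-join this answer recommends, with SQLite standing in for Oracle (table names and data are illustrative):

```python
import sqlite3

# Imagine thousands of IDs to exclude; three suffice for the sketch.
exclude = [2, 4, 5]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3), (4), (5), (6);
CREATE TEMP TABLE exclude_ids (id INTEGER PRIMARY KEY);
""")
# Bulk-load the exclusion list instead of inlining it into the SQL text.
conn.executemany("INSERT INTO exclude_ids VALUES (?)",
                 [(i,) for i in exclude])

# Anti-join: keep only rows with no match in the exclusion table.
rows = [r[0] for r in conn.execute("""
    SELECT t.id FROM t
    WHERE NOT EXISTS (SELECT 1 FROM exclude_ids e WHERE e.id = t.id)
    ORDER BY t.id
""")]
print(rows)  # [1, 3, 6]
```

The query text stays tiny no matter how many IDs are excluded, so parsing cost is constant and the database can pick a join strategy.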
Stepping back from the question: can you combine this query with the SQL that produces those 1000+ IDs in the first place? That's a better way to simplify your SQL.
It's insane.
But you can probably try selecting from a subselect:
SELECT * FROM
(SELECT * FROM table WHERE ID NOT IN (1,2,3,4,...,1000))
WHERE ID NOT IN (1001,1002,...,2000)
Make as many levels as you need.
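A sketch of this chunking idea (whether written as nested selects or as ANDed predicates, the effect is the same): split the list into groups of at most 1000, since the ORA-01795 limit applies per IN list. SQLite stands in for Oracle here, and the table name and IDs are illustrative:

```python
import sqlite3

ids = list(range(1, 2501))                       # 2500 IDs to exclude
chunks = [ids[i:i + 1000] for i in range(0, len(ids), 1000)]

# One NOT IN predicate per chunk of <= 1000 literals, ANDed together.
predicate = " AND ".join(
    "ID NOT IN (%s)" % ",".join(str(i) for i in chunk) for chunk in chunks
)
sql = "SELECT ID FROM t WHERE " + predicate + " ORDER BY ID"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 2506)])

rows = [r[0] for r in conn.execute(sql)]
print(rows)  # [2501, 2502, 2503, 2504, 2505]
```

This works, but as the accepted answer notes, parsing a query with thousands of inlined literals is expensive; the temp-table anti-join scales better.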
Use MINUS, the opposite of UNION:
SELECT * FROM TABLE
MINUS
SELECT T.* FROM TABLE T, TABLE2 T2 WHERE T.ID = T2.ID
This returns the rows of TABLE T whose ID does not appear in TABLE2 T2.

Optimize query that compares two tables with similar schema in different databases

I have two different tables with a similar schema in different databases. What is the best way to compare records between these two tables? I need to find:
records that exist in the first table whose corresponding record does not exist in the second table, filtering records from the first table with some WHERE clauses.
So far I have come with this SQL construct:
Select t1_col1, t1_col2 from table1
where t1_col1=<condition> AND
t1_col2=<condition> AND
NOT EXISTS
(SELECT * FROM table2
WHERE t1_col1=t2_col1 AND
t1_col2=t2_col2)
Is there a better way to do this?
The above query seems fine, but I suspect it is doing a row-by-row comparison without first evaluating the conditions in the first part of the query, even though those conditions would reduce the result set considerably. Is this happening?
Just use the EXCEPT keyword!
Select t1_col1, t1_col2 from table1
where t1_col1=<condition> AND
t1_col2=<condition>
EXCEPT
SELECT t2_col1, t2_col2 FROM table2
It returns any distinct values from the query to the left of the EXCEPT operand that are not also returned from the right query.
More information is on MSDN.
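A small runnable sketch of this filtered EXCEPT, using SQLite from Python (the table names, column names, and the concrete WHERE condition are illustrative):

```python
import sqlite3

# table1 and table2 share the two compared columns, as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col1 TEXT, col2 TEXT);
CREATE TABLE table2 (col1 TEXT, col2 TEXT);
INSERT INTO table1 VALUES ('x', 'a'), ('x', 'b'), ('y', 'c');
INSERT INTO table2 VALUES ('x', 'a');
""")

# The WHERE clause filters table1 first; EXCEPT then subtracts the
# (col1, col2) pairs that also appear in table2.
rows = conn.execute("""
    SELECT col1, col2 FROM table1
    WHERE col1 = 'x'
    EXCEPT
    SELECT col1, col2 FROM table2
""").fetchall()
print(rows)  # [('x', 'b')]
```

Of the two filtered rows ('x','a') and ('x','b'), only ('x','b') survives, since ('x','a') also exists in table2.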
If the data in both tables is expected to share the same primary key, you can use the IN keyword to filter out records not found in the other table. This could be the simplest way.
If you are open to third-party tools, Redgate Data Compare is a very nice one. Visual Studio 2010 Ultimate edition also has this feature.