I have two databases on a local machine, connected to localhost. Each has roughly two million rows. I was running the following very simple join, and it took over a minute to complete.
select distinct x.patid
from [i 3 sci study].dbo.clm_extract as x
left join [i 3 study].dbo.claims as y on y.patid=x.patid
where y.patid is null
When I looked at the execution plan I saw that the join showplan operator had this to say
Why is the actual number of rows so exorbitantly high compared to the actual number of rows in both tables?
The LEFT JOIN matches each row on the left with every row on the right that has the same patid, and only then applies the WHERE filter. Assuming patid is not unique in either table, the number of matched combinations can get very high.
Try the following:
SET NOCOUNT ON;
GO
CREATE TABLE #t1 (Id INT NOT NULL);
CREATE TABLE #t2 (Id INT NOT NULL);
GO
INSERT #t1 (Id)
VALUES (1);
GO 100
INSERT #t2 (Id)
SELECT Id FROM #t1;
GO
Now look at the execution plan for the left join query form:
SELECT *
FROM #t1
LEFT OUTER JOIN #t2 ON #t1.Id = #t2.Id
WHERE #t2.Id IS NULL;
Looking at the execution plan, the hash join shows 10,000 actual rows (100 from #t1 x 100 from #t2). This shows the advantage of checking for existence (or a lack thereof) using any of the following T-SQL syntaxes:
SELECT #t1.Id
FROM #t1
WHERE NOT EXISTS (SELECT * FROM #t2 WHERE Id = #t1.Id);
-- #t2.Id must not contain any NULLs for this to be correct
SELECT #t1.Id
FROM #t1
WHERE Id NOT IN (SELECT #t2.Id FROM #t2);
-- Returns DISTINCT #t1 values
SELECT Id
FROM #t1
EXCEPT
SELECT Id
FROM #t2;
Checking for a lack of existence enables the engine to short circuit. This is due to the anti semi join. As soon as the first match is found, it moves on to the next record. For more details, see this blog post.
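To make the difference concrete, here is a minimal sketch that reruns the demo through Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server here, the tables t1 and t2 mirror #t1 and #t2, and the extra row with Id 2 is an assumption added so that the anti join has something to return. It counts the rows the outer join produces before the filter and checks that the LEFT JOIN / IS NULL and NOT EXISTS forms agree.

```python
import sqlite3

# SQLite stands in for SQL Server; t1/t2 mirror the #t1/#t2 temp tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (Id INTEGER NOT NULL)")
conn.execute("CREATE TABLE t2 (Id INTEGER NOT NULL)")
# 100 duplicate rows in each table, plus one unmatched row (Id 2, an assumption).
conn.executemany("INSERT INTO t1 (Id) VALUES (?)", [(1,)] * 100 + [(2,)])
conn.executemany("INSERT INTO t2 (Id) VALUES (?)", [(1,)] * 100)

# Rows the outer join produces before the WHERE filter is applied.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM t1 LEFT OUTER JOIN t2 ON t1.Id = t2.Id"
).fetchone()[0]

# The same anti join, written two ways.
left_join_result = conn.execute(
    "SELECT t1.Id FROM t1 LEFT OUTER JOIN t2 ON t1.Id = t2.Id "
    "WHERE t2.Id IS NULL"
).fetchall()
not_exists_result = conn.execute(
    "SELECT t1.Id FROM t1 "
    "WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.Id = t1.Id)"
).fetchall()

print(joined_rows)  # 100 * 100 matched pairs plus 1 unmatched row
print(left_join_result, not_exists_result)
```

This only shows the row counts and that the two forms return the same answer. Whether the optimizer actually picks an anti semi join is engine-specific, so check the real execution plan on your own server.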
This question already has an answer here:
What's different between INTERSECT and JOIN?
(1 answer)
Closed 4 years ago.
I understand that INNER JOIN is made for referenced keys and INTERSECT is not. But as far as I know, in some cases both of them can do the same thing. So, is there a difference (in performance or anything else) between the following two expressions? And if there is, which one is better?
Expression 1:
SELECT id FROM customers
INNER JOIN orders ON customers.id = orders.customerID;
Expression 2:
SELECT id FROM customers
INTERSECT
SELECT customerID FROM orders
They are very different, even in your case.
The INNER JOIN will return duplicates if id is duplicated in either table; INTERSECT removes duplicates. The INNER JOIN will never return a row whose id is NULL, because NULL = NULL is not true, but INTERSECT treats NULLs as equal and can return NULL.
The two are very different. INNER JOIN is an operator that generally matches on a limited set of columns and can return zero or more rows from either table. INTERSECT is a set-based operator that compares complete rows between two sets and can never return more rows than there are in the smaller table.
Try the following, for example:
CREATE TABLE #a (id INT)
CREATE TABLE #b (id INT)
INSERT INTO #a VALUES (1), (NULL), (2)
INSERT INTO #b VALUES (1), (NULL), (3), (1)
SELECT a.id FROM #a a
INNER JOIN #b b ON a.id = b.id
SELECT id FROM #a
INTERSECT
SELECT id FROM #b
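If you don't have a server handy, the same comparison can be sketched with Python's built-in sqlite3 module. SQLite is only a stand-in here, and the tables a and b mirror #a and #b. As far as I know SQLite also treats NULLs as equal for INTERSECT, so the outcome should match the description above.

```python
import sqlite3

# SQLite stands in for SQL Server; a/b mirror the #a/#b temp tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (None,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(1,), (None,), (3,), (1,)])

# The join keeps duplicates and never matches NULL to NULL.
join_rows = conn.execute(
    "SELECT a.id FROM a INNER JOIN b ON a.id = b.id"
).fetchall()

# INTERSECT removes duplicates and treats NULLs as equal.
intersect_rows = conn.execute(
    "SELECT id FROM a INTERSECT SELECT id FROM b"
).fetchall()

print(join_rows)       # id 1 appears twice because b holds it twice
print(intersect_rows)  # one row for NULL and one row for 1
```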
I have a table A with around 50,000 records and a table B with 50,000 records as well.
sample data:
A B
1 1
2 2
3 null
4 null
I want to find records 3, 4 which are present in Table A but not in Table B.
I am using
select id from A where id NOT IN(select id from B)
I have also tried NOT EXISTS, but as the records are very large in number, it still takes a lot of time.
select id from A where NOT EXISTS (select id from B where B.id = A.id)
Left Outer Join can't be used to find the missing records, as the id is not present in Table B.
Is there any way to make the query work faster in Sybase itself?
Or is shifting the database to MongoDB the solution?
I'm not sure why you think LEFT JOIN can't be used. I tried it with a LEFT JOIN and it returns your expected result.
Sample execution with the given data:
DECLARE @TableA TABLE (Id INT);
DECLARE @TableB TABLE (Id INT);
INSERT INTO @TableA (Id) VALUES (1), (2), (3), (4);
INSERT INTO @TableB (Id) VALUES (1), (2), (NULL), (NULL);
SELECT T1.Id
FROM @TableA T1
LEFT JOIN @TableB T2 ON T2.Id = T1.Id
WHERE T2.Id IS NULL;
Result
3
4
From a performance perspective, try to avoid negated predicates such as NOT IN and NOT EXISTS where you can, because to evaluate the negation the DBMS may need to run through all the available records and discard the matching ones.
LEFT JOIN / IS NULL and NOT EXISTS are semantically equivalent, while NOT IN is not. The methods differ in how they handle NULL values in the right-hand table.
Therefore, you should try LEFT JOIN to improve your SQL performance.
select A.id from A LEFT JOIN B
on A.id = B.id
where B.id is null
order by A.id;
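The NULL handling is easy to see on the sample data from the question. Below is a minimal sketch using Python's built-in sqlite3 module, with SQLite as a stand-in for Sybase (an assumption; the NULL rules for NOT IN are standard SQL, but verify on your own server). Because B contains NULLs, NOT IN returns nothing at all, while NOT EXISTS and LEFT JOIN / IS NULL both return 3 and 4.

```python
import sqlite3

# SQLite stands in for Sybase; A and B hold the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER)")
conn.execute("CREATE TABLE B (id INTEGER)")
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO B VALUES (?)", [(1,), (2,), (None,), (None,)])

# NOT IN: a NULL in the subquery makes the predicate unknown for every
# unmatched row, so no rows qualify.
not_in = conn.execute(
    "SELECT id FROM A WHERE id NOT IN (SELECT id FROM B) ORDER BY id"
).fetchall()

# NOT EXISTS: NULLs in B simply never match.
not_exists = conn.execute(
    "SELECT id FROM A WHERE NOT EXISTS "
    "(SELECT 1 FROM B WHERE B.id = A.id) ORDER BY id"
).fetchall()

# LEFT JOIN / IS NULL: same result as NOT EXISTS.
left_join = conn.execute(
    "SELECT A.id FROM A LEFT JOIN B ON A.id = B.id "
    "WHERE B.id IS NULL ORDER BY A.id"
).fetchall()

print(not_in)      # empty
print(not_exists)  # 3 and 4
print(left_join)   # 3 and 4
```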
I have a scenario whereby I have 3 tables (Table1, Table2, Table3)
Table1 contains data whereby each MEMBNO is unique
I would like to JOIN to Table2 and Table3 to display results but only have one row for each result
I tried
SELECT A.MEMBNO,A.FIELD1,B.FIELD1,B.FIELD2,C.FIELD1
FROM Table1 A
INNER join Table2 B ON A.MEMBNO = B.MEMBNO
INNER join Table3 C ON A.MEMBNO = C.MEMBNO
but I get multiple results. If the MEMBNO is in Table2 twice and Table3 four times, I get 8 rows returned.
Is my JOIN correct, or is the only way to control what is returned from Table2 and Table3 through the WHERE clause after the JOIN (i.e. does SQL "dumb" join all the data and expect the WHERE clause to be the filter)?
Many thanks
What you are fighting with is the different relationships between the data. Table1 is the primary key table, with one row per MEMBNO. Table2 and Table3 have more than one row for each MEMBNO, and that difference in cardinality is what duplicates rows when the joins happen. So before you attempt the joins, think about what data you actually want to see. If you want the data in Table2 and Table3 squished into a single row, think about how that should look. For example, do you want to sum the numbers from the different rows into a total? Do you want to take the maximum date?
Best thing to do is give some data examples from each table and give an example result. More than happy to have a go if you add that info.
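In the meantime, here is a rough sketch of the "squish before you join" idea, run through Python's built-in sqlite3 module as a stand-in for your database. The sample values and the choice of SUM and MAX are purely hypothetical, since I don't know what your fields hold; the point is only that each detail table is collapsed to one row per MEMBNO before it is joined.

```python
import sqlite3

# SQLite stands in for the real database; all sample values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (MEMBNO INTEGER PRIMARY KEY, FIELD1 TEXT)")
conn.execute("CREATE TABLE Table2 (MEMBNO INTEGER, FIELD1 INTEGER, FIELD2 INTEGER)")
conn.execute("CREATE TABLE Table3 (MEMBNO INTEGER, FIELD1 INTEGER)")
conn.execute("INSERT INTO Table1 VALUES (1, 'A')")
conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?)", [(1, 10, 5), (1, 20, 7)])
conn.executemany(
    "INSERT INTO Table3 VALUES (?, ?)", [(1, 100), (1, 200), (1, 300), (1, 400)]
)

# Plain joins: 2 rows in Table2 x 4 rows in Table3 = 8 rows for the member.
naive_count = conn.execute(
    "SELECT COUNT(*) FROM Table1 A "
    "INNER JOIN Table2 B ON A.MEMBNO = B.MEMBNO "
    "INNER JOIN Table3 C ON A.MEMBNO = C.MEMBNO"
).fetchone()[0]

# Collapse each detail table to one row per MEMBNO first, then join.
one_row_per_member = conn.execute(
    """
    SELECT A.MEMBNO, A.FIELD1, B.total_field1, B.max_field2, C.max_field1
    FROM Table1 A
    INNER JOIN (SELECT MEMBNO,
                       SUM(FIELD1) AS total_field1,
                       MAX(FIELD2) AS max_field2
                FROM Table2 GROUP BY MEMBNO) B ON A.MEMBNO = B.MEMBNO
    INNER JOIN (SELECT MEMBNO, MAX(FIELD1) AS max_field1
                FROM Table3 GROUP BY MEMBNO) C ON A.MEMBNO = C.MEMBNO
    """
).fetchall()

print(naive_count)
print(one_row_per_member)
```

Swap the aggregates for whatever actually makes sense for your columns.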
As I am concerned only with MEMBNO, what if I use the distinct MEMBNO values from Table2 and Table3?
Check the example below:
create table #t1
(
F1 int,
F2 int
)
Insert into #t1 values(1, 111)
Create table #t2
(
F1 int,
F2 int
)
Insert into #t2 values(1, 111)
Insert into #t2 values(1, 222)
Create table #t3
(
F1 int,
F2 int
)
Insert into #t3 values(1, 333)
Insert into #t3 values(1, 444)
SELECT a.*
FROM #t1 a
LEFT JOIN (SELECT DISTINCT F1 FROM #t2) b ON a.F1 = b.F1
LEFT JOIN (SELECT DISTINCT F1 FROM #t3) c ON a.F1 = c.F1
Here #t1, #t2 and #t3 stand for Table1, Table2 and Table3 respectively, and F1 is your MEMBNO in all the tables.
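You get multiple results because the inner join returns one row for every matching combination.
Note that a plain left or right join multiplies rows in the same way; it only gives one row per MEMBNO when the joined tables are first reduced to distinct values.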
EDIT: I wrote this a bit wrong, I will change my question.
I'm a newbie with SQL and I have a question.
I made 2 temp tables.
Each has 25 rows (DateValue).
I want to combine these 2 tables into a third table.
The first table is [From].
The second table is [To].
Both tables have different values.
I want to get it like this:
From| To |
1111|2222
2222|3333
3333|4444
etc..
I use this simple Query
Create Table #T3
(
[From] Datetime
,[To] Datetime
)
INSERT Into #T3
SELECT Distinct #T1.[From], #T2.[To]
From #T1,#T2
Where #T1.[From] is not null
And #T2.[To] is not null
Select * from #T3
Drop Table #T3
Drop Table #T2
Drop Table #T1
But my results are like this
From| To |
1111|1111
1111|2222
1111|3333
2222|1111
2222|2222
2222|3333
It multiplies the first field with the second, which gives me a lot more records back.
Any help ?
THANKS !
After the OP's edit
This may work as you want (which is not entirely clear):
INSERT INTO #T3
SELECT #T1.[From]
, MIN(#T2.[To])
FROM #T1
JOIN #T2
ON #T1.[From] < #T2.[To]
GROUP BY #T1.[From]
Using
FROM T1, T2
results in all combinations of rows of T1 and T2. This is called a cross product and is properly written with CROSS JOIN, like this:
FROM T1 CROSS JOIN T2
When you want to join the two tables based on a condition (and not get the cross product), you use a JOIN or INNER JOIN (these two are the same thing):
FROM T1 JOIN T2
ON T1.[From] = T2.[To]
will get you all row combinations where T1.From matches T2.To (on equality). I suppose you wanted to match every row of T1 with the row of T2 whose T2.To is the next value larger than T1.From, so I used the "smaller than" < operator instead of the "equality" = operator.
The GROUP BY and MIN() were added to get only the one with smallest T2.To from those rows.
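Here is a small sketch of the difference, run through Python's built-in sqlite3 module as a stand-in for SQL Server. The tables t1 and t2 mirror #T1 and #T2, and the integer values are taken from the sample in the question rather than real Datetime values, so treat the data as an assumption.

```python
import sqlite3

# SQLite stands in for SQL Server; t1/t2 mirror #T1/#T2 with integer values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 ([From] INTEGER)")
conn.execute("CREATE TABLE t2 ([To] INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1111,), (2222,), (3333,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2222,), (3333,), (4444,)])

# FROM t1, t2 (or CROSS JOIN) pairs every row with every row: 3 x 3 = 9.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM t1 CROSS JOIN t2"
).fetchone()[0]

# Join on "<" and keep only the smallest [To] for each [From].
pairs = conn.execute(
    "SELECT t1.[From], MIN(t2.[To]) "
    "FROM t1 JOIN t2 ON t1.[From] < t2.[To] "
    "GROUP BY t1.[From] ORDER BY t1.[From]"
).fetchall()

print(cross_count)
print(pairs)  # each From paired with the next larger To
```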
It would do that. It inserts a copy of table 2 for each row of table 1, because you didn't say how the rows should be matched to extract what you want.
Now, assuming From and To hold the same values, you can do:
INSERT Into #T3
SELECT Distinct #T1.[From], #T2.[To]
From #T1 left join #T2 on #T1.[From]=#T2.[To]
Where #T1.[From] is not null
If this isn't what you mean (although having the same value in both columns would seem counterproductive), what other fields have you got, and how would you tie the rows together?