not getting expected result while joining - sql

I want to display the rows of #acc that have no children. Here the result must be B, C, D, E, but I am getting only B, C, D.
create table #acc (mainid int,name nvarchar(20),subid int)
insert into #acc values(1,'A',0)
insert into #acc values(2,'B',1)
insert into #acc values(3,'C',1)
insert into #acc values(4,'D',1)
insert into #acc values(5,'E',0)
select A.name
from #acc A
inner join #acc B on A.subid = B.mainid
drop table #acc

First of all, I think you should rename the subid column to superid or parentid or something like that, because it is B, C & D that are sub-items of A, not the other way round. Maybe the inconsistent naming is exactly the reason why the results of your query seem incomprehensible to you or why you find it difficult to construct a query that returns the correct results.
Your query is essentially returning items that are some other items' children. They themselves may or may not have their own children. For example, if B, C or D had children, your query would return those children in addition to B, C and D. That does not seem exactly what you are after.
What you need here is not an inner join but an anti-join. It is when results are returned based on the fact that something has not matched. Anti-joins can be implemented in different ways:
Using LEFT JOIN + IS NULL check:
SELECT A.*
FROM #acc A
LEFT JOIN #acc B ON A.mainid = B.subid
WHERE B.mainid IS NULL
Here we are joining the table to itself and returning the left side of the join where the right side has had no matches (i.e. returning rows with mainid values which are never found in the subid column).
Using NOT EXISTS:
SELECT *
FROM #acc A
WHERE NOT EXISTS (
SELECT *
FROM #acc B
WHERE A.mainid = B.subid
)
This query can be interpreted thus: return every row from #acc when there doesn't exist a match between that row's mainid and any other row's subid.
Using NOT IN:
SELECT *
FROM #acc
WHERE mainid NOT IN (
SELECT subid
FROM #acc
)
This seems to me most straightforward (though not necessarily most efficient): return rows where mainid is not in the list of all existing subid values. If you used NULLs instead of 0 as root items' subid values, you'd also have to amend the last query by adding this filter to the subquery:
…
WHERE subid IS NOT NULL
Otherwise it would work incorrectly.
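To check the three anti-join variants and the NULL pitfall without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It rests on some assumptions: SQLite stands in for SQL Server, the temp table #acc becomes an ordinary table named acc, and the helper names() is made up for this illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
# SQLite has no #temp tables, so #acc is modelled as a plain table "acc".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acc (mainid INTEGER, name TEXT, subid INTEGER)")
conn.executemany(
    "INSERT INTO acc VALUES (?, ?, ?)",
    [(1, "A", 0), (2, "B", 1), (3, "C", 1), (4, "D", 1), (5, "E", 0)],
)


def names(sql):
    """Hypothetical helper: run a query and return its first column, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


left_join = names(
    "SELECT A.name FROM acc A LEFT JOIN acc B ON A.mainid = B.subid "
    "WHERE B.mainid IS NULL"
)
not_exists = names(
    "SELECT name FROM acc A WHERE NOT EXISTS "
    "(SELECT * FROM acc B WHERE A.mainid = B.subid)"
)
not_in = names("SELECT name FROM acc WHERE mainid NOT IN (SELECT subid FROM acc)")

# Now mark root items with NULL instead of 0 to reproduce the NOT IN pitfall.
conn.execute("UPDATE acc SET subid = NULL WHERE subid = 0")
not_in_with_nulls = names(
    "SELECT name FROM acc WHERE mainid NOT IN (SELECT subid FROM acc)"
)
not_in_filtered = names(
    "SELECT name FROM acc WHERE mainid NOT IN "
    "(SELECT subid FROM acc WHERE subid IS NOT NULL)"
)

print(left_join, not_exists, not_in)       # each is ['B', 'C', 'D', 'E']
print(not_in_with_nulls, not_in_filtered)  # [] and ['B', 'C', 'D', 'E']
```

With NULLs in the subquery result, every NOT IN comparison evaluates to false or unknown, which is why the unfiltered query returns nothing.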
You might also want to read this thread:
What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?

This will do it
select *
from (
select A.mainid, A.name
from #acc A
left join #acc B on A.subid = B.mainid
) as m
where m.mainid not in (select subid from #acc)

Related

LEFT JOIN with OR clause without UNION

I know this shouldn't happen in a database, but it happened and we have to deal with it. We need to insert new rows into a table if they don't already exist, based on the values in another table. This is easy enough (just do a LEFT JOIN and check for NULL values in the 1st table). But the join isn't very straightforward: we need to search the 1st table on 2 conditions combined with OR, not AND. So basically, if it finds a match on either of the 2 attributes, we consider that the corresponding row in the 1st table exists and we don't have to insert a new one. If there are no matches on either of the 2 attributes, then we consider it a new row. We can use an OR condition in the LEFT JOIN, but from what I understand it does a full table scan, and the query takes a very long time to complete even though it yields the right results. We cannot use UNION either, because it will not give us what we're looking for.
Just for simplicity purpose consider the scenario below (we need to insert data into tableA).
If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
Drop Table #tableA End
If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
Drop Table #tableB End
create table #tableA ( email nvarchar(50), id int )
create table #tableB ( email nvarchar(50), id int )
insert into #tableA (email, id) values ('123@abc.com', 1), ('456@abc.com', 2), ('789@abc.com', 3), ('012@abc.com', 4)
insert into #tableB (email, id) values ('234@abc.com', 1), ('456@abc.com', 2), ('567@abc.com', 3), ('012@abc.com', 4), ('345@abc.com', 5)
--THIS QUERY IS CORRECTLY RETURNING 1 RECORD
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email or B.id = A.id
where A.id is null
--THIS QUERY IS INCORRECTLY RETURNING 3 RECORDS SINCE THERE ARE ALREADY RECORDS WITH ID's 1 & 3 in tableA though the email addresses of these records don't match
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email
where A.id is null
union
select B.email, B.id
from #tableB B
left join #tableA A on B.id = A.id
where A.id is null
If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
Drop Table #tableA End
If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
Drop Table #tableB End
The 1st query works correctly and only returns 1 record, but the tables hold just a few records and it completes in under 1 sec. When the 2 tables have thousands of records, the query may take 10 min to complete. The 2nd query of course returns the records we don't want to insert because we consider them existing. Is there a way to optimize this query so it takes an acceptable time to complete?
You are using an anti join, which is another way of writing the straight-forward NOT EXISTS:
where not exists
(
select null
from #tableA A
where A.email = B.email or B.id = A.id
)
I.e. where not exists a row in table A with the same email or the same id. In other words: where not exists a row with the same email and not exists a row with the same id.
where not exists (select null from #tableA A where A.email = B.email)
and not exists (select null from #tableA A where B.id = A.id)
With the appropriate indexes (the index names here are arbitrary)
create index ix_tableA_id on #tableA (id);
create index ix_tableA_email on #tableA (email);
this should be very fast.
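To confirm that the single NOT EXISTS with OR and the two NOT EXISTS combined with AND return the same row, here is a minimal sketch on the question's sample data. It makes a few assumptions: SQLite (via Python's sqlite3) stands in for SQL Server, the temp tables become plain tables, and the index names are made up. It checks equivalence only and says nothing about SQL Server's timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.executescript("""
CREATE TABLE tableA (email TEXT, id INTEGER);
CREATE TABLE tableB (email TEXT, id INTEGER);
INSERT INTO tableA VALUES ('123@abc.com', 1), ('456@abc.com', 2),
                          ('789@abc.com', 3), ('012@abc.com', 4);
INSERT INTO tableB VALUES ('234@abc.com', 1), ('456@abc.com', 2),
                          ('567@abc.com', 3), ('012@abc.com', 4),
                          ('345@abc.com', 5);
-- Hypothetical index names; one index per lookup column.
CREATE INDEX ix_tableA_id ON tableA (id);
CREATE INDEX ix_tableA_email ON tableA (email);
""")

# One NOT EXISTS with an OR inside.
or_rows = sorted(conn.execute("""
    SELECT B.email, B.id
    FROM tableB B
    WHERE NOT EXISTS (SELECT NULL FROM tableA A
                      WHERE A.email = B.email OR B.id = A.id)
""").fetchall())

# Two NOT EXISTS combined with AND; each can use its own index.
split_rows = sorted(conn.execute("""
    SELECT B.email, B.id
    FROM tableB B
    WHERE NOT EXISTS (SELECT NULL FROM tableA A WHERE A.email = B.email)
      AND NOT EXISTS (SELECT NULL FROM tableA A WHERE B.id = A.id)
""").fetchall())

print(or_rows)     # [('345@abc.com', 5)]
print(split_rows)  # [('345@abc.com', 5)]
```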
It's hard to tune something you can't see. Another option to get the data is to:
SELECT B.email
, B.id
FROM #TableB B
EXCEPT
(
SELECT B.email
, B.id
FROM #tableB B
INNER JOIN #tableA A
ON A.email = B.email
UNION ALL
SELECT B.email
, B.id
FROM #tableB B
INNER JOIN #tableA A
ON B.id = A.id
)
This way you don't have to use OR, you can use INNER JOIN rather than LEFT JOIN and you can use UNION ALL instead of UNION (though this advantage may well be negated by the EXCEPT). All of which may help your performance. Perhaps the joins can be more efficient when replaced with EXISTS.
You didn't mention how this problem occurred (where the data from both tables is coming from, and why they are out of sync when they shouldn't be), but it would be preferable to fix it at the source.
No, the UNION query correctly returns 3 rows,
because
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email
where A.id is null
alone returns those 3 rows.
As for your "problem" query,
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email or B.id = A.id
where A.id is null
it checks the whole OR condition for every row to decide whether the row is included.
So, for example, take
('123@abc.com', 1) ('234@abc.com', 1)
As the ids are the same, the rows are joined and the #tableB row is excluded,
but when you join by the emails only, the condition is false, and so the row is included in the result set.
You can use the UNION approach only when you are comparing just the emails or just the ids; with both conditions, the queries are not equivalent.
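The non-equivalence can be seen by running each anti-join branch separately and combining the results by hand. This is a minimal sketch under assumptions: SQLite via Python's sqlite3 stands in for SQL Server, and plain tables replace the temp tables. The intersection shown at the end is only an illustration of the logic, not something proposed in the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.executescript("""
CREATE TABLE tableA (email TEXT, id INTEGER);
CREATE TABLE tableB (email TEXT, id INTEGER);
INSERT INTO tableA VALUES ('123@abc.com', 1), ('456@abc.com', 2),
                          ('789@abc.com', 3), ('012@abc.com', 4);
INSERT INTO tableB VALUES ('234@abc.com', 1), ('456@abc.com', 2),
                          ('567@abc.com', 3), ('012@abc.com', 4),
                          ('345@abc.com', 5);
""")

ANTI_JOIN = """
    SELECT B.email, B.id
    FROM tableB B
    LEFT JOIN tableA A ON {condition}
    WHERE A.id IS NULL
"""

# Rows of tableB with no email match, and rows with no id match.
email_only = set(conn.execute(ANTI_JOIN.format(condition="A.email = B.email")))
id_only = set(conn.execute(ANTI_JOIN.format(condition="B.id = A.id")))
# Rows of tableB with no match on either attribute (the OR join).
or_join = set(conn.execute(
    ANTI_JOIN.format(condition="A.email = B.email OR B.id = A.id")
))

union_rows = email_only | id_only      # what the UNION query returns
intersect_rows = email_only & id_only  # "no email match AND no id match"

print(sorted(union_rows))      # three rows: ids 1, 5 and 3
print(sorted(intersect_rows))  # [('345@abc.com', 5)]
print(intersect_rows == or_join)  # True
```

The UNION keeps a row when either branch fails to match, whereas the OR join keeps it only when both fail, which corresponds to the intersection.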

max function does not work when having case when clause

i have two tables.
one is as below
table a
ID, count
1, 123
2, 123
3, 123
table b
ID, count
table b is empty
when using
SELECT CASE
WHEN isnotnull(max(b.count)) THEN max(a.count) + max(b.count)
ELSE max(a.count)
FROM a, b
the only result is always NULL
i am very confused. why?
You don't need to use a JOIN, a simple SUM of two sub-queries will give you your desired result. Since you only add MAX(b.count) when it is non-NULL, we can just add it all the time but COALESCE it to 0 when it is NULL.
SELECT COALESCE((SELECT MAX(count) FROM b), 0) + (SELECT MAX(count) FROM a)
Another way to make this work is to UNION the count values from each table:
SELECT COALESCE(MAX(bcount), 0) + MAX(acount)
FROM (SELECT count AS acount, NULL AS bcount FROM a
UNION
SELECT NULL AS acount, count AS bcount FROM b) u
Note that if you use a JOIN it must be a FULL JOIN. If you use a LEFT JOIN you risk not seeing all the values from table b. For example, consider the case where table b has one entry: ID=4, count=456. A LEFT JOIN on ID will not include this value in the result table (since table a only has ID values of 1,2 and 3) so you will get the wrong result:
CREATE TABLE a (ID INT, count INT);
INSERT INTO a VALUES (1, 123), (2, 123), (3, 123);
CREATE TABLE b (ID INT, count INT);
INSERT INTO b VALUES (4, 456);
SELECT COALESCE(MAX(b.count), 0) + MAX(a.count)
FROM a
LEFT JOIN b ON a.ID = b.ID
Output
123 (should be 579)
To use a FULL JOIN you would write
SELECT COALESCE(MAX(b.count), 0) + MAX(a.count)
FROM a
FULL JOIN b ON a.ID = b.ID
Since table b is empty, max(b.count) returns NULL, and any arithmetic operation involving NULL results in NULL.
So max(a.count) + max(b.count) is NULL. Moreover, FROM a, b is a cross join, and a cross join with an empty table produces no rows, so max(a.count) in the ELSE branch is NULL as well. Hence, your query returns NULL.
Just use COALESCE to supply a default value whenever a NULL comes up.
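Both effects, NULL arithmetic and the empty cross join, can be checked quickly. This is a minimal sketch under assumptions: SQLite via Python's sqlite3 stands in for the asker's unspecified DBMS, and the column is named cnt instead of count to avoid any clash with the COUNT function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the asker's DBMS (assumption)
conn.execute("CREATE TABLE a (id INTEGER, cnt INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER, cnt INTEGER)")  # b stays empty
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, 123), (2, 123), (3, 123)])


def scalar(sql):
    """Hypothetical helper: return the single value produced by a query."""
    return conn.execute(sql).fetchone()[0]


# Arithmetic with NULL yields NULL (None in Python).
null_sum = scalar("SELECT 123 + NULL")

# A comma join with an empty table has no rows, so even MAX(a.cnt) is NULL.
cross_rows = scalar("SELECT COUNT(*) FROM a, b")
cross_max_a = scalar("SELECT MAX(a.cnt) FROM a, b")

# Independent scalar subqueries avoid the join; COALESCE covers the empty table.
fixed = scalar(
    "SELECT COALESCE((SELECT MAX(cnt) FROM b), 0) + (SELECT MAX(cnt) FROM a)"
)

print(null_sum, cross_rows, cross_max_a, fixed)  # None 0 None 123
```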
Use the coalesce() function and an explicit join; avoid the old comma-separated join syntax.
select coalesce(max(a.count)+max(b.count),max(a.count))
from a left join b on a.id=b.id
Use left join
SELECT coalesce(max(a.count) + max(b.count),max(a.count))
FROM a left join b on a.id=b.id

Is a scalar database function used in a join called once per distinct set of inputs or once per row?

If I have a sql statement like this:
select *
from tableA a
inner join tableB b on dbo.fn_something(a.ColX) = b.ColY
If you assume there are 5 rows in tableA with the same value for ColX, will dbo.fn_something() be called with that value 5 times or just once?
Clearly this is a trivial example, but I'm interested for the purposes of thinking about performance in a more complex scenario.
UPDATE
Thanks @DStanley, following on from your answer I investigated further. Using SQL Profiler with the SP:StmtStarting event on the SQL below illustrates what happens, i.e. as you said, the function is called once for each row in the join.
This has an extra join from the original question.
create table tableA
( id int )
create table tableB
( id_a int not null
, id_c int not null
)
create table tableC
( id int )
go
create function dbo.fn_something( @id int )
returns int
as
begin
return @id
end
go
-- add test data
-- 5 rows:
insert into tableA (id) values (1), (2), (3), (4), (5)
-- 5 rows:
insert into tableC (id) values (101), (102), (103), (104), (105)
-- 25 rows:
insert into tableB (id_a, id_c) select a.id, c.id from tableA a, tableC c
go
-- here dbo.fn_something() is called 25 times:
select *
from tableA a
inner join tableB b on a.id = b.id_a
inner join tableC c on c.id = dbo.fn_something(b.id_c)
-- here dbo.fn_something() is called just 5 times,
-- as the 'b.id_c < 102' happens to be applied first.
-- That's likely to depend on whether SQL thinks it's
-- faster to evaluate the '<' or the function.
select *
from tableA a
inner join tableB b on a.id = b.id_a
inner join tableC c on c.id = dbo.fn_something(b.id_c) and b.id_c < 102
go
drop table tableA ;
drop table tableB;
drop table tableC;
drop function dbo.fn_something;
go
It will be called for each row in a. I do not know of any optimization that would call the function just for unique inputs. If performance is an issue, you could create a temp table with distinct input values and use those results in your join, but I would only do that if it was an issue - don't assume it's a problem and clutter your query unnecessarily.
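The temp-table idea can be sketched roughly as follows. Everything here is an assumption made for illustration: SQLite via Python's sqlite3 replaces SQL Server, a Python function with a call counter plays the role of dbo.fn_something, and the table names distinct_inputs and fn_cache are made up. It shows the shape of the workaround only; SQL Server's own call counts are what the Profiler trace above measures.

```python
import sqlite3

calls = {"n": 0}  # counts how often the stand-in function is invoked


def fn_something(value):
    """Stand-in for dbo.fn_something: returns its input and counts the call."""
    calls["n"] += 1
    return value


conn = sqlite3.connect(":memory:")
conn.create_function("fn_something", 1, fn_something)
conn.execute("CREATE TABLE tableA (ColX INTEGER)")
conn.execute("CREATE TABLE tableB (ColY INTEGER)")
# Five rows share ColX = 7, one row has ColX = 8.
conn.executemany("INSERT INTO tableA VALUES (?)", [(7,)] * 5 + [(8,)])
conn.executemany("INSERT INTO tableB VALUES (?)", [(7,), (8,)])

# Step 1: collect the distinct inputs.
conn.execute(
    "CREATE TEMP TABLE distinct_inputs AS SELECT DISTINCT ColX FROM tableA"
)
# Step 2: call the function once per distinct input and store the result.
conn.execute(
    "CREATE TEMP TABLE fn_cache AS "
    "SELECT ColX, fn_something(ColX) AS fx FROM distinct_inputs"
)

# Step 3: join through the cached results instead of calling the function.
rows = conn.execute(
    "SELECT a.ColX, b.ColY FROM tableA a "
    "JOIN fn_cache f ON f.ColX = a.ColX "
    "JOIN tableB b ON b.ColY = f.fx"
).fetchall()

print(len(rows), calls["n"])  # 6 joined rows, function called 2 times
```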
If you declare your function as schema bound, it can be run once for each unique case. This requires that the function be deterministic, i.e. that it always has the same output for a given input.
CREATE FUNCTION dbo.fn_something (@id INT)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
RETURN @id
END
GO

Not exist in anyone

I want to write a query that selects all the IDs of table A that connect only to rows of table B with an existing end_date.
That is, I need the IDs of table A that connect only to finished rows (i.e. rows with an existing end_date) in table B.
The relation between A and B is one-to-many: an A can correlate to many Bs, and a B always correlates to exactly one A.
I have made something like this:
select id
from A
where not exists
(select 1
from B
where end_date is null
and A.id=B.id)
Is this correct? Or is there a faster query for the same thing?
EDIT: end_date is in table B
example :
In the data set:
A.id=1
B.id=1
B.bid=333
B.end_date=null
A.id=1
B.id=1
B.bid=334
B.end_date=05/05/2014
A.id=2
B.id=2
B.bid=335
B.end_date=null
A.id=2
B.id=2
B.bid=336
B.end_date=null
A.id=3
B.id=3
B.bid=337
B.end_date=04/04/2014
A.id=3
B.id=3
B.bid=338
B.end_date=04/04/2014
My query should return only id=3.
Assuming your table structure is
A(id)
B(id, end_date)
Then to select all A.id that have no B row with a null end_date you can use this query
Select id
From A
Where id Not In (Select id From B Where end_date Is Null)
You don't specify your DBMS, but in later versions of SQL Server, this might be faster. You will have to test based on your data:
SELECT DISTINCT A.ID
FROM A
INNER JOIN B ON A.ID = b.ID
WHERE b.End_date IS NOT NULL
EXCEPT
SELECT B.ID
FROM B
WHERE B.End_date IS NULL
EXCEPT is a set operator that returns all entries in the first set that don't exist in the second set. Doing the query this way gives you two SARGable WHERE clauses rather than one nonSARGable subquery, so it could end up faster depending on your data topography and your physical indexes.
You can probably use a LEFT JOIN like
select A.id
from A a
left join B b
on a.id = b.id
and b.end_date is null
where b.id is null
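The sample data from the question makes it easy to check which filter gives the expected id=3. This is a minimal sketch under assumptions: SQLite via Python's sqlite3 stands in for the unspecified DBMS, and end_date is stored as text exactly as written in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the asker's DBMS (assumption)
conn.executescript("""
CREATE TABLE A (id INTEGER);
CREATE TABLE B (id INTEGER, bid INTEGER, end_date TEXT);
INSERT INTO A VALUES (1), (2), (3);
INSERT INTO B VALUES
  (1, 333, NULL), (1, 334, '05/05/2014'),
  (2, 335, NULL), (2, 336, NULL),
  (3, 337, '04/04/2014'), (3, 338, '04/04/2014');
""")


def ids(sql):
    """Hypothetical helper: return the first column of a query as a list."""
    return [row[0] for row in conn.execute(sql)]


# The asker's NOT EXISTS: keep A rows that have no unfinished B row.
not_exists = ids("""
    SELECT id FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B
                      WHERE B.end_date IS NULL AND A.id = B.id)
    ORDER BY id
""")

# Equivalent anti-join: join only to unfinished rows, keep the non-matches.
unfinished_filter = ids("""
    SELECT a.id FROM A AS a
    LEFT JOIN B AS b ON a.id = b.id AND b.end_date IS NULL
    WHERE b.id IS NULL
    ORDER BY a.id
""")

# Joining only to finished rows answers a different question:
# it keeps the A rows that have no finished B row at all.
finished_filter = ids("""
    SELECT a.id FROM A AS a
    LEFT JOIN B AS b ON a.id = b.id AND b.end_date IS NOT NULL
    WHERE b.id IS NULL
    ORDER BY a.id
""")

print(not_exists, unfinished_filter, finished_filter)  # [3] [3] [2]
```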

SQL Insert into table A from table B based off table C

I have an empty table that I would like to fill with rows from a second table, based off a third table. I'll call them A, B, C respectively.
Table C has ID numbers that match ID numbers for rows in Table B. For every ID in table C, I want to add the corresponding row from table B into Table A.
This is what I have, and I am getting an error saying that I cannot use the last statement.
INSERT INTO TABLEA
SELECT * FROM TABLEB
WHERE ID FROM TABLEB = ID FROM TABLEC;
DSNT408I SQLCODE = -199, ERROR: ILLEGAL USE OF KEYWORD FROM. TOKEN ( . AT
MICROSECONDS MICROSECOND SECONDS SECOND MINUTES MINUTE WAS EXPECTED
DSNT418I SQLSTATE = 42601 SQLSTATE RETURN CODE
Any help would be appreciated.
INSERT INTO TableA
SELECT B.*
FROM TableB AS B
JOIN TableC AS C ON B.ID = C.ID
Or possibly that will give you too many duplicates (if there are multiple rows in C that match a given row in B), in which case you might need:
INSERT INTO TableA
SELECT B.*
FROM TableB AS B
WHERE B.ID IN (SELECT C.ID FROM TableC AS C)
Or:
INSERT INTO TableA
SELECT DISTINCT B.*
FROM TableB AS B
JOIN TableC AS C ON B.ID = C.ID
Both of those give you one row in A for each row in B that matches one or more rows in C.
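The difference between the three statements shows up as soon as TableC contains a repeated ID. This is a minimal sketch under assumptions: SQLite via Python's sqlite3 stands in for DB2, and the Label column, the sample rows and the helper fill_table_a() are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for DB2 (assumption)
conn.executescript("""
CREATE TABLE TableA (ID INTEGER, Label TEXT);
CREATE TABLE TableB (ID INTEGER, Label TEXT);
CREATE TABLE TableC (ID INTEGER);
INSERT INTO TableB VALUES (1, 'one'), (2, 'two'), (3, 'three');
INSERT INTO TableC VALUES (1), (1), (3);  -- ID 1 appears twice
""")


def fill_table_a(insert_sql):
    """Empty TableA, run one INSERT ... SELECT, and return the inserted IDs."""
    conn.execute("DELETE FROM TableA")
    conn.execute(insert_sql)
    return [row[0] for row in conn.execute("SELECT ID FROM TableA ORDER BY ID")]


plain_join = fill_table_a(
    "INSERT INTO TableA SELECT B.* FROM TableB AS B "
    "JOIN TableC AS C ON B.ID = C.ID"
)
with_in = fill_table_a(
    "INSERT INTO TableA SELECT B.* FROM TableB AS B "
    "WHERE B.ID IN (SELECT C.ID FROM TableC AS C)"
)
with_distinct = fill_table_a(
    "INSERT INTO TableA SELECT DISTINCT B.* FROM TableB AS B "
    "JOIN TableC AS C ON B.ID = C.ID"
)

print(plain_join)     # [1, 1, 3]  (row 1 duplicated by the join)
print(with_in)        # [1, 3]
print(with_distinct)  # [1, 3]
```

Note that DISTINCT would also collapse rows of TableB that are themselves exact duplicates, which the IN version would keep.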
How would I add a WHERE clause to this? Let's say Table C has another column called VALUE, and I want to add all the ID numbers that have a value of 'x' or greater. I tried adding JOIN TableC AS C ON B.ID = C.ID AND C.VALUE > 5, but I still got all the values from TABLE C.
Working with the first query (fixing the others being left as an 'exercise for the reader'), then what I think you should be doing is just:
INSERT INTO TableA
SELECT B.*
FROM TableB AS B
JOIN TableC AS C ON B.ID = C.ID
WHERE C.Value > 5
The optimizer should translate that to an equivalent expression:
INSERT INTO TableA
SELECT B.*
FROM TableB AS B
JOIN TableC AS C ON B.ID = C.ID AND C.Value > 5
I'm not clear from your comment whether you somehow added a second reference to TableC in the one query, or you modified your query as shown in this second example. If you were not using LEFT JOIN anywhere, then adding the AND C.Value > 5 term to the ON clause or as a WHERE clause should have yielded the correct data.
When debugging this sort of problem, it is worth noting that this INSERT statement has a perfectly good SELECT statement in it that you can run on its own to review what is going to be added to TableA. You might want to augment the select-list to include (at least) C.ID and C.Value just to make sure nothing is going haywire.
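As suggested, the SELECT can be run on its own to compare where the filter is placed. This is a minimal sketch under assumptions: SQLite via Python's sqlite3 stands in for DB2, and the sample rows are made up. It also shows how a LEFT JOIN with the filter in the ON clause would keep every TableB row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for DB2 (assumption)
conn.executescript("""
CREATE TABLE TableB (ID INTEGER);
CREATE TABLE TableC (ID INTEGER, Value INTEGER);
INSERT INTO TableB VALUES (1), (2), (3);
INSERT INTO TableC VALUES (1, 3), (2, 7), (3, 9);  -- made-up sample values
""")


def ids(sql):
    """Hypothetical helper: return the first column of a query as a list."""
    return [row[0] for row in conn.execute(sql)]


# Inner join, filter in the WHERE clause.
where_filter = ids(
    "SELECT B.ID FROM TableB AS B JOIN TableC AS C ON B.ID = C.ID "
    "WHERE C.Value > 5 ORDER BY B.ID"
)
# Inner join, filter in the ON clause: same rows.
on_filter = ids(
    "SELECT B.ID FROM TableB AS B JOIN TableC AS C "
    "ON B.ID = C.ID AND C.Value > 5 ORDER BY B.ID"
)
# LEFT JOIN with the filter in ON: unmatched TableB rows are kept anyway.
left_on_filter = ids(
    "SELECT B.ID FROM TableB AS B LEFT JOIN TableC AS C "
    "ON B.ID = C.ID AND C.Value > 5 ORDER BY B.ID"
)

print(where_filter, on_filter, left_on_filter)  # [2, 3] [2, 3] [1, 2, 3]
```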