Using having count() in exists clause - sql

I am trying to write a SQL query where the subquery in an 'exists' clause has a 'having' clause. The strange thing is that there is no error, and the subquery works as a stand-alone query. However, the whole query gives exactly the same results with the 'having' clause as without it.
This is kind of what my query looks like:
SELECT X
FROM A
WHERE exists (
SELECT X, count(distinct Y)
FROM B
GROUP BY X
HAVING count(distinct Y) > 2)
So I'm trying to select the rows from A where X has more than two distinct values of Y in B.
However, the results also include records that do not exist in the subquery. What am I doing wrong here?

You don't correlate the two queries:
SELECT X
FROM A
WHERE (
SELECT COUNT(DISTINCT y)
FROM b
WHERE b.x = a.x
) > 2

Your query says something like this:
select X from A IF THERE IS any group in B (grouped by X) with more than two distinct values of Y.
If your 'exists' subquery returns even one record from table B, the condition is true for every row and you will get all the rows from A.
Try:
select X
from A
where exists (select 1
from B
where B.x = A.x
group by b.x
having count(distinct b.y) > 2
)
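
To see the difference between the two forms, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and SQLite stands in for whatever database the question is actually about, so treat it as a demonstration of the idea rather than a statement about your engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (x INTEGER);
    CREATE TABLE B (x INTEGER, y INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    -- Illustrative data: x = 1 has three distinct y values, x = 2 has one
    INSERT INTO B VALUES (1, 10), (1, 20), (1, 30), (2, 10), (2, 10);
""")

# Uncorrelated subquery: it returns a row (the group x = 1) no matter
# which row of A is being tested, so EXISTS is true for all of A.
uncorrelated = [row[0] for row in conn.execute("""
    SELECT x FROM A
    WHERE EXISTS (SELECT x, COUNT(DISTINCT y) FROM B
                  GROUP BY x
                  HAVING COUNT(DISTINCT y) > 2)
    ORDER BY x
""")]

# Correlated subquery: B.x = A.x ties the check to the current row of A.
correlated = [row[0] for row in conn.execute("""
    SELECT x FROM A
    WHERE EXISTS (SELECT 1 FROM B
                  WHERE B.x = A.x
                  GROUP BY B.x
                  HAVING COUNT(DISTINCT B.y) > 2)
    ORDER BY x
""")]

print(uncorrelated)  # [1, 2, 3]
print(correlated)    # [1]
```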

I had a similar situation and solved it with a JOIN, since the other answers didn't work for me. I adapted it to your generic example. Hope it is helpful to someone else!
SELECT X
FROM A
JOIN (SELECT X, COUNT(DISTINCT y)
FROM B
GROUP BY X
HAVING count(distinct Y) > 2) C
ON A.X = C.X

Related

If there are common attributes in the table in the FROM clause of inner and outer query?

Schema for table A: A(x,y,z)
Schema for table B: B(u,x,v)
[Primary keys mentioned in bold]
For the SQL query as mentioned:-
SELECT x
FROM A
WHERE x in ( SELECT x
FROM B
WHERE x<10)
How does the inner query resolve that this x mentioned is from the table B and not the table A?
x is resolved from the innermost query out. It is always better to qualify column names, so write this query as:
SELECT A.x
FROM A
WHERE A.x IN (SELECT B.x
FROM B
WHERE B.x < 10
);
This has the advantage that if B.x does not exist, you will get an error. Without the qualifier, the x in IN (SELECT x . . . would silently refer to A.x (but only when B.x does not exist).
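The silent fallback to the outer column can be reproduced with a small sketch in Python's sqlite3. The table B_no_x is hypothetical (a variant of B that lacks the column x), the rows are made up, and the behavior shown is SQLite's; other engines follow the same scoping rule but word the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (x INTEGER, y INTEGER, z INTEGER);
    -- Hypothetical table for illustration: like B, but without column x
    CREATE TABLE B_no_x (u INTEGER, v INTEGER);
    INSERT INTO A VALUES (5, 0, 0), (50, 0, 0);
    INSERT INTO B_no_x VALUES (1, 1);
""")

# Unqualified x is not found in B_no_x, so it binds to the outer A.x.
# The subquery becomes correlated and no error is raised.
unqualified = [row[0] for row in conn.execute(
    "SELECT x FROM A WHERE x IN (SELECT x FROM B_no_x WHERE x < 10) ORDER BY x"
)]

# Qualified column: the mistake is reported instead of silently accepted.
try:
    conn.execute(
        "SELECT A.x FROM A "
        "WHERE A.x IN (SELECT B_no_x.x FROM B_no_x WHERE B_no_x.x < 10)"
    )
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified)       # [5]
print(qualified_error)
```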

Integrate two sql queries

I have these two queries:
SELECT DISTINCT a, b, c, d FROM x WHERE b IN (1, 2)
SELECT DISTINCT c, d FROM y
I would now like to merge these queries such that the statement initiated in the first query only includes rows where the c, d combination is in the output resulting from the second query. Any thoughts on how to do this? My table is large, so efficiency is important.
Use exists?
SELECT DISTINCT a, b, c, d
FROM x
WHERE b IN (1, 2) AND
EXISTS (SELECT 1 FROM y WHERE x.c = y.c and x.d = y.d);
When using exists, the select distinct is only necessary if x has duplicate values. Otherwise it is not necessary.
And, for performance, you want an index on y(c, d). An index on x(b, a, c, d) would also be helpful in most databases.
Note: The distinct is not necessary in the subquery. In some databases, you can use in with composite values as well.
SELECT DISTINCT x.a,x.b,x.c,x.d
FROM x
INNER JOIN y ON x.c = y.c
AND x.d = y.d
WHERE b in (1,2)
Regarding efficiency, your indexing will determine how well that performs.
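As a rough check that the EXISTS form and the JOIN form select the same rows, here is a small sketch in Python's sqlite3. The data and the index name y_c_d are invented for illustration; the duplicate row in y is there on purpose to show why the JOIN version needs its DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (a INTEGER, b INTEGER, c INTEGER, d INTEGER);
    CREATE TABLE y (c INTEGER, d INTEGER);
    INSERT INTO x VALUES (1, 1, 7, 8), (2, 2, 7, 9), (3, 3, 7, 8), (4, 1, 5, 5);
    -- (7, 8) appears twice in y on purpose
    INSERT INTO y VALUES (7, 8), (7, 8), (5, 6);
    -- Index name is an arbitrary choice for this sketch
    CREATE INDEX y_c_d ON y (c, d);
""")

via_exists = conn.execute("""
    SELECT DISTINCT a, b, c, d
    FROM x
    WHERE b IN (1, 2)
      AND EXISTS (SELECT 1 FROM y WHERE x.c = y.c AND x.d = y.d)
    ORDER BY a
""").fetchall()

via_join = conn.execute("""
    SELECT DISTINCT x.a, x.b, x.c, x.d
    FROM x
    INNER JOIN y ON x.c = y.c AND x.d = y.d
    WHERE b IN (1, 2)
    ORDER BY x.a
""").fetchall()

# Without DISTINCT, the join repeats the x row once per matching y row.
join_without_distinct = conn.execute("""
    SELECT x.a, x.b, x.c, x.d
    FROM x
    INNER JOIN y ON x.c = y.c AND x.d = y.d
    WHERE b IN (1, 2)
""").fetchall()

print(via_exists)                  # [(1, 1, 7, 8)]
print(via_join)                    # [(1, 1, 7, 8)]
print(len(join_without_distinct))  # 2
```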

reuse result of subquery in IN operator

I found posts about reusing the results of a subquery, but none of them mention the IN operator.
My query looks like this:
select count(*)
from X
where ...
and X.id NOT IN (select id from Y)
and X.id IN (select id from Z
where ...
and Z.id IN (select id from Y)
)
As you can see, the subquery "select id from Y" is repeated.
How can I reuse the result of the subquery in the IN operator?

How do I select a row from one table where the value row does not exist in another table?

Let's say I have two identical tables, A and B, with the column "x".
I want to select all elements in A, where the value of x in A is not in any value of x of B.
How do I do that?
You could also do something like this:
SELECT * FROM TableA
LEFT JOIN TableB on TableA.X = TableB.X
WHERE TableB.X IS NULL
(For the very straightforward example in your question, a NOT EXISTS / NOT IN approach is probably preferable, but if your real query is more complex, this is an option you might want to consider; if, for instance, you want some information from TableB where there is a match, but also want to know where there isn't one.)
I'm having some trouble understanding what you need.
Anyway try this:
SELECT * FROM tableA
WHERE x not IN (SELECT x FROM tableB)
select *
from TableA
except
select *
from TableB
The fastest is the Left Join
SELECT * FROM A LEFT JOIN B ON A.X = B.X WHERE B.X IS NULL
Use this:
select * from a where x not in (select x from b)
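
For comparison, the approaches suggested in the answers above can be run side by side. This is a minimal sketch in Python's sqlite3 with made-up single-column tables and no NULLs; the helper name xs is my own. Note that EXCEPT compares whole rows, so it only matches the other two here because the tables have just the one column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (x INTEGER);
    CREATE TABLE TableB (x INTEGER);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (2);
""")


def xs(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


left_join = xs("""
    SELECT TableA.x FROM TableA
    LEFT JOIN TableB ON TableA.x = TableB.x
    WHERE TableB.x IS NULL
""")
not_in = xs("SELECT x FROM TableA WHERE x NOT IN (SELECT x FROM TableB)")
except_rows = xs("SELECT x FROM TableA EXCEPT SELECT x FROM TableB")

print(left_join)    # [1, 3]
print(not_in)       # [1, 3]
print(except_rows)  # [1, 3]
```

Which one is fastest depends on the database and its optimizer, so it is worth checking the execution plan on your own data rather than relying on this toy example.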

can this be written with an outer join

The requirement is to copy rows from Table B into Table A. Only rows with an id that doesn't already exist, need to be copied over:
INSERT INTO A(id, x, y)
SELECT id, x, y
FROM B b
WHERE b.id NOT IN (SELECT id FROM A WHERE x='t');
^^^^^^^^^^^
Now, I was trying to write this with an outer join to compare the explain paths, but I can't write this (efficiently at least).
Note that the sql highlighted with ^'s make this tricky.
Try:
INSERT INTO A(id, x, y)
SELECT b.id, b.x, b.y
FROM B b
Left Join A a
On a.Id = b.Id
And a.x = 't'
Where a.Id Is Null
But I prefer the subquery representation as I think it more clearly expresses what you are doing.
Why are you not happy with what you have? If you check your explain plan, I promise you it says that an anti-join is performed, if the optimizer thinks that is the most efficient way (which it most likely will).
For everyone who reads this: SQL is not what actually is executed. SQL is a way of telling the database what you want, not what to do. All decent databases will be able to treat NOT EXISTS and NOT IN as equal (when they are, i.e. there are no null values) and perform an anti-join. The trick with an outer join and an IS NULL condition doesn't work on SQL Server, though (SQL Server is not clever enough to transform it to an antijoin).
Your query will perform better than the query with outer join.
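The "no null values" caveat is easy to overlook, so here is a small sketch in Python's sqlite3 showing where NOT IN and NOT EXISTS part ways. The data is invented: A contains one NULL id, which is enough to make NOT IN return nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER);
    -- Illustrative data: A holds a NULL id
    INSERT INTO A VALUES (2), (NULL);
    INSERT INTO B VALUES (1), (2);
""")

# 1 NOT IN (2, NULL) evaluates to NULL (unknown), so the row is filtered out.
not_in = [row[0] for row in conn.execute(
    "SELECT id FROM B WHERE id NOT IN (SELECT id FROM A)"
)]

# NOT EXISTS only asks whether a matching row exists, so NULL does no harm.
not_exists = [row[0] for row in conn.execute(
    "SELECT id FROM B WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.id = B.id)"
)]

print(not_in)      # []
print(not_exists)  # [1]
```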
I guess the following query will do the job:
INSERT INTO A(id, x, y)
SELECT b.id, b.x, b.y
FROM B b
LEFT JOIN A a
ON b.id = a.id AND a.x = 't'
WHERE a.id IS NULL
INSERT INTO A (id, x, y)
SELECT
B.id, B.x, B.y
FROM
B
WHERE
NOT EXISTS (SELECT * FROM A WHERE B.id = A.id AND A.x = 't')
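
To check that the outer-join form and the NOT EXISTS form copy the same rows, here is a minimal sketch in Python's sqlite3. The sample rows, the helper name run_insert, and the aliases src and dst are my own choices for illustration; each statement runs against a fresh in-memory database so the two results can be compared.

```python
import sqlite3

SETUP = """
    CREATE TABLE A (id INTEGER, x TEXT, y TEXT);
    CREATE TABLE B (id INTEGER, x TEXT, y TEXT);
    -- Illustrative data: id 2 exists in A, but not with x = 't'
    INSERT INTO A VALUES (1, 't', 'a'), (2, 'u', 'a');
    INSERT INTO B VALUES (1, 't', 'b'), (2, 't', 'b'), (3, 't', 'b');
"""

NOT_EXISTS_INSERT = """
    INSERT INTO A (id, x, y)
    SELECT B.id, B.x, B.y
    FROM B
    WHERE NOT EXISTS (SELECT * FROM A WHERE B.id = A.id AND A.x = 't')
"""

LEFT_JOIN_INSERT = """
    INSERT INTO A (id, x, y)
    SELECT src.id, src.x, src.y
    FROM B AS src
    LEFT JOIN A AS dst ON dst.id = src.id AND dst.x = 't'
    WHERE dst.id IS NULL
"""


def run_insert(insert_sql):
    """Build a fresh database, run one INSERT, return A's rows sorted."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(insert_sql)
    rows = sorted(conn.execute("SELECT id, x, y FROM A").fetchall())
    conn.close()
    return rows


after_not_exists = run_insert(NOT_EXISTS_INSERT)
after_left_join = run_insert(LEFT_JOIN_INSERT)

print([row[0] for row in after_not_exists])  # [1, 2, 2, 3]
print(after_not_exists == after_left_join)   # True
```

Because the x = 't' filter sits inside the subquery (or the ON clause), id 2 is copied even though A already has a row with that id; that follows from the query as written in the question.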