difference between joining with WHERE AND JOIN in SELECT clause - sql

I have two queries:
select id,(SELECT N.NAME
FROM memuat.NETWORK N
join memuat.host H
on N.ID = H.NETWORK ) as network_name
from memuat.host;
select id,(SELECT N.NAME
FROM memuat.NETWORK N
where N.ID = H.NETWORK ) as network_name
from memuat.host H;
The first returns:
ORA-01427: single-row subquery returns more than one row
01427. 00000 - "single-row subquery returns more than one row"
The second runs fine and returns data. It is clear to me that the first returns more than one row, because I can run the subquery alone and check that. What is not clear to me is how the second query can return only one row. My result is not a single row, it's multiple rows.

In both queries, the subquery gets executed once per host row.
In your second query the subquery is:
select n.name from memuat.network n where n.id = h.network
This gets the network name for the host's network ID.
In your first query the subquery is:
SELECT n.name
FROM memuat.network n
JOIN memuat.host h ON n.id = h.network
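This gets a list of network names, probably with many duplicates, because we get one row per host. Note that the h inside this subquery is a separate instance of memuat.host, not the host row of the main query, so the subquery does not depend on the outer row at all.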
The main query in both cases is:
SELECT id, ( <subquery> ) AS network_name
FROM memuat.host h;
Here the subquery is supposed to return one value, namely the host's network name. As the subquery in your second query returns one row with one value, this works. As the subquery in your first query returns many rows, you get an error.
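To see the difference in row counts, here is a minimal sketch using Python's sqlite3 module with made-up network and host rows (the table contents are assumptions, not the asker's data). Note that SQLite does not raise an equivalent of ORA-01427; it silently takes the first row of a multi-row scalar subquery, so the sketch only counts what each subquery produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE network (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE host (id INTEGER PRIMARY KEY, network INTEGER);
    INSERT INTO network VALUES (1, 'lan'), (2, 'dmz');
    INSERT INTO host VALUES (10, 1), (11, 1), (12, 2);
""")

# Subquery of the first query, run on its own: it joins to its own copy of
# host, so it yields one name per host row and ignores the outer row.
uncorrelated = conn.execute(
    "SELECT n.name FROM network n JOIN host h ON n.id = h.network"
).fetchall()

# Second query: the subquery refers to the outer alias h, so it is evaluated
# per outer row and finds at most one network (network.id is unique here).
correlated = conn.execute("""
    SELECT h.id,
           (SELECT n.name FROM network n WHERE n.id = h.network) AS network_name
    FROM host h
    ORDER BY h.id
""").fetchall()

print(len(uncorrelated))  # number of rows the uncorrelated subquery returns
print(correlated)
```

With three hosts, the uncorrelated subquery returns three rows, which is what Oracle rejects in a scalar context, while the correlated form yields exactly one name per host.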

Related

SQL - Difference between FROM(subquery) and WHERE - IN(subquery)

I would like to ask about the difference between the following two SQL statements.
The first one works correctly, but the second one does not. When I "create a new table" from the subquery, the result is correct, but if I use the same subquery in a WHERE ... IN condition, I get a different result.
SELECT `T`.`city`, COUNT(*)
FROM (
SELECT `address`.`city`
FROM `address`
INNER JOIN `person` ON `person`.`address_id`=`address`.`address_id`
INNER JOIN `person_detail` ON `person_detail`.`person_detail_id`=`person`.`person_detail_id`
WHERE (`person_detail`.`phone` LIKE '%+42056%') OR (`person_detail`.`phone` LIKE '%+42057%')
) AS T
GROUP BY `T`.`city`
ORDER BY `COUNT(*)` ASC
///////////////////////////////////
SELECT `address`.`city`, COUNT(*)
FROM `address`
WHERE `address`.`city` IN (
SELECT `address`.`city`
FROM `address`
INNER JOIN `person` ON `person`.`address_id`=`address`.`address_id`
INNER JOIN `person_detail` ON `person_detail`.`person_detail_id`=`person`.`person_detail_id`
WHERE (`person_detail`.`phone` LIKE '%+42056%') OR (`person_detail`.`phone` LIKE '%+42057%')
)
GROUP BY `address`.`city`
ORDER BY `COUNT(*)`;
The first query runs the subquery first, which returns one 'city' value per matching person (the list is not distinct, since there is no DISTINCT). You then group that result by city and count, so each count is the number of matching persons in that city. In essence you are running your query off of the subquery result (not the address table itself).
Your second query uses the subquery only as a filter: IN checks whether a city appears in the list at all. It then goes back to the original address table, returns every address row whose city matches (regardless of the phone condition), groups by city and counts those address rows. This leads to a different result, since you are counting rows of the original table rather than rows of the subquery result.
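The difference can be reproduced with a small sketch in Python's sqlite3 module. The schema is simplified for illustration (phone is stored directly on person, so person_detail is left out) and the rows are invented; both are assumptions, not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (address_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, address_id INTEGER, phone TEXT);
    INSERT INTO address VALUES (1, 'Brno'), (2, 'Brno'), (3, 'Brno'), (4, 'Praha');
    INSERT INTO person VALUES (1, 1, '+42056111'), (2, 1, '+42057222'), (3, 4, '+1555');
""")

# The shared subquery: one city per person whose phone matches.
MATCHING_CITIES = """
    SELECT a.city
    FROM address a
    JOIN person p ON p.address_id = a.address_id
    WHERE p.phone LIKE '%+42056%' OR p.phone LIKE '%+42057%'
"""

# Derived table: counts rows of the subquery, i.e. matching persons per city.
from_subquery = conn.execute(
    f"SELECT t.city, COUNT(*) FROM ({MATCHING_CITIES}) AS t GROUP BY t.city"
).fetchall()

# IN filter: counts address rows located in any city that has a match.
in_subquery = conn.execute(
    f"SELECT city, COUNT(*) FROM address WHERE city IN ({MATCHING_CITIES}) GROUP BY city"
).fetchall()

print(from_subquery)
print(in_subquery)
```

Here Brno has two matching persons but three address rows, so the two queries report different counts for the same city.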

How do I run a joined subquery once together for all rows?

I need to do something like this:
SELECT T.*, X.Val
FROM SomeTable T
LEFT OUTER JOIN (SELECT TOP 1 A.[Value] AS Val FROM AnotherTable A ORDER BY A.Id DESC) X ON X.Val = X.Val
The goal is to get a single value from a subquery and join/apply it to all rows of my record set. The value from the subquery is the same for all rows and is independent of them. So it would be efficient to run the subquery only once and then reuse the retrieved value for all rows. But based on the query running time, it seems that the subquery runs again for each row, which is slow. The same happened with the other approaches I tried, such as OUTER APPLY.
In fact, my subquery looks like this:
(SELECT dbo.MyScalarFunction() AS Val) X
But I hope it shouldn't matter. The scalar function itself takes about 1 second to run, so it must run only once in the whole query; I can't wait a thousand seconds for a thousand rows in the record set.
Is there a way to enforce running the subquery only once before joining it?
I am inside of a view, so I can't use a declared variable to store the subquery value.
Why are you phrasing this as coming from another table? Based on your later question, why not just use CROSS JOIN:
SELECT T.*, X.Val
FROM SomeTable T CROSS JOIN
(SELECT dbo.MyScalarFunction() AS Val) X
I cannot see how the function would be called more than once, unless it is a recursive function.

Exists sub-query with a HAVING clause

I'm trying to understand how EXISTS works.
The following query is based on this answer, and it queries for all SalesOrderIDs that have more than 1 record in the table, where at least one of those records has OrderQty > 1 and ProductID = 777:
USE AdventureWorks2012;
GO
SELECT SalesOrderID, OrderQty, ProductID
FROM Sales.SalesOrderDetail s
WHERE EXISTS
( SELECT 1
FROM Sales.SalesOrderDetail s2
WHERE s.SalesOrderID = s2.SalesOrderID
GROUP BY SalesOrderID
HAVING COUNT(*) > 1
AND COUNT(CASE WHEN OrderQty > 1 AND ProductID = 777 THEN 1 END) >= 1
);
What I don't understand is this: The sub-query returns a single-column table filled with the value 1 on each row. So the way I understand it, the WHERE in the outer query has no real condition to apply, just a bunch of 1s. Why and how, then, does the outer query return only part of Sales.SalesOrderDetail, and not its entirety?
EXISTS only checks whether the inner query returns at least one row for the current row of the outer table. The values in the inner select list are never looked at, which is why SELECT 1 is enough, unlike IN, where the selected column is compared with the outer value.
So the outer query does not receive a bunch of 1s to validate. As the name implies, EXISTS only tests for the existence of a row under the given condition.
Hope this clarifies.
Note: Always qualify columns with table aliases to prevent ambiguity.
The inner SELECT 1 ... will not always return 1.
When the inner WHERE/HAVING condition is not met, no 1 is returned. Instead the inner SELECT returns nothing at all: in SQL Server Management Studio (if I recall correctly) you will see an empty result, not even a NULL, so the outer WHERE fails for that particular row.
Therefore part of your outer query's result set is cut off, and the total number of rows returned with EXISTS(...) will be smaller than if EXISTS(...) were not present.
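A minimal sketch of this behaviour, using Python's sqlite3 module rather than SQL Server, with a made-up sales_order_detail table standing in for Sales.SalesOrderDetail (table name and rows are assumptions). It first runs the inner query once per order to show that it returns either one row or none, then runs the full EXISTS query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_order_detail (sales_order_id INTEGER, order_qty INTEGER, product_id INTEGER);
    INSERT INTO sales_order_detail VALUES
        (1, 2, 777), (1, 1, 800),   -- two rows, one with qty > 1 and product 777
        (2, 5, 777),                -- only one row for this order
        (3, 1, 777), (3, 3, 800);   -- two rows, none with qty > 1 and product 777
""")

# The inner query, with the correlated value supplied as a parameter.
INNER = """
    SELECT 1
    FROM sales_order_detail s2
    WHERE s2.sales_order_id = ?
    GROUP BY s2.sales_order_id
    HAVING COUNT(*) > 1
       AND COUNT(CASE WHEN s2.order_qty > 1 AND s2.product_id = 777 THEN 1 END) >= 1
"""
rows_per_order = {
    oid: len(conn.execute(INNER, (oid,)).fetchall()) for oid in (1, 2, 3)
}

# The full query: the same inner query, correlated to the outer row s.
kept = conn.execute(f"""
    SELECT s.sales_order_id, s.order_qty, s.product_id
    FROM sales_order_detail s
    WHERE EXISTS ({INNER.replace('?', 's.sales_order_id')})
    ORDER BY s.sales_order_id, s.product_id
""").fetchall()

print(rows_per_order)
print(kept)
```

Only order 1 makes the inner query return a row, so only the detail rows of order 1 survive the outer WHERE.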

Join within an "exists" subquery

I am wondering why when you join two tables on a key within an exists subquery the join has to happen within the WHERE clause instead of the FROM clause.
This is my example:
Join within FROM clause:
SELECT payer_id
FROM Population1
WHERE NOT EXISTS
(Select *
From Population2 join Population1
On Population2.payer_id = Population1.payer_id)
Join within WHERE clause:
SELECT payer_id
FROM Population1
WHERE NOT EXISTS
(Select *
From Population2
WHERE Population2.payer_id = Population1.payer_id)
The first query gives me 0 results, which I know is incorrect, while the second query gives the thousands of results I am expecting to see.
Could someone just explain to me why where the join happens in an EXISTS subquery matters? If you take the subqueries without the parent query and run them they literally give you the same result.
It would help me a lot to remember to not continue to make this mistake when using exists.
Thanks in advance.
You need to understand the distinction between a regular subquery and a correlated subquery.
Using your examples, this should be easy. The first where clause is:
where not exists (Select 1
from Population2 join
Population1
on Population2.payer_id = Population1.payer_id
)
This condition does exactly what it says it is doing. The subquery has no connection to the outer query. So, the not exists will either filter out all rows or keep all rows.
In this case, the engine runs the subquery and determines that at least one row is returned. Hence, not exists returns false in all cases, and nothing is returned.
In the second case, the subquery is a correlated subquery. So, for each row in population1 the subquery is run using the value of Population1.payer_id. In some cases, matching rows exist in Population2; these are filtered out. In other cases, matching rows do not exist; these are in the result set.
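The two cases can be compared side by side with a minimal sketch in Python's sqlite3 module. The three payer rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE population1 (payer_id INTEGER);
    CREATE TABLE population2 (payer_id INTEGER);
    INSERT INTO population1 VALUES (1), (2), (3);
    INSERT INTO population2 VALUES (2);
""")

# Join inside FROM: the inner population1 is a fresh instance, so the
# subquery never looks at the outer row. It returns a row (payer 2), so
# NOT EXISTS is false for every outer row.
uncorrelated = conn.execute("""
    SELECT payer_id FROM population1
    WHERE NOT EXISTS (
        SELECT 1
        FROM population2
        JOIN population1 ON population2.payer_id = population1.payer_id
    )
""").fetchall()

# Condition inside WHERE: population1 here is the outer row, so the
# subquery is re-evaluated for each payer_id.
correlated = conn.execute("""
    SELECT payer_id FROM population1
    WHERE NOT EXISTS (
        SELECT 1 FROM population2
        WHERE population2.payer_id = population1.payer_id
    )
    ORDER BY payer_id
""").fetchall()

print(uncorrelated)
print(correlated)
```

The uncorrelated form returns no rows at all, while the correlated form returns the payers that are missing from population2.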
The subquery in the first example does not actually refer to the base table of the outer query (its Population1 is a separate instance), so the condition is all-or-nothing rather than evaluated per row.
Another way to do the same logic would be:
SELECT P1.payer_id
FROM Population1 P1
LEFT JOIN Population2 P2 ON
P2.payer_id = P1.payer_id
WHERE
P2.payer_id IS NULL
Your subquery always returns a row as long as the join produces at least one result row:
Select *
from Population2
join Population1 on Population2.payer_id = Population1.payer_id
If at least one row exists in this join (and surely one does), you can think of your subquery as:
select 'ROW EXISTS'
And result of:
select *
from Population1
where not exists (select 'ROW EXISTS')
So your anti-join evaluates as:
payer_id 1 --> some row exists -> don't return this row
payer_id 2 --> some row exists -> don't return this row

How does EXISTS return things other than all rows or no rows?

I am a beginning SQL programmer - I am getting most things, but not EXISTS.
It looks to me, and looks by the documentation, that an entire EXISTS statement returns a boolean value.
However, I see specific examples where it can be used and returns part of a table as opposed to all or none of it.
SELECT DISTINCT PNAME
FROM P
WHERE EXISTS
(
SELECT *
FROM SP Join S ON SP.SNO = S.SNO
WHERE SP.PNO = P.PNO
AND S.STATUS > 25
)
This query returns to me one value, the one that meets the criteria (S.Status > 25).
However, with other queries, it seems to return the whole table I am selecting from if even one of the rows in the EXISTS subquery is true.
How does one control this?
Subqueries such as with EXISTS can either be correlated or non-correlated.
In your example you use a correlated subquery, which is usually the case with EXISTS. You look up records in SP for a given P.PNO, i.e. you do the lookup for each P record.
Without SP.PNO = P.PNO you would have a non-correlated subquery. I.e. the subquery no longer depends on the P record. It would return the same result for any P record (either a Status > 25 exists at all or not). Most often when this happens this is done by mistake (one forgot to relate the subquery to the record in question), but sometimes it is desired so.
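As a rough illustration, here is the query from the question with and without the SP.PNO = P.PNO condition, run through Python's sqlite3 module. The part, supplier and shipment rows are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p (pno TEXT, pname TEXT);
    CREATE TABLE s (sno TEXT, status INTEGER);
    CREATE TABLE sp (sno TEXT, pno TEXT);
    INSERT INTO p VALUES ('P1', 'Nut'), ('P2', 'Bolt');
    INSERT INTO s VALUES ('S1', 30), ('S2', 10);
    INSERT INTO sp VALUES ('S1', 'P1'), ('S2', 'P2');
""")

# Correlated: the subquery is evaluated for each row of p via sp.pno = p.pno.
correlated = conn.execute("""
    SELECT DISTINCT pname FROM p
    WHERE EXISTS (
        SELECT * FROM sp JOIN s ON sp.sno = s.sno
        WHERE sp.pno = p.pno AND s.status > 25
    )
    ORDER BY pname
""").fetchall()

# Non-correlated: same subquery without the link to p, so it gives the
# same answer for every row of p.
non_correlated = conn.execute("""
    SELECT DISTINCT pname FROM p
    WHERE EXISTS (
        SELECT * FROM sp JOIN s ON sp.sno = s.sno
        WHERE s.status > 25
    )
    ORDER BY pname
""").fetchall()

print(correlated)
print(non_correlated)
```

Only P1 is shipped by a supplier with status above 25, so the correlated query returns just that part, while the non-correlated query returns every part because such a supplier exists somewhere.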
You have actually created a Correlated subquery. Exists predicate accepts a subquery as input and returns TRUE if the subquery returns any rows and FALSE otherwise.
The outer query against table P has no other filters, so every row of P is considered, and a row is returned when the EXISTS predicate evaluates to TRUE for it.
SELECT DISTINCT PNAME -- Outer Query
FROM P
Now, the EXISTS predicate returns TRUE if the current row in table P has related rows in SP Join S ON SP.SNO = S.SNO where S.STATUS > 25
SELECT *
FROM SP Join S ON SP.SNO = S.SNO
WHERE SP.PNO = P.PNO -- Inner query
AND S.STATUS > 25
One of the benefits of using the EXISTS predicate is that it allows you to intuitively phrase English like queries. For example, this query can be read just as you would say it in ordinary English: select all unique PNAME from table P where at least one row exists in which PNO equals PNO in table SP and Status in table S > 25, provided table SP and S are joined based on SNO.
Which SQL dialect are you using?
EXISTS always evaluates to true or false. In WHERE EXISTS (...) it checks whether the subquery returns more than zero rows (=> true).
Oracle, MySQL, PostgreSQL:
The EXISTS condition is used in combination with a subquery and is considered "to be met" if the subquery returns at least one row.
(http://www.techonthenet.com)
The WHERE clause of your main query
SELECT DISTINCT PNAME FROM P
depends on EXISTS. If the subquery returns any rows for the current row of P, EXISTS returns true and that row is kept; otherwise it returns false and the row is dropped. Only when the subquery does not reference P does this become all-or-nothing: every record in P if EXISTS is true, and nothing if it is false.