Understanding the semantics of subqueries in the FROM clause in PostgreSQL

I have an rttmm table with conversation_id and duration fields. There's a query that uses two subqueries in its FROM clause, one of which is never used. I would expect it to be semantically equivalent to the same query with the unused subquery removed, but it behaves very differently. Here's the query in question:
select
sum(subq2.dur) as res
from (
select sum(rttmm.duration) as dur, rttmm.conversation_id as conv_id
from rttmm
group by rttmm.conversation_id
) as subq1,
(
select sum(rttmm.duration) as dur, rttmm.conversation_id as conv_id
from rttmm
group by rttmm.conversation_id
) as subq2
And here's what I would expect it to be equivalent to (the same query with subq1 removed):
select
sum(subq2.dur) as res
from
(
select sum(rttmm.duration) as dur, rttmm.conversation_id as conv_id
from rttmm
group by rttmm.conversation_id
) as subq2
It turns out they're not the same at all. What is the proper understanding of the first query here?

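The first query uses the ancient SQL-89 join syntax (a comma-separated FROM list) and therefore cross-joins the two subqueries, whereas the second query simply selects from a single subquery. In the cross join every row of subq2 is paired with every row of subq1, so each conversation's total is counted once per row of subq1, and SUM(subq2.dur) comes out as the real total multiplied by the number of distinct conversation_id values.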
In simple words, the difference is:
select * from table1, table2 vs select * from table1
which is equivalent to
select * from table1 cross join table2 vs select * from table1
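A minimal sketch makes the multiplication visible. It uses Python's built-in sqlite3 module, with SQLite standing in for PostgreSQL (comma joins are cross joins there too), and the sample rows are made up. The cross-joined version should return the single-subquery total multiplied by the number of rows in subq1, i.e. the number of conversations.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rttmm (conversation_id INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO rttmm VALUES (?, ?)",
    [(1, 10), (1, 5), (2, 7), (3, 3)],  # three conversations totalling 15, 7 and 3
)

# The per-conversation subquery used in both of the original queries.
SUBQ = """(SELECT SUM(duration) AS dur, conversation_id AS conv_id
           FROM rttmm GROUP BY conversation_id)"""

# Comma in FROM = CROSS JOIN: every subq2 row is paired with every subq1 row.
cross_sum = conn.execute(
    f"SELECT SUM(subq2.dur) FROM {SUBQ} AS subq1, {SUBQ} AS subq2"
).fetchone()[0]

# Single subquery: each conversation total is counted exactly once.
single_sum = conn.execute(f"SELECT SUM(subq2.dur) FROM {SUBQ} AS subq2").fetchone()[0]

n_conversations = conn.execute(
    "SELECT COUNT(DISTINCT conversation_id) FROM rttmm"
).fetchone()[0]

print(single_sum, cross_sum, n_conversations)  # 25 75 3
```

With three conversations, the unused subq1 triples the result.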

Related

Spark SQL SELECT from UNION returns erroneous count

I've run into a weird bug in Spark SQL. Consider the following query:
select cast(file_modification_time as date) mod_on, count(*)
from ((select id, _metadata.file_modification_time from table_a
where _metadata.file_modification_time >= '2022-10-01')
union
(select id, _metadata.file_modification_time from table_b
where _metadata.file_modification_time >= '2022-10-01')
)
group by mod_on
order by mod_on;
table_a and table_b have identical schemas. The statement above returns the expected result. However, if I omit id from the subquery, Spark (3.3.0) returns erroneous counts, way less than expected. What is happening here?
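A likely cause: a bare UNION means UNION DISTINCT in standard SQL, and Spark SQL uses the same default. Once id is dropped, rows from different files that share a file_modification_time become identical and are collapsed before COUNT(*) sees them. The sketch below checks the idea in SQLite rather than Spark (an assumption). The tables, their rows, and the plain mod_time column replacing _metadata.file_modification_time are all hypothetical, and the parentheses around the UNION members are dropped because SQLite does not accept them. UNION ALL keeps the duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for table_a / table_b; mod_time replaces
# _metadata.file_modification_time and already holds a date string.
for name in ("table_a", "table_b"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER, mod_time TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)",
                 [(1, "2022-10-02"), (2, "2022-10-02"), (3, "2022-10-03")])
conn.executemany("INSERT INTO table_b VALUES (?, ?)",
                 [(4, "2022-10-02"), (5, "2022-10-03")])


def daily_counts(cols, set_op="UNION"):
    """Count rows per day after combining both tables with set_op."""
    sql = f"""
        SELECT mod_time AS mod_on, COUNT(*)
        FROM (SELECT {cols} FROM table_a WHERE mod_time >= '2022-10-01'
              {set_op}
              SELECT {cols} FROM table_b WHERE mod_time >= '2022-10-01')
        GROUP BY mod_time
        ORDER BY mod_on"""
    return conn.execute(sql).fetchall()


print(daily_counts("id, mod_time"))                  # id keeps every row distinct
print(daily_counts("mod_time"))                      # UNION collapses same-day rows
print(daily_counts("mod_time", set_op="UNION ALL"))  # duplicates are kept
```

With id present, the UNION only removes rows whose (id, time) pair repeats, which is why that version looked correct.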

How to use two functions: MAX(smthng) and then COUNT(MAX(smthng))

I don't understand why I can't use this in my code:
SELECT MAX(SMTHNG), COUNT(MAX(SMTHNG))
FROM SomeTable;
I searched for an answer but didn't find one in the documentation for these aggregate functions.
I also get a SQL compiler error: "Invalid column name 'SMTHNG'".
You can get the maximum SMTHNG in the table with:
SELECT MAX(SMTHNG) FROM SomeTable;
This is an aggregation without GROUP BY and hence results in one single row containing the maximum SMTHNG.
Now you also want to know how often this SMTHNG occurs, so you add COUNT(MAX(SMTHNG)). This, however, does not work, because you cannot aggregate an aggregate directly.
This doesn't work either:
SELECT ANY_VALUE(max_smthng), COUNT(*)
FROM (SELECT MAX(smthng) AS max_smthng FROM sometable) t;
because the subquery only contains one row, so it's too late to count.
So, either use a subquery and select from the table again:
SELECT ANY_VALUE(smthng), COUNT(*)
FROM sometable
WHERE smthng = (SELECT MAX(smthng) FROM sometable);
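The subquery workaround can be sketched in SQLite (standing in for SQL Server here; the sample values are made up). SQLite also rejects the nested aggregate. It has no ANY_VALUE, so the outer query uses MAX(smthng) instead, which gives the same answer because every remaining row holds the maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (smthng INTEGER)")
conn.executemany("INSERT INTO sometable VALUES (?)",
                 [(3,), (7,), (7,), (5,), (7,)])  # hypothetical sample data

# Nesting one aggregate directly inside another is rejected.
try:
    conn.execute("SELECT MAX(smthng), COUNT(MAX(smthng)) FROM sometable")
    nested_ok = True
except sqlite3.Error as exc:  # SQLite reports a misuse of an aggregate function
    nested_ok = False
    print("rejected:", exc)

# Workaround: find the maximum in a subquery, then count the matching rows.
max_value, max_count = conn.execute(
    """SELECT MAX(smthng), COUNT(*)
       FROM sometable
       WHERE smthng = (SELECT MAX(smthng) FROM sometable)"""
).fetchone()
print(max_value, max_count)  # 7 3
```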
Or count per SMTHNG before looking for the maximum. Here is how to get the counts:
SELECT smthng, COUNT(*)
FROM sometable
GROUP BY smthng;
And the easiest way to get the maximum SMTHNG, together with its count, from this result is:
SELECT TOP(1) smthng, COUNT(*)
FROM sometable
GROUP BY smthng
ORDER BY smthng DESC;
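The column used to order the grouped result decides what TOP(1) returns. The sketch below runs in SQLite, with LIMIT 1 standing in for SQL Server's TOP(1) and made-up data. Ordering by smthng yields the largest value with its count. Ordering by COUNT(*) yields the most frequent value, which is a different question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (smthng INTEGER)")
# Hypothetical data: 7 is the largest value, but 3 is the most frequent one.
conn.executemany("INSERT INTO sometable VALUES (?)",
                 [(3,), (3,), (3,), (7,), (7,), (5,)])

GROUPED = "SELECT smthng, COUNT(*) AS cnt FROM sometable GROUP BY smthng"

# Highest value and how often it occurs (LIMIT 1 plays the role of TOP(1)).
by_value = conn.execute(GROUPED + " ORDER BY smthng DESC LIMIT 1").fetchone()

# Ordering by the count instead picks the most frequent value.
by_count = conn.execute(GROUPED + " ORDER BY cnt DESC LIMIT 1").fetchone()

print(by_value, by_count)  # (7, 2) (3, 3)
```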
First of all, please read my comment.
Depending on what you're trying to achieve, the statement has to be changed.
If you want to count how many times the highest value of the SMTHNG field occurs, you may try this:
SELECT T1.SMTHNG, COUNT(T1.SMTHNG)
FROM SomeTable T1 INNER JOIN
(
SELECT MAX(SMTHNG) AS A
FROM SomeTable
) T2 ON T1.SMTHNG = T2.A
GROUP BY T1.SMTHNG;
Use a CTE like the one below, or a subquery:
with cte as
(
select count(*) as cnt ,col from table_name
group by col
) select max(cnt) from cte
You cannot apply one aggregate function directly to the result of another on the same column.

Explain how this SELECT WHERE subquery works?

Here's the query:
SELECT ID, Name, EventTime, State
FROM mytable as mm Where EventTime IN
(Select MAX(EventTime) from mytable mt where mt.id=mm.id)
Here is the fiddle:
http://sqlfiddle.com/#!3/9630c0/5
It comes from this S.O. question:
Select distinct rows whilst grouping by max value
I would like to hear in plain English how it works. I'm missing some fundamental understanding of part of it.
I don't really understand what the aliases are doing in the mt.id=mm.id part. It selects rows where the id is equal to the id?
The mt.id=mm.id part makes it a correlated subquery: the inner query refers to the current row of the outer query (alias mm), so it is logically re-evaluated for each outer row, using that row's id.
The query, then, selects the most recent event for each ID.
It basically translates to: "For each id, get me the row with the maximum EventTime for that id."
You can also rewrite the code as
SELECT t1.ID, t1.Name, t1.EventTime, t1.State FROM mytable as t1
inner join
(
select id,max(EventTime) as EventTime from mytable group by id
) as t2 on t1.id=t2.id and t1.EventTime=t2.EventTime
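A quick way to convince yourself the two forms agree is to run both against the same data. This sketch uses SQLite rather than SQL Server, and the rows in mytable are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (ID INTEGER, Name TEXT, EventTime TEXT, State TEXT)"
)
# Hypothetical events: two IDs with two events each.
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    (1, "a", "2020-01-01", "open"),
    (1, "a", "2020-01-05", "closed"),
    (2, "b", "2020-02-01", "open"),
    (2, "b", "2020-01-15", "new"),
])

# Correlated form: the inner MAX is computed for the outer row's id (mm.id).
correlated = conn.execute("""
    SELECT ID, Name, EventTime, State
    FROM mytable AS mm
    WHERE EventTime IN (SELECT MAX(EventTime) FROM mytable mt WHERE mt.id = mm.id)
    ORDER BY ID""").fetchall()

# Join form: compute every id's maximum once, then join back on (id, EventTime).
joined = conn.execute("""
    SELECT t1.ID, t1.Name, t1.EventTime, t1.State
    FROM mytable AS t1
    INNER JOIN (SELECT id, MAX(EventTime) AS EventTime FROM mytable GROUP BY id) AS t2
        ON t1.id = t2.id AND t1.EventTime = t2.EventTime
    ORDER BY t1.ID""").fetchall()

print(correlated)
# [(1, 'a', '2020-01-05', 'closed'), (2, 'b', '2020-02-01', 'open')]
```

Both queries keep exactly the latest event per id. If two events for one id share the maximum EventTime, both forms return both rows.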

How to join two equivalent tables which are the result of the previous recursive select in SQL Server

Good day, everyone! First, I'm sorry for my poor English. I've got a question, which you can read in the title of this message.
SQL Server returns this message (error 253) when I try to select the data I need. Translated: "Recursive element from CTE (whose name is 'recurse', my note) has multiple references in CTE."
How can I solve this problem?
Can you advise me how to join two tables (each with two columns, for example a and b) that are the result of the previous recursive select? (I mean the same select, but a different iteration of it.)
with recurse (who_acts,on_whom_influence)
as
(
-------------------------------------------FIRST SELECT
select distinct interface_1.robot_name as who_acts,interface_2.robot_name as on_whom_influence
from INTERFACE as interface_1,INTERFACE as interface_2
where (interface_1.number in ( select INPUT_INTERFACE.source
from INPUT_INTERFACE
)
and interface_2.number in (
select INPUT_INTERFACE.number
from INPUT_INTERFACE
where (INPUT_INTERFACE.source=interface_1.number )
)
)
-------------------------------------------RECURSIVE PART
union all
select rec1.who_acts,rec1.on_whom_influence
from recurse as rec1
inner join
(select rec2.who_acts,rec2.on_whom_influence
from recurse as rec2) as P on (1=1)
)
select * from recurse
The problem is in the recurse CTE. The real join condition is more complex than this, but it has no influence on the problem.
Could you post some working code in the comments?
Here's a dummy table
create table tbl1 ( a int, b int );
insert tbl1 select 1,2;
insert tbl1 select 11,12;
insert tbl1 select 2,3;
insert tbl1 select 4,5;
And a similar query to yours
with cte as (
select a,b from tbl1
union all
select x.a,x.b from cte x join cte y on x.a=y.a+1
)
select * from cte;
The error:
Recursive member of a common table expression 'cte' has multiple recursive references.: with cte as ( select a,b from tbl1 union all select x.a,x.b from cte x join cte y on x.a=y.a+1 ) select * from cte
Basically, the error is exactly what it says: the recursive CTE cannot be referenced more than ONCE in the recursive member. Above, the CTE is aliased as both x and y. There are various reasons for this limitation, such as the fact that CTEs are recursed depth-first and not generation by generation.
What you should think about is why you would need it more than once. Your recursive portion doesn't make sense.
select rec1.who_acts,rec1.on_whom_influence
from recurse as rec1
inner join
( select rec2.who_acts,rec2.on_whom_influence
from recurse as rec2) as P on (1=1)
On the surface, the following are true if recurse were a real table (non-CTE):
1. The number of rows generated is count(recurse as [rec1]) x count(recurse as [rec2]).
2. The rows in recurse (rec1) are each replicated once per row in recurse, hence #1.
3. Columns from rec2 are never used. rec2 serves only to multiply.
If this were permitted to run, the recursive portion of the query would keep quadratically increasing its number of rows and never finish.
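The same restriction can be observed in SQLite, used here as a stand-in for SQL Server. The usual fix is to reference the CTE once and join it to the base table instead. The second query is a hypothetical illustration that extends each pair found so far by one more edge from tbl1, a transitive-closure-style walk. It is not the asker's actual connecting condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO tbl1 VALUES (?, ?)", [(1, 2), (11, 12), (2, 3), (4, 5)])

# Two references to cte in the recursive member: SQLite rejects this as well.
try:
    conn.execute("""
        WITH RECURSIVE cte(a, b) AS (
            SELECT a, b FROM tbl1
            UNION ALL
            SELECT x.a, x.b FROM cte x JOIN cte y ON x.a = y.a + 1
        )
        SELECT * FROM cte""").fetchall()
    double_ref_ok = True
except sqlite3.Error as exc:
    double_ref_ok = False
    print("rejected:", exc)

# One reference: join the rows found so far to the base table, not to the CTE itself.
paths = conn.execute("""
    WITH RECURSIVE cte(a, b) AS (
        SELECT a, b FROM tbl1
        UNION ALL
        SELECT c.a, t.b FROM cte c JOIN tbl1 t ON t.a = c.b
    )
    SELECT a, b FROM cte ORDER BY a, b""").fetchall()
print(paths)  # [(1, 2), (1, 3), (2, 3), (4, 5), (11, 12)]
```

Because each iteration adds at most one row per existing row, the recursion stops once no new edge matches, instead of growing quadratically.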

Compare SQL groups against each other

How can one filter a grouped resultset for only those groups that meet some criterion compared against the other groups? For example, only those groups that have the maximum number of constituent records?
I had thought that a subquery as follows should do the trick:
SELECT * FROM (
SELECT *, COUNT(*) AS Records
FROM T
GROUP BY X
) t HAVING Records = MAX(Records);
However the addition of the final HAVING clause results in an empty recordset... what's going on?
In MySQL (which I assume you are using, since you posted SELECT *, COUNT(*) FROM T GROUP BY X, which would fail in every other RDBMS I know of), you can use:
SELECT T.*
FROM T
INNER JOIN
( SELECT X, COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
) T2
ON T2.X = T.X
This has been tested in MySQL and removes the implicit grouping/aggregation.
If you can use window functions together with either TOP/LIMIT WITH TIES or common table expressions, it becomes even shorter:
Windowed function + CTE: (MS SQL-Server & PostgreSQL Tested)
WITH CTE AS
( SELECT *, COUNT(*) OVER(PARTITION BY X) AS Records
FROM T
)
SELECT *
FROM CTE
WHERE Records = (SELECT MAX(Records) FROM CTE)
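The tie handling of the window-function version can be checked in SQLite, which supports window functions from version 3.25. SQLite is an assumption here, not one of the tested engines, and the table T with its id column and rows is made up. Two groups tie for the largest size, and all of their rows come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER, X TEXT)")
# Hypothetical data: groups 'a' and 'b' tie with two rows each, 'c' has one.
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 [(1, "a"), (2, "a"), (3, "b"), (4, "b"), (5, "c")])

# Each row carries the size of its group; keep rows whose group size is the maximum.
rows = conn.execute("""
    WITH CTE AS (
        SELECT *, COUNT(*) OVER (PARTITION BY X) AS Records
        FROM T
    )
    SELECT id, X, Records
    FROM CTE
    WHERE Records = (SELECT MAX(Records) FROM CTE)
    ORDER BY id""").fetchall()
print(rows)  # [(1, 'a', 2), (2, 'a', 2), (3, 'b', 2), (4, 'b', 2)]
```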
Windowed Function with TOP (MS SQL-Server Tested)
SELECT TOP 1 WITH TIES *
FROM ( SELECT *, COUNT(*) OVER(PARTITION BY X) [Records]
FROM T
) AS t
ORDER BY Records DESC
Lastly, I have never used Oracle, so apologies for not adding a solution that works on Oracle...
EDIT
My solution for MySQL did not take ties into account, and my suggested fix for that steps on the toes of what you said you want to avoid (duplicate subqueries), so I'm not sure I can help after all. However, in case it is preferable, here is a version that works as required on your fiddle:
SELECT T.*
FROM T
INNER JOIN
( SELECT X
FROM T
GROUP BY X
HAVING COUNT(*) =
( SELECT COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
)
) T2
ON T2.X = T.X
For the exact question you give, one way to look at it is that you want the groups for which no other group has more records. So if you write
SELECT taxid, COUNT(*) as howMany
FROM flats
GROUP by taxid
you get all taxids and their counts.
Then you can treat that expression as a table by making it a subquery and giving it an alias. Below I give two "copies" of the query the names X and Y and ask for the taxids in X for which no group in Y has more rows. If two or more taxids tie for the highest count, all of them are returned. Different databases have proprietary syntax, notably TOP and LIMIT, that makes this kind of query simpler and easier to understand.
SELECT taxid FROM
(select taxid, count(*) as HowMany from flats
GROUP by taxid) as X
WHERE NOT EXISTS
(
SELECT * from
(
SELECT taxid, count(*) as HowMany FROM
flats
GROUP by taxid
) AS Y
WHERE Y.howmany > X.howmany
)
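The NOT EXISTS formulation runs unchanged in SQLite, used below as a stand-in engine with a made-up flats table. Two taxids tie for the highest count, and both are returned without any TOP or LIMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flats (id INTEGER, taxid INTEGER)")
# Hypothetical data: taxids 10 and 20 both have three flats, 30 has one.
conn.executemany("INSERT INTO flats VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 10), (4, 20), (5, 20), (6, 20), (7, 30)])

# Keep each group X for which no group Y has a strictly larger count.
top = conn.execute("""
    SELECT taxid FROM
        (SELECT taxid, COUNT(*) AS HowMany FROM flats GROUP BY taxid) AS X
    WHERE NOT EXISTS (
        SELECT * FROM
            (SELECT taxid, COUNT(*) AS HowMany FROM flats GROUP BY taxid) AS Y
        WHERE Y.HowMany > X.HowMany
    )
    ORDER BY taxid""").fetchall()
print(top)  # [(10,), (20,)]
```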
Try this:
SELECT * FROM (
SELECT *, MAX(Records) OVER () AS max_records FROM (
SELECT X, COUNT(*) AS Records
FROM T
GROUP BY X
) t
) t2 WHERE Records = max_records
I'm sorry that I can't test the validity of this query right now.