Why sometimes a subquery can work like using 'group by'

I'm new to SQL and can't understand why a subquery can sometimes work like 'group by'.
Say there are two tables in a database.
'foods' is a table created by:
CREATE TABLE foods (
id integer PRIMARY KEY,
type_id integer,
name text
);
'foods_episodes' is a table created by:
CREATE TABLE foods_episodes (
food_id integer,
episode_id integer
);
Now I'm running the following two queries, and they produce the same result.
SELECT name, (SELECT count(*) FROM foods_episodes WHERE food_id=f.id) AS frequency
FROM foods AS f
ORDER BY name;
SELECT name, count(*) AS frequency
FROM foods_episodes,
foods AS f
WHERE food_id=f.id
GROUP BY name;
So why does the subquery in the first query work as if it grouped the result by name?
When I run the subquery alone:
SELECT count(*)
FROM foods_episodes,
foods f
WHERE food_id=f.id
the result is just one row. Why can the same SQL, used as a subquery, produce a multi-row result?

The first query isn't actually grouping by name. If you have more than one record with the same name (but different ids), you will see that name displayed more than once (hence, not grouped).
The first query uses what is called a correlated subquery: it evaluates the subquery (the inner SELECT) once for each row of the outermost SELECT. Because the FROM of the outermost SELECT is just the table foods, you get one record for each food plus the result of the subquery, so there is no need to group.
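The difference can be seen on a small data set. The following is a minimal sketch using Python's built-in sqlite3 module; the sample rows are hypothetical (two foods that share the name 'Bagel' and one food, 'Soup', that appears in no episode), and the extra f.id in the ORDER BY is only there to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foods (id integer PRIMARY KEY, type_id integer, name text);
CREATE TABLE foods_episodes (food_id integer, episode_id integer);
-- Hypothetical sample data: two different foods share the name 'Bagel',
-- and 'Soup' never appears in any episode.
INSERT INTO foods VALUES (1, 1, 'Bagel'), (2, 1, 'Bagel'), (3, 2, 'Soup');
INSERT INTO foods_episodes VALUES (1, 10), (1, 11), (2, 12);
""")

# Correlated subquery: evaluated once per row of foods, so one output row per food.
correlated = conn.execute("""
SELECT name, (SELECT count(*) FROM foods_episodes WHERE food_id = f.id) AS frequency
FROM foods AS f
ORDER BY name, f.id
""").fetchall()

# Join + GROUP BY: one output row per distinct name that has at least one match.
grouped = conn.execute("""
SELECT name, count(*) AS frequency
FROM foods_episodes, foods AS f
WHERE food_id = f.id
GROUP BY name
ORDER BY name
""").fetchall()

print(correlated)  # [('Bagel', 2), ('Bagel', 1), ('Soup', 0)]
print(grouped)     # [('Bagel', 3)]
```

With this data the two queries no longer agree: the correlated version keeps both 'Bagel' rows and reports 0 for 'Soup', while the grouped version merges the two 'Bagel' foods and drops 'Soup' entirely.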

Related

Return all data when grouping on a field

I have the following 2 tables (there are more fields in the real tables):
create table publisher(id serial not null primary key,
name text not null);
create table product(id serial not null primary key,
name text not null,
publisherRef int not null references publisher(id));
Sample data:
insert into publisher (id,name) values (1,'pub1'),(2,'pub2'),(3,'pub3');
insert into product (name,publisherRef) values('p1',1),('p2',2),('p3',2),('p4',2),('p5',3),('p6',3);
And I would like the query to return:
name, numProducts
pub2, 3
pub3, 2
pub1, 1
A product is published by a publisher. Now I need a list of the name and id of all publishers which have at least one product, ordered by the total number of products each publisher has.
I can get the id of the publishers ordered by number of products with:
select publisherRef AS id, count(*)
from product
group by publisherRef
order by count(*) desc;
But I also need the name of each publisher in the result. I thought I could use a subquery like:
select *
from publisher
where id in (
select publisherRef
from product
order by count(*) desc)
But the order of rows in the subquery is lost in the outer SELECT.
Is there any way to do this with a single SQL query?

SELECT pub.name, pro.num_products
FROM (
SELECT publisherref AS id, count(*) AS num_products
FROM product
GROUP BY 1
) pro
JOIN publisher pub USING (id)
ORDER BY 2 DESC;
Or (since the title mentions "all data") return all columns of the publisher with pub.*. After products have been aggregated in the subquery, you are free to list anything in the outer SELECT.
This only lists publishers which "have at least one product", and the result is ordered by "the total number of products each publisher has".
It's typically faster to aggregate the "n"-table before joining to the "1"-table. Then use an [INNER] JOIN (not a LEFT JOIN) to exclude publishers without products.
Note that the order of rows in an IN expression (or items in the given list - there are two syntax variants) is insignificant.
The column alias in publisherref AS id is entirely optional; it only serves to allow the simpler USING clause, which requires identical column names, in the following join condition.
Aside: avoid CaMeL-case names in Postgres. Use unquoted, legal, lowercase names exclusively to make your life easier.
See: Are PostgreSQL column names case-sensitive?
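The aggregate-then-join approach can be tried out with a minimal sketch like the one below. It uses Python's built-in sqlite3 module rather than Postgres, so serial is replaced by integer PRIMARY KEY, and it adds a hypothetical publisher 'pub4' without products to show that the inner join excludes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (id integer PRIMARY KEY, name text NOT NULL);
CREATE TABLE product (
    id integer PRIMARY KEY,
    name text NOT NULL,
    publisherref integer NOT NULL REFERENCES publisher(id)
);
-- 'pub4' is a hypothetical extra publisher with no products.
INSERT INTO publisher (id, name) VALUES (1,'pub1'),(2,'pub2'),(3,'pub3'),(4,'pub4');
INSERT INTO product (name, publisherref)
VALUES ('p1',1),('p2',2),('p3',2),('p4',2),('p5',3),('p6',3);
""")

# Aggregate the "n" table first, then join the small result to the "1" table.
publisher_counts = conn.execute("""
SELECT pub.name, pro.num_products
FROM (
    SELECT publisherref AS id, count(*) AS num_products
    FROM product
    GROUP BY publisherref
) AS pro
JOIN publisher AS pub USING (id)
ORDER BY pro.num_products DESC
""").fetchall()

print(publisher_counts)  # [('pub2', 3), ('pub3', 2), ('pub1', 1)]
```

The ordering lives in the outer query, which is the only place where it is guaranteed to affect the final result.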

Is it possible to query rows with multiple counts of appearances in other tables?

I am dealing with a problem. I have to create a query that returns, for each row of my table, the number of occurrences of a specific key (let's call it keyid) in other tables.
It has to look like this:
SELECT
parameter1,
parameter2,
parameter3,
parameter4,
parameter5(count of keyid occurrences in specific table),
parameter6(count of keyid occurrences in another specific table),
parameter7(count of keyid occurrences in another specific table),
parameter8(count of keyid occurrences in another specific table)
This is what I have so far:
SELECT
keyid, name, section, address, updatedAt,
(SELECT COUNT(library.keyid) AS storeCount
FROM library
LEFT JOIN store ON library.keyid = store.keyid
GROUP BY library.keyid)
FROM
library
But I get an error:
ERROR: more than one row returned by a subquery used as an expression
SQL state: 21000
This is because the subquery returns multiple rows, each containing a count of keyid occurrences, for the single row that I take from library. Any ideas on how to fix this problem?

Presumably, you want a correlated subquery:
SELECT keyid, name, section, address, updatedAt,
(SELECT COUNT(*)
FROM store s
WHERE l.keyid = s.keyid
) AS storeCount
FROM library l;
Note that the GROUP BY is unnecessary. Your real problem is that your query is not correlated, because library is repeated in the subquery instead of being referenced from the outer query.
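The same pattern extends to several counts, one correlated subquery per table. Below is a minimal sketch using Python's built-in sqlite3 module; the loan table, the reduced column list and all sample rows are hypothetical and only stand in for the "other tables" from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE library (keyid integer PRIMARY KEY, name text);
CREATE TABLE store (keyid integer);
CREATE TABLE loan (keyid integer);  -- hypothetical second table to count in
INSERT INTO library VALUES (1, 'north'), (2, 'south'), (3, 'east');
INSERT INTO store VALUES (1), (1), (2);
INSERT INTO loan VALUES (2), (2), (2), (3);
""")

# Each scalar subquery refers to l.keyid from the outer row, so it returns
# exactly one value (a count) per library row.
counts_per_library = conn.execute("""
SELECT l.keyid, l.name,
       (SELECT COUNT(*) FROM store s WHERE s.keyid = l.keyid) AS storeCount,
       (SELECT COUNT(*) FROM loan n WHERE n.keyid = l.keyid) AS loanCount
FROM library l
ORDER BY l.keyid
""").fetchall()

print(counts_per_library)
# [(1, 'north', 2, 0), (2, 'south', 1, 3), (3, 'east', 0, 1)]
```

Rows without any match still appear, with a count of 0, because COUNT(*) over an empty set returns 0 rather than no row.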

Order by date, while grouping matches by another column

I have this query
SELECT *, COUNT(app.id) AS totalApps FROM users JOIN app ON app.id = users.id
GROUP BY app.id ORDER BY app.time DESC LIMIT ?
which is supposed to get all results from "users" ordered by another column (time) in a related table (the id from the app tables references the id from the users table).
The issue I have is that the grouping is done before the ordering by date, so I get very old results. But I need the grouping in order to get distinct users, because each user can have multiple 'apps'... Is there a different way to achieve this?
Table users:
id TEXT PRIMARY KEY
Table app:
id TEXT
time DATETIME
FOREIGN KEY(id) REFERENCES users(id)
In my SELECT query I want to get a list of users, ordered by the app.time column. But because one user can have multiple app records associated, I could get duplicate users; that's why I used GROUP BY. But then the order is messed up.

The underlying issue is that the SELECT is an aggregate query as it contains a GROUP BY clause :-
There are two types of simple SELECT statement - aggregate and
non-aggregate queries. A simple SELECT statement is an aggregate query
if it contains either a GROUP BY clause or one or more aggregate
functions in the result-set.
SQL As Understood By SQLite - SELECT
Thus the value of a column that is not part of the grouping will, for each group, be an arbitrary value of that column from within the group (the first according to scan/search, I suspect, hence the lower values) :-
If the SELECT statement is an aggregate query without a GROUP BY
clause, then each aggregate expression in the result-set is evaluated
once across the entire dataset. Each non-aggregate expression in the
result-set is evaluated once for an arbitrarily selected row of the
dataset. The same arbitrarily selected row is used for each
non-aggregate expression. Or, if the dataset contains zero rows, then
each non-aggregate expression is evaluated against a row consisting
entirely of NULL values.
So, in short, you cannot rely upon column values that aren't part of the group/aggregation when it's an aggregate query.
Therefore you have to retrieve the required values using an aggregate expression, such as max(app.time). SQLite does allow ORDER BY on such a value, but in my test below it did not sort the way I expected, most likely because the time column there is declared TEXT, so the values are compared as text rather than as numbers.
HOWEVER
What you can do is use the query to build a CTE and then sort without aggregates involved.
Consider the following, which I think mimics your problem:-
DROP TABLE IF EXISTS users;
DROP TABLE IF EXISTS app;
CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, username TEXT);
INSERT INTO users (username) VALUES ('a'),('b'),('c'),('d');
CREATE TABLE app (the_id INTEGER PRIMARY KEY, id INTEGER, appname TEXT, time TEXT);
INSERT INTO app (id,appname,time) VALUES
(4,'app9',721),(4,'app10',7654),(4,'app11',11),
(3,'app1',1000),(3,'app2',7),
(2,'app3',10),(2,'app4',101),(2,'app5',1),
(1,'app6',15),(1,'app7',7),(1,'app8',212),
(4,'app9',721),(4,'app10',7654),(4,'app11',11),
(3,'app1',1000),(3,'app2',7),
(2,'app3',10),(2,'app4',101),(2,'app5',1),
(1,'app6',15),(1,'app7',7),(1,'app8',212)
;
SELECT * FROM users;
SELECT * FROM app;
SELECT username
,count(app.id)
, max(app.time) AS latest_time
, min(app.time) AS earliest_time
FROM users JOIN app ON users.id = app.id
GROUP BY users.id
ORDER BY max(app.time)
;
In the result, although the latest time for each group has been extracted, the rows haven't been sorted as you would think.
Wrapping it into a CTE can fix that e.g. :-
WITH cte1 AS
(
SELECT username
,count(app.id)
, max(app.time) AS latest_time
, min(app.time) AS earliest_time
FROM users JOIN app ON users.id = app.id
GROUP BY users.id
)
SELECT * FROM cte1 ORDER BY cast(latest_time AS INTEGER) DESC;
and now the rows come out ordered by latest_time.
Note that simple integers have been used instead of real times for my convenience.
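A minimal sketch of the CTE approach, run through Python's built-in sqlite3 module, is shown here. It is a variation on the test above, not the same script: the time column is declared INTEGER instead of TEXT (an assumption made for this sketch) so that the values compare numerically and no cast is needed, and each app row is inserted only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
-- Assumption for this sketch: time is INTEGER, so it sorts numerically.
CREATE TABLE app (the_id INTEGER PRIMARY KEY, id INTEGER, appname TEXT, time INTEGER);
INSERT INTO users (username) VALUES ('a'),('b'),('c'),('d');
INSERT INTO app (id, appname, time) VALUES
    (4,'app9',721),(4,'app10',7654),(4,'app11',11),
    (3,'app1',1000),(3,'app2',7),
    (2,'app3',10),(2,'app4',101),(2,'app5',1),
    (1,'app6',15),(1,'app7',7),(1,'app8',212);
""")

# Aggregate inside the CTE, then sort the finished rows in the outer query.
users_by_latest = conn.execute("""
WITH cte1 AS (
    SELECT username,
           count(app.id) AS total_apps,
           max(app.time) AS latest_time
    FROM users JOIN app ON users.id = app.id
    GROUP BY users.id
)
SELECT username, total_apps, latest_time
FROM cte1
ORDER BY latest_time DESC
""").fetchall()

print(users_by_latest)
# [('d', 3, 7654), ('c', 2, 1000), ('a', 3, 212), ('b', 3, 101)]
```

If the column has to stay TEXT, the cast in the outer ORDER BY, as in the CTE query above, is what forces a numeric comparison.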

Since you need the newest date in every group, you could just MAX them:
SELECT
*,
COUNT(app.id) AS totalApps,
MAX(app.time) AS latestDate
FROM users
JOIN app ON app.id = users.id
GROUP BY app.id
ORDER BY latestDate DESC
LIMIT ?
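To see this variant run end to end, here is a minimal sketch with Python's built-in sqlite3 module. The sample rows are hypothetical, the ids and times are plain integers for brevity, and the column list is spelled out instead of * so that every selected column is either grouped or aggregated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE app (id INTEGER REFERENCES users(id), time INTEGER);
-- Hypothetical sample data: integer timestamps stand in for real dates.
INSERT INTO users (id) VALUES (1), (2), (3);
INSERT INTO app (id, time) VALUES (1, 15), (1, 212), (2, 101), (3, 1000), (3, 7);
""")

# One row per user, newest app first; the LIMIT is bound as a parameter.
latest_users = conn.execute("""
SELECT users.id,
       COUNT(app.id) AS totalApps,
       MAX(app.time) AS latestDate
FROM users
JOIN app ON app.id = users.id
GROUP BY users.id
ORDER BY latestDate DESC
LIMIT ?
""", (2,)).fetchall()

print(latest_users)  # [(3, 2, 1000), (1, 2, 212)]
```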

You could use windowed COUNT:
SELECT *, COUNT(app.id) OVER(PARTITION BY app.id) AS totalApps
FROM users
JOIN app
ON app.id = users.id
ORDER BY app.time DESC
LIMIT ?

Maybe you could use SELECT DISTINCT?
Read more here: https://www.w3schools.com/sql/sql_distinct.asp

Try grouping by id and time, and then ordering by time.
select ...
group by app.id desc, app.time
I assume that id is unique in the app table.
And how do you assign the id? Maybe ordering by id desc is enough.

MS Access Count unique values of one table appearing in second table which is related to a third table

I am working on my lab database and am close to completing it, but I am stuck on one query and a few similar queries, which all give similar results.
Here is the query in design mode
and this is what it gives out
This query is counting the number of ID values in table PatientTestIDs, whereas I want to count the number of unique PatientID values grouped by department.
I have even tried the Unique Values and Unique Records properties, but it gives the same result every time.

What you want requires two queries.
Query1:
SELECT DISTINCT PatientID, DepartmentID FROM PatientTestIDs;
Query2:
SELECT Count(*) AS PatientsPerDept, DepartmentID FROM Query1 GROUP BY DepartmentID;
Nested all in one:
SELECT Count(*) AS PatientsPerDept, DepartmentID FROM (SELECT DISTINCT PatientID, DepartmentID FROM PatientTestIDs) AS Query1 GROUP BY DepartmentID;
You can include the Departments table in query 2 (or the nested version) to pull in descriptive fields but will have to include those additional fields in the GROUP BY.
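The effect of the inner DISTINCT can be checked with a minimal sketch like the following. It runs the nested query in SQLite through Python's built-in sqlite3 module rather than in Access, and the table layout and sample rows are hypothetical: patient 1 has two tests and patient 3 has three tests in their departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical layout: one row per test, several tests per patient.
CREATE TABLE PatientTestIDs (ID integer PRIMARY KEY, PatientID integer, DepartmentID integer);
INSERT INTO PatientTestIDs (PatientID, DepartmentID) VALUES
    (1, 10), (1, 10), (2, 10), (3, 20), (3, 20), (3, 20);
""")

# Plain count: counts test rows, so repeated patients are counted repeatedly.
tests_per_dept = conn.execute("""
SELECT Count(*) AS TestsPerDept, DepartmentID
FROM PatientTestIDs
GROUP BY DepartmentID
ORDER BY DepartmentID
""").fetchall()

# Nested version: reduce to distinct (patient, department) pairs first, then count.
patients_per_dept = conn.execute("""
SELECT Count(*) AS PatientsPerDept, DepartmentID
FROM (SELECT DISTINCT PatientID, DepartmentID FROM PatientTestIDs) AS Query1
GROUP BY DepartmentID
ORDER BY DepartmentID
""").fetchall()

print(tests_per_dept)     # [(3, 10), (3, 20)]
print(patients_per_dept)  # [(2, 10), (1, 20)]
```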

SQL: Find duplicates and for each duplicate group assign value of first duplicate of that group

I have the results in the top table. I would like the results in the bottom table.
Using an SQL query on the table above, I would like to find groups of duplicates (where the values in all columns except Id and Category are identical) and from that create a result that has for each entry the lowest Id from its group of duplicates and the (unmodified) Category from the original table.

Window function min can be used here:
select min(id) over (partition by first_name, last_name, company) id,
category
from t;
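As a quick check of the window-function approach, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (version 3.25 or later); the column layout and sample rows are hypothetical, and the result column is named group_id here only to keep it distinct from t.id in the ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id integer PRIMARY KEY, first_name text, last_name text,
                company text, category text);
-- Hypothetical data: rows 1, 2 and 4 are duplicates apart from id and category.
INSERT INTO t VALUES
    (1, 'Ann', 'Lee', 'Acme', 'A'),
    (2, 'Ann', 'Lee', 'Acme', 'B'),
    (3, 'Bob', 'Ray', 'Initech', 'A'),
    (4, 'Ann', 'Lee', 'Acme', 'C');
""")

# Every row keeps its own category but gets the lowest id of its duplicate group.
lowest_id_rows = conn.execute("""
SELECT min(id) OVER (PARTITION BY first_name, last_name, company) AS group_id,
       category
FROM t
ORDER BY t.id
""").fetchall()

print(lowest_id_rows)  # [(1, 'A'), (1, 'B'), (3, 'A'), (1, 'C')]
```

Unlike GROUP BY, the window function does not collapse rows, which is why all four categories survive.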
