SQL Join and count relations

I'm having a PostgreSQL query problem and wondering whether there's an efficient way to get this in a single query. Take the following simple table structure; think of it as a traditional many-to-many relationship.
users <-> user_collections <-> collections
Given a user's id, I'd like to first get all of their collections. This is the simple part, for which I have a query:
SELECT c.id, c.name, c.description, c.created_at, c.updated_at
FROM collections c
JOIN user_collections uc ON c.id = uc.collection_id
WHERE uc.user_id = $1
ORDER BY created_at DESC
So for example:
users
id | email
1 | user1@example.com
2 | user2@example.com
3 | user3@example.com
user_collections
id | user_id | collection_id
1 | 1 | 1
2 | 2 | 1
3 | 3 | 1
collections
id | name | description
1 | Example | Demo collection
In the above case, querying for collections for user one would yield the first collection. However I'd also like to get a count of how many users are associated with each collection. In this case, a total count of 3, since all three members share this collection. A member count if you will. Is there a sensible way to do this in one query, or is two probably better?

An alternative to a correlated subquery is to pre-calculate the number of users in each collection, then join that result to your existing query, i.e.:
with collection_counts as (
select
collection_id
, count(1) as collection_count
from user_collections
group by collection_id
)
SELECT
c.id
, c.name
, c.description
, c.created_at
, c.updated_at
, cc.collection_count
FROM collections c
JOIN user_collections uc ON c.id = uc.collection_id
join collection_counts as cc on c.id = cc.collection_id
WHERE uc.user_id = $1
ORDER BY created_at DESC

You need a correlated subquery here to get the total number of users for each collection (correlated subquery help: https://www.geeksforgeeks.org/sql-correlated-subqueries/):
SELECT c.* , (select count(user_id) from user_collections sc where sc.collection_id=uc.collection_id) as GroupCount
FROM collections c
JOIN user_collections uc ON c.id = uc.collection_id
WHERE uc.user_id = $1
ORDER BY created_at DESC

One method is conditional aggregation:
SELECT c.id, c.name, c.description, c.created_at, c.updated_at,
COUNT(*) as num_users
FROM collections c JOIN
user_collections uc
ON c.id = uc.collection_id
GROUP BY c.id
HAVING COUNT(*) FILTER (WHERE uc.user_id = $1) > 0
ORDER BY created_at DESC;
That said, it might be faster to do:
SELECT c.id, c.name, c.description, c.created_at, c.updated_at,
(SELECT COUNT(*)
FROM user_collections uc
WHERE c.id = uc.collection_id
) as num_users
FROM collections c
WHERE EXISTS (SELECT 1
FROM user_collections uc
WHERE c.id = uc.collection_id AND
uc.user_id = $1
)
ORDER BY created_at DESC;
This would be faster for two reasons:
It avoids the outer aggregation. Aggregations on larger data are generally more expensive.
It calculates the count only for the collections that are in the result set.
This can also make use of indexes on the table -- which if you care about performance, you should have.
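
As a rough, runnable check of the EXISTS plus correlated-count query above, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL, `?` replaces the `$1` placeholder, the `created_at`/`updated_at` columns are left out (so the ordering is by `c.id`), and the second collection is a hypothetical extra row added only to show that counts stay per collection.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    CREATE TABLE user_collections (id INTEGER PRIMARY KEY, user_id INTEGER, collection_id INTEGER);
    -- Collection 2 and membership row 4 are hypothetical additions to the question's data.
    INSERT INTO collections VALUES (1, 'Example', 'Demo collection'), (2, 'Other', 'Hypothetical');
    INSERT INTO user_collections VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 2, 2);
""")

query = """
    SELECT c.id, c.name,
           (SELECT COUNT(*)
            FROM user_collections uc
            WHERE uc.collection_id = c.id) AS num_users
    FROM collections c
    WHERE EXISTS (SELECT 1
                  FROM user_collections uc
                  WHERE uc.collection_id = c.id
                    AND uc.user_id = ?)
    ORDER BY c.id
"""

# The count covers all members of a collection, not just the requesting user.
rows_user1 = conn.execute(query, (1,)).fetchall()
rows_user2 = conn.execute(query, (2,)).fetchall()
print(rows_user1)
print(rows_user2)
```

User 1 belongs only to the shared collection, so one row comes back with a member count of 3; user 2 also sees the hypothetical second collection with a count of 1.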

Related

SQL LIMIT by distinct column value without subqueries

user
id | name | age
1 | anna | 6
2 | john | 10
3 | lord | 50
cats
id | name | userID
1 | miez | 1
2 | caty | 1
3 | random | 2
4 | idk | 3
When using
SELECT U.id, C.name FROM user U
INNER JOIN cats C ON U.id = C.userID
LIMIT 2
I get this result:
UserID | CatName
1 | miez
1 | caty
What I want is to limit my rows by the distinct values of UserID, like this
SELECT U.id, C.name FROM user U
INNER JOIN cats C ON U.id = C.userID
LIMIT 2 <distinct U.id rows>
UserID | CatName
1 | miez
1 | caty
2 | random
People suggested using LIMIT in a subquery and checking whether UserID is in its result, like
... WHERE UserID IN (SELECT id FROM User LIMIT 2)
but this only works well for small tables and is not an elegant solution for good performance.
My idea was using DENSE_RANK(), like:
SELECT U.id, C.name FROM user U
DENSE_RANK() OVER (ORDER BY U.id) as rows,
INNER JOIN cats C ON U.id = C.id
WHERE rows < 50
but it is not working either.
You can't use a column alias on the same level where you define it; you will have to wrap the query in a derived table. However, if you want a specific number of rows per user, you need to use PARTITION BY, not ORDER BY:
select id, name
from (
SELECT u.id,
c.name,
DENSE_RANK() OVER (PARTITION BY U.id ORDER BY c.name) as rnk
FROM user U
JOIN cats C ON U.id = C.userid
) t
WHERE t.rnk <= 2
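
The query above caps the number of cats per user. For the other reading of the question (keep every row belonging to the first N distinct user ids), a sketch along the same derived-table lines would rank by user id without a partition. This is an illustration rather than the answerer's query: it runs on SQLite through Python's sqlite3 (window functions need SQLite 3.25 or newer), the table is named `users` instead of `user` to avoid the reserved word in PostgreSQL, and the `cat_id` column is added only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database; requires SQLite 3.25+ for DENSE_RANK().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE cats (id INTEGER PRIMARY KEY, name TEXT, userID INTEGER);
    INSERT INTO users VALUES (1, 'anna', 6), (2, 'john', 10), (3, 'lord', 50);
    INSERT INTO cats VALUES (1, 'miez', 1), (2, 'caty', 1), (3, 'random', 2), (4, 'idk', 3);
""")

query = """
    SELECT id, name
    FROM (
        SELECT u.id,
               c.name,
               c.id AS cat_id,
               -- Rows of the same user share one rank, so the rank counts distinct users.
               DENSE_RANK() OVER (ORDER BY u.id) AS user_rnk
        FROM users u
        JOIN cats c ON c.userID = u.id
    ) t
    WHERE t.user_rnk <= ?
    ORDER BY id, cat_id
"""

# Keep all cat rows of the first two distinct users.
rows = conn.execute(query, (2,)).fetchall()
print(rows)
```

With the sample data this keeps both of user 1's cats plus user 2's cat, which matches the result the question asks for.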

GROUP BY Subquery returns more than one row

I'm looking for a way to solve the following situation. I need to return only one number for each "p.pays". The query is supposed to list "nom" from the table "Pays" for the countries where at least half of the athletes ("Athlete") appear in the table "Resultat", but my subquery returns more than one row. Is there a way to match "p.code" between the outer query and the subquery so that it returns only one row per "p.code"?
SELECT p.nom , count(*) FROM Athlete a
INNER JOIN Pays p ON a.pays = p.code
GROUP BY p.code HAVING count(*)/2 >= (SELECT count(*) FROM Athlete a
INNER JOIN Pays p ON a.pays = p.code
INNER JOIN Resultat r ON a.code = r.athlete
GROUP BY p.code);
Expected result: show the countries ("Pays") where at least half of the athletes ("Athlete") have won a medal (i.e. the athlete is in the Resultat table). Here count(*) is the total number of athletes in the country:
p.nom | count(*)
Albania | 134
Argentina | 203
... | ...
You want two counts of athletes per country:
all athletes
the athletes that appear in resultat
Use a conditional count for this:
SELECT p.nom, count(*)
FROM pays p
INNER JOIN athlete a ON a.pays = p.code
GROUP BY p.code
HAVING COUNT(*) / 2 >=
COUNT(*) FILTER (WHERE a.code IN (SELECT athlete FROM resultat))
ORDER BY p.nom;
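
To see the conditional count at work, here is a small sketch on invented data (the rows below are hypothetical, not from the question). It runs on SQLite via Python's sqlite3 and uses the portable `SUM(CASE ...)` form in place of `COUNT(*) FILTER (...)`. The comparison is written as `2 * with_result >= total`, which is one way to read "at least half" and also sidesteps integer division; flip it if the opposite reading is intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pays (code TEXT PRIMARY KEY, nom TEXT);
    CREATE TABLE athlete (code INTEGER PRIMARY KEY, pays TEXT);
    CREATE TABLE resultat (athlete INTEGER);
    -- Hypothetical sample rows: 2 Albanian athletes, 3 Argentinian athletes.
    INSERT INTO pays VALUES ('ALB', 'Albania'), ('ARG', 'Argentina');
    INSERT INTO athlete VALUES (1, 'ALB'), (2, 'ALB'), (3, 'ARG'), (4, 'ARG'), (5, 'ARG');
    -- Athlete 1 has two results; the IN test still counts that athlete once.
    INSERT INTO resultat VALUES (1), (1), (3);
""")

query = """
    SELECT p.nom,
           COUNT(*) AS total,
           SUM(CASE WHEN a.code IN (SELECT athlete FROM resultat) THEN 1 ELSE 0 END) AS with_result
    FROM pays p
    JOIN athlete a ON a.pays = p.code
    GROUP BY p.code, p.nom
    HAVING 2 * SUM(CASE WHEN a.code IN (SELECT athlete FROM resultat) THEN 1 ELSE 0 END) >= COUNT(*)
    ORDER BY p.nom
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Albania qualifies (1 of 2 athletes has a result), Argentina does not (1 of 3), and both counts come from a single pass over the joined rows, so no subquery has to return one row per country.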

Making a MAX() query from a subquery with COUNT()

I need to show the name of the boat which made the most trips, so I made a query that counts the trips:
SELECT B.IdBoat, COUNT(T.IdTrip)
FROM Trip T INNER JOIN Boat B ON T.IdBoat=B.IdBoat
GROUP BY B.IdBoat
Now I need to show the name of the one with the MAX trips, how do I use that query as a subquery, without using the ORDER BY DESC and TOP 1 but using MAX?
Currently got:
SELECT B.Name
FROM Trip T INNER JOIN Boat B ON T.IdBoat=B.IdBoat
WHERE B.IdBoat = MAX( the sub query above)
also tried
SELECT B.Name, T.IdTrip
FROM Boat B INNER JOIN Trip T ON B.IdBoat=T.IdBoat
WHERE B.IdBoat IN (
SELECT MAX(T.NTrips) FROM
(SELECT B.IdBoat AS [IdBoat], COUNT(T.IdTrip) AS [NTrips]
FROM Trip T INNER JOIN Boat B ON B.IdBoat=T.IdBoat
GROUP BY B.Boat) T
GROUP BY T.IdBoat)
The above returned the full count of 3 on the name of the boat instead of the correct 2.
I've tried googling and searching about said problem on stackoverflow and others but can't adapt their solution to my query, any help is good help.
Thank you.
edit 1. As asked, I'll provide some data as to help understand the problem better
Table Boat:
IdBoat | Name
1 | 'SS Sparrow'
2 | 'SS AndaNoMar'
Table Trip
IdTrip | IdBoat
1 | 1
2 | 1
3 | 2
Subquery 1 (COUNT)
IdBoat | NTrips
2 | 1
1 | 2
You can do:
with
x as (
select
b.idBoat,
b.Name,
count(*) as cnt
from trip t
join boat b on b.idBoat = t.idBoat
group by b.idBoat, b.Name
),
m as (
select max(cnt) as max_cnt from x
)
select
x.*
from x
join m on m.max_cnt = x.cnt
SELECT
B.IdBoat,
B.Name,
T.Trips
FROM
Boat AS B
INNER JOIN
(
SELECT
IdBoat,
COUNT(*) AS Trips,
RANK() OVER (ORDER BY COUNT(*) DESC)
AS TripsRank
FROM
Trip
GROUP BY
IdBoat
)
AS T
ON T.IdBoat = B.IdBoat
WHERE
T.TripsRank = 1
A better method than either of the other two answers is to use ORDER BY:
SELECT TOP (1) B.IdBoat, B.Name, COUNT(T.IdTrip) as cnt
FROM Trip T INNER JOIN
Boat B
ON T.IdBoat = B.IdBoat
GROUP BY B.IdBoat, B.Name
ORDER BY cnt DESC;
There is no need for subqueries or CTEs or window functions.
If you want ties, then you can use TOP (1) WITH TIES.
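
For the variant the question asks for (MAX instead of TOP 1 with ORDER BY), the CTE approach can be tried on the sample data from the edit. The sketch below runs on SQLite through Python's sqlite3 rather than SQL Server, and the CTE name `trip_counts` is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Boat (IdBoat INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Trip (IdTrip INTEGER PRIMARY KEY, IdBoat INTEGER);
    INSERT INTO Boat VALUES (1, 'SS Sparrow'), (2, 'SS AndaNoMar');
    INSERT INTO Trip VALUES (1, 1), (2, 1), (3, 2);
""")

query = """
    WITH trip_counts AS (
        -- One row per boat with its number of trips.
        SELECT b.IdBoat, b.Name, COUNT(*) AS NTrips
        FROM Trip t
        JOIN Boat b ON b.IdBoat = t.IdBoat
        GROUP BY b.IdBoat, b.Name
    )
    SELECT Name, NTrips
    FROM trip_counts
    -- Keep every boat whose count equals the maximum, so ties are returned too.
    WHERE NTrips = (SELECT MAX(NTrips) FROM trip_counts)
"""

rows = conn.execute(query).fetchall()
print(rows)
```

On the sample data this returns the SS Sparrow with 2 trips.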

Oracle: How to use left outer join to get all entries from left table and satisfying the condition in Where clause

I have the tables below.
Client:
ID | clientName
--------------
1 A1
2 A2
3 A3
Order:
OrdID clientID status_cd
------------------------
100 1 DONE
101 1 SENT
102 3 SENT
Status:
status_cd status_category
DONE COMPL
SENT INPROG
I have to write a query that returns all clients together with a count of their orders whose status category is "COMPL", whether or not the client's ID exists in the Order table.
In this case, I am using the query below but it's filtering out the clients which has no orders. I want to get all clients such that the expected result is as below.
Query:
select c.ID, count(distinct o.OrdID)
from client c, order o, status s
where c.ID=o.client_id(+)
and o.status_cd=s.status_cd where s.status_category='COMPL'
group by c.ID
Expected result:
C.ID count(distinct o.OrdID)
----------------------------
1 1
2 0
3 0
Can someone please help me with this? I know, in this case, left outer join is behaving like inner join when I am using where clause, but is there any other way to achieve the results above?
This can be dealt with a lot easier when using an explicit join operator:
select c.ID, count(s.status_cd)
from client c
left join orders o on o.clientid = c.id
left join status s on s.status_cd = o.status_cd and s.status_category='COMPL'
group by c.ID;
The above assumes that orders.status_cd is defined as not null
Another option is to move the join between orders and status in a derived table:
select c.ID, count(distinct o.ordid)
from client c
left join (
select o.ordid, o.clientid
from orders o
join status s on s.status_cd = o.status_cd
where s.status_category='COMPL'
) o on o.clientid = c.id
group by c.ID;
The above "states" more clearly (at least in my eyes) that only orders within that status category are of interest compared to the first solution
As usual, there are lots of ways to express this requirement.
Try the old Oracle-style (non-ANSI) join; people will hate me and vote down this answer ;) :
select c.ID, count(distinct o.OrdID)
from client c, order o, status s
where c.ID = o.client_id(+)
and o.status_cd = s.status_cd
and s.status_category='COMPL'
group by c.ID
;
or
select c.ID
, nvl((select count(distinct o.OrdID)
from order o, status s
where c.ID = o.client_id
and o.status_cd = s.status_cd
and s.status_category='COMPL'
), 0) as order_count
from client c
group by c.ID
;
or
with ord as
(select client_id, count(distinct o.OrdID) cnt
from order o, status s
where 1=1
and o.status_cd = s.status_cd
and s.status_category='COMPL'
group by client_id
)
select c.ID
, nvl((select cnt from ord o where c.ID = o.client_id ), 0) as order_count
from client c
group by c.ID
;
or
...
The second WHERE should be an AND.
Other than that, you need the plus sign, (+), marking left outer join, in the second join condition as well. It is not enough to left-outer-join the first two tables.
Something like
select c.ID, count(distinct o.OrdID)
from client c, order o, status s
where c.ID=o.client_id(+)
and o.status_cd=s.status_cd(+) AND s.status_category='COMPL'
-- ^^^ ^^^ (not WHERE)
group by c.ID
Of course, it would be much better if you used proper (SQL Standard) join syntax.
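
As an illustration of that standard join syntax, the sketch below loads the question's sample rows into SQLite via Python's sqlite3 (standing in for Oracle) and keeps the category filter inside the ON clause, so clients without matching orders are still listed. The table name `orders` and the column `clientid` follow the naming used in the explicit-join answer; the alias `compl_orders` is made up here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (id INTEGER PRIMARY KEY, clientName TEXT);
    CREATE TABLE orders (ordid INTEGER PRIMARY KEY, clientid INTEGER, status_cd TEXT);
    CREATE TABLE status (status_cd TEXT PRIMARY KEY, status_category TEXT);
    INSERT INTO client VALUES (1, 'A1'), (2, 'A2'), (3, 'A3');
    INSERT INTO orders VALUES (100, 1, 'DONE'), (101, 1, 'SENT'), (102, 3, 'SENT');
    INSERT INTO status VALUES ('DONE', 'COMPL'), ('SENT', 'INPROG');
""")

query = """
    SELECT c.id,
           -- COUNT(column) skips NULLs, so only orders that matched a COMPL status count.
           COUNT(s.status_cd) AS compl_orders
    FROM client c
    LEFT JOIN orders o ON o.clientid = c.id
    LEFT JOIN status s ON s.status_cd = o.status_cd
                      AND s.status_category = 'COMPL'
    GROUP BY c.id
    ORDER BY c.id
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Moving `s.status_category = 'COMPL'` into a WHERE clause would discard the NULL-extended rows and turn the outer joins back into inner joins, which is the behaviour the question describes.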

SQL: How to save order in sql query?

I have PostgreSQL database and I try to print all my users (Person).
When I execute this query
-- show owners
-- sorted by maximum cars amount
SELECT p.id
FROM car c JOIN person p ON c.person_id = p.id
GROUP BY p.id
ORDER BY COUNT(p.name) ASC;
I get all owners sorted by cars amount
Output: 3 2 4 1
But the order goes wrong when I try to look up the owners by those ids:
SELECT *
FROM person p
WHERE p.id IN (
SELECT p.id
FROM car c JOIN person p ON c.person_id = p.id
GROUP BY p.id
ORDER BY COUNT(p.name) ASC);
Output: 1 2 3 4 and other data
You can see that the order is wrong. So here is my question: how can I preserve that order?
Instead of a subquery, use a join. Try this:
SELECT p.*
FROM person p
JOIN (SELECT p.id,
Count(p.NAME) AS cnt
FROM car c
JOIN person p
ON c.person_id = p.id
GROUP BY p.id) b
ON p.id = b.id
ORDER BY cnt ASC
Untangle the mess. Aggregate first, join later:
SELECT p.*
FROM person p
JOIN (
SELECT person_id, count(*) AS ct
FROM car
GROUP BY person_id
) c ON c.person_id = p.id
ORDER BY c.ct;
No need to join to person twice. This should be fastest if you count most or all rows.
For a small selection, correlated subqueries are faster:
SELECT p.*
FROM person p
WHERE p.id BETWEEN 10 AND 20 -- some very selective predicate
ORDER BY (SELECT count(*) FROM car c WHERE c.person_id = p.id);
As for your original: IN takes a set on the right-hand side and the order of its elements is ignored, so ORDER BY is pointless in the subquery.
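
The "aggregate first, join later" pattern can be checked with a small sketch. It runs on SQLite through Python's sqlite3 instead of PostgreSQL, and the car rows are hypothetical, chosen so that the owners come out in the 3 2 4 1 order reported in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE car (id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person VALUES (1, 'p1'), (2, 'p2'), (3, 'p3'), (4, 'p4');
    -- Hypothetical ownership: person 1 has 4 cars, 4 has 3, 2 has 2, 3 has 1.
    INSERT INTO car (person_id) VALUES (1), (1), (1), (1), (4), (4), (4), (2), (2), (3);
""")

query = """
    SELECT p.id, c.ct
    FROM person p
    JOIN (
        -- Aggregate once per owner before joining back to person.
        SELECT person_id, COUNT(*) AS ct
        FROM car
        GROUP BY person_id
    ) c ON c.person_id = p.id
    -- The sort belongs to the outer query; p.id only breaks ties.
    ORDER BY c.ct, p.id
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Because the ORDER BY sits on the outer query, the full person rows come back sorted by car count, which the IN subquery cannot guarantee.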