Include 0 in count(*) SQL query

I have two entities, User and MaBase. MaBase contains user_id and status. I want to get the count of each status per user, and I also want to show a 0 for any status value where the user doesn't have a record.
I created the query below using COUNT, but it only returns the statuses the user actually has rows for. How can I solve this?
SELECT status, COUNT(*)
FROM ma_base
WHERE ma_base.user_id = 5
GROUP BY status
I have 5 types of status values. If a user only has ma_base records for 4 of them, I still want to see a 0 value for the 5th status.

It's not every day I get to write a CROSS JOIN:
SELECT u.ID, s.status,
coalesce((SELECT COUNT(*) FROM ma_base m WHERE m.User_Id = u.ID and m.status = s.Status),0) As Status_Count
FROM User u
CROSS JOIN (SELECT DISTINCT status FROM MA_Base) s
WHERE u.ID = 5
OR:
SELECT u.ID, s.status, COALESCE(COUNT(m.status), 0) AS Status_Count
FROM User u
CROSS JOIN (SELECT DISTINCT status FROM MA_Base) s
LEFT JOIN MA_Base m ON m.User_Id = u.ID AND m.status = s.status
WHERE u.ID = 5
GROUP BY u.ID, s.status
In a nutshell, we first need to create a projection for the user with every possible status value, to anchor the result records for your "missing" statuses. Then we can JOIN or do a correlated subquery to get your desired results.
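For the JOIN option, note the expression in the COUNT() function. It's important: COUNT(*) would also count the NULL-extended row that the LEFT JOIN produces for a missing status, reporting 1 instead of 0, while COUNT(m.status) skips that NULL. In both options, COALESCE() is only a safety net: COUNT() returns 0, never NULL, when nothing matches.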
If you have a separate table defining your status values, use that instead of deriving them from ma_base.
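A minimal sketch of both options using Python's built-in sqlite3 module, so it can be run anywhere. The table names users and ma_base and the statuses 'A', 'B', 'C' are hypothetical stand-ins for the question's User/MaBase data. It compares COUNT(m.status) with COUNT(*) in the JOIN option and shows that the correlated subquery already yields 0 without COALESCE():

```python
import sqlite3

# Hypothetical data: user 5 has rows for statuses 'A' and 'B', but not 'C'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE ma_base (user_id INTEGER, status TEXT);
    INSERT INTO users (id) VALUES (5), (6);
    INSERT INTO ma_base (user_id, status) VALUES
        (5, 'A'), (5, 'A'), (5, 'B'),
        (6, 'C');
""")


def join_counts(count_expr):
    """CROSS JOIN every status onto user 5, LEFT JOIN the real rows, then count."""
    sql = f"""
        SELECT s.status, {count_expr} AS status_count
        FROM users u
        CROSS JOIN (SELECT DISTINCT status FROM ma_base) s
        LEFT JOIN ma_base m ON m.user_id = u.id AND m.status = s.status
        WHERE u.id = 5
        GROUP BY u.id, s.status
        ORDER BY s.status
    """
    return conn.execute(sql).fetchall()


# Correlated-subquery option: COUNT(*) over zero matching rows is already 0.
correlated = conn.execute("""
    SELECT s.status,
           (SELECT COUNT(*) FROM ma_base m
            WHERE m.user_id = u.id AND m.status = s.status) AS status_count
    FROM users u
    CROSS JOIN (SELECT DISTINCT status FROM ma_base) s
    WHERE u.id = 5
    ORDER BY s.status
""").fetchall()

print(join_counts("COUNT(m.status)"))  # [('A', 2), ('B', 1), ('C', 0)]
print(join_counts("COUNT(*)"))         # [('A', 2), ('B', 1), ('C', 1)]
print(correlated)                      # [('A', 2), ('B', 1), ('C', 0)]
```

The only difference between the first two calls is the argument to COUNT(), which is why the missing status 'C' flips from 0 to 1.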

Related

Get users with item count <= 1 in sql

We have these tables in PostgreSQL 12:
User -> id, name, email
items -> id, user_id, description
We want to run a query to find users that have 1 item or less.
I tried using a join statement and putting the count condition in the WHERE clause with this query:
select * from "user" inner join item on "user".id = item.user_id where count(item.user_id) < 1;
but it failed and gave me this error.
ERROR: aggregate functions are not allowed in WHERE
LINE 1: ...inner join item on "user".id = item.user_id where count(item...
So I'm thinking the query needs to be more technical.
Can anyone please help me with this? Thanks.
You can do the following (user is a reserved word in PostgreSQL, so the table name must be double-quoted):
select u.*
from "user" u
left join (
select user_id, count(*) as cnt from items group by user_id
) x on x.user_id = u.id
where x.cnt = 1 or x.cnt is null
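A quick way to check this answer is to run it against SQLite from Python. The users and items below are made up for illustration (alice has no items, bob one, carol two), and the table name "user" is double-quoted as PostgreSQL requires:

```python
import sqlite3

# Hypothetical data: alice has 0 items, bob has 1, carol has 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, user_id INTEGER, description TEXT);
    INSERT INTO "user" (id, name, email) VALUES
        (1, 'alice', 'alice@example.com'),
        (2, 'bob', 'bob@example.com'),
        (3, 'carol', 'carol@example.com');
    INSERT INTO items (user_id, description) VALUES
        (2, 'pen'), (3, 'cup'), (3, 'mug');
""")

# Pre-aggregate per user; the LEFT JOIN keeps users without items (x.cnt IS NULL).
rows = conn.execute("""
    SELECT u.name
    FROM "user" u
    LEFT JOIN (
        SELECT user_id, COUNT(*) AS cnt FROM items GROUP BY user_id
    ) x ON x.user_id = u.id
    WHERE x.cnt = 1 OR x.cnt IS NULL
    ORDER BY u.name
""").fetchall()

print(rows)  # [('alice',), ('bob',)]
```

Aggregating in the derived table first means the outer WHERE only sees plain columns, so no HAVING is needed.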
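If you only care about users who have at least one item, you don't technically need a JOIN: you can get the necessary data from the item table alone with GROUP BY. The trick is that you need to use HAVING instead of WHERE for aggregated data like COUNT(). Users with zero items have no rows in item, though, so they won't show up here:
SELECT user_id
FROM item
GROUP BY user_id
HAVING COUNT(id) <= 1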
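To include users with no items, and to see more fields from the user table, start from "user" and LEFT JOIN the items. COUNT(i.id) ignores the NULL of an unmatched row, so those users get a count of 0:
SELECT u.id, u.name, u.email
FROM "user" u
LEFT JOIN item i ON i.user_id = u.id
GROUP BY u.id, u.name, u.email
HAVING COUNT(i.id) <= 1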
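A small SQLite sketch, run from Python, compares grouping the item table alone with starting from "user" and using a LEFT JOIN, both filtered with HAVING COUNT(...) <= 1 since the question asks for one item or less. The rows are hypothetical (alice has no items, bob one, carol two):

```python
import sqlite3

# Hypothetical data: alice has 0 items, bob has 1, carol has 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, user_id INTEGER, description TEXT);
    INSERT INTO "user" (id, name, email) VALUES
        (1, 'alice', 'alice@example.com'),
        (2, 'bob', 'bob@example.com'),
        (3, 'carol', 'carol@example.com');
    INSERT INTO item (user_id, description) VALUES
        (2, 'pen'), (3, 'cup'), (3, 'mug');
""")

# item table only: alice never appears because she has no item rows.
item_only = conn.execute("""
    SELECT user_id FROM item
    GROUP BY user_id
    HAVING COUNT(id) <= 1
    ORDER BY user_id
""").fetchall()

# Starting from "user": COUNT(i.id) is 0 for alice's NULL-extended row.
with_join = conn.execute("""
    SELECT u.id, u.name, u.email
    FROM "user" u
    LEFT JOIN item i ON i.user_id = u.id
    GROUP BY u.id, u.name, u.email
    HAVING COUNT(i.id) <= 1
    ORDER BY u.id
""").fetchall()

print(item_only)  # [(2,)]
print(with_join)  # [(1, 'alice', 'alice@example.com'), (2, 'bob', 'bob@example.com')]
```

Counting a column from the joined table (i.id) rather than COUNT(*) is what lets the zero-item user through.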

SELECT DISTINCT + ORDER BY additional expression

I have no experience with PostgreSQL. I am migrating a Rails5+MySQL application to Rails5+PostgreSQL and I am having a problem with a query.
I've already looked at some questions/answers and still haven't been able to solve my problem. My problem seems to be ridiculous, but I needed to ask for help here!
Query:
SELECT DISTINCT users.* FROM users
INNER JOIN areas_users ON areas_users.user_id = users.id
INNER JOIN areas ON areas.deleted_at IS NULL AND areas.id = areas_users.area_id
WHERE users.deleted_at IS NULL AND users.company_id = 2 AND areas.id IN (2, 4, 5)
ORDER BY CASE WHEN users.id=3 THEN 0 WHEN users.id=5 THEN 1 END, users.id, 1 ASC
Running the query in DBeaver, returns the error:
SQL Error [42P10]: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
What do I need to do to be able to use this SELECT DISTINCT with this ORDER BY CASE?
It's like the error message says:
for SELECT DISTINCT, ORDER BY expressions must appear in select list
This is an expression:
CASE WHEN users.id=3 THEN 0 WHEN users.id=5 THEN 1 END
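You cannot order by it while doing SELECT DISTINCT users.* FROM ..., because that only allows ORDER BY expressions that appear in the SELECT list. It doesn't help that the CASE expression only depends on users.id, which is part of users.*: the expression itself has to appear in the list.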
Typically, the best solution for DISTINCT is not to use it in the first place. If you don't duplicate rows, you don't have to de-duplicate them later. See:
How to speed up select distinct?
In your case, use an EXISTS semi-join (expression / subquery) instead of the joins. This avoids the duplication. Assuming distinct rows in table users, DISTINCT is out of a job.
SELECT u.*
FROM users u
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND EXISTS (
SELECT FROM areas_users au JOIN areas a ON a.id = au.area_id
WHERE au.user_id = u.id
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
)
ORDER BY CASE u.id WHEN 3 THEN 0
WHEN 5 THEN 1 END, u.id, 1; -- ①
This does what you asked, and is typically much faster, too.
Using simple ("switched") CASE syntax.
① There is still an ugly bit. Using a positional reference in ORDER BY can be convenient short syntax. But while you have SELECT *, it's a really bad idea. If the order of columns in the underlying table changes, your query is silently changed. Spell out the column in this use case!
(Typically, you don't need SELECT * in the first place, but just a selection of columns.)
If your ID column is guaranteed to hold positive numbers, this would be a bit faster:
...
ORDER BY CASE u.id WHEN 3 THEN -2
WHEN 5 THEN -1
ELSE u.id END, <name_of_first_column>
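A sketch in SQLite (via Python) of why the EXISTS version no longer needs DISTINCT: the plain joins return a user once per matching area, while the semi-join returns each user once. The schema and rows are hypothetical, SQLite needs SELECT 1 where PostgreSQL accepts an empty SELECT list, and the ORDER BY uses the positive-ID variant. SQLite does not raise the PostgreSQL DISTINCT/ORDER BY error, so this only demonstrates the row duplication:

```python
import sqlite3

# Hypothetical data (all users belong to company 2):
#   alice (1) is in areas 2 and 4, so the joins produce two rows for her
#   carol (3) is in area 5, eve (5) is in area 2
#   gus (7) is only in area 6, which is not in (2, 4, 5)
#   hank (8) is soft-deleted
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT,
                        company_id INTEGER, deleted_at TEXT);
    CREATE TABLE areas (id INTEGER PRIMARY KEY, deleted_at TEXT);
    CREATE TABLE areas_users (user_id INTEGER, area_id INTEGER);
    INSERT INTO users VALUES
        (1, 'alice', 2, NULL), (3, 'carol', 2, NULL), (5, 'eve', 2, NULL),
        (7, 'gus', 2, NULL), (8, 'hank', 2, '2020-01-01');
    INSERT INTO areas VALUES (2, NULL), (4, NULL), (5, NULL), (6, NULL);
    INSERT INTO areas_users VALUES (1, 2), (1, 4), (3, 5), (5, 2), (7, 6), (8, 2);
""")

# Plain joins: one output row per matching (user, area) pair.
joined = conn.execute("""
    SELECT u.id
    FROM users u
    JOIN areas_users au ON au.user_id = u.id
    JOIN areas a ON a.id = au.area_id
    WHERE u.deleted_at IS NULL AND u.company_id = 2
      AND a.id IN (2, 4, 5) AND a.deleted_at IS NULL
    ORDER BY u.id
""").fetchall()

# EXISTS semi-join: each qualifying user appears exactly once.
semi = conn.execute("""
    SELECT u.id, u.name
    FROM users u
    WHERE u.deleted_at IS NULL
      AND u.company_id = 2
      AND EXISTS (
          SELECT 1 FROM areas_users au JOIN areas a ON a.id = au.area_id
          WHERE au.user_id = u.id
            AND a.id IN (2, 4, 5)
            AND a.deleted_at IS NULL
      )
    ORDER BY CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END
""").fetchall()

print(joined)  # [(1,), (1,), (3,), (5,)]
print(semi)    # [(3, 'carol'), (5, 'eve'), (1, 'alice')]
```

Because the semi-join never multiplies rows, the ORDER BY can use any expression without a DISTINCT getting in the way.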
I MUST use DISTINCT
(Really?) If you insist:
SELECT DISTINCT CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END AS order_column, u.*
FROM users u
JOIN areas_users au ON au.user_id = u.id
JOIN areas a ON a.id = au.area_id
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
ORDER BY 1, <name_of_previously_first_column>; -- now, "ORDER BY 1" is ok
You get the additional column order_column in the result. You can wrap it in a subquery with a different SELECT ...
Just a proof of concept. Don't use this.
Or DISTINCT ON?
SELECT DISTINCT ON (CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END, <name_of_first_column>)
u.*
FROM users u
JOIN areas_users au ON au.user_id = u.id
JOIN areas a ON a.id = au.area_id
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
ORDER BY CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END, <name_of_first_column>;
This works without returning an additional column. Still just proof of concept. Don't use it, the EXISTS query is much cheaper.
See:
Select first row in each GROUP BY group?

Accounting for nulls the correct way

In my query below, I want to account for users with a NULL value in the distance column and not count them. Which of the options below is the best? I am also not sure how the HAVING IFNULL works, but it removes any user with NULL in the distance column or 0 as the sum, which is what I wanted.
Having
SELECT
name,
SUM(distance) as distance_traveled
FROM users
LEFT JOIN rides
ON users.id = rides.passenger_user_id
GROUP BY name
HAVING IFNULL(SUM(distance), 0)
Coalesce
SELECT
name,
COALESCE(SUM(distance),0) as distance_traveled
FROM users
LEFT JOIN rides
ON users.id = rides.passenger_user_id
GROUP BY name
Not Null Filter
SELECT
name,
SUM(distance) as distance_traveled
FROM users
LEFT JOIN rides
ON users.id = rides.passenger_user_id
and distance is not null
GROUP BY name
Thanks
The SUM aggregate function ignores NULL values by default. So, any names having a mixture of NULL and non-NULL distances will only report the sum of the non-NULL values. However, for a name with only NULL distances (or, because of the LEFT JOIN, no rides at all), the sum would return NULL. Using COALESCE as you have done in the second version is a typical way of dealing with this:
SELECT u.name, COALESCE(SUM(r.distance), 0) AS distance_traveled
FROM users u
LEFT JOIN rides r
ON u.id = r.passenger_user_id
GROUP BY u.name;
If you want to remove any users whose distances are all NULL (or who have no rides), add the following HAVING clause; COUNT(r.distance) only counts non-NULL values:
HAVING COUNT(r.distance) > 0
If you want to filter out any users whose total distance is zero or NULL (no rides, only NULL distances, or only zero distances), use this HAVING clause instead; a NULL sum fails the comparison just like 0 does:
HAVING SUM(r.distance) > 0
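A small SQLite sketch, run from Python, of how SUM, COALESCE and the two HAVING filters treat NULLs. The riders and distances are hypothetical: ann has one real distance and one NULL, ben only NULLs, cat no rides at all, and dan a single 0 distance:

```python
import sqlite3

# Hypothetical riders and rides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rides (passenger_user_id INTEGER, distance REAL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cat'), (4, 'dan');
    INSERT INTO rides VALUES (1, 5.0), (1, NULL), (2, NULL), (4, 0.0);
""")

BASE = """
    SELECT u.name, COALESCE(SUM(r.distance), 0) AS distance_traveled
    FROM users u
    LEFT JOIN rides r ON u.id = r.passenger_user_id
    GROUP BY u.name
"""


def run(having=""):
    """Run the COALESCE query with an optional HAVING clause, sorted by name."""
    return conn.execute(BASE + having + " ORDER BY u.name").fetchall()


print(run())                                # every user; NULL sums shown as 0
print(run("HAVING COUNT(r.distance) > 0"))  # drops ben (only NULLs) and cat (no rides)
print(run("HAVING SUM(r.distance) > 0"))    # keeps only ann
```

COALESCE only changes what is displayed; the HAVING clauses still see the raw SUM and COUNT values.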

Too much Data using DISTINCT MAX

I want to see the last activity of each individual handset and the user who used that handset. I have a table UserSessions that stores the last activity of a particular user as well as the handset they used in that activity. There are roughly 40 handsets, yet I always get back way too many records, around 10,000 rows, when I only want the last activity of each handset. What am I doing wrong?
SELECT DISTINCT MAX(UserSessions.LastActivity), Handsets.Name,Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE
Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY UserSessions.LastActivity, Handsets.Name,Users.Username
I expect to get one record per handset with the user's last activity on that handset. What I get instead is multiple records for all handsets and dates, over 10,000 rows.
You typically GROUP BY the same columns as you SELECT, except those that are arguments to set functions (aggregates such as MAX()).
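This GROUP BY returns no duplicates, so SELECT DISTINCT isn't needed. Note that it still returns one row per handset and user combination, so a handset used by several users appears several times.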
SELECT MAX(UserSessions.LastActivity), Handsets.Name, Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY Handsets.Name, Users.Username
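A short SQLite run from Python, with hypothetical handsets, users and sessions, confirms that this grouping gives one row per handset and user pair rather than one per handset. LastActivity is stored as ISO-8601 text here so that MAX() picks the latest date; the real column type is not shown in the question:

```python
import sqlite3

# Hypothetical data: handset 1000 was used by alice and bob, 1001 by bob and carol.
# LastActivity is ISO-8601 text, so MAX() returns the latest date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Handsets (HandsetId INTEGER PRIMARY KEY, Name INTEGER, Deleted INTEGER);
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE UserSessions (UserId INTEGER, HandsetId INTEGER, LastActivity TEXT);
    INSERT INTO Handsets VALUES (1, 1000, 0), (2, 1001, 0);
    INSERT INTO Users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO UserSessions VALUES
        (1, 1, '2024-01-01'), (1, 1, '2024-01-05'), (2, 1, '2024-01-03'),
        (2, 2, '2024-01-02'), (3, 2, '2024-01-04');
""")

# Grouping by handset and user (not by LastActivity) gives one row per pair.
rows = conn.execute("""
    SELECT MAX(UserSessions.LastActivity), Handsets.Name, Users.Username
    FROM UserSessions
    INNER JOIN Handsets ON Handsets.HandsetId = UserSessions.HandsetId
    INNER JOIN Users ON Users.UserId = UserSessions.UserId
    WHERE Handsets.Name IN (1000, 1001)
      AND Handsets.Deleted = 0
    GROUP BY Handsets.Name, Users.Username
    ORDER BY Handsets.Name, Users.Username
""").fetchall()

for row in rows:
    print(row)
```

Four rows come back for two handsets, which is why a further step is needed to keep just the latest user per handset.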
There is no such thing as DISTINCT MAX. You have SELECT DISTINCT, which ensures that the combination of all columns referenced in the SELECT is not duplicated across multiple rows. And there is MAX(), an aggregation function.
As a note: SELECT DISTINCT is almost never appropriate with GROUP BY.
You seem to want:
SELECT *
FROM (SELECT h.Name, u.Username, MAX(us.LastActivity) as last_activity,
RANK() OVER (PARTITION BY h.Name ORDER BY MAX(us.LastActivity) desc) as seqnum
FROM UserSessions us JOIN
Handsets h
ON h.HandsetId = us.HandsetId INNER JOIN
Users u
ON u.UserId = us.UserId
WHERE h.Name in (1000,1001,1002,1003,1004....) AND
h.Deleted = 0
GROUP BY h.Name, u.Username
) h
WHERE seqnum = 1
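The same kind of hypothetical data run through this RANK() query keeps only the most recent user per handset. The sketch uses Python's sqlite3 module and assumes the bundled SQLite is 3.25 or newer, the version that added window functions; the subquery alias is renamed to t for readability:

```python
import sqlite3

# Hypothetical data: handset 1000 was used by alice and bob, 1001 by bob and carol.
# LastActivity is ISO-8601 text, so MAX() returns the latest date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Handsets (HandsetId INTEGER PRIMARY KEY, Name INTEGER, Deleted INTEGER);
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE UserSessions (UserId INTEGER, HandsetId INTEGER, LastActivity TEXT);
    INSERT INTO Handsets VALUES (1, 1000, 0), (2, 1001, 0);
    INSERT INTO Users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO UserSessions VALUES
        (1, 1, '2024-01-01'), (1, 1, '2024-01-05'), (2, 1, '2024-01-03'),
        (2, 2, '2024-01-02'), (3, 2, '2024-01-04');
""")

# Inner query: latest activity per (handset, user), ranked within each handset.
# Outer query: keep rank 1, i.e. the most recent user of each handset.
rows = conn.execute("""
    SELECT Name, Username, last_activity
    FROM (SELECT h.Name, u.Username, MAX(us.LastActivity) AS last_activity,
                 RANK() OVER (PARTITION BY h.Name
                              ORDER BY MAX(us.LastActivity) DESC) AS seqnum
          FROM UserSessions us
          JOIN Handsets h ON h.HandsetId = us.HandsetId
          JOIN Users u ON u.UserId = us.UserId
          WHERE h.Name IN (1000, 1001) AND h.Deleted = 0
          GROUP BY h.Name, u.Username
         ) t
    WHERE seqnum = 1
    ORDER BY Name
""").fetchall()

print(rows)  # [(1000, 'alice', '2024-01-05'), (1001, 'carol', '2024-01-04')]
```

RANK() returns every tied row, so two users with the exact same latest timestamp on one handset would both be kept; ROW_NUMBER() would pick just one.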

PostgreSQL more than one row returned by a subquery used as an expression

Running PostgreSQL 9.6.
I'm trying to output rows consisting of a value and a list of names.
This is my query:
SELECT name, (SELECT car_name FROM cars WHERE user = id)
FROM users WHERE user_id = 1 ORDER BY name;
But it fails with:
ERROR: more than one row returned by a subquery used as an expression
It of course makes sense, but I would like the nested query to be output as a list or JSON.
I've tried with row_to_json, but that fails also.
Use an aggregation function, such as string_agg() or json_agg():
SELECT name,
(SELECT string_agg(car_name, ', ') FROM cars WHERE user = id)
FROM users
WHERE user_id = 1
ORDER BY name;
You can also use a JOIN instead:
SELECT u.name, string_agg(c.car_name, ', ')
FROM users u LEFT OUTER JOIN
cars c
ON c.id = u.user
WHERE u.user_id = 1
GROUP BY u.name
ORDER BY u.name;
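PostgreSQL's string_agg() has no counterpart in older SQLite versions, so this sketch swaps in SQLite's group_concat(expr, separator) to show the same idea from Python: once the subquery aggregates, it returns one value per user instead of one row per car. The users/cars schema with an owner_id column is hypothetical, since the question doesn't show its tables. SQLite, unlike PostgreSQL, silently uses the first row of a multi-row scalar subquery instead of raising the error, so only the aggregated behavior is shown:

```python
import sqlite3

# Hypothetical schema: each car row points at its owner through owner_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cars (id INTEGER PRIMARY KEY, owner_id INTEGER, car_name TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO cars (owner_id, car_name) VALUES (1, 'Civic'), (1, 'Golf'), (2, 'Polo');
""")

# Scalar subquery: the aggregate collapses all of a user's cars into one value.
subquery = conn.execute("""
    SELECT name,
           (SELECT group_concat(car_name, ', ') FROM cars WHERE owner_id = users.id)
    FROM users
    ORDER BY name
""").fetchall()

# JOIN + GROUP BY: same idea, one row per user.
joined = conn.execute("""
    SELECT u.name, group_concat(c.car_name, ', ')
    FROM users u
    LEFT JOIN cars c ON c.owner_id = u.id
    GROUP BY u.name
    ORDER BY u.name
""").fetchall()

print(subquery)
print(joined)
```

The order of names inside each concatenated string is not guaranteed by group_concat; in PostgreSQL you can add ORDER BY inside the aggregate call, e.g. string_agg(car_name, ', ' ORDER BY car_name).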