Accounting for nulls the correct way - sql

In the queries below I want to account for users with a NULL value in the distance column and not count them. Which of the three is the most optimal? I am also not sure how the HAVING IFNULL version works, but it removes any user with only NULL in the distance column or with 0 as the sum, which is what I wanted.
Having
SELECT
name,
SUM(distance) as distance_traveled
FROM users
LEFT JOIN rides
ON users.id = rides.passenger_user_id
GROUP BY name
HAVING IFNULL(SUM(distance), 0)
Coalesce
SELECT
name,
COALESCE(SUM(distance),0) as distance_traveled
FROM users
LEFT JOIN rides
ON users.id = rides.passenger_user_id
GROUP BY name
Not Null Filter
SELECT
name,
SUM(distance) as distance_traveled
FROM users
LEFT JOIN rides
ON users.id = rides.passenger_user_id
and distance is not null
GROUP BY name
Thanks

The SUM aggregate function, by default, will ignore NULL values. So, any names having a mixture of NULL and non NULL distances will only report the sum of the non NULL values. However, for the case of a name only having NULL distances, the sum would return NULL. Using COALESCE as you have done in the second version is a typical way of dealing with this:
SELECT u.name, COALESCE(SUM(r.distance), 0) AS distance_traveled
FROM users u
LEFT JOIN rides r
ON u.id = r.passenger_user_id
GROUP BY u.name;
If you want to remove any users having all NULL distances, then add the following HAVING clause:
HAVING COUNT(r.distance) > 0
If you want to filter out any users whose total distance is zero or NULL (i.e. no rides, only NULL distances, or only zero distances), then use this HAVING clause:
HAVING SUM(r.distance) > 0
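To see how the three variants differ, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the names ann, bob, cat and dan are made up for illustration; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rides (passenger_user_id INTEGER, distance REAL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat'), (4, 'dan');
    -- ann: NULL and non-NULL rides, bob: only NULL, cat: no rides, dan: zero distance
    INSERT INTO rides VALUES (1, 5.0), (1, NULL), (2, NULL), (4, 0.0);
""")

# Same query each time; only the HAVING clause changes.
base = """
    SELECT u.name, COALESCE(SUM(r.distance), 0) AS distance_traveled
    FROM users u
    LEFT JOIN rides r ON u.id = r.passenger_user_id
    GROUP BY u.name
    {having}
    ORDER BY u.name
"""

all_users = conn.execute(base.format(having="")).fetchall()
with_known_distance = conn.execute(
    base.format(having="HAVING COUNT(r.distance) > 0")).fetchall()
with_positive_total = conn.execute(
    base.format(having="HAVING SUM(r.distance) > 0")).fetchall()

print(all_users)            # every user, NULL sums shown as 0
print(with_known_distance)  # users with at least one non-NULL distance
print(with_positive_total)  # users whose total is greater than zero
```

With this data, COUNT(r.distance) > 0 keeps ann and dan, while SUM(r.distance) > 0 keeps only ann.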

Related

Include 0 in count(*) SQL query

I have two entities, User and MaBase. MaBase contains user_id and status. I want to get the count of status by user, I also want to show a 0 for any status values where the user doesn't have a record.
I created the query below using COUNT, but it only returns rows for status values that actually have records. How can I solve this?
SELECT status, COUNT(*)
FROM ma_base
WHERE ma_base.user_id = 5
GROUP BY status
I have 5 types of status values. If a user only has ma_base records for 4 of them, I still want to see a 0 value for the 5th status.
It's not every day I get to write a CROSS JOIN:
SELECT u.ID, s.status,
coalesce((SELECT COUNT(*) FROM ma_base m WHERE m.User_Id = u.ID and m.status = s.Status),0) As Status_Count
FROM User u
CROSS JOIN (SELECT DISTINCT status FROM MA_Base) s
WHERE u.ID = 5
OR:
SELECT u.ID, s.status, COALESCE(COUNT(m.status), 0) AS Status_Count
FROM User u
CROSS JOIN (SELECT DISTINCT status FROM MA_Base) s
LEFT JOIN MA_Base m ON m.User_Id = u.ID AND m.status = s.status
WHERE u.ID = 5
GROUP BY u.ID, s.status
In a nutshell, we first need to create a projection for the user with every possible status value, to anchor the result records for your "missing" statuses. Then we can JOIN or do a correlated subquery to get your desired results.
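For the JOIN option, note the expression in the COUNT() function. It's important: COUNT(*) would also count the NULL-extended row that the LEFT JOIN produces for a missing status, so it would report 1 instead of 0. In both options the COALESCE() is only a safeguard, since COUNT() returns 0 rather than NULL when nothing matches.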
If you have a separate table defining your status values, use that instead of deriving them from ma_base.
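The difference between COUNT(m.status) and COUNT(*) in the JOIN option can be checked with a small sqlite3 sketch. The table name app_user (used instead of User, which is a reserved word in some databases) and the sample status values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY);
    CREATE TABLE ma_base (user_id INTEGER, status TEXT);
    INSERT INTO app_user VALUES (5), (6);
    -- user 5 has no 'closed' record; that status only exists for user 6
    INSERT INTO ma_base VALUES (5, 'new'), (5, 'new'), (5, 'open'), (6, 'closed');
""")

query = """
    SELECT u.id, s.status, {count_expr} AS status_count
    FROM app_user u
    CROSS JOIN (SELECT DISTINCT status FROM ma_base) s
    LEFT JOIN ma_base m ON m.user_id = u.id AND m.status = s.status
    WHERE u.id = 5
    GROUP BY u.id, s.status
    ORDER BY s.status
"""

# COUNT(m.status) skips the NULL row produced by the LEFT JOIN.
status_counts = conn.execute(query.format(count_expr="COUNT(m.status)")).fetchall()
# COUNT(*) counts that NULL row as well, so the missing status shows 1.
count_star = conn.execute(query.format(count_expr="COUNT(*)")).fetchall()

print(status_counts)
print(count_star)
```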

How to count all null values of right join by nesting it?

SELECT COUNT(Orders.EmployeeID)
FROM Orders
WHERE (Orders.EmployeeID IS NULL)
AND (IN(SELECT Orders.EmployeeID
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID))
GROUP BY Orders.EmplyoeeID;
You need to tell us which DBMS you are using: MySQL or SQL Server? You have tagged both!
In SQL Server:
SELECT COUNT( CASE
WHEN Orders.EmployeeID IS NULL THEN 1
ELSE NULL
END
)
FROM Orders
RIGHT JOIN Employees
ON Orders.EmployeeID = Employees.EmployeeID;
If you pass a column name to the COUNT function, it won't count the NULL values. So in order to count the NULL values, you can use CASE to detect them and make COUNT count them (WHEN Orders.EmployeeID IS NULL THEN 1), and you also need to make COUNT skip the non-NULL values (ELSE NULL).
Read more about COUNT here: https://learn.microsoft.com/en-us/sql/t-sql/functions/count-transact-sql?view=sql-server-2017
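As a rough check of that behaviour, the sketch below runs the same CASE expression through sqlite3. It is written as Employees LEFT JOIN Orders, which is equivalent to the RIGHT JOIN above but also runs on older SQLite versions; the OrderID column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER);
    INSERT INTO Employees VALUES (1), (2), (3);
    -- only employee 1 has orders; employees 2 and 3 have none
    INSERT INTO Orders VALUES (10, 1), (11, 1);
""")

unmatched, by_column, total = conn.execute("""
    SELECT COUNT(CASE WHEN o.EmployeeID IS NULL THEN 1 ELSE NULL END),
           COUNT(o.EmployeeID),  -- skips NULLs
           COUNT(*)              -- counts every joined row
    FROM Employees e
    LEFT JOIN Orders o ON o.EmployeeID = e.EmployeeID
""").fetchone()

print(unmatched, by_column, total)
```

Here unmatched is the number of employees without any order, which is what the CASE expression is meant to count.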

Postgres Join and return flag if a row exists

I am very certain that this is possible in SQL but I am not sure how to implement this. I am using PostgreSQL
I have 2 tables
users with columns id, name and created_date
user_docs with columns id, value
I want to write a select query which returns all users table columns, along with another column called has_docs which indicates whether the user has any document rows in the user_docs table.
Can someone help?
You can LEFT JOIN the two tables and check whether the joined value is NOT NULL:
SELECT u.id,
u.name,
u.created_date,
CASE WHEN ud.value IS NOT NULL
THEN 'Y'
ELSE 'N'
END has_docs
FROM users u
LEFT JOIN user_docs ud
ON u.id = ud.id
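One thing to watch with the join approach: a user with several rows in user_docs comes back once per document. The sketch below (sqlite3, made-up sample rows) shows that, next to an EXISTS variant that returns one row per user. The EXISTS form is an alternative added here for comparison, not part of the answer above, and it assumes, as the answer does, that user_docs.id holds the user's id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created_date TEXT);
    CREATE TABLE user_docs (id INTEGER, value TEXT);
    INSERT INTO users VALUES (1, 'ann', '2020-01-01'), (2, 'bob', '2020-01-02');
    -- ann has two documents, bob has none
    INSERT INTO user_docs VALUES (1, 'a'), (1, 'b');
""")

# LEFT JOIN version: one output row per matching document.
joined = conn.execute("""
    SELECT u.id, u.name,
           CASE WHEN ud.value IS NOT NULL THEN 'Y' ELSE 'N' END AS has_docs
    FROM users u
    LEFT JOIN user_docs ud ON u.id = ud.id
    ORDER BY u.id
""").fetchall()

# EXISTS version: always one output row per user.
flagged = conn.execute("""
    SELECT u.id, u.name,
           CASE WHEN EXISTS (SELECT 1 FROM user_docs ud WHERE ud.id = u.id)
                THEN 'Y' ELSE 'N' END AS has_docs
    FROM users u
    ORDER BY u.id
""").fetchall()

print(joined)
print(flagged)
```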

get COUNT to return 0 when EXISTS() subquery is false

Given the following query:
SELECT M.year, COUNT(C.eid)
FROM Card AS C, Month AS M
WHERE EXISTS(SELECT 1 FROM Charge AS CH
WHERE CH.usingcard=C.eid AND CH.year=M.year)
GROUP BY M.year
Here, EXISTS is used so that the number of 'Charge' rows matching a given year/card does not change the count. The problem is that I would like a result row for each possible 'Month' row: when there are no matching 'Charge' rows, I would like to get 0 as the count, whereas I currently get no row at all.
Without EXISTS, I could use an outer join. I could also probably use a UNION with a query returning 0 for the case where NOT EXISTS().
Does anyone have a smarter idea?
This avoids the problem of multiple Charge records:
SELECT M.year, COUNT(distinct CH.usingcard)
FROM Card AS C
CROSS JOIN Month AS M
LEFT JOIN Charge CH on CH.usingcard=C.eid AND CH.year=M.year
GROUP BY M.year
This counts how many different cards were charged in the year. Non-joining rows will have a CH.usingcard of NULL, which won't be counted.
The EXISTS is excluding zero counts, as expected.
If you want zero counts, then you need the OUTER JOIN. However, your FROM clause is a CROSS JOIN, so you'd get wrong counts. And you can ignore the Card table because usingcard has the same data (otherwise you wouldn't use it in your EXISTS).
It is this simple...
SELECT
M.year, COUNT(CH.usingcard)
FROM
Month AS M
LEFT JOIN
Charge AS CH On CH.year = M.year
GROUP BY
M.year
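The two queries above are not interchangeable: COUNT(DISTINCT CH.usingcard) counts cards, while COUNT(CH.usingcard) counts charges. A small sqlite3 sketch makes the difference visible. The sample rows are invented, and Month is assumed to hold one row per year to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Card (eid INTEGER PRIMARY KEY);
    CREATE TABLE Month (year INTEGER);
    CREATE TABLE Charge (usingcard INTEGER, year INTEGER);
    INSERT INTO Card VALUES (1), (2);
    INSERT INTO Month VALUES (2020), (2021);
    -- card 1 is charged twice in 2020, card 2 once; nothing in 2021
    INSERT INTO Charge VALUES (1, 2020), (1, 2020), (2, 2020);
""")

# Distinct cards charged per year; years without charges report 0.
cards_per_year = conn.execute("""
    SELECT M.year, COUNT(DISTINCT CH.usingcard)
    FROM Card AS C
    CROSS JOIN Month AS M
    LEFT JOIN Charge AS CH ON CH.usingcard = C.eid AND CH.year = M.year
    GROUP BY M.year
    ORDER BY M.year
""").fetchall()

# Charges per year; a card charged twice is counted twice.
charges_per_year = conn.execute("""
    SELECT M.year, COUNT(CH.usingcard)
    FROM Month AS M
    LEFT JOIN Charge AS CH ON CH.year = M.year
    GROUP BY M.year
    ORDER BY M.year
""").fetchall()

print(cards_per_year)
print(charges_per_year)
```

If the goal is one count per card regardless of how many charges it has, COUNT(DISTINCT CH.usingcard) can also be used in the simpler Month LEFT JOIN Charge query.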
If all arguments are NULL, COALESCE returns NULL.
COALESCE(expression1,...n) is equivalent to the following CASE expression:
CASE
WHEN (expression1 IS NOT NULL) THEN expression1
WHEN (expression2 IS NOT NULL) THEN expression2
...
ELSE expressionN
END
The following example shows how COALESCE selects the data from the first column that has a nonnull value.
SELECT Name, Class, Color, ProductNumber,
COALESCE(Class, Color, ProductNumber) AS FirstNotNull
FROM Production.Product ;
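You could use the NULLIF function, which can be used inside the COUNT function. This function takes two parameters: if the parameters are equal, it returns NULL; otherwise it returns the value of the first parameter. Note that a comparison with NULL never counts as equal, so NULLIF(CH.usingcard, NULL) simply returns CH.usingcard; the zero counts come from COUNT ignoring NULLs.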
SELECT M.year, COUNT (NULLIF (CH.usingcard, NULL))
FROM Month as M left join Charge as CH
ON CH.Year = M.Year
GROUP BY M.year
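For reference, the behaviour of NULLIF and COALESCE described above can be checked directly. This is a minimal sketch using sqlite3 with literal values chosen purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute("""
    SELECT NULLIF(5, 5),              -- equal arguments: NULL
           NULLIF(5, 3),              -- different arguments: first argument
           NULLIF(5, NULL),           -- NULL is never equal: first argument
           NULLIF(NULL, NULL),        -- first argument is already NULL
           COALESCE(NULL, NULL, 'x'), -- first non-NULL argument
           COALESCE(NULL, NULL)       -- all arguments NULL: NULL
""").fetchone()

print(row)
```

Python's sqlite3 module returns SQL NULL as None.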

Problem With DISTINCT!

Here is my query:
SELECT
DISTINCT `c`.`user_id`,
`c`.`created_at`,
`c`.`body`,
(SELECT COUNT(*) FROM profiles_comments c2 WHERE c2.user_id = c.user_id AND c2.profile_id = 1) AS `comments_count`,
`u`.`username`,
`u`.`avatar_path`
FROM `profiles_comments` AS `c` INNER JOIN `users` AS `u` ON u.id = c.user_id
WHERE (c.profile_id = 1) ORDER BY `u`.`id` DESC;
It works. The problem though is with the DISTINCT word. As I understand it, it should select only one row per c.user_id.
But what I get is even 4-5 rows with the same c.user_id column. Where is the problem?
Actually, DISTINCT does not limit itself to one column. Basically, when you say:
SELECT DISTINCT a, b
What you're saying is, "give me the distinct value of a and b combined" .. just like a multi-column UNIQUE index
DISTINCT ensures that each combination of ALL the values in your SELECT clause is unique, not just user_id. If you want to limit the results to individual user_ids, you should GROUP BY user_id.
Perhaps what you want is:
SELECT
`c`.`user_id`,
`u`.`username`,
`u`.`avatar_path`,
(SELECT COUNT(*) FROM profiles_comments c2 WHERE c2.user_id = c.user_id AND c2.profile_id = 1) AS `comments_count`
FROM `profiles_comments` AS `c` INNER JOIN `users` AS `u` ON u.id = c.user_id
WHERE (c.profile_id = 1)
GROUP BY `c`.`user_id`,
`u`.`username`,
`u`.`avatar_path`
ORDER BY `u`.`id` DESC;
DISTINCT works at a row level, not just a column level
If you want the DISTINCT of only one column, then you will have to group by that column and aggregate the rest of the columns returned (MIN, MAX, SUM, AVG, etc.):
SELECT Name, MIN(ID)
FROM MyTable
GROUP BY Name
DISTINCT returns only unique rows; it will not return only one row per user id in your example.
http://dev.mysql.com/doc/refman/5.0/en/distinct-optimization.html
You misunderstand. The DISTINCT modifier applies to the entire row — it states that no two identical ROWS will be returned in the result set.
Looking at your SQL, what value of the several available do you expect to see returned in the created_at column (for instance)? It would be impossible to predict the results of the query as written.
Also, you're using profiles_comments twice in your SELECT. It appears that you're trying to obtain a count of how many times each user has commented. If so, what you want to do is use an AGGREGATE query, grouped on user_id and including only those columns that uniquely identify a user along with a COUNT of the comments:
SELECT user_id, COUNT(*) FROM profiles_comments WHERE profile_id = 1 GROUP BY user_id
You can add the join to users to get the user name if you want but, logically, your result set cannot include other columns from profiles_comments and still produce only a single row per user_id unless those columns are also aggregated in some way:
SELECT user_id, MIN(created_at) AS Earliest, MAX(created_at) AS Latest, COUNT(*) FROM profiles_comments WHERE profile_id = 1 GROUP BY user_id
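To illustrate the difference between DISTINCT over several columns and a GROUP BY with aggregates, here is a small sqlite3 sketch. The sample comments are made up; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profiles_comments (
        user_id INTEGER, profile_id INTEGER, created_at TEXT, body TEXT
    );
    -- user 7 commented twice on profile 1, user 8 once
    INSERT INTO profiles_comments VALUES
        (7, 1, '2020-01-01', 'first'),
        (7, 1, '2020-01-02', 'second'),
        (8, 1, '2020-01-03', 'hello');
""")

# DISTINCT applies to the (user_id, created_at) pair, so user 7 appears twice.
distinct_rows = conn.execute("""
    SELECT DISTINCT user_id, created_at
    FROM profiles_comments
    WHERE profile_id = 1
    ORDER BY user_id, created_at
""").fetchall()

# GROUP BY yields one row per user; the other columns are aggregated.
grouped_rows = conn.execute("""
    SELECT user_id, MIN(created_at) AS Earliest, MAX(created_at) AS Latest, COUNT(*)
    FROM profiles_comments
    WHERE profile_id = 1
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(distinct_rows)
print(grouped_rows)
```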