I am trying to count the total number of times that each individual column is greater than zero, grouped by driver name. Right now I have:
SELECT drivername
, COUNT(over_rpm) AS RPMViolations
, COUNT(over_spd) AS SpdViolations
, COUNT(brake_events) AS BrakeEvents
FROM performxbydriverdata
WHERE over_rpm > 0
OR over_spd > 0
OR brake_events > 0
GROUP BY drivername
This gives me all of the non-zero values, but the output looks like this:
Bob Smith 62 62 62
Nathan Jones 65 65 65
etc.
I'm trying to get a count of the non-zero values in each individual column; each violation type should be counted separately.
Use NULLIF to change zero to NULL; COUNT ignores NULLs:
SELECT drivername,
COUNT(NULLIF(over_rpm,0)) AS RPMViolations,
COUNT(NULLIF(over_spd,0)) AS SpdViolations,
COUNT(NULLIF(brake_events,0)) AS BrakeEvents
FROM performxbydriverdata
GROUP BY drivername;
You can probably remove the WHERE clause too with this query, which may improve performance:
OR conditions often perform badly because they make it hard to use a good index.
Using HAVING (as per other answers) will remove any rows where all 3 aggregates are zero, which may or may not be useful for you. You can add this if you want. That said, the WHERE implies that at least one row per driver has a non-zero value, so you don't need both WHERE and HAVING clauses.
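To see why the original query reports the same number three times, here is a minimal, self-contained sketch that runs both queries against an in-memory SQLite database from Python's standard library. SQLite stands in for the asker's database (an assumption; COUNT and NULLIF behave the same way in standard SQL), and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory demo table; the column names mirror the question, the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE performxbydriverdata "
    "(drivername TEXT, over_rpm INTEGER, over_spd INTEGER, brake_events INTEGER)"
)
conn.executemany(
    "INSERT INTO performxbydriverdata VALUES (?, ?, ?, ?)",
    [
        ("Bob Smith", 3, 0, 0),
        ("Bob Smith", 0, 2, 0),
        ("Bob Smith", 1, 1, 0),
        ("Nathan Jones", 0, 0, 4),
        ("Nathan Jones", 0, 0, 0),
    ],
)

# Original approach: COUNT(column) counts every non-NULL value, zeros included,
# so all three columns get the same count for each driver.
original = conn.execute(
    """
    SELECT drivername, COUNT(over_rpm), COUNT(over_spd), COUNT(brake_events)
    FROM performxbydriverdata
    WHERE over_rpm > 0 OR over_spd > 0 OR brake_events > 0
    GROUP BY drivername
    ORDER BY drivername
    """
).fetchall()

# NULLIF turns 0 into NULL and COUNT skips NULLs, so each column is counted on its own.
fixed = conn.execute(
    """
    SELECT drivername,
           COUNT(NULLIF(over_rpm, 0)),
           COUNT(NULLIF(over_spd, 0)),
           COUNT(NULLIF(brake_events, 0))
    FROM performxbydriverdata
    GROUP BY drivername
    ORDER BY drivername
    """
).fetchall()

print(original)  # both drivers get identical numbers in every column
print(fixed)     # per-column counts of non-zero values
```

With these rows, Bob Smith comes out as 3, 3, 3 under the original query but 2, 2, 0 with NULLIF.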
Putting filter predicates inside a SUM() function with a CASE expression is a useful trick whenever you need to count items based on some predicate condition:
Select DriverName,
Sum(case When over_rpm > 0 Then 1 Else 0 End) OverRpm,
Sum(case When over_spd > 0 Then 1 Else 0 End) OverSpeed,
Sum(case When brake_events > 0 Then 1 Else 0 End) BrakeEvents
-- etc.
FROM performxbydriverdata
Group By DriverName
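A small sketch of the SUM(CASE ...) pattern, again using an in-memory SQLite database as a stand-in (an assumption) and invented rows. Each CASE produces 1 when its own predicate holds and 0 otherwise, so summing it counts the matching rows per column.

```python
import sqlite3

# Throwaway table; column names follow the question, the data is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE performxbydriverdata "
    "(DriverName TEXT, over_rpm INTEGER, over_spd INTEGER, brake_events INTEGER)"
)
conn.executemany(
    "INSERT INTO performxbydriverdata VALUES (?, ?, ?, ?)",
    [
        ("Bob Smith", 3, 0, 0),
        ("Bob Smith", 0, 2, 0),
        ("Bob Smith", 1, 1, 0),
        ("Nathan Jones", 0, 0, 4),
        ("Nathan Jones", 0, 0, 0),
    ],
)

# Each SUM adds a 1 only for rows where its own column is positive.
rows = conn.execute(
    """
    SELECT DriverName,
           SUM(CASE WHEN over_rpm > 0 THEN 1 ELSE 0 END)     AS OverRpm,
           SUM(CASE WHEN over_spd > 0 THEN 1 ELSE 0 END)     AS OverSpeed,
           SUM(CASE WHEN brake_events > 0 THEN 1 ELSE 0 END) AS BrakeEvents
    FROM performxbydriverdata
    GROUP BY DriverName
    ORDER BY DriverName
    """
).fetchall()
print(rows)
```

Unlike NULLIF, the CASE form accepts any predicate. Note that `over_rpm > 0` and `NULLIF(over_rpm, 0)` give different counts only if negative values can occur.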
Related
I've run into a subtlety around count(*) and joins, and am hoping to get some confirmation that I've figured out what's going on correctly. For background, we commonly convert continuous timeline data into discrete bins, such as hours. Since we don't want gaps for bins with no content, we use generate_series to synthesize the buckets we want values for. If there's no entry for, say, 10 AM, fine, we still get a result. However, I noticed that I'm sometimes getting 1 instead of 0. Here's what I'm trying to confirm:
The count is 1 if you count the "grid" series, and 0 if you count the data table.
This only has to do with count, and no other aggregate.
The code below sets up some sample data to show what I'm talking about:
DROP TABLE IF EXISTS analytics.measurement_table CASCADE;
CREATE TABLE IF NOT EXISTS analytics.measurement_table (
hour smallint NOT NULL DEFAULT NULL,
measurement smallint NOT NULL DEFAULT NULL
);
INSERT INTO measurement_table (hour, measurement)
VALUES ( 0, 1),
( 1, 1), ( 1, 1),
(10, 2), (10, 3), (10, 5);
Here are the goal results for the query. I'm only using hours 0 to 12 to keep the example results short.
Hour Count sum
0 1 1
1 2 2
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 3 10
11 0 0
12 0 0
This works correctly:
WITH hour_series AS (
select * from generate_series (0,12) AS hour
)
SELECT hour_series.hour,
count(measurement_table.hour) AS frequency,
COALESCE(sum(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON (measurement_table.hour = hour_series.hour)
GROUP BY 1
ORDER BY 1
This returns misleading 1s for hours with no match:
WITH hour_series AS (
select * from generate_series (0,12) AS hour
)
SELECT hour_series.hour,
count(*) AS frequency,
COALESCE(sum(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON (hour_series.hour = measurement_table.hour)
GROUP BY 1
ORDER BY 1
0 1 1
1 2 2
2 1 0
3 1 0
4 1 0
5 1 0
6 1 0
7 1 0
8 1 0
9 1 0
10 3 10
11 1 0
12 1 0
The only difference between these two examples is the count term:
count(*) -- A result of 1 on no match, and a correct count otherwise.
count(field from the joined table) -- 0 on no match, a correct count otherwise.
That seems to be it: you have to make it explicit that you're counting the data table. Otherwise, you get a count of 1, since the series row is still counted once. Is this a nuance of joining, or a nuance of count in Postgres?
Does this impact any other aggregate? It seems like it shouldn't.
P.S. generate_series is just about the best thing ever.
You figured out the problem correctly: count() behaves differently depending on the argument it is given.
count(*) counts how many rows belong to the group. This just cannot be 0 since there is always at least one row in a group (otherwise, there would be no group).
On the other hand, when given a column name or expression as its argument, count() takes into account any non-null value and ignores null values. For your query, this lets you distinguish groups that have no match in the left-joined table from groups where there are matches.
Note that this behavior is not Postgres-specific; it belongs to the standard ANSI SQL specification (all databases that I know of conform to it).
Bottom line:
in general cases, use count(*); this is more efficient, since the database does not need to check for nulls (and it makes clear to the reader of the query that you just want to know how many rows belong to the group)
in specific cases such as yours, put the relevant expression in the count()
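The difference can be reproduced outside Postgres with a short sketch. It assumes SQLite as a stand-in and replaces generate_series(0, 12) with a recursive CTE, since generate_series is not available in every SQLite build. It also shows the answer to the "other aggregates" question: SUM likewise ignores NULLs, but returns NULL rather than 0 for a group with no non-NULL input, which is why the original query wraps it in COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement_table (hour INTEGER NOT NULL, measurement INTEGER NOT NULL)"
)
# Same sample data as in the question.
conn.executemany(
    "INSERT INTO measurement_table VALUES (?, ?)",
    [(0, 1), (1, 1), (1, 1), (10, 2), (10, 3), (10, 5)],
)

rows = conn.execute(
    """
    WITH RECURSIVE hour_series(hour) AS (   -- stands in for generate_series(0, 12)
        SELECT 0
        UNION ALL
        SELECT hour + 1 FROM hour_series WHERE hour < 12
    )
    SELECT hour_series.hour,
           COUNT(*)                           AS rows_in_group,  -- the series row always counts
           COUNT(measurement_table.hour)      AS frequency,      -- NULLs from unmatched hours skipped
           SUM(measurement_table.measurement) AS raw_sum         -- NULL when nothing matched
    FROM hour_series
    LEFT JOIN measurement_table ON measurement_table.hour = hour_series.hour
    GROUP BY hour_series.hour
    ORDER BY hour_series.hour
    """
).fetchall()

for row in rows:
    print(row)
```

For an unmatched hour such as 2, this prints `(2, 1, 0, None)`: one row in the group, zero matched measurements, and a NULL sum.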
I have data which contains 1000+ lines, and it contains errors that people make. I have added an extra column and would like to find all duplicate RevNames, giving the first one a 1 and all remaining duplicates a 0. When there is no duplicate, it should be a 1. The outcome should look like this:
RevName ErrorCount Duplicate
Rev5588 23 1
Rev5588 67 0
Rev5588 7 0
Rev5588 45 0
Rev7895 6 1
Rev9065 4 1
Rev5588 1 1
I have tried CASE WHEN, but it's not giving the first one a 1; it's giving them all zeros.
Thanks guys, I am pulling out my hair here trying to get this done.
You could use a case expression over the row_number window function:
SELECT RevName,
ErrorCount,
CASE ROW_NUMBER() OVER (PARTITION BY RevName
ORDER BY (SELECT 1)) WHEN 1 THEN 1 ELSE 0 END AS Duplicate
FROM mytable
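Here is a runnable sketch of the ROW_NUMBER() approach, assuming SQLite 3.25 or later (for window functions) as a stand-in. `ORDER BY (SELECT 1)` gives an arbitrary order, so this sketch adds a hypothetical `id` column recording insertion order and partitions by RevName ordered by `id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is a hypothetical ordering column; without one there is no reliable "first" row.
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, RevName TEXT, ErrorCount INTEGER)")
conn.executemany(
    "INSERT INTO mytable (RevName, ErrorCount) VALUES (?, ?)",
    [("Rev5588", 23), ("Rev5588", 67), ("Rev5588", 7), ("Rev5588", 45),
     ("Rev7895", 6), ("Rev9065", 4), ("Rev5588", 1)],
)

rows = conn.execute(
    """
    SELECT RevName,
           ErrorCount,
           CASE ROW_NUMBER() OVER (PARTITION BY RevName ORDER BY id)
                WHEN 1 THEN 1 ELSE 0      -- only the first row per RevName gets 1
           END AS Duplicate
    FROM mytable
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Note that this flags only the first occurrence of each RevName overall, so the final Rev5588 row gets 0, whereas the question's sample output shows 1 there.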
SQL tables represent unordered sets. There is no "first" of anything, unless a column specifies the ordering.
Your logic suggests lag():
select t.*,
(case when lag(revname) over (order by ??) = revname then 0
else 1
end) as is_duplicate
from t;
The ?? is for the column that specifies the ordering.
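A sketch of the lag() version, again on SQLite 3.25+ as an assumed stand-in, with a hypothetical `id` column filling in for `??`. Because lag() compares each row with the one immediately before it, a RevName that reappears after other names gets 1 again, which matches the question's sample output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is a hypothetical column supplying the ordering that "??" stands for.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, RevName TEXT, ErrorCount INTEGER)")
conn.executemany(
    "INSERT INTO t (RevName, ErrorCount) VALUES (?, ?)",
    [("Rev5588", 23), ("Rev5588", 67), ("Rev5588", 7), ("Rev5588", 45),
     ("Rev7895", 6), ("Rev9065", 4), ("Rev5588", 1)],
)

rows = conn.execute(
    """
    SELECT t.RevName,
           t.ErrorCount,
           -- on the first row lag() is NULL, the comparison is not true, so ELSE 1 applies
           CASE WHEN LAG(RevName) OVER (ORDER BY id) = RevName THEN 0
                ELSE 1
           END AS is_duplicate
    FROM t
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```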
I have a table named 'candidate' which contains, among other columns, 'score_math' and 'score_language', reflecting each candidate's score in the respective tests. I need to show the number of students who scored at least 60 in both math and language (versatile_candidates) and the number of students who scored below 40 in both of these tests (poor_candidates). Don't include students with NULL preferred_contact. My query is:
select
count(case when score_math>=60 and score_language>=60 then 1 else 0
end) as versatile_candidates,
count(case when score_math<40 and score_language<40 then 1 else 0 end) as
poor_candidates
from candidate
where preferred_contact is not null
But this always produces the total number of candidates with a non-null preferred contact. I can't really figure out what I did wrong and, more importantly, why this doesn't work. (The DBMS is Postgres, if that matters.) Please help.
You're close - the reason you're getting the total number of all candidates is because COUNT() will count a 0 the same as a 1 (and any other non-NULL value, for that matter). And since the values could only ever be 0 or 1, your COUNT() will return the total number of all candidates.
Since you're already defaulting the cases that don't match to 0, all you need to do is change the COUNT() to a SUM():
Select Sum(Case When score_math >= 60
And score_language >= 60 Then 1
Else 0
End) As versatile_candidates
, Sum(Case When score_math < 40
And score_language < 40 Then 1
Else 0
End) As poor_candidates
From candidate
Where preferred_contact Is Not Null
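The effect is easy to reproduce with a few invented rows in an in-memory SQLite database (a stand-in for Postgres; COUNT and SUM behave the same way here). The sketch runs the question's COUNT version and the SUM version side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidate (score_math INTEGER, score_language INTEGER, preferred_contact TEXT)"
)
conn.executemany(
    "INSERT INTO candidate VALUES (?, ?, ?)",
    [
        (70, 65, "email"),  # versatile
        (30, 20, "phone"),  # poor
        (50, 80, "email"),  # neither
        (90, 95, None),     # excluded by the WHERE clause
        (10, 15, "email"),  # poor
    ],
)

# COUNT() counts the 0s as well as the 1s, so both columns equal the row count.
count_version = conn.execute(
    """
    SELECT COUNT(CASE WHEN score_math >= 60 AND score_language >= 60 THEN 1 ELSE 0 END),
           COUNT(CASE WHEN score_math < 40 AND score_language < 40 THEN 1 ELSE 0 END)
    FROM candidate
    WHERE preferred_contact IS NOT NULL
    """
).fetchone()

# SUM() adds up the 1s and 0s, giving the number of matching rows.
sum_version = conn.execute(
    """
    SELECT SUM(CASE WHEN score_math >= 60 AND score_language >= 60 THEN 1 ELSE 0 END),
           SUM(CASE WHEN score_math < 40 AND score_language < 40 THEN 1 ELSE 0 END)
    FROM candidate
    WHERE preferred_contact IS NOT NULL
    """
).fetchone()

print(count_version, sum_version)
```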
COUNT() does not take NULL values into consideration; every value that is not NULL, including 0, will be counted.
You might want to replace it with SUM().
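Because COUNT() skips NULLs, another option is to keep COUNT() and drop the `ELSE 0`: a CASE with no matching branch and no ELSE yields NULL, so only the matching rows are counted. A minimal sketch on the same kind of invented data, with SQLite as an assumed stand-in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidate (score_math INTEGER, score_language INTEGER, preferred_contact TEXT)"
)
conn.executemany(
    "INSERT INTO candidate VALUES (?, ?, ?)",
    [(70, 65, "email"), (30, 20, "phone"), (50, 80, "email"), (90, 95, None), (10, 15, "email")],
)

# No ELSE branch: non-matching rows produce NULL, which COUNT() ignores.
result = conn.execute(
    """
    SELECT COUNT(CASE WHEN score_math >= 60 AND score_language >= 60 THEN 1 END)
               AS versatile_candidates,
           COUNT(CASE WHEN score_math < 40 AND score_language < 40 THEN 1 END)
               AS poor_candidates
    FROM candidate
    WHERE preferred_contact IS NOT NULL
    """
).fetchone()
print(result)
```

One practical difference between the two fixes: if the WHERE clause leaves no rows at all, COUNT() returns 0 while SUM() returns NULL.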
I'm doing a query to obtain the numbers of people for a Christmas dinner.
The people include the workers and their relatives. The relatives are stored in a different table.
Children and adults eat a different menu and we organize tables by families.
I'm already using this query:
select worker_name,
count(*) as total_per_family,
SUM(CASE WHEN age < 18 THEN 1 ELSE 0 END) as children,
SUM(CASE WHEN age >= 18 THEN 1 ELSE 0 END) as adults
from
(
/*subquery*/
)
group by worker_name
order by worker_name;
This query returns the number of children and adults related to the worker, and count(*) gives me the total.
The problem is that I need to add the worker to the adults sum.
Is there a way to modify adults? Either setting its initial value to 1 or adding 1 after the sum is done but before the count is obtained.
Modifying your query to read
SUM(CASE WHEN AGE>=18 THEN 1 ELSE 0 END) + 1 as adults
would probably be a first approach. The aggregate SUM() would be computed first, with 1 added thereafter as your initial suggestion indicated.
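A minimal sketch of the `+ 1` approach. The question's `/*subquery*/` is elided, so this assumes a hypothetical `relatives` table with one row per relative (`worker_name`, `age`) and uses an in-memory SQLite database as a stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the elided subquery: one row per relative of a worker.
conn.execute("CREATE TABLE relatives (worker_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO relatives VALUES (?, ?)",
    [("Ana", 8), ("Ana", 40), ("Ana", 12), ("Luis", 35)],
)

rows = conn.execute(
    """
    SELECT worker_name,
           COUNT(*) AS total_per_family,
           SUM(CASE WHEN age < 18 THEN 1 ELSE 0 END) AS children,
           SUM(CASE WHEN age >= 18 THEN 1 ELSE 0 END) + 1 AS adults  -- +1 for the worker
    FROM relatives
    GROUP BY worker_name
    ORDER BY worker_name
    """
).fetchall()
print(rows)
```

Here `total_per_family` still counts relatives only; if the worker should be included there as well, the same `+ 1` can be applied to `COUNT(*)`.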
I have a table that is just random values.
I have a query that looks something like this:
SELECT COUNT(DISTINCT(Value))
FROM RandomValueTable
WHERE
Value > #lowerRange
AND
Value < #upperRange
There will be a series of ranges I need to run this for (0-20, 21-45, 46-100 etc). Before running this query, I'll know what the ranges are. Do I need to run this query several times just filling in the range variables, or is there some way I can specify all the different ranges in one query?
You can specify them in one query using group by:
select (case when value between 0 and 20 then '0-20'
when value between 21 and 45 then '21-45'
when value between 46 and 100 then '46-100'
else 'other'
end) as range,
count(distinct value)
from RandomValueTable
group by (case when value between 0 and 20 then '0-20'
when value between 21 and 45 then '21-45'
when value between 46 and 100 then '46-100'
else 'other'
end);
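A runnable sketch of the single-query approach, using an in-memory SQLite database as an assumed stand-in and invented values. It keeps the question's COUNT(DISTINCT ...) and computes the bucket label once in a derived table, so the CASE does not have to be repeated in GROUP BY. The alias is renamed to `value_range` because RANGE is a keyword in some databases. BETWEEN is inclusive at both ends, unlike the strict `>` and `<` in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RandomValueTable (value INTEGER)")
conn.executemany(
    "INSERT INTO RandomValueTable VALUES (?)",
    [(5,), (5,), (12,), (30,), (30,), (45,), (99,), (150,)],
)

rows = conn.execute(
    """
    SELECT value_range, COUNT(DISTINCT value) AS distinct_values
    FROM (
        SELECT value,
               CASE WHEN value BETWEEN 0 AND 20 THEN '0-20'
                    WHEN value BETWEEN 21 AND 45 THEN '21-45'
                    WHEN value BETWEEN 46 AND 100 THEN '46-100'
                    ELSE 'other'
               END AS value_range
        FROM RandomValueTable
    ) AS bucketed
    GROUP BY value_range
    ORDER BY value_range
    """
).fetchall()
print(rows)
```

Duplicate values (the two 5s and the two 30s) are counted once per bucket because of DISTINCT.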