SQL SUM counting every instance and not if data is present - sql

I have a piece of a SQL statement below:
SELECT WOCUSTFIELD.WORKORDERID, WORKORDER.ACTUALFINISHDATE,
CASE WHEN wocustfield.custfieldname LIKE '%bags%'
THEN (REPLACE( wocustfield.custfieldname, 'bags:', ''))
ELSE REPLACE( wocustfield.custfieldname, 'boxes:', '')
END AS Facility,
For each of the custfieldname that has a number > than 0 put in for Bags or Boxes I would like to count it as a Facility (visit to teh site).
Currently, it is counting every custfieldname regardless if it has a number ( 0 or >) in the field as a visit and is totaling all custfieldnames for each day.
for example, if the data look like the following for Jan 1st 2023:
Bags Boxes
Dogstation 1 3 4
Dogstation 2 0 0
Dogstation 3 5 1
Dogstation 4 2 0
Dogstation 5 0 0
I would like to have 3 Facility (visits) stations and not 5. I hope this makes sense. thanks for any help given

The COUNT aggregation counts 1 for every non-null. Zero (0) is not null, so it will count as 1. You want to do something like this:
SUM(CASE WHEN bags >0 then 1 else 0 end) as stores_with_bags

Just add in a WHERE clause. I'm not exactly sure how your table is formatted, but ex:
WHERE Bags>0 OR Boxes>0

Related

Misleading count of 1 on JOIN in Postgres 11.7

I've run into a subtlety around count(*) and join, and a hoping to get some confirmation that I've figured out what's going on correctly. For background, we commonly convert continuous timeline data into discrete bins, such as hours. And since we don't want gaps for bins with no content, we'll use generate_series to synthesize the buckets we want values for. If there's no entry for, say 10AM, fine, we stil get a result. However, I noticed that I'm sometimes getting 1 instead of 0. Here's what I'm trying to confirm:
The count is 1 if you count the "grid" series, and 0 if you count the data table.
This only has to do with count, and no other aggregate.
The code below sets up some sample data to show what I'm talking about:
DROP TABLE IF EXISTS analytics.measurement_table CASCADE;
CREATE TABLE IF NOT EXISTS analytics.measurement_table (
hour smallint NOT NULL DEFAULT NULL,
measurement smallint NOT NULL DEFAULT NULL
);
INSERT INTO measurement_table (hour, measurement)
VALUES ( 0, 1),
( 1, 1), ( 1, 1),
(10, 2), (10, 3), (10, 5);
Here are the goal results for the query. I'm using 12 hours to keep the example results shorter.
Hour Count sum
0 1 1
1 2 2
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 3 10
11 0 0
12 0 0
This works correctly:
WITH hour_series AS (
select * from generate_series (0,12) AS hour
)
SELECT hour_series.hour,
count(measurement_table.hour) AS frequency,
COALESCE(sum(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON (measurement_table.hour = hour_series.hour)
GROUP BY 1
ORDER BY 1
This returns misleading 1's on the match:
WITH hour_series AS (
select * from generate_series (0,12) AS hour
)
SELECT hour_series.hour,
count(*) AS frequency,
COALESCE(sum(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON (hour_series.hour = measurement_table.hour)
GROUP BY 1
ORDER BY 1
0 1 1
1 2 2
2 1 0
3 1 0
4 1 0
5 1 0
6 1 0
7 1 0
8 1 0
9 1 0
10 3 10
11 1 0
12 1 0
The only difference between these two examples is the count term:
count(*) -- A result of 1 on no match, and a correct count otherwise.
count(joined to table field) -- 0 on no match, correct count otherwise.
That seems to be it, you've got to make it explicit that you're counting the data table. Otherwise, you get a count of 1 since the series data is matching once. Is this a nuance of joinining, or a nuance of count in Postgres?
Does this impact any other aggrgate? It seems like it sholdn't.
P.S. generate_series is just about the best thing ever.
You figured out the problem correctly: count() behaves differently depending on the argument is is given.
count(*) counts how many rows belong to the group. This just cannot be 0 since there is always at least one row in a group (otherwise, there would be no group).
On the other hand, when given a column name or expression as argument, count() takes in account any non-null value, and ignores null values. For your query, this lets you distinguish groups that have no match in the left joined table from groups where there are matches.
Note that this behavior is not Postgres specific, but belongs to the standard
ANSI SQL specification (all databases that I know conform to it).
Bottom line:
in general cases, uses count(*); this is more efficient, since the database does not need to check for nulls (and makes it clear to the reader of the query that you just want to know how many rows belong to the group)
in specific cases such as yours, put the relevant expression in the count()

Change the value of a sum in sql

I'm doing a query to obtain the numbers of people for a Christmas dinner.
The people include the workers and their relatives. The relatives are stored in a different table.
Children and adults eat a different menu and we organize tables by families.
I'm already using this query
select worker_name,
count(*) as total_per_family,
SUM(CASE WHEN age < 18 THEN 1 ELSE 0 END) as children,
SUM(CASE WHEN age >= 18 THEN 1 ELSE 0 END) as adults
from
(
/*subquery*/
)
group by worker_name
order by worker_name;
This query returns the number of child and adults related to the worker and count gives me the total.
The problem is that I need to add the worker to the adults sum.
Is there a way to modify adults? Either setting its initial value to 1 or adding 1 after the sum is done but before the count is obtained.
Modifying your query to read
SUM(CASE WHEN AGE>=18 THEN 1 ELSE 0 END) + 1 as adults
would probably be a first approach. The aggregate SUM() would be computed first, with 1 added thereafter as your initial suggestion indicated.

Converting Column Headers to Row elements

I have 2 tables I am combining and that works but I think I designed the second table wrong as I have a column for each item of what really is a multiple choice question. The query is this:
select Count(n.ID) as MemCount, u.Pay1Click, u.PayMailCC, u.PayMailCheck, u.PayPhoneACH, u.PayPhoneCC, u.PayWuFoo
from name as n inner join
UD_Demo_ORG as u on n.ID = u.ID
where n.MEMBER_TYPE like 'ORG_%' and n.CATEGORY not like '%_2' and
(u.Pay1Click = '1' or u.PayMailCC = '1' or u.PayMailCheck = '1' or u.PayPhoneACH = '1' or u.PayPhoneCC = '1' or u.PayWuFoo = '1')
group by u.Pay1Click, u.PayMailCC, u.PayMailCheck, u.PayPhoneACH, u.PayPhoneCC, u.PayWuFoo
The results come up like this:
Count Pay1Click PayMailCC PayMailCheck PayPhoneACH PayPhoneCC PayWuFoo
8 0 0 0 0 0 1
25 0 0 0 0 1 0
8 0 0 0 1 0 0
99 0 0 1 0 0 0
11 0 1 0 0 0 0
So the question is, how can I get this to 2 columns, Count and then the headers of the next 6 headers so the results look like this:
Count PaymentType
8 PayWuFoo
25 PayPhoneCC
8 PayPhoneACH
99 PayMailCheck
11 PayMailCC
Thanks.
Try this one
Select Count,
CASE WHEN Pay1Click=1 THEN 'Pay1Click'
PayMailCC=1 THEN ' PayMailCC'
PayMailCheck=1 THEN 'PayMailCheck'
PayPhoneACH=1 THEN 'PayPhoneACH'
PayPhoneCC=1 THEN 'PayPhoneCC'
PayWuFoo=1 THEN 'PayWuFoo'
END as PaymentType
FROM ......
I think indeed you made a mistake in the structure of the second table. Instead of creating a row for each multiple choice question, i would suggest transforming all those columns to a 'answer' column, so you would have the actual name of the alternative as the record in that column.
But for this, you have to change the structure of your tables, and change the way the table is populated. you should get the name of the alternative checked and put it into your table.
More on this, you could care for repetitive data in your table, so writing over and over again the same string could make your table grow larger.
if there are other things implied to the answer, other informations in the UD_Demo_ORG table, then you can normalize the table, creating a payment_dimension table or something like this, give your alternatives an ID such as
ID PaymentType OtherInfo(description, etc)...
1 PayWuFoo ...
2 PayPhoneCC ...
3 PayPhoneACH ...
4 PayMailCheck ...
5 PayMailCC ...
This is called a dimension table, and then in your records, you would have the ID of the payment type, and not the information you don't need.
So instead of a big result set, maybe you could simplify by much your query and have just
Count PaymentId
8 1
25 2
8 3
99 4
11 5
as a result set. it would make the query faster too, and if you need other information, you can then join the table and get it.
BUT if the only field you would have is the name, perhaps you could use the paymentType as the "id" in this case... just consider it. It is scalable if you separate to a dimension table.
Some references for further reading:
http://beginnersbook.com/2015/05/normalization-in-dbms/ "Normalization in DBMS"
http://searchdatamanagement.techtarget.com/answer/What-are-the-differences-between-fact-tables-and-dimension-tables-in-star-schemas "Differences between fact tables and dimensions tables"

Need help grouping a column that may or may not have the same value but have the same accounts

How do I output data stored in Table 1 so that each like account number has that also has the same CPT group's together but the ones that do not match fall to the bottom of the list?
I have one table: select * from CPTCounts and this is what is displays
Format (relevant fields only):
account OriginalCPT Count ModifiedCPT Count
11 0 71010 1
11 71010 1 0
2 0 71010 1
2 0 71020 9
2 0 73130 1
2 0 77800 1
2 71010 1 0
2 71020 8 0
2 73130 1 0
2 73610 1 0
2 G0202 4 0
31 99010 1 0
31 0 99010 4
31 0 99700 2
What I want the results to be grouped like is below... and display like this or similar.
Account OriginalCPT Count ModifiedCPT Count
11 71010 1 71010 1
2 71010 1 71010 1
2 71020 8 0
2 73130 1 0
2 73610 1 0
2 G0202 4 0
31 99010 1 99010 4
31 0 99700 2
I have one table with the values above;
Select * from #CPTCounts
The grouping I am looking for is the Original = Modified CPT and sometimes I will not have a value in one side or the other but most of the times I will have a match. I would like to place all of the unmatched ones at the bottom of the account.
any suggestions?
I was thinking of creating a second table and joining the two with the account but how do I return each value?
select cpt1.account, cpt1.originalCPT, cpt1.count, cpt2.modifiedcpt, cpt2.count
from #cptcounts cpt1
join #cptcounts cpt2 on cpt1.accont = cpt2.account
but am having trouble with that solution.
I'm not sure I have an exact solution, but perhaps some food for thought at least. The fact that you need either the "original" or the "modified" set of columns makes me think that you need a full outer join rather than a left join. You don't mention which database you are using. In MySql, for example, full joins can be emulated by means of a union of a left and a right join, as in the following:
select cpt1.account, cpt1.originalCPT, cpt1.countO, cpt2.modifiedcpt, cpt2.countM
from CPTCounts cpt1
left outer join CPTCounts cpt2
on cpt1.account = cpt2.account
and cpt1.originalCPT=cpt2.modifiedCPT
where cpt1.account is not null
and (cpt1.originalCPT is not null or cpt2.modifiedCPT is not null)
union
select cpt1.account, cpt1.originalCPT, cpt1.countO, cpt2.modifiedcpt, cpt2.countM
from CPTCounts cpt1
right outer join CPTCounts cpt2
on cpt1.account = cpt2.account
and cpt1.originalCPT=cpt2.modifiedCPT
where cpt2.account is not null
and (cpt1.originalCPT is not null or cpt2.modifiedCPT is not null)
order by originalCPT, modifiedCPT, account
The ordering brings the non-matching rows to the top, but that seemed a lesser problem than getting the matching to work.
(Your output data is a bit confusing, because the CPT 71020, for example, occurs in both original and modified columns, but you haven't shown it as one of the matching ones in your result set. I'm presuming this is because it is just an example... but if I'm wrong, then I am missing some part of your intention.)
You can play around in this SQL Fiddle.

Counting non-zero values in sql

I am trying to count total number of times that each individual column is greater than zero, grouped by the driver name. Right now I have;
SELECT drivername
, COUNT(over_rpm) AS RPMViolations
, COUNT(over_spd) AS SpdViolations
, COUNT(brake_events) AS BrakeEvents
FROM performxbydriverdata
WHERE over_rpm > 0
OR over_spd > 0
OR brake_events > 0
GROUP BY drivername
This gives me all of the non-zero values but I get a display as:
Bob Smith 62 62 62
Nathan Jones 65 65 65
etc.
I'm trying to get a count of non-zeros in each individual values.. each violation should be grouped separately.
Use NULLIF to change zero to NULL, count ignores NULL
SELECT drivername,
COUNT(NULLIF(over_rpm,0)) AS RPMViolations,
COUNT(NULLIF(over_spd,0)) AS SpdViolations,
COUNT(NULLIF(brake_events,0)) AS BrakeEvents
FROM performxbydriverdata
GROUP BY drivername;
You can probably remove the WHERE clause too with this group to improve performance
OR conditions often run badly because of matching a good index
Using HAVING (as per other answers) will remove any rows where all 3 aggregates are zero which may or may not be useful for you. You can add this if you want. Saying that, the WHERE implies that at least one row has non-zero values so you don't need both WHERE and HAVING clauses
Putting filter predicate[s] inside of a Sum() function with a case statement is a useful trick anytime you need to count items based on some predicate condition.
Select DriverName,
Sum(case When over_rpm > 0 Then 1 Else 0 End) OverRpm,
Sum(case When over_spd > 0 Then 1 Else 0 End) OverSpeed,
Sum(case When brake_events > 0 Then 1 Else 0 End) BrakeEvents,
etc.
FROM performxbydriverdata
Group By DriverName