I'm working with a data set that looks similar to the following:
Name Value
Unit 1 0
Unit 1 27
Unit 1 30
Unit 1 10
Unit 1 4
Unit 1 0
Unit 2 0
Unit 2 0
Unit 2 29
Unit 2 0
Unit 3 10
and so on. I would like to create a query that lists the records as follows:
Name ZeroRecords
Unit 1 2
Unit 2 3
Unit 3 0
That is, I want the number of records per unit whose Value is 0. I've tried using a totals row that counts Value with a criterion of "=0", but it just turns up blank.
I'm sure this is much easier to do with SQL, but I am not very familiar with it.
Any suggestions?
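You could group your records by Name and use the COUNT() aggregate to get a count for each group. If you first filter the rows down to those whose Value is zero, that becomes the query below. Note that a Name with no zero rows at all (Unit 3 in your sample) is filtered out entirely, so it will be missing from the result rather than shown with 0: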
SELECT Name,
COUNT(*) AS ZeroRecords
FROM YourTable
WHERE Value = 0
GROUP BY Name
You can use conditional aggregation. In MS Access, this looks like:
select name, sum(iif(value = 0, 1, 0)) as numzeros
from t
group by name;
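To make the difference between the two answers concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Access. The table name t and the use of CASE in place of IIf are assumptions made so the example runs on SQLite; the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value INTEGER)")

# Sample rows from the question.
rows = [("Unit 1", v) for v in (0, 27, 30, 10, 4, 0)]
rows += [("Unit 2", v) for v in (0, 0, 29, 0)]
rows += [("Unit 3", 10)]
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Filtering first: groups with no zero rows never reach GROUP BY.
filtered = conn.execute(
    "SELECT name, COUNT(*) FROM t WHERE value = 0 "
    "GROUP BY name ORDER BY name"
).fetchall()

# Conditional aggregation: every group is kept, zeros are counted inside it.
conditional = conn.execute(
    "SELECT name, SUM(CASE WHEN value = 0 THEN 1 ELSE 0 END) "
    "FROM t GROUP BY name ORDER BY name"
).fetchall()

print(filtered)
print(conditional)
```

With this data, only the conditional version returns a row for Unit 3.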
I've run into a subtlety around count(*) and join, and am hoping to get some confirmation that I've figured out what's going on correctly. For background, we commonly convert continuous timeline data into discrete bins, such as hours. Since we don't want gaps for bins with no content, we use generate_series to synthesize the buckets we want values for. If there's no entry for, say, 10AM, fine, we still get a result. However, I noticed that I'm sometimes getting 1 instead of 0. Here's what I'm trying to confirm:
The count is 1 if you count the "grid" series, and 0 if you count the data table.
This only has to do with count, and no other aggregate.
The code below sets up some sample data to show what I'm talking about:
DROP TABLE IF EXISTS analytics.measurement_table CASCADE;
CREATE TABLE IF NOT EXISTS analytics.measurement_table (
hour smallint NOT NULL,
measurement smallint NOT NULL
);
INSERT INTO measurement_table (hour, measurement)
VALUES ( 0, 1),
( 1, 1), ( 1, 1),
(10, 2), (10, 3), (10, 5);
Here are the goal results for the query. I'm using 12 hours to keep the example results shorter.
Hour Count sum
0 1 1
1 2 2
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 3 10
11 0 0
12 0 0
This works correctly:
WITH hour_series AS (
select * from generate_series (0,12) AS hour
)
SELECT hour_series.hour,
count(measurement_table.hour) AS frequency,
COALESCE(sum(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON (measurement_table.hour = hour_series.hour)
GROUP BY 1
ORDER BY 1
This returns a misleading 1 for every hour with no match:
WITH hour_series AS (
select * from generate_series (0,12) AS hour
)
SELECT hour_series.hour,
count(*) AS frequency,
COALESCE(sum(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON (hour_series.hour = measurement_table.hour)
GROUP BY 1
ORDER BY 1
0 1 1
1 2 2
2 1 0
3 1 0
4 1 0
5 1 0
6 1 0
7 1 0
8 1 0
9 1 0
10 3 10
11 1 0
12 1 0
The only difference between these two examples is the count term:
count(*) -- A result of 1 on no match, and a correct count otherwise.
count(joined to table field) -- 0 on no match, correct count otherwise.
That seems to be it: you've got to make it explicit that you're counting the data table. Otherwise, you get a count of 1, since the series row is matching once. Is this a nuance of joining, or a nuance of count in Postgres?
Does this impact any other aggregate? It seems like it shouldn't.
P.S. generate_series is just about the best thing ever.
You figured out the problem correctly: count() behaves differently depending on the argument it is given.
count(*) counts how many rows belong to the group. This just cannot be 0 since there is always at least one row in a group (otherwise, there would be no group).
On the other hand, when given a column name or expression as its argument, count() takes into account only non-null values and ignores nulls. For your query, this lets you distinguish groups that have no match in the left joined table from groups where there are matches.
Note that this behavior is not Postgres specific, but belongs to the standard ANSI SQL specification (all databases that I know conform to it).
Bottom line:
in general cases, use count(*); this is more efficient, since the database does not need to check for nulls (and it makes it clear to the reader of the query that you just want to know how many rows belong to the group)
in specific cases such as yours, put the relevant expression in the count()
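The two count forms can be compared side by side in one query. The sketch below uses Python's sqlite3 module instead of Postgres, and it replaces generate_series with a recursive CTE because SQLite does not ship that function by default; both substitutions are assumptions made only so the example is self-contained. The column aliases group_rows and matched_rows are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement_table ("
    "hour INTEGER NOT NULL, measurement INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO measurement_table (hour, measurement) VALUES (?, ?)",
    [(0, 1), (1, 1), (1, 1), (10, 2), (10, 3), (10, 5)],
)

query = """
WITH RECURSIVE hour_series(hour) AS (
    -- Stand-in for generate_series(0, 12)
    SELECT 0
    UNION ALL
    SELECT hour + 1 FROM hour_series WHERE hour < 12
)
SELECT hour_series.hour,
       count(*)                      AS group_rows,    -- rows in the group
       count(measurement_table.hour) AS matched_rows,  -- non-null matches only
       COALESCE(sum(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON measurement_table.hour = hour_series.hour
GROUP BY hour_series.hour
ORDER BY hour_series.hour
"""
results = conn.execute(query).fetchall()

for row in results:
    print(row)
```

For an hour with no measurements, the left join still produces one row with nulls on the right side, so group_rows is 1 while matched_rows is 0. The sum over that group is null, which is why the COALESCE is needed.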
I have a query, let's call it qry_01, that produces a set of data similar to this:
ID N CN Sum
1 4 0 0
2 3 3 3
5 4 4 7
8 3 3 10
The values shown in this query actually come from a chain of queries and from a bunch of different tables.
The corrected value CN is calculated within the query, and counts N if the ID is not 1, and 0 if it is 1.
The Sum is the value I want to calculate by progressively summing up the CN values.
I tried to use DSUM, but I came out with nothing.
Can anyone please help me?
You could use a correlated subquery in the following way:
select t.id, t.n, t.cn, (select sum(u.cn) from qry_01 u where u.id <= t.id) as [sum]
from qry_01 t
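As a quick check of the correlated subquery, here is a minimal sketch that runs it with Python's sqlite3 module. It assumes qry_01 can be stood in for by a plain table holding the ID, N and CN columns from the question, and it names the output column running_sum (a hypothetical alias) to avoid the reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the saved query qry_01, with the sample rows from the question.
conn.execute("CREATE TABLE qry_01 (id INTEGER, n INTEGER, cn INTEGER)")
conn.executemany(
    "INSERT INTO qry_01 VALUES (?, ?, ?)",
    [(1, 4, 0), (2, 3, 3), (5, 4, 4), (8, 3, 3)],
)

# For each row, sum CN over all rows whose id is less than or equal to it.
running = conn.execute(
    """
    SELECT t.id, t.n, t.cn,
           (SELECT SUM(u.cn) FROM qry_01 u WHERE u.id <= t.id) AS running_sum
    FROM qry_01 t
    ORDER BY t.id
    """
).fetchall()

for row in running:
    print(row)
```

The subquery is re-evaluated per row, so on a large or expensive qry_01 this may be slow.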
I need to count the number of times that a specific string occurs, but when one ID has the same string more than once, it should only be counted once. Basically, I need to count the occurrences of a string that are unique to an ID. I believe this should be a simple thing to do, but I don't know what I'm doing. Here is my current code:
SELECT
RXNAME as Name,
DUPERSID as ID,
COUNT(RXNAME) as Number
FROM
`OmniHealth.PrescriptionsMEPS`
GROUP BY
ID,
Name
ORDER BY
Number
When run, it says everything was counted as 1. Thanks for the help!
UPDATE:
Dataset: https://storage.googleapis.com/omnihealth/MepsPrescriptionData.csv
OUTPUT when run with code above:
Row Name ID Number
1 SUMATRIPTAN 68896102 1
2 IBUPROFEN 65063102 1
3 PENICILLN VK 66179101 1
4 FUROSEMIDE 63217102 1
5 HYSINGLA ER 70373101 1
6 FUROSEMIDE 76090101 1
7 SKELETAL MUSCLE RELAXANTS 78414101 1
8 AMOXICILLIN 69467103 1
9 TRAMADOL HCL 67667101 1
10 PANTOPRAZOLE 60737102 1
11 CARBAMIDE PEROXIDE 6.5% OTIC SOLN 63990104 1
12 PROMETH/COD 68433101 1
13 AZITHROMYCIN 79045102 1
14 METRONIDAZOL 75414101 1
15 DEXILANT 69625101 1
16 TRAMADOL HCL 66890203 1
17 AZITHROMYCIN 73838101 1
18 COLCRYS 63856102 1
19 PERMETHRIN 62103107 1
20 ACETAMINOPHEN TAB 500 MG 62456102 1
Not sure if this is what you asked, but if you are looking for a distinct count, go with the query below:
#standardSQL
SELECT
RXNAME AS Name,
COUNT(DISTINCT DUPERSID) AS Number
FROM `OmniHealth.PrescriptionsMEPS`
GROUP BY 1
ORDER BY Number DESC
Try this. You are grouping on a different field than you are counting. I think you mean to group by RXNAME.
SELECT
RXNAME as Name,
DUPERSID as ID,
COUNT(RXNAME) as Number
FROM
`OmniHealth.PrescriptionsMEPS`
GROUP BY
ID,
RXNAME
ORDER BY
Number
I think you want:
SELECT DUPERSID as ID, COUNT(DISTINCT RXNAME) as Number
FROM `OmniHealth.PrescriptionsMEPS`
GROUP BY ID
ORDER BY Number;
This assumes that "same string" means "same value for RXNAME".
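The two COUNT(DISTINCT ...) readings suggested above give different answers, which a small sketch can show. It uses Python's sqlite3 module rather than BigQuery, a table named prescriptions, and a handful of made-up rows; all of these are assumptions for illustration and are not taken from the linked dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prescriptions (DUPERSID INTEGER, RXNAME TEXT)")
# Hypothetical rows: person 1 has IBUPROFEN twice, person 3 has AMOXICILLIN twice.
conn.executemany(
    "INSERT INTO prescriptions VALUES (?, ?)",
    [
        (1, "IBUPROFEN"), (1, "IBUPROFEN"), (1, "AMOXICILLIN"),
        (2, "IBUPROFEN"),
        (3, "AMOXICILLIN"), (3, "AMOXICILLIN"),
        (4, "IBUPROFEN"),
    ],
)

# Raw row count per drug: repeats within one person are all counted.
plain = conn.execute(
    "SELECT RXNAME, COUNT(RXNAME) FROM prescriptions "
    "GROUP BY RXNAME ORDER BY RXNAME"
).fetchall()

# Distinct people per drug: each person counts once per drug.
per_name = conn.execute(
    "SELECT RXNAME, COUNT(DISTINCT DUPERSID) FROM prescriptions "
    "GROUP BY RXNAME ORDER BY RXNAME"
).fetchall()

# Distinct drugs per person.
per_id = conn.execute(
    "SELECT DUPERSID, COUNT(DISTINCT RXNAME) FROM prescriptions "
    "GROUP BY DUPERSID ORDER BY DUPERSID"
).fetchall()

print(plain)
print(per_name)
print(per_id)
```

Which of the two distinct counts is wanted depends on whether the result should have one row per drug or one row per person.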
I am using Access with a table of over 200k rows. I need counts on a column, broken down by job description. For example, for each location I want the count of IDs where status = "active" and position is like "cook" [should equal 20], and a second output with the count of IDs for the same location where status = "active" and position = "Lead Cook" [should equal 5]. So the second is a subset of the first.
I have a few others to do just like this (# Bakers, # Lead Bakers...). How can I do this with one grand query/subquery, or with one query for each grouping?
My attempt is more like this:
SELECT
a.location,
Count(a.EMPLOYEE_NUMBER) AS [# Cook Total], --- should equal 20
(SELECT count(b.EMPLOYEE_ID) FROM Table_abc AS b where b.STATUS="Active Assignment" AND b.POSITION Like "*cook*" AND b.EMPLOYEE_ID=a.EMPLOYEE_ID) AS [# Lead Cook], --- should equal 5
FROM Table_abc AS a
ORDER BY a.location;
Results should be similar to:
Location Total Cooks Lead Cooks Total Bakers Lead Bakers
1 20 4 15 2
2 45 7 12 2
3 22 2 16 1
4 19 2 17 2
5 5 1 9 1
Try using conditional aggregation -- no need for sub queries.
Something like this should work (although I may not understand your desired results completely):
select location,
count(EMPLOYEE_NUMBER) as ActiveTotal,
sum(IIf(POSITION Like "*cook*",1,0)) as AllCooks,
sum(IIf(POSITION = "Lead Cook",1,0)) as LeadCooks
from Table_abc
where STATUS="Active Assignment"
group by location;
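Here is a small runnable sketch of the same conditional aggregation, using Python's sqlite3 module instead of Access. The table name staff, the sample rows, and the "Terminated" status are made up for illustration; SQLite also needs CASE in place of IIf and % in place of the Access * wildcard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff ("
    "location INTEGER, employee_number INTEGER, status TEXT, position TEXT)"
)
# Hypothetical rows, not taken from the question.
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [
        (1, 101, "Active Assignment", "Cook"),
        (1, 102, "Active Assignment", "Lead Cook"),
        (1, 103, "Active Assignment", "Baker"),
        (1, 104, "Terminated", "Cook"),
        (2, 201, "Active Assignment", "Lead Cook"),
        (2, 202, "Active Assignment", "Lead Baker"),
    ],
)

# One pass over the active rows; each SUM counts only the rows that match.
counts = conn.execute(
    """
    SELECT location,
           COUNT(employee_number) AS active_total,
           SUM(CASE WHEN position LIKE '%cook%' THEN 1 ELSE 0 END) AS all_cooks,
           SUM(CASE WHEN position = 'Lead Cook' THEN 1 ELSE 0 END) AS lead_cooks
    FROM staff
    WHERE status = 'Active Assignment'
    GROUP BY location
    ORDER BY location
    """
).fetchall()

for row in counts:
    print(row)
```

Baker columns would follow the same pattern, with one more SUM per category.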
I have a result set like this:
id achieved
1 0
2 1
3 1
4 0
5 0
The percentage should be 2/5, i.e. 40%. How can I write a SQL query to achieve something like this? I would prefer not to use any nested select, as the actual query is already doing quite a bit. Thanks!
select avg(achieved) from ...
Note that you will have to add a GROUP BY clause to break the result down by category:
select gender, avg(achieved) from ... group by gender
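A minimal sketch of the avg approach, run through Python's sqlite3 module on the five sample rows. The table name results is an assumption. The average of a 0/1 column is a fraction (0.4 here), so the second query scales it to a percentage with 100.0 * SUM / COUNT. On engines where AVG over an integer column returns an integer (SQL Server, for example), the column would need to be cast or multiplied by 1.0 first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER, achieved INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(1, 0), (2, 1), (3, 1), (4, 0), (5, 0)],
)

# Fraction of rows where achieved = 1.
fraction = conn.execute("SELECT AVG(achieved) FROM results").fetchone()[0]

# Same ratio as a percentage; 100.0 forces floating-point division.
percent = conn.execute(
    "SELECT 100.0 * SUM(achieved) / COUNT(*) FROM results"
).fetchone()[0]

print(fraction)
print(percent)
```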