Adding an aggregate condition to get total count of sub-group - sql

Thanks in advance for any help on this; I'm new to SQL. I'm trying to get a sub-count of Jedi who had more than 2 padawans last month. I tried putting the condition in WHERE, but I get an error saying I can't include aggregates in it. I also tried using a CASE, but kept getting a syntax error there too. Any help on this would be incredible. Thank you so much!
SELECT COUNT(DISTINCT old_republic.jedi_id), old_republic.region_id
FROM jedi_archives.old_repulicdata old_republic
WHERE old_republic.republic_date >= '2022-06-01' AND old_republic.republic_date <= '2022-06-30' AND COUNT(old_republic.padawan)>2
GROUP BY old_republic.region_id
ORDER BY old_republic.region_id
SELECT old_republic.jedi_id CASE (
WHEN Count(old_republic.padawan)>2
THEN 1
ELSE 0 End), old_republic.region_id
FROM jedi_archives.old_repulicdata old_republic
WHERE old_republic.republic_date >= '2022-06-01' AND old_republic.republic_date <= '2022-06-30'
GROUP BY old_republic.region_id
ORDER BY old_republic.region_id

I can't comment to ask for a fiddle, but from what you've written, you're probably looking for the HAVING clause.
Assuming that padawan denotes the number of Padawans:
SELECT region_id, jedi_id, sum(padawan)
FROM jedi_archives.old_republicdata
WHERE republic_date >= '2022-06-01'
AND republic_date <= '2022-06-30'
GROUP BY region_id, jedi_id
HAVING sum(padawan) > 2;
This query returns the total number of Padawans per Jedi and region, but only for Jedi who had more than two Padawans in a region last month (if you don't want to take the region into account, remove it from the SELECT and GROUP BY clauses). Other Jedi won't appear in the result.
You can use the CASE expression, too, in order to indicate whether a Jedi had more than two padawans:
SELECT region_id, jedi_id,
CASE WHEN sum(padawan) > 2 THEN 1 ELSE 0 END AS more_than_2_padawans
FROM jedi_archives.old_republicdata
WHERE republic_date >= '2022-06-01'
AND republic_date <= '2022-06-30'
GROUP BY region_id, jedi_id;
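To get the per-region count the question asks for, the HAVING query above can be wrapped in a subquery and counted. The following is only a sketch under assumptions: it uses Python's sqlite3 with made-up sample rows, drops the jedi_archives schema prefix (SQLite has no schemas), and assumes padawan holds a number of Padawans per row.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE old_republicdata ("
    "region_id INTEGER, jedi_id INTEGER, padawan INTEGER, republic_date TEXT)"
)
conn.executemany(
    "INSERT INTO old_republicdata VALUES (?, ?, ?, ?)",
    [
        (1, 10, 2, "2022-06-03"),
        (1, 10, 1, "2022-06-20"),  # Jedi 10: 3 Padawans in June
        (1, 11, 1, "2022-06-05"),  # Jedi 11: only 1 Padawan
        (1, 12, 4, "2022-05-30"),  # outside June, removed by WHERE
        (2, 13, 5, "2022-06-10"),  # Jedi 13: 5 Padawans
        (2, 14, 3, "2022-06-11"),  # Jedi 14: 3 Padawans
    ],
)

# Inner query: one row per (region, Jedi) with more than two Padawans.
# Outer query: count those rows per region.
jedi_counts = conn.execute(
    """
    SELECT region_id, COUNT(*) AS jedi_count
    FROM (
        SELECT region_id, jedi_id
        FROM old_republicdata
        WHERE republic_date >= '2022-06-01'
          AND republic_date <= '2022-06-30'
        GROUP BY region_id, jedi_id
        HAVING SUM(padawan) > 2
    ) AS busy_jedi
    GROUP BY region_id
    ORDER BY region_id
    """
).fetchall()
print(jedi_counts)
```

With this sample data, region 1 has one qualifying Jedi and region 2 has two. Regions where no Jedi qualifies do not appear at all.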

I'm not entirely sure without sample data. But I think using the HAVING clause could solve your question.
SELECT COUNT(jedi_id) as jedi_id, region_id FROM tableA
WHERE republic_date between '2022-05-20' and '2022-05-25'
GROUP BY region_id
HAVING SUM(padawan) > 2

Related

How to use CASE WHEN in group by

I want to use group by for the table NRW_MONTH_DATA.
SELECT [OBJECT_ID]
,[YEAR_MONTH]
,[SELLING_AMOUNT]
,[DEFAULT_SELLING_DATA]
,[LOCK_SELLING_AMOUNT]
,[RGCB]
,[ICKZ]
,[YCKZ]
FROM [dbo].[NRW_MONTH_DATA]
IF LOCK_SELLING_AMOUNT is 0 then group by OBJECT_ID and calculate the sum of [RGCB],[ICKZ] and [YCKZ]
SELECT @SELLING_AMOUNT=(ISNULL(SUM(YCKZ),0)+ISNULL(SUM(RGCB),0)+ ISNULL(SUM(ICKZ),0))
FROM [dbo].[NRW_MONTH_DATA]
WHERE OBJECT_ID=@OBJECT_ID
AND YEAR_MONTH >=@SELLING_CENSUS_START_YM
AND YEAR_MONTH <=@SELLING_CENSUS_END_YM
GROUP BY OBJECT_ID
Now I want to add a condition: if LOCK_SELLING_AMOUNT is 1, I need to
SELECT @SELLING_AMOUNT=ISNULL(SUM(DEFAULT_SELLING_DATA),0)
ELSE use the original result to calculate the sum of the 3 columns.
I used CASE WHEN, but it seems that I cannot use it together with GROUP BY:
SELECT @SELLING_AMOUNT=
CASE LOCK_SELLING_AMOUNT WHEN 1 THEN SELLING_AMOUNT
ELSE (ISNULL(SUM(YCKZ),0)+ISNULL(SUM(RGCB),0)+ ISNULL(SUM(ICKZ),0))
END
The error is like
The column 'dbo.NRW_MONTH_DATA.LOCK_SELLING_AMOUNT' in the select list is invalid because the column is not included in the aggregate function or GROUP BY clause.
Thank you in advance.
I need the group by to calculate the sum of them. Each row has an object_id and a LOCK_SELLING_AMOUNT and other columns for one month, I want to use group to calculate the sum during month span.
It works well when I do not consider the LOCK_SELLING_AMOUNT
First, you don't want GROUP BY. So just use:
SELECT @SELLING_WATER = (COALESCE(SUM(YCKZ), 0) + COALESCE(SUM(RGCB), 0) + COALESCE(SUM(ICKZ), 0))
FROM [dbo].[NRW_MONTH_DATA]
WHERE OBJECT_ID = @OBJECT_ID AND
YEAR_MONTH >= @SELLING_CENSUS_START_YM AND
YEAR_MONTH <= @SELLING_CENSUS_END_YM;
Now, the problem is that a column can change values on different rows. So, what row does LOCK_SELLING_AMOUNT come from? We could assume it is the same on all rows. Or perhaps you want an aggregation function:
SELECT @SELLING_WATER = (CASE WHEN MAX(LOCK_SELLING_AMOUNT) = 1
THEN MAX(CASE WHEN LOCK_SELLING_AMOUNT = 1 THEN SELLING_AMOUNT END)
ELSE (COALESCE(SUM(YCKZ), 0) + COALESCE(SUM(RGCB), 0) + COALESCE(SUM(ICKZ), 0))
END)
FROM [dbo].[NRW_MONTH_DATA]
WHERE OBJECT_ID = @OBJECT_ID AND
YEAR_MONTH >= @SELLING_CENSUS_START_YM AND
YEAR_MONTH <= @SELLING_CENSUS_END_YM;
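As a rough illustration of how the aggregated CASE behaves, here is a small sketch using Python's sqlite3 instead of SQL Server. The sample rows, the lower-case table name, and the selling_amount helper are all made up for the example; only the shape of the CASE expression comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nrw_month_data ("
    "object_id INTEGER, year_month TEXT, selling_amount INTEGER, "
    "lock_selling_amount INTEGER, rgcb INTEGER, ickz INTEGER, yckz INTEGER)"
)
conn.executemany(
    "INSERT INTO nrw_month_data VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # object 1: never locked, so the three columns are summed
        (1, "2023-01", 0, 0, 1, 2, 3),
        (1, "2023-02", 0, 0, 4, None, 6),
        # object 2: locked in one month, so that row's selling_amount wins
        (2, "2023-01", 0, 0, 1, 1, 1),
        (2, "2023-02", 50, 1, 2, 2, 2),
    ],
)


def selling_amount(conn, object_id, start_ym, end_ym):
    """Hypothetical helper: returns the locked amount if any row in the
    range is locked, otherwise the sum of yckz, rgcb and ickz."""
    row = conn.execute(
        """
        SELECT CASE WHEN MAX(lock_selling_amount) = 1
                    THEN MAX(CASE WHEN lock_selling_amount = 1
                                  THEN selling_amount END)
                    ELSE COALESCE(SUM(yckz), 0)
                       + COALESCE(SUM(rgcb), 0)
                       + COALESCE(SUM(ickz), 0)
               END
        FROM nrw_month_data
        WHERE object_id = ? AND year_month >= ? AND year_month <= ?
        """,
        (object_id, start_ym, end_ym),
    ).fetchone()
    return row[0]


print(selling_amount(conn, 1, "2023-01", "2023-02"))
print(selling_amount(conn, 2, "2023-01", "2023-02"))
```

Because every column reference sits inside an aggregate, no GROUP BY is needed and the query always returns exactly one row, even when no rows match the filter (in which case the ELSE branch yields 0).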

Creating average for specific timeframe

I'm setting up a time series with each row = 1 hr.
The input data has sometimes multiple values per hour. This can vary.
Right now the specific code looks like this:
select
patientunitstayid
, generate_series(ceil(min(nursingchartoffset)/60.0),
ceil(max(nursingchartoffset)/60.0)) as hr
, avg(case when nibp_systolic >= 1 and nibp_systolic <= 250 then
nibp_systolic else null end) as nibp_systolic_avg
from nc
group by patientunitstayid
order by patientunitstayid asc;
and generates this data:
It takes the average of the entire time series for each patient instead of taking it for each hour. How can I fix this?
I'm expecting something like this:
select nc.patientunitstayid, gs.hr,
avg(case when nc.nibp_systolic >= 1 and nc.nibp_systolic <= 250
then nibp_systolic
end) as nibp_systolic_avg
from (select nc.*,
min(nursingchartoffset) over (partition by patientunitstayid) as min_nursingchartoffset,
max(nursingchartoffset) over (partition by patientunitstayid) as max_nursingchartoffset
from nc
) nc cross join lateral
generate_series(ceil(min_nursingchartoffset/60.0),
ceil(max_nursingchartoffset/60.0)
) as gs(hr)
group by nc.patientunitstayid, hr
order by nc.patientunitstayid asc, hr asc;
That is, you need to be aggregating by hr. I put this into the from clause, to highlight that this generates rows. If you are using an older version of Postgres, then you might not have lateral joins. If so, just use a subquery in the from clause.
EDIT:
You can also try:
from (select nc.*,
generate_series(ceil(min(nursingchartoffset) over (partition by patientunitstayid) / 60.0),
ceil(max(nursingchartoffset) over (partition by patientunitstayid)/ 60.0)
) hr
from nc
) nc
And adjust the references to hr in the outer query.
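For comparison, here is a different technique from the one in the answer: instead of generating a series, each reading is assigned to its hour and the query groups by that hour. This is only a sketch in Python's sqlite3 with invented sample rows. It assumes non-negative integer offsets in minutes, uses integer arithmetic in place of ceil(), and hours with no readings simply do not appear in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nc ("
    "patientunitstayid INTEGER, nursingchartoffset INTEGER, nibp_systolic REAL)"
)
conn.executemany(
    "INSERT INTO nc VALUES (?, ?, ?)",
    [
        (1, 10, 120.0),   # patient 1, hour 1
        (1, 50, 130.0),   # patient 1, hour 1
        (1, 70, 110.0),   # patient 1, hour 2
        (1, 100, 999.0),  # patient 1, hour 2, out of range so ignored
        (2, 30, 100.0),   # patient 2, hour 1
    ],
)

# (offset + 59) / 60 is integer division in SQLite and equals
# ceil(offset / 60.0) for non-negative integer offsets.
hourly = conn.execute(
    """
    SELECT patientunitstayid,
           (nursingchartoffset + 59) / 60 AS hr,
           AVG(CASE WHEN nibp_systolic >= 1 AND nibp_systolic <= 250
                    THEN nibp_systolic END) AS nibp_systolic_avg
    FROM nc
    GROUP BY patientunitstayid, (nursingchartoffset + 59) / 60
    ORDER BY patientunitstayid, hr
    """
).fetchall()
print(hourly)
```

The key point is that the hour has to be part of the GROUP BY, and each reading has to belong to exactly one hour, for the average to be computed per hour rather than per patient.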

Getting error Invalid column name display_name in SQL Server

I am new to SQL Server. When I run the query shown below, I get the error
Invalid column name display_name
Can anyone please help me? How can I resolve this problem?
Here is my query:
SELECT
'Nbr of RAPs' AS display_name,
MonthRap AS MonthStart,
COUNT(DISTINCT tb_Episode.id) AS total
FROM
tb_Episode
WHERE
(BranchID = '244' or BranchID = '242' or BranchID = '240' or BranchID = '243')
AND RAPClaimDate IS NOT NULL
AND RAPClaimDate >= '2017-01-01'
AND RAPClaimDate < '2018-2-01'
AND tb_Episode.CustID = '27'
AND tb_Episode.DTR >= 0
AND tb_Episode.DTR <= '30'
AND tb_Episode.PayorType = 'Medicare Traditional'
GROUP BY
display_name, MonthRap
ORDER BY
tb_Episode.MonthRap ASC
display_name is a constant, so you do not need to group by it. The reason for this error is that column aliases are not allowed in the GROUP BY clause. There are other issues with the query as well. I would suggest:
SELECT 'Nbr of RAPs' as display_name,
e.MonthRap as MonthStart,
count(*) as total
FROM tb_Episode e
WHERE e.BranchID IN (244, 242, 240, 243) AND
e.RAPClaimDate is not null AND
e.RAPClaimDate >= '2017-01-01' AND
e.RAPClaimDate < '2018-02-01' AND
e.CustID = 27 AND
e.DTR >= 0 AND
e.DTR <= 30 AND
e.PayorType = 'Medicare Traditional'
GROUP BY e.MonthRap
ORDER BY e.MonthRap asc;
Notes:
I removed the DISTINCT. Presumably, id is already unique in the table. If it is not, then use DISTINCT. However, COUNT(DISTINCT) is typically a bit slower than COUNT(*).
I replaced the first condition with IN -- simpler to read and to write.
I removed single quotes around what look like number constants. Don't turn numbers into strings, if you intend a number.
I changed the format of the dates to be YYYY-MM-DD. Use a fixed length format, rather than dispensing with 0s for low numbered months and days.
I also added a table alias and qualified all column names.
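A minimal sketch of the pattern, run through Python's sqlite3 with a cut-down, made-up version of tb_Episode: the constant appears only in the SELECT list and the grouping is done on the real column. Note that SQLite happens to accept aliases in GROUP BY, so it will not reproduce the SQL Server error; the example only shows that the constant does not need to be grouped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_Episode ("
    "id INTEGER PRIMARY KEY, MonthRap TEXT, BranchID INTEGER)"
)
conn.executemany(
    "INSERT INTO tb_Episode (MonthRap, BranchID) VALUES (?, ?)",
    [
        ("2017-01", 244),
        ("2017-01", 242),
        ("2017-02", 240),
        ("2017-02", 999),  # branch not in the IN list, filtered out
    ],
)

# The literal 'Nbr of RAPs' is repeated on every output row without
# being listed in GROUP BY.
monthly = conn.execute(
    """
    SELECT 'Nbr of RAPs' AS display_name,
           e.MonthRap AS MonthStart,
           COUNT(*) AS total
    FROM tb_Episode e
    WHERE e.BranchID IN (244, 242, 240, 243)
    GROUP BY e.MonthRap
    ORDER BY e.MonthRap
    """
).fetchall()
print(monthly)
```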
The query should be:
SELECT 'Nbr of RAPs' as display_name,
MonthRap as MonthStart,
count(DISTINCT tb_Episode.id) as total
FROM tb_Episode
WHERE (BranchID = '244' or BranchID = '242' or BranchID = '240' or BranchID = '243')
AND RAPClaimDate is not null
AND RAPClaimDate >= '2017-01-01'
AND RAPClaimDate < '2018-2-01'
AND tb_Episode.CustID = '27'
AND tb_Episode.DTR >= 0
AND tb_Episode.DTR <= '30'
AND tb_Episode.PayorType = 'Medicare Traditional'
GROUP BY MonthRap
ORDER BY tb_Episode.MonthRap asc
Please notice that the aliased column display_name is removed from the GROUP BY. You can't use display_name in the GROUP BY because it is only an alias defined in the SELECT list, and SQL Server does not let GROUP BY refer to SELECT-list aliases.

Using boolean logic inside SUM function

In SQL Server 2008, this query works:
SELECT
SUM(CAST(isredeemed AS TINYINT)) AS totalredeemed
FROM rewards
GROUP BY merchantid
It gives you the number of redeemed rewards by merchant. The TINYINT cast is needed to avoid the error Operand data type bit is invalid for sum operator.
Now I'd like to do a similar query, but one that only finds rewards redeemed in the last few days. I tried this:
SELECT
SUM(CAST((isredeemed & ( MIN(dateredeemed) > '2014-01-10 05:00:00')) AS TINYINT)) AS claimedthisweek,
FROM rewards
GROUP BY merchantid
and I get the error
Incorrect syntax near '>'.
I also tried replacing & with && and also with AND. But those don't work either.
How can I make the example work?
The question lacks the detail needed for an exact answer, but you need to use a derived table or subquery for the calculation.
Something like this:
SELECT r1.merchantid, r2.claimedthisweek
FROM rewards r1
JOIN (
SELECT merchantid, SUM(CAST(isredeemed AS INT)) claimedthisweek
FROM rewards
GROUP BY merchantid
HAVING MIN(dateredeemed) > '20140101'
) r2 ON r1.merchantid = r2.merchantid
This should be in the HAVING clause, like this:
SELECT
SUM(CAST(isredeemed AS TINYINT)) AS claimedthisweek
FROM rewards
GROUP BY merchantid
HAVING MIN(dateredeemed) > '2014-01-10 05:00:00'
Any reason not to do the filtering at the "where" clause level? That should work so long as all rows you're aggregating match the same criteria:
SELECT
SUM(CAST(isredeemed AS TINYINT)) AS claimedthisweek
FROM rewards
WHERE dateredeemed > '2014-01-10 05:00:00'
GROUP BY merchantid
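The WHERE and HAVING versions are not equivalent, which a small experiment makes visible. This sketch uses Python's sqlite3 with invented rows; SQLite has no bit type, so isredeemed is stored as an integer and summed directly, and merchantid is added to the SELECT list to make the output readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rewards ("
    "merchantid INTEGER, isredeemed INTEGER, dateredeemed TEXT)"
)
conn.executemany(
    "INSERT INTO rewards VALUES (?, ?, ?)",
    [
        (1, 1, "2014-01-05 00:00:00"),  # redeemed before the cutoff
        (1, 1, "2014-01-12 00:00:00"),
        (1, 0, None),                   # not redeemed yet
        (2, 1, "2014-01-11 00:00:00"),
        (2, 1, "2014-01-13 00:00:00"),
    ],
)
cutoff = "2014-01-10 05:00:00"

# WHERE removes individual rows before grouping.
where_rows = conn.execute(
    """
    SELECT merchantid, SUM(isredeemed) AS claimedthisweek
    FROM rewards
    WHERE dateredeemed > ?
    GROUP BY merchantid
    ORDER BY merchantid
    """,
    (cutoff,),
).fetchall()

# HAVING removes whole groups after grouping.
having_rows = conn.execute(
    """
    SELECT merchantid, SUM(isredeemed) AS claimedthisweek
    FROM rewards
    GROUP BY merchantid
    HAVING MIN(dateredeemed) > ?
    ORDER BY merchantid
    """,
    (cutoff,),
).fetchall()

print(where_rows)
print(having_rows)
```

Here merchant 1 has one redemption before the cutoff, so the HAVING version drops that merchant entirely, while the WHERE version still counts its one recent redemption. Which behavior is wanted depends on the question being asked.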
I think this would work, but it would count some twice:
SELECT
SUM(CASE WHEN dateredeemed > '2014-01-10 05:00:00' THEN 1 ELSE 0 END)
FROM rewards
GROUP BY merchantid

alternatives to "Having"

I have a SELECT statement that counts the number of instances and saves the result in a variable. It has a HAVING clause that uses a SUM and a COUNT. However, since you have to have a GROUP BY in order to use HAVING, the SELECT statement returns 4 rows that each contain 1 instead of a single total of 4. The variable then receives 1 rather than 4, which is not what I need, so I am looking for an alternative workaround.
select count(distinct p1.community)
from
"Database".prospect p1
where
p1.visit_date >= '2013-07-01'
and p1.visit_date <= '2013-09-30'
and p1.division = '61'
group By
p1.community
having
sum(p1.status_1) / count(p1.control_code) >= .16;
This is a reasonable alternative:
select count(*)
from (
select p1.community , sum(p1.status_1) / count(p1.control_code) SomeColumn
from
"Database".prospect p1
where
p1.visit_date >= '2013-07-01'
and p1.visit_date <= '2013-09-30'
and p1.division = '61'
Group By
p1.community
) A
where A.SomeColumn >= .16;
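To see why the derived table helps, the sketch below runs both forms through Python's sqlite3 on invented data. The table is called prospect without the "Database" prefix, and the ratio is multiplied by 1.0 to force non-integer division; that is an assumption, since the column types in the original database are unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prospect ("
    "community TEXT, status_1 INTEGER, control_code TEXT, "
    "visit_date TEXT, division TEXT)"
)
rows = []
# (community, number of visits, number of visits with status_1 = 1)
for community, visits, sales in [("A", 5, 1), ("B", 10, 1), ("C", 2, 1)]:
    for i in range(visits):
        rows.append(
            (community, 1 if i < sales else 0, f"{community}{i}", "2013-08-15", "61")
        )
conn.executemany("INSERT INTO prospect VALUES (?, ?, ?, ?, ?)", rows)

date_filter = (
    "p1.visit_date >= '2013-07-01' AND p1.visit_date <= '2013-09-30' "
    "AND p1.division = '61'"
)

# Original shape: one output row per qualifying community, each containing 1.
grouped = conn.execute(
    f"""
    SELECT COUNT(DISTINCT p1.community)
    FROM prospect p1
    WHERE {date_filter}
    GROUP BY p1.community
    HAVING 1.0 * SUM(p1.status_1) / COUNT(p1.control_code) >= .16
    """
).fetchall()

# Derived-table shape: the outer COUNT(*) collapses those rows into one number.
total = conn.execute(
    f"""
    SELECT COUNT(*)
    FROM (
        SELECT p1.community,
               1.0 * SUM(p1.status_1) / COUNT(p1.control_code) AS SomeColumn
        FROM prospect p1
        WHERE {date_filter}
        GROUP BY p1.community
    ) A
    WHERE A.SomeColumn >= .16
    """
).fetchone()[0]

print(grouped)
print(total)
```

With this data, communities A (0.2) and C (0.5) pass the threshold and B (0.1) does not, so the grouped query returns two rows of 1 while the wrapped query returns the single value 2, which can be stored in a variable.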