How to create an additional column with the percentages related to a count distinct statement - sql

I'm trying to query each distinct medical speciality (e.g. oncologist, pediatrician, etc.) in a table and then count the number of times a claim (claim_id) is linked to it, which I've done using this:
select distinct specialization, count(distinct claim_id) AS Claim_Totals
from table1
group by specialization
order by Claim_Totals DESC
However, I also want to include an additional column which lists the % that each speciality makes up in the table (based on the number of claim_id related to it). So for instance, if there were 100 total claims and "cardiologist" had 25 claim_id records related to it, "oncologist" had 15, "general surgeon" had 10, and so forth, I want the output to look like this:
specialization  | Claim_Totals | PERCENTAGE
----------------+--------------+-----------
cardiologist    | 25           | 25%
oncologist      | 15           | 15%
general surgeon | 10           | 10%

You could do this. I'm not familiar with Barbaros's syntax; if that works, it's more concise and better.
select specialization, count(distinct claim_id) AS Claim_Totals, count(distinct claim_id)/total_claims AS percentage_calc
from table1
INNER JOIN ( SELECT COUNT(DISTINCT claim_id)*1.0000 AS total_claims
FROM table1 ) TMP
ON 1 = 1
group by specialization, total_claims
order by Claim_Totals DESC
Or, with a scalar subquery instead of the join:
select specialization,
count(distinct claim_id) AS claim_by_spec,
count(distinct claim_id)/
( SELECT COUNT(DISTINCT claim_id)*1.0000
FROM table1 ) AS percentage_calc
from table1
group by specialization
order by claim_by_spec DESC

You can use sum(count(distinct claim_id)) over() to get the overall number of claims and use it as the denominator for the percentage. This assumes each claim_id is linked to only one specialization; otherwise a claim is counted once per specialization in the total.
select specialization
,count(distinct claim_id) AS Claim_Totals
,round(100.0*count(distinct claim_id)/sum(count(distinct claim_id)) over(),3) as percentage
from table1
group by specialization

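You can append one of the following to the end of the select list:
,concat_ws('',count(distinct claim_id),'%') as percentage
or
,concat(count(distinct claim_id),'%') as percentage
Note that this formats the raw count, which matches the percentage only because the example has exactly 100 claims in total.
By the way, distinct before specialization in the select list is redundant, since specialization is already in the group by list.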

Because you are using count(distinct), window functions are less useful. You can try:
select t1.specialization,
count(distinct t1.claim_id) AS Claim_Totals,
count(distinct t1.claim_id) * 100.0 / tt1.num_claims AS percentage
from table1 t1 cross join
(select count(distinct claim_id) as num_claims
from table1
) tt1
group by t1.specialization, tt1.num_claims
order by Claim_Totals DESC
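For anyone who wants to try the cross join approach without a real database, here is a minimal sketch that runs it through Python's built-in sqlite3 module. The sample rows, the `claim_percentages` helper and the `percentage` alias are assumptions for illustration only. Multiplying by 100.0 keeps the division from being done in integers.

```python
import sqlite3


def claim_percentages():
    """Return (specialization, distinct claims, percentage) rows, largest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (specialization TEXT, claim_id INTEGER)")
    # Hypothetical sample data: 10 distinct claims; claim 1 is repeated on purpose.
    rows = (
        [("cardiologist", c) for c in (1, 1, 2, 3, 4, 5)]
        + [("oncologist", c) for c in (6, 7, 8)]
        + [("general surgeon", c) for c in (9, 10)]
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    query = """
        SELECT t1.specialization,
               COUNT(DISTINCT t1.claim_id) AS claim_totals,
               ROUND(COUNT(DISTINCT t1.claim_id) * 100.0 / tt1.num_claims, 1) AS percentage
        FROM table1 t1
        CROSS JOIN (SELECT COUNT(DISTINCT claim_id) AS num_claims FROM table1) tt1
        GROUP BY t1.specialization, tt1.num_claims
        ORDER BY claim_totals DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in claim_percentages():
    print(row)
```

The repeated claim 1 is only counted once for cardiologist, both in the per-specialization total and in the overall denominator.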

Related

Can I Select DISTINCT on 2 columns and Sum grouped by 1 column in one query?

Is it possible to write one query, where I would group by 2 columns in a table to get the count of total members plus get a sum of one column in that same table, but grouped by one column?
For example, the data looks like this
I want to get a count on distinct combinations of columns "OHID" and "MemID" and get the SUM of the "Amount" column grouped by OHID. The result is supposed to look like this
I was able to get the count correct using this query below
SELECT count(*) as TotCount
from (Select DISTINCT OHID, MemID
from #temp) AS TotMembers
However, when I try to use this query below to get all the results together, I am getting a count of 15 and a totally different total sum.
SELECT t.OHID,
count(TotMembers.MemID) as TotCount,
sum(t.Amount) as TotalAmount
from (Select DISTINCT OHID, MemID
from #temp) AS TotMembers
join #temp t on t.OHID = TotMembers.OHID
GROUP by t.OHID
If I understand correctly, you want to consider NULL as a valid value. The rest is just aggregation:
select t.ohid,
(count(distinct t.memid) +
(case when count(*) <> count(t.memid) then 1 else 0 end)
) as num_memid,
sum(t.amount) as total_amount
from #temp t
group by t.ohid
The case logic might be a bit off-putting. It is just adding 1 if any values are NULL.
You might find this easier to follow with two levels of aggregation:
select t.ohid, count(*), sum(amount)
from (select t.ohid, t.memid, sum(t.amount) as amount
from #temp t
group by t.ohid, t.memid
) t
group by t.ohid
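Since the sample data from the question is not available here, the sketch below uses made-up rows to check that the two queries above agree. It runs them through Python's sqlite3 module. The table name `temp_data` (standing in for #temp), the rows and the `members_and_totals` helper are all assumptions.

```python
import sqlite3


def members_and_totals():
    """Run both variants and return their result sets as a pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE temp_data (OHID INTEGER, MemID TEXT, Amount INTEGER)")
    # Hypothetical rows; OHID 1 has a NULL MemID that should count as one member.
    conn.executemany(
        "INSERT INTO temp_data VALUES (?, ?, ?)",
        [(1, "A", 10), (1, "A", 5), (1, "B", 20), (1, None, 3), (2, "C", 7), (2, "C", 8)],
    )
    case_sql = """
        SELECT t.OHID,
               COUNT(DISTINCT t.MemID)
                 + CASE WHEN COUNT(*) <> COUNT(t.MemID) THEN 1 ELSE 0 END AS num_memid,
               SUM(t.Amount) AS total_amount
        FROM temp_data t
        GROUP BY t.OHID
        ORDER BY t.OHID
    """
    two_level_sql = """
        SELECT t.OHID, COUNT(*) AS num_memid, SUM(t.amount) AS total_amount
        FROM (SELECT OHID, MemID, SUM(Amount) AS amount
              FROM temp_data
              GROUP BY OHID, MemID) t
        GROUP BY t.OHID
        ORDER BY t.OHID
    """
    case_rows = conn.execute(case_sql).fetchall()
    two_level_rows = conn.execute(two_level_sql).fetchall()
    conn.close()
    return case_rows, two_level_rows


case_rows, two_level_rows = members_and_totals()
print(case_rows)
print(two_level_rows)
```

Both variants read each source row exactly once, so the amounts are not multiplied the way they are in the self-join from the question.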

SQL Total Distinct Count on Group By Query

Trying to get an overall distinct count of the employees for a range of records which has a group by on it.
I've tried using the "over()" clause but couldn't get that to work. Best to explain using an example so please see my script below and wanted result below.
EDIT:
I should mention I'm hoping for a solution that does not use a sub-query based on my "sales_detail" table below because in my real example, the "sales_detail" table is a very complex sub-query.
Here's the result I want. Column "wanted_result" should be 9:
Sample script:
CREATE TEMPORARY TABLE [sales_detail] (
[employee] varchar(100),[customer] varchar(100),[startdate] varchar(100),[enddate] varchar(100),[saleday] int,[timeframe] varchar(100),[saleqty] numeric(18,4)
);
INSERT INTO [sales_detail]
([employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty])
VALUES
('Wendy','Chris','8/1/2019','8/12/2019','5','Afternoon','1'),
('Wendy','Chris','8/1/2019','8/12/2019','5','Morning','5'),
('Wendy','Chris','8/1/2019','8/12/2019','6','Morning','6'),
('Dexter','Chris','8/1/2019','8/12/2019','2','Mid','2.5'),
('Jennifer','Chris','8/1/2019','8/12/2019','4','Morning','2.75'),
('Lila','Chris','8/1/2019','8/12/2019','2','Morning','3.75'),
('Rita','Chris','8/1/2019','8/12/2019','2','Mid','1'),
('Tony','Chris','8/1/2019','8/12/2019','4','Mid','2'),
('Tony','Chris','8/1/2019','8/12/2019','1','Morning','6'),
('Mike','Chris','8/1/2019','8/12/2019','4','Mid','1.5'),
('Logan','Chris','8/1/2019','8/12/2019','3','Morning','6.25'),
('Blake','Chris','8/1/2019','8/12/2019','4','Afternoon','0.5')
;
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
9 AS [wanted_result]
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty]
FROM
[sales_detail]
) AS [s]
GROUP BY
[timeframe]
;
If I understand correctly, you are simply looking for a COUNT(DISTINCT) for all employees in the table? I believe this query will return the results you are looking for:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
(SELECT COUNT(DISTINCT [employee]) FROM [sales_detail]) AS [employee_count2],
9 AS [wanted_result]
FROM [sales_detail] AS [s]
GROUP BY
[timeframe]
You can try the option below:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
[wanted_result]
-- select the count from the subquery
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty],
(select COUNT(DISTINCT [employee]) from [sales_detail]) AS [wanted_result]
-- calculate the count in the first subquery
FROM [sales_detail]
) AS [s]
GROUP BY
[timeframe],[wanted_result]
Use a trick where you only count each person on the first day they are seen, and then add those per-timeframe counts together:
select timeframe, sum(saleqty) as total_qty,
count(distinct employee) as employee_count1,
sum(sum( (seqnum = 1)::int )) over () as employee_count2,
9 as wanted_result
from (select sd.*,
row_number() over (partition by employee order by startdate) as seqnum
from sales_detail sd
) sd
group by timeframe;
Note: The ::int cast is PostgreSQL syntax; in other databases use case when seqnum = 1 then 1 else 0 end. From the perspective of performance, your complex subquery is only evaluated once.
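To see why the "first row per employee" trick gives the right total, here is a small sketch using Python's sqlite3 module (it needs SQLite 3.25 or later for window functions). It loads only the employee, saleday and timeframe columns from the sample script. The `first_seen_counts` helper, the `first_seen` alias and the `saleday, timeframe` ordering (chosen to make the row numbering deterministic) are assumptions.

```python
import sqlite3

# (employee, saleday, timeframe) taken from the sample script in the question.
SALES = [
    ("Wendy", 5, "Afternoon"), ("Wendy", 5, "Morning"), ("Wendy", 6, "Morning"),
    ("Dexter", 2, "Mid"), ("Jennifer", 4, "Morning"), ("Lila", 2, "Morning"),
    ("Rita", 2, "Mid"), ("Tony", 4, "Mid"), ("Tony", 1, "Morning"),
    ("Mike", 4, "Mid"), ("Logan", 3, "Morning"), ("Blake", 4, "Afternoon"),
]


def first_seen_counts():
    """Return (timeframe, distinct employees, employees first seen here) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_detail (employee TEXT, saleday INTEGER, timeframe TEXT)"
    )
    conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?)", SALES)
    query = """
        SELECT timeframe,
               COUNT(DISTINCT employee) AS employee_count1,
               SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS first_seen
        FROM (SELECT sd.*,
                     ROW_NUMBER() OVER (PARTITION BY employee
                                        ORDER BY saleday, timeframe) AS seqnum
              FROM sales_detail sd) sd
        GROUP BY timeframe
        ORDER BY timeframe
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


rows = first_seen_counts()
print(rows)
print("sum of per-timeframe distinct counts:", sum(r[1] for r in rows))
print("sum of first-seen flags:", sum(r[2] for r in rows))
```

Adding up the per-timeframe distinct counts double-counts Wendy and Tony, who appear in two timeframes each, while every employee carries exactly one seqnum = 1 row, so those flags add up to the overall distinct count.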

How to limit duplicated rows

I would like help regarding an SQL query.
Looking around the site, I found several code snippets to return duplicate rows.
Here is the one I went with:
select unumber, name, localid
from table1
where unumber
in (select unumber from table1 group by unumber having count (*) > 1 )
order by unumber
which works fine, however, in the table I have other columns as well, like timestamp etc.
As such, when I run the query I indeed get the duplicate rows, however, I get the duplicates several times due to different timestamps for example.
Is there any way to limit the results to 'unique' duplicate rows only?
Hope this makes sense!
Thank you in advance!
For what you describe, you can just use select distinct:
select distinct unumber, name, localid
from table1
where unumber in (select unumber from table1 group by unumber having count (*) > 1 )
order by unumber;
However, I would be more likely to write this using window functions:
select unumber, name, localid
from (select t1.*,
count(*) over (partition by unumber) as cnt,
row_number() over (partition by unumber, name, localid order by unumber) as seqnum
from table1 t1
) t1
where cnt > 1 and seqnum = 1;
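A quick way to compare the two queries is to run them side by side on a few made-up rows. The sketch below does that with Python's sqlite3 module (SQLite 3.25 or later for the window functions). The `ts` column, the sample rows and the `unique_duplicates` helper are assumptions, and an ORDER BY is added to both queries so the outputs can be compared directly.

```python
import sqlite3


def unique_duplicates():
    """Return the results of the SELECT DISTINCT and window-function variants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (unumber INTEGER, name TEXT, localid TEXT, ts TEXT)"
    )
    # Hypothetical rows: unumber 1 and 3 are duplicated, unumber 2 is not.
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?, ?)",
        [
            (1, "ann", "x", "09:00"), (1, "ann", "x", "10:00"), (1, "bob", "y", "09:00"),
            (2, "cy", "z", "09:00"),
            (3, "dee", "w", "09:00"), (3, "dee", "w", "11:00"),
        ],
    )
    distinct_sql = """
        SELECT DISTINCT unumber, name, localid
        FROM table1
        WHERE unumber IN (SELECT unumber FROM table1
                          GROUP BY unumber HAVING COUNT(*) > 1)
        ORDER BY unumber, name
    """
    window_sql = """
        SELECT unumber, name, localid
        FROM (SELECT t1.*,
                     COUNT(*) OVER (PARTITION BY unumber) AS cnt,
                     ROW_NUMBER() OVER (PARTITION BY unumber, name, localid
                                        ORDER BY ts) AS seqnum
              FROM table1 t1) t1
        WHERE cnt > 1 AND seqnum = 1
        ORDER BY unumber, name
    """
    distinct_rows = conn.execute(distinct_sql).fetchall()
    window_rows = conn.execute(window_sql).fetchall()
    conn.close()
    return distinct_rows, window_rows


distinct_rows, window_rows = unique_duplicates()
print(distinct_rows)
print(window_rows)
```

Rows that differ only in the timestamp collapse into one line in both variants, which is the "unique duplicates" output asked for.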

SQL query - percentage of sub sample

I got a SQL statement:
Select
ID, GroupID, Profit
From table
I now want to add a fourth column percentage of group profits.
Therefore the query should sum all the profits for the same group id and then have that number divided by the profit for the unique ID.
Is there a way to do this? The regular sum function does not seem to do the trick.
Thanks
select t1.ID,
t1.GroupID,
(t1.Profit * 1.0) / t2.grp_profit as percentage_profit
from table t1
inner join
(
select GroupID, sum(Profit) as grp_profit
from table
group by GroupID
) t2 on t1.groupid = t2.groupid
One more option, using a window function:
select ID, GroupID, Profit * 1. / SUM(Profit) OVER(PARTITION BY GroupID) as percentage_profit
from table
An alternative solution using scalar sub-queries is as follows:
select t1.ID, t1.GroupID, (select sum(t2.Profit) * 1.0 / t1.Profit
from table t2
where t2.GroupID = t1.GroupID) as percentage_profit
from table t1;
An alternative, albeit less efficient, is to use a scalar subquery.
SELECT ID, GroupId, Profit, (Profit * 100.0 / (SELECT sum(Profit)
FROM my_table
WHERE GroupId = mt.GroupId)) as pct
FROM my_table as mt
From the way it reads, I'm not sure whether you want "percentage of group profits" or group_profit / individual profit.
The latter is how this sounds: "Therefore the query should sum all the profits for the same group id and then have that number divided by the profit for the unique ID"
Either way, just swap the numerator and the divisor to get what you want!
Also, if you're using PostgreSQL >= 8.4 you can use a window function.
SELECT ID, GroupId, Profit, (Profit * 100.0 / sum(Profit) OVER (partition by GroupId)) as pct
FROM my_table as mt
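Here is a rough sketch that checks the window-function and join variants against each other on a handful of invented rows, using Python's sqlite3 module (SQLite 3.25 or later). The `profits` table name, the data and the `group_profit_share` helper are assumptions. It computes each row's share of its group total; swap numerator and divisor if you want the inverse ratio.

```python
import sqlite3


def group_profit_share():
    """Return each row's percentage of its group's profit, computed two ways."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE profits (ID INTEGER, GroupID TEXT, Profit INTEGER)")
    # Hypothetical data: group A totals 40, group B totals 40.
    conn.executemany(
        "INSERT INTO profits VALUES (?, ?, ?)",
        [(1, "A", 10), (2, "A", 30), (3, "B", 5), (4, "B", 15), (5, "B", 20)],
    )
    window_sql = """
        SELECT ID, GroupID, Profit,
               Profit * 100.0 / SUM(Profit) OVER (PARTITION BY GroupID) AS pct
        FROM profits
        ORDER BY ID
    """
    join_sql = """
        SELECT p.ID, p.GroupID, p.Profit,
               p.Profit * 100.0 / g.grp_profit AS pct
        FROM profits p
        JOIN (SELECT GroupID, SUM(Profit) AS grp_profit
              FROM profits
              GROUP BY GroupID) g ON g.GroupID = p.GroupID
        ORDER BY p.ID
    """
    window_rows = conn.execute(window_sql).fetchall()
    join_rows = conn.execute(join_sql).fetchall()
    conn.close()
    return window_rows, join_rows


window_rows, join_rows = group_profit_share()
for row in window_rows:
    print(row)
```

Multiplying by 100.0 before dividing matters here because Profit is stored as an integer; without it, many databases would truncate the result to 0.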

Distinct Aggregate Query in Microsoft Access with Group By on a Different Field

Among the fields in my table are 2 fields, CTN_NUM and PO_NUM. Each PO_NUM has at least one CTN_NUM, possibly more—distinct or repeated. Any given CTN_NUM cannot have more than 1 PO_NUM. In other words, it's a one to many relationship. I want to create a query which shows the number of unique CTN_NUM's per PO_NUM. I've seen other threads on this forum as well as http://blogs.office.com/b/microsoft-access/archive/2007/09/19/writing-a-count-distinct-query-in-access.aspx and none seem to address this exact issue.
Here's what I tried:
A)
SELECT PO_NUM, Count(CTN_NUM) AS CountOfCTN_NUM
FROM tempSpring_ASN
GROUP BY PO_NUM;
This returns the count of ALL CTN_NUMs per PO_NUM, even if they are not unique.
B)
SELECT PO_NUM, Count(DISTINCT CTN_NUM) AS CountOfCTN_NUM
FROM tempSpring_ASN
GROUP BY PO_NUM;
While this may work in other RDBMS’s, in Access I get a syntax error.
C)
SELECT COUNT(*)
FROM
(SELECT DISTINCT CTN_NUM AS cn
FROM tempSpring_ASN);
This returns the number of CTN_NUMs that are distinct across the whole table, not distinct per PO_NUM.
D) Same as C, but with a GROUP BY:
SELECT COUNT(*)
FROM
(SELECT DISTINCT CTN_NUM AS cn
FROM tempSpring_ASN)
GROUP BY PO_NUM;
This prompts me for the PO_NUM.
Can you please advise? Thanks.
Try this one:
SELECT PO_NUM, COUNT(*) AS CountOfCTN_NUM
FROM (
SELECT PO_NUM, CTN_NUM
FROM tempSpring_ASN
GROUP BY PO_NUM, CTN_NUM
) a
GROUP BY PO_NUM;
This should also work:
SELECT PO_NUM, COUNT(*) AS CountOfCTN_NUM
FROM (
SELECT DISTINCT PO_NUM, CTN_NUM
FROM tempSpring_ASN
) a
GROUP BY PO_NUM;
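Query D fails because PO_NUM is not in the subquery's select list, so the outer query cannot group by it. Access treats the unknown name as a parameter, which is why it prompts you for PO_NUM. Adding PO_NUM to the subquery, as in the queries above, fixes it.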
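I don't have Access at hand, so the sketch below uses SQLite (via Python's sqlite3 module) as a stand-in to check that the DISTINCT subquery workaround gives the same answer as COUNT(DISTINCT ...), which SQLite does support. The sample rows and the `cartons_per_po` helper are assumptions, and behaviour in Access itself is not verified here.

```python
import sqlite3


def cartons_per_po():
    """Return distinct CTN_NUM counts per PO_NUM via the workaround and directly."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tempSpring_ASN (PO_NUM TEXT, CTN_NUM TEXT)")
    # Hypothetical rows: PO1 has cartons C1 (twice) and C2; PO2 has C3 (twice).
    conn.executemany(
        "INSERT INTO tempSpring_ASN VALUES (?, ?)",
        [("PO1", "C1"), ("PO1", "C1"), ("PO1", "C2"), ("PO2", "C3"), ("PO2", "C3")],
    )
    workaround_sql = """
        SELECT PO_NUM, COUNT(*) AS CountOfCTN_NUM
        FROM (SELECT DISTINCT PO_NUM, CTN_NUM
              FROM tempSpring_ASN) a
        GROUP BY PO_NUM
        ORDER BY PO_NUM
    """
    direct_sql = """
        SELECT PO_NUM, COUNT(DISTINCT CTN_NUM) AS CountOfCTN_NUM
        FROM tempSpring_ASN
        GROUP BY PO_NUM
        ORDER BY PO_NUM
    """
    workaround_rows = conn.execute(workaround_sql).fetchall()
    direct_rows = conn.execute(direct_sql).fetchall()
    conn.close()
    return workaround_rows, direct_rows


workaround_rows, direct_rows = cartons_per_po()
print(workaround_rows)
print(direct_rows)
```

The inner query removes repeated (PO_NUM, CTN_NUM) pairs first, so a plain COUNT(*) in the outer query is enough.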