Distinct Aggregate Query in Microsoft Access with Group By on a Different Field - sql

Among the fields in my table are 2 fields, CTN_NUM and PO_NUM. Each PO_NUM has at least one CTN_NUM, possibly more—distinct or repeated. Any given CTN_NUM cannot have more than 1 PO_NUM. In other words, it's a one to many relationship. I want to create a query which shows the number of unique CTN_NUM's per PO_NUM. I've seen other threads on this forum as well as http://blogs.office.com/b/microsoft-access/archive/2007/09/19/writing-a-count-distinct-query-in-access.aspx and none seem to address this exact issue.
Here's what I tried:
A)
SELECT PO_NUM, Count(CTN_NUM) AS CountOfCTN_NUM
FROM tempSpring_ASN
GROUP BY PO_NUM;
This returns the count of ALL CTN_NUMs per PO_NUM, even if they are not unique.
B)
SELECT PO_NUM, Count(DISTINCT CTN_NUM) AS CountOfCTN_NUM
FROM tempSpring_ASN
GROUP BY PO_NUM;
While this may work in other RDBMS’s, in Access I get a syntax error.
C)
SELECT COUNT(*)
FROM
(SELECT DISTINCT CTN_NUM AS cn
FROM tempSpring_ASN);
This returns the sum of CTN_NUM’s which are distinct to the table, not distinct to the PO_NUM.
D) Same as C, but with a GROUP BY:
SELECT COUNT(*)
FROM
(SELECT DISTINCT CTN_NUM AS cn
FROM tempSpring_ASN)
GROUP BY PO_NUM;
This prompts me for the PO_NUM.
Can you please advise? Thanks.

Try this one:
SELECT PO_NUM, COUNT(*) AS CountOfCTN_NUM
FROM (
SELECT PO_NUM, CTN_NUM
FROM tempSpring_ASN
GROUP BY PO_NUM, CTN_NUM
) a
GROUP BY PO_NUM;
This should also work:
SELECT PO_NUM, COUNT(*) AS CountOfCTN_NUM
FROM (
SELECT DISTINCT PO_NUM, CTN_NUM
FROM tempSpring_ASN
) a
GROUP BY PO_NUM;

Letter D has a syntax error on it. PO_NUM is not present on subquery so you can't use group by on it. It might be that the subquery misses PO_NUM.

Related

how do I do a count distinct group by in ms access?

SELECT [dataset].[ID], count(distinct([dataset].dates)) AS distinct_dates
FROM [dataset]
GROUP BY [dataset].[ID];
this gives the error UnDefined function Distinct in expression
You can use two levels of aggregation:
SELECT [dataset].[ID], COUNT(*) AS distinct_dates
FROM (SELECT DISTINCT id, dates
FROM [dataset]
) as d
GROUP BY [dataset].[ID];
Strictly speaking this is not 100% the same, because this would count NULL as a separate value (but count(distinct) ignores NULL values). So, to be identical, you would need to use:
SELECT [dataset].[ID], COUNT(dates) AS distinct_dates

best way to get count and distinct count of rows in single query

What is the best way to get count of rows and distinct rows in a single query?
To get distinct count we can use subquery like this:
select count(*) from
(
select distinct * from table
)
I have 15+ columns and have many duplicates rows as well and I want to calculate count of rows as well as distinct count of rows in one query.
More if I use this
select count(*) as Rowcount , count(distinct *) as DistinctCount from table
This will not give accurate results as count(distinct *) doesn't work.
Why don't you just put the subquery inside another query?
select count(*),
(select count(*) from (select distinct * from table))
from table;
create table tbl
(
col int
);
insert into tbl values(1),(2),(1),(3);
select count(*) as distinct_count, sum(sum) as all_count
from (
select count(col) sum from tbl group by col
)A
I think I have understood what you are looking for. You need to use some window function. So, you query should be look like =>
Select COUNT(*) OVER() YourRowcount ,
COUNT(*) OVER(Partition BY YourColumnofGroup) YourDistinctCount --Basic of the distinct count
FROM Yourtable
NEW Update
select top 1
COUNT(*) OVER() YourRowcount,
DENSE_RANK() OVER(ORDER BY YourColumn) YourDistinctCount
FROM Yourtable ORDER BY TT DESC
Note: This code is written sql server. Please check the code and let me know.

SQL Total Distinct Count on Group By Query

Trying to get an overall distinct count of the employees for a range of records which has a group by on it.
I've tried using the "over()" clause but couldn't get that to work. Best to explain using an example so please see my script below and wanted result below.
EDIT:
I should mention I'm hoping for a solution that does not use a sub-query based on my "sales_detail" table below because in my real example, the "sales_detail" table is a very complex sub-query.
Here's the result I want. Column "wanted_result" should be 9:
Sample script:
CREATE TEMPORARY TABLE [sales_detail] (
[employee] varchar(100),[customer] varchar(100),[startdate] varchar(100),[enddate] varchar(100),[saleday] int,[timeframe] varchar(100),[saleqty] numeric(18,4)
);
INSERT INTO [sales_detail]
([employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty])
VALUES
('Wendy','Chris','8/1/2019','8/12/2019','5','Afternoon','1'),
('Wendy','Chris','8/1/2019','8/12/2019','5','Morning','5'),
('Wendy','Chris','8/1/2019','8/12/2019','6','Morning','6'),
('Dexter','Chris','8/1/2019','8/12/2019','2','Mid','2.5'),
('Jennifer','Chris','8/1/2019','8/12/2019','4','Morning','2.75'),
('Lila','Chris','8/1/2019','8/12/2019','2','Morning','3.75'),
('Rita','Chris','8/1/2019','8/12/2019','2','Mid','1'),
('Tony','Chris','8/1/2019','8/12/2019','4','Mid','2'),
('Tony','Chris','8/1/2019','8/12/2019','1','Morning','6'),
('Mike','Chris','8/1/2019','8/12/2019','4','Mid','1.5'),
('Logan','Chris','8/1/2019','8/12/2019','3','Morning','6.25'),
('Blake','Chris','8/1/2019','8/12/2019','4','Afternoon','0.5')
;
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
9 AS [wanted_result]
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty]
FROM
[sales_detail]
) AS [s]
GROUP BY
[timeframe]
;
If I understand correctly, you are simply looking for a COUNT(DISTINCT) for all employees in the table? I believe this query will return the results you are looking for:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
(SELECT COUNT(DISTINCT [employee]) FROM [sales_detail]) AS [employee_count2],
9 AS [wanted_result]
FROM #sales_detail [s]
GROUP BY
[timeframe]
You can try this below option-
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
[wanted_result]
-- select count form sub query
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty],
(select COUNT(DISTINCT [employee]) from [sales_detail]) AS [wanted_result]
--caculate the count with first sub query
FROM [sales_detail]
) AS [s]
GROUP BY
[timeframe],[wanted_result]
Use a trick where you only count each person on the first day they are seen:
select timeframe, sum(saleqty) as total_qty),
count(distinct employee) as employee_count1,
sum( (seqnum = 1)::int ) as employee_count2
9 as wanted_result
from (select sd.*,
row_number() over (partition by employee order by startdate) as seqnum
from sales_detail sd
) sd
group by timeframe;
Note: From the perspective of performance, your complex subquery is only evaluated once.

How to create an additional column with the percentages related to a count distinct statement

I'm trying to query each distinct medical speciality (e.g. oncologist, pediatrician, etc.) in a table and then count the number of times a claim (claim_id) is linked to it, which I've done using this:
select distinct specialization, count(distinct claim_id) AS Claim_Totals
from table1
group by specialization
order by Claim_Totals DESC
However, I also want to include an additional column which lists the % that each speciality makes up in the table (based on the number of claim_id related to it). So for instance, if there were 100 total claims and "cardiologist" had 25 claim_id records related to it, "oncologist" had 15, "general surgeon" had 10, and so forth, I want the output to look like this:
specialization | Claims_Totals | PERCENTAGE
___________________________________________
cardiologist 25 25%
oncologist 15 15%
general surgeon 10 10%
Could do this? I'm not familiar with Barbaros's syntax. If that works its more concise and better.
select specialization, count(distinct claim_id) AS Claim_Totals, count(distinct claim_id)/total_claims
from table1
INNER JOIN ( SELECT COUNT(DISTINCT claim_id)*1.0000 total_claims AS total_claims
FROM table1 ) TMP
ON 1 = 1
group by specialization
order by Claim_Totals DESC
select specialization,
count(distinct claim_id) AS claim_by_spec,
count(distinct claim_id)/
( SELECT COUNT(DISTINCT claim_id)*1.0000
FROM table1 ) AS percentage_calc
from table1
group by specialization
order by Claim_Totals DESC
You can use sum(count(distinct)) over() to get the overall claims and use it in the denominator to get the percentage.
select specialization
,count(distinct claim_id) AS Claim_Totals
,round(100*count(distinct claim_id)/sum(count(distinct claim_id)) over(),3) as percentage
from table1
group by specialization
You can use
,concat_ws('',count(distinct claim_id),'%') as percentage
or
,concat(count(distinct claim_id),'%') as percentage
as added to the select list's tail
Btw, distinct before specialization in the select list is redundant, since already included in the group by list.
Because you are using count(distinct), window functions are less useful. You can try:
select t1.specialization,
count(distinct t1.claim_id) AS Claim_Totals,
count(distinct t1.claim_id) / tt1.num_claims
from table1 t1 cross join
(select count(distinct claim_id) as num_claims
from table1
) tt1
group by t1.specialization
order by Claim_Totals DESC

Over clause in SQL Server

I have the following query
select * from
(
SELECT distinct
rx.patid
,rx.fillDate
,rx.scriptEndDate
,MAX(datediff(day, rx.filldate, rx.scriptenddate)) AS longestScript
,rx.drugClass
,COUNT(rx.drugName) over(partition by rx.patid,rx.fillDate,rx.drugclass) as distinctFamilies
FROM [I 3 SCI control].dbo.rx
where rx.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
GROUP BY rx.patid, rx.fillDate, rx.scriptEndDate,rx.drugName,rx.drugClass
) r
order by distinctFamilies desc
which produces results that look like
This should mean that between the two dates in the table the patID that there should be 5 unique drug names. However, when I run the following query:
select distinct *
from rx
where patid = 1358801781 and fillDate between '2008-10-17' and '2008-11-16' and drugClass='H4B'
I have a result set returned that looks like
You can see that while there are in fact five rows returned for the second query between the dates of 2008-10-17 and 2009-01-15, there are only three unique names. I've tried various ways of modifying the over clause, all with different levels of non-success. How can I alter my query so that I only find unique drugNames within the timeframe specified for each row?
Taking a shot at it:
SELECT DISTINCT
patid,
fillDate,
scriptEndDate,
MAX(DATEDIFF(day, fillDate, scriptEndDate)) AS longestScript,
drugClass,
MAX(rn) OVER(PARTITION BY patid, fillDate, drugClass) as distinctFamilies
FROM (
SELECT patid, fillDate, scriptEndDate, drugClass,rx.drugName,
DENSE_RANK() OVER(PARTITION BY patid, fillDate, drugClass ORDER BY drugName) as rn
FROM [I 3 SCI control].dbo.rx
WHERE drugClass IN ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
)x
GROUP BY x.patid, x.fillDate, x.scriptEndDate,x.drugName,x.drugClass,x.rn
ORDER BY distinctFamilies DESC
Not sure if DISTINCT is really necessary - left it in since you've used it.