sql aggregations - sql

any ideas why this doesn't work?
select [column_name_2], max(count(distinct([column_name_1])))
from [table_name]
group by [column_name_2]
but it works if done like this
select [column_name_2], count(distinct([column_name_1])) as [x]
into #temp_table
from [table_name]
group by [column_name_2]
select max(x)
from #temp_table

Well, that's just the way SQL (the language) is defined to work. When you use GROUP BY, the corresponding SELECT list will produce a row for each group in the result. You're trying to take that result and aggregate twice, once with GROUP BY [column_name_2] and a second time with GROUP BY (), as defined by standard SQL. We can't do that in the same query expression.
The good news is you can break this up into more than one query expression:
WITH cte1 AS (
SELECT count(distinct([column_name_1])) AS cnt
FROM [table_name]
GROUP BY [column_name_2]
)
SELECT MAX(cnt) FROM cte1
;
or use a derived table.
You can even order the initial query result by cnt DESC and limit the result to the first row.
In your case, you may not want just the MAX, but also the other column.
With SQL Server, which you may be using. Note: You should add a database specific tag to the question.
SELECT TOP 1 [column_name_2], count(distinct([column_name_1])) AS cnt
FROM [table_name]
GROUP BY [column_name_2]
ORDER BY cnt DESC
;

I don't understand "this doesn't work", what were you expecting and what did you get? Normally you include the GROUP BY value in the result set. So it would be:
select [column_name_2], max(cnt) cnt
from (select [column_name_2], count(distinct [column_name_1]) cnt
from [table_name]
group by [column_name_2]) x
group by [column_name_2]
Ok, after reading your comment I think above is what you are looking for.

SELECT [column_name_two]
, max(x) x
FROM (
SELECT [column_name_two]
, COUNT(DISTINCT [column_name_one]
FROM table_name
GROUP BY [column_name_two]
) AS Tbl
GROUP BY [column_name_two]

Related

best way to get count and distinct count of rows in single query

What is the best way to get count of rows and distinct rows in a single query?
To get distinct count we can use subquery like this:
select count(*) from
(
select distinct * from table
)
I have 15+ columns and have many duplicates rows as well and I want to calculate count of rows as well as distinct count of rows in one query.
More if I use this
select count(*) as Rowcount , count(distinct *) as DistinctCount from table
This will not give accurate results as count(distinct *) doesn't work.
Why don't you just put the subquery inside another query?
select count(*),
(select count(*) from (select distinct * from table))
from table;
create table tbl
(
col int
);
insert into tbl values(1),(2),(1),(3);
select count(*) as distinct_count, sum(sum) as all_count
from (
select count(col) sum from tbl group by col
)A
I think I have understood what you are looking for. You need to use some window function. So, you query should be look like =>
Select COUNT(*) OVER() YourRowcount ,
COUNT(*) OVER(Partition BY YourColumnofGroup) YourDistinctCount --Basic of the distinct count
FROM Yourtable
NEW Update
select top 1
COUNT(*) OVER() YourRowcount,
DENSE_RANK() OVER(ORDER BY YourColumn) YourDistinctCount
FROM Yourtable ORDER BY TT DESC
Note: This code is written sql server. Please check the code and let me know.

SQL Total Distinct Count on Group By Query

Trying to get an overall distinct count of the employees for a range of records which has a group by on it.
I've tried using the "over()" clause but couldn't get that to work. Best to explain using an example so please see my script below and wanted result below.
EDIT:
I should mention I'm hoping for a solution that does not use a sub-query based on my "sales_detail" table below because in my real example, the "sales_detail" table is a very complex sub-query.
Here's the result I want. Column "wanted_result" should be 9:
Sample script:
CREATE TEMPORARY TABLE [sales_detail] (
[employee] varchar(100),[customer] varchar(100),[startdate] varchar(100),[enddate] varchar(100),[saleday] int,[timeframe] varchar(100),[saleqty] numeric(18,4)
);
INSERT INTO [sales_detail]
([employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty])
VALUES
('Wendy','Chris','8/1/2019','8/12/2019','5','Afternoon','1'),
('Wendy','Chris','8/1/2019','8/12/2019','5','Morning','5'),
('Wendy','Chris','8/1/2019','8/12/2019','6','Morning','6'),
('Dexter','Chris','8/1/2019','8/12/2019','2','Mid','2.5'),
('Jennifer','Chris','8/1/2019','8/12/2019','4','Morning','2.75'),
('Lila','Chris','8/1/2019','8/12/2019','2','Morning','3.75'),
('Rita','Chris','8/1/2019','8/12/2019','2','Mid','1'),
('Tony','Chris','8/1/2019','8/12/2019','4','Mid','2'),
('Tony','Chris','8/1/2019','8/12/2019','1','Morning','6'),
('Mike','Chris','8/1/2019','8/12/2019','4','Mid','1.5'),
('Logan','Chris','8/1/2019','8/12/2019','3','Morning','6.25'),
('Blake','Chris','8/1/2019','8/12/2019','4','Afternoon','0.5')
;
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
9 AS [wanted_result]
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty]
FROM
[sales_detail]
) AS [s]
GROUP BY
[timeframe]
;
If I understand correctly, you are simply looking for a COUNT(DISTINCT) for all employees in the table? I believe this query will return the results you are looking for:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
(SELECT COUNT(DISTINCT [employee]) FROM [sales_detail]) AS [employee_count2],
9 AS [wanted_result]
FROM #sales_detail [s]
GROUP BY
[timeframe]
You can try this below option-
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
[wanted_result]
-- select count form sub query
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty],
(select COUNT(DISTINCT [employee]) from [sales_detail]) AS [wanted_result]
--caculate the count with first sub query
FROM [sales_detail]
) AS [s]
GROUP BY
[timeframe],[wanted_result]
Use a trick where you only count each person on the first day they are seen:
select timeframe, sum(saleqty) as total_qty),
count(distinct employee) as employee_count1,
sum( (seqnum = 1)::int ) as employee_count2
9 as wanted_result
from (select sd.*,
row_number() over (partition by employee order by startdate) as seqnum
from sales_detail sd
) sd
group by timeframe;
Note: From the perspective of performance, your complex subquery is only evaluated once.

How to use to functions - MAX(smthng) and after COUNT(MAX(smthng)

I don't understand why I can't use this in my code :
SELECT MAX(SMTHNG), COUNT(MAX(SMTHNG))
FROM SomeTable;
Searched for an answer but didn't find it in documentation about these aggregate functions.
Also I get an SQL-compiler error "Invalid column name "SMTHNG"".
You want to know what the maximum SMTHNG in the table is with:
SELECT MAX(SMTHNG) FROM SomeTable;
This is an aggregation without GROUP BY and hence results in one single row containing the maximum SMTHNG.
Now you also want to know how often this SMTHNG occurs and you add COUNT(MAX(SMTHNG)). This, however, does not work, because you can not aggregate an aggregate directly.
This doesn't work either:
SELECT ANY_VALUE(max_smthng), COUNT(*)
FROM (SELECT MAX(smthng) AS max_smthng FROM sometable) t;
because the sub query only contains one row, so it's too late to count.
So, either use a sub query and select from the table again:
SELECT ANY_VALUE(smthng), COUNT(*)
FROM sometable
WHERE smthng = (SELECT MAX(smthng) FROM sometable);
Or count per SMTHNG before looking for the maximum. Here is how to get the counts:
SELECT smthng, COUNT(*)
FROM sometable
GROUP BY smthng;
And the easiest way to get the maximum from this result is:
SELECT TOP(1) smthng, COUNT(*)
FROM sometable
GROUP BY smthng
ORDER BY COUNT(*) DESC;
First of all, please read my comment.
Depending on what you're trying to achieve, the statement have to be changed.
If you want to count the highest values in SMTHNG field, you may try this:
SELECT T1.SMTHNG, COUNT(T1.SMTHNG)
FROM SomeTable T1 INNER JOIN
(
SELECT MAX(SMTHNG) AS A
FROM SomeTable
) T2 ON T1.SMTHNG = T2.A
GROUP BY T1.SMTHNG;
use cte like below or subquery
with cte as
(
select count(*) as cnt ,col from table_name
group by col
) select max(cnt) from cte
you can not use double aggregate function at a time on same column

MAX and COUNT function doesn't work together

I want to count id_r and then return the maxim value of count using
MAX(COUNT(id_r))
but shows me this error
the error
Thanks :)
You can only use one aggregation function at a time.
The ANSI standard way to do what you want is:
select count(*)
from t
group by ?
order by count(*) desc
fetch first 1 row only;
Or alternatively a subquery:
select max(cnt)
from (select count(*) as cnt
from t
group by ?
) x;
Note that you want a group by of something, perhaps id_r.
Try this:
SELECT MAX(e1) as Expr1 FROM (
SELECT COUNT(id_r) as e1
FROM Angajat) as t1
COUNT(id_r) wil return only 1 result since there is no group by clause. Hence, there is no use of max.
You need to add a group by clause in subquery:
SELECT MAX(e1) as Expr1 FROM (
SELECT column1, COUNT(id_r) as e1
FROM Angajat
GROUP BY column1
) as t1

How to avoid duplicating statements with grouping functions?

I have a t-sql query where sum function is duplicated.
How to avoid duplicating those statements?
select
Id,
sum(Value)
from
SomeTable
group by
Id
having
sum(Value) > 1000
It look like table aliasing is not supported.
I think with should work:
with tmptable (id,sumv)
as
(select
Id,
sum(Value) as sumv
from
SomeTable
group by
Id
)
select
id,
sumv
from
tmptable
where
sumv>1000
And a fiddle:
http://sqlfiddle.com/#!6/0d3f2/2
You need to remove the sum(Value) from the group by clause.
You can use GROUP BY 1, 2 to group by (in this case) the first and second columns, and thus avoid duplication in the GROUP BY clause.