MAX on columns generated by SUM and GROUP BY - sql

I'm trying to get the MAX on a column which is generated dynamically using the SUM statement. The SUM statement is used together with the 'GROUP by' syntax.
This is the original query, however it needs to be modified to work with grouping, sums and of course MAX.
SELECT SUM(video_plays) AS total_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ASC
As you can see SUM is adding all the values inside video_plays as total_video_plays..
But I SIMPLY want to get the MAX of total_video_plays
My attempts are below, however they do not work..
SELECT SUM(video_plays) AS MAX(total_video_plays)
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ASC
How would you get the MAX on a column made dynamically without using subqueries - Because the above is already placed within one.

Something like
SELECT SUM(video_plays) AS total_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id`
ORDER BY total_video_plays DESC
LIMIT 1
Hat Tip OMG Ponies for proper MySQL dialect.

You can not do what you're asking without a subquery, because you can't run two aggregate functions, one on top of the other.

Will this work for you?
SELECT MAX(total_video_plays) from table (
SELECT SUM(video_plays) AS total_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ASC )
It contains a subquery, but maybe not in the sense you were thinking.

This works for me.
select video_id, sum(video_plays) as sum_video_plays
from (
select video_id, video_plays
, row_number() over (partition by video_id
order by video_id desc) as rn
from video_statistics
) as T
where rn = 1
group by video_id;

can't you just do this?:
SELECT video_id, max(video_plays) AS max_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ASC
There's an example here:
http://dev.mysql.com/doc/refman/5.1/de/select.html
SELECT user, MAX(salary) FROM users
GROUP BY user HAVING MAX(salary) > 10;
EDIT: second attempt, albeit using a subquery:
select max(sum_video_plays)
from (
SELECT video_id, sum(video_plays) AS sum_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id`
) x
Your outer query may well be selecting from a much smaller set, and (depending on your data distribution etc.) may be quite performant.

Related

LIMITING MAX VALUES IN SQL

I am completely rewriting this question, I just cant crack it
IDB DB2 SQL
(from a Chicago Crime Dataset)
Which community area is most crome prone?
When I use this code, it does correctly count and sort the data
select community_area_number as community_area_number, count(community_area_number) as total_area_crime
from chicago_crime_data
group by community_area_number
order by total_area_crime desc;
the problem is, it lists all the data descending, but no matter what MAX statement I use, either in the select or the order by statement, it wont show just the max values.
The max values are 43, so I would like to to show both 'community_area_numbers' that have 43.
Instead it shows the entire list.
Here is a screenshot
also, yes I understand I can just do a LIMIT 2 command, but that would be cheating since I manually checked that there are 2 max values, but if this data changed or i didnt know that, it doesnt solve anything
thanks in advance
What you would be looking for is the standard SQL clause FETCH WITH TIES;
select community_area_number, count(*) as total_area_crime
from chicago_crime_data
group by community_area_number
order by total_area_crime desc
fetch first row with ties;
Unfortunately, though, DB2 doesn't support WITH TIES in FETCH FIRST.
The classic way (that is before we had the window functions RANK and DENSE_RANK) is to use a subquery: Get the maximum value, then get all rows with that maximum. I am using a CTE (aka WITH clause) here in order not to have to write everything twice.
with counted as
(
select community_area_number, count(*) as total_area_crime
from chicago_crime_data
group by community_area_number
)
select community_area_number, total_area_crime
from counted
where total_area_crime = (select max(total_area_crime) from counted);
(Please note that this is a mere COUNT(*), because we want to count rows per community_area_number.)
Like #topsail mentioned. You could use a rank function.
From the table you have above you could do the following
SELECT t.* FROM
(
SELECT *,
RANK() OVER (Order by Total_Area_Crime DESC) rnk
from
table1
)t
WHERE t.rnk = 1
db fiddle
So your full query should look something like this:
With cte AS (
SELECT MAX(COMMUNITY_AREA_NUMBER) AS COMMUNITY_AREA_NUMBER,
COUNT(COMMUNITY_AREA_NUMBER) AS TOTAL_AREA_CRIME
FROM CHICAGO_CRIME_DATA
GROUP BY COMMUNITY_AREA_NUMBER
ORDER BY TOTAL_AREA_CRIME DESC;
)
SELECT t.* FROM
(
SELECT *,
RANK() OVER (Order by Total_Area_Crime DESC) rnk
from
cte
)t
WHERE t.rnk = 1
It turns out the professor did want us to use the Limit command.
Here is the final answer:
SELECT COMMUNITY_AREA_NUMBER, COUNT(ID) AS CRIMES_RECORDED
FROM CHICAGO_CRIME_DATA
GROUP BY COMMUNITY_AREA_NUMBER
ORDER BY CRIMES_RECORDED DESC LIMIT 1;
thanks to all those who responded :D

sql aggregations

any ideas why this doesn't work?
select [column_name_2], max(count(distinct([column_name_1])))
from [table_name]
group by [column_name_2]
but it works if done like this
select [column_name_2], count(distinct([column_name_1])) as [x]
into #temp_table
from [table_name]
group by [column_name_2]
select max(x)
from #temp_table
Well, that's just the way SQL (the language) is defined to work. When you use GROUP BY, the corresponding SELECT list will produce a row for each group in the result. You're trying to take that result and aggregate twice, once with GROUP BY [column_name_2] and a second time with GROUP BY (), as defined by standard SQL. We can't do that in the same query expression.
The good news is you can break this up into more than one query expression:
WITH cte1 AS (
SELECT count(distinct([column_name_1])) AS cnt
FROM [table_name]
GROUP BY [column_name_2]
)
SELECT MAX(cnt) FROM cte1
;
or use a derived table.
You can even order the initial query result by cnt DESC and limit the result to the first row.
In your case, you may not want just the MAX, but also the other column.
With SQL Server, which you may be using. Note: You should add a database specific tag to the question.
SELECT TOP 1 [column_name_2], count(distinct([column_name_1])) AS cnt
FROM [table_name]
GROUP BY [column_name_2]
ORDER BY cnt DESC
;
I don't understand "this doesn't work", what were you expecting and what did you get? Normally you include the GROUP BY value in the result set. So it would be:
select [column_name_2], max(cnt) cnt
from (select [column_name_2], count(distinct [column_name_1]) cnt
from [table_name]
group by [column_name_2]) x
group by [column_name_2]
Ok, after reading your comment I think above is what you are looking for.
SELECT [column_name_two]
, max(x) x
FROM (
SELECT [column_name_two]
, COUNT(DISTINCT [column_name_one]
FROM table_name
GROUP BY [column_name_two]
) AS Tbl
GROUP BY [column_name_two]

Get minimum without using row number/window function in Bigquery

I have a table like as shown below
What I would like to do is get the minimum of each subject. Though I am able to do this with row_number function, I would like to do this with groupby and min() approach. But it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see just the two columns (subject_id and id) is enough to group the items together. They will help differentiate the group. But why am I not able to use the other columns in select clause. If I use the other columns, I may not get the expected output because time_1 has different values.
I expect my output to be like as shown below
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(1)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After you concatenate the arrays, you want the first element. The .* transforms the record referred to by a to the component columns.
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you an easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like below-
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
If you do not required to select Time_1 column's value, this following query will work (As I can see values in column min_time and max_time is same for the same group)-
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, the best approach is if you can apply something like CAST(Time_1 AS DATE) on your time column. This will consider only the date part regardless of the time part. The query will be
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Make sure the syntax of CAST AS DATE
-- in BigQuery is as I written here or bit different.
Below is for BigQuery Standard SQL and is most efficient way for such cases like in your question
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases lead to Resources exceeded error.
Note: self join is also very ineffective way of achieving your objective
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id

SQL Total Distinct Count on Group By Query

Trying to get an overall distinct count of the employees for a range of records which has a group by on it.
I've tried using the "over()" clause but couldn't get that to work. Best to explain using an example so please see my script below and wanted result below.
EDIT:
I should mention I'm hoping for a solution that does not use a sub-query based on my "sales_detail" table below because in my real example, the "sales_detail" table is a very complex sub-query.
Here's the result I want. Column "wanted_result" should be 9:
Sample script:
CREATE TEMPORARY TABLE [sales_detail] (
[employee] varchar(100),[customer] varchar(100),[startdate] varchar(100),[enddate] varchar(100),[saleday] int,[timeframe] varchar(100),[saleqty] numeric(18,4)
);
INSERT INTO [sales_detail]
([employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty])
VALUES
('Wendy','Chris','8/1/2019','8/12/2019','5','Afternoon','1'),
('Wendy','Chris','8/1/2019','8/12/2019','5','Morning','5'),
('Wendy','Chris','8/1/2019','8/12/2019','6','Morning','6'),
('Dexter','Chris','8/1/2019','8/12/2019','2','Mid','2.5'),
('Jennifer','Chris','8/1/2019','8/12/2019','4','Morning','2.75'),
('Lila','Chris','8/1/2019','8/12/2019','2','Morning','3.75'),
('Rita','Chris','8/1/2019','8/12/2019','2','Mid','1'),
('Tony','Chris','8/1/2019','8/12/2019','4','Mid','2'),
('Tony','Chris','8/1/2019','8/12/2019','1','Morning','6'),
('Mike','Chris','8/1/2019','8/12/2019','4','Mid','1.5'),
('Logan','Chris','8/1/2019','8/12/2019','3','Morning','6.25'),
('Blake','Chris','8/1/2019','8/12/2019','4','Afternoon','0.5')
;
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
9 AS [wanted_result]
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty]
FROM
[sales_detail]
) AS [s]
GROUP BY
[timeframe]
;
If I understand correctly, you are simply looking for a COUNT(DISTINCT) for all employees in the table? I believe this query will return the results you are looking for:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
(SELECT COUNT(DISTINCT [employee]) FROM [sales_detail]) AS [employee_count2],
9 AS [wanted_result]
FROM #sales_detail [s]
GROUP BY
[timeframe]
You can try this below option-
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
[wanted_result]
-- select count form sub query
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty],
(select COUNT(DISTINCT [employee]) from [sales_detail]) AS [wanted_result]
--caculate the count with first sub query
FROM [sales_detail]
) AS [s]
GROUP BY
[timeframe],[wanted_result]
Use a trick where you only count each person on the first day they are seen:
select timeframe, sum(saleqty) as total_qty),
count(distinct employee) as employee_count1,
sum( (seqnum = 1)::int ) as employee_count2
9 as wanted_result
from (select sd.*,
row_number() over (partition by employee order by startdate) as seqnum
from sales_detail sd
) sd
group by timeframe;
Note: From the perspective of performance, your complex subquery is only evaluated once.

SQL Server: How can I use the COUNT clause without GROUPing?

I'm looking get two things from a query, given a set of contraints:
The first match
The total number of matches
I can get the first match by:
SELECT TOP 1
ID,
ReportYear,
Name,
SignDate,
...
FROM Table
WHERE
...
ORDER BY ... //I can put in here which one I want to take
And then I can get the match count if I use
SELECT
MIN(ID),
MIN(ReportYear),
MIN(Name),
MIN(SignDate),
... ,
COUNT(*) as MatchCount
FROM Table
WHERE
...
GROUP BY
??? // I don't really want any grouping
I really want to avoid both grouping and using an aggregate function on all my results. This question SQL Server Select COUNT without using aggregate function or group by suggests the answer would be
SELECT TOP 1
ID,
ReportYear,
Name,
SignDate,
... ,
##ROWCOUNT as MatchCount
FROM Table
This works without the TOP 1, but when it's in there, ##ROWCOUNT = number of rows returned, which is 1. How can I get essentially the output of COUNT(*) (whatever's left after the where clause) without any grouping or need to aggregate all the columns?
What I don't want to do is repeat each of these twice, once for the first row and then again for the ##ROWCOUNT. I'm not finding a way I can properly use GROUP BY, because I strictly want the number of items that match my criteria, and I want columns that if I GROUPed them would throw this number off - unless I'm misunderstanding GROUP BY.
Assuming you are using a newish version of SQL Server (2008+ from memory) then you can use analytic functions.
Simplifying things somewhat, they are a way of way of doing an aggregate over a set of data instead of a group - an extension on basic aggregates.
Instead of this:
SELECT
... ,
COUNT(*) as MatchCount
FROM Table
WHERE
...
You do this:
SELECT
... ,
COUNT(*) as MatchCount OVER (PARTITION BY <group fields> ORDER BY <order fields> )
FROM Table
WHERE
...
GROUP BY
Without actually running some code, I can't recall exactly which aggregates that you can't use in this fashion. Count is fine though.
Well, you can use OVER clause, which is an window function.
SELECT TOP (1)
OrderID, CustID, EmpID,
COUNT(*) OVER() AS MatchCount
FROM Sales.Orders
WHERE OrderID % 2 = 1
ORDER BY OrderID DESC
Try next query:
select top 1
*, count(*) over () rowsCount
from
(
select
*, dense_rank() over (order by ValueForOrder) n
from
myTable
) t
where
n = 1