Calculate contribution - sql

It is not legal to do max(count()), so how do I calculate the contribution as shown here (and also get the other columns)?
SELECT id,
Avg(time) AS avgSec,
Stdev(time) AS stdevSec,
Count(time) AS cnt,
Avg(time)*Count(time)/max(Count(time)) AS contribution
FROM ...very long and complex query...

Use the MAX() OVER () window function to get the maximum count across all grouped rows.
Here is the correct way:
Avg(time)*Count(time)/max(Count(time)) over()
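To see it in action, here is a minimal sketch that runs the expression against SQLite (3.25 or later, for window function support) through Python's sqlite3 module. The `timings` table and its rows are made up to stand in for the long query, a GROUP BY id is assumed, and Stdev is left out because SQLite has no built-in for it.

```python
import sqlite3

# Hypothetical data standing in for the "very long and complex query".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timings (id INTEGER, time REAL)")
conn.executemany(
    "INSERT INTO timings VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (1, 30.0), (1, 40.0), (2, 5.0), (2, 15.0)],
)

rows = conn.execute(
    """
    SELECT id,
           AVG(time)   AS avgSec,
           COUNT(time) AS cnt,
           -- MAX(...) OVER () is evaluated after GROUP BY, over all grouped rows
           AVG(time) * COUNT(time) / MAX(COUNT(time)) OVER () AS contribution
    FROM timings
    GROUP BY id
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 25.0, 4, 25.0)
# (2, 10.0, 2, 5.0)
```

The largest group (id 1, four rows) keeps its full average, while id 2 is scaled by 2/4.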

Related

When can aggregate functions be nested in standard SQL?

I know it wasn't allowed in SQL-92. But since then it may have changed, particularly when there's a window applied. Can you explain the changes and give the version (or versions if there were more) in which they were introduced?
Examples
Is SUM(COUNT(votes.option_id)) OVER() valid syntax per standard SQL:2016 (or earlier)?
This is my comment (unanswered, and unlikely to be answered on such an old question) on Why can you nest aggregate functions when using a window function in PostgreSQL?.
The Calculating Running Total (SQL) kata at Codewars has as its most upvoted solution (using PostgreSQL 13.0, a highly standards-compliant engine, so the code is likely to be standard) this one:
SELECT
CREATED_AT::DATE AS DATE,
COUNT(CREATED_AT) AS COUNT,
SUM(COUNT(CREATED_AT)) OVER (ORDER BY CREATED_AT::DATE ROWS UNBOUNDED PRECEDING)::INT AS TOTAL
FROM
POSTS
GROUP BY
CREATED_AT::DATE
(Which could be simplified to:
SELECT
created_at::DATE date,
COUNT(*) COUNT,
SUM(COUNT(*)) OVER (ORDER BY created_at::DATE)::INT total
FROM posts
GROUP BY created_at::DATE
I assume the ::s are a new syntax for casting I didn't know of. And that casting from TIMESTAMP to DATE is now allowed (in SQL-92 it wasn't).)
As this SO answer explains, Oracle Database allows it even without a window, pulling in the GROUP BY from context. I don't know if the standard allows it.
You already noticed the difference yourself: It's all about the window. COUNT(*) without an OVER clause for instance is an aggregation function. COUNT(*) with an OVER clause is a window function.
By using aggregation functions you condense the original rows you get after the FROM clause and WHERE clause are applied to either the specified group in GROUP BY or to one row in the absence of a GROUP BY clause.
Window functions, aka analytic functions, are applied afterwards. They don't change the number of result rows, but merely add information by looking at all or some rows (the window) of the selected data.
In
SELECT
options.id,
options.option_text,
COUNT(votes.option_id) as vote_count,
COUNT(votes.option_id) / SUM(COUNT(votes.option_id)) OVER() * 100.0 as vote_percentage
FROM options
LEFT JOIN votes on options.id = votes.option_id
GROUP BY options.id;
we first join votes to options and then count the votes per option by aggregating the joined rows down to one result row per option (GROUP BY options.id). We count on a non-nullable column in the votes table (COUNT(votes.option_id)), so we get a zero count in case there are no votes, because in an outer joined row this column is set to null.
After aggregating all rows and thus getting one row per option, we apply a window function (SUM() OVER) on this result set. We apply the analytic SUM on the vote count (SUM(COUNT(votes.option_id))) by looking at the whole result set (empty OVER clause), thus getting the same total vote count in every row. We use this value for a calculation: the option's vote count divided by the total vote count times 100, which is the option's percentage of total votes.
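The two phases can be checked with a small sketch. It uses SQLite (3.25+) via Python's sqlite3 rather than the engine from the question, and the option texts and votes are invented. One deliberate change: the multiplication by 100.0 comes first, because SQLite would otherwise do integer division on the two counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data: option 3 has no votes at all.
conn.executescript(
    """
    CREATE TABLE options (id INTEGER PRIMARY KEY, option_text TEXT);
    CREATE TABLE votes (option_id INTEGER NOT NULL);
    INSERT INTO options VALUES (1, 'red'), (2, 'green'), (3, 'blue');
    INSERT INTO votes VALUES (1), (1), (1), (2);
    """
)

rows = conn.execute(
    """
    SELECT options.id,
           COUNT(votes.option_id) AS vote_count,
           -- inner COUNT: aggregate per group; outer SUM ... OVER (): window over the grouped rows
           100.0 * COUNT(votes.option_id)
                 / SUM(COUNT(votes.option_id)) OVER () AS vote_percentage
    FROM options
    LEFT JOIN votes ON options.id = votes.option_id
    GROUP BY options.id
    ORDER BY options.id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 3, 75.0)
# (2, 1, 25.0)
# (3, 0, 0.0)
```

The option without votes still appears, with a count of zero, because COUNT skips the null produced by the outer join.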
The PostgreSQL query is very similar. We select the number of posts per date (COUNT(created_at) is nothing other than COUNT(*) here, provided created_at is never null) along with a running total of these counts (by using a window that looks at all rows up to the current row).
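A rough equivalent of the running total, sketched for SQLite (3.25+) through Python's sqlite3. SQLite has no `::` cast, so DATE(created_at) is used instead; the `posts` rows and the column aliases `day` and `cnt` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (created_at TEXT NOT NULL)")
# Made-up timestamps: 2 posts, then 1, then 3 (no posts on 2024-01-03).
conn.executemany(
    "INSERT INTO posts VALUES (?)",
    [
        ("2024-01-01 08:00:00",),
        ("2024-01-01 12:00:00",),
        ("2024-01-02 09:00:00",),
        ("2024-01-04 10:00:00",),
        ("2024-01-04 11:00:00",),
        ("2024-01-04 12:00:00",),
    ],
)

rows = conn.execute(
    """
    SELECT DATE(created_at) AS day,
           COUNT(*)         AS cnt,
           -- window over the grouped rows, accumulating the per-day counts
           SUM(COUNT(*)) OVER (ORDER BY DATE(created_at)) AS total
    FROM posts
    GROUP BY DATE(created_at)
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
# ('2024-01-01', 2, 2)
# ('2024-01-02', 1, 3)
# ('2024-01-04', 3, 6)
```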
So, while this looks like we are nesting two aggregate functions, this is not really the case, because SUM OVER is not considered an aggregation function but an analytic/window function.
Oracle does allow applying an aggregate function directly on another, thus invoking a final aggregation on top of a preceding GROUP BY aggregation. This allows us to get one result row of, say, the average of sums without having to write a subquery for this. This is not compliant with the SQL standard, however, and very unpopular even among Oracle developers at that.
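For comparison, the portable way to get an average of sums is a derived table, which is what Oracle's nested form saves you from writing. A minimal sketch with SQLite via Python's sqlite3; the `sales` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 20), ("south", 30), ("south", 50)],
)

# Inner query: one sum per region (30 and 80).
# Outer query: a second, final aggregation over those sums.
avg_of_sums = conn.execute(
    """
    SELECT AVG(region_total)
    FROM (SELECT SUM(amount) AS region_total
          FROM sales
          GROUP BY region) AS per_region
    """
).fetchone()[0]

print(avg_of_sums)  # 55.0
```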

Calculate max value by a dimension over a number of days in SQL (Presto)

I am looking to get the max value by a specific dimension over a number of days in SQL, as per example below:
I have this initial dataset:
And I am looking to calculate the maximum number of items and the maximum sales by product type across the days, as in the example below:
Expected output:
Any advice on the best way to get this? I tried the Max function and Max_by to get the max by product id, but it didn't work.
Thank you in advance.
Use window functions:
select t.*,
max(items) over (partition by product_type),
max(sales) over (partition by product_type)
from t;
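Since the original dataset is not shown, here is a sketch with invented rows to illustrate what the query returns. It runs on SQLite (3.25+) via Python's sqlite3 rather than Presto; the column names `day`, `max_items` and `max_sales` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (day TEXT, product_type TEXT, items INTEGER, sales REAL)")
# Made-up rows: two product types over two days.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", "A", 3, 30.0),
        ("2024-01-02", "A", 5, 20.0),
        ("2024-01-01", "B", 1, 10.0),
        ("2024-01-02", "B", 2, 40.0),
    ],
)

rows = conn.execute(
    """
    SELECT t.*,
           MAX(items) OVER (PARTITION BY product_type) AS max_items,
           MAX(sales) OVER (PARTITION BY product_type) AS max_sales
    FROM t
    ORDER BY product_type, day
    """
).fetchall()

for row in rows:
    print(row)
# ('2024-01-01', 'A', 3, 30.0, 5, 30.0)
# ('2024-01-02', 'A', 5, 20.0, 5, 30.0)
# ('2024-01-01', 'B', 1, 10.0, 2, 40.0)
# ('2024-01-02', 'B', 2, 40.0, 2, 40.0)
```

Every input row is kept, and each one carries the maxima of its own product type.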

BigQuery: Using threshold with COUNT DISTINCT in WINDOW function returns error

With COUNT DISTINCT, I often make use of a threshold to make it more precise. E.g. COUNT(DISTINCT users, 100000).
If I am using a WINDOW function, though, I get an error when trying to use a threshold: COUNT_DISTINCT must have at most 1 argument(s), found 2. E.g. here's a made-up query that demonstrates the problem:
SELECT
day,
COUNT(DISTINCT state, 100000) OVER (PARTITION BY year, month, day)
FROM [publicdata:samples.natality]
LIMIT 1000
Is this by design? Is there a workaround?
COUNT(DISTINCT) is documented as an approximation when used as an aggregation function, but when it is used as an analytic function it is actually the exact implementation, so you don't need the extra parameter: you will get the exact result without it.

How does the aggregation function work with group by

I do not understand the following (it returns the number of comments per article along with the date of the newest one):
SELECT `id_comment`,COUNT(*) AS `number`, MAX(`date`) AS `newest`
FROM `page_comments`
WHERE TO_DAYS( NOW() )-TO_DAYS(`date`) < 90
GROUP BY `id_comment`
ORDER BY `number` DESC,`newest` DESC
I don't understand how come the MAX function does not return the MAX value of the whole page_comments table, and instead automatically takes only the max for the given group. When using MAX, I would expect it to return the highest value of the column. I don't understand how it works together with grouping.
You described the behavior yourself quite correctly already: it automatically takes only the max for the given group.
If you group, you do it (as usual) on every column in the result set that is not aggregated (not using COUNT, SUM, MIN, MAX...).
That way you get distinct value combinations for all non-aggregated columns, and the aggregated ones yield a result that only takes the 'current' group into account.
Let me explain it from the ground up.
MAX() is an aggregate function (it works over a group of rows).
If a "group by" clause is NOT specified, the database implicitly treats the entire result set as one group.
If it is specified, the function is applied separately to each group defined by the grouping columns.
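A small sketch of both cases, using SQLite via Python's sqlite3 with a cut-down `page_comments` table and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_comments (id_comment INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO page_comments VALUES (?, ?)",
    [(1, "2024-01-05"), (1, "2024-02-10"), (2, "2024-03-01")],
)

# No GROUP BY: the whole table is one implicit group, so one row comes back.
overall = conn.execute("SELECT MAX(date) FROM page_comments").fetchall()

# With GROUP BY: MAX and COUNT are computed once per id_comment.
per_group = conn.execute(
    """
    SELECT id_comment, COUNT(*) AS number, MAX(date) AS newest
    FROM page_comments
    GROUP BY id_comment
    ORDER BY id_comment
    """
).fetchall()

print(overall)    # [('2024-03-01',)]
print(per_group)  # [(1, 2, '2024-02-10'), (2, 1, '2024-03-01')]
```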
It all boils down to the order in which the query is analyzed:
1. FROM
2. ON
3. OUTER
4. WHERE
5. GROUP BY
6. CUBE | ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
So you first have the FROM clause.
Then you cut down to the relevant rows via WHERE (so this is where your sentence "I don't understand how come that the MAX function will not return the MAX value of all the page_comments" fails).
Then you group the remaining rows.
Then you select.
MAX and the other aggregate functions are applied to data that has already been filtered!
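The effect of that ordering can be sketched with SQLite via Python's sqlite3. The rows are invented, and a fixed cutoff date replaces the TO_DAYS(NOW()) arithmetic so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_comments (id_comment INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO page_comments VALUES (?, ?)",
    [
        (1, "2024-01-05"),
        (1, "2024-02-10"),
        (2, "2024-03-01"),
        (2, "2023-06-01"),  # too old: removed by WHERE before grouping
        (3, "2023-05-01"),  # too old: group 3 never comes into existence
    ],
)

rows = conn.execute(
    """
    SELECT id_comment, COUNT(*) AS number, MAX(date) AS newest
    FROM page_comments
    WHERE date >= '2024-01-01'   -- step 1: filter rows
    GROUP BY id_comment          -- step 2: group what is left
    ORDER BY number DESC, newest DESC
    """
).fetchall()

print(rows)  # [(1, 2, '2024-02-10'), (2, 1, '2024-03-01')]
```

Group 2 counts only one comment, because its older row was discarded before COUNT and MAX ever saw it.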

How do I Sum the values of a Count Distinct Aggregate of Contained Groups

I've got a bit of a problem with a grouping. At the moment I have a grouping that does a CountDistinct(Fields!AccountName.Value). This grouping is on a Financial Time Period.
I need to be able to do a Sum on the values brought forward by this CountDistinct at the end of this report, but I can't put an aggregate function within an aggregate function.
Don't suppose you guys have any ideas / help?
Thanks.
You can do a select over your set of aggregates and aggregate it again. Something like:
SELECT SUM(total)
FROM (SELECT COUNT(DISTINCT AccountName) AS total, name FROM x GROUP BY name) totals
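To make the idea concrete, here is a sketch at the SQL level (not in the report expression language), run on SQLite via Python's sqlite3. The `accounts` table, its `period` and `account_name` columns, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (period TEXT, account_name TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("P1", "a"), ("P1", "a"), ("P1", "b"), ("P2", "a"), ("P2", "c"), ("P2", "d")],
)

# Inner query: distinct accounts per period (P1 -> 2, P2 -> 3).
# Outer query: sum of those per-period counts.
sum_of_counts = conn.execute(
    """
    SELECT SUM(distinct_accounts)
    FROM (SELECT period, COUNT(DISTINCT account_name) AS distinct_accounts
          FROM accounts
          GROUP BY period) AS totals
    """
).fetchone()[0]

# For contrast: a single distinct count over everything gives a different number.
overall_distinct = conn.execute(
    "SELECT COUNT(DISTINCT account_name) FROM accounts"
).fetchone()[0]

print(sum_of_counts, overall_distinct)  # 5 4
```

Account "a" appears in both periods, so it is counted twice in the sum but only once in the overall distinct count.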