Count() over() has repeated records - sql

I often use sum() over() to calculate cumulative values, but today I tried count() over() and the result was not what I expected. Can someone explain why the result has repeated records for the same day?
I know the regular way is to count(distinct id) group by date and then sum() over (order by date); I am just curious about the result of "count(id) over (order by date)".
Select pre.date,count(person_id) over (order by pre.date)
From (select distinct person_id, date from events) pre
The result will be repeated records for the same day.

Because your outer query has not filtered or aggregated the results from the inner query. It returns the same number of rows.
You want aggregation:
select pre.date, count(*) as cnt_on_date,
sum(count(*)) over (order by pre.date) as running_count
from (select distinct person_id, date from events) pre
group by pre.date;
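As a rough check of the difference, the sketch below runs both queries against an in-memory SQLite database from Python. The sample rows are invented for illustration, and SQLite (3.25 or later, for window function support) is only a stand-in for whatever DBMS the question used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (person_id INTEGER, date TEXT)")
# Invented sample data: person 2 has two events on the same day.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-02")],
)

# Query from the question: one output row per distinct (person_id, date) pair.
per_row = conn.execute("""
    SELECT pre.date, COUNT(person_id) OVER (ORDER BY pre.date)
    FROM (SELECT DISTINCT person_id, date FROM events) AS pre
    ORDER BY pre.date
""").fetchall()

# Aggregated version from the answer: one output row per date.
per_date = conn.execute("""
    SELECT pre.date,
           COUNT(*) AS cnt_on_date,
           SUM(COUNT(*)) OVER (ORDER BY pre.date) AS running_count
    FROM (SELECT DISTINCT person_id, date FROM events) AS pre
    GROUP BY pre.date
    ORDER BY pre.date
""").fetchall()

print(per_row)   # [('2024-01-01', 2), ('2024-01-01', 2), ('2024-01-02', 3)]
print(per_date)  # [('2024-01-01', 2, 2), ('2024-01-02', 1, 3)]
```

The first result repeats the date because nothing collapses the two people seen on 2024-01-01 into one row; the GROUP BY in the second query does.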

Almost all analytical functions (row_number() is the one exception that comes to mind) do not differentiate ties on the columns in the ORDER BY clause. Some documentation states this directly:
Oracle
If you specify a logical window with the RANGE keyword, then the function returns the same result for each of the rows
Postgresql
By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause.
MySQL
With 'ORDER BY': The default frame includes rows from the partition start through the current row, including all peers of the current row (rows equal to the current row according to the ORDER BY clause).
In general, adding ORDER BY to the analytic clause implicitly sets the window specification to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The window calculation is made for each row in the defined window, and with the default RANGE, rows with the same value of the ORDER BY columns fall into the same window and produce the same result. So to get a real running total, either specify ROWS BETWEEN explicitly or add a more detailed column to the ORDER BY part of the analytic clause. Functions that do not support a windowing clause are exceptions to this rule, but this is sometimes not documented directly, so I will not try to list them here. Functions that can also be used as aggregates are generally not exceptions and produce the same value for tied rows.
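To make the frame behaviour concrete, here is a small sketch that compares the default frame, an explicit RANGE frame, and a ROWS frame with a tie-breaking column. It assumes SQLite 3.25 or later run through Python's sqlite3 module; the table `t` and its columns `d` and `id` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d TEXT, id INTEGER)")
# Two rows share the same ORDER BY value (the same day).
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("2024-01-01", 1), ("2024-01-01", 2), ("2024-01-02", 3)],
)

rows = conn.execute("""
    SELECT d, id,
           COUNT(*) OVER (ORDER BY d) AS default_frame,
           COUNT(*) OVER (ORDER BY d
                          RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_frame,
           COUNT(*) OVER (ORDER BY d, id
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_frame
    FROM t
    ORDER BY d, id
""").fetchall()

for row in rows:
    print(row)
# ('2024-01-01', 1, 2, 2, 1)
# ('2024-01-01', 2, 2, 2, 2)
# ('2024-01-02', 3, 3, 3, 3)
```

The default and explicit RANGE columns agree and give both tied rows the value 2, while the ROWS frame with `id` as a tie-breaker counts row by row.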

Related

Cumulating value of previous row in Column FINAL_VALUE

My table name is "fundt" and my question is:
how do I compute the cumulative sum of the previous rows in column FINAL_VALUE?
I think it is possible with a cross join, but I don't know how.
I suspect that you want window functions with a window frame:
select
t.*,
sum(final_value) over(
order by it_month
rows between unbounded preceding and 1 preceding
) cumulative_final_value
from mytable t
This gives you a cumulative sum() of previous rows (not including the current row), using column it_month for ordering. You might need to adapt that to your exact requirement, but this seems to be the logic that you are looking for.
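A quick way to see what this frame returns is to run it on a few rows. The sketch below uses an in-memory SQLite database (3.25 or later) from Python; the three sample rows and the integer column types are assumptions, since the question's data is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (it_month INTEGER, final_value INTEGER)")
# Invented sample rows.
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

rows = conn.execute("""
    SELECT t.*,
           SUM(final_value) OVER (
               ORDER BY it_month
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS cumulative_final_value
    FROM mytable AS t
    ORDER BY it_month
""").fetchall()

print(rows)  # [(1, 10, None), (2, 20, 10), (3, 30, 30)]
```

The first row has no preceding rows, so its frame is empty and SUM() yields NULL (None in Python); wrapping the expression in COALESCE(..., 0) would turn that into 0 if that is preferred.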

Counting the repetition of Call ID in the previous rows, sorting by date

Is there a way to count how many times a call ID is repeated in all rows prior to the current row in a SQL query? A sample of the data table is attached.
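I guess the simplest solution is the RANK() window function. If a row has a rank of 5, the same value has occurred 4 times previously in the same column. The window needs an ORDER BY on the date column for this to hold; without it, every row in a partition is a peer and gets rank 1. If your DBMS supports window functions, you may use:
-- CALL_DATE is a placeholder for the date column, which the question does not name
SELECT *, RANK() OVER(PARTITION BY CALLID ORDER BY CALL_DATE) - 1
FROM YOUR_TABLE;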
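The following sketch checks the idea on invented data, using SQLite 3.25 or later through Python. The table name `calls` and the column `call_date` are hypothetical, since the question's sample table is not reproduced here. It uses ROW_NUMBER() ordered by date to count prior occurrences, and also shows what a ranking without any ORDER BY returns in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (callid TEXT, call_date TEXT)")
# Invented sample data: call A appears three times, call B twice.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("A", "2024-01-01"), ("B", "2024-01-02"), ("A", "2024-01-03"),
     ("A", "2024-01-04"), ("B", "2024-01-05")],
)

rows = conn.execute("""
    SELECT callid, call_date,
           -- occurrences of this callid on earlier dates
           ROW_NUMBER() OVER (PARTITION BY callid ORDER BY call_date) - 1 AS prior_count,
           -- no ORDER BY: all rows in the partition are peers, so rank is always 1
           RANK() OVER (PARTITION BY callid) - 1 AS rank_without_order
    FROM calls
    ORDER BY call_date
""").fetchall()

for row in rows:
    print(row)
# ('A', '2024-01-01', 0, 0)
# ('B', '2024-01-02', 0, 0)
# ('A', '2024-01-03', 1, 0)
# ('A', '2024-01-04', 2, 0)
# ('B', '2024-01-05', 1, 0)
```

If two rows for the same call ID can share a date, RANK() with the same ORDER BY would give them the same value, whereas ROW_NUMBER() breaks the tie arbitrarily; which is right depends on the requirement.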

What is the execution order of the PARTITION BY clause compared to other SQL clauses?

I cannot find any source mentioning execution order for Partition By window functions in SQL.
Is it in the same order as Group By?
For example, with a query like:
Select *, row_number() over (Partition by Name)
from NPtable
Where Name = 'Peter'
I understand that if WHERE gets executed first, it will only look at Name = 'Peter' and then execute the window function over just this particular person instead of the entire table, which is much more efficient.
But when the query is:
Select top 1 *, row_number() over (Partition by Name order by Date)
from NPtable
Where Date > '2018-01-02 00:00:00'
Doesn't the window function need to be executed against the entire table first, with the Date > condition applied afterwards? Otherwise the result is wrong.
Window functions are executed/calculated at the same stage as SELECT, stage 5 in your table. In other words, window functions are applied to all rows that are "visible" in the SELECT stage.
In your second example
Select top 1 *,
row_number() over (Partition by Name order by Date)
from NPtable
Where Date > '2018-01-02 00:00:00'
WHERE is logically applied before Partition by Name of the row_number() function.
Note that this is the logical order of processing the query, not necessarily how the engine physically processes the data.
If the query optimiser decides that it is cheaper to scan the whole table and later discard dates according to the WHERE filter, it can do so. But any such transformation must be performed in such a way that the final result is consistent with the order of the logical steps outlined in the table you showed.
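The logical order can be observed directly. The sketch below, which assumes SQLite 3.25 or later via Python and invented rows for NPtable, compares the query as written (WHERE first, then row_number()) with a variant that numbers all rows in a subquery and filters afterwards. TOP 1 is omitted because it is SQL Server syntax that SQLite does not accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NPtable (Name TEXT, Date TEXT)")
# Invented sample data; timestamps are stored as sortable text.
conn.executemany(
    "INSERT INTO NPtable VALUES (?, ?)",
    [("Peter", "2018-01-01 00:00:00"), ("Peter", "2018-01-03 00:00:00"),
     ("Peter", "2018-01-05 00:00:00"), ("Mary", "2018-01-04 00:00:00")],
)

# As in the question: rows are filtered, then numbered.
filtered_first = conn.execute("""
    SELECT Name, Date, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date) AS rn
    FROM NPtable
    WHERE Date > '2018-01-02 00:00:00'
    ORDER BY Name, Date
""").fetchall()

# Numbering the whole table first requires an explicit subquery.
numbered_first = conn.execute("""
    SELECT Name, Date, rn
    FROM (
        SELECT Name, Date, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date) AS rn
        FROM NPtable
    ) AS numbered
    WHERE Date > '2018-01-02 00:00:00'
    ORDER BY Name, Date
""").fetchall()

print([r[2] for r in filtered_first])  # [1, 1, 2]
print([r[2] for r in numbered_first])  # [1, 2, 3]
```

Peter's first surviving row is numbered 1 in the first query and 2 in the second, which is why the subquery form is needed whenever the numbering must see rows that the filter removes.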
It is part of the SELECT phase of the query execution. There are different types of SELECT clauses, based on the query.
SELECT FOR
SELECT GROUP BY
SELECT ORDER BY
SELECT OVER
SELECT INTO
SELECT HAVING
PARTITION BY comes in the SELECT OVER clause. Here, a window of the result set is generated out of the result set generated in the previous stages: FROM, WHERE, GROUP BY etc.
The OVER clause defines a window or user-specified set of rows within
a query result set. A window function then computes a value for each
row in the window. You can use the OVER clause with functions to
compute aggregated values such as moving averages, cumulative
aggregates, running totals, or a top N per group results.
OVER ( [ PARTITION BY value_expression ] [ order_by_clause ] )
Arguments
PARTITION BY Divides the query result set into partitions. The window
function is applied to each partition separately and computation
restarts for each partition.
value_expression Specifies the column by which the rowset is
partitioned. value_expression can only refer to columns made available
by the FROM clause. value_expression cannot refer to expressions or
aliases in the select list. value_expression can be a column
expression, scalar subquery, scalar function, or user-defined
variable.
order_by_clause Defines the logical order of the rows within each
partition of the result set. That is, it specifies the logical order
in which the window function calculation is performed.
order_by_expression Specifies a column or expression on which to sort.
order_by_expression can only refer to columns made available by the
FROM clause. An integer cannot be specified to represent a column name
or alias.
You can read more about it in the SELECT - OVER clause documentation.
row_number() (and other window functions) are allowed in two clauses:
SELECT
ORDER BY
The function is parsed along with the rest of the clause. After all, it is a function present in the clause. In both cases, the WHERE clause would be -- logically -- applied first, so the results would be after filtering.
Do note that this is a logical parsing of the query. The actual execution may have little to do with the structure of the query.

SQL - How to return min and max values for each quintile

In my query I am selecting the current balances of loans. I also created a column that returns which quintile each loan balance falls into.
I used this statement
NTILE(5) OVER (ORDER BY CurrLoanBal)
From here, how do I return the min and max values for each quintile? I don't want to group any rows, I want each row to show the min and max for that particular quintile.
You need to use aggregate functions and partitioning:
select s.col1, s.col2, ... s.colN,
max(s.col1) over (partition by s.col2, ... ),
min(s.col1) over (partition by s.col2, ... )
from stuff as s;
where your partition by clause decides how the data is partitioned ("grouped") without grouping it the same way a group by would. This way the window functions are applied to each partition to give you the data you want.
The columns in the partition by bit determine how the rows are partitioned (what separates one quintile from another in your case).
See https://msdn.microsoft.com/en-us/library/ms189461.aspx for more information.
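For the quintile case specifically, one possible shape is to compute NTILE(5) in a subquery and then partition by its result, since a window function generally cannot be used directly inside another window's PARTITION BY. The sketch below assumes SQLite 3.25 or later via Python; the table name `loans`, the alias `quintile`, and the ten sample balances are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (CurrLoanBal INTEGER)")
# Invented balances: 100, 200, ..., 1000 (two loans per quintile).
conn.executemany("INSERT INTO loans VALUES (?)", [(b,) for b in range(100, 1100, 100)])

rows = conn.execute("""
    SELECT CurrLoanBal, quintile,
           MIN(CurrLoanBal) OVER (PARTITION BY quintile) AS quintile_min,
           MAX(CurrLoanBal) OVER (PARTITION BY quintile) AS quintile_max
    FROM (
        SELECT CurrLoanBal, NTILE(5) OVER (ORDER BY CurrLoanBal) AS quintile
        FROM loans
    ) AS q
    ORDER BY CurrLoanBal
""").fetchall()

print(rows[0])   # (100, 1, 100, 200)
print(rows[2])   # (300, 2, 300, 400)
print(rows[-1])  # (1000, 5, 900, 1000)
```

Every loan keeps its own row, and the two extra columns repeat the bounds of the quintile it falls into.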

SQL Row_Count function with Partition

I have a query that returns a set of results as a table called DATA, from several UNION ALL joined queries.
I am then doing ROW_NUMBER() on this, to get the row number for a specific grouping (WorksOrderNo)
ROW_NUMBER() Over(partition by Data.WorksOrderNo order by Data.WorksOrderNo) as RowNo,
Is there an equivalent ROW_Count function where I can specify a partition, and return the count of rows for that partition?
ROW_Count() Over(partition by Data.WorksOrderNo order by Data.WorksOrderNo) as RowNo ???
The reason is that this query is being used to drive a report layout.
As part of this, I need to format based on whether the total row count for each WorksOrderNo is >1 or not.
So for instance if there were three rows for a works order, the row_number function currently returns 1, 2 and 3, where the row count would return 3 on each row.
The function is simply COUNT(). In SQL Server, all the aggregation functions can be used as window functions, as long as they do not use DISTINCT.
Note that for the total count, you do not want the ORDER BY:
COUNT(*) Over (partition by Data.WorksOrderNo) as cnt
If you include the ORDER BY, then the COUNT() is cumulative in that ordering, rather than constant for all rows in the partition. (Rows that tie on the ORDER BY value are peers and share the same count.)
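The difference shows up clearly side by side. This sketch assumes SQLite 3.25 or later via Python, and adds a hypothetical `LineNo` column so that there is something other than the partition key to order by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (WorksOrderNo TEXT, LineNo INTEGER)")
# Invented sample data: three rows for W1, one row for W2.
conn.executemany(
    "INSERT INTO Data VALUES (?, ?)",
    [("W1", 1), ("W1", 2), ("W1", 3), ("W2", 1)],
)

rows = conn.execute("""
    SELECT WorksOrderNo, LineNo,
           ROW_NUMBER() OVER (PARTITION BY WorksOrderNo ORDER BY LineNo) AS RowNo,
           COUNT(*) OVER (PARTITION BY WorksOrderNo) AS cnt,
           COUNT(*) OVER (PARTITION BY WorksOrderNo ORDER BY LineNo) AS running_cnt
    FROM Data
    ORDER BY WorksOrderNo, LineNo
""").fetchall()

for row in rows:
    print(row)
# ('W1', 1, 1, 3, 1)
# ('W1', 2, 2, 3, 2)
# ('W1', 3, 3, 3, 3)
# ('W2', 1, 1, 1, 1)
```

The `cnt` column is the per-order total on every row, so a report can test `cnt > 1` directly.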
It looks like you just need group by and count:
select WorksOrderNo, count(*) as Row_Count
from Data
group by WorksOrderNo