LAST_VALUE() rows between unbounded preceding and unbounded following - sql

What is the use of this statement?
Please elaborate with an example. I came across it while using the LAST_VALUE function.

from https://forums.oracle.com/forums/thread.jspa?threadID=1018352
When you ORDER a set of records in an analytic function, you can specify a range of rows to consider, ignoring the others.
You can do this using the ROWS clause:
UNBOUNDED PRECEDING
The range starts at the first row of the partition.
UNBOUNDED FOLLOWING
The range ends at the last row of the partition.
CURRENT ROW
The range begins or ends at the current row.
n PRECEDING or n FOLLOWING
The range starts or ends n rows before or after the current row.
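Since the question asks for an example, here is a minimal sketch of the frame boundaries listed above. It uses Python's built-in sqlite3 module rather than Oracle or PostgreSQL, and assumes the bundled SQLite is 3.25 or newer (the first release with window functions). The table t and its columns are made up for illustration. group_concat is used only to make the contents of each frame visible.

```python
import sqlite3

# Hypothetical demo table: four rows, ordered by id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "d")])

rows = conn.execute("""
    SELECT id,
           -- first row of the partition up to the current row
           group_concat(val) OVER (ORDER BY id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS up_to_here,
           -- current row down to the last row of the partition
           group_concat(val) OVER (ORDER BY id
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS from_here,
           -- one row either side of the current row
           group_concat(val) OVER (ORDER BY id
               ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbours,
           -- no frame given: the frame stops at the current row
           last_value(val) OVER (ORDER BY id) AS last_default,
           -- whole partition: the true last value
           last_value(val) OVER (ORDER BY id
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full
    FROM t
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

In this sketch last_default simply echoes the current row's value, while last_full is 'd' on every row, which is why LAST_VALUE is usually paired with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.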

This is explained quite well in the manual:
http://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

Related

get Last value of a column in sql server

I want to get the last value of a column (it is not an identity column) and add the corresponding generated row number to it.
Select isnull(LAST_VALUE(ColumnA) over(order by ColumnA), 0) +
ROW_NUMBER() OVER (ORDER BY ColumnA)
from myTable
I am calling my stored procedure recursively, which is why I thought of this logic.
But it is not working.
Basically, I want 1-9 on the first run, 10-19 on the second run (if it is called recursively a second time), and so on.
Total stab in the dark, but I suspect "not working" means "returning the current row's value." Don't forget that an OVER clause defaults to the window RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when one isn't explicitly specified and there is an ORDER BY (see SELECT - OVER Clause (Transact-SQL) - ORDER BY).
ORDER BY
ORDER BY *order_by_expression* [COLLATE *collation_name*] [ASC|DESC]
Defines the logical order of the rows within each partition of the result set. That is, it specifies the logical order in which the window function calculation is performed.
If it is not specified, the default order is ASC and window function will use all rows in partition.
If it is specified, and a ROWS/RANGE is not specified, then default RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window frame by the functions that can accept optional ROWS/RANGE specification (for example min or max).
As you haven't defined the window, that's what your LAST_VALUE function is using. Define that you want the whole lot for the partition:
SELECT ISNULL(LAST_VALUE(ColumnA) OVER (ORDER BY ColumnA ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), 0) +
ROW_NUMBER() OVER (ORDER BY ColumnA)
FROM dbo.myTable;
Though what Gordon says in their comment is the real solution:
You should be using an identity column or sequence.
This type of solution can (and will) end up suffering race conditions, as well as end up reusing "identities" when it shouldn't be.
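To see the effect of the default frame on the query from the question, here is a rough sketch using Python's sqlite3 module instead of SQL Server. It assumes SQLite 3.25+ and three made-up ColumnA values. COALESCE stands in for T-SQL's ISNULL, which SQLite does not have with that meaning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ColumnA INTEGER)")
# Illustrative values only; the question does not show its data.
conn.executemany("INSERT INTO myTable VALUES (?)", [(10,), (20,), (30,)])

# No frame: LAST_VALUE sees only the rows up to the current one.
default_frame = [r[0] for r in conn.execute("""
    SELECT COALESCE(LAST_VALUE(ColumnA) OVER (ORDER BY ColumnA), 0)
           + ROW_NUMBER() OVER (ORDER BY ColumnA)
    FROM myTable
    ORDER BY ColumnA
""")]

# Explicit frame: LAST_VALUE sees the whole partition.
full_frame = [r[0] for r in conn.execute("""
    SELECT COALESCE(LAST_VALUE(ColumnA) OVER (
               ORDER BY ColumnA
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), 0)
           + ROW_NUMBER() OVER (ORDER BY ColumnA)
    FROM myTable
    ORDER BY ColumnA
""")]

print(default_frame)  # current row's value + row number
print(full_frame)     # overall last value + row number
```

With the default frame each row adds its own value to its row number (11, 22, 33 here); with the explicit frame every row starts from the overall last value (31, 32, 33).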

where days between 28 PRECEDING and 1 PRECEDING -> what data range is taking

I am writing code to check the last 28 days of a table.
I came across this code:
WINDOW w AS (PARTITION BY abs_wd.abs_wd_employee_id ORDER BY abs_wd.date_diff RANGE BETWEEN 28 PRECEDING AND 1 PRECEDING)
My question is: does this window run from yesterday back to 28 days before today? Put another way, if I use only "PRECEDING", does it already count from yesterday, or do I have to add the "1" for that?
Thanks! :)
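In a Hive window clause, PRECEDING refers to what comes before the current row and FOLLOWING to what comes after it. With ROWS the offset counts physical rows; with RANGE, as in your query, it counts units of the ORDER BY value (here date_diff).
For example, ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING means the 28 rows just before the current row, while RANGE BETWEEN 28 PRECEDING AND 1 PRECEDING means the rows whose date_diff is between 28 and 1 less than the current row's value.
To answer your question: a frame that ends at CURRENT ROW includes the current row, so if you want it to stop at the previous one you need a frame that ends at 1 PRECEDING (as you have correctly done in the question). The ROWS form below counts rows rather than days, so it matches the RANGE form only when there is exactly one row per day: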
WINDOW w AS (PARTITION BY abs_wd.abs_wd_employee_id ORDER BY abs_wd.date_diff ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING)
Example:
Based on the sample Cloudera customers dataset, I ran a sum aggregate on the orders table with a window of 2 PRECEDING and 1 PRECEDING. If you look at row 4, the window function returns the sum of total orders over the two preceding rows, that is, yesterday's date and the day before yesterday's date.
Query used in the example:
sum(total_orders) over(order by order_date rows between 2 preceding and 1 preceding) as window_result
More on window function in hive doc.
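The difference between counting rows and counting days only shows up when some days are missing. Here is a small sketch of that, run through Python's sqlite3 module rather than Hive. It assumes SQLite 3.28+ (needed for RANGE with a numeric offset) and a made-up orders table where day is a plain integer day number and days 3 and 4 have no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (day INTEGER, total_orders INTEGER)")
# Hypothetical data with a gap: no rows for day 3 and day 4.
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10), (2, 20), (5, 30), (6, 40)])

rows = conn.execute("""
    SELECT day,
           -- the two physical rows before the current one
           SUM(total_orders) OVER (ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS by_rows,
           -- rows whose day value lies in [day - 2, day - 1]
           SUM(total_orders) OVER (ORDER BY day
               RANGE BETWEEN 2 PRECEDING AND 1 PRECEDING) AS by_range
    FROM orders
    ORDER BY day
""").fetchall()

for row in rows:
    print(row)
```

For day 5 the ROWS frame reaches back across the gap and sums days 1 and 2, while the RANGE frame finds nothing for days 3 and 4 and returns NULL (None in Python). So for "the last 28 days" on data that may skip days, RANGE on a day number looks like the safer fit.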

Redshift SQL - Running Sum using Unbounded Preceding and Following

When we use a window function to calculate a running sum, like SUM(sales) over (partition by dept order by date), and we don't specify the range/window, is the default frame between unbounded preceding and current row, basically from the first row up to the current row?
According to this doc it seems to be the case, but I wanted to double check.
Thanks!
The problem you are running into is 'what does the database engine assume in ambiguous circumstances?' I've run into this exact case before when porting from SQL Server to Redshift. SQL Server assumes that if you order but don't specify a frame, you want unbounded preceding to current row. Other DBs do not make the same assumption: if you don't specify a frame, it will be unbounded preceding to unbounded following, and yet others will throw an error if you specify an "order by" but don't specify a frame. Bottom line: don't let the DB engine guess what you want, be specific.
Gordon is correct that this is based on rows, not ranges. If you want a running sum by date (not row), you can group by date and run the window function - windows execute after group by in a single query.
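A rough sketch of both points, using Python's sqlite3 module rather than Redshift (SQLite 3.25+ assumed; the sales_log table and its columns are invented for illustration). The first query leaves the frame out, so the result depends on what the engine assumes. The second groups by date first and then spells the frame out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_log (dept TEXT, sale_date TEXT, amount INTEGER)")
# Hypothetical data: two sales on the same date, one on the next.
conn.executemany("INSERT INTO sales_log VALUES (?, ?, ?)", [
    ("A", "2024-01-01", 10),
    ("A", "2024-01-01", 5),
    ("A", "2024-01-02", 20),
])

# Frame omitted: SQLite falls back to RANGE UNBOUNDED PRECEDING .. CURRENT ROW,
# so rows sharing a date are peers and get the same running total.
implicit = conn.execute("""
    SELECT sale_date, amount,
           SUM(amount) OVER (PARTITION BY dept ORDER BY sale_date) AS running
    FROM sales_log
    ORDER BY sale_date, amount
""").fetchall()

# Group by date first, then run the window with an explicit frame.
per_day = conn.execute("""
    WITH daily AS (
        SELECT dept, sale_date, SUM(amount) AS day_total
        FROM sales_log
        GROUP BY dept, sale_date
    )
    SELECT sale_date, day_total,
           SUM(day_total) OVER (PARTITION BY dept ORDER BY sale_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running
    FROM daily
    ORDER BY sale_date
""").fetchall()

print(implicit)
print(per_day)
```

The explicit version reads the same on any engine that accepts the syntax, which is the point of being specific.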

How to understand the results of rows between 2 preceding and current row?

My SQL query is:
SELECT time, buy,
avg(buy) OVER (ORDER BY time rows between 1 preceding and current row) as average_2,
avg(buy) OVER (ORDER BY time rows between 2 preceding and current row) as average_3
FROM my_table;
I'm trying to understand these window functions. I used some test data and got results:
TIME BUY AVERAGE_2 AVERAGE_3
------------------- ---------- ---------- ----------
2019-05-05 10:05:19 1 1 1
2019-05-05 10:05:22 2 1.5 1.5
2019-05-05 10:05:25 3 2.5 2
2019-05-05 10:05:27 4 3.5 3
I need to know: how do I get these results? Especially average_3?
What is the difference between ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW and rows between 2 preceding and current row? I read many explanations from the internet, now I'm confused because they have explained with different syntax.
For the first row (earliest time), there are no preceding rows, so both between 1 preceding and current row and between 2 preceding and current row only actually find the current row. Both averages are therefore the average of a single value, 1, which is of course 1.
For the second row, there is only one preceding row, so both between 1 preceding and current row and between 2 preceding and current row only actually find the current row (2) and that single preceding row (1). Both averages are therefore the average of the same two values, 2 and 1, which is 1.5 (i.e. (2+1)/2).
For the third row, there are now two preceding rows. This time:
between 1 preceding and current row finds the current row (3) and the immediately preceding row (2), and that average is calculated as (3+2)/2 which is 2.5. Any earlier preceding rows are ignored, so 1 isn't included in the calculation.
between 2 preceding and current row finds the current row (3) and both preceding rows (2 and 1), and that average is calculated as (3+2+1)/3 which is 2.
For the fourth row, there are now three preceding rows. This time:
between 1 preceding and current row finds the current row (4) and the immediately preceding row (3), and that average is calculated as (4+3)/2 which is 3.5. Any earlier preceding rows are ignored, so neither 2 nor 1 are included in the calculation.
between 2 preceding and current row finds the current row (4) and both preceding rows (3 and 2), and that average is calculated as (4+3+2)/3 which is 3. Any earlier preceding rows are ignored, so 1 isn't included in the calculation.
If you were also calculating between unbounded preceding and current row, which is the default if you don't specify a frame at all, then all preceding rows are included. That makes no difference for the first two rows, but for the third and fourth the 'any earlier preceding rows are ignored' part would not be true. The average would therefore still be 1 for row 1 and 1.5 for row 2, but would be 2 for row 3 ((3+2+1)/3) and 2.5 for row 4 ((4+3+2+1)/4).
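The walkthrough above can be checked with a short script. This is a sketch using Python's sqlite3 module (SQLite 3.25+ assumed) rather than the asker's database. It uses the four test rows from the question plus an extra average_all column for the unbounded case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (time TEXT, buy INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [
    ("2019-05-05 10:05:19", 1),
    ("2019-05-05 10:05:22", 2),
    ("2019-05-05 10:05:25", 3),
    ("2019-05-05 10:05:27", 4),
])

rows = conn.execute("""
    SELECT buy,
           avg(buy) OVER (ORDER BY time
               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS average_2,
           avg(buy) OVER (ORDER BY time
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS average_3,
           -- average_all is an extra column, not in the original query
           avg(buy) OVER (ORDER BY time
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS average_all
    FROM my_table
    ORDER BY time
""").fetchall()

for row in rows:
    print(row)
```

The average_2 and average_3 columns match the table in the question, and average_all comes out as 1, 1.5, 2 and 2.5.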
Read more.
For your question "What is the difference between ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW and rows between 2 preceding and current row?":
In average_3 you get the average of the two previous rows and the current row; average_2 does the same with only one previous row. With UNBOUNDED PRECEDING, the frame instead reaches back to the first row. It is easier to see with a good example.
This post by Steve Stedman is really good, and it gives you a good example of that.

RANGE BETWEEN function with Timestamp values in SQL Impala

I am trying to calculate a moving number of events (impressions) that happen per minute. How can I use the range between function with timestamp values to define the 1-minute interval?
I have something like this:
count(impression) over (partition by user
ORDER BY trunc(cast(entrytime as TIMESTAMP), "MI")
RANGE BETWEEN interval 1 minutes Preceding
and interval 1 minutes Following) as densityperminute
but this doesn't seem to work. Any ideas on how to fix this?
I believe that's not supported, unfortunately. From the documentation for 6.1:
Currently, Impala supports only some combinations of arguments to the RANGE clause:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (the default when ORDER BY is specified and the window clause is omitted)
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
Source
(Forgive me answering an old question, but I'm currently looking into this for a school project and this came up in my search!)