Oracle LAST_VALUE only with order by in analytic clause - sql

I have schema (Oracle 11g R2):
CREATE TABLE users (
id INT NOT NULL,
name VARCHAR(30) NOT NULL,
num int NOT NULL
);
INSERT INTO users (id, name, num) VALUES (1,'alan',5);
INSERT INTO users (id, name, num) VALUES (2,'alan',4);
INSERT INTO users (id, name, num) VALUES (3,'julia',10);
INSERT INTO users (id, name, num) VALUES (4,'maros',77);
INSERT INTO users (id, name, num) VALUES (5,'alan',1);
INSERT INTO users (id, name, num) VALUES (6,'maros',14);
INSERT INTO users (id, name, num) VALUES (7,'fero',1);
INSERT INTO users (id, name, num) VALUES (8,'matej',8);
INSERT INTO users (id, name, num) VALUES (9,'maros',55);
And I execute the following queries, using the LAST_VALUE analytic function with only an ORDER BY analytic clause:
My assumption is that this query executes over one partition - the whole table (as the PARTITION BY clause is missing). It will sort the rows by name within that partition (the whole table) and use the default windowing clause RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
select us.*,
last_value(num) over (order by name) as lv
from users us;
But the query above gives exactly the same results as the following one. My assumption concerning the second query is that it first partitions the table rows by name, then sorts the rows in each partition by num, and then applies the windowing clause RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING over each partition to get the LAST_VALUE.
select us.*,
last_value(num) over (partition by name order by num RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lv
from users us;
One of my assumptions is clearly wrong, because the two queries above give the same result. It looks like the first query also orders records by num behind the curtain. Could you please suggest what is wrong with my assumptions and why these queries return the same results?

The answer is simple. For whatever reason, Oracle chose to make LAST_VALUE deterministic when a logical (RANGE) offset is used in the windowing clause (explicitly or implicitly - by default). Specifically, in such cases, the HIGHEST value of the measured expression is selected from among a set of rows tied by the order by sorting.
https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/LAST_VALUE.html#GUID-A646AF95-C8E9-4A67-87BA-87B11AEE7B79
Towards the bottom of that page in the Oracle documentation, we can read:
When duplicates are found for the ORDER BY expression, the LAST_VALUE
is the highest value of expr [...]
Why does the documentation say that in the examples section, and not in the explanation of the function? Because, as is very often the case, the documentation doesn't seem to be written by qualified people.

From this article in Oracle Magazine, here is what happens if you use an ORDER BY clause in a window function without specifying anything else:
An ORDER BY clause, in the absence of any further windowing clause parameters, effectively adds a default windowing clause: RANGE UNBOUNDED PRECEDING, which means, “The current and previous rows in the current partition are the rows that should be used in the computation.” When an ORDER BY clause isn’t accompanied by a PARTITION clause, the entire set of rows used by the analytic function is the default current partition.
So, your first query is actually the same as this:
SELECT us.*, LAST_VALUE(num) OVER (ORDER BY name RANGE UNBOUNDED PRECEDING) AS lv
FROM users us;
If you run the above query, you will get the behavior you are currently seeing, which returns a separate last value for each name. This differs from the following query:
SELECT
us.*,
LAST_VALUE(num) OVER (ORDER BY name
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv
FROM users us;
This just generates the value 8 as the last value of num on every row; 8 is the num for matej, whose name sorts last when ordering by name ascending.
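To make this reproducible without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 as a stand-in (window functions need SQLite 3.25+; the table and data are the question's, with VARCHAR mapped to TEXT). The full RANGE frame makes every row see the same last row, so the result is deterministic here because matej's name is unique:

```python
import sqlite3

# Rebuild the question's schema and data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INT NOT NULL, name TEXT NOT NULL, num INT NOT NULL);
INSERT INTO users (id, name, num) VALUES
 (1,'alan',5),(2,'alan',4),(3,'julia',10),(4,'maros',77),(5,'alan',1),
 (6,'maros',14),(7,'fero',1),(8,'matej',8),(9,'maros',55);
""")

rows = conn.execute("""
SELECT us.*,
       LAST_VALUE(num) OVER (ORDER BY name
         RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv
FROM users us
""").fetchall()

# The frame is the whole result set, so every row sees the same last row:
# matej sorts last by name, and his num is 8.
print(sorted({r[3] for r in rows}))  # [8]
```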

Here is a db<>fiddle, in case anyone wants to play with them.
Let me assume that you think that the second query is returning the correct results.
select us.*,
last_value(num) over (partition by name
order by num
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as lv
from users us;
Let me also point out that this is more succinctly written as:
select us.*,
max(num) over (partition by name) as lv
from users us;
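Because each name partition is ordered by num, the last value over the full frame is simply the partition maximum. A quick cross-check of that equivalence, sketched in SQLite (a stand-in for Oracle; window functions need SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INT, name TEXT, num INT);
INSERT INTO users VALUES
 (1,'alan',5),(2,'alan',4),(3,'julia',10),(4,'maros',77),(5,'alan',1),
 (6,'maros',14),(7,'fero',1),(8,'matej',8),(9,'maros',55);
""")

rows = conn.execute("""
SELECT name, num,
       LAST_VALUE(num) OVER (PARTITION BY name ORDER BY num
         RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv,
       MAX(num) OVER (PARTITION BY name) AS mx
FROM users
""").fetchall()

# lv and mx agree on every row: alan -> 5, fero -> 1, julia -> 10,
# maros -> 77, matej -> 8.
assert all(lv == mx for _, _, lv, mx in rows)
```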
That is irrelevant to your question, but I want to point it out.
Now, why does this give the same results?
select us.*,
last_value(num) over (order by name) as lv
from users us;
Well, with no windowing clause, this is equivalent to:
select us.*,
last_value(num) over (order by name
range between unbounded preceding and current row
) as lv
from users us;
The range is very important here. The frame does not stop at the current row alone: it extends to all rows with the same value of name (the current row's peers).
In my understanding of the documentation around ORDER BY, any num value from the rows sharing the same name could be chosen. Why? Sorting in SQL (and in Oracle) is not stable, meaning it is not guaranteed to preserve the original ordering of the rows.
In this particular case, it might be coincidence that the last value happens to be the largest value. Or Oracle might, for some reason, be adding num to the ordering.
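One way to see that the default RANGE frame really extends past the current row to its peers, independently of any tie-breaking rule, is to count the rows in each frame. A sketch in SQLite (stand-in for Oracle; same sample data as the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INT, name TEXT, num INT);
INSERT INTO users VALUES
 (1,'alan',5),(2,'alan',4),(3,'julia',10),(4,'maros',77),(5,'alan',1),
 (6,'maros',14),(7,'fero',1),(8,'matej',8),(9,'maros',55);
""")

rows = conn.execute("""
SELECT name,
       COUNT(*) OVER (ORDER BY name) AS range_cnt,  -- default RANGE frame
       COUNT(*) OVER (ORDER BY name
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_cnt
FROM users
ORDER BY name
""").fetchall()

# With the default RANGE frame, each 'alan' row already sees all three
# 'alan' peers (count 3), each 'maros' row sees 8 rows, and so on;
# a ROWS frame instead stops exactly at the current row (counts 1..9).
print([r[1] for r in rows])  # [3, 3, 3, 4, 5, 8, 8, 8, 9]
```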

Related

Partition by rearranges table on each query run

The query below rearranges my table on each run (2021-01-01 is not followed by 2021-01-02 but by some other random date) and messes up the average calculation. If I remove the PARTITION BY, the table is ordered by EventTime (date) correctly... but I have 6 kinds of Symbols I would like the average of. What am I doing wrong?
select ClosePrice, Symbol, EventTime, AVG(ClosePrice) over(
partition by Symbol
order by EventTime
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) [SMA]
from ClosePrices
The query is missing an ORDER BY clause for the final results. The ORDER BY inside the OVER() expression only applies to that window.
The SQL language is based on relational set theory concepts, which explicitly deny any built-in order for tables. That is, there is no guaranteed order for your tables or queries unless you deliberately set one via an ORDER BY clause.
In the past it may have seemed like you always get rows back in a certain order, but if so it's because you've been lucky. There are lots of things that can cause a database to return results in a different order, sometimes even for different runs of the same query.
If you care about the order of the results, you MUST use ORDER BY:
select ClosePrice, Symbol, EventTime, AVG(ClosePrice) over(
partition by Symbol
order by EventTime
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) [SMA]
from ClosePrices
ORDER BY EventTime
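A minimal sketch of the fix, using Python's sqlite3 with made-up sample data (the ClosePrices contents and the two symbols are assumptions, and SQLite uses a standard AS alias instead of SQL Server's [SMA] brackets). The outer ORDER BY is what fixes the display order; the window ORDER BY only shapes the moving average:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ClosePrices (ClosePrice REAL, Symbol TEXT, EventTime TEXT);
INSERT INTO ClosePrices VALUES
 (10,'A','2021-01-01'),(20,'A','2021-01-02'),(30,'A','2021-01-03'),
 (100,'B','2021-01-01'),(200,'B','2021-01-02'),(300,'B','2021-01-03');
""")

rows = conn.execute("""
SELECT ClosePrice, Symbol, EventTime,
       AVG(ClosePrice) OVER (PARTITION BY Symbol ORDER BY EventTime
         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS SMA
FROM ClosePrices
ORDER BY EventTime, Symbol   -- the outer ORDER BY controls result order
""").fetchall()

# On 2021-01-03, symbol A's 3-row moving average is avg(10,20,30) = 20.0
# and symbol B's is avg(100,200,300) = 200.0.
print(rows[4][3], rows[5][3])  # 20.0 200.0
```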

Do we always need to use UNBOUNDED FOLLOWING while using last_value() function?

In SQL, do we always need to use UNBOUNDED FOLLOWING when using the last_value function among the analytic functions?
I saw this example where if we just used:
SELECT employee_id, salary, last_value(salary) over(order by salary ASC)
FROM employees;
This did not give the expected output, which was supposed to be the employee with the highest salary.
Instead they had to use windowing:
SELECT employee_id, salary,
last_value(salary) over (
order by salary ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM employees;
So I was just wondering: when using the last_value function, do we always need UNBOUNDED FOLLOWING?
The answer is basically "yes", because the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, and the last value of that frame is normally the current row itself.
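That "the last value is normally the current row" can be checked directly. A sketch in SQLite with a tiny stand-in for an employees table (the data is made up; with distinct salaries the result is deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INT, salary INT);
INSERT INTO employees VALUES (1, 3000), (2, 5000), (3, 4000), (4, 8000);
""")

rows = conn.execute("""
SELECT employee_id, salary,
       LAST_VALUE(salary) OVER (ORDER BY salary) AS lv
FROM employees
""").fetchall()

# With the default frame (RANGE ... AND CURRENT ROW) and distinct salaries,
# the last row of each frame is the current row, so lv equals salary.
assert all(salary == lv for _, salary, lv in rows)
```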
One exception is when you are using ignore nulls. Then last_value(ignore nulls) is very useful for getting the most recent row -- including the current row -- with a non-NULL value.
By the way, this is the reason that I usually use first_value() with a reversed order by:
first_value(salary) over (order by salary desc)
Or for this example, max() is simpler:
max(salary) over ()
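Both alternatives can be verified side by side in SQLite (same made-up employees data as above; FIRST_VALUE with a descending order needs no frame tweak because the highest salary is the first row of every default frame):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INT, salary INT);
INSERT INTO employees VALUES (1, 3000), (2, 5000), (3, 4000), (4, 8000);
""")

rows = conn.execute("""
SELECT employee_id, salary,
       FIRST_VALUE(salary) OVER (ORDER BY salary DESC) AS fv,
       MAX(salary) OVER () AS mx
FROM employees
""").fetchall()

# Both expressions return the top salary (8000) on every row.
assert all(fv == 8000 and mx == 8000 for _, _, fv, mx in rows)
```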

How to get the preceding values in Redshift based on Where condition?

I have three columns: student_name, column_1, column_2. I want to print the preceding value wherever 0's exist in column_2.
I want the output like the one below. I used the lag function, but I am probably using it the wrong way.
From what I can tell, you want to count the number of 0 values up to and including each row. If this interpretation is correct, you would use a conditional cumulative sum:
select t.*,
sum( (column1 = 0)::int ) over (partition by student
order by <ordering column>
rows between unbounded preceding and current row
)
from t;
Note: This assumes that you have an ordering column which you have not included in the question.
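A runnable sketch of that conditional cumulative sum, using SQLite instead of Redshift (the seq ordering column and the sample data are assumptions; Redshift needs the (column1 = 0)::int cast, while in SQLite the comparison already yields 0/1 so SUM can take it directly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (student TEXT, seq INT, column1 INT);
INSERT INTO t VALUES
 ('amy', 1, 0), ('amy', 2, 5), ('amy', 3, 0), ('amy', 4, 3), ('amy', 5, 0);
""")

rows = conn.execute("""
SELECT t.*,
       SUM(column1 = 0) OVER (PARTITION BY student
         ORDER BY seq
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS zeros_so_far
FROM t
ORDER BY seq
""").fetchall()

# Running count of zeros up to and including each row.
print([r[3] for r in rows])  # [1, 1, 2, 2, 3]
```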

Count()over() have repeated records

I often use sum() over() to calculate cumulative values, but today I tried count() over(), and the result was not what I expected. Can someone explain why the result has repeated records on the same day?
I know the regular way is count(distinct id) with group by date, and then sum() over(order by date); I'm just curious about the result of count(id) over (order by date).
select pre.date, count(person_id) over (order by pre.date)
from (select distinct person_id, date from events) pre
The result will be repeated records for the same day.
Because your outer query has not filtered or aggregated the results from the inner query, it returns the same number of rows.
You want aggregation:
select pre.date, count(*) as cnt_on_date,
sum(count(*)) over (order by pre.date) as running_count
from (select distinct person_id, date from events) pre
group by pre.date;
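A runnable sketch of that aggregation in SQLite (the events data is made up; the nested SUM(COUNT(*)) OVER works because window functions are evaluated after GROUP BY):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (person_id INT, date TEXT);
INSERT INTO events VALUES
 (1,'2023-01-01'),(1,'2023-01-01'),(2,'2023-01-01'),
 (2,'2023-01-02'),(3,'2023-01-02');
""")

rows = conn.execute("""
SELECT pre.date, COUNT(*) AS cnt_on_date,
       SUM(COUNT(*)) OVER (ORDER BY pre.date) AS running_count
FROM (SELECT DISTINCT person_id, date FROM events) pre
GROUP BY pre.date
ORDER BY pre.date
""").fetchall()

# One row per day: 2 distinct persons on each day, running total 2 then 4.
print(rows)  # [('2023-01-01', 2, 2), ('2023-01-02', 2, 4)]
```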
Almost all analytic functions (row_number() is the exception that comes to mind) do not differentiate between ties on the ORDER BY columns. Some documentation states this directly:
Oracle
If you specify a logical window with the RANGE keyword, then the function returns the same result for each of the rows
Postgresql
By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause.
MySQL
With 'ORDER BY': The default frame includes rows from the partition start through the current row, including all peers of the current row (rows equal to the current row according to the ORDER BY clause).
But in general, adding ORDER BY to the analytic clause implicitly sets the window specification to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Since the window calculation is made for each row over its defined window, with the default RANGE frame all rows with the same value of the ORDER BY columns fall into the same window and therefore produce the same result. So to get a true running total, you should use ROWS BETWEEN or add a more detailed column to the ORDER BY part of the analytic clause. Functions that do not support a windowing clause are an exception to this rule, but this is sometimes not documented directly, so I will not try to list them here. Functions that can also be used as aggregates are generally not an exception and produce the same value.
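The difference between the default RANGE frame and an ORDER BY with a tiebreaker can be seen side by side. A sketch in SQLite with made-up payment data containing a tied date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (id INT, pay_date TEXT, amount INT);
INSERT INTO payments VALUES
 (1,'2023-01-01',10),(2,'2023-01-01',20),(3,'2023-01-02',30);
""")

rows = conn.execute("""
SELECT id, pay_date, amount,
       SUM(amount) OVER (ORDER BY pay_date)      AS range_total,   -- default RANGE frame
       SUM(amount) OVER (ORDER BY pay_date, id)  AS tiebreak_total -- extra detail column
FROM payments
ORDER BY pay_date, id
""").fetchall()

# range_total: the two 2023-01-01 rows are peers, so both see 10+20 = 30.
print([r[3] for r in rows])  # [30, 30, 60]
# tiebreak_total: adding id breaks the tie, giving a true running total.
print([r[4] for r in rows])  # [10, 30, 60]
```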

Cumulating value of previous row in Column FINAL_VALUE

My table name is "fundt" and my question is:
how do I compute the cumulative sum of the previous rows in the column FINAL_VALUE?
I think it is possible with a cross join, but I don't know how.
I suspect that you want window functions with a window frame:
select
t.*,
sum(final_value) over(
order by it_month
rows between unbounded preceding and 1 preceding
) cumulative_final_value
from mytable t
This gives you a cumulative sum() of the previous rows (not including the current row), using column it_month for ordering. You might need to adapt it to your exact requirement, but this seems to be the logic you are looking for.
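A runnable sketch of that window frame in SQLite (the fundt contents are made up; note that the first row's frame is empty, so it yields NULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fundt (it_month TEXT, final_value INT);
INSERT INTO fundt VALUES ('2021-01', 10), ('2021-02', 20), ('2021-03', 30);
""")

rows = conn.execute("""
SELECT t.*,
       SUM(final_value) OVER (ORDER BY it_month
         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS cumulative_final_value
FROM fundt t
ORDER BY it_month
""").fetchall()

# The first row has no preceding rows, so its frame is empty -> NULL (None).
print([r[2] for r in rows])  # [None, 10, 30]
```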