How to get the preceding values in Redshift based on Where condition?

I have three columns: student_name, column_1, and column_2. I want to print the preceding value wherever a 0 exists in column_2.
I want output like the one below. I used the LAG function, but I might be using it the wrong way.

From what I can tell, you want to count the number of 0 values up to and including each row. If this interpretation is correct, you would use a conditional cumulative sum:
select t.*,
       sum( (column_2 = 0)::int ) over (partition by student_name
                                        order by <ordering column>
                                        rows between unbounded preceding and current row
                                       ) as num_zeros
from t;
Note: This assumes that you have an ordering column which you have not included in the question.
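To see the conditional cumulative sum in action, here is a minimal sketch that runs the same window logic in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The rows and the id ordering column are hypothetical, since the question did not show its data, and CASE WHEN ... THEN 1 ELSE 0 END stands in for Redshift's (expr)::int cast, which SQLite does not accept.

```python
import sqlite3

# Hypothetical sample data; "id" is an assumed ordering column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, student_name TEXT, column_1 INTEGER, column_2 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "ann", 10, 5),
        (2, "ann", 20, 0),
        (3, "ann", 30, 7),
        (4, "ann", 40, 0),
        (5, "bob", 50, 0),
        (6, "bob", 60, 3),
    ],
)

# Running count of zeros in column_2 per student, in id order.
rows = conn.execute(
    """
    SELECT id, student_name, column_2,
           SUM(CASE WHEN column_2 = 0 THEN 1 ELSE 0 END) OVER (
               PARTITION BY student_name
               ORDER BY id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS num_zeros
    FROM t
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Each student's count restarts because of PARTITION BY student_name; bob's first row already counts 1 since its own column_2 is 0 and the frame includes the current row.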

Related

Copy value in previous row if not set

I would like to write a SELECT query for BigQuery that sets the value of a column to the value in the previous row, if in the current row it is set to NULL.
I have something like this for now:
SELECT *, IFNULL(tag, LAG(tag) OVER(ORDER BY id)) as new_tag FROM tags
...but it only copies values into adjacent NULL rows. Is there some way of doing this?

The LAG window function doesn't support the IGNORE NULLS clause, so use the LAST_VALUE function with IGNORE NULLS instead. Applied to your query:
SELECT *, LAST_VALUE(tag IGNORE NULLS) OVER(ORDER BY id) as new_tag FROM tags
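SQLite has no IGNORE NULLS, so the sketch below (Python's sqlite3, SQLite 3.25+, made-up rows) swaps in a different, commonly used fill-forward technique rather than the BigQuery syntax: a running COUNT(tag), which skips NULLs, assigns each row to the group started by the most recent non-NULL tag, and MAX(tag) over that group fills the gaps. It should produce the same new_tag values as the LAST_VALUE ... IGNORE NULLS query on engines that support that clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [(1, "a"), (2, None), (3, None), (4, "b"), (5, None)],
)

# COUNT(tag) ignores NULLs, so the running count only grows on rows that
# carry a tag: each non-NULL tag starts a new group (grp). Every group holds
# exactly one non-NULL tag, which MAX(tag) then copies to all its rows.
rows = conn.execute(
    """
    SELECT id, tag,
           MAX(tag) OVER (PARTITION BY grp) AS new_tag
    FROM (
        SELECT id, tag, COUNT(tag) OVER (ORDER BY id) AS grp
        FROM tags
    )
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Rows before the first non-NULL tag land in group 0 and keep a NULL new_tag, which matches what LAST_VALUE ... IGNORE NULLS returns for them.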

Best way to get 1st record per partition: FIRST_VALUE vs ROW_NUMBER

I am looking for the fastest way to get the first record (columns a, b, c) for every partition (a, b) using SQL. The table has ~10,000,000 rows.
Approach #1:
SELECT * FROM (
    SELECT a, b, c,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY date DESC) AS row_num
    FROM T
) WHERE row_num = 1
But it probably does extra work behind the scenes; I need only the first row per partition.
Approach #2 uses FIRST_VALUE(). Since FIRST_VALUE() returns a single expression, let's pack/concatenate a, b, c into one expression using some separator, e.g.:
SELECT FIRST_VALUE(a || ',' || b || ',' || c)
       OVER (PARTITION BY a, b ORDER BY date DESC ROWS UNBOUNDED PRECEDING)
FROM T
But in this case I need to unpack the result, which is an extra step.
Approach #3 uses FIRST_VALUE() and repeats OVER (...) for a and b:
SELECT
    FIRST_VALUE(a)
        OVER (PARTITION BY a, b ORDER BY date DESC ROWS UNBOUNDED PRECEDING),
    FIRST_VALUE(b)
        OVER (PARTITION BY a, b ORDER BY date DESC ROWS UNBOUNDED PRECEDING),
    c
FROM T
In approach #3, I do not know whether the database engine (Redshift) is smart enough to partition only once.

The first query is different from the other two. The first returns only one row per group. The other two return every row of the original table.
You should use the version that does what you want, which I presume is the first one. If you add SELECT DISTINCT or GROUP BY to the other queries, that will probably add overhead that makes them slower, but you can test on your data to see if that is true.
Your intuition is correct that the first query does unnecessary work. In databases that support indexes fully, a correlated subquery is often faster. I don't think that would be the case in Redshift, however.
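The row-count difference described above is easy to check on a tiny table. This sketch uses Python's sqlite3 (SQLite 3.25+) as a stand-in for Redshift with invented rows; it shows only what each query returns and says nothing about Redshift performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE T (a INTEGER, b INTEGER, c TEXT, "date" TEXT)')
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [
        (1, 1, "old", "2020-01-01"),
        (1, 1, "new", "2020-02-01"),
        (1, 2, "only", "2020-01-15"),
    ],
)

# Approach #1: ROW_NUMBER keeps exactly one row (the latest) per (a, b).
latest = conn.execute(
    """
    SELECT a, b, c FROM (
        SELECT a, b, c,
               ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY "date" DESC) AS row_num
        FROM T
    ) WHERE row_num = 1
    ORDER BY a, b
    """
).fetchall()

# FIRST_VALUE keeps every input row; each row just carries its group's first c.
first_vals = conn.execute(
    """
    SELECT a, b,
           FIRST_VALUE(c) OVER (
               PARTITION BY a, b ORDER BY "date" DESC
               ROWS UNBOUNDED PRECEDING
           ) AS first_c
    FROM T
    ORDER BY a, b
    """
).fetchall()

print(latest)
print(first_vals)
```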

Cumulating value of previous row in Column FINAL_VALUE

My table name is "fundt" and my question is:
how do I compute a cumulative sum of the previous rows in column FINAL_VALUE?
I think it is possible with a cross join, but I don't know how.

I suspect that you want window functions with a window frame:
select
    t.*,
    sum(final_value) over (
        order by it_month
        rows between unbounded preceding and 1 preceding
    ) cumulative_final_value
from fundt t
This gives you a cumulative sum() of previous rows (not including the current row), using column it_month for ordering. You might need to adapt that to your exact requirement, but this seems to be the logic that you are looking for.
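A quick way to check the frame ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING is to run it on a few rows. This sketch uses Python's sqlite3 (SQLite 3.25+) with hypothetical fundt data. The first row has no previous rows, so its sum is NULL (None in Python); wrapping the expression in COALESCE(..., 0) is one option if 0 is preferred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fundt (it_month INTEGER, final_value INTEGER)")
conn.executemany(
    "INSERT INTO fundt VALUES (?, ?)",
    [(1, 100), (2, 50), (3, 25)],  # hypothetical values
)

rows = conn.execute(
    """
    SELECT it_month, final_value,
           -- sum of all earlier rows, excluding the current one
           SUM(final_value) OVER (
               ORDER BY it_month
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS cumulative_final_value,
           -- same, but 0 instead of NULL when there is no previous row
           COALESCE(SUM(final_value) OVER (
               ORDER BY it_month
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ), 0) AS cumulative_or_zero
    FROM fundt
    ORDER BY it_month
    """
).fetchall()

for row in rows:
    print(row)
```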

Oracle LAST_VALUE only with order by in analytic clause

I have this schema (Oracle 11g R2):
CREATE TABLE users (
id INT NOT NULL,
name VARCHAR(30) NOT NULL,
num int NOT NULL
);
INSERT INTO users (id, name, num) VALUES (1,'alan',5);
INSERT INTO users (id, name, num) VALUES (2,'alan',4);
INSERT INTO users (id, name, num) VALUES (3,'julia',10);
INSERT INTO users (id, name, num) VALUES (4,'maros',77);
INSERT INTO users (id, name, num) VALUES (5,'alan',1);
INSERT INTO users (id, name, num) VALUES (6,'maros',14);
INSERT INTO users (id, name, num) VALUES (7,'fero',1);
INSERT INTO users (id, name, num) VALUES (8,'matej',8);
INSERT INTO users (id, name, num) VALUES (9,'maros',55);
I execute the following queries, using the LAST_VALUE analytic function with only an ORDER BY in the analytic clause:
My assumption is that this query executes over a single partition, the whole table (since the PARTITION BY clause is missing). It will sort rows by name within that partition and use the default windowing clause RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
select us.*,
last_value(num) over (order by name) as lv
from users us;
But the query above gives exactly the same results as the following one. My assumption about the second query is that it first partitions the rows by name, then sorts the rows in each partition by num, and then applies the windowing clause RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING over each partition to get LAST_VALUE.
select us.*,
last_value(num) over (partition by name order by num RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lv
from users us;
One of my assumptions is clearly wrong, because the two queries above give the same result. It looks like the first query also orders the records by num behind the scenes. Could you please explain what is wrong with my assumptions and why these queries return the same results?

The answer is simple. For whatever reason, Oracle chose to make LAST_VALUE deterministic when a logical (RANGE) offset is used in the windowing clause (explicitly or implicitly - by default). Specifically, in such cases, the HIGHEST value of the measured expression is selected from among a set of rows tied by the order by sorting.
https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/LAST_VALUE.html#GUID-A646AF95-C8E9-4A67-87BA-87B11AEE7B79
Towards the bottom of that page in the Oracle documentation, we can read:
When duplicates are found for the ORDER BY expression, the LAST_VALUE
is the highest value of expr [...]
Why does the documentation say that in the examples section, and not in the explanation of the function? Because, as is very often the case, the documentation doesn't seem to be written by qualified people.

From an article in Oracle Magazine, here is what happens if you use an ORDER BY clause in a window function without specifying anything else:
An ORDER BY clause, in the absence of any further windowing clause parameters, effectively adds a default windowing clause: RANGE UNBOUNDED PRECEDING, which means, “The current and previous rows in the current partition are the rows that should be used in the computation.” When an ORDER BY clause isn’t accompanied by a PARTITION clause, the entire set of rows used by the analytic function is the default current partition.
So, your first query is actually the same as this:
SELECT us.*, LAST_VALUE(num) OVER (ORDER BY name RANGE UNBOUNDED PRECEDING) AS lv
FROM users us;
If you run the above query, you will get the behavior you are currently seeing: a separate last value for each name. This differs from the following query:
SELECT
us.*,
LAST_VALUE(num) OVER (ORDER BY name
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv
FROM users us;
This just generates the value 8 for the last value of num, which corresponds to the value for matej, who is the last name when sorting name ascending.
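The claim that the full-frame query returns 8 on every row can be checked with the schema from the question. This sketch runs it in SQLite through Python's sqlite3 (3.25+), not Oracle; because matej is the only row in the last peer group, the result does not depend on how tied names are sorted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER NOT NULL, name TEXT NOT NULL, num INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (id, name, num) VALUES (?, ?, ?)",
    [(1, "alan", 5), (2, "alan", 4), (3, "julia", 10),
     (4, "maros", 77), (5, "alan", 1), (6, "maros", 14),
     (7, "fero", 1), (8, "matej", 8), (9, "maros", 55)],
)

# One partition (the whole table) sorted by name; the frame covers every row,
# so the last row in it is always matej's, the alphabetically last name.
rows = conn.execute(
    """
    SELECT id, name, num,
           LAST_VALUE(num) OVER (
               ORDER BY name
               RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS lv
    FROM users
    ORDER BY id
    """
).fetchall()

print({row[3] for row in rows})
```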

Let me assume that you think that the second query is returning the correct results.
select us.*,
last_value(num) over (partition by name
order by num
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as lv
from users us;
Let me also point out that this is more succinctly written as:
select us.*,
       max(num) over (partition by name) as lv
from users us;
That is irrelevant to your question, but I want to point it out.
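As a sanity check of that equivalence, this sketch (SQLite via Python's sqlite3, 3.25+, used in place of Oracle) computes both expressions over the question's data and compares them. It uses MAX(num) OVER (PARTITION BY name) with no ORDER BY, so MAX sees the whole partition; adding ORDER BY num would shrink the default frame to the rows up to the current one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER NOT NULL, name TEXT NOT NULL, num INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (id, name, num) VALUES (?, ?, ?)",
    [(1, "alan", 5), (2, "alan", 4), (3, "julia", 10),
     (4, "maros", 77), (5, "alan", 1), (6, "maros", 14),
     (7, "fero", 1), (8, "matej", 8), (9, "maros", 55)],
)

rows = conn.execute(
    """
    SELECT id, name,
           -- last num in each name partition when sorted ascending by num
           LAST_VALUE(num) OVER (
               PARTITION BY name ORDER BY num
               RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS lv,
           -- largest num in each name partition (frame = whole partition)
           MAX(num) OVER (PARTITION BY name) AS mx
    FROM users
    ORDER BY id
    """
).fetchall()

per_name = {name: mx for _, name, _, mx in rows}
print(per_name)
```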
Now, why does this give the same results?
select us.*,
last_value(num) over (order by name) as lv
from users us;
Well, with no windowing clause, this is equivalent to:
select us.*,
last_value(num) over (order by name
range between unbounded preceding and current row
) as lv
from users us;
The range is very important here. The frame does not stop at the current row; it extends to all rows that have the same value of name as the current row (its peers).
In my understanding of the documentation around order by, any num value from rows with the same name could be chosen. Why? Sorting in SQL (and in Oracle) is not stable. That means that it is not guaranteed to preserve the original ordering of the rows.
In this particular case, it might be a coincidence that the last value happens to be the largest value. Or Oracle might, for some reason, be adding num to the ordering.
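The peer-group behaviour of the default RANGE frame is easiest to see with COUNT(*), which, unlike LAST_VALUE, gives a deterministic answer even when names tie. This minimal sketch uses SQLite through Python's sqlite3 (3.25+) as a stand-in for Oracle and reuses the question's rows: with RANGE, every alan row counts all three alan rows; with ROWS, the three alan rows get 1, 2 and 3 in an unspecified order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER NOT NULL, name TEXT NOT NULL, num INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (id, name, num) VALUES (?, ?, ?)",
    [(1, "alan", 5), (2, "alan", 4), (3, "julia", 10),
     (4, "maros", 77), (5, "alan", 1), (6, "maros", 14),
     (7, "fero", 1), (8, "matej", 8), (9, "maros", 55)],
)

rows = conn.execute(
    """
    SELECT id, name,
           -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
           -- which includes every peer (same name) of the current row
           COUNT(*) OVER (ORDER BY name) AS range_count,
           -- physical frame: stops at the current row, even inside a tie
           COUNT(*) OVER (
               ORDER BY name
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS rows_count
    FROM users
    ORDER BY id
    """
).fetchall()

range_count = {id_: rc for id_, _, rc, _ in rows}
alan_rows_counts = sorted(rc for _, name, _, rc in rows if name == "alan")
print(range_count)
print(alan_rows_counts)
```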

SQL statement to update a column

I have a table T1 with the following values:
I need a result table with an additional column that holds the running average up to each date, i.e.:
x1= 1000.45
x2= (1000.45+2000.00)/2
x3= (1000.45+2000.00+3000.50)/3
x4= (1000.45+2000.00+3000.50+4000.24)/4
The result table should look like the following:
I need to write a SQL statement in an Oracle database that adds a column to the result table with the values x1, x2, x3, x4.

You need to use an analytic function for this. My untested SQL is as follows:
SELECT
date,
division,
sum_sales,
AVG( sum_sales ) OVER ( ORDER BY date ROWS UNBOUNDED PRECEDING )
FROM
table;
DATE is a reserved word in Oracle, so if you are using it as your real column name, you will need to enclose it in double quotes.

select date, division, sum_sales,
       avg(sum_sales) over (order by date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
from table
group by date, division, sum_sales

You need to use the AVG function with OVER, ordering by date. Because each row's value aggregates all the preceding rows, you need to define the aggregation window as UNBOUNDED PRECEDING.
Following these guidelines, the statement would look like this:
SELECT date_d,
division,
sum_sales,
AVG(sum_sales)
over (
ORDER BY date_d ROWS unbounded preceding ) avrg
FROM supplier;
Two articles with good information about analytic functions:
Introduction to Analytic Functions (Part 1)
Introduction to Analytic Functions (Part 2)
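To check the running average against the hand-computed x1 to x4, here is a minimal sketch in SQLite via Python's sqlite3 (3.25+) rather than Oracle. The supplier table, its dates and the division value are hypothetical, since the question's table was not shown; only the four sum_sales values come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (date_d TEXT, division TEXT, sum_sales REAL)")
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [("2020-01-01", "d1", 1000.45),   # hypothetical dates and division
     ("2020-01-02", "d1", 2000.00),
     ("2020-01-03", "d1", 3000.50),
     ("2020-01-04", "d1", 4000.24)],
)

# Average of the current row and every earlier row, in date order.
rows = conn.execute(
    """
    SELECT date_d, division, sum_sales,
           AVG(sum_sales) OVER (ORDER BY date_d ROWS UNBOUNDED PRECEDING) AS avrg
    FROM supplier
    ORDER BY date_d
    """
).fetchall()

for row in rows:
    print(row)
```

The avrg column should match x1 = 1000.45, x2 = 1500.225, x3 ≈ 2000.3167 and x4 = 2500.2975, up to floating-point rounding.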
