PostgreSQL last_value ignore nulls

I know this has already been asked, but why doesn't the solution below work? I want to fill value with the last non-null value, ordered by idx.
What I see:
idx | coalesce
-----+----------
1 | 2
2 | 4
3 |
4 |
5 | 10
(5 rows)
What I want:
idx | coalesce
-----+----------
1 | 2
2 | 4
3 | 4
4 | 4
5 | 10
(5 rows)
Code:
with base as (
select 1 as idx
, 2 as value
union
select 2 as idx
, 4 as value
union
select 3 as idx
, null as value
union
select 4 as idx
, null as value
union
select 5 as idx
, 10 as value
)
select idx
, coalesce(value
, last_value(value) over (order by case when value is null then -1
else idx
end))
from base
order by idx

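What you want is lag(... ignore nulls), which PostgreSQL does not support. Here is one way to do what you want, using two window functions. The first defines the grouping for the NULL values (count() skips NULLs, so each non-null value starts a new group) and the second assigns the value: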
select idx, value, coalesce(value, max(value) over (partition by grp))
from (select b.*, count(value) over (order by idx) as grp
from base b
) b
order by idx;
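Here is a minimal sketch to check the count()/max() idea. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25+ supports these window functions). The in-memory table base mirrors the question's CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table base (idx integer primary key, value integer)")
conn.executemany("insert into base values (?, ?)",
                 [(1, 2), (2, 4), (3, None), (4, None), (5, 10)])

rows = conn.execute("""
    select idx,
           coalesce(value, max(value) over (partition by grp)) as filled
    from (select b.*,
                 count(value) over (order by idx) as grp  -- count() skips NULLs
          from base b) b
    order by idx
""").fetchall()
print(rows)  # [(1, 2), (2, 4), (3, 4), (4, 4), (5, 10)]
```

Each group holds exactly one non-null value (its first row), so max() over the group simply returns that value.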
You can also do this without subqueries by using arrays. Basically, take the last element not counting NULLs:
select idx, value,
(array_remove(array_agg(value) over (order by idx), null))[count(value) over (order by idx)]
from base b
order by idx;

The last_value here doesn't make sense to me. Looking at the example, you need the last non-null value, which you can get as follows.
I form a group from each non-null value and the nulls that follow it, so that I can take the first value of each group:
with base as (
select 1 as idx, 2 as value union
select 2 as idx, -14 as value union
select 3 as idx, null as value union
select 4 as idx, null as value union
select 5 as idx, 1 as value
)
select idx, value,
first_value(value) over (partition by rn order by idx) as new_val
from (
select idx, value,
sum(case when value is not null then 1 end) over (order by idx) as rn
from base
) t
order by idx;
Here is the code: http://sqlfiddle.com/#!15/fcda4/2
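Here is a quick check of the sum()/first_value() variant with the answer's data. It again uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. The order by idx inside the window matters: without it, first_value() may return any row of the partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table base (idx integer primary key, value integer)")
conn.executemany("insert into base values (?, ?)",
                 [(1, 2), (2, -14), (3, None), (4, None), (5, 1)])

rows = conn.execute("""
    select idx, value,
           first_value(value) over (partition by rn order by idx) as new_val
    from (select idx, value,
                 -- the running sum only grows on non-null rows
                 sum(case when value is not null then 1 end)
                     over (order by idx) as rn
          from base) t
    order by idx
""").fetchall()
new_vals = [r[2] for r in rows]
print(new_vals)  # [2, -14, -14, -14, 1]
```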

To see why your solution doesn't work, just look at the output if you order by the ordering in your window definition:
with base as (
select 1 as idx
, 2 as value
union
select 2 as idx
, 4 as value
union
select 3 as idx
, null as value
union
select 4 as idx
, null as value
union
select 5 as idx
, 10 as value
)
select idx, value from base
order by case when value is null then -1
else idx
end;
idx | value
-----+-------
3 |
4 |
1 | 2
2 | 4
5 | 10
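The last_value() window function picks the last value in the current frame. With the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), the frame ends at the current row, or at its last peer when several rows share the same sort key. Rows 3 and 4 both sort as -1, so for them last_value() sees only NULLs, and coalesce() has nothing to fall back on.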
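To see the frame behaviour concretely, here is a small sketch. It uses SQLite through Python's sqlite3 module instead of PostgreSQL; both apply the same default frame. It runs last_value() once with plain idx ordering and once with the question's ordering. In both cases it just echoes each row's own value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table base (idx integer primary key, value integer)")
conn.executemany("insert into base values (?, ?)",
                 [(1, 2), (2, 4), (3, None), (4, None), (5, 10)])

# Default frame ends at the current row; idx is unique, so there are no peers.
default_frame = conn.execute("""
    select idx, last_value(value) over (order by idx) from base order by idx
""").fetchall()

# The question's ordering: rows 3 and 4 both sort as -1 and are peers,
# so their frame ends at the last peer, whose value is also NULL.
question_order = conn.execute("""
    select idx, last_value(value) over (
        order by case when value is null then -1 else idx end)
    from base order by idx
""").fetchall()
print(default_frame)   # [(1, 2), (2, 4), (3, None), (4, None), (5, 10)]
print(question_order)  # [(1, 2), (2, 4), (3, None), (4, None), (5, 10)]
```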


Add a column that tags the max and min value of a column

I want to add an extra column, where the min and max values of each group (ID) will appear.
Here is how the table looks:
select ID, VALUE from mytable
ID VALUE
1 4
1 1
1 7
2 2
2 5
3 7
3 3
Here is the result I want to get:
ID VALUE min_max_values
1 4 NULL
1 1 min
1 7 max
2 2 min
2 5 max
3 7 max
3 3 min
4 1 both
5 2 min
5 3 max
Thank you for your help in advance!
You can use window functions and a case expression:
select id, value,
case
when value = min_value and min_value = max_value then 'both'
when value = min_value then 'min'
when value = max_value then 'max'
end as min_max_values
from (
select t.*,
min(value) over(partition by id) as min_value,
max(value) over(partition by id) as max_value
from mytable t
) t
The subquery is not strictly necessary; we could use the window min() and max() directly in the outer query. It is only there to avoid repeating the window function expressions in the outer query.
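Here is a runnable sketch of this query using SQLite through Python's sqlite3 module. The rows are taken from the desired output in the question, including ids 4 and 5. An outer order by was added so the result order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer, value integer)")
conn.executemany("insert into mytable values (?, ?)",
                 [(1, 4), (1, 1), (1, 7), (2, 2), (2, 5), (3, 7), (3, 3),
                  (4, 1), (5, 2), (5, 3)])

rows = conn.execute("""
    select id, value,
           case
             when value = min_value and min_value = max_value then 'both'
             when value = min_value then 'min'
             when value = max_value then 'max'
           end as min_max_values          -- NULL when neither min nor max
    from (select t.*,
                 min(value) over (partition by id) as min_value,
                 max(value) over (partition by id) as max_value
          from mytable t) t
    order by id, value
""").fetchall()
for row in rows:
    print(row)
```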

oracle dates group

How can I write an optimized query for this?
date_one | date_two
------------------------
01.02.1999 | 31.05.2003
01.01.2004 | 01.01.2010
02.01.2010 | 10.10.2011
11.10.2011 | (null)
I need to get this
date_one | date_two | group
------------------------------------
01.02.1999 | 31.05.2003 | 1
01.01.2004 | 01.01.2010 | 2
02.01.2010 | 10.10.2011 | 2
11.10.2011 | (null) | 2
The group number is assigned as follows. Order the rows by date_one ascending. First row gets group = 1. Then for each row if date_one is the date immediately following date_two of the previous row, the group number stays the same as in the previous row, otherwise it increases by one.
You can do this using a left join and a cumulative sum:
select t.*, sum(case when tprev.date_one is null then 1 else 0 end) over (order by t.date_one) as grp
from t left join
t tprev
on t.date_one = tprev.date_two + 1;
The idea is to find where the gaps begin (using the left join) and then do a cumulative sum of such beginnings to define the group.
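Here is a sketch of the left join approach using SQLite through Python's sqlite3 module instead of Oracle. The dates are stored as ISO-8601 text (an assumption for this sketch), so SQLite's date(date_two, '+1 day') replaces Oracle's date_two + 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date_one text, date_two text)")
conn.executemany("insert into t values (?, ?)", [
    ("1999-02-01", "2003-05-31"),
    ("2004-01-01", "2010-01-01"),
    ("2010-01-02", "2011-10-10"),
    ("2011-10-11", None),
])

rows = conn.execute("""
    select t.date_one, t.date_two,
           -- a row with no predecessor ending the day before starts a new group
           sum(case when tprev.date_one is null then 1 else 0 end)
               over (order by t.date_one) as grp
    from t left join t tprev
      on t.date_one = date(tprev.date_two, '+1 day')
    order by t.date_one
""").fetchall()
print([r[2] for r in rows])  # [1, 2, 2, 2]
```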
If you want to be more inscrutable, you could write this as:
select t.*,
count(*) over (order by t.date_one) - count(tprev.date_one) over (order by t.date_one) as grp
from t left join
t tprev
on t.date_one = tprev.date_two + 1;
One way is to use a window function:
select
date_one,
date_two,
sum(x) over (order by date_one) grp
from (
select
t.*,
case when
lag(date_two) over (order by date_one) + 1 =
date_one then 0 else 1 end x
from t
);
It uses the analytic function lag to fetch date_two from the previous row and checks whether it is in continuation with date_one of the current row (in increasing order of date_one).
How it works:
lag(date_two) over (order by date_one)
(In the below explanation, when I say first, next, previous or last row, it's based on increasing order of date_one with null values at the end)
The above produces NULL for the first row, as there is no row before it to get date_two from, and the previous row's date_two for each subsequent row.
case when
lag(date_two)
over (order by date_one) + 1 = date_one then 0
else 1 end
Since lag produces NULL for the very first row, NULL + 1 = date_one evaluates to unknown rather than true (any comparison with NULL does), so the case expression falls through to else and returns 1.
For each subsequent row, the same check produces a new column x, which is 1 when this row's date_one does not immediately follow the previous row's date_two, and 0 when it does.
Finally, a running sum over x gives the required group values. See the value of x below:
with t (date_one,date_two) as (
select to_date('01.02.1999','dd.mm.yyyy'),to_date('31.05.2003','dd.mm.yyyy') from dual union all
select to_date('01.01.2004','dd.mm.yyyy'),to_date('01.01.2010','dd.mm.yyyy') from dual union all
select to_date('02.01.2010','dd.mm.yyyy'),to_date('10.10.2011','dd.mm.yyyy') from dual union all
select to_date('11.10.2011','dd.mm.yyyy'),null from dual
)
select
date_one,
date_two,
x,
sum(x) over (order by date_one) grp
from (
select
t.*,
case when
lag(date_two) over (order by date_one) + 1 =
date_one then 0 else 1 end x
from t
);
DATE_ONE DATE_TWO X GRP
--------- --------- ---------- ----------
01-FEB-99 31-MAY-03 1 1
01-JAN-04 01-JAN-10 1 2
02-JAN-10 10-OCT-11 0 2
11-OCT-11 0 2
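Here is the same lag() logic sketched in SQLite through Python's sqlite3 module instead of Oracle. The dates are stored as ISO-8601 text (an assumption), and date(..., '+1 day') stands in for Oracle's + 1 on a DATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date_one text, date_two text)")
conn.executemany("insert into t values (?, ?)", [
    ("1999-02-01", "2003-05-31"),
    ("2004-01-01", "2010-01-01"),
    ("2010-01-02", "2011-10-10"),
    ("2011-10-11", None),
])

rows = conn.execute("""
    select date_one, date_two, x,
           sum(x) over (order by date_one) as grp   -- running sum of gap starts
    from (
        select t.*,
               case when date(lag(date_two) over (order by date_one), '+1 day')
                         = date_one
                    then 0 else 1 end as x          -- NULL on the first row gives 1
        from t
    )
    order by date_one
""").fetchall()
print([(r[2], r[3]) for r in rows])  # [(1, 1), (1, 2), (0, 2), (0, 2)]
```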

SQL grouping intersecting/overlapping rows

I have the following table in Postgres that has overlapping data in the two columns a_sno and b_sno.
create table data
( a_sno integer not null,
b_sno integer not null,
PRIMARY KEY (a_sno,b_sno)
);
insert into data (a_sno,b_sno) values
( 4, 5 )
, ( 5, 4 )
, ( 5, 6 )
, ( 6, 5 )
, ( 6, 7 )
, ( 7, 6 )
, ( 9, 10)
, ( 9, 13)
, (10, 9 )
, (13, 9 )
, (10, 13)
, (13, 10)
, (10, 14)
, (14, 10)
, (13, 14)
, (14, 13)
, (11, 15)
, (15, 11);
As you can see from the first 6 rows, data values 4, 5, 6 and 7 in the two columns intersect/overlap and need to be partitioned into one group. The same goes for rows 7-16 and rows 17-18, which will be labeled as groups 2 and 3 respectively.
The resulting output should look like this:
group | value
------+------
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15
I assume that all pairs exist in their mirrored combination as well, e.g. (4,5) and (5,4), but the following solutions work just as well without mirrored duplicates.
Simple case
If all connections can be lined up in a single ascending sequence, and complications like the ones I added in the fiddle are not possible, we can use this solution without duplicates in the rCTE:
I start by getting minimum a_sno per group, with the minimum associated b_sno:
SELECT row_number() OVER (ORDER BY a_sno) AS grp
, a_sno, min(b_sno) AS b_sno
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
GROUP BY a_sno;
This only needs a single query level, since a window function can be built on an aggregate (see the related question "Get the distinct sum of a joined table column").
Result:
grp a_sno b_sno
1 4 5
2 9 10
3 11 15
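Here is a check of this first step using SQLite through Python's sqlite3 module instead of PostgreSQL. It also shows the window function running on top of the aggregate in one query level. An order by grp was added so the output order is deterministic.

```python
import sqlite3

PAIRS = [(4, 5), (5, 4), (5, 6), (6, 5), (6, 7), (7, 6), (9, 10), (9, 13),
         (10, 9), (13, 9), (10, 13), (13, 10), (10, 14), (14, 10),
         (13, 14), (14, 13), (11, 15), (15, 11)]

conn = sqlite3.connect(":memory:")
conn.execute("create table data (a_sno integer not null, b_sno integer not null,"
             " primary key (a_sno, b_sno))")
conn.executemany("insert into data values (?, ?)", PAIRS)

roots = conn.execute("""
    select row_number() over (order by a_sno) as grp,  -- window over the aggregate
           a_sno, min(b_sno) as b_sno
    from data d
    where a_sno < b_sno
      and not exists (select 1 from data                -- no smaller neighbour
                      where b_sno = d.a_sno and a_sno < b_sno)
    group by a_sno
    order by grp
""").fetchall()
print(roots)  # [(1, 4, 5), (2, 9, 10), (3, 11, 15)]
```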
I avoid branches and duplicated (multiplied) rows, which are potentially much more expensive with long chains. I use ORDER BY b_sno LIMIT 1 in a correlated subquery to make this fly in a recursive CTE (see the related question "Create a unique index on a non-unique column").
Key to performance is a matching index, which the PK constraint PRIMARY KEY (a_sno,b_sno) already provides; the reverse column order (b_sno, a_sno) would not serve (see the related question "Is a composite index also good for queries on the first field?").
WITH RECURSIVE t AS (
SELECT row_number() OVER (ORDER BY d.a_sno) AS grp
, a_sno, min(b_sno) AS b_sno -- the smallest one
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
GROUP BY a_sno
)
, cte AS (
SELECT grp, b_sno AS sno FROM t
UNION ALL
SELECT c.grp
, (SELECT b_sno -- correlated subquery
FROM data
WHERE a_sno = c.sno
AND a_sno < b_sno
ORDER BY b_sno
LIMIT 1)
FROM cte c
WHERE c.sno IS NOT NULL
)
SELECT * FROM cte
WHERE sno IS NOT NULL -- eliminate row with NULL
UNION ALL -- no duplicates
SELECT grp, a_sno FROM t
ORDER BY grp, sno;
Less simple case
All nodes can be reached in ascending order with one or more branches from the root (smallest sno).
This time, get all greater sno and de-duplicate nodes that may be visited multiple times with UNION at the end:
WITH RECURSIVE t AS (
SELECT dense_rank() OVER (ORDER BY d.a_sno) AS grp -- dense_rank() avoids gaps in grp
, a_sno, b_sno -- get all rows for smallest a_sno
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
)
, cte AS (
SELECT grp, b_sno AS sno FROM t
UNION ALL
SELECT c.grp, d.b_sno
FROM cte c
JOIN data d ON d.a_sno = c.sno
AND d.a_sno < d.b_sno -- join to all connected rows
)
SELECT grp, sno FROM cte
UNION -- eliminate duplicates
SELECT grp, a_sno FROM t -- add first rows
ORDER BY grp, sno;
Unlike the first solution, we don't get a last row with NULL here (caused by the correlated subquery).
Both should perform very well, especially with long chains or many branches, and both return the desired result.
SQL Fiddle (with added rows to demonstrate difficulty).
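One caveat is worth checking. With the question's data, the t CTE returns two rows for a_sno = 9, so rank() numbers the roots 1, 2, 2, 4 and the third group comes out as 4. dense_rank() gives the desired 1, 2, 3. The sketch below runs the "less simple" query with either ranking function, using SQLite through Python's sqlite3 module as an assumed stand-in for PostgreSQL. components() is a hypothetical helper name.

```python
import sqlite3

PAIRS = [(4, 5), (5, 4), (5, 6), (6, 5), (6, 7), (7, 6), (9, 10), (9, 13),
         (10, 9), (13, 9), (10, 13), (13, 10), (10, 14), (14, 10),
         (13, 14), (14, 13), (11, 15), (15, 11)]

conn = sqlite3.connect(":memory:")
conn.execute("create table data (a_sno integer not null, b_sno integer not null,"
             " primary key (a_sno, b_sno))")
conn.executemany("insert into data values (?, ?)", PAIRS)

def components(rank_fn):
    """Run the 'less simple case' query with rank_fn ('rank' or 'dense_rank')."""
    return conn.execute(f"""
        with recursive t as (
            select {rank_fn}() over (order by d.a_sno) as grp, a_sno, b_sno
            from data d
            where a_sno < b_sno
              and not exists (select 1 from data
                              where b_sno = d.a_sno and a_sno < b_sno)
        ), cte as (
            select grp, b_sno as sno from t
            union all
            select c.grp, d.b_sno                  -- follow every ascending edge
            from cte c join data d on d.a_sno = c.sno and d.a_sno < d.b_sno
        )
        select grp, sno from cte
        union                                      -- drop nodes reached twice
        select grp, a_sno from t
        order by grp, sno
    """).fetchall()

print(sorted({g for g, _ in components("rank")}))  # [1, 2, 4]
print(components("dense_rank"))
```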
Undirected graph
If there are local minima that cannot be reached from the root with ascending traversal, the above solutions won't work. Consider Farhęg's solution in this case.
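For the undirected case, here is one possible approach. It is not one of the answers here, only a sketch under stated assumptions. It labels each node with the smallest node reachable from it and treats every pair as undirected, so local minima are not a problem. It uses SQLite through Python's sqlite3 module, and label_components() is a hypothetical helper. UNION (not UNION ALL) in the recursive CTE is what stops the traversal on cycles.

```python
import sqlite3

PAIRS = [(4, 5), (5, 4), (5, 6), (6, 5), (6, 7), (7, 6), (9, 10), (9, 13),
         (10, 9), (13, 9), (10, 13), (13, 10), (10, 14), (14, 10),
         (13, 14), (14, 13), (11, 15), (15, 11)]

def label_components(pairs):
    """Return (grp, sno) rows, one per node, grouping connected nodes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table data (a_sno integer not null, b_sno integer not null)")
    conn.executemany("insert into data values (?, ?)", pairs)
    return conn.execute("""
        with recursive
        edges(a, b) as (                  -- treat every pair as undirected
            select a_sno, b_sno from data
            union
            select b_sno, a_sno from data
        ),
        reach(start, node) as (           -- every node reachable from start
            select a, a from edges
            union                         -- discards repeats, so cycles terminate
            select r.start, e.b
            from reach r join edges e on e.a = r.node
        ),
        label(sno, root) as (             -- smallest reachable node names the group
            select start, min(node) from reach group by start
        )
        select dense_rank() over (order by root) as grp, sno
        from label
        order by grp, sno
    """).fetchall()

print(label_components(PAIRS))
```

Every start node explores its whole component, so the cost grows roughly quadratically with component size. This suits small graphs.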
Here is another way that may be useful. You can do it in 2 steps:
1. Take the max(sno) per group, i.e. each sno that has no neighbor greater than itself:
select q.sno,
row_number() over(order by q.sno) gn
from(
select distinct d.a_sno sno
from data d
where not exists (
select b_sno
from data
where b_sno=d.a_sno
and a_sno>d.a_sno
)
)q
result:
sno gn
7 1
14 2
15 3
2. Use a recursive CTE to find all related members of each group:
with recursive cte(sno,gn,path,cycle)as(
select q.sno,
row_number() over(order by q.sno) gn,
array[q.sno],false
from(
select distinct d.a_sno sno
from data d
where not exists (
select b_sno
from data
where b_sno=d.a_sno
and a_sno>d.a_sno
)
)q
union all
select d.a_sno,c.gn,
d.a_sno || c.path,
d.a_sno=any(c.path)
from data d
join cte c on d.b_sno=c.sno
where not cycle
)
select distinct gn,sno from cte
order by gn,sno
Result:
gn sno
1 4
1 5
1 6
1 7
2 9
2 10
2 13
2 14
3 11
3 15
Here is a start that may give some ideas on an approach. The recursive query starts with a_sno of each record and then tries to follow the path of b_sno until it reaches the end or forms a cycle. The path is represented by an array of sno integers.
The unnest function will break the array into rows, so a sno value mapped to the path array such as:
4, {6, 5, 4}
will be transformed to a row for each value in the array:
4, 6
4, 5
4, 4
The array_agg then reverses the operation by aggregating the values back into a path, removing duplicates and sorting them.
Now each sno is associated with a path, and the path forms the grouping. dense_rank can be used to map the grouping (cluster) to a number.
SELECT array_agg(DISTINCT map ORDER BY map) AS cluster
,sno
FROM ( WITH RECURSIVE x(sno, path, cycle) AS (
SELECT a_sno, ARRAY[a_sno], false FROM data
UNION ALL
SELECT b_sno, path || b_sno, b_sno = ANY(path)
FROM data, x
WHERE a_sno = x.sno
AND NOT cycle
)
SELECT sno, unnest(path) AS map FROM x ORDER BY 1
) y
GROUP BY sno
ORDER BY 1, 2
Output:
cluster | sno
--------------+-----
{4,5,6,7} | 4
{4,5,6,7} | 5
{4,5,6,7} | 6
{4,5,6,7} | 7
{9,10,13,14} | 9
{9,10,13,14} | 10
{9,10,13,14} | 13
{9,10,13,14} | 14
{11,15} | 11
{11,15} | 15
(10 rows)
Wrap it one more time for the ranking:
SELECT dense_rank() OVER(order by cluster) AS rank
,sno
FROM (
SELECT array_agg(DISTINCT map ORDER BY map) AS cluster
,sno
FROM ( WITH RECURSIVE x(sno, path, cycle) AS (
SELECT a_sno, ARRAY[a_sno], false FROM data
UNION ALL
SELECT b_sno, path || b_sno, b_sno = ANY(path)
FROM data, x
WHERE a_sno = x.sno
AND NOT cycle
)
SELECT sno, unnest(path) AS map FROM x ORDER BY 1
) y
GROUP BY sno
ORDER BY 1, 2
) z
ORDER BY rank, sno;
Output:
rank | sno
------+-----
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15
(10 rows)

how to assign a rank for null values with previous first non-null value in oracle

I need to assign a rank to some null values over ordered rows.
My query is like this:
with sub as
(
select 10 as id, 1 as inx,2 as num from dual
union all
select 10 as id, 2 as inx,null as num from dual
union all
select 10 as id, 3 as inx,8 as num from dual
union all
select 10 as id, 4 as inx,null as num from dual
)
select *
from sub order by inx
and the result set is:
id inx num
----------
10 1 2
10 2 null
10 3 8
10 4 null
I'm trying to replace the null values with the previous non-null value.
For example, num should be 2 where inx = 2,
and 8 where inx = 4, and so on.
Thanks for any ideas.
If you know that the values are increasing, you can just use max():
select id, inx, max(num) over (partition by id order by inx) as num
from sub;
If they are not increasing and multiple nulls never appear in a sequence, you can use lag():
select id, inx,
(case when num is null
then lag(num) over (partition by id order by inx)
else num
end) as num
from sub;
If nulls do appear in a sequence, you can use the ignore nulls option to lag():
select id, inx,
(case when num is null
then lag(num ignore nulls) over (partition by id order by inx)
else num
end) as num
from sub;
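Here is a sketch checking the max() and lag() variants against the question's data, using SQLite through Python's sqlite3 module. SQLite has no IGNORE NULLS, so the third query emulates it with a running count() that starts a new group at every non-null num. That is a swapped-in technique, not the Oracle syntax above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sub (id integer, inx integer, num integer)")
conn.executemany("insert into sub values (?, ?, ?)",
                 [(10, 1, 2), (10, 2, None), (10, 3, 8), (10, 4, None)])

def fill(expr):
    """Return the filled num column for a window expression over sub."""
    return [r[0] for r in conn.execute(f"select {expr} from sub order by id, inx")]

# 1. Values known to be increasing: running max() (NULLs are ignored).
by_max = fill("max(num) over (partition by id order by inx)")
# 2. Never two NULLs in a row: plain lag().
by_lag = fill("case when num is null "
              "then lag(num) over (partition by id order by inx) else num end")
# 3. IGNORE NULLS emulation: count(num) only grows on non-null rows,
#    so each group holds one non-null num followed by its NULLs.
by_group = [r[0] for r in conn.execute("""
    select max(num) over (partition by id, grp)
    from (select s.*, count(num) over (partition by id order by inx) as grp
          from sub s)
    order by id, inx
""")]
print(by_max, by_lag, by_group)  # [2, 2, 8, 8] [2, 2, 8, 8] [2, 2, 8, 8]
```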

how to assign a rank for non null values in oracle

I need to assign a rank in such a way that it ignores null values.
select root_cause_desc,
case
when root_cause_desc is null
then null
else rank() over ( order by excess_value desc)
end gap_rank
from table.
where root_cause_desc is not null
gives
ROOT_CAUSE_DESC EXCESS_VALUE TOTAL_EXCESS_VALUE_WK GAP_RANK
advanced shipment 120.9750138 -760356.4054 10
dfdfdfdf222 0 -1696000.946 11
Root Cause -0.0760554 -760356.4054 12
test one more -656.277192 -760356.4054 13
earlier truck -77099.35 720093.3712 14
It ignores the null value and still assigns a rank, even for a null root cause. I want gap_rank as 1,2,3,4. Please let me know how to do this.
The problem is that RANK() is independent of your case statement; it's ordering the entire query by the ORDER BY clause you give it.
Utilise the NULLS LAST keywords to put the NULL values at the end of the order, and then your CASE statement will work. For instance:
with the_data as (
select level as a
, nullif(nullif(level, 5), 8) as b
from dual
connect by level <= 10
)
select a
, b
, case when b is null then null
else rank() over ( order by case when b is not null then 1
end nulls last
, a )
end as "rank"
from the_data
order by a;
A B rank
---------- ---------- ----------
1 1 1
2 2 2
3 3 3
4 4 4
5
6 6 5
7 7 6
8
9 9 7
10 10 8
10 rows selected.
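The example above can be reproduced in SQLite through Python's sqlite3 module. This sketch makes two assumptions: a recursive CTE replaces Oracle's connect by level, and the expression (b is null) replaces the NULLS LAST trick. Both sort the non-null rows first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    with recursive lvl(a) as (            -- stands in for connect by level <= 10
        select 1 union all select a + 1 from lvl where a < 10
    ), the_data as (
        select a, nullif(nullif(a, 5), 8) as b from lvl
    )
    select a, b,
           case when b is null then null
                else rank() over (order by (b is null), a)  -- non-null rows first
           end as rnk
    from the_data
    order by a
""").fetchall()
print([r[2] for r in rows])  # [1, 2, 3, 4, None, 5, 6, None, 7, 8]
```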
I think there is no need to check whether root_cause_desc is null in the select clause.
The WHERE and GROUP BY clauses are executed first, and the analytic function is processed after them (only the final ORDER BY comes later). So the rows with a null root_cause_desc are eliminated before your rank is computed.
WITH tab
AS (SELECT NULL root_cause, 5 AS val FROM DUAL
UNION ALL
SELECT 'A' root_cause, 1 AS val FROM DUAL
UNION ALL
SELECT NULL root_cause, 4 AS val FROM DUAL
UNION ALL
SELECT 'A' root_cause, 2 AS val FROM DUAL
UNION ALL
SELECT NULL root_cause, 3 AS val FROM DUAL)
SELECT root_cause, val, RANK () OVER (ORDER BY val DESC) rnk
FROM tab
WHERE root_cause IS NOT NULL;
root_cause val rnk
=========================
A 2 1
A 1 2
=========================
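Here is a sketch of the same filter-then-rank behaviour using SQLite through Python's sqlite3 module. The answer targets Oracle, but evaluating WHERE before the window function is standard SQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    with tab(root_cause, val) as (
        values (null, 5), ('A', 1), (null, 4), ('A', 2), (null, 3)
    )
    select root_cause, val, rank() over (order by val desc) as rnk
    from tab
    where root_cause is not null          -- filtered before rank() is computed
    order by rnk
""").fetchall()
print(rows)  # [('A', 2, 1), ('A', 1, 2)]
```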