How can I get an optimized query for this? I have the following data:
date_one | date_two
------------------------
01.02.1999 | 31.05.2003
01.01.2004 | 01.01.2010
02.01.2010 | 10.10.2011
11.10.2011 | (null)
I need to get this
date_one | date_two | group
------------------------------------
01.02.1999 | 31.05.2003 | 1
01.01.2004 | 01.01.2010 | 2
02.01.2010 | 10.10.2011 | 2
11.10.2011 | (null) | 2
The group number is assigned as follows: order the rows by date_one ascending and give the first row group = 1. Then, for each row, if its date_one is the day immediately following the previous row's date_two, the group number stays the same as in the previous row; otherwise it increases by one.
You can do this using left join and a cumulative sum:
select t.*,
       sum(case when tprev.date_one is null then 1 else 0 end) over (order by t.date_one) as grp
from t left join
     t tprev
     on t.date_one = tprev.date_two + 1;
The idea is to find where the gaps begin (using the left join) and then do a cumulative sum of such beginnings to define the group.
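To see what the left join contributes before the cumulative sum, you can select the gap flag on its own (a sketch using the same table t as above; island_start is just an illustrative alias, and the commented values follow from the sample data in the question):
select t.date_one, t.date_two,
       case when tprev.date_one is null then 1 else 0 end as island_start
from t left join
     t tprev
     on t.date_one = tprev.date_two + 1;
-- For the sample data, island_start is 1, 1, 0, 0; its running sum gives grp = 1, 2, 2, 2.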
If you want to be more inscrutable, you could write this as:
select t.*,
       count(*) over (order by t.date_one) -
       count(tprev.date_one) over (order by t.date_one) as grp
from t left join
     t tprev
     on t.date_one = tprev.date_two + 1;
One way is to use window functions:
select date_one,
       date_two,
       sum(x) over (order by date_one) grp
from (
    select t.*,
           case when lag(date_two) over (order by date_one) + 1 = date_one
                then 0 else 1 end x
    from t
);
It finds the date_two from the previous row using the analytic function lag and checks whether it is in continuation with the current row's date_one (in increasing order of date_one).
How it works:
lag(date_two) over (order by date_one)
(In the explanation below, when I say first, next, previous, or last row, it's based on increasing order of date_one, with null values at the end.)
The above produces NULL for the first row, as there is no row before it to get date_two from, and the previous row's date_two for the subsequent rows.
case when lag(date_two) over (order by date_one) + 1 = date_one
     then 0
     else 1 end
Since lag produces NULL for the very first row (and a NULL = anything comparison never evaluates to true), the output of the case expression will be 1.
For the further rows, the same check produces a new column x in the query output, which has the value 1 whenever the previous row's date_two is not in continuation with the current row's date_one.
Finally, we can take a cumulative sum of x to get the required group values. See the values of x below:
with t (date_one, date_two) as (
  select to_date('01.02.1999','dd.mm.yyyy'), to_date('31.05.2003','dd.mm.yyyy') from dual union all
  select to_date('01.01.2004','dd.mm.yyyy'), to_date('01.01.2010','dd.mm.yyyy') from dual union all
  select to_date('02.01.2010','dd.mm.yyyy'), to_date('10.10.2011','dd.mm.yyyy') from dual union all
  select to_date('11.10.2011','dd.mm.yyyy'), null from dual
)
select date_one,
       date_two,
       x,
       sum(x) over (order by date_one) grp
from (
    select t.*,
           case when lag(date_two) over (order by date_one) + 1 = date_one
                then 0 else 1 end x
    from t
);
DATE_ONE  DATE_TWO           X        GRP
--------- --------- ---------- ----------
01-FEB-99 31-MAY-03          1          1
01-JAN-04 01-JAN-10          1          2
02-JAN-10 10-OCT-11          0          2
11-OCT-11                    0          2
Related
I have the following example dataset:
ID | Value | Row index (for reference purposes only, does not need to exist in final output)
---------------------------------------------------------------------------------------------
a  |  4    |  1
a  |  7    |  2
a  | 12    |  3
a  | 12    |  4
a  | 13    |  5
b  |  1    |  6
b  |  2    |  7
b  |  3    |  8
b  |  4    |  9
b  |  5    | 10
I would like to write a SQL script that, per ID and ordered ascending by [Value], starts from the first row and then returns each next row whose Value is N or more greater than the previously returned row. An example of the final table for N = 3 should look like the following:
ID | Value | Row index
----------------------
a  |  4    |  1
a  |  7    |  2
a  | 12    |  3
b  |  1    |  6
b  |  4    |  9
Can this script be written in a vectorised manner? Or must a loop be utilised? Any advice would be greatly appreciated. Thanks!
SQL tables represent unordered sets. There is no definition of "previous" value, unless you have a column that specifies the ordering. With such a column, you can use lag():
select t.*
from (select t.*,
             lag(value) over (partition by id order by <ordering column>) as prev_value
      from t
     ) t
where prev_value is null or prev_value <= value - 3;
EDIT:
I think I misunderstood what you want to do. You seem to want to start with the first row for each id, then get the next row whose value is 3 or more higher, hold onto that value and get the next row that is 3 or more higher than that, and so on.
You can do this in SQL using a recursive CTE:
with ts as (
      select distinct t.id, t.value,
             dense_rank() over (partition by id order by value) as seqnum
      from t
     ),
     cte as (
      select id, value, value as grp_value, 1 as within_seqnum, seqnum
      from ts
      where seqnum = 1
      union all
      select ts.id, ts.value,
             (case when ts.value >= cte.grp_value + 3 then ts.value else cte.grp_value end),
             (case when ts.value >= cte.grp_value + 3 then 1 else cte.within_seqnum + 1 end),
             ts.seqnum
      from cte join
           ts
           on ts.id = cte.id and ts.seqnum = cte.seqnum + 1
     )
select *
from cte
where within_seqnum = 1
order by id, value;
Here is a db<>fiddle.
Row | Input | Output | Output Explanation
---------------------------------------------------------------------------------
1   | 14.93 |   6    | 6 because the input values on rows 2 to 7 are smaller than row 1
2   |  9.74 |   0    | 0 because the input value on row 3 is larger than row 2
3   | 12.89 |   0    | 0 because the input value on row 4 is larger than row 3
4   | 13.09 |   2    | 2 because the input values on rows 5 to 6 are smaller than row 4
5   |  7.84 |   0    | 0 because the input value on row 6 is larger than row 5
6   | 12.81 |   0    | 0 because the input value on row 7 is larger than row 6
7   | 13.15 |   0    | 0 because the input value on row 8 is larger than row 7
8   | 18.15 |   0    | 0 because the input value in row 8 is the last in the series
Please can you help me with defining the SQL Server code for the logic in the table?
I have tried a number of different approaches, including recursive CTEs, CAST, LEAD… OVER..., etc. My SQL skills are not up to this challenge, which seems to be easy to describe in words but difficult to code!
Please note that the logic in the last row is different from the rest.
The maximum output value should be 244.
declare @t table
(
    Row int,
    Input decimal(5,2)
);
insert into @t(Row, Input)
values
    (1, 14.93),
    (2, 9.74),
    (3, 12.89),
    (4, 13.09),
    (5, 7.84),
    (6, 12.81),
    (7, 13.15),
    (8, 18.15);
select *,
       case
           when lead(a.Input) over(order by a.Row) < a.Input then
           (
               select count(*) - count(xyz)
               from
               (
                   select case when b.Input < a.Input then null else b.Input end as xyz
                   from @t as b
                   where b.Row > a.Row
               ) as c
           )
           else 0
       end as Output
from @t as a;
I don't think this can easily be done with window functions. We need to iterate for each original row, while keeping track of the original value.
I would use a recursive query here:
with
data as (select t.*, row_number() over(order by row) rn from mytable t),
cte as (
    select row, rn, input, 0 as output from data
    union all
    select c.row, d.rn, c.input, c.output + 1
    from cte c
    inner join data d on d.rn = c.rn + 1 and d.input < c.input
)
select input, max(output) as output
from cte
group by row, input
order by row
For each row, the logic is to iteratively check the following rows. If the following value is smaller than the one on the original row, we increment the output counter; if it is not, the recursion stops for that row. Then all that is left to do is keep the greatest counter per original row.
Demo on DB Fiddle:
input | output
----: | -----:
14.93 | 6
9.74 | 0
12.89 | 0
13.09 | 2
7.84 | 0
12.81 | 0
13.15 | 0
18.15 | 0
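If it helps to see the recursion at work, you can also look at the raw cte rows before the aggregation (a sketch reusing the cte from the answer above; the comments describe what the sample data produces):
with
data as (select t.*, row_number() over(order by row) rn from mytable t),
cte as (
    select row, rn, input, 0 as output from data
    union all
    select c.row, d.rn, c.input, c.output + 1
    from cte c
    inner join data d on d.rn = c.rn + 1 and d.input < c.input
)
-- For row 1 (input 14.93) this returns output values 0 through 6: one step per
-- following row that is still smaller than 14.93; row 8 (18.15) stops the recursion.
select *
from cte
where row = 1
order by output;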
You can do this with apply:
with t as (
      select t.*, row_number() over (order by row) as seqnum,
             1 + count(*) over () as cnt
      from mytable t
     )
select t.*, coalesce(coalesce(t2.min_seqnum, t.cnt) - t.seqnum - 1, 0) as output
from t outer apply
     (select min(t2.seqnum) as min_seqnum
      from t t2
      where t2.row > t.row and t2.input > t.input
     ) t2
order by row;
The idea is to find the next row that is bigger than the current row. The slight complication (why cnt is needed) is in case there is no larger row.
Here is a db<>fiddle.
You can use a sub-query as follows:
WITH CTE AS
(SELECT T.*,
ROW_NUMBER() OVER (ORDER BY ROW) AS RN
FROM YOUR_TABLE T)
SELECT C.ROW, C.INPUT,
COALESCE((SELECT MIN(CC.RN) - C.RN - 1
FROM CTE CC
WHERE CC.INPUT > C.INPUT AND CC.RN > C.RN)
, 0) AS OUTPUT
FROM CTE C;
I have a table like this:
value nextValue
1 2
2 3
3 20
20 21
21 22
22 23
23 NULL
Value is ordered ascending; nextValue is the next row's Value.
The requirement is to split the rows into groups wherever nextValue - value > 10, and count how many values are in each group.
For example, there should be two groups, (1,2,3) and (20,21,22,23); the first group's count is 3, the second group's count is 4.
I'm trying to mark each group with a unique number, so I could group by these marked numbers:
value nextValue mark
1 2 1
2 3 1
3 20 1
20 21 2
21 22 2
22 23 2
23 NULL 2
But I don't know how to compute the mark column; I need a value that auto-increments whenever nextValue - value > 10.
Can I make this happen in Hive? Or is there a better solution for this requirement?
If I understand correctly, you can use a cumulative sum. The idea is to set a flag when next_value - value > 10. This identifies the groups. So, this query adds a group number:
select t.*,
       sum(case when nextvalue > value + 10 then 1 else 0 end) over (order by value desc) as mark
from t
order by value;
You might not find this solution satisfying, because the numbering is in descending order. So, a bit more arithmetic fixes that:
select t.*,
       (sum(case when nextvalue > value + 10 then 1 else 0 end) over () + 1 -
        sum(case when nextvalue > value + 10 then 1 else 0 end) over (order by value desc)
       ) as mark
from t
order by value;
Here is a db<>fiddle.
Calculate the previous value, then set new_group_flag when value - prev_value > 10, then take a cumulative sum of new_group_flag to get the group number (mark). Finally you can calculate the group count using an analytic function or a group by (in my example an analytic count is used to show you the full dataset with all intermediate calculations). See the comments in the code.
Demo:
with your_data as ( --use your table instead of this
select stack(10,    --the number of tuples generated
             1, 2, 3, 20, 21, 22, 23, 40, 41, 42
            ) as value
)
select --4. Calculate group count, etc.
       value, prev_value, new_group_flag, group_number,
       count(*) over(partition by group_number) as group_count
from
(
    select --3. Calculate cumulative sum of new_group_flag to get group number
           value, prev_value, new_group_flag,
           sum(new_group_flag) over(order by value rows between unbounded preceding and current row) + 1 as group_number
    from
    (
        select --2. Calculate new_group_flag
               value, prev_value,
               case when value - prev_value > 10 then 1 else 0 end as new_group_flag
        from
        (
            select --1. Calculate previous value
                   value, lag(value) over(order by value) as prev_value
            from your_data
        ) s
    ) s
) s
Result:
value prev_value new_group_flag group_number group_count
1 \N 0 1 3
2 1 0 1 3
3 2 0 1 3
20 3 1 2 4
21 20 0 2 4
22 21 0 2 4
23 22 0 2 4
40 23 1 3 3
41 40 0 3 3
42 41 0 3 3
This works for me
It needs "rows between unbounded preceding and current row" in my case.
select t.*,
       sum(case when nextvalue > value + 10 then 1 else 0 end) over (order by value desc rows between unbounded preceding and current row) as mark
from t
order by value;
I know this has already been asked, but why doesn't the solution below work? I want to fill value with the last non-null value, ordered by idx.
What I see:
idx | coalesce
-----+----------
1 | 2
2 | 4
3 |
4 |
5 | 10
(5 rows)
What I want:
idx | coalesce
-----+----------
1 | 2
2 | 4
3 | 4
4 | 4
5 | 10
(5 rows)
Code:
with base as (
select 1 as idx
, 2 as value
union
select 2 as idx
, 4 as value
union
select 3 as idx
, null as value
union
select 4 as idx
, null as value
union
select 5 as idx
, 10 as value
)
select idx
, coalesce(value
, last_value(value) over (order by case when value is null then -1
else idx
end))
from base
order by idx
What you want is lag(ignore nulls). Here is one way to do what you want, using two window functions. The first defines the grouping for the NULL values and the second assigns the value:
select idx, value, coalesce(value, max(value) over (partition by grp))
from (select b.*, count(value) over (order by idx) as grp
      from base b
     ) b
order by idx;
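To see why this works, it can help to look at the intermediate grp column on its own (a sketch assuming the same base CTE as in the question; the commented values follow from its sample data):
select b.*, count(value) over (order by idx) as grp
from base b
order by idx;
-- count(value) skips NULLs, so grp is 1, 2, 2, 2, 3 for idx 1..5:
-- each NULL row shares its grp with the last non-null row before it,
-- and max(value) per grp then fills in 2, 4, 4, 4, 10.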
You can also do this without subqueries by using arrays. Basically, take the last element not counting NULLs:
select idx, value,
(array_remove(array_agg(value) over (order by idx), null))[count(value) over (order by idx)]
from base b
order by idx;
Here is a db<>fiddle.
Well, the last_value here doesn't make sense to me, unless you can point it out to me. Looking at the example, you need the last non-null value, which you can get as follows.
I am forming a group from the nulls and the previous non-null value, so that I can get the first non-null value of each group:
with base as (
    select 1 as idx,  2 as value union
    select 2 as idx, -14 as value union
    select 3 as idx, null as value union
    select 4 as idx, null as value union
    select 5 as idx,  1 as value
)
select idx, value,
       first_value(value) over (partition by rn) as new_val
from (
    select idx, value,
           sum(case when value is not null then 1 end) over (order by idx) as rn
    from base
) t
Here is the code: http://sqlfiddle.com/#!15/fcda4/2
To see why your solution doesn't work, just look at the output if you order by the ordering in your window frame:
with base as (
select 1 as idx
, 2 as value
union
select 2 as idx
, 4 as value
union
select 3 as idx
, null as value
union
select 4 as idx
, null as value
union
select 5 as idx
, 10 as value
)
select idx, value from base
order by case when value is null then -1
else idx
end;
idx | value
-----+-------
3 |
4 |
1 | 2
2 | 4
5 | 10
The last_value() window function will pick the last value in the current frame. Without changing any of the frame defaults, this will be the current row.
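A quick way to see this is to compare the default frame with an explicit one (a sketch assuming the same base CTE as in the question; the aliases are only for illustration, and neither version skips NULLs, the frame only changes which row counts as "last"):
select idx, value,
       -- default frame: range between unbounded preceding and current row,
       -- so last_value() is just the current row's own value
       last_value(value) over (order by idx) as default_frame,
       -- explicit full frame: the overall last value (10 here), still not NULL-aware
       last_value(value) over (order by idx
                               rows between unbounded preceding
                                        and unbounded following) as full_frame
from base
order by idx;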
I have a table as follows:
Table1 (column1, column2, column3, date)
I want to get count(column1) and sum(column2), and the same count/sum for the rows where date is greater than the minimum date value in the date column.
All 4 required values can be controlled by one condition on column3.
Column1 | Column2 | Column3 | Date
-----------------------------------
A       |  1      | 1       | 5/5/2016
G       |  5      | 0       | 5/10/2016
B       |  1      | 2       | 5/10/2016
A       | 12      | 1       | 5/10/2016
D       |  1      | 1       | 5/5/2016
A       |  1      | 1       | 5/11/2016
C       |  7      | 1       | 5/5/2016
C       |  1      | 1       | 5/12/2016
E       | 10      | 2       | 5/10/2016
When I filter on column3 = 1, I want to get the following result:
Count(1) | Sum(2) | Count(1) when date > minimum | Sum(2) when date > minimum
------------------------------------------------------------------------------
3        | 23     | 2                            | 14
I tried to use case, but I don't need to group by values.
How can I generate a query fulfilling the above requirements in Oracle?
You can use case. As you describe:
select count(column1), sum(column2),
       count(case when date > mindate then column1 end),
       sum(case when date > mindate then column2 end)
from (select t.*, min(date) over () as mindate
      from table1 t
     ) t;
This uses a window function to get the minimum date, which is used in the case.
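If the column3 = 1 filter should also restrict these aggregates, one option is a sketch like the following. It pushes the filter into the subquery so the minimum date is taken over the filtered rows only, and it counts distinct column1 values, which is what the expected output of 3 and 2 appears to require; both of those choices are assumptions about the requirement, not part of the answer above.
select count(distinct column1), sum(column2),
       count(distinct case when date > mindate then column1 end),
       sum(case when date > mindate then column2 end)
from (select t.*, min(date) over () as mindate
      from table1 t
      where column3 = 1   -- assumption: the filter also limits the rows mindate is taken over
     ) t;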