Finding increasing subsequence n times in sql

This follows on from my previous question about finding the increasing subsequence in a data set:
Finding out the increasing subsequence in sql
To obtain the result
x | y
-----+-----
94 | 985
469 | 865
525 | 842
610 | 587
765 | 579
from
x | y
-----+-----
94 | 985
73 | 940
469 | 865
115 | 864
366 | 862
525 | 842
448 | 837
318 | 832
507 | 826
244 | 758
217 | 741
207 | 732
54 | 688
426 | 605
108 | 604
610 | 587
142 | 581
765 | 579
102 | 572
I can apply the query
select x, y
from (select max(x) over (order by y desc) as x_max, x, max(y) over (order by x desc) as y_max, y
from table
order by y desc, x desc) t
where t.x = t.x_max and t.y = t.y_max
order by y desc, x
Now my question is: how can I perform this operation n times, i.e. find the 2nd, 3rd, ..., nth increasing subsequence of x?
I know the general idea is to remove the result of the first operation from the original table and run the query again on the remaining points.
So in my example, after the first operation, the remaining points are
x | y
-----+-----
73 | 940
115 | 864
366 | 862
448 | 837
318 | 832
507 | 826
244 | 758
217 | 741
207 | 732
54 | 688
426 | 605
108 | 604
142 | 581
102 | 572
Applying the query again to these points, we get
x | y
-----+-----
73 | 940
115 | 864
366 | 862
448 | 837
507 | 826
Then the operation is performed on
x | y
-----+-----
318 | 832
244 | 758
217 | 741
207 | 732
54 | 688
426 | 605
108 | 604
142 | 581
102 | 572
and so on. I would also like to union all the points from these queries and order them by y desc, i.e.
x | y
-----+-----
73 | 940
94 | 985
115 | 864
366 | 862
448 | 837
469 | 865
507 | 826
525 | 842
610 | 587
765 | 579

This is not trivial and far from optimal, but you can indeed do this with recursive CTEs:
with recursive r as (
  -- iteration 0: grp is 0 for points on the first subsequence, 1 for all others
  (select x, y, (running_x_max <> x)::int grp, 0 iteration
   from (select *, max(x) over (order by y desc) running_x_max
         from xy) t
   order by 3, 2 desc)
  union all
  -- later iterations: recompute the running maximum over the points not yet assigned
  (select x, y, grp + (running_x_max <> x)::int, iteration + 1
   from (select *, max(x) over (order by y desc) running_x_max
         from r
         where grp > iteration) t
   order by 3, 2 desc)
)
-- keep only the row from the iteration in which each point was finally assigned
select x, y, grp
from r
where grp = iteration
order by 3, 2 desc, 1
http://rextester.com/JDYJ58330
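To sanity-check the groups the recursive query should return, the same peeling can be sketched procedurally. This is a minimal Python sketch, not part of the original answer: the function name `peel_increasing_layers` is made up for illustration, and it assumes all x values and all y values are distinct, as in the sample data.

```python
def peel_increasing_layers(points):
    """Split (x, y) points into layers by repeatedly applying the running-max rule."""
    # Visit points in y-descending order, like "order by y desc" in the query.
    remaining = sorted(points, key=lambda p: p[1], reverse=True)
    layers = []
    while remaining:
        layer, leftover = [], []
        running_x_max = float("-inf")
        for x, y in remaining:
            if x >= running_x_max:
                # x is the running maximum so far, so the point joins this layer.
                running_x_max = x
                layer.append((x, y))
            else:
                # Otherwise the point waits for a later pass.
                leftover.append((x, y))
        layers.append(layer)
        remaining = leftover  # still sorted by y descending
    return layers


points = [
    (94, 985), (73, 940), (469, 865), (115, 864), (366, 862),
    (525, 842), (448, 837), (318, 832), (507, 826), (244, 758),
    (217, 741), (207, 732), (54, 688), (426, 605), (108, 604),
    (610, 587), (142, 581), (765, 579), (102, 572),
]
layers = peel_increasing_layers(points)
```

Here `layers[0]` and `layers[1]` correspond to the first and second result tables in the question, and the list index plays the role of `grp` in the query.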

Related

How to create a SQL query for the below scenario

I am using Snowflake SQL, but I guess this can be solved by any sql. So I have data like this:
RA_MEMBER_ID YEAR QUARTER MONTH Monthly_TOTAL_PURCHASE CATEGORY
1000 2020 1 1 105 CAT10
1000 2020 1 1 57 CAT13
1000 2020 1 2 107 CAT10
1000 2020 1 2 59 CAT13
1000 2020 1 3 109 CAT11
1000 2020 1 3 61 CAT14
1000 2020 2 4 111 CAT11
1000 2020 2 4 63 CAT14
1000 2020 2 5 113 CAT12
1000 2020 2 5 65 CAT15
1000 2020 2 6 115 CAT12
1000 2020 2 6 67 CAT15
And I need data like this:
RA_MEMBER_ID YEAR QUARTER MONTH Monthly_TOTAL_PURCHASE CATEGORY Monthly_rank Quarterly_Total_purchase Quarter_category Quarter_rank Yearly_Total_purchase Yearly_category Yearly_rank
1000 2020 1 1 105 CAT10 1 105 CAT10 1 105 CAT10 1
1000 2020 1 1 57 CAT13 2 57 CAT13 2 57 CAT13 2
1000 2020 1 2 107 CAT10 1 212 CAT10 1 212 CAT10 1
1000 2020 1 2 59 CAT13 2 116 CAT13 2 116 CAT13 2
1000 2020 1 3 109 CAT11 1 212 CAT10 1 212 CAT10 1
1000 2020 1 3 61 CAT14 2 116 CAT13 2 116 CAT13 2
1000 2020 2 4 111 CAT11 1 111 CAT11 1 212 CAT10 1
1000 2020 2 4 63 CAT14 2 63 CAT14 2 124 CAT14 2
1000 2020 2 5 113 CAT12 1 113 CAT12 1 212 CAT10 1
1000 2020 2 5 65 CAT15 2 65 CAT15 2 124 CAT14 2
1000 2020 2 6 115 CAT12 1 228 CAT12 1 228 CAT12 1
1000 2020 2 6 67 CAT15 2 132 CAT15 2 132 CAT15 2
So basically, I have the top two categories by purchase amount for the first 6 months. I need the same for quarterly based on which month of the quarter it is. So let's say it is February, then the top 2 categories and amounts should be calculated based on both January and February. For March we have to get the quarter data by taking all three months. From April it will be the same as monthly rank, for May again calculate based on April and May. Similarly for Yearly also.
I have tried a lot of things but nothing seems to give me what I want.
The solution should be generic enough because there can be many other months and years.
I really need help in this.
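Not sure if the below is what you are after. I assume that everything is category based. Note that LAG only looks one row back, so each quarterly or yearly figure adds at most the previous row for the same category rather than a full running total: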
create or replace table test (
ra_member_id int,
year int,
quarter int,
month int,
monthly_purchase int,
category varchar
);
insert into test values
(1000, 2020, 1,1, 105, 'cat10'),
(1000, 2020, 1,1, 57, 'cat13'),
(1000, 2020, 1,2, 107, 'cat10'),
(1000, 2020, 1,2, 59, 'cat13'),
(1000, 2020, 1,3, 109, 'cat11'),
(1000, 2020, 1,3, 61, 'cat14'),
(1000, 2020, 2,4, 111, 'cat11'),
(1000, 2020, 2,4, 63, 'cat14'),
(1000, 2020, 2,5, 113, 'cat12'),
(1000, 2020, 2,5, 65, 'cat15'),
(1000, 2020, 2,6, 115, 'cat12'),
(1000, 2020, 2,6, 67, 'cat15');
WITH BASE as (
select
RA_MEMBER_ID,
YEAR,
QUARTER,
MONTH,
CATEGORY,
MONTHLY_PURCHASE,
LAG(MONTHLY_PURCHASE) OVER (PARTITION BY QUARTER, CATEGORY ORDER BY MONTH) AS QUARTERLY_PURCHASE_LAG,
IFNULL(QUARTERLY_PURCHASE_LAG, 0) + MONTHLY_PURCHASE AS QUARTERLY_PURCHASE,
LAG(MONTHLY_PURCHASE) OVER (PARTITION BY YEAR, CATEGORY ORDER BY MONTH) AS YEARLY_PURCHASE_LAG,
IFNULL(YEARLY_PURCHASE_LAG, 0) + MONTHLY_PURCHASE AS YEARLY_PURCHASE
FROM
TEST
),
BASE_RANK AS (
SELECT
RA_MEMBER_ID,
YEAR,
QUARTER,
MONTH,
CATEGORY,
MONTHLY_PURCHASE,
RANK() OVER (PARTITION BY MONTH ORDER BY MONTHLY_PURCHASE DESC) as MONTHLY_RANK,
QUARTERLY_PURCHASE,
RANK() OVER (PARTITION BY QUARTER ORDER BY QUARTERLY_PURCHASE DESC) as QUARTERLY_RANK,
YEARLY_PURCHASE,
RANK() OVER (PARTITION BY YEAR ORDER BY YEARLY_PURCHASE DESC) as YEARLY_RANK
FROM BASE
),
MAIN AS (
SELECT
RA_MEMBER_ID,
YEAR,
QUARTER,
MONTH,
CATEGORY,
MONTHLY_PURCHASE,
MONTHLY_RANK,
QUARTERLY_PURCHASE,
QUARTERLY_RANK,
YEARLY_PURCHASE,
YEARLY_RANK
FROM BASE_RANK
)
SELECT * FROM MAIN
ORDER BY YEAR, QUARTER, MONTH
;
Result:
+--------------+------+---------+-------+----------+------------------+--------------+--------------------+----------------+-----------------+-------------+
| RA_MEMBER_ID | YEAR | QUARTER | MONTH | CATEGORY | MONTHLY_PURCHASE | MONTHLY_RANK | QUARTERLY_PURCHASE | QUARTERLY_RANK | YEARLY_PURCHASE | YEARLY_RANK |
|--------------+------+---------+-------+----------+------------------+--------------+--------------------+----------------+-----------------+-------------|
| 1000 | 2020 | 1 | 1 | cat10 | 105 | 1 | 105 | 4 | 105 | 9 |
| 1000 | 2020 | 1 | 1 | cat13 | 57 | 2 | 57 | 6 | 57 | 12 |
| 1000 | 2020 | 1 | 2 | cat10 | 107 | 1 | 212 | 1 | 212 | 3 |
| 1000 | 2020 | 1 | 2 | cat13 | 59 | 2 | 116 | 2 | 116 | 6 |
| 1000 | 2020 | 1 | 3 | cat11 | 109 | 1 | 109 | 3 | 109 | 8 |
| 1000 | 2020 | 1 | 3 | cat14 | 61 | 2 | 61 | 5 | 61 | 11 |
| 1000 | 2020 | 2 | 4 | cat11 | 111 | 1 | 111 | 4 | 220 | 2 |
| 1000 | 2020 | 2 | 4 | cat14 | 63 | 2 | 63 | 6 | 124 | 5 |
| 1000 | 2020 | 2 | 5 | cat12 | 113 | 1 | 113 | 3 | 113 | 7 |
| 1000 | 2020 | 2 | 5 | cat15 | 65 | 2 | 65 | 5 | 65 | 10 |
| 1000 | 2020 | 2 | 6 | cat12 | 115 | 1 | 228 | 1 | 228 | 1 |
| 1000 | 2020 | 2 | 6 | cat15 | 67 | 2 | 132 | 2 | 132 | 4 |
+--------------+------+---------+-------+----------+------------------+--------------+--------------------+----------------+-----------------+-------------+
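As a cross-check of the rule described in the question (rank categories by their total so far within the quarter or year), the computation can be sketched outside the database. This is a hedged Python sketch, not Snowflake code: the function name `to_date_top` and the row layout `(year, month, category, amount)` are assumptions made for illustration, and it ignores RA_MEMBER_ID because the sample has a single member.

```python
from collections import defaultdict
from itertools import groupby


def to_date_top(rows, period_of, top_n=2):
    """For each (year, month), return the top_n categories by period-to-date total."""
    running = defaultdict(lambda: defaultdict(int))  # period -> category -> total so far
    result = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for (year, month), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        period = period_of(year, month)
        # Add this month's purchases to the running totals of its period.
        for _, _, category, amount in group:
            running[period][category] += amount
        ranked = sorted(running[period].items(), key=lambda kv: kv[1], reverse=True)
        result[(year, month)] = ranked[:top_n]
    return result


rows = [
    (2020, 1, "CAT10", 105), (2020, 1, "CAT13", 57),
    (2020, 2, "CAT10", 107), (2020, 2, "CAT13", 59),
    (2020, 3, "CAT11", 109), (2020, 3, "CAT14", 61),
    (2020, 4, "CAT11", 111), (2020, 4, "CAT14", 63),
    (2020, 5, "CAT12", 113), (2020, 5, "CAT15", 65),
    (2020, 6, "CAT12", 115), (2020, 6, "CAT15", 67),
]
quarterly = to_date_top(rows, lambda year, month: (year, (month - 1) // 3 + 1))
yearly = to_date_top(rows, lambda year, month: year)
```

The running totals reset whenever `period_of` returns a new key, which is what makes April's quarterly figures equal its monthly ones. In SQL the same idea would usually be a cumulative SUM over a window partitioned by the period and category, rather than a single LAG.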

SQL 2008 Running average with group by

I have a table containing the columns below.
I want to compute a running average from PositionDate going, for example, 3 days back, grouped by DealNo.
I know how to do this with "case by", but the problem is that I have around 200 different DealNo values, so I do not want to write a separate clause for every deal.
For DealNo 1 the desired output should be Average(149 243 440, 149 243 446, 149 243 451).
DealNo PositionDate MarketValue
1 | 2016-11-27 | 149 243 440
2 | 2016-11-27 | 21 496 418
3 | 2016-11-27 | 32 249 600
1 | 2016-11-26 | 149 243 446
2 | 2016-11-26 | 21 496 418
3 | 2016-11-26 | 32 249 600
1 | 2016-11-25 | 149 243 451
3 | 2016-11-25 | 32 249 600
2 | 2016-11-25 | 21 496 418
3 | 2016-11-24 | 32 249 600
1 | 2016-11-24 | 149 225 582
2 | 2016-11-24 | 21 498 120
1 | 2016-11-23 | 149 256 867
2 | 2016-11-23 | 21 504 181
3 | 2016-11-23 | 32 253 440
1 | 2016-11-22 | 149 256 873
2 | 2016-11-22 | 21 506 840
3 | 2016-11-22 | 32 253 440
1 | 2016-11-21 | 149 234 535
2 | 2016-11-21 | 21 509 179
3 | 2016-11-21 | 32 253 600
I tried the script below, but it was not very efficient since my table contains around 300k rows and approximately 200 different DealNo values.
Is there a more efficient way to do this in SQL 2008?
with cte as (
SELECT ROW_NUMBER() over(order by dealno, positiondate desc) as Rownr,
dealno,
positiondate,
Currency,
MvCleanCcy
FROM T1
)
select
rownr, positiondate, DealNo, Currency,
mvcleanavg30d = (select avg(MvCleanCcy) from cte where Rownr between c.Rownr and c.Rownr+3)
from cte as c
You don't need window functions. You can do this using outer apply:
select t1.*, tt1.marketvalue_3day
from t1 outer apply
     (select avg(tt1.marketvalue) as marketvalue_3day
      from (select top 3 tt1.*
            from t1 tt1
            where tt1.dealno = t1.dealno and
                  tt1.positiondate <= t1.positiondate
            order by tt1.positiondate desc
           ) tt1
     ) tt1;
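The logic of the apply (for each row, average the three most recent rows of the same deal up to that date) can be sketched as follows. This is an illustrative Python sketch under assumptions: the function name `trailing_average` is hypothetical, dates are ISO strings so they sort chronologically, and market values are plain integers without the thousands spacing shown in the question.

```python
from collections import defaultdict, deque


def trailing_average(rows, window=3):
    """Return (deal_no, position_date, average of the last `window` values for that deal)."""
    # One bounded queue per deal keeps only the most recent `window` values.
    recent = defaultdict(lambda: deque(maxlen=window))
    out = []
    for deal_no, position_date, market_value in sorted(rows, key=lambda r: (r[0], r[1])):
        recent[deal_no].append(market_value)
        values = recent[deal_no]
        out.append((deal_no, position_date, sum(values) / len(values)))
    return out


rows = [
    (1, "2016-11-27", 149243440), (2, "2016-11-27", 21496418),
    (1, "2016-11-26", 149243446), (2, "2016-11-26", 21496418),
    (1, "2016-11-25", 149243451), (2, "2016-11-25", 21496418),
    (1, "2016-11-24", 149225582), (2, "2016-11-24", 21498120),
]
averages = {(deal, day): avg for deal, day, avg in trailing_average(rows)}
```

Like the `top 3` subquery, this counts rows rather than calendar days, so a deal with missing dates would average over a longer time span.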

Crosstab or transpose query

I have a result set from a query like this
mon-yar Count EB VC
Apr-11 34 1237 428
May-11 54 9834 87
Jun-11 23 9652 235
Jul-11 567 10765 1278
Aug-11 36 10234 1092
Sep-11 78 8799 987
Oct-11 23 10923 359
Nov-11 45 11929 346
Dec-11 67 9823 874
Jan-12 45 2398 245
Feb-12 90 3487 937
Mar-12 123 7532 689
Apr-12 109 1256 165
What I wish is this:
monthyear Apr-11 May-11 Jun-11 Jul-11 Aug-11 Sep-11 Oct-11 Nov-11 Dec-11 Jan-12 Feb-12 Mar-12 Apr-12
Count 34 54 23 567 36 78 23 45 67 45 90 123 109
EB 1237 9834 9652 10765 10234 8799 10923 11929 9823 2398 3487 7532 1256
VC 428 87 235 1278 1092 987 359 346 874 245 937 689 165
The Month Year values are dynamic. What can I do to generate it this way?
If you don't want to use PIVOT, you can use the solution below, as long as you don't mind using text to columns in Excel upon the result.
If you were to run:
with tbl as(
select 'Apr-11' as monyar, 34 as cnt, 1237 as eb, 428 as vc from dual union all
select 'May-11' as monyar, 54 as cnt, 9834 as eb, 87 as vc from dual union all
select 'Jun-11' as monyar, 23 as cnt, 9652 as eb, 235 as vc from dual union all
select 'Jul-11' as monyar, 567 as cnt, 10765 as eb, 1278 as vc from dual union all
select 'Aug-11' as monyar, 36 as cnt, 10234 as eb, 1092 as vc from dual union all
select 'Sep-11' as monyar, 78 as cnt, 8799 as eb, 987 as vc from dual union all
select 'Oct-11' as monyar, 23 as cnt, 10923 as eb, 359 as vc from dual union all
select 'Nov-11' as monyar, 45 as cnt, 11929 as eb, 346 as vc from dual union all
select 'Dec-11' as monyar, 67 as cnt, 9823 as eb, 874 as vc from dual union all
select 'Jan-12' as monyar, 45 as cnt, 2398 as eb, 245 as vc from dual union all
select 'Feb-12' as monyar, 90 as cnt, 3487 as eb, 937 as vc from dual union all
select 'Mar-12' as monyar, 123 as cnt, 7532 as eb, 689 as vc from dual union all
select 'Apr-12' as monyar, 109 as cnt, 1256 as eb, 165 as vc from dual
)
select 'Month' as lbl, listagg(monyar,' | ') within group (order by monyar) as list from tbl
union all
select 'Count' as lbl, listagg(cnt,' | ') within group (order by monyar) as list from tbl
union all
select 'EB' as lbl, listagg(eb,' | ') within group (order by monyar) as list from tbl
union all
select 'VC' as lbl, listagg(vc,' | ') within group (order by monyar) as list from tbl
Result:
LBL LIST
Month Apr-11 | Apr-12 | Aug-11 | Dec-11 | Feb-12 | Jan-12 | Jul-11 | Jun-11 | Mar-12 | May-11 | Nov-11 | Oct-11 | Sep-11
Count 34 | 109 | 36 | 67 | 90 | 45 | 567 | 23 | 123 | 54 | 45 | 23 | 78
EB 1237 | 1256 | 10234 | 9823 | 3487 | 2398 | 10765 | 9652 | 7532 | 9834 | 11929 | 10923 | 8799
VC 428 | 165 | 1092 | 874 | 937 | 245 | 1278 | 235 | 689 | 87 | 346 | 359 | 987
Using the pipe as the delimiter, you can then split the 2nd column into however many columns there are. Note that the lists above come out in alphabetical rather than chronological order, because monyar is sorted as text.
LISTAGG is an Oracle function. SQL Server 2017 and later offer STRING_AGG as a close equivalent; on older versions you would have to mimic the vertical concatenation some other way, for example with FOR XML PATH.
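Because the month-year values are dynamic, another option is to fetch the rows as they are and transpose them in the client. The following is a minimal Python sketch of that idea, not part of the original answer; the function name `transpose` and the shortened sample are assumptions for illustration.

```python
def transpose(labels, rows):
    """Turn rows of (monyar, cnt, eb, vc) into one output row per label."""
    columns = list(zip(*rows))  # one tuple per original column
    return [(label, *column) for label, column in zip(labels, columns)]


# First three rows of the question's result set, in their original order.
rows = [
    ("Apr-11", 34, 1237, 428),
    ("May-11", 54, 9834, 87),
    ("Jun-11", 23, 9652, 235),
]
crosstab = transpose(["monthyear", "Count", "EB", "VC"], rows)
```

The output keeps whatever order the input rows arrive in, so a chronological `order by` in the source query carries through to the columns.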

SQL Server partitioning when null

I have a sql server table like this:
Value RowID Diff
153 48 1
68 49 1
50 57 NULL
75 58 1
65 59 1
70 63 NULL
66 64 1
79 66 NULL
73 67 1
82 68 1
85 69 1
66 70 1
118 88 NULL
69 89 1
67 90 1
178 91 1
How can I make it like this (note the partition after each null in 3rd column):
Value RowID Diff
153 48 1
68 49 1
50 57 NULL
75 58 2
65 59 2
70 63 NULL
66 64 3
79 66 NULL
73 67 4
82 68 4
85 69 4
66 70 4
118 88 NULL
69 89 5
67 90 5
178 91 5
It looks like you are partitioning over sequential values of RowID. There is a trick to do this directly by grouping on RowID - Row_Number():
select
value,
rowID,
Diff,
RowID - row_number() over (order by RowID) Diff2
from
Table1
Notice how this gets you similar groupings, except with distinct Diff values (in Diff2):
| VALUE | ROWID | DIFF | DIFF2 |
|-------|-------|--------|-------|
| 153 | 48 | 1 | 47 |
| 68 | 49 | 1 | 47 |
| 50 | 57 | (null) | 54 |
| 75 | 58 | 1 | 54 |
| 65 | 59 | 1 | 54 |
| 70 | 63 | (null) | 57 |
| 66 | 64 | 1 | 57 |
| 79 | 66 | (null) | 58 |
| 73 | 67 | 1 | 58 |
| 82 | 68 | 1 | 58 |
| 85 | 69 | 1 | 58 |
| 66 | 70 | 1 | 58 |
| 118 | 88 | (null) | 75 |
| 69 | 89 | 1 | 75 |
| 67 | 90 | 1 | 75 |
| 178 | 91 | 1 | 75 |
Then to get ordered values for Diff, you can use Dense_Rank() to produce a numbering over each separate partition - except when a value is Null:
select
value,
rowID,
case when Diff = 1
then dense_rank() over (order by Diff2)
else Diff end as Diff
from (
select
value,
rowID,
Diff,
RowID - row_number() over (order by RowID) Diff2
from
Table1
) T
The result is the expected result, except keyed off of RowID directly rather than off of the existing Diff column.
| VALUE | ROWID | DIFF |
|-------|-------|--------|
| 153 | 48 | 1 |
| 68 | 49 | 1 |
| 50 | 57 | (null) |
| 75 | 58 | 2 |
| 65 | 59 | 2 |
| 70 | 63 | (null) |
| 66 | 64 | 3 |
| 79 | 66 | (null) |
| 73 | 67 | 4 |
| 82 | 68 | 4 |
| 85 | 69 | 4 |
| 66 | 70 | 4 |
| 118 | 88 | (null) |
| 69 | 89 | 5 |
| 67 | 90 | 5 |
| 178 | 91 | 5 |
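The trick works because, within a run of consecutive RowID values, RowID and the row number both grow by one, so their difference stays constant. A small Python sketch of the same two steps follows; the function name `island_numbers` is hypothetical and the input is just the RowID column from the question.

```python
def island_numbers(row_ids):
    """Number each run of consecutive ids 1, 2, 3, ... in ascending id order."""
    ordered = sorted(row_ids)
    # Step 1: id minus its 1-based position is constant within a consecutive run (Diff2).
    keys = [row_id - position for position, row_id in enumerate(ordered, start=1)]
    # Step 2: dense-rank the distinct keys to get consecutive group numbers.
    rank_of = {key: rank for rank, key in enumerate(sorted(set(keys)), start=1)}
    return [rank_of[key] for key in keys]


row_ids = [48, 49, 57, 58, 59, 63, 64, 66, 67, 68, 69, 70, 88, 89, 90, 91]
groups = island_numbers(row_ids)
```

The sketch returns a group number for every row; the query additionally leaves the rows whose Diff is NULL untouched through its `case` expression.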

sum by rows (include all columns) in Oracle

I have this structure of table
-----------------------------------------------------------------
ID-Prod | Monday | Tuesday | Wednesday | Thursday | Friday |....
-----------------------------------------------------------------
012 213 879 516 213 435
013 953 837 361 862
014 123 583 879 519 573
015 963
016 798
ID-Prod is the primary key. I would like to sum all the values for each product,
for example
---------------------------------------------------------------------------------------
ID-Prod | Monday | Tuesday | Wednesday | Thursday | Friday |....| Total
---------------------------------------------------------------------------------------
012 213 879 516 213 435 213+879+516+213+435
013 953 837 361 862 .....
014 123 583 879 519 573 .....
015 963
016 798
Thanks
SELECT t.*,
COALESCE(Monday, 0) +
COALESCE(Tuesday, 0) +
COALESCE(Wednesday, 0) +
COALESCE(Thursday, 0) +
COALESCE(Friday, 0) AS total
FROM mytable t
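The COALESCE calls matter because in Oracle adding NULL to a number yields NULL, so a single empty day would blank out the whole total. The sketch below illustrates the same rule in Python, with `None` standing in for NULL; the function name `row_total` and the dictionary layout are assumptions for illustration only.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def row_total(row):
    """Sum the day columns of one row, treating missing values as 0."""
    return sum(row.get(day) or 0 for day in DAYS)


full_row = {"Monday": 213, "Tuesday": 879, "Wednesday": 516, "Thursday": 213, "Friday": 435}
sparse_row = {"Monday": None, "Tuesday": 963, "Wednesday": None, "Thursday": None, "Friday": None}
totals = [row_total(full_row), row_total(sparse_row)]
```

Which day the single value of product 015 falls on is a guess here, since the column alignment is lost in the question's table; the total does not depend on it.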