SQL Implementation with Sliding-Window functions or Recursive CTEs - sql

I have a problem that it's very easy to be solved in C# code for example, but I have no idea how to write in a SQL query with Recursive CTE-s or Sliding-Windows functions.
Here is the situation: let's say I have a table with 3 columns (ID, Date, Amount), and here is some data:
ID Date Amount
-----------------------
1 01.01.2016 -500
2 01.02.2016 1000
3 01.03.2016 -200
4 01.04.2016 300
5 01.05.2016 500
6 01.06.2016 1000
7 01.07.2016 -100
8 01.08.2016 200
The result I want to get from the table is this (ID, Amount .... Order By Date):
ID Amount
-----------------------
2 300
4 300
5 500
6 900
8 200
The idea is to distribute the amounts into installments, for each client separately, but the thing is when negative amount comes into play you need to remove amount from the last installment. I don't know how clear I am, so here is an example:
Let's say I have 3 Invoices for one client with amounts 500, 200, -300.
If i start distribute these Invoices, first i distribute the amount 500, then 200. But when i come to the third one -300, then i need to remove from the last Invoice. In other words 200 - 300 = -100, so the amount from second Invoice will disappear, but there are still -100 that needs to be substracted from first Invoice. So 500 - 100 = 400. The result i need is result with one row (first invoice with amount 400)
Another example when the first invoice is with negative amount (-500, 300, 500).
In this case, the first (-500) invoice will make the second disappear and another 200 will be substracted from the third. So the result will be: Third Invoice with amount 300.
This is something like Stack implementation in programming language, but i need to make it with sliding-window functions in SQL Server.
I need an implementation with Sliding Function or Recursive CTEs.
Not with cycles ...
Thanks.

Ok, think this is what you want. there are two recursive queries. One for upward propagation and the second one for the downward propagation.
with your_data_rn as
(
select *, row_number() over (order by date) rn
from your_data
), rcte_up(id, date, ammount, rn, running) as
(
select *, ammount as running
from your_data_rn
union all
select d.*,
d.ammount + rcte_up.running
from your_data_rn d
join rcte_up on rcte_up.running < 0 and d.rn = rcte_up.rn - 1
), data2 as
(
select id, date, min(running) ammount,
row_number() over (order by date) rn
from rcte_up
group by id, date, rn
having min(running) > 0 or rn = 1
), rcte_down(id, date, ammount, rn, running) as
(
select *, ammount as running
from data2
union all
select d.*, d.ammount + rcte_down.running
from data2 d
join rcte_down on rcte_down.running < 0 and d.rn = rcte_down.rn + 1
)
select id, date, min(running) ammount
from rcte_down
group by id, date
having min(running) > 0
demo
I can imagine that you use just the upward propagation and the downward propagation of the first row is done in some procedural language. Downward propagation is one scan through few first rows, therefore, the recursive query may be a hammer on a mosquito.

I add client ID in table for more general solution. Then I implemented the stack stored as XML in query field. And emulated a program cycle with Recursive-CTE:
with Data as( -- Numbering rows for iteration on CTE
select Client, id, Amount,
cast(row_number() over(partition by Client order by Date) as int) n
from TabW
),
CTE(Client, n, stack) as( -- Recursive CTE
select Client, 1, cast(NULL as xml) from Data where n=1
UNION ALL
select D.Client, D.n+1, (
-- Stack operations to process current row (D)
select row_number() over(order by n) n,
-- Use calculated amount in first positive and oldest stack cell
-- Else preserve value stored in stack
case when n=1 or (n=0 and last=1) then new else Amount end Amount,
-- Set ID in stack cell for positive and new data
case when n=1 and D.Amount>0 then D.id else id end id
from (
select Y.Amount, Y.id, new,
-- Count positive stack entries
sum(case when new<=0 or (n=0 and Amount<0) then 0 else 1 end) over (order by n) n,
row_number() over(order by n desc) last -- mark oldest stack cell by 1
from (
select X.*,sum(Amount) over(order by n) new
from (
select case when C.stack.value('(/row/#Amount)[1]','int')<0 then -1 else 0 end n,
D.Amount, D.id -- Data from new record
union all -- And expand current stack in XML to table
select node.value('#n','int') n, node.value('#Amount','int'), node.value('#id','int')
from C.stack.nodes('//row') N(node)
) X
) Y where n>=0 -- Suppress new cell if the stack contained a negative amount
) Z
where n>0 or (n=0 and last=1)
for xml raw, type
)
from Data D, CTE C
where D.n=C.n and D.Client=C.Client
) -- Expand stack into result table
select CTE.Client, node.value('#id','int') id, node.value('#Amount','int')
from CTE join (select Client, max(n) max_n from Data group by Client) X on CTE.Client=X.Client and CTE.n=X.max_n+1
cross apply stack.nodes('//row') N(node)
order by CTE.Client, node.value('#n','int') desc
Test on sqlfiddle.com
I think this method is slower than #RadimBača. And it is shown to demonstrate the possibilities of implementing a sequential algorithm on SQL.

Related

Progressive Select Query in Oracle Database

I want to write a select query that selects distinct rows of data progressively.
Explaining with an example,
Say i have 5000 accounts selected for repayment of loan, these accounts are ordered in descending order( Account 1st has highest outstanding while account 5000nd will have the lowest).
I want to select 1000 unique accounts 5 times such that the total outstanding amount of repayment in all 5 cases are similar.
i have tried out a few methods by trying to select rownums based on odd/even or other such way, but it's only good for upto 2 distributions. I was expecting more like a A.P. as in maths that selects data progressively.
A naïve method of splitting sets into (for example) 5 bins, numbered 0 to 4, is give each row a unique sequential numeric index and then, in order of size, assign the first 10 rows to bins 0,1,2,3,4,4,3,2,1,0 and then repeat for additional sets of 10 rows:
WITH indexed_values (value, rn) AS (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value DESC) - 1
FROM table_name
),
assign_bins (value, rn, bin) AS (
SELECT value,
rn,
CASE WHEN MOD(rn, 2 * 5) >= 5
THEN 5 - MOD(rn, 5) - 1
ELSE MOD(rn, 5)
END
FROM indexed_values
)
SELECT bin,
COUNT(*) AS num_values,
SUM(value) AS bin_size
FROM assign_bins
GROUP BY bin
Which, for some random data:
CREATE TABLE table_name ( value ) AS
SELECT FLOOR(DBMS_RANDOM.VALUE(1, 1000001)) FROM DUAL CONNECT BY LEVEL <= 1000;
May output:
BIN
NUM_VALUES
BIN_SIZE
0
200
100012502
1
200
100004633
2
200
99980342
3
200
99976774
4
200
100005756
It will not get the bins to have equal values but it is relatively simple and will get a close approximation if your values are approximately evenly distributed.
If you want to select values from a certain bin then:
WITH indexed_values (value, rn) AS (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value DESC) - 1
FROM table_name
),
assign_bins (value, rn, bin) AS (
SELECT value,
rn,
CASE WHEN MOD(rn, 2 * 5) >= 5
THEN 5 - MOD(rn, 5) - 1
ELSE MOD(rn, 5)
END
FROM indexed_values
)
SELECT value
FROM assign_bins
WHERE bin = 0
fiddle

Snowflake: Repeating rows based on column value

How to repeat rows based on column value in snowflake using sql.
I tried a few methods but not working such as dual and connect by.
I have two columns: Id and Quantity.
For each ID, there are different values of Quantity.
So if you have a count, you can use a generator:
with ten_rows as (
select row_number() over (order by null) as rn
from table(generator(ROWCOUNT=>10))
), data(id, count) as (
select * from values
(1,2),
(2,4)
)
SELECT
d.*
,r.rn
from data as d
join ten_rows as r
on d.count >= r.rn
order by 1,3;
ID
COUNT
RN
1
2
1
1
2
2
2
4
1
2
4
2
2
4
3
2
4
4
Ok let's start by generating some data. We will create 10 rows, with a QTY. The QTY will be randomly chosen as 1 or 2.
Next we want to duplicate the rows with a QTY of 2 and leave the QTY =1 as they are.
Obviously you can change all parameters above to suit your needs - this solution works super fast and in my opinion way better than table generation.
Simply stack SPLIT_TO_TABLE(), REPEAT() with a LATERAL() join and voila.
WITH TEN_ROWS AS (SELECT ROW_NUMBER()OVER(ORDER BY NULL)SOME_ID,UNIFORM(1,2,RANDOM())QTY FROM TABLE(GENERATOR(ROWCOUNT=>10)))
SELECT
TEN_ROWS.*
FROM
TEN_ROWS,LATERAL SPLIT_TO_TABLE(REPEAT('hire me $10/hour',QTY-1),'hire me $10/hour')ALTERNATIVE_APPROACH;

Alternative: Sql - SELECT rows until the sum of a row is a certain value

My question is very similar to my previous one posted here:
Sql - SELECT rows until the sum of a row is a certain value
To sum it up, I need to return the rows, until a certain sum is reached, but the difference this time, is that, I need to find the best fit for this sum, I mean, It doesn't have to be sequential. For example:
Let's say I have 5 unpaid receipts from customer 1:
Receipt_id: 1 | Amount: 110€
Receipt_id: 2 | Amount: 110€
Receipt_id: 3 | Amount: 130€
Receipt_id: 4 | Amount: 110€
Receipt_id: 5 | Amount: 190€
So, customer 1 ought to pay me 220€.
Now I need to select the receipts, until this 220€ sum is met and it might be in a straight order, like (receipt 1 + receipt 2) or not in a specific order, like (receipt 1 + receipt 4), any of these situations would be suitable.
I am using SQL Server 2016.
Any additional questions, feel free to ask.
Thanks in advance for all your help.
This query should solve it.
It is a quite dangerous query (containing a recursive CTE), so please be careful!
You can find some documentation here: https://www.essentialsql.com/recursive-ctes-explained/
WITH the_data as (
SELECT *
FROM (
VALUES (1, 1, 110),(1, 2,110),(1, 3,130),(1, 4,110),(1, 5,190),
(2, 1, 10),(2, 2,20),(2, 3,200),(2, 4,190)
) t (user_id, receipt_id, amount)
), permutation /* recursive used here */ as (
SELECT
user_id,
amount as sum_amount,
CAST(receipt_id as varchar(max)) as visited_receipt_id,
receipt_id as max_receipt_id,
1 as i
FROM the_data
WHERE amount > 0 -- remove empty amount
UNION ALL
SELECT
the_data.user_id,
sum_amount + amount as sum_amount,
CAST(concat(visited_receipt_id, ',', CAST(receipt_id as varchar))as varchar(max)) as visited_receipt_id,
receipt_id as max_receipt_id ,
i + 1
FROM the_data
JOIN permutation
ON the_data.user_id = permutation.user_id
WHERE i < 1000 -- max 1000 loops, means any permutation with less than 1000 different receipts
and receipt_id > max_receipt_id -- in order that sum in komutatif , we can check the sum in any unique order ( here we take the order of the reciept_id in fact we do not produce any duplicates )
-- AND sum_amount + amount <= 220 -- ignore everything that is bigger than the expected value (optional)
)
SELECT *
FROM permutation
WHERE sum_amount = 220
in order to select only one combination per user_id, replace the last three lines of the previous query by
SELECT *
FROM (
SELECT *, row_number() OVER (partition by user_id order by random() ) as r
FROM permutation
WHERE sum_amount = 220
) as t
WHERE r = 1
IF your target is to sum only 2 receipts in order to reach your value, this could be a solution:
DECLARE #TARGET INT = 220 --SET YOUR TARGET
, #DIFF INT
, #FIRSTVAL INT
SET #FIRSTVAL = (
SELECT TOP 1 AMOUNT
FROM myRECEIPTS
ORDER BY RECEIPT_ID ASC
)
SELECT TOP 1 *
FROM myRECEIPTS
WHERE AMOUNT = #TARGET - #FIRSTVAL
ORDER BY RECEIPT_ID ASC
this code will do it:
declare #sum1 int
declare #numrows int
set #numrows= 1
set #sum1 =0
while (#sum1 < 10)
begin
select top (#numrows) #sum1=sum(sum1) from receipts
set #numrows +=1
end
select top(#numrows) * from receipts

Joining next Sequential Row

I am planing an SQL Statement right now and would need someone to look over my thougts.
This is my Table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thougts:
I need to join each record with it's subsequent row. I can do that using the ever flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2 so it should join on the row with id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average value of the gaps can be done by calculating the difference between the first value and the last value and dividing by one less than the number of elements:
select sum(case when seqnum = num then stat else - stat end) / (max(num) - 1);
from (select period, row_number() over (order by period) as seqnum,
count(*) over () as num
from tbstats
) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
By using Join also you achieve this
SELECT t1.period,
t1.stat,
t1.stat - t2.stat gap
FROM #tbstats t1
LEFT JOIN #tbstats t2
ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select
x.id thisStatId,
LAG(x.id) OVER (ORDER BY x.id) lastStatId,
x.stat thisStatValue,
LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x

Find the longest sequence of consecutive increasing numbers in SQL

For this example say I have a table with two fields, AREA varchar(30) and OrderNumber INT.
The table has the following data
AREA | OrderNumber
Fontana | 32
Fontana | 42
Fontana | 76
Fontana | 12
Fontana | 3
Fontana | 99
RC | 32
RC | 1
RC | 8
RC | 9
RC | 4
I would like to return
The results I would like to return is for each area the longest length of increasing consecutive values. For Fontana it is 3 (32, 42, 76). For RC it is 2 (8,9)
AREA | LongestLength
Fontana | 3
RC | 2
How would I do this on MS Sql 2005?
One way is to use a recursive CTE that steps over each row. If the row meets the criteria (increasing order number for the same area), you increase the chain length by one. If it doesn't, you start a new chain:
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1
)
, recurse as
(
select rn
, area
, OrderNumber
, 1 as ChainLength
from numbered
where rn = 1
union all
select cur.rn
, cur.area
, cur.OrderNumber
, case
when cur.area = prev.area
and cur.OrderNumber > prev.OrderNumber
then prev.ChainLength + 1
else 1
end
from recurse prev
join numbered cur
on prev.rn + 1 = cur.rn
)
select area
, max(ChainLength)
from recurse
group by
area
Live example at SQL Fiddle.
An alternative way is to use a query to find "breaks", that is, rows that end a sequence of increasing order numbers for the same area. The number of rows between breaks is the length.
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1 t1
)
-- Select rows that break an increasing chain
, breaks as
(
select row_number() over (order by cur.rn) rn2
, cur.rn
, cur.Area
from numbered cur
left join
numbered prev
on cur.rn = prev.rn + 1
where cur.OrderNumber <= prev.OrderNumber
or cur.Area <> prev.Area
or prev.Area is null
)
-- Add a final break after the last row
, breaks2 as
(
select *
from breaks
union all
select count(*) + 1
, max(rn) + 1
, null
from breaks
)
select series_start.area
, max(series_end.rn - series_start.rn)
from breaks2 series_start
join breaks2 series_end
on series_end.rn2 = series_start.rn2 + 1
group by
series_start.area
Live example at SQL Fiddle.
You do not explain why RC's longest sequence does not include 1 while Fontana's does include 32. I take it, the 1 is excluded because it is a decrease: it comes after 32. The Fontana's 32, however, is the first ever item in the group, and I've got two ideas how to explain why it is considered an increase. That's either exactly because it's the group's first item or because it is also positive (i.e. as if coming after 0 and, therefore, an increase).
For the purpose of this answer, I'm assuming the latter, i.e. a group's first item is an increase if it is positive. The below script implements the following idea:
Enumerate the rows in every AREA group in the order of the eventtime column you nearly forgot to mention.
Join the enumerated set to itself to link every row with it's predecessor.
Get the sign of the difference between the row and its preceding value (defaulting the latter to 0). At this point the problem turns into a gaps-and-islands one.
Partition every AREA group by the signs determined in #3 and enumerate every subgroup's rows.
Find the difference between the row numbers from #1 and those found in #4. That would be a criterion to identify individual streaks (together with AREA).
Finally, group the results by AREA, the sign from #3 and the result from #5, count the rows and get the maximum count per AREA.
I implemented the above like this:
WITH enumerated AS (
SELECT
*,
row = ROW_NUMBER() OVER (PARTITION BY AREA ORDER BY eventtime)
FROM atable
),
signed AS (
SELECT
this.eventtime,
this.AREA,
this.row,
sgn = SIGN(this.OrderNumber - COALESCE(last.OrderNumber, 0))
FROM enumerated AS this
LEFT JOIN enumerated AS last
ON this.AREA = last.AREA
AND this.row = last.row + 1
),
partitioned AS (
SELECT
AREA,
sgn,
grp = row - ROW_NUMBER() OVER (PARTITION BY AREA, sgn ORDER BY eventtime)
FROM signed
)
SELECT DISTINCT
AREA,
LongestIncSeq = MAX(COUNT(*)) OVER (PARTITION BY AREA)
FROM partitioned
WHERE sgn = 1
GROUP BY
AREA,
grp
;
A SQL Fiddle demo can be found here.
You can do some math by ROW_NUMBER() to figure out where you have consecutive items.
Here's the code sample:
;WITH rownums AS
(
SELECT [area],
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [ordernumber]) AS rid1,
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [eventtime]) AS rid2
FROM SomeTable
),
differences AS
(
SELECT [area],
[calc] = rid1 - rid2
FROM rownums
),
summation AS
(
SELECT [area], [calc], COUNT(*) AS lengths
FROM differences
GROUP BY [area], [calc]
)
SELECT [area], MAX(lengths) AS LongestLength
FROM differences
JOIN summation
ON differences.[calc] = summation.[calc]
AND differences.area = calc.area
GROUP BY [area]
So if I do one set of row numbers ordered by my ordernumber and another set of row numbers by my event time, the difference between those two numbers will always be the same, so long as their order is the same.
You can then get a count grouped by those differences and then pull the largest count to get what you need.
EDIT: ...
Ignore the first edit, what I get for rushing.