Query to transform data into opening-closing pair - sql

I have a table with 2 columns i.e. recordNum(int) and qty (float) and data in that table is like following:
and I am trying to come up with query which can transform it to following:
I am using SQL Server 2012.
We have to prepare a opening-closing record number pair. The logic is as follows:
1st row: 1001 record num is opening record with quantity = 100
2nd row: 50 out of 100 quantity is closed by record num 1002
3rd row: remaining 50 quantity of 1001 is closed by 1003(which is of quantity 60)
4th row: after closing 50, 1003 will still have -10 quantity available.
As per request, I am adding my attempt to description:
select
t.openRecordNum
,t.closingRecordNum
,t.qty
from
(
select *
, openRecordNum = case when t.openQ>0 then min(t.recordNumber) over( order by seqId)
else
max(t.recordNumber) over( order by seqId)
end
, closingRecordNum = Max(t.recordNumber) over( order by seqId)
from
(
select
*
,openQ = sum(qty) over( order by seqId)
from Table_1
) t
) t
But I am nowhere close to desired result. I am getting following:

Related

Progressive Select Query in Oracle Database

I want to write a select query that selects distinct rows of data progressively.
Explaining with an example,
Say i have 5000 accounts selected for repayment of loan, these accounts are ordered in descending order( Account 1st has highest outstanding while account 5000nd will have the lowest).
I want to select 1000 unique accounts 5 times such that the total outstanding amount of repayment in all 5 cases are similar.
i have tried out a few methods by trying to select rownums based on odd/even or other such way, but it's only good for upto 2 distributions. I was expecting more like a A.P. as in maths that selects data progressively.
A naïve method of splitting sets into (for example) 5 bins, numbered 0 to 4, is give each row a unique sequential numeric index and then, in order of size, assign the first 10 rows to bins 0,1,2,3,4,4,3,2,1,0 and then repeat for additional sets of 10 rows:
WITH indexed_values (value, rn) AS (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value DESC) - 1
FROM table_name
),
assign_bins (value, rn, bin) AS (
SELECT value,
rn,
CASE WHEN MOD(rn, 2 * 5) >= 5
THEN 5 - MOD(rn, 5) - 1
ELSE MOD(rn, 5)
END
FROM indexed_values
)
SELECT bin,
COUNT(*) AS num_values,
SUM(value) AS bin_size
FROM assign_bins
GROUP BY bin
Which, for some random data:
CREATE TABLE table_name ( value ) AS
SELECT FLOOR(DBMS_RANDOM.VALUE(1, 1000001)) FROM DUAL CONNECT BY LEVEL <= 1000;
May output:
BIN
NUM_VALUES
BIN_SIZE
0
200
100012502
1
200
100004633
2
200
99980342
3
200
99976774
4
200
100005756
It will not get the bins to have equal values but it is relatively simple and will get a close approximation if your values are approximately evenly distributed.
If you want to select values from a certain bin then:
WITH indexed_values (value, rn) AS (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value DESC) - 1
FROM table_name
),
assign_bins (value, rn, bin) AS (
SELECT value,
rn,
CASE WHEN MOD(rn, 2 * 5) >= 5
THEN 5 - MOD(rn, 5) - 1
ELSE MOD(rn, 5)
END
FROM indexed_values
)
SELECT value
FROM assign_bins
WHERE bin = 0
fiddle

How to assign a running value to a row depending on some conditions

I'm trying to give a certain number of point to my row depending on some condition, I can't figure out how to write it.
For a given Id, FormatCode and Price I want to give the value 1.
For subsequent rows, for the same Id and Price, if the FormatCode is a multiple of a previous row with the same Id and price, I want to give the same value.
For instance :
00010405, 100, 0.3218 = 1
00010405, 400, 0.3218 = 1 (400 % 100 = 0)
00010405, 500, 0.3126 = 2 (500 % 100 = 0, but the price is different)
00010405, 1000, 0.3126 = 2 (1000 % 500 and 1000 % 100 = 0, but the price of the format code 100 is different, hence it will take the value 2 because it has the same price)
Id
Format Code
Price
Value
Row
00010405
100
0.3218
1
1
00010405
400
0.3218
1
2
00010405
500
0.3126
2
3
00010405
1000
0.3126
2
4
SELECT
Id,
FormatCode,
Price,
Value,
ROW_NUMER() OVER (
PARTITION BY Id
ORDER BY FormatCode
)
FROM Table
If I understand you correctly, you want to prioritise formatCode multiplier over price.
Instead of ROW_NUMBER you should take a look at DENSE_RANK.
The description is still a bit unclear to me, but if I understood it correctly here is a working example:
--Setup some sample data
drop table if exists #tmp
select *
into #tmp
from (values
('00010405',100, 0.3218 ),
('00010405',400, 0.3218 ),
('00010405',500, 0.3126 ),
('00010405',1000, 0.3126 ),
('00010405',1333, 0.3126 ),--not a multiple
('00010405',2666, 0.3126 )--multiple of previous row
) as tab(id, formatcode, price)
--Make the calculation
select
t.id,
t.formatcode,
t.price,
DENSE_RANK() over(partition by id order by minMulti.formatCodeMin_multiplier, t.price) as Value
from #tmp t
cross apply(
select min(formatCode) as formatCodeMin_multiplier
from #tmp t2
where t.id = t2.id and t.price = t2.price
and t.formatcode % t2.formatcode = 0
) as minMulti
order by id, formatcode
The trick is to find the formatcode with the lowest value where the current row's value is a multiplier of.

Alternative: Sql - SELECT rows until the sum of a row is a certain value

My question is very similar to my previous one posted here:
Sql - SELECT rows until the sum of a row is a certain value
To sum it up, I need to return the rows, until a certain sum is reached, but the difference this time, is that, I need to find the best fit for this sum, I mean, It doesn't have to be sequential. For example:
Let's say I have 5 unpaid receipts from customer 1:
Receipt_id: 1 | Amount: 110€
Receipt_id: 2 | Amount: 110€
Receipt_id: 3 | Amount: 130€
Receipt_id: 4 | Amount: 110€
Receipt_id: 5 | Amount: 190€
So, customer 1 ought to pay me 220€.
Now I need to select the receipts, until this 220€ sum is met and it might be in a straight order, like (receipt 1 + receipt 2) or not in a specific order, like (receipt 1 + receipt 4), any of these situations would be suitable.
I am using SQL Server 2016.
Any additional questions, feel free to ask.
Thanks in advance for all your help.
This query should solve it.
It is a quite dangerous query (containing a recursive CTE), so please be careful!
You can find some documentation here: https://www.essentialsql.com/recursive-ctes-explained/
WITH the_data as (
SELECT *
FROM (
VALUES (1, 1, 110),(1, 2,110),(1, 3,130),(1, 4,110),(1, 5,190),
(2, 1, 10),(2, 2,20),(2, 3,200),(2, 4,190)
) t (user_id, receipt_id, amount)
), permutation /* recursive used here */ as (
SELECT
user_id,
amount as sum_amount,
CAST(receipt_id as varchar(max)) as visited_receipt_id,
receipt_id as max_receipt_id,
1 as i
FROM the_data
WHERE amount > 0 -- remove empty amount
UNION ALL
SELECT
the_data.user_id,
sum_amount + amount as sum_amount,
CAST(concat(visited_receipt_id, ',', CAST(receipt_id as varchar))as varchar(max)) as visited_receipt_id,
receipt_id as max_receipt_id ,
i + 1
FROM the_data
JOIN permutation
ON the_data.user_id = permutation.user_id
WHERE i < 1000 -- max 1000 loops, means any permutation with less than 1000 different receipts
and receipt_id > max_receipt_id -- in order that sum in komutatif , we can check the sum in any unique order ( here we take the order of the reciept_id in fact we do not produce any duplicates )
-- AND sum_amount + amount <= 220 -- ignore everything that is bigger than the expected value (optional)
)
SELECT *
FROM permutation
WHERE sum_amount = 220
in order to select only one combination per user_id, replace the last three lines of the previous query by
SELECT *
FROM (
SELECT *, row_number() OVER (partition by user_id order by random() ) as r
FROM permutation
WHERE sum_amount = 220
) as t
WHERE r = 1
IF your target is to sum only 2 receipts in order to reach your value, this could be a solution:
DECLARE #TARGET INT = 220 --SET YOUR TARGET
, #DIFF INT
, #FIRSTVAL INT
SET #FIRSTVAL = (
SELECT TOP 1 AMOUNT
FROM myRECEIPTS
ORDER BY RECEIPT_ID ASC
)
SELECT TOP 1 *
FROM myRECEIPTS
WHERE AMOUNT = #TARGET - #FIRSTVAL
ORDER BY RECEIPT_ID ASC
this code will do it:
declare #sum1 int
declare #numrows int
set #numrows= 1
set #sum1 =0
while (#sum1 < 10)
begin
select top (#numrows) #sum1=sum(sum1) from receipts
set #numrows +=1
end
select top(#numrows) * from receipts

SQL Implementation with Sliding-Window functions or Recursive CTEs

I have a problem that it's very easy to be solved in C# code for example, but I have no idea how to write in a SQL query with Recursive CTE-s or Sliding-Windows functions.
Here is the situation: let's say I have a table with 3 columns (ID, Date, Amount), and here is some data:
ID Date Amount
-----------------------
1 01.01.2016 -500
2 01.02.2016 1000
3 01.03.2016 -200
4 01.04.2016 300
5 01.05.2016 500
6 01.06.2016 1000
7 01.07.2016 -100
8 01.08.2016 200
The result I want to get from the table is this (ID, Amount .... Order By Date):
ID Amount
-----------------------
2 300
4 300
5 500
6 900
8 200
The idea is to distribute the amounts into installments, for each client separately, but the thing is when negative amount comes into play you need to remove amount from the last installment. I don't know how clear I am, so here is an example:
Let's say I have 3 Invoices for one client with amounts 500, 200, -300.
If i start distribute these Invoices, first i distribute the amount 500, then 200. But when i come to the third one -300, then i need to remove from the last Invoice. In other words 200 - 300 = -100, so the amount from second Invoice will disappear, but there are still -100 that needs to be substracted from first Invoice. So 500 - 100 = 400. The result i need is result with one row (first invoice with amount 400)
Another example when the first invoice is with negative amount (-500, 300, 500).
In this case, the first (-500) invoice will make the second disappear and another 200 will be substracted from the third. So the result will be: Third Invoice with amount 300.
This is something like Stack implementation in programming language, but i need to make it with sliding-window functions in SQL Server.
I need an implementation with Sliding Function or Recursive CTEs.
Not with cycles ...
Thanks.
Ok, think this is what you want. there are two recursive queries. One for upward propagation and the second one for the downward propagation.
with your_data_rn as
(
select *, row_number() over (order by date) rn
from your_data
), rcte_up(id, date, ammount, rn, running) as
(
select *, ammount as running
from your_data_rn
union all
select d.*,
d.ammount + rcte_up.running
from your_data_rn d
join rcte_up on rcte_up.running < 0 and d.rn = rcte_up.rn - 1
), data2 as
(
select id, date, min(running) ammount,
row_number() over (order by date) rn
from rcte_up
group by id, date, rn
having min(running) > 0 or rn = 1
), rcte_down(id, date, ammount, rn, running) as
(
select *, ammount as running
from data2
union all
select d.*, d.ammount + rcte_down.running
from data2 d
join rcte_down on rcte_down.running < 0 and d.rn = rcte_down.rn + 1
)
select id, date, min(running) ammount
from rcte_down
group by id, date
having min(running) > 0
demo
I can imagine that you use just the upward propagation and the downward propagation of the first row is done in some procedural language. Downward propagation is one scan through few first rows, therefore, the recursive query may be a hammer on a mosquito.
I add client ID in table for more general solution. Then I implemented the stack stored as XML in query field. And emulated a program cycle with Recursive-CTE:
with Data as( -- Numbering rows for iteration on CTE
select Client, id, Amount,
cast(row_number() over(partition by Client order by Date) as int) n
from TabW
),
CTE(Client, n, stack) as( -- Recursive CTE
select Client, 1, cast(NULL as xml) from Data where n=1
UNION ALL
select D.Client, D.n+1, (
-- Stack operations to process current row (D)
select row_number() over(order by n) n,
-- Use calculated amount in first positive and oldest stack cell
-- Else preserve value stored in stack
case when n=1 or (n=0 and last=1) then new else Amount end Amount,
-- Set ID in stack cell for positive and new data
case when n=1 and D.Amount>0 then D.id else id end id
from (
select Y.Amount, Y.id, new,
-- Count positive stack entries
sum(case when new<=0 or (n=0 and Amount<0) then 0 else 1 end) over (order by n) n,
row_number() over(order by n desc) last -- mark oldest stack cell by 1
from (
select X.*,sum(Amount) over(order by n) new
from (
select case when C.stack.value('(/row/#Amount)[1]','int')<0 then -1 else 0 end n,
D.Amount, D.id -- Data from new record
union all -- And expand current stack in XML to table
select node.value('#n','int') n, node.value('#Amount','int'), node.value('#id','int')
from C.stack.nodes('//row') N(node)
) X
) Y where n>=0 -- Suppress new cell if the stack contained a negative amount
) Z
where n>0 or (n=0 and last=1)
for xml raw, type
)
from Data D, CTE C
where D.n=C.n and D.Client=C.Client
) -- Expand stack into result table
select CTE.Client, node.value('#id','int') id, node.value('#Amount','int')
from CTE join (select Client, max(n) max_n from Data group by Client) X on CTE.Client=X.Client and CTE.n=X.max_n+1
cross apply stack.nodes('//row') N(node)
order by CTE.Client, node.value('#n','int') desc
Test on sqlfiddle.com
I think this method is slower than #RadimBača. And it is shown to demonstrate the possibilities of implementing a sequential algorithm on SQL.

Find the longest sequence of consecutive increasing numbers in SQL

For this example say I have a table with two fields, AREA varchar(30) and OrderNumber INT.
The table has the following data
AREA | OrderNumber
Fontana | 32
Fontana | 42
Fontana | 76
Fontana | 12
Fontana | 3
Fontana | 99
RC | 32
RC | 1
RC | 8
RC | 9
RC | 4
I would like to return
The results I would like to return is for each area the longest length of increasing consecutive values. For Fontana it is 3 (32, 42, 76). For RC it is 2 (8,9)
AREA | LongestLength
Fontana | 3
RC | 2
How would I do this on MS Sql 2005?
One way is to use a recursive CTE that steps over each row. If the row meets the criteria (increasing order number for the same area), you increase the chain length by one. If it doesn't, you start a new chain:
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1
)
, recurse as
(
select rn
, area
, OrderNumber
, 1 as ChainLength
from numbered
where rn = 1
union all
select cur.rn
, cur.area
, cur.OrderNumber
, case
when cur.area = prev.area
and cur.OrderNumber > prev.OrderNumber
then prev.ChainLength + 1
else 1
end
from recurse prev
join numbered cur
on prev.rn + 1 = cur.rn
)
select area
, max(ChainLength)
from recurse
group by
area
Live example at SQL Fiddle.
An alternative way is to use a query to find "breaks", that is, rows that end a sequence of increasing order numbers for the same area. The number of rows between breaks is the length.
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1 t1
)
-- Select rows that break an increasing chain
, breaks as
(
select row_number() over (order by cur.rn) rn2
, cur.rn
, cur.Area
from numbered cur
left join
numbered prev
on cur.rn = prev.rn + 1
where cur.OrderNumber <= prev.OrderNumber
or cur.Area <> prev.Area
or prev.Area is null
)
-- Add a final break after the last row
, breaks2 as
(
select *
from breaks
union all
select count(*) + 1
, max(rn) + 1
, null
from breaks
)
select series_start.area
, max(series_end.rn - series_start.rn)
from breaks2 series_start
join breaks2 series_end
on series_end.rn2 = series_start.rn2 + 1
group by
series_start.area
Live example at SQL Fiddle.
You do not explain why RC's longest sequence does not include 1 while Fontana's does include 32. I take it, the 1 is excluded because it is a decrease: it comes after 32. The Fontana's 32, however, is the first ever item in the group, and I've got two ideas how to explain why it is considered an increase. That's either exactly because it's the group's first item or because it is also positive (i.e. as if coming after 0 and, therefore, an increase).
For the purpose of this answer, I'm assuming the latter, i.e. a group's first item is an increase if it is positive. The below script implements the following idea:
Enumerate the rows in every AREA group in the order of the eventtime column you nearly forgot to mention.
Join the enumerated set to itself to link every row with it's predecessor.
Get the sign of the difference between the row and its preceding value (defaulting the latter to 0). At this point the problem turns into a gaps-and-islands one.
Partition every AREA group by the signs determined in #3 and enumerate every subgroup's rows.
Find the difference between the row numbers from #1 and those found in #4. That would be a criterion to identify individual streaks (together with AREA).
Finally, group the results by AREA, the sign from #3 and the result from #5, count the rows and get the maximum count per AREA.
I implemented the above like this:
WITH enumerated AS (
SELECT
*,
row = ROW_NUMBER() OVER (PARTITION BY AREA ORDER BY eventtime)
FROM atable
),
signed AS (
SELECT
this.eventtime,
this.AREA,
this.row,
sgn = SIGN(this.OrderNumber - COALESCE(last.OrderNumber, 0))
FROM enumerated AS this
LEFT JOIN enumerated AS last
ON this.AREA = last.AREA
AND this.row = last.row + 1
),
partitioned AS (
SELECT
AREA,
sgn,
grp = row - ROW_NUMBER() OVER (PARTITION BY AREA, sgn ORDER BY eventtime)
FROM signed
)
SELECT DISTINCT
AREA,
LongestIncSeq = MAX(COUNT(*)) OVER (PARTITION BY AREA)
FROM partitioned
WHERE sgn = 1
GROUP BY
AREA,
grp
;
A SQL Fiddle demo can be found here.
You can do some math by ROW_NUMBER() to figure out where you have consecutive items.
Here's the code sample:
;WITH rownums AS
(
SELECT [area],
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [ordernumber]) AS rid1,
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [eventtime]) AS rid2
FROM SomeTable
),
differences AS
(
SELECT [area],
[calc] = rid1 - rid2
FROM rownums
),
summation AS
(
SELECT [area], [calc], COUNT(*) AS lengths
FROM differences
GROUP BY [area], [calc]
)
SELECT [area], MAX(lengths) AS LongestLength
FROM differences
JOIN summation
ON differences.[calc] = summation.[calc]
AND differences.area = calc.area
GROUP BY [area]
So if I do one set of row numbers ordered by my ordernumber and another set of row numbers by my event time, the difference between those two numbers will always be the same, so long as their order is the same.
You can then get a count grouped by those differences and then pull the largest count to get what you need.
EDIT: ...
Ignore the first edit, what I get for rushing.