Sliding window functions in SQL Server, advanced calculation

I have a problem that would be easy to solve in C#, for example, but I have no idea how to write it as a SQL query.
Here is the situation: say I have a table with 3 columns (ID, Date, Amount) containing this data:
ID Date Amount
-----------------------
1 01.01.2016 -500
2 01.02.2016 1000
3 01.03.2016 -200
4 01.04.2016 300
5 01.05.2016 500
6 01.06.2016 1000
7 01.07.2016 -100
8 01.08.2016 200
The result I want from the table is this (ID, Amount, ordered by Date):
ID Amount
-----------------------
2 300
4 300
5 500
6 900
8 200
The idea is to distribute the amounts into installments, but when a negative amount comes into play, it has to be removed from the last installment. I don't know how clear that is, so here is an example:
Say I have 3 invoices with amounts 500, 200, -300.
When I start distributing these invoices, I first distribute 500, then 200. When I reach the third one (-300), I need to remove it from the last invoice. In other words, 200 - 300 = -100, so the amount of the second invoice disappears, but there is still -100 that has to be subtracted from the first invoice: 500 - 100 = 400. The result I need is a dataset with one row (the first invoice, with amount 400).
Another example, where the first invoice is negative: -500, 300, 500.
In this case, the first invoice (-500) makes the second one disappear, and the remaining 200 is subtracted from the third. So the result is the third invoice with amount 300.
This is something like a stack implementation in a programming language, but I need to do it with sliding-window functions in SQL Server.
If anyone has any idea, please share.
Thanks.
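For reference, the stack behaviour described above takes only a few lines of ordinary code. The Python sketch below is meant as a checker to compare any SQL attempt against, not as a SQL solution; the function name `net_installments` and the rule that a leading negative amount is carried forward onto later installments are assumptions read off the two examples in the question.

```python
def net_installments(rows):
    """Net negative amounts against earlier positive installments, stack-style.

    rows: (id, amount) pairs already ordered by Date.
    Returns the surviving (id, remaining_amount) pairs in order.
    """
    stack = []  # open positive installments as [id, remaining]
    debt = 0    # negative amount with no earlier installment left to absorb it

    for row_id, amount in rows:
        if amount >= 0:
            # A new installment first pays off any carried-forward debt.
            used = min(amount, debt)
            debt -= used
            if amount - used > 0:
                stack.append([row_id, amount - used])
        else:
            owed = -amount
            # Remove the amount from the most recent installments first.
            while owed > 0 and stack:
                if stack[-1][1] > owed:
                    stack[-1][1] -= owed
                    owed = 0
                else:
                    owed -= stack.pop()[1]
            debt += owed  # nothing earlier is left: carry the rest forward

    return [(row_id, remaining) for row_id, remaining in stack]


data = [(1, -500), (2, 1000), (3, -200), (4, 300),
        (5, 500), (6, 1000), (7, -100), (8, 200)]
print(net_installments(data))
# [(2, 300), (4, 300), (5, 500), (6, 900), (8, 200)]
```

An installment that is reduced to exactly 0 is popped, which matches "the amount from the second invoice will disappear" in the first example.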

I solved it using T-SQL, although I think this task could also be solved with a recursive CTE.
I used ID to find the previous or next row.
-- create and fill test table
CREATE TABLE Invoices(
ID int,
[Date] date,
Amount float
)
INSERT Invoices(ID,Date,Amount) VALUES
(1,'20160101', -500),
(2,'20160201', 1000),
(3,'20160301', -200),
(4,'20160401', 300),
(5,'20160501', 500),
(6,'20160601', 1000),
(7,'20160701', -100),
(8,'20160801', 200)
My solution
-- copy all the data into temp table
SELECT *
INTO #Invoices
FROM Invoices
DECLARE
@nID int,
@nAmount float,
@pID int
-- run an infinite loop; it exits via BREAK below
WHILE 1=1
BEGIN
-- reset all the variables to NULL
SET @nID=NULL
SET @nAmount=NULL
SET @pID=NULL
-- get data from the last negative row
SELECT
@nID=ID,
@nAmount=Amount
FROM
(
SELECT TOP 1 *
FROM #Invoices
WHERE Amount<0
ORDER BY ID DESC
) q
-- get previous positive row
SELECT @pID=ID
FROM
(
SELECT TOP 1 *
FROM #Invoices
WHERE ID<@nID
AND Amount>0
ORDER BY ID DESC
) q
IF(@pID IS NULL)
BEGIN
-- get next positive row
SELECT @pID=ID
FROM
(
SELECT TOP 1 *
FROM #Invoices
WHERE ID>@nID
AND Amount>0
ORDER BY ID
) q
END
-- exit from loop
IF(@pID IS NULL) BREAK
-- subtract amount from positive row
UPDATE #Invoices
SET
Amount+=@nAmount
WHERE ID=@pID
-- delete used negative row
DELETE #Invoices
WHERE ID=@nID
END
-- show result
SELECT *
FROM #Invoices
DROP TABLE #Invoices
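The loop resolves negatives starting from the end rather than walking the rows in date order, so it is worth checking that it still matches the expected output. Below is a Python replay of the same steps (a hypothetical helper for checking the logic, not part of the answer). It also shows one quirk of the T-SQL: a positive row reduced to exactly 0 stays in #Invoices with Amount 0, because neither the negative-row lookup nor the positive-row lookup matches it.

```python
def collapse_negatives(rows):
    """Replay the WHILE loop above on (id, amount) pairs; returns sorted pairs."""
    invoices = dict(rows)  # plays the role of the #Invoices temp table
    while True:
        negatives = [i for i, a in invoices.items() if a < 0]
        if not negatives:
            break                      # no negative row: nothing left to do
        n_id = max(negatives)          # last negative row
        earlier = [i for i, a in invoices.items() if i < n_id and a > 0]
        later = [i for i, a in invoices.items() if i > n_id and a > 0]
        if earlier:
            p_id = max(earlier)        # previous positive row
        elif later:
            p_id = min(later)          # otherwise the next positive row
        else:
            break                      # no positive row found: exit the loop
        n_amount = invoices.pop(n_id)  # DELETE the used negative row
        invoices[p_id] += n_amount     # UPDATE ... SET Amount += negative amount
    return sorted(invoices.items())


data = [(1, -500), (2, 1000), (3, -200), (4, 300),
        (5, 500), (6, 1000), (7, -100), (8, 200)]
print(collapse_negatives(data))
# [(2, 300), (4, 300), (5, 500), (6, 900), (8, 200)]
```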

Related

Given a table of numbers, can I get all the rows which add up to less than or equal to a number?

Say I have a table with an incrementing id column and a random positive non-zero number.
id  rand
1   12
2   5
3   99
4   87
Write a query to return the rows which add up to a given number.
A couple of rules:
Rows must be "consumed" in order, even if a later row would make a perfect match. For example, querying for 104 would be a perfect match for rows 1, 2, and 4, but rows 1-3 would still be returned.
You can use a row partially if it holds more than is needed to cover whatever is left of the number. E.g. rows 1, 2, and 3 would be returned if the target is 50, because 12 + 5 + 33 equals 50, with row 3 (99) only partially used.
If there are not enough rows to reach the amount, return ALL the rows. E.g. in the above example a query for 1,000 would return rows 1-4. In other words, the sum of the rows should be less than or equal to the queried number.
It's possible that the answer is "no, this is not possible with SQL alone", and that's fine; I was just curious. This would be a trivial problem in a programming language, but I was wondering what SQL provides out of the box, as a thought experiment and learning exercise.
You didn't mention which RDBMS, but assuming SQL Server:
IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t (id int, rand int);
INSERT INTO #t (id,rand)
VALUES (1,12),(2,5),(3,99),(4,87);
DECLARE @target int = 104;
WITH dat
AS
(
SELECT id, rand, SUM(rand) OVER (ORDER BY id) as runsum
FROM #t
),
dat2
as
(
SELECT id, rand
, runsum
, COALESCE(LAG(runsum,1) OVER (ORDER BY id),0) as prev_runsum
from dat
)
SELECT id, rand
FROM dat2
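WHERE @target >= runsum
-- strict bounds: BETWEEN would also return the next row when @target equals a running total exactly
OR (@target > prev_runsum AND @target < runsum);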
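The same running-sum idea can be tried without SQL Server: SQLite (3.25 and later) also supports SUM() OVER and LAG(), so the sketch below runs it through Python's built-in sqlite3 module. Because all values are positive, a row is needed exactly when the running total *before* it is still below the target, so the two-part WHERE clause is written here as the single condition `prev_runsum < target`. The function name and the in-memory table `t` are illustrative.

```python
import sqlite3

SAMPLE = [(1, 12), (2, 5), (3, 99), (4, 87)]

QUERY = """
WITH dat AS (
    SELECT id, rand, SUM(rand) OVER (ORDER BY id) AS runsum
    FROM t
),
dat2 AS (
    SELECT id, rand, runsum,
           COALESCE(LAG(runsum, 1) OVER (ORDER BY id), 0) AS prev_runsum
    FROM dat
)
SELECT id
FROM dat2
WHERE prev_runsum < ?   -- something is still left to reach the target
ORDER BY id
"""


def rows_for_target(values, target):
    """Return the ids consumed, in id order, to reach target."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER, rand INTEGER)")
        conn.executemany("INSERT INTO t (id, rand) VALUES (?, ?)", values)
        return [row[0] for row in conn.execute(QUERY, (target,))]
    finally:
        conn.close()


print(rows_for_target(SAMPLE, 104))  # [1, 2, 3]
```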

Limit rows by first two dates within column. How?

Is it possible to query only the first and second date of each customer? I tried UP TO 2 ROWS, but that only limits the whole result to 2 rows.
SELECT knvv~kunnr vbak~vbeln vbak~erdat FROM vbak INNER JOIN knvv ON vbak~kunnr = knvv~kunnr.
The sample result of the above query would be:
Customer no. Document No Date
1 100000 01/01/18
1 200000 01/02/18
1 300000 01/03/18
1 400000 01/04/18
2 100001 01/01/18
2 200000 01/04/18
2 100040 01/06/18
But I need to limit it to the first two dates per customer, like the result below. Is it possible to do this in the query?
Customer no. Document No Date
1 100000 01/01/18
1 200000 01/02/18
2 100001 01/01/18
2 200000 01/04/18
SELECT CustomerNo,DocumentNo,Date,(@Count:= if(@TempID - CustomerNo = 0,@Count +1,1)) Counter,(@TempID:=CustomerNo) Tempid
FROM vbak, (Select @Count:=0) counter, (Select @TempID:=0) tempid
having Counter<= 2 order by CustomerNo;
You can try this. Basically, I declared 2 variables (@Count and @TempID) and set both to 0.
For the first row, @TempID - CustomerNo = -1, which makes the condition false, so @Count is set to 1 rather than incremented. Then @TempID is set to the CustomerNo of the current row.
The next row gives @TempID - CustomerNo = 0, which makes the condition true and increments @Count by 1.
And so on.
The HAVING clause keeps only rows whose Counter is less than or equal to 2, which returns the desired result.
Hopefully this gives you an idea for your application.
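The counter logic described above can be replayed in Python to see what it computes (a sketch with a hypothetical function name, and with the sample dates rewritten as ISO strings as an assumption so they sort as text). Two things show up: the counter only works if the rows are visited grouped by customer and in date order, which the sketch makes explicit with a sort, and it keeps the first two rows per customer rather than the first two distinct dates.

```python
def first_n_per_customer(rows, n=2):
    """Keep the first n rows per customer, replaying the Count/TempID counter.

    rows: (customer, document, date) tuples; dates as sortable ISO strings.
    """
    count, temp_id = 0, 0   # both variables start at 0, as in the query
    kept = []
    # The counter assumes rows arrive grouped by customer in date order,
    # so that order is made explicit here.
    for customer, document, date in sorted(rows, key=lambda r: (r[0], r[2])):
        count = count + 1 if temp_id == customer else 1  # IF(TempID - CustomerNo = 0, ...)
        temp_id = customer                               # TempID := CustomerNo
        if count <= n:                                   # HAVING Counter <= 2
            kept.append((customer, document, date))
    return kept


rows = [
    (1, 100000, "2018-01-01"), (1, 200000, "2018-01-02"),
    (1, 300000, "2018-01-03"), (1, 400000, "2018-01-04"),
    (2, 100001, "2018-01-01"), (2, 200000, "2018-01-04"),
    (2, 100040, "2018-01-06"),
]
for row in first_n_per_customer(rows):
    print(row)
```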
I couldn't find a way to do this with a single query in OpenSQL. It just doesn't seem to offer the kind of sub-query or window function that would be required.
However, I noticed that you added the hana tag. With SAP HANA, this can be quite easily realized with an ABAP-Managed Database Procedure (AMDP) or an equivalent scripted Calculation View:
METHOD select BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT
USING vbak.
lt_first_dates = SELECT kunnr,
min(erdat) AS erdat
FROM vbak
GROUP BY kunnr;
lt_second_dates = SELECT kunnr,
min(erdat) AS erdat
FROM vbak
WHERE (kunnr, erdat) NOT IN ( SELECT * FROM :lt_first_dates )
GROUP BY kunnr;
lt_first_two_dates = SELECT * FROM :lt_first_dates
UNION
SELECT * FROM :lt_second_dates;
et_result = SELECT src.kunnr,
src.vbeln,
src.erdat
FROM vbak AS src
WHERE (kunnr, erdat) IN ( SELECT * FROM :lt_first_two_dates )
ORDER BY kunnr, vbeln, erdat;
ENDMETHOD.
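The AMDP finds the first and second distinct dates per customer with two MIN() passes. Where window functions are available, DENSE_RANK() expresses the same "first two dates" rule in a single pass. This is a different technique from the answer's, sketched here against SQLite through Python's sqlite3 module, with the sample dates rewritten as ISO strings (an assumption, so that they sort correctly as text). As in the AMDP, every document sharing one of those two dates is kept.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (assumption).
VBAK = [
    (1, 100000, "2018-01-01"), (1, 200000, "2018-01-02"),
    (1, 300000, "2018-01-03"), (1, 400000, "2018-01-04"),
    (2, 100001, "2018-01-01"), (2, 200000, "2018-01-04"),
    (2, 100040, "2018-01-06"),
]

QUERY = """
SELECT kunnr, vbeln, erdat
FROM (
    SELECT kunnr, vbeln, erdat,
           DENSE_RANK() OVER (PARTITION BY kunnr ORDER BY erdat) AS date_rank
    FROM vbak
) AS ranked
WHERE date_rank <= 2          -- first and second distinct date per customer
ORDER BY kunnr, erdat, vbeln
"""


def first_two_dates(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE vbak (kunnr INTEGER, vbeln INTEGER, erdat TEXT)")
        conn.executemany("INSERT INTO vbak VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in first_two_dates(VBAK):
    print(row)
```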

Complicated SQL Amount split by percentage in same row transpose (Pivot)?

I am struggling to split a total amount field by percentages in the same row and then update the last column with the amount type to which each percentage applies.
Example data
Total Amount | UF%   | UFI% | RA%  | RL%   | NP%   | AmountType
100          | 0.00  | 20   | 9.15 | 0.75  | 70.01 |
1520.23      | 64.4  | 19.1 | 15.5 | 0.25  | 0.75  |
158520.03    | 13.25 | 35   | 2.25 | 19.28 | 30.22 |
I have to apply each percentage to the Total Amount column, transpose the results into additional rows in the same table, and update the last column with the type of amount.
For example, from the 1st row I would get 5 new rows:
Total Amount Amount type
0 UF%
20 UFI%
9.15 RA%
0.75 RL%
70.01 NP%
I am taking it one step at a time: I have created 5 new columns to calculate the percentages, TotalAmount UF%, TotalAmount UFI%, TotalAmount RA% and so on…
Select [Total Amount] * [UF%] as [TotalAmount UF%] … and so on.
I am stuck here. Should I use PIVOT/UNPIVOT, or CASE?
Or is there an easier way using ROW_NUMBER() OVER (PARTITION BY …)?
Please suggest.
This should work for you. Just copy it into an empty query window and execute it, then adapt it to your needs.
EDIT: Calculate percentages...
declare @amounts table (TotalAmount decimal(8,2),[UF%] decimal(4,2), [UFI%] decimal(4,2)
,[RA%] decimal(4,2),[RL%] decimal(4,2)
,[NP%] decimal(4,2));
insert into @amounts values
(100,0.00,20,9.15,0.75,70.01)
,(1520.23,64.4,19.1,15.5,0.25,0.75)
,(158520.03,13.25,35,2.25,19.28,30.22);
select up.TotalAmount
,up.Percentag
,(up.TotalAmount/100)*up.Percentag AS AmountPercentage
,up.Amount AS AmountType
from
(
select *
from @amounts
) AS tbl
unpivot
(
Percentag FOR Amount IN([UF%],[UFI%],[RA%],[RL%],[NP%])
) AS up
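UNPIVOT turns each of the five percentage columns into its own row. As a plain-code illustration of that reshaping (not of the T-SQL operator itself), here is a Python sketch using Decimal so the amounts stay exact; the column order and the function name are assumptions matching the example table.

```python
from decimal import Decimal

# Percentage columns in the order they appear in the example table.
PERCENT_COLUMNS = ["UF%", "UFI%", "RA%", "RL%", "NP%"]


def unpivot_amounts(rows):
    """Turn (total, pct1, ..., pct5) rows into one row per percentage column.

    Each output row is (TotalAmount, Percentage, AmountPercentage, AmountType),
    mirroring the columns selected from the UNPIVOT query above.
    """
    result = []
    for total, *percents in rows:
        for amount_type, pct in zip(PERCENT_COLUMNS, percents):
            result.append((total, pct, total / 100 * pct, amount_type))
    return result


amounts = [
    (Decimal("100"), Decimal("0.00"), Decimal("20"), Decimal("9.15"),
     Decimal("0.75"), Decimal("70.01")),
    (Decimal("1520.23"), Decimal("64.4"), Decimal("19.1"), Decimal("15.5"),
     Decimal("0.25"), Decimal("0.75")),
]
for row in unpivot_amounts(amounts):
    print(row)
```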

SQL Rounding Percentages to make the sum 100% - 1/3 as 0.34, 0.33, 0.33

I am currently trying to split one value using a percentage column. But as most of the percentage values are 1/3, I am not able to get exactly 100% with two decimal places in the value. For example:
Product   Supplier   percentage      totalvalue     customer_split
                     decimal(15,14)  decimal(18,2)  decimal(18,2)
--------  ---------  --------------  -------------  --------------
Product1  Supplier1  0.33            10.00          3.33
Product1  Supplier2  0.33            10.00          3.33
Product1  Supplier3  0.33            10.00          3.33
So here we are missing 0.01 in the value column, and the suppliers would like this missing 0.01 to be put against any one of the suppliers at random. I have been trying to do this with two sets of SQL statements and temporary tables, but is there a simpler way? If possible, how can I get 0.34 in the percentage column itself for one of the above rows? 0.01 is a negligible value, but when the value column is 1000000000 it is significant.
It sounds like you're doing some type of "allocation" here. This is a common problem any time you are trying to allocate something from a higher granularity to a lower granularity and need to be able to re-aggregate to the total value correctly.
This becomes a much bigger problem when dealing with larger fractions.
For example, if I divide a total value of $55.30 by eight, I get $6.9125 for each of the eight buckets. Should I round one to $6.92 and the rest to $6.91? If I do, I lose a cent. I would have to round one to $6.93 and the others to $6.91. This gets worse as you add more buckets to divide by.
In addition, when you start to round, you introduce problems like "Should 33.339 be rounded to 33.34 or 33.33?"
If your business logic is such that you just want to take whatever remainder beyond 2 decimal places may exist and add it to one of the dollar values "randomly" so you don't lose any cents, @Diego is on the right track.
Doing it in pure SQL is a bit more difficult. For starters, your percentage isn't 1/3, it's .33, which will yield a total value of 9.9, not 10. I would either store this as a ratio or as a high-precision decimal field (.33333333333333).
P   S   PCT            Total
--  --  -------------  ------
P1  S1  .33333333333   10.00
P1  S2  .33333333333   10.00
P1  S3  .33333333333   10.00
SELECT
BaseTable.P, BaseTable.S,
CASE WHEN BaseTable.S = TotalTable.MinS
THEN BaseTable.BaseAllocatedValue + TotalTable.Remainder
ELSE BaseTable.BaseAllocatedValue
END As AllocatedValue
FROM
(SELECT
P, S, FLOOR((PCT * Total * 100)) / 100 as BaseAllocatedValue
FROM dataTable) BaseTable
INNER JOIN
(SELECT
P, MIN(S) AS MinS,
-- whatever the floored values leave unallocated (Total is the same on every row of a product)
MAX(Total) - SUM(FLOOR((PCT * Total * 100)) / 100) as Remainder
FROM dataTable
GROUP BY P) as TotalTable
ON (BaseTable.P = TotalTable.P)
It appears your calculation is an equal distribution based on the total number of products per supplier. If it is, it may be advantageous to remove the percentage and instead just store the count of items per supplier in the table.
If it is also possible to store a flag indicating the row that should get the remainder value applied to it, you could assign based on that flag instead of randomly.
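The FLOOR-then-remainder idea above is easy to check outside the database. This Python sketch uses Decimal for exact cents; the function name and the choice of giving the remainder to the first row (rather than the row with the minimum S, or a random one) are assumptions for illustration.

```python
from decimal import Decimal, ROUND_FLOOR

CENT = Decimal("0.01")


def allocate(total, weights):
    """Split total by weights, flooring each share to whole cents.

    Whatever the floored shares leave over is added to the first share,
    so the parts always add back up to total exactly.
    """
    weight_sum = sum(weights)
    shares = [(total * w / weight_sum).quantize(CENT, rounding=ROUND_FLOOR)
              for w in weights]
    shares[0] += total - sum(shares)
    return shares


print([str(s) for s in allocate(Decimal("10.00"), [1, 1, 1])])
# ['3.34', '3.33', '3.33']
print([str(s) for s in allocate(Decimal("55.30"), [1] * 8)])
# ['6.93', '6.91', '6.91', '6.91', '6.91', '6.91', '6.91', '6.91']
```

The second call reproduces the $55.30 example: flooring gives eight times $6.91, and the missing two cents go to one bucket.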
Run this; it will give you an idea of how to solve your problem.
I created a table called orders with just an ID, to keep it easy to understand:
create table orders(
customerID int)
insert into orders values(1)
go 3
insert into orders values(2)
go 3
insert into orders values(3)
go 3
These values represent the 33% you have:
1 33.33
2 33.33
3 33.33
now:
create table #tempOrders(
customerID int,
percentage numeric(10,2))
declare @maxOrder int
declare @maxOrderID int
select @maxOrderID = max(customerID) from orders
declare @total numeric(10,2)
select @total =count(*) from orders
insert into #tempOrders
select customerID, cast(100*count(*)/@total as numeric(10,2)) as Percentage
from orders
group by customerID
update #tempOrders set percentage = percentage + (select 100-sum(Percentage) from #tempOrders)
where customerID =@maxOrderID
This code calculates the percentages and finds the order with the max ID, then takes the difference between 100 and the sum of the percentages and adds it to the order with the max ID (your "random" order).
select * from #tempOrders
1 33.33
2 33.33
3 33.34
This should be an easy task using windowed aggregate functions. You probably already use them to calculate customer_split:
totalvalue / COUNT(*) OVER (PARTITION BY Product) as customer_split
Now sum up the customer_split values and, if they differ from totalvalue, add (or subtract) the difference to one row.
SELECT
Product
,Supplier
,totalvalue
,customer_split
+ CASE
WHEN ROW_NUMBER()
OVER (PARTITION BY Product
ORDER BY Supplier) = 1 -- one row per product; change the ORDER BY (e.g. NEWID()) to pick a specific or random row
THEN totalvalue - SUM(customer_split)
OVER (PARTITION BY Product)
ELSE 0
END AS adjusted_split
FROM
(
SELECT
Product
,Supplier
,totalvalue
-- round to 2 decimals like the decimal(18,2) customer_split column, so the remainder shows up
,CAST(totalvalue / COUNT(*) OVER (PARTITION BY Product) AS decimal(18,2)) AS customer_split
FROM dropme
) AS dt
After several rounds of trial and testing, I think I found a better solution.
Idea:
Get the count of all rows (COUNT(*)) matching your conditions.
Get ROW_NUMBER().
Check whether ROW_NUMBER() < COUNT(*):
if so, select ROUND(curr_percentage, 2);
else,
get the sum of all the other percentages (each rounded) and subtract it from 100.
These steps select the current percentage every time EXCEPT for the last row, which will be
100 - the sum of all other percentages.
This is part of my code:
Select your_cols
,(Select count(*) from [tbl_Partner_Entity] pa_et where [E_ID] =@E_ID)
AS cnt_all
,(ROW_NUMBER() over ( order by pe.p_id)) as row_num
,Case when (
(ROW_NUMBER() over ( order by pe.p_id)) <
(Select count(*) from [tbl_Partner_Entity] pa_et where [E_ID] =@E_ID))
then round(([partnership_partners_perc]*100),2)
else
100-
((select sum(round(([partnership_partners_perc]*100),2)) FROM [dbo].[tbl_Partner_Entity] PEE
where [E_ID] =@E_ID and pee.P_ID != pe.P_ID))
end AS [partnership_partners_perc_Last]
FROM [dbo].[tbl_Partner_Entity] PE
where [E_ID] =@E_ID
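The "round every row except the last, then give the last row 100 minus the others" idea can be sketched in Python with Decimal. ROUND_HALF_UP is used here to imitate ROUND(x, 2) on exact decimals; the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")


def percentages_summing_to_100(fractions):
    """Convert fractions (e.g. 0.3333) to 2-decimal percentages that total 100.

    Every row except the last is rounded normally; the last row gets
    100 minus the sum of the others, like the CASE expression above.
    """
    rounded = [(f * 100).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
               for f in fractions[:-1]]
    rounded.append(Decimal(100) - sum(rounded))
    return rounded


shares = percentages_summing_to_100([Decimal("0.3333333333")] * 3)
print([str(s) for s in shares])  # ['33.33', '33.33', '33.34']
```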

Giving Range to the SQL Column

I have a SQL table with a Probability column. I want to select one row from it at random, but I want rows with a higher probability to have a greater chance. I can do this with
Order By abs(checksum(newid()))
But the difference between the probabilities is so large that the highest one is favored too much: after that value is picked about 74 times, another value is picked once, and then the first one again around 74 times. I want to reduce this, so that it is picked maybe 3-4 times before the others get a turn. I am thinking of giving ranges to the probabilities, like this:
Row[i] = Row[i-1]+Row[i]
How can I do this? Do I need to create a function? Is there any other way to achieve this? I am a newbie. Any help will be appreciated. Thank you.
EDIT:
I have a solution to my problem, but I have one more question.
If I have a table as follows:
Column1 Column2
1 50
2 30
3 20
Can I get this?
Column1 Column2 Column3
1 50 50
2 30 80
3 20 100
Each time, I want to add the value to the previous total. Is there any way to do that?
UPDATE:
I finally got the solution after 3 hours: I just take the square root of my probabilities, which narrows the difference between them. It is like adding a column with
sqrt(sqrt(sqrt(Probability)))....:-)
I'd handle it by something like
ORDER BY rand()*pow(<probability-field-name>,<n>)
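For different values of n you will distort the linear probabilities into a simple polynomial. Small values of n (e.g. 0.5) compress the probabilities toward 1 and thus make less probable choices more probable; big values of n (e.g. 2) do the opposite and further reduce the probability of already improbable values. Note that in SQL Server, RAND() without a seed returns the same value for every row of a query, so a per-row random value such as RAND(CHECKSUM(NEWID())) is needed there.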
Since the difference in probabilities is too great, you need to add a computed field with a revised weighting that has a more even probability distribution. How you do that depends on your data and preferred distribution. One way to do it is to "normalize" the weighting to an integer between 1 and 10 so that the lowest probability is never more than ten times smaller than the highest.
Answer to your recent question:
SELECT t.Column1,
t.Column2,
(SELECT SUM(t2.Column2)
FROM [table] t2
WHERE t2.Column1 <= t.Column1) Column3
FROM [table] t
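On SQL Server 2012 and later, a windowed SUM with an ORDER BY gives the same running total without the correlated subquery. The sketch below runs that form against SQLite (3.25 or later) from Python's sqlite3 module, purely to show the result; the table name `t` and the function name are illustrative.

```python
import sqlite3


def running_totals(rows):
    """Return (Column1, Column2, Column3) with Column3 = running sum of Column2."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (Column1 INTEGER, Column2 INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(
            "SELECT Column1, Column2, "
            "SUM(Column2) OVER (ORDER BY Column1) AS Column3 "
            "FROM t ORDER BY Column1"
        ).fetchall()
    finally:
        conn.close()


print(running_totals([(1, 50), (2, 30), (3, 20)]))
# [(1, 50, 50), (2, 30, 80), (3, 20, 100)]
```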
Here is a basic example of how to select one row from the table, taking the assigned row weights into account.
Suppose we have this table:
CREATE TABLE TableWithWeights(
Id int NOT NULL PRIMARY KEY,
DataColumn nvarchar(50) NOT NULL,
Weight decimal(18, 6) NOT NULL -- Weight column
)
Let's fill the table with sample data.
INSERT INTO TableWithWeights VALUES(1, 'Frequent', 50)
INSERT INTO TableWithWeights VALUES(2, 'Common', 30)
INSERT INTO TableWithWeights VALUES(3, 'Rare', 20)
This query returns one random row, taking the given row weights into account.
SELECT * FROM
(SELECT tww1.*, -- Select original table data
-- Add column with the sum of all weights of previous rows
(SELECT SUM(tww2.Weight)- tww1.Weight
FROM TableWithWeights tww2
WHERE tww2.id <= tww1.id) as SumOfWeightsOfPreviousRows
FROM TableWithWeights tww1) as tww,
-- Add column with random number within the range [0, SumOfWeights)
(SELECT RAND()* sum(weight) as rnd
FROM TableWithWeights) r
WHERE
(tww.SumOfWeightsOfPreviousRows <= r.rnd)
and ( r.rnd < tww.SumOfWeightsOfPreviousRows + tww.Weight)
To check the query results, we can run it 100 times.
DECLARE @count as int;
SET @count = 0;
WHILE ( @count < 100)
BEGIN
-- This is the query that returns one random row with
-- taking into account given row weights
SELECT * FROM
(SELECT tww1.*, -- Select original table data
-- Add column with the sum of all weights of previous rows
(SELECT SUM(tww2.Weight)- tww1.Weight
FROM TableWithWeights tww2
WHERE tww2.id <= tww1.id) as SumOfWeightsOfPreviousRows
FROM TableWithWeights tww1) as tww,
-- Add column with random number within the range [0, SumOfWeights)
(SELECT RAND()* sum(weight) as rnd
FROM TableWithWeights) r
WHERE
(tww.SumOfWeightsOfPreviousRows <= r.rnd)
and ( r.rnd < tww.SumOfWeightsOfPreviousRows + tww.Weight)
-- Increase counter
SET @count += 1
END
PS: The query was tested on SQL Server 2008 R2. Of course, it can be optimized (that's easy once you get the idea).
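The weighted query gives each row the interval [sum of previous weights, sum of previous weights + weight) and keeps the row whose interval contains a random number in [0, total weight). Here is the same cumulative-weight idea sketched in Python with bisect on the running totals; the `rng` parameter is an addition so the choice can be made deterministic for testing, and the function name is hypothetical.

```python
import bisect
import itertools
import random


def pick_weighted(rows, rng=random.random):
    """Pick one id from (id, weight) rows with probability weight / total.

    Row i owns the interval [cumulative[i] - weight_i, cumulative[i]),
    which is the same test as the WHERE clause of the query above.
    """
    cumulative = list(itertools.accumulate(weight for _, weight in rows))
    r = rng() * cumulative[-1]                  # r lies in [0, total weight)
    index = bisect.bisect_right(cumulative, r)  # first running total > r
    return rows[index][0]


weights = [(1, 50), (2, 30), (3, 20)]
counts = {row_id: 0 for row_id, _ in weights}
for _ in range(100):
    counts[pick_weighted(weights)] += 1
print(counts)  # varies per run; on average about 50 / 30 / 20
```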