Why does my cumulative column not work as expected?

Why does my cumulative column not work as expected? - sql

Here is my query , I have a column called cum_balance which is supposed to calculate the cumulative balance but after row number 10 there is an anamoly and it doesn't work as expected , all I notice is that from row number 10 onwards the hour column has same value. What's the right syntax for this?
[select
hour,
symbol,
amount_usd,
category,
sum(amount_usd) over (
order by
hour asc RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) as cum_balance
from
combined_transfers_usd_netflow
order by
hour][1]
I have tried removing RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW , adding a partition by hour and group by hour. None of them gave the expected result or errors
Row Number
Hour
SYMBOL
AMOUNT_USD
CATEGORY
CUM_BALANCE
1
2021-12-02 23:00:00
WETH
227.2795
in
227.2795
2
2021-12-03 00:00:00
WETH
-226.4801153
out
0.7993847087
3
2022-01-05 21:00:00
WETH
5123.716203
in
5124.515587
4
2022-01-18 14:00:00
WETH
-4466.2366
out
658.2789873
5
2022-01-19 00:00:00
WETH
2442.618599
in
3100.897586
6
2022-01-21 14:00:00
USDC
99928.68644
in
103029.584
7
2022-03-01 16:00:00
UNI
8545.36098
in
111574.945
8
2022-03-04 22:00:00
USDC
-2999.343
out
108575.602
9
2022-03-09 22:00:00
USDC
-5042.947675
out
103532.6543
10
2022-03-16 21:00:00
USDC
-4110.6579
out
98594.35101
11
2022-03-16 21:00:00
UNI
-3.209306045
out
98594.35101
12
2022-03-16 21:00:00
UNI
-16.04653022
out
98594.35101
13
2022-03-16 21:00:00
UNI
-16.04653022
out
98594.35101
14
2022-03-16 21:00:00
UNI
-16.04653022
out
98594.35101
15
2022-03-16 21:00:00
UNI
-6.418612089
out
98594.35101

The "problem" with your data in all the ORDER BY values after row 10 are the same.
So if we shrink the data down a little, and use for groups to repeat the experiment:
with data(grp, date, val) as (
select * from values
(1,'2021-01-01'::date, 10),
(1,'2021-01-02'::date, 11),
(1,'2021-01-03'::date, 12),
(2,'2021-01-01'::date, 20),
(2,'2021-01-02'::date, 21),
(2,'2021-01-02'::date, 22),
(2,'2021-01-04'::date, 23)
)
select d.*
,sum(val) over ( partition by grp order by date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as cum_val_1
,sum(val) over ( partition by grp order by date ) as cum_val_2
from data as d
order by 1,2;
we get:
GRP
DATE
VAL
CUM_VAL_1
CUM_VAL_2
1
2021-01-01
10
10
10
1
2021-01-02
11
21
21
1
2021-01-03
12
33
33
2
2021-01-01
20
20
20
2
2021-01-02
21
63
63
2
2021-01-02
22
63
63
2
2021-01-04
23
86
86
we see with group 1 that values accumulate as we expect. So for group 2 we put duplicate values as see those rows get the same value, but rows after "work as expected again".
This tells us how this function work across unstable data (values that sort the same) is that they are all stepped in one leap.
Thus if you want each row to be different they will need better ORDER distinctness. This could be forced by add random values of literal random nature, or feeling non random ROW_NUMBER, but really they would be random, albeit not explicit, AND if you use random, you might get duplicates, thus really should use ROW_NUMBER or SEQx to have unique values.
Also the second formula shows they are equal, and it's the ORDER BY problem not the framing of "which rows" are used.
with data(grp, date, val) as (
select * from values
(1,'2021-01-01'::date, 10),
(1,'2021-01-02'::date, 11),
(1,'2021-01-03'::date, 12),
(2,'2021-01-01'::date, 20),
(2,'2021-01-02'::date, 21),
(2,'2021-01-02'::date, 22),
(2,'2021-01-04'::date, 23)
)
select d.*
,seq8() as s
,sum(val) over ( partition by grp order by date ) as cum_val_1
,sum(val) over ( partition by grp order by date, s ) as cum_val_2
,sum(val) over ( partition by grp order by date, seq8() ) as cum_val_3
from data as d
order by 1,2;
gives:
GRP
DATE
VAL S
CUM_VAL_1
CUM_VAL_2
CUM_VAL_2_2
1
2021-01-01
10
0
10
10
1
2021-01-02
11
1
21
21
1
2021-01-03
12
2
33
33
2
2021-01-01
20
3
20
20
2
2021-01-02
21
4
63
41
2
2021-01-02
22
5
63
63
2
2021-01-04
23
6
86
86

Related

Rolling min/max in SQL Query?

The database engine is SQLite3. It's a simple table:
CREATE TABLE T (ID INTEGER, DATE STRING, VALUE NUMERIC);
-- rows of T:
id date value
1 2020-01-01 11
2 2020-01-01 23
3 2020-01-01 32
4 2020-01-01 41
5 2020-01-01 57
6 2020-01-01 62
How can I create a rolling min/max? Say of period 3:
id date val min3 max3
1 2020-01-01 11 11 11
2 2020-01-01 23 11 11
3 2020-01-01 32 11 32
4 2020-01-01 41 23 41
5 2020-01-01 57 32 57
5 2020-01-01 62 41 62
I keep getting min 11 Max 62 for everything because I don't know how to do the rolling min/max

You can use window functions:
select t.*,
min(val) over (order by date rows between 2 preceding and current row) min3,
max(val) over (order by date rows between 2 preceding and current row) max3
from t;

How to get a row number or ID of where the MAX() value was found

Good day community.
I'm having a hard time trying to figure out a way to achieve the results I try to get. As im not very skilled with SQL queries, I start to lose my mind. What I'm trying to do is to find the highest and lowest grade on a particular test, but I also wish to get the ID or the row number (they are matching) of the rows where the MAX() and MIN() were found.
The table "Results" looks like this:
ResultID|Test_UK|Test_US|TestUK_Scr|TestUS_Scr|TestTakenOn
1 1 3 85 14 2018-11-22 00:00:00.000
2 3 1 41 94 2018-11-23 00:00:00.000
3 2 4 71 54 2018-11-24 00:00:00.000
4 4 2 51 52 2018-12-25 00:00:00.000
5 6 3 74 69 2018-12-01 00:00:00.000
6 3 6 83 57 2018-12-02 00:00:00.000
7 7 4 91 98 2018-12-03 00:00:00.000
8 4 7 88 22 2018-12-04 00:00:00.000
9 5 8 41 76 2018-12-08 00:00:00.000
10 8 5 37 64 2018-12-09 00:00:00.000
The results I get when I run my query...
TestID|TopScore|LowScore|LastDateTestTaken
1 94 85 2018-11-23 00:00:00.000
2 71 52 2018-11-25 00:00:00.000
3 83 14 2018-12-02 00:00:00.000
4 98 51 2018-12-04 00:00:00.000
5 64 41 2018-12-09 00:00:00.000
6 74 57 2018-12-02 00:00:00.000
7 91 22 2018-12-04 00:00:00.000
8 76 37 2018-12-09 00:00:00.000
This is the queries I'm working on.
This query returns the results mentioned above
WITH
-- Combine the results of UK and US tests
Combined_Results_Both_Tests AS(
select ResultID as resultID, Test_UK as TestID, Test_UK_Scr as TestScore, TestTakenOn as TestDate from Results
union all
select ResultID as resultID, Test_US as TestID, Test_US_Scr as TestScore, TestTakenOn as TestDate from Results),
--Gets TOP and WORST results of the tests, LastDateTaken (Needs to add ResultID!)
Get_Best_and_Worst_Results_And_LastTestDate AS(
SELECT TestID ,max(TestScore) AS TopScore ,min(TestScore) AS LowScore ,max(TestDate) AS LastDateTestTaken
FROM Combined_Results_Both_Tests
GROUP BY TestID)
--Final query execution
SELECT * FROM Get_Best_and_Worst_Results_And_LastTestDate
I've tried to achieve my desired results with something like this, which doesn't work and is also very inefficient. What I mean that it doesn't work, it is filled with dublicates, whenever the match is found on US and UK tests.
--Gets ReslutID of Min and Max values
Get_ResultID_Of_Results AS(
SELECT * FROM Get_Best_and_Worst_Results_And_LastTestDate A
CROSS APPLY
(SELECT ResultID FROM Results res
WHERE (A.TestID = res.Test_UK AND A.TopScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_US AND A.TopScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_UK AND A.LowScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_US AND A.LowScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_UK AND A.TopScore = res.Test_US_Scr) OR
(A.TestID = res.Test_US AND A.TopScore = res.Test_US_Scr) OR
(A.TestID = res.Test_UK AND A.LowScore = res.Test_US_Scr) OR
(A.TestID = res.Test_US AND A.LowScore = res.Test_US_Scr)) D)
SELECT * FROM Get_ResultID_Of_Results
This is the results I'm trying to achieve (extra columns that would state where Max value and Min value was found) that would state the ResultID from Results table. Also, the row numbers match the ResultIDs in the table.
TestID|TopScore|LowScore|LastDateTestTaken |MaxValueLocID|MinValueLocID|
1 94 85 2018-11-23 00:00:00.000 2 1
2 71 52 2018-11-25 00:00:00.000 3 4
3 83 14 2018-12-02 00:00:00.000 6 1
4 98 51 2018-12-04 00:00:00.000 7 4
5 64 41 2018-12-09 00:00:00.000 10 9
6 74 57 2018-12-02 00:00:00.000 5 6
7 91 22 2018-12-04 00:00:00.000 7 8
8 76 37 2018-12-09 00:00:00.000 9 10
Asking for any help with the solution, theoretical or even practical. Thank you!

If I follow correctly, you want to unpivot the data and aggregate:
select v.testid, max(v.score), min(v.score) max(v.TestTakenOn)
from results r cross apply
(values (Test_UK, TestUK_Scr, TestTakenOn),
(Test_US, TestUS_Scr, TestTakenOn)
) v(testid, score, TestTakenOn)
group by v.testid;
Then you can modify this using window functions:
select v.testid, max(v.score), min(v.score) max(v.TestTakenOn),
max(case when seqnum_desc = 1 then resultid end) as resultid_max,
max(case when seqnum_asc = 1 then resultid end) as resultid_min
from (select r.resultid, v.*,
row_number() over (partition by v.testid order by v.score asc) as seqnum_asc,
row_number() over (partition by v.testid order by v.score desc) as seqnum_desc
from results r cross apply
(values (Test_UK, TestUK_Scr, TestTakenOn),
(Test_US, TestUS_Scr, TestTakenOn)
) v(testid, score, TestTakenOn)
) v
group by v.testid;

with allScores (TestId, Score, TestTakenOn, valueLoc) as
(
select [Test_UK], [TestUK_Scr],[TestTakenOn], ResultId from scores
union all
select [Test_US], [TestUS_Scr],[TestTakenOn], ResultId from scores
),
maxMin (TestId, MaxScore, MinScore, LastTestDate) as (
select TestId, Max(score), Min(score), Max(TestTakenOn)
from allScores
group by TestId
)
select mm.*, a1.valueLoc as MaxValueLoc, a2.ValueLoc as MinValueLoc
from maxMin mm
inner join allScores a1
on mm.TestId = a1.TestId and mm.MaxScore = a1.score
inner join allScores a2
on mm.TestId = a2.TestId and mm.MinScore = a2.score;
DBFiddle demo

How to calculate a running total that is a distinct sum of values

Consider this dataset:
id site_id type_id value date
------- ------- ------- ------- -------------------
1 1 1 50 2017-08-09 06:49:47
2 1 2 48 2017-08-10 08:19:49
3 1 1 52 2017-08-11 06:15:00
4 1 1 45 2017-08-12 10:39:47
5 1 2 40 2017-08-14 10:33:00
6 2 1 30 2017-08-09 07:25:32
7 2 2 32 2017-08-12 04:11:05
8 3 1 80 2017-08-09 19:55:12
9 3 2 75 2017-08-13 02:54:47
10 2 1 25 2017-08-15 10:00:05
I would like to construct a query that returns a running total for each date by type. I can get close with a window function, but I only want the latest value for each site to be summed for the running total (a simple window function will not work because it sums all values up to a date--not just the last values for each site). So I guess it could be better described as a running distinct total?
The result I'm looking for would be like this:
type_id date sum
------- ------------------- -------
1 2017-08-09 06:49:47 50
1 2017-08-09 07:25:32 80
1 2017-08-09 19:55:12 160
1 2017-08-11 06:15:00 162
1 2017-08-12 10:39:47 155
1 2017-08-15 10:00:05 150
2 2017-08-10 08:19:49 48
2 2017-08-12 04:11:05 80
2 2017-08-13 02:54:47 155
2 2017-08-14 10:33:00 147
The key here is that the sum is not a running sum. It should only be the sum of the most recent values for each site, by type, at each date. I think I can help explain it by walking through the result set I've provided above. For my explanation, I'll walk through the original data chronologically and try to explain the expected result.
The first row of the result starts us off, at 2017-08-09 06:49:47, where chronologically, there is only one record of type 1 and it is 50, so that is our sum for 2017-08-09 06:49:47.
The second row of the result is at 2017-08-09 07:25:32, at this point in time we have 2 unique sites with values for type_id = 1. They have values of 50 and 30, so the sum is 80.
The third row of the result occurs at 2017-08-09 19:55:12, where now we have 3 sites with values for type_id = 1. 50 + 30 + 80 = 160.
The fourth row is where it gets interesting. At 2017-08-11 06:15:00 there are 4 records with a type_id = 1, but 2 of them are for the same site. I'm only interested in the most recent value for each site so the values I'd like to sum are: 30 + 80 + 52 resulting in 162.
The 5th row is similar to the 4th since the value for site_id:1, type_id:1 has changed again and is now 45. This results in the latest values for type_id:1 at 2017-08-12 10:39:47 are now: 30 + 80 + 45 = 155.
Reviewing the 6th row is also interesting when we consider that at 2017-08-15 10:00:05, site 2 has a new value for type_id 1, which gives us: 80 + 45 + 25 = 150 for 2017-08-15 10:00:05.

You can get a cumulative total (running total) by including an ORDER BY clause in your window frame.
select
type_id,
date,
sum(value) over (partition by type_id order by date) as sum
from your_table;
The ORDER BY works because
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

SELECT type_id,
date,
SUM(value) OVER (PARTITION BY type_id ORDER BY type_id, date) - (SUM(value) OVER (PARTITION BY type_id, site_id ORDER BY type_id, date) - value) AS sum
FROM your_table
ORDER BY type_id,
date

Difference between rows in sql based on rows number (SQL Server)

I have a problem. I have table with following columns and sample data:
RN Date Time
---------------------
1 2015-02-02 12
2 2015-02-02 25
3 2015-02-02 27
1 2015-02-08 42
2 2015-02-08 45
1 2015-03-01 60
2 2015-03-01 62
3 2015-03-01 63
4 2015-03-01 63
I need get a difference between time start and time end of every day.
For example:
27-12
45-42
63-60
Any suggestions? :)

select
Date, max(Time) as mx, min(Time) as mn, max(Time) - min(Time) as diff
from table_name
group by Date

Calculate wrong amount in query

I have a table with some records now want to repeat this table content with some logic. I have two date start date and termination date, means record start from start_date and end on termination date, it will working fine but problem is calculate amount on it,
Logic is amount calculation formula
basesalary / 12 * ( SUTARate / 100 ) * ( x.num+1)
if this amount is less than SUTAMaximumAmount this amount is used, else 0. And one more thing if amount will be remain and year is complete then restart calculation from next year.. x.num is temporary table which hold 90 number from 1 to 90
Table
BaseSalary| S_Date | T_Date | SUTARate| SUTAMaximumAmount |A_S_Percent
48000 | 7-1-2013 | 3-15-2015 | 1.1 | 300 | 5
My result is
DAte amount
2013-07-01 00:00:00.000 44
2013-08-01 00:00:00.000 44
2013-09-01 00:00:00.000 44
2013-10-01 00:00:00.000 44
2013-11-01 00:00:00.000 44
2013-12-01 00:00:00.000 44
2014-01-01 00:00:00.000 36
2014-02-01 00:00:00.000 -8
2014-03-01 00:00:00.000 -52
2014-04-01 00:00:00.000 -96
2014-05-01 00:00:00.000 -140
2014-06-01 00:00:00.000 -184
2014-07-01 00:00:00.000 -228
2014-08-01 00:00:00.000 -272
2014-09-01 00:00:00.000 -316
2014-10-01 00:00:00.000 -360
2014-11-01 00:00:00.000 -404
2014-12-01 00:00:00.000 -448
2015-01-01 00:00:00.000 -492
2015-02-01 00:00:00.000 -536
2015-03-01 00:00:00.000 -580
and I want result like this
Date | Amount
7-1-2013 44
8-1-2013 44
9-1-2013 44
10-1-2013 44
11-1-2013 44
12-1-2013 44
1-1-2014 44
2-1-2014 44
3-1-2014 44
4-1-2014 44
5-1-2014 44
6-1-2014 44
7-1-2014 36
1-1-2015 44
2-1-2015 44
3-1-2015 44
Query
SELECT dateadd(M, (x.num),d.StartDate) AS TheDate,
Round( case when ((convert(float,d.SUTARate)/100* convert(integer,d.BaseSalary) / 12)*(x.num+1)) <=CONVERT(money,d.SUTAMaximumAmount)
then (convert(float,d.SUTARate)/100* convert(integer,d.BaseSalary)* / 12)
else (CONVERT(money,d.SUTAMaximumAmount)-((convert(float,d.SUTARate)/100* (convert(integer,d.BaseSalary) / 12)*x.num)))*Power((1+convert(float,d.AnnualSalaryIncreasePercent)/100),Convert(int,x.num/12)) end, 2) AS Amount,
FROM #Table AS x, myTbl AS d
WHERE (x.num >= 0) AND (x.num <= (DateDiff(M, d.StartDate, d.TerminationDate)) )
temporary table
create TABLE #Table (
num int NOT NULL,
);
;WITH Nbrs ( n ) AS (
SELECT 0 UNION ALL
SELECT 1 + n FROM Nbrs WHERE n < 99 )
INSERT #Table(num)
SELECT n FROM Nbrs
OPTION ( MAXRECURSION 99 )
this table used as x in above query

I created this SQLFiddle.
-- Numbers table is probably a good idea
WITH Nbrs ( num ) AS
(
SELECT 0 UNION ALL
SELECT 1 + num FROM Nbrs WHERE num < 99
)
-- All columns, except for 'num' come from myTbl
SELECT dateadd(M, (num),S_Date) AS TheDate,
Round(
CASE
WHEN (SUTARate / 100) * (BaseSalary / 12) <= SUTAMaximumAmount
THEN (SUTARate / 100) * (BaseSalary / 12)
ELSE 0
END
, 2) As Amount
-- This may be the number you were trying to multiply
,DatePart(Month, dateadd(M, (num),S_Date)) As PotentialMultiiplier
FROM Nbrs AS x, myTbl AS d
WHERE (num >= 0)
AND (num <= (DateDiff(M, S_Date, T_Date)) )
I am not entirely sure what your goal is, but you are probably on the right track with a numbers table. Because the result you are going for does not change much over time (i.e., nearly every month has an amount of $44), it is difficult to determine the correct code for the query. So, I recommend you provide a different set of data for better result-checking.
If you fiddle with the SQL in the provided link, you can re-post with better code, and then we can better solve your issue.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Why does my cumulative column not work as expected? - sql

Related

Rolling min/max in SQL Query?

How to get a row number or ID of where the MAX() value was found

How to calculate a running total that is a distinct sum of values

Difference between rows in sql based on rows number (SQL Server)

Calculate wrong amount in query

Categories

Resources