Referencing previous row value using a window function - sql

I'd like to be able to reference a previous row's value (which is itself calculated from the row before it) and use that value in the next column's calculation.
It would be something like the following, where "prev" is the result of the previous row's computation:
select sum(price * qty + prev) over (order by date) from temp
Here is example data. Note that the 4th row of the entry column needs to reference the 3rd row of realized_pnl:
cumu_usd  cumu_stock  entry         realized_pnl
100       1           100           0
300       5           60            0
150       3           50            (150/2 - 60) * 2 = 30
200       4           (200+30)/4    30
entry = (cumu_usd + realized_pnl) / cumu_stock, where realized_pnl is taken from the previous row
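Window functions such as SUM() OVER cannot read a value they computed for the previous row, so this kind of carried-forward calculation is usually written as a recursive CTE that walks the rows one at a time. Below is a minimal sketch run through Python's built-in sqlite3 module, not a drop-in answer. Assumptions: a hypothetical table trades, a hypothetical seq column numbering the rows in date order, a hypothetical sell_price column, and an assumed rule that realized_pnl accumulates (sell_price - previous entry) * shares sold whenever cumu_stock drops. With sell_price = 150/2 = 75 on row 3, that rule reproduces the numbers in the example; substitute your real formula.

```python
import sqlite3

# Rows from the question. seq (date order) and sell_price are HYPOTHETICAL
# columns; sell_price = 150 / 2 = 75 mirrors the row-3 arithmetic above.
ROWS = [
    (1, 100, 1, None),
    (2, 300, 5, None),
    (3, 150, 3, 75),
    (4, 200, 4, None),
]

QUERY = """
WITH RECURSIVE calc(seq, cumu_usd, cumu_stock, entry, realized_pnl) AS (
    -- Anchor row: there is no previous realized_pnl yet.
    SELECT seq, cumu_usd, cumu_stock, cumu_usd / cumu_stock, 0.0
    FROM trades
    WHERE seq = 1
    UNION ALL
    -- Each step joins the next input row to the row computed just before it.
    SELECT t.seq,
           t.cumu_usd,
           t.cumu_stock,
           (t.cumu_usd + c.realized_pnl) / t.cumu_stock,
           -- ASSUMED rule: realize P&L on the shares sold when the position shrinks.
           c.realized_pnl + CASE
               WHEN t.cumu_stock < c.cumu_stock
               THEN (t.sell_price - c.entry) * (c.cumu_stock - t.cumu_stock)
               ELSE 0
           END
    FROM calc AS c
    JOIN trades AS t ON t.seq = c.seq + 1
)
SELECT seq, entry, realized_pnl FROM calc ORDER BY seq
"""


def compute(rows):
    con = sqlite3.connect(":memory:")
    # REAL columns so the divisions in QUERY are not integer divisions.
    con.execute(
        "CREATE TABLE trades (seq INTEGER PRIMARY KEY, cumu_usd REAL, "
        "cumu_stock REAL, sell_price REAL)"
    )
    con.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()


for row in compute(ROWS):
    print(row)
```

On the sample rows this prints (1, 100.0, 0.0), (2, 60.0, 0.0), (3, 50.0, 30.0) and (4, 57.5, 30.0), i.e. entry on row 4 is (200+30)/4.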

Related

Find max and last value from a Google Sheets query skipping x rows

I have a data set in Google Sheets; for each week of data I have 3 rows. I want to query the data in every second row to calculate the max value and the last value.
For instance:
ROW  DATA
1    800
2    Text
3    500
4    More text
5    600
6    Blah
7    700
8    Blah
For the max value I have the following, which returns 800:
MAX(FILTER(QUERY(A1:A,"Select * skipping 2"), QUERY(A1:A,"Select * skipping 2") <> 0))
How do I change it to return the last value, which should be 700?
try:
=LOOKUP(2^99,FILTER(A:A,A:A<>0))
rockinfreakshow's answer will successfully find the last number.
To take every nth row of a range, you can use:
=FILTER(A:A,MOD(ROW(A:A),n)=1)
Replace n with the step you want, and 1 with the position of the row you want within each block of n rows: 1 for the first, 2 for the second, or 0 for the nth. To find the max, just wrap it in MAX().
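A quick way to check what MOD(ROW(A:A),n)=1 keeps is to replay the selection in Python on the sample column (rows are 1-based, as in Sheets). This only illustrates the logic and is not a Sheets formula; the helper name every_nth is hypothetical.

```python
# Sample column A from the question; list index 0 is row 1.
DATA = [800, "Text", 500, "More text", 600, "Blah", 700, "Blah"]


def every_nth(values, n, k):
    """Keep values whose 1-based row number r satisfies r % n == k.

    Mirrors FILTER(A:A, MOD(ROW(A:A), n) = k): k = 1 keeps the first row
    of each block of n rows, k = 0 keeps the nth row.
    """
    return [v for r, v in enumerate(values, start=1) if r % n == k]


numbers = every_nth(DATA, 2, 1)
print(numbers)       # [800, 500, 600, 700]
print(max(numbers))  # 800, like wrapping the FILTER in MAX()
```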
To find the last one, even if it's a text or number, you can use SORTN and SEQUENCE:
=SORTN(FILTER(A:A,MOD(ROW(A:A),n)=1,A:A<>""),1,1, SEQUENCE(COUNTA(FILTER(A:A,MOD(ROW(A:A),n)=1,A:A<>""))),0)
It sorts the filtered elements in reverse order and keeps only the first one.
Remember to replace n with the step and =1 with the position of the row you want.
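The SORTN trick above amounts to: filter, number the survivors with SEQUENCE, sort by that number descending, keep one. A Python sketch of the same steps on the sample data (the helper name is hypothetical; the two trailing empty strings stand in for the empty rows an open-ended A:A range picks up, which the A:A<>"" condition removes):

```python
# Sample column plus two empty cells, as an open-ended A:A range would include.
DATA = [800, "Text", 500, "More text", 600, "Blah", 700, "Blah", "", ""]


def last_of_every_nth(values, n, k):
    # FILTER step: every nth row (MOD(ROW, n) = k), skipping blank cells.
    picked = [v for r, v in enumerate(values, start=1) if r % n == k and v != ""]
    # SORTN step: sort by position (the SEQUENCE column) descending and keep
    # the first element. In plain Python, picked[-1] would do the same.
    ranked = sorted(enumerate(picked, start=1), key=lambda pair: pair[0], reverse=True)
    return ranked[0][1] if ranked else None


print(last_of_every_nth(DATA, 2, 1))  # 700
print(last_of_every_nth(DATA, 2, 0))  # Blah (works for text too)
```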

Grouping rows so a column sums to no more than 10 per group

I have a table that looks like:
col1
------
2
2
3
4
5
6
7
with values sorted in ascending order.
I want to assign each row to groups with labels 0,1,...,n so that each group has a total of no more than 10. So in the above example it would look like this:
col1 |label
------------
2 0
2 0
3 0
4 1
5 1
6 2
7 3
I tried using this:
floor(sum(col1) OVER (ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) / 10)
But this doesn't work correctly, because it performs the operations as:
floor(2/10) = 0
floor([2+2]/10) = 0
floor([2+2+3]/10) = 0
floor([2+2+3+4]/10) = 1
floor([2+2+3+4+5]/10) = 1
floor([2+2+3+4+5+6]/10) = 2
floor([2+2+3+4+5+6+7]/10) = 2
It's all coincidentally correct until the last calculation, because even though
[2+2+3+4+5+6+7] / 10 = 2.9
and
floor(2.9) = 2
what it should do is realise that 6+7 > 10, so the 7th row (value 7) needs to be in its own group: increment the group number by 1 and allocate this row to the new group.
What I really want is: when it encounters a sum > 10, set group number = group number + 1, allocate the CURRENT ROW to this new group, and then set the new start row to the CURRENT ROW.
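The rule just described is a sequential scan. A plain Python sketch of it (the function name greedy_labels is hypothetical) makes explicit the state that a running SUM() OVER cannot express: the running total resets whenever a new group starts.

```python
def greedy_labels(values, cap=10):
    """Label rows in order, starting a new group whenever adding the
    current row would push the group's running total above cap."""
    labels = []
    group = 0
    running = 0
    for v in values:
        if labels and running + v > cap:
            group += 1   # open a new group...
            running = 0  # ...whose total starts from the current row
        running += v
        labels.append(group)
    return labels


print(greedy_labels([2, 2, 3, 4, 5, 6, 7]))  # [0, 0, 0, 1, 1, 2, 3]
```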
This is too long for a comment.
Solving this problem requires scanning the table row by row. In SQL, this would be done with a recursive CTE (or a hierarchical query). Hive supports neither of these.
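For engines that do support recursive CTEs (not Hive, as noted), the scan can be written in SQL. A sketch run through Python's sqlite3 module, assuming a table t and a precomputed 1-based row number rn in col1 order (both hypothetical):

```python
import sqlite3

# rn is a HYPOTHETICAL 1-based row number in col1 order; in an engine with
# window functions it could come from ROW_NUMBER() OVER (ORDER BY col1).
QUERY = """
WITH RECURSIVE grp(rn, col1, running, label) AS (
    SELECT rn, col1, col1, 0 FROM t WHERE rn = 1
    UNION ALL
    SELECT t.rn,
           t.col1,
           -- Reset the running total when the current row would overflow 10.
           CASE WHEN g.running + t.col1 > 10 THEN t.col1 ELSE g.running + t.col1 END,
           CASE WHEN g.running + t.col1 > 10 THEN g.label + 1 ELSE g.label END
    FROM grp AS g
    JOIN t ON t.rn = g.rn + 1
)
SELECT col1, label FROM grp ORDER BY rn
"""


def label_groups(values):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (rn INTEGER PRIMARY KEY, col1 INTEGER)")
    con.executemany(
        "INSERT INTO t VALUES (?, ?)",
        list(enumerate(sorted(values), start=1)),
    )
    return con.execute(QUERY).fetchall()


print(label_groups([2, 2, 3, 4, 5, 6, 7]))
# [(2, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 2), (7, 3)]
```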
The issue is that each time a group is defined, the difference between 10 and the sum is "forgotten". That is, when you are further down in the list, what happens earlier on is not a simple accumulation of the available data. You need to know how it was split into groups.
A related problem is solvable. The related problem would fill groups to a total of exactly 10, splitting a row between two groups when it straddles a boundary. Then you would know which group a later row is in based only on the cumulative sum of the previous rows.
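For that related problem, a row's group follows directly from running totals, which is why a window SUM() OVER would be enough there. A Python sketch, assuming positive integer values and groups holding exactly 10 units (the helper name split_groups is hypothetical): each row starts in group floor(previous cumulative sum / 10) and ends in the group holding its last unit, so a row whose two numbers differ is split across a boundary.

```python
def split_groups(values, size=10):
    """Return (first_group, last_group) for each row when groups hold exactly
    `size` units. Assumes positive integer values."""
    spans = []
    before = 0  # cumulative sum of the previous rows
    for v in values:
        first = before // size
        last = (before + v - 1) // size  # group holding this row's last unit
        spans.append((first, last))
        before += v
    return spans


print(split_groups([2, 2, 3, 4, 5, 6, 7]))
# [(0, 0), (0, 0), (0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
```

On the sample data, the row with value 4 spans groups 0 and 1, and the row with value 6 spans groups 1 and 2.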

Calculate percentage between two values

I have two columns of numbers, and I am trying to calculate the percentage difference between them and show the result in another column, but the results seem to be wrong.
This is the code in question.
SELECT
GenPar.ParameterValue AS ClaimType,
COUNT(Submitted.ClaimNumber) AS SubmittedClaims,
COUNT(ApprovalProvision.ClaimNumber) AS ApprovedClaims,
COUNT(Declined.ClaimNumber) AS DeclinedClaims,
COUNT(Pending.ClaimNumber) AS PendingClaims,
ISNULL(SUM(SubmittedSum.SumInsured),0) AS TotalSubmittedSumInsured,
ISNULL(SUM(ApprovedSum.SumInsured),0) AS TotalApprovedSumInsured,
ISNULL(SUM(RejectedSum.SumInsured),0) AS TotalRejectedSumInsured,
ISNULL(SUM(PendingSum.SumInsured),0) AS TotalPendingSumInsured,
--This column is to show the diff in %
CASE WHEN COUNT(Submitted.ClaimNumber) <> 0 AND COUNT(ApprovalProvision.ClaimNumber) <> 0
THEN (COUNT(ApprovalProvision.ClaimNumber),0) - (COUNT(Submitted.ClaimNumber),0)
/COUNT(Submitted.ClaimNumber) * 100
ELSE 0
END
What I need is to show the percentage difference between the SubmittedClaims and ApprovedClaims columns. Either column, or both, may or may not contain zeros.
So it's: COUNT(Submitted.ClaimNumber) - COUNT(ApprovalProvision.ClaimNumber) / COUNT(Submitted.ClaimNumber) * 100 as far as I know.
I have tried this. For example, it takes 1 and 117 and returns 17, when the difference between 1 and 117 is a decrease of 99.15%. Another example is 2 and 100: this simply returns 0, whereas the difference is a decrease of 98%.
I've checked this link, "Percentage difference between two values", and it seems to be what I am doing.
I've also tried this code:
NULLIF(COUNT(Submitted.ClaimNumber),0) - NULLIF(COUNT(ApprovalProvision.ClaimNumber),0)
/ NULLIF(COUNT(Submitted.ClaimNumber),0) * 100
and this takes, for example, 2 and 100 and returns -4998, when the real difference is a decrease of 98%.
For completeness, Submitted.ClaimNumber comes from this portion of code:
LEFT OUTER JOIN (SELECT * FROM Company.Schema.ClaimMain WHERE CurrentStatus=10)Submitted
ON Submitted.ClaimNumber = ClaimMain.ClaimNumber
ApprovalProvision.ClaimNumber is this:
LEFT OUTER JOIN (SELECT * FROM Company.Schema.ClaimMain WHERE CurrentStatus=15)ApprovalProvision
ON ApprovalProvision.ClaimNumber = ClaimMain.ClaimNumber
Ideally, this column would also deal with 0s. If the original value is 0 and the other is X, the result should be 0, since a percentage can't be calculated when the original number is 0. If the original value is X and the new value is 0, it should show a decrease of 100%.
This will occur across all columns but there is no need to flood the page with the rest of the columns since all calculations will occur in the same manner.
Can anybody see what I'm doing wrong?
I'm not familiar with the (x,0) syntax you're using.
But I see that you have
(COUNT(ApprovalProvision.ClaimNumber),0) - (COUNT(Submitted.ClaimNumber),0)
/COUNT(Submitted.ClaimNumber) * 100
Shouldn't it be:
( COUNT(ApprovalProvision.ClaimNumber) - COUNT(Submitted.ClaimNumber) )
/COUNT(Submitted.ClaimNumber) * 100
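As written, division and multiplication bind more tightly than subtraction, so it computes COUNT(ApprovalProvision.ClaimNumber) - 100: COUNT(Submitted.ClaimNumber) divided by itself is 1, and 1 times 100 is 100. That is why 1 and 117 gives 17 (117 - 100), and 2 and 100 gives 0.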
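The precedence explanation can be checked directly. The sketch below evaluates the expressions with Python's sqlite3 module, assuming the stray (x,0) wrappers are ignored and dropping the NULLIF calls (NULLIF leaves non-zero values unchanged); the parameters :submitted and :approved stand in for the two COUNT(...) results. SQLite, like SQL Server, does integer division when both operands are integers, and the results reproduce the 17, 0 and -4998 from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Without parentheses, / and * bind tighter than -, so this is
# approved - ((submitted / submitted) * 100) = approved - 100.
AS_WRITTEN = "SELECT :approved - :submitted / :submitted * 100"
# The NULLIF attempt has the same shape with the operands swapped.
NULLIF_ATTEMPT = "SELECT :submitted - :approved / :submitted * 100"
# Parenthesising the difference gives the intended percentage change.
PARENTHESISED = "SELECT (:approved - :submitted) / :submitted * 100"


def run(sql, submitted, approved):
    params = {"submitted": submitted, "approved": approved}
    return con.execute(sql, params).fetchone()[0]


print(run(AS_WRITTEN, 1, 117))      # 17
print(run(AS_WRITTEN, 2, 100))      # 0
print(run(NULLIF_ATTEMPT, 2, 100))  # -4998
print(run(PARENTHESISED, 2, 100))   # 4900
```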
With the parentheses in place, 2 and 100 gives 4900, and that number actually sounds right. Let's take the following example: you have 2 apples, then you're given 98 more, so you end up with 100 apples.
An increase of 98% would mean that, starting from 2 apples, you would have 3.96 apples.
An increase of 100% means that from 2 apples you end up with 4 apples. An increase of 1000% means that from 2 apples you end up with 22 apples. So 4000% means you end up with 82 apples, and 5000% means that from 2 apples you reach 102 apples.
(100 - 2) / 2 * 100 = 98 / 2 * 100 = 49 * 100 = 4900, so there is a 4900% increase in the number of apples if you start with 2 apples and reach 100.
Now if you flip the 2 and the 100, starting with 100 and ending with 2:
(2 - 100) / 100 * 100 = -98, so a -98% change in apples, or a 98% decrease.
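Putting it together with the zero rules from the question, a sketch (again via sqlite3, with :submitted as the original value and :approved as the new one, standing in for the two COUNT(...) expressions) could look like this. Multiplying by 100.0 before dividing avoids integer division: COUNT() returns an integer in SQL Server, and dividing integers first truncates, e.g. (2 - 100) / 100 * 100 gives 0 instead of -98.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Percentage change from submitted (original) to approved (new), following
# the rules in the question: original 0 -> 0, new 0 -> -100 (a 100% decrease).
# The 100.0 literal forces real (non-integer) division.
PCT_CHANGE = """
SELECT CASE
           WHEN :submitted = 0 THEN 0
           ELSE (:approved - :submitted) * 100.0 / :submitted
       END
"""


def pct_change(submitted, approved):
    params = {"submitted": submitted, "approved": approved}
    return con.execute(PCT_CHANGE, params).fetchone()[0]


print(pct_change(2, 100))  # 4900.0
print(pct_change(100, 2))  # -98.0
print(pct_change(0, 5))    # 0
print(pct_change(5, 0))    # -100.0
# Integer division pitfall: dividing before multiplying by 100 truncates.
print(con.execute("SELECT (2 - 100) / 100 * 100").fetchone()[0])  # 0
```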
Hope this solves your problem.

Get the average of column1 based on the range between the first value (0) and last value (0) of column2 in QlikView

I am very new to Qlikview.
I need to get the average of the GoodPages column over the range between the first row where YellowCalculated is 0 (the 3rd row here) and the last row where it is 0 (the 10th row here). Note that the data is not static, so the 0 values in the YellowCalculated column can appear in any row.
GoodPages YellowCalculated
315 0.35
320 0.25
300 0 -- First Value as 0 found in 3rd row
200 0.37
250 0.17
315 0.18
350 0
345 0.68
355 0.57
325 0 -- Last Value as 0 found in 10th row
275 0.27
Not sure how you want to present the result. Here is one way.
Create a straight table with dimensions ID and GoodPages. You will get the average with this expression: avg( {<_flag_avg = {1}>} GoodPages). The total average will show what you are after. You can also add another expression: only(YellowCalculated)
//Script:
Data:
Load
RowNo() as ID,
*,
if(YellowCalculated = 0,1) as _flag_zero
;
LOAD * INLINE [
GoodPages, YellowCalculated
315,0.35
320,0.25
300,0
200,0.37
250,0.17
315,0.18
350,0
345,0.68
355,0.57
325,0
275,0.27
];
minmax:
load
min(ID) as minRow,
max(ID) as maxrow
Resident Data
where _flag_zero = 1;
test:
IntervalMatch(ID)
Load
minRow,
maxrow
Resident minmax;
LEFT JOIN(Data)
LOAD
ID,
1 as _flag_avg
Resident test;
drop table test;
drop table minmax;
drop field _flag_zero;
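To check the script's result, here is the same logic as a plain Python sketch (not QlikView; names are hypothetical): find the first and last rows where YellowCalculated is 0 and average GoodPages over that range. Both endpoints are included, matching IntervalMatch, whose intervals are closed.

```python
GOOD_PAGES = [315, 320, 300, 200, 250, 315, 350, 345, 355, 325, 275]
YELLOW = [0.35, 0.25, 0, 0.37, 0.17, 0.18, 0, 0.68, 0.57, 0, 0.27]


def avg_between_zeros(good_pages, yellow):
    zero_rows = [i for i, y in enumerate(yellow) if y == 0]
    if not zero_rows:
        return None
    first, last = zero_rows[0], zero_rows[-1]  # like min(ID) / max(ID)
    window = good_pages[first:last + 1]        # inclusive of both zero rows
    return sum(window) / len(window)


print(avg_between_zeros(GOOD_PAGES, YELLOW))  # 305.0
```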

How to add a metric for top n count in MDX

I want to create a metric that will show only the top 10 results whenever it is applied.
Suppose the data is
item Price
A 20
B 45
C 50
D 80
E 10
F 90
G 85
H 55
I 40
J 100
I want to show the top 5 results in descending order. So the expected result is:
j-100
f-90
g-85
d-80
h-55
other-165
I am already getting the result with the following MDX query:
With
Set [Top10] AS
(TOPCOUNT({ORDER( ({[DimProduct].[item].[All].Children})
,([Measures].[Price]),BDESC)},10))
MEMBER [DimProduct].[item].[OtherAll] AS
(avg({EXCEPT([DimProduct].[item].Members, [Top10])})
)
Select
[Measures].[Price] on Columns,
{
[Top10]
,[DimProduct].[item].[OtherAll]
} on Rows
FROM [testcube]
Result:
j-100
f-90
g-85
d-80
h-55
other-165
I basically want to create a metric from the above query and save it to my cube solution.
So when I drag item and Price, it shows all the data, i.e. all 10 rows:
A 20
B 45
C 50
D 80
E 10
F 90
G 85
H 55
I 40
J 100
And when we drag our newly created metric, it should show the top 5 results plus an "other" row (other will be the sum of the rest of the rows):
j-100
f-90
g-85
d-80
h-55
other-165
Is there any way to achieve this functionality?
Edit 1
Created a dynamic set with the top 10.
Created a calculated measure for the others.
Created another dynamic set to show both results, i.e. the top 10 and the others.
But when we select the dynamic set that shows top 10 + others, it throws the error:
A set has been encountered that can not contain calculated members
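For reference, the numbers the metric should return can be computed outside the cube. A plain Python sketch (not MDX; names are hypothetical) that sorts by Price descending, keeps the top n and sums the rest into an "other" row. Note that it sums the remainder, as described above, whereas [OtherAll] in the query uses avg().

```python
PRICES = {"A": 20, "B": 45, "C": 50, "D": 80, "E": 10,
          "F": 90, "G": 85, "H": 55, "I": 40, "J": 100}


def top_n_with_other(prices, n):
    ranked = sorted(prices.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:n]
    other = sum(price for _, price in ranked[n:])  # sum of the remaining rows
    return top + [("other", other)]


for item, price in top_n_with_other(PRICES, 5):
    print(f"{item}-{price}")
# J-100, F-90, G-85, D-80, H-55, other-165
```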