I'm taking a set of transactions and amounts, and I want to create a new amount column, with the following logic --
Check a running total of (new) amounts thus far.
If adding this amount to the previous total would bring the total to less than zero, the new amount field should be zero. Otherwise, it should be equal to the old amount.
Here's an example of what I'm looking for --
Item  Record  Old amount  New Amount  Running Total
1     1       100         100         100
1     2       -100        -100        0
1     3       -200        0           0
1     4       500         500         500
1     5       -300        -300        200
1     6       300         300         500
My running total starts at zero.
My first amount is 100, and that doesn't take the total < 0, so I pass it through and set the new amount to 100.
My second amount is -100, and that doesn't take my running total of 100 to < 0, so I set the new amount to -100.
My third amount is -200. That would take the running total of 0 to -200, < 0. Thus, I set the new amount to 0.
My fourth amount is 500. It gets passed through.
My fifth amount is -300. That would take the running total of 500 to 200, which is still >= 0. It gets passed through.
My sixth amount is 300. It gets passed through, leaving me with a final amount total of 500.
The difficult part is in cases like record five here. In order to know that it won't take the final running total below zero, you need to have already calculated the new total for record 3.
I think you can do this by setting up common table expressions in order to make a recursive query, but I've foundered on how exactly to create that. If possible, I'd like to avoid cursors.
This is a window function solution with a wrapping CASE expression.
Look up LAG.
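For reference, the rule from the question can be written as a plain sequential loop. This is a minimal Python sketch, not the SQL solution being asked for, and the function name `clamp_running_total` is made up for illustration. It shows why each row depends on the new amounts already computed for the rows before it.

```python
def clamp_running_total(amounts):
    """Return (new_amount, running_total) pairs for a list of amounts.

    An amount is passed through unless adding it to the running total
    of new amounts would take that total below zero; in that case the
    new amount is zero.
    """
    total = 0
    rows = []
    for amount in amounts:
        new_amount = amount if total + amount >= 0 else 0
        total += new_amount
        rows.append((new_amount, total))
    return rows


# The six records from the question's example.
rows = clamp_running_total([100, -100, -200, 500, -300, 300])
```

Run on the example data, `rows` holds the New Amount and Running Total columns from the table in the question.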
I'm trying to figure out how to add the values of one column (the amount column) to the next few rows based on the condition of another column (the days column). If the condition of the days column is greater than 1, for each day greater than 1 I add the amount column to that many following rows. So if days is three, I add the amount to the next two rows (the first day is just the current row). I actually think this is easier if I make a copy of the amount column, so I made a copy called backlog.
Say the amount column represents the number of support tickets that need to be resolved each day, and each amount has a number of days it takes to be resolved. I need the total to be the sum of today's value and the outstanding tickets. So if I have an amount of 1 for 2 days, I have 1 ticket today and I add that same 1 to tomorrow's ticket amount. If this doesn't make sense, the examples below will. I have a solution as well, but my main issue is doing this efficiently.
Here is a sample dataframe to use:
import random
import numpy as np
import pandas as pd

amount = list(np.zeros(10)) + [random.randint(1,3) for val in range(15)]
random.shuffle(amount)
ex = pd.DataFrame({
    'Amount': amount
})
ex.loc[ex['Amount']>0, 'Days'] = [random.randint(0,4) for val in range(15)]
ex.loc[ex['Amount']==0, 'Days'] = 0
ex['Days'] = ex['Days'].astype(int)
ex['Backlog'] = ex['Amount']
ex.head(10)
Input Dataframe:
Amount  Days  Backlog
2       0     2
1       3     1
2       2     2
3       0     3
Desired Output Dataframe:
Amount  Days  Backlog
2       0     2
1       3     1
2       2     3
3       0     6
In the last two values of the backlog column, I have a value of 3 (2 from the current day amount plus 1 from the prior day amount) and a value of 6 (3 for the current day + 2 from the previous day amount + 1 from two days ago).
I have made code for this below, which I think achieves the outcome:
for i in range(0, len(ex['Amount'])):
    Days = ex['Days'].iloc[i]
    if Days >= 2:
        for j in range (1,Days):
            if (i+j)>= len(ex['Amount']):
                break
            ex['Backlog'].iloc[i+j] += ex['Amount'].iloc[i]
The problem is that I'm already using two for loops to slice the data frame for two features first, so when this code is used as a function for a very large data frame it runs far too slowly, and my main goal has been to implement a faster way to do this. Is there a more efficient pandas method to achieve the same outcome? Possibly without having to use slow iteration or a nested for loop? I'm at a loss.
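One possible way to avoid the nested loop is a difference array: mark where each amount starts and stops being carried forward, then take a cumulative sum. The sketch below is an illustration under assumptions, not a tested drop-in replacement. The function name `add_backlog` is hypothetical, and it assumes `Days` holds non-negative integers and that rows are already in day order.

```python
import numpy as np
import pandas as pd


def add_backlog(df):
    """Return a copy of df with Backlog = Amount + amounts carried forward.

    A row with Days >= 2 adds its Amount to the following Days - 1 rows,
    stopping at the end of the frame.
    """
    amount = df["Amount"].to_numpy(dtype=float)
    days = df["Days"].to_numpy(dtype=int)
    n = len(df)

    idx = np.flatnonzero(days >= 2)         # rows that carry forward
    start = idx + 1                         # first row receiving the amount
    stop = np.minimum(idx + days[idx], n)   # one past the last receiving row

    # delta has n + 1 slots so that start == n or stop == n is a valid index.
    delta = np.zeros(n + 1)
    np.add.at(delta, start, amount[idx])
    np.add.at(delta, stop, -amount[idx])

    carried = np.cumsum(delta[:n])
    return df.assign(Backlog=amount + carried)


sample = pd.DataFrame({"Amount": [2, 1, 2, 3], "Days": [0, 3, 2, 0]})
result = add_backlog(sample)
```

`np.add.at` is used rather than plain fancy-index assignment because several rows can start or stop at the same position, and those contributions need to accumulate. On the four-row example from the question this gives a Backlog of 2, 1, 3, 6.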
I have two columns that hold numbers, and I am trying to calculate the percentage difference between them and show the result in another column, but the results seem to be wrong.
This is the code in question.
SELECT
    GenPar.ParameterValue AS ClaimType,
    COUNT(Submitted.ClaimNumber) AS SubmittedClaims,
    COUNT(ApprovalProvision.ClaimNumber) AS ApprovedClaims,
    COUNT(Declined.ClaimNumber) AS DeclinedClaims,
    COUNT(Pending.ClaimNumber) AS PendingClaims,
    ISNULL(SUM(SubmittedSum.SumInsured),0) AS TotalSubmittedSumInsured,
    ISNULL(SUM(ApprovedSum.SumInsured),0) AS TotalApprovedSumInsured,
    ISNULL(SUM(RejectedSum.SumInsured),0) AS TotalRejectedSumInsured,
    ISNULL(SUM(PendingSum.SumInsured),0) AS TotalPendingSumInsured,
    --This column is to show the diff in %
    CASE WHEN COUNT(Submitted.ClaimNumber) <> 0 AND COUNT(ApprovalProvision.ClaimNumber) <> 0
        THEN (COUNT(ApprovalProvision.ClaimNumber),0) - (COUNT(Submitted.ClaimNumber),0)
            /COUNT(Submitted.ClaimNumber) * 100
        ELSE 0
    END
What I need is to show the difference in % between the columns SubmittedClaims and ApprovedClaims. Either column, or both, may contain zeroes.
So it's: COUNT(Submitted.ClaimNumber) - COUNT(ApprovalProvision.ClaimNumber) / COUNT(Submitted.ClaimNumber) * 100 as far as I know.
I have tried this and an example of what it does is it takes 1 and 117 and returns 17 when the difference between 1 and 117 is a decrease of 99.15%. Another example is 2 and 100. This simply returns 0 whereas the difference is a decrease of 98%.
CASE WHEN COUNT(Submitted.ClaimNumber) <> 0 AND COUNT(ApprovalProvision.ClaimNumber) <> 0
    THEN (COUNT(ApprovalProvision.ClaimNumber),0) - (COUNT(Submitted.ClaimNumber),0)
        /COUNT(Submitted.ClaimNumber) * 100
    ELSE 0
END
I've checked this link and this seems to be what I am doing.
Percentage difference between two values
I've also tried this code:
NULLIF(COUNT(Submitted.ClaimNumber),0) - NULLIF(COUNT(ApprovalProvision.ClaimNumber),0)
/ NULLIF(COUNT(Submitted.ClaimNumber),0) * 100
and this takes for example 2 and 100 and returns -4998 when the real difference is a decrease of 98%.
For completeness, Submitted.ClaimNumber is this portion of code:
LEFT OUTER JOIN (SELECT * FROM Company.Schema.ClaimMain WHERE CurrentStatus=10)Submitted
ON Submitted.ClaimNumber = ClaimMain.ClaimNumber
ApprovalProvision.ClaimNumber is this:
LEFT OUTER JOIN (SELECT * FROM Company.Schema.ClaimMain WHERE CurrentStatus=15)ApprovalProvision
ON ApprovalProvision.ClaimNumber = ClaimMain.ClaimNumber
Ideally, this column would also deal with zeros. If the original value is 0 and the other is X, the result should be 0, since a percentage can't be calculated if the original number is 0. If the original value is X and the new value is 0, I should show a decrease of 100%.
This will occur across all columns but there is no need to flood the page with the rest of the columns since all calculations will occur in the same manner.
Anybody see what I'm doing wrong?
I'm not sure why you have (x,0) as syntax.
But I see that you have
(COUNT(ApprovalProvision.ClaimNumber),0) - (COUNT(Submitted.ClaimNumber),0)
/COUNT(Submitted.ClaimNumber) * 100
Shouldn't it be
( COUNT(ApprovalProvision.ClaimNumber) - COUNT(Submitted.ClaimNumber) )
/COUNT(Submitted.ClaimNumber) * 100
As written, it computes COUNT(ApprovalProvision.ClaimNumber) - 100, because division and multiplication bind tighter than subtraction: the Submitted.ClaimNumber count divided by itself is 1, and 1 times 100 is 100.
The 4900 number actually sounds right. Let's take the following example: you have 2 apples, and then you're given 98 more, so you have 100 apples.
An increase of 98% would have meant from 2 apples, you would have 3.96 apples.
An increase of 100% means from 2 apples you end with 4 apples. An increase of 1000% means from 2 apples you end with 22 apples. So 4000% means you end with 82 apples. 5000% means from 2 apples, you reach 102 apples.
(100-2)/2*100 = 98/2*100 = 49*100 = 4900, so there is a 4900% increase in the number of apples if you start with 2 apples and reach 100.
Now if you flip the 2 and 100, starting with 100 and ending with 2,
(2-100)/100*100 = -98, so a -98% change in apples, or a 98% decrease.
Hope this solves your problem.
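The arithmetic above, together with the zero handling the question asks for, can be sketched in Python as follows. The function name `percent_change` and its argument names are made up for illustration. The multiplication is done before the division so the grouping is explicit. In SQL Server the same ordering matters for another reason: dividing one integer by another truncates, so -98 / 100 is 0 there unless one operand is made decimal first (for example by multiplying by 100.0).

```python
def percent_change(original, new):
    """Percentage change from `original` to `new`.

    Returns 0 when `original` is 0, following the question's rule that
    no percentage can be calculated from a zero starting value.
    """
    if original == 0:
        return 0.0
    # Subtract first, then scale, then divide by the starting value.
    return (new - original) * 100 / original


increase = percent_change(2, 100)    # the apples example, 2 -> 100
decrease = percent_change(100, 2)    # flipped, 100 -> 2
to_zero = percent_change(50, 0)      # everything lost
from_zero = percent_change(0, 50)    # undefined, reported as 0
```

With these inputs the results are 4900, -98, -100 and 0 respectively, which matches the worked numbers above and the zero rules in the question.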
I have a cube I've built with three separate measures: "TY Sales", "LY Sales", and "% Change". What I'm trying to do is have special behavior for the aggregate rows: basically, not including any "LY Sales" values when summing the total if "TY Sales" is 0. Currently my cube works like below:
          LYSales  TYSales  %Change
Year 1    450      300      -33%
Week 1    100      125      +25%
Week 2    150      175      +14%
Week 3    200      0        +0%
The aggregate row "Year 1" in this example sums all values for each sales measure. What I want it to do instead is only include values in LYSales if TYSales also has a non-zero value. So my ideal state would be below:
          LYSales  TYSales  %Change
Year 1    250      300      +20%
Week 1    100      125      +25%
Week 2    150      175      +14%
Week 3    200      0        +0%
I'm new to SSAS, so any guidance is appreciated. Thanks
An easy and reliable way to achieve that would be to change the source column of LYSales to be zero if TYSales is zero. This would be done in the fact table on which the measure is based. You could implement that
either in the ETL process, changing the LYSales column values to be zero when TYSales is zero,
or in a view based on the fact table that is then used in the Data Source View instead of the original table,
or as a Named Calculation on the fact table in the Data Source View.
In the latter two cases, the calculation formula would be SQL like this:
case when TYSales <> 0 then LYSales else 0 end
Then switch the measure definition to use that column.
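To check the effect of that rule on the example data outside the cube, here is a small Python sketch. The function name `year_totals` and the tuple layout are assumptions made for illustration; it only mirrors the `case when` expression above and is not SSAS code.

```python
def year_totals(weeks):
    """Aggregate (ly_sales, ty_sales) pairs into LY total, TY total, % change.

    LY sales are only counted for weeks where TY sales are non-zero,
    which mirrors: case when TYSales <> 0 then LYSales else 0 end
    """
    ly_total = sum(ly for ly, ty in weeks if ty != 0)
    ty_total = sum(ty for ly, ty in weeks)
    change = (ty_total - ly_total) * 100 / ly_total if ly_total else 0.0
    return ly_total, ty_total, change


# Week 1, Week 2 and Week 3 from the question, as (LYSales, TYSales).
totals = year_totals([(100, 125), (150, 175), (200, 0)])
```

For the three weeks in the question this gives an LY total of 250, a TY total of 300 and a change of +20%, the ideal "Year 1" row.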
So I need an idea of how to divide out an amount of money into actual counts of various bills and coinage. I know this is confusing, so let me give an example:
$16.32 - Sixteen dollars and thirty-two cents
One $10 bill
One $5 bill
One $1 bill
One Quarter ($0.25)
One Nickel ($0.05)
Two Pennies ($0.01)
So as you can see, we're just getting the number of bills and coinage that goes into a value, which will change according to user input.
Here's my current setup (Visual Basic):
If 100 Mod amount < 0 Then
    If 50 Mod amount < 0 Then
        ' Continue this pattern until you get all the way down to the end ($0.01)
    Else
        While amount > 50
            fiftiesAmount += 1
            amount -= 50
        End While
    End If
Else
    While amount > 100
        hundredsAmount += 1
        amount -= 100
    End While
End If
Basically, each If statement determines whether or not your total amount needs an extra billing amount of that type, and then either adds to the amount of bills/coinage already created or moves on to the next amount.
Is this an efficient way of doing things, or am I missing out on an easier/faster algorithm/pattern that would make my life, and whoever is reading my code's life easier?
If you need extra details, I'll be happy to edit the question as needed.
Convert your amount to cents (it's easier). Divide by the currency value being tested, and then deduct that amount from the balance (pseudo-code):
Value = CInt(16.32 * 100) ' Convert to cents
If Value >= 10000 Then ' Hundreds
    Hundreds = Value \ 10000 ' How many? (integer division)
    Value = Value - (Hundreds * 10000) ' Reduce amount accordingly
End If
If Value >= 5000 Then ' Fifties
    Fifties = Value \ 5000
    Value = Value - (Fifties * 5000)
End If
If Value >= 2000 Then ' Twenties
    Twenties = Value \ 2000
    Value = Value - (Twenties * 2000)
End If
Repeat for the remaining bills until you have less than 100, at which point you start with coins (50, 25, 10, 5).
Once you've got less than 5, you've reached pennies; save them, reduce Value by that amount, and Value is zero, so you're finished.
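The same idea can be written as one loop over a list of denominations instead of one If block per bill. This is a minimal Python sketch rather than Visual Basic. The function name `break_down` is made up, and the denomination list is an assumption based on the bills and coins in the question's example (add 50 for half-dollars if you want them). The amount is passed as a string so that `Decimal` can convert it to cents without floating-point error.

```python
from decimal import Decimal

# Denominations in cents, largest first (assumed set, taken from the example).
DENOMINATIONS_CENTS = [10000, 5000, 2000, 1000, 500, 100, 25, 10, 5, 1]


def break_down(amount):
    """Return {denomination_in_cents: count} for an amount such as "16.32"."""
    cents = int(Decimal(amount) * 100)
    counts = {}
    for value in DENOMINATIONS_CENTS:
        # divmod gives how many of this denomination fit and what is left over.
        count, cents = divmod(cents, value)
        if count:
            counts[value] = count
    return counts


example = break_down("16.32")
```

For "16.32" this yields one each of 1000, 500, 100, 25 and 5 cents plus two of 1 cent, matching the breakdown in the question.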
Here is my data (apologies for poor formatting, maybe that should have been my first question!):
Customer  Percentage Increase
1         2%
2         12%
3         -50%
4         87%
5         -20%
6         -1%
7         123%
8         -98%
9         10%
10        13%
I created a pivot table in Excel with Percentage Increase as the Row Labels and Count of Customer as the value.
Row Labels   Count of Customer
-98%         1
-50%         1
-20%         1
-1%          1
2%           1
10%          1
12%          1
13%          1
87%          1
123%         1
Grand Total  10
I then wanted to group the percentages to something easier to read, but the percentage ranges do not show percentages, instead they show regular numbers.
Row Labels   Count of Customer
<-0.5        1
-0.5--0.25   1
-0.25-0      2
0-0.25       4
0.75-1       1
>1           1
Grand Total  10
How do I get the number formatting of my percentage ranges to be percentages, i.e. 0% - 25%, etc?
Thank you.
The only way I know is to change them by hand, i.e., click into the cell with the label and change it to say what you want.
If you do this, I'd also create the labels for the categories that don't show up yet, e.g., 25% to 50% (and 50% to 75%) in your example. To do this, choose Field Settings>Layout & Print and check "Show items with no data". Change the labels for those ranges as well. Once you do, you can uncheck "Show items with no data", and in the future if there are counts in new ranges the new labels will still be what you entered. (At least it seems to work that way).