Given a column of growth factors and a starting value, I need to compute future values. Each value depends on the previous one: Value(t2) = Value(t1) x Growth_Factor(t2), with the base case Value(t1) = Starting_Value x Growth_Factor(t1). For example, with a starting value of 1 the computed 'Value' column would be as shown below.
How do I compute this in SQL (or Presto), where each computed value depends on previously computed values?
Growth Factor   Value   Time
1.2             1.2     1
1.1             1.32    2
1.5             1.98    3
1.7             3.366   4
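You could sum the logarithms and take the exponential at the end. This works apart from the possibility of small floating-point error. Multiplying the factors directly is not error-free either: the number of decimal places roughly doubles with every multiplication, so rounding error appears once you multiply more than a few numbers. The expression below gives the running product of the growth factors; multiply it by the starting value if that is not 1.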
exp(
sum(ln(growth)) over (order by time)
)
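For anyone who wants to check the log-sum trick outside the database, here is a minimal Python sketch. The function name `cumulative_values` and the `start` parameter are illustrative assumptions, not part of the answer above, and it assumes every growth factor is strictly positive, since the logarithm is undefined otherwise.

```python
import math
from itertools import accumulate


def cumulative_values(growth_factors, start=1.0):
    """Running product of growth factors, computed as exp(running sum of logs)."""
    # Mirrors exp(sum(ln(growth)) over (order by time)); assumes all factors > 0.
    log_sums = accumulate(math.log(g) for g in growth_factors)
    # Scale by the starting value, which the SQL expression leaves implicit as 1.
    return [start * math.exp(s) for s in log_sums]


# Growth factors from the question's example, ordered by time.
values = cumulative_values([1.2, 1.1, 1.5, 1.7])
print([round(v, 3) for v in values])
```

Up to rounding, the printed values should match the 'Value' column in the question.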
I have a column containing measurement values in meters.
I want to round them up (ceil) to the next 100 m and return the result as a km value.
The special case: if the original value is already a "round" number (a multiple of 100 m), it should still be rounded up to the next 100 m increment (see line 3 in the example below).
Example:
meter_value kilometer_value
1111 1.2
111 0.2
1000 1.1
I think I can get the first two lines by doing:
ceil(meter_value/1000,1) as kilometer_value
The solution I thought of to fix the edge case in line three is to just add 1 meter always:
ceil((meter_value+1)/1000,1) as kilometer_value
It seems a bit clumsy. Is there a better way or an alternative function to achieve this?
You can check to see if it's divisible by 100 and only add one if it is:
ceil(((meter_value + iff(meter_value % 100 = 0, 1, 0))/1000), 1)
This avoids the inaccuracy of always adding 1 when decimal parts are allowed: for example, adding 1 to 999.5 gives 1000.5, which would round up to 1.1 instead of 1.0.
Greg's answer is good. To me, a version that is simpler to read would be:
divide by 100
floor
add 1
ceil
divide by 10
select
column1 as meter_value
,ceil(((meter_value + iff(meter_value % 100 = 0, 1, 0))/1000), 1) as greg
,ceil(floor(meter_value/100)+1)/10 as simeon
from values
(1111)
,(111)
,(1000)
,(1)
,(0)
;
METER_VALUE   GREG   SIMEON
1,111         1.2    1.2
111           0.2    0.2
1,000         1.1    1.1
1             0.1    0.1
0             0.1    0.1
Do we want to mention negative values? It is a distance, so it is a directionless magnitude, right?
Anyway, with negative values the +1 in both of our methods makes the boundary case wrong.
Actually, once you have floored, adding 1 (if you divided by 100 first) or 0.1 (if you divided by 1000 first) already gives the result, so you don't need the ceil at all.
Thus the two short forms (version_b and version_c, shown next to the original as version_a) can be:
,ceil(floor(meter_value/100)+1)/10 as version_a
,(floor(meter_value/100)+1)/10 as version_b
,floor(meter_value/1000,1)+0.1 as version_c
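The floor-then-add-one idea behind version_b can be sketched in Python to check the boundary cases. The function name `ceil_to_next_100m_km` is a hypothetical helper, and the sketch assumes non-negative inputs, as discussed above.

```python
def ceil_to_next_100m_km(meter_value):
    """Kilometre value of the next 100 m step strictly above meter_value."""
    # Count the completed 100 m steps (floor), add one step,
    # then convert steps of 0.1 km into kilometres.
    # Assumes meter_value >= 0; negative values are not handled.
    return (meter_value // 100 + 1) / 10


# Same test values as in the query above.
results = {m: ceil_to_next_100m_km(m) for m in (1111, 111, 1000, 1, 0)}
print(results)
```

Because the value is floored before the step is added, an exact multiple of 100 such as 1000 still moves up to the next increment.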
I need to calculate some measures on a window of my dataframe, with the value of interest in the centre of the window. To be clearer, here is an example: if I have a dataset of 10 rows and a window size of 2, then at the 5th row I need to compute, for example, the mean of the values in the 3rd, 4th, 5th, 6th and 7th rows.
At the first row there are no previous rows, so I need to use only the following ones (in the example, the mean of the 1st, 2nd and 3rd rows). If there are some rows but not enough, I need to use all the rows that are present (for example, at the 2nd row I will use the 1st, 2nd, 3rd and 4th).
How can I do that? As the title of my question suggests, my first idea was to count the number of rows preceding and following the current one, but I don't know how to do that. I am not tied to this method, so if you have a better suggestion, feel free to share it.
What you want is a rolling mean with min_periods=1, center=True:
import pandas as pd

df = pd.DataFrame({'col': range(10)})
N = 2  # number of rows before/after to include
df['rolling_mean'] = df['col'].rolling(2*N+1, min_periods=1, center=True).mean()
output:
col rolling_mean
0 0 1.0
1 1 1.5
2 2 2.0
3 3 3.0
4 4 4.0
5 5 5.0
6 6 6.0
7 7 7.0
8 8 7.5
9 9 8.0
I assume that you have target_row and window_size as inputs. You want to perform an operation on the window_size rows either side of target_row in a dataframe df, and I gather from your question that you already know you can't just take +/- the window size, because that might run past the ends of the dataframe. Instead, clamp the start and end rows to the dataframe bounds, and then pull out the window you want:
start_row = max(target_row - window_size, 0)
end_row = min(target_row + window_size, len(df)-1)
window = df.iloc[start_row:end_row+1,:]
Then you can perform whatever operation you want on the window such as taking an average with window.mean().
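The same clamping idea can be applied to every row in turn. Below is a minimal pure-Python sketch that works on a plain list rather than a dataframe; the function name `centered_window_mean` is a hypothetical helper, not a pandas API.

```python
def centered_window_mean(values, window_size):
    """Mean of each element together with up to window_size neighbours per side."""
    means = []
    for i in range(len(values)):
        # Clamp the window to the bounds of the list, as in the snippet above.
        start = max(i - window_size, 0)
        end = min(i + window_size, len(values) - 1)
        window = values[start:end + 1]
        means.append(sum(window) / len(window))
    return means


means = centered_window_mean(list(range(10)), 2)
print(means)
```

For the ten values 0 to 9 with a window size of 2, this should give the same means as the centred rolling mean in the other answer.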
I've been learning T-SQL and need some help converting CPU MIPS into a percentage.
My code already returns the data I expect. In addition, I want to add a column that gives the CPU%. I have a column with the total CPU MIPS and want to use it in the code, but expressed as a percentage. For example, I have these values in my TOTAL CPU column:
1623453.66897
0
0
2148441.01573933
3048946.946314
I want to convert these values into percentage and use them. I couldn't find much info on the internet.
Appreciate your response.
I assume that you have 5 numeric quantities (2 of them zero) and you want to find the percentage each of them represents of the sum of all five. Is that right?
To find the percentage a particular number contributes to the sum, multiply the number by 100 and divide by the sum; the result is that number's share of the total.
The sum: 6820841.631023
The percentage of the first number (of MIPS):
1623453.668970 * 100 / 6820841.631023 = 23.80136876 =>
23.80136876% is the percentage of CPU used by the first program.
To put the answer in SQL form, with Mips_Table as the view/table that contains the MIPS data:
select mips, mips/totMips*100 Pct_CPU
from Mips_Table,
(select sum(mips) TotMips from Mips_Table) k
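The arithmetic can also be checked outside SQL. This is a minimal Python sketch using the values from the question; the function name `percent_of_total` is hypothetical, and returning zeros when the total is zero is an assumption made here to avoid a division by zero.

```python
def percent_of_total(values):
    """Share of each value in the sum of all values, as a percentage."""
    total = sum(values)
    if total == 0:
        # Assumption: with no CPU usage at all, report 0% for every row.
        return [0.0 for _ in values]
    return [v * 100 / total for v in values]


# TOTAL CPU values from the question.
mips = [1623453.66897, 0, 0, 2148441.01573933, 3048946.946314]
pct = percent_of_total(mips)
print([round(p, 4) for p in pct])
```

The percentages add up to 100, and the zero rows stay at 0%.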
I have data like this
EmployeeID Value
1 7
2 6
3 5
4 3
I would like to create a DAX calculated column (or do I need a measure?) that gives me for each row, Value - AVG() of selected rows.
So if the AVG() of the above 4 rows is 5.25, I would get results like this
EmployeeID Value Diff
1 7 1.75
2 6 0.75
3 5 -0.25
4 3 -2.25
I am still learning DAX and cannot figure out how to implement this.
Thanks
I figured this out with the help of some folks on MSDN forums.
This only works as a measure, because measures are selection-aware while calculated columns are not.
Storing the average in a variable is critical. ALLSELECTED() gives you the current selection in the pivot table.
AVERAGEX then evaluates, row by row, the row's value minus the average of the selection.
Diff:=
VAR ptAVG = CALCULATE(AVERAGE(Employee[Value]), ALLSELECTED())
RETURN AVERAGEX(Employee, Employee[Value] - ptAVG)
You can certainly do this with a calculated column. It's simply
Diff = TableName[Value] - AVERAGE(TableName[Value])
Note that this averages over all employees. If you want to average over only specific groups, then more work needs to be done.
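The difference between averaging over the current selection and averaging over all rows can be illustrated outside DAX. This is a minimal Python sketch of the arithmetic only, not of how the DAX engine evaluates filter context; the function name `diff_from_selection_mean` and the `selected_ids` parameter are hypothetical.

```python
def diff_from_selection_mean(values_by_id, selected_ids):
    """Each selected value minus the mean of the selected values."""
    selected = [values_by_id[i] for i in selected_ids]
    avg = sum(selected) / len(selected)
    return {i: values_by_id[i] - avg for i in selected_ids}


# EmployeeID -> Value, from the question.
employee_values = {1: 7, 2: 6, 3: 5, 4: 3}

# All four rows selected: the mean is 5.25.
all_rows = diff_from_selection_mean(employee_values, [1, 2, 3, 4])
# Only employees 1 and 2 selected: the mean changes to 6.5.
subset = diff_from_selection_mean(employee_values, [1, 2])
print(all_rows, subset)
```

With all rows selected, employee 1 is 1.75 above the mean; with only employees 1 and 2 selected, the same employee is 0.5 above it, which is the behaviour a selection-aware measure is meant to capture.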
I have a cube built on a fact which, amongst others, includes the Balance and Percentage columns. I have a calculation which multiplies the Balance by the Percentage to obtain an Adjusted Value. I now need to have this Adjusted Value divided by the sum of all balances, to get weighted values.
The problem is that this sum of all balances doesn't apply to the whole dataset. Rather, it should be calculated on a filtered subset of the data. The filtering is done in Excel using a pivot table, so I do not know in advance which conditions will be used to filter.
So, for example, this is the pivot I'd like to see:
ID Balance Percentage Adjusted Value Weighted Adjusted Value
1 100 1.5 115 0.38 (ie 115/300)
2 50 2 51 0.17 (ie 51/300)
3 150 1 150 0.50 (ie 150/300)
300 is obtained by summing the balance of the rows that show in the filtered pivot.
Can this calculation somehow be done in OLAP? Or is it impossible to compute this sum with what I know?
Yes, this should be possible. For example, assuming 1, 2 and 3 are the children of a common parent, the following calculated measure should do the trick:
WAV = AV / ( id.parent, Balance )
If not we would need more information about the actual data model and query.
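The intended arithmetic, independent of the cube, can be sketched in Python using the rows from the question's pivot. The function name `weighted_adjusted_values` and the dictionary keys are hypothetical; the sketch only shows dividing each adjusted value by the balance total of the visible rows, not how an OLAP engine would resolve the filter.

```python
def weighted_adjusted_values(rows):
    """Adjusted value of each visible row divided by the total visible balance."""
    # The total is taken over the rows passed in, i.e. the filtered subset.
    total_balance = sum(r["balance"] for r in rows)
    return {r["id"]: r["adjusted"] / total_balance for r in rows}


# Rows visible in the filtered pivot, copied from the question.
visible_rows = [
    {"id": 1, "balance": 100, "adjusted": 115},
    {"id": 2, "balance": 50, "adjusted": 51},
    {"id": 3, "balance": 150, "adjusted": 150},
]
wav = weighted_adjusted_values(visible_rows)
print({k: round(v, 2) for k, v in wav.items()})
```

Passing a different subset of rows changes the denominator, which is the behaviour the pivot filter should produce.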