Why does the max function aggregate? - sap

I want to return the maximum value in a column in a block with values for 13 periods.
Max([CountVariable]) returns the value against each period.
Max([CountVariable] forAll([Period])) returns the sum of all values.
This is what I am getting:
Period CountVariable Max([CountVariable]) Max([CountVariable] forAll([Period]))
1 10 10 45
2 15 15 45
3 20 20 45
This is what I'd like:
Period CountVariable Max
1 10 20
2 15 20
3 20 20

You're close. To get the maximum for all periods, you need to set the variable to evaluate in the output context. You do this by specifying the context operator (ForAll) outside of the Max() function. So:
Max([CountVariable]) forAll([Period])
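As a rough analogy only (this is Python, not Web Intelligence syntax, and the `rows` list simply mirrors the sample block in the question), the difference between evaluating the maximum inside each period and evaluating it across all periods looks like this:

```python
# Sample block from the question: (Period, CountVariable)
rows = [(1, 10), (2, 15), (3, 20)]

# Maximum evaluated within each period's own context: there is only one
# value per period, so the "maximum" is just that row's value.
per_period_max = {
    period: max(v for p, v in rows if p == period) for period, _ in rows
}

# Maximum evaluated across all periods: one overall value, repeated on every row.
overall_max = max(value for _, value in rows)

for period, value in rows:
    print(period, value, per_period_max[period], overall_max)
```

The last printed column is the one the question asks for: the same overall maximum on every row.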

Related

Group rows using the cumulative sum of a third column

I have a table with two columns:
sort_column = A column I use for sorting
value_column = My metric of interest (a positive integer)
Using SQL, I need to create contiguous groups of rows, ordered by sort_column, such that the sum of value_column within each group is the largest possible but staying below 100 (100 not included).
Find below an example of my desired result.
Thanks
sort_column value_column desired_result
1 53 1
2 25 1
3 33 2
4 25 2
5 10 2
6 46 3
7 9 3
8 49 4
9 48 4
10 53 5
11 33 5
12 52 6
13 29 6
14 16 6
15 66 7
16 1 7
17 62 8
18 57 9
19 47 10
20 12 10
OK, after a few lengthy attempts I concluded that the task cannot be done in BigQuery with pure SQL. Each value of the desired column depends on previous values of that same column, in a way that cannot be derived from the first two columns alone, so the problem needs a recursive CTE, which BigQuery did not support when I wrote this.
I solved the issue by writing a JavaScript UDF for the task. It seems to be working fine and produces the expected results.
Many thanks everyone!
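For readers who want to see the sequential logic spelled out, here is a minimal sketch in Python. It is not the JavaScript UDF mentioned above (that code was not posted); the function name `assign_groups` and the `limit` parameter are made up for illustration, and the values are assumed to arrive already ordered by sort_column.

```python
def assign_groups(values, limit=100):
    """Greedy pass: open a new group when adding the next value
    would bring the running sum to `limit` or above."""
    groups = []
    group_id = 1
    running_sum = 0
    for value in values:
        # Only split if the current group already holds something;
        # a single oversized value still gets a group of its own.
        if running_sum and running_sum + value >= limit:
            group_id += 1
            running_sum = 0
        running_sum += value
        groups.append(group_id)
    return groups


# value_column from the question, in sort_column order
values = [53, 25, 33, 25, 10, 46, 9, 49, 48, 53,
          33, 52, 29, 16, 66, 1, 62, 57, 47, 12]
print(assign_groups(values))
```

The running sum is the state that each row inherits from the previous one, which is why the task needs either recursion or a procedural loop.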

Why does Pandas df.mode() return a zero before the actual modal value?

When I run df.mode() on the below dataframe I get a leading zero before the expected output. Why is that?
df
sample 1 2 3 4 5 6 7 8 9 10
zone run
2 5 14 12 22 23 24 22 23 22 23 23
print(df.iloc[:,3:10].mode(axis=1))
gives
0
zone run
2 5 23
expecting
zone run
2 5 23
pd.Series.mode
Return the mode(s) of the dataset. Always returns Series even if only one value is returned.
So that's how it is by design. A Series must have an index and it will start counting from 0. This ensures that the return type is stable regardless of whether there is only a single mode or multiple values tied for the mode.
So if you take a slice where values are tied for the mode, the result has one label per tied value: the labels 0, ..., N-1 index the N modal values in sorted order. With DataFrame.mode(axis=1), as below, those labels appear as column names.
df.iloc[:, 4:7]
#sample 5 6 7
#zone run
#2 5 24 22 23
df.iloc[:,4:7].mode(axis=1)
# 0 1 2 # <- 3 values tied for mode so 3 labels
#zone run
#2 5 22 23 24
My thinking is that df.mode returns a DataFrame. When no column names are given, a DataFrame uses integer positions as column labels, and here the label 0 is assigned because pandas, like Python, counts from zero.
Because the result is a DataFrame, the way to change that column label is the .rename(columns=...) method. Hence, to get what you need:
df.iloc[:,3:10].agg('mode', axis=1).reset_index().rename(columns={0:''})
zone run
0 2 5 23
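A small sketch that pulls the two answers together. The frame is rebuilt by hand from the printout in the question, so its construction is an assumption. It shows that the 0 is the label of the first mode column, and that selecting that column yields a Series without the extra header row:

```python
import pandas as pd

# Rebuild the one-row frame from the question (assumed layout:
# MultiIndex rows (zone, run), columns named "sample" running 1..10).
idx = pd.MultiIndex.from_tuples([(2, 5)], names=["zone", "run"])
cols = pd.Index(range(1, 11), name="sample")
df = pd.DataFrame([[14, 12, 22, 23, 24, 22, 23, 22, 23, 23]],
                  index=idx, columns=cols)

# One mode per row -> a DataFrame with a single column labelled 0.
modes = df.iloc[:, 3:10].mode(axis=1)
print(modes)

# Selecting column 0 gives a Series indexed by (zone, run).
first_mode = modes[0].rename("mode")
print(first_mode)

# When values tie, one column per tied value appears: 0, 1, 2, ...
tied = df.iloc[:, 4:7].mode(axis=1)
print(tied)
```

Selecting `modes[0]` keeps only the first mode, so it is a reasonable shortcut only when ties are not of interest.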

SQL - Select rows after reaching minimum value/threshold

I'm using SQL Server Management Studio. My data set is below.
ID Days Value Threshold
A 1 10 30
A 2 20 30
A 3 34 30
A 4 25 30
A 5 20 30
B 1 5 15
B 2 10 15
B 3 12 15
B 4 17 15
B 5 20 15
I want to run a query so only rows after the threshold has been reached are selected for each ID. Also, I want to create a new days column starting at 1 from where the rows are selected. The expected output for the above dataset will look like
ID Days Value Threshold NewDayColumn
A 3 34 30 1
A 4 25 30 2
A 5 20 30 3
B 4 17 15 1
B 5 20 15 2
It doesn't matter if the data goes below the threshold for the latter rows, I want to take the first row when threshold is crossed as 1 and continue counting rows for the ID.
Thank you!
You can use window functions for this. Here is one method:
select t.*, row_number() over (partition by id order by days) as newDayColumn
from (select t.*,
             min(case when value > threshold then days end) over (partition by id) as threshold_days
      from t
     ) t
where days >= threshold_days;
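To check the logic of the query against the sample data without a database, the same idea can be sketched in plain Python. This is an illustration, not a replacement for the SQL; the function name `rows_from_threshold` is hypothetical. It keeps every row from the first day on which the value exceeds the threshold, then numbers the kept rows from 1 within each ID.

```python
from itertools import groupby

# (ID, Days, Value, Threshold) rows from the question
rows = [
    ("A", 1, 10, 30), ("A", 2, 20, 30), ("A", 3, 34, 30),
    ("A", 4, 25, 30), ("A", 5, 20, 30),
    ("B", 1, 5, 15), ("B", 2, 10, 15), ("B", 3, 12, 15),
    ("B", 4, 17, 15), ("B", 5, 20, 15),
]


def rows_from_threshold(rows):
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # by ID, then Days
    for _id, group in groupby(ordered, key=lambda r: r[0]):
        new_day = 0  # 0 means the threshold has not been crossed yet
        for row in group:
            value, threshold = row[2], row[3]
            if new_day == 0 and value <= threshold:
                continue  # still before the first crossing
            new_day += 1  # counts on even if later values drop below
            result.append(row + (new_day,))
    return result


for row in rows_from_threshold(rows):
    print(row)
```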

Select every ten steps SQL

I have the following table:
----------------------------------------------
oNumber oValue1
----------------------------------------------
1 54
2 44
3 89
4 65
ff.
10 33
11 22
ff.
20 43
21 76
ff.
100 45
I want to select every 10th value in oNumber. So the result should be:
----------------------------------------------
oNumber oValue1
----------------------------------------------
10 33
20 43
ff.
100 45
Also, oNumber is not a sequence number. It's just a value. Even though it isn't a sequence number, 10, 20, 30 and so on will always appear in the oNumber field.
Does anyone know the T-SQL for this case?
Thank you.
select * from table where oNumber % 10 = 0
https://msdn.microsoft.com/en-us/library/ms190279.aspx
Use the "Modulo" operator - %. So in this case, the answer would be something like:
SELECT * FROM table WHERE oNumber % 10 = 0
This returns only the rows where oNumber is divisible by ten (and therefore leaves a remainder of zero).
If you simply want multiples of 10, use the modulo operator, as Daniel and Ian suggested.
select *
from table
where oNumber % 10 = 0;
However, you may have meant that you want every 10th item in your list. If that's the case, which it may not be, sequence your set by oNumber and apply the modulo operator to the row number.
select *
from (
    select *,
           RowNum = row_number() over (order by oNumber)
    from table) a
where RowNum % 10 = 0;
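The two readings give different results as soon as oNumber has gaps. A quick Python sketch with a hypothetical list of values (invented for illustration, not taken from the question's table) shows the contrast:

```python
# Hypothetical oNumber values with gaps, i.e. not a strict sequence
o_numbers = [1, 2, 3, 4, 10, 11, 20, 21, 25, 30, 35, 40, 100]

# Reading 1: keep values that are multiples of ten (oNumber % 10 = 0)
multiples_of_ten = [n for n in o_numbers if n % 10 == 0]

# Reading 2: keep every 10th row after ordering by oNumber (RowNum % 10 = 0)
every_tenth_row = [
    n
    for row_num, n in enumerate(sorted(o_numbers), start=1)
    if row_num % 10 == 0
]

print(multiples_of_ten)
print(every_tenth_row)
```

With these values the first reading keeps 10, 20, 30, 40 and 100, while the second keeps only the value sitting in the 10th row.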

PowerPivot formula for row wise weighted average

I have a table in PowerPivot that contains the logged data of a traffic control camera mounted on a road. The table is filled with the velocity and the number of vehicles that pass the camera during a specific time window (e.g. 14:10 - 15:25). How can I get the average velocity of cars for each hour and list the results in a separate table with 24 rows (hours 0 to 23), where the second column of each row is the weighted average velocity for that hour? A sample of my stat_table data is given below:
count vel hour
----- --- ----
133 96.00237 15
117 91.45705 21
81 81.90521 6
2 84.29946 21
4 77.7841 18
1 140.8766 17
2 56.14951 14
6 71.72839 13
4 64.14309 9
1 60.949 17
1 77.00728 21
133 100.3956 6
109 100.8567 15
54 86.6369 9
1 83.96901 17
10 114.6556 21
6 85.39127 18
1 76.77993 15
3 113.3561 2
3 94.48055 2
In a separate PowerPivot table I have 24 rows and 2 columns, but when I enter my formula, every row gets the same number. My formula is:
=sumX(FILTER(stat_table, stat_table[hour]=[hour]), stat_table[count] * stat_table[vel])/sumX(FILTER(stat_table, stat_table[hour]=[hour]), stat_table[count])
Create a new calculated column named "WeightedVelocity" as follows
WeightedVelocity = [count]*[vel]
Create a measure "WeightedAverage" as follows
WeightedAverage = sum(stat_table[WeightedVelocity]) / sum(stat_table[count])
Use the measure "WeightedAverage" in the VALUES area of the pivot table and the "hour" column in ROWS to get the desired result.
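To sanity-check the numbers the pivot table should show, the same calculation (sum of count * vel divided by sum of count, per hour) can be sketched outside PowerPivot. The Python below is only a cross-check on the sample rows from the question; the function name `weighted_average_by_hour` is made up for illustration.

```python
from collections import defaultdict

# (count, vel, hour) rows copied from the sample stat_table data
stat_table = [
    (133, 96.00237, 15), (117, 91.45705, 21), (81, 81.90521, 6),
    (2, 84.29946, 21), (4, 77.7841, 18), (1, 140.8766, 17),
    (2, 56.14951, 14), (6, 71.72839, 13), (4, 64.14309, 9),
    (1, 60.949, 17), (1, 77.00728, 21), (133, 100.3956, 6),
    (109, 100.8567, 15), (54, 86.6369, 9), (1, 83.96901, 17),
    (10, 114.6556, 21), (6, 85.39127, 18), (1, 76.77993, 15),
    (3, 113.3561, 2), (3, 94.48055, 2),
]


def weighted_average_by_hour(rows):
    weighted_sum = defaultdict(float)  # sum of count * vel per hour
    total_count = defaultdict(int)     # sum of count per hour
    for count, vel, hour in rows:
        weighted_sum[hour] += count * vel
        total_count[hour] += count
    return {h: weighted_sum[h] / total_count[h] for h in sorted(total_count)}


for hour, avg in weighted_average_by_hour(stat_table).items():
    print(hour, round(avg, 4))
```

Hours with no rows in the sample are simply absent from the result, just as they would show no value in the pivot table.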