I'm using a calculated member to subtract special values from the measures of specific attributes.
In pseudo code it should work like this:
CASE
    WHEN Attribute = A THEN Measure_of_A - 14
    WHEN Attribute = B THEN Measure_of_B - 2
    ELSE 5
END
I tried this in MDX within a calculated member:
CASE [D Software].[Dim Software].CURRENTMEMBER
    WHEN [D Software].[Dim Software].[Feature Desc].&[A]&[AAA]
        THEN ([D Software].[Dim Software].[Feature Desc].&[A]&[AAA], [Measures].[Status6]) - 14
    WHEN [D Software].[Dim Software].[Feature Desc].&[B]&[BBB]
        THEN ([D Software].[Dim Software].[Feature Desc].&[B]&[BBB], [Measures].[Status6]) - 2
    ELSE 5
END
It works fine; however, the aggregated value of the totals is always wrong.
There I get a result looking like this:
Attribute    Value
AA             -14
Total_AA         5
BB              -2
Total_BB         5
Grand_Total      5
Does anyone have any advice for me? Where is my mistake? Why are the values not aggregated correctly?
The calculation of a calculated member is executed after the cube has done all of its aggregation. You need to define the aggregation explicitly in the calculation for the totals, e.g.:
CASE [D Software].[Dim Software].CURRENTMEMBER
    WHEN [D Software].[Dim Software].[Feature Desc].&[A]&[AAA]
        THEN ([D Software].[Dim Software].[Feature Desc].&[A]&[AAA], [Measures].[Status6]) - 14
    WHEN [D Software].[Dim Software].[Feature Desc].&[B]&[BBB]
        THEN ([D Software].[Dim Software].[Feature Desc].&[B]&[BBB], [Measures].[Status6]) - 2
    WHEN [D Software].[Dim Software].[Feature Desc].&[A]
        THEN sum([D Software].[Dim Software].[Feature Desc].&[A].children, [Measures].[Status6])
    WHEN [D Software].[Dim Software].[Feature Desc].&[B]
        THEN sum([D Software].[Dim Software].[Feature Desc].&[B].children, [Measures].[Status6])
    WHEN [D Software].[Dim Software].[Feature Desc].[All]
        THEN sum([D Software].[Dim Software].[Feature Desc].[All].children, [Measures].[Status6])
    ELSE 5
END
Writing aggregation logic in an MDX calculated member this way is not maintainable. You should use SCOPE statements in the cube's calculation script to handle this kind of logic.
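For illustration, a minimal SCOPE sketch, assuming [Status6] is a physical measure so that leaf-level assignments roll up into the totals automatically (member names are taken from the question; this goes in the cube's MDX script, not in a calculated member):
SCOPE([Measures].[Status6]);
    SCOPE([D Software].[Dim Software].[Feature Desc].&[A]&[AAA]);
        THIS = [Measures].[Status6] - 14;
    END SCOPE;
    SCOPE([D Software].[Dim Software].[Feature Desc].&[B]&[BBB]);
        THIS = [Measures].[Status6] - 2;
    END SCOPE;
END SCOPE;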
I am working with datetimes. Is there any way to get the value from n months before?
For example, the data look like:
dft = pd.DataFrame(
    np.random.randn(100, 1),
    columns=["A"],
    index=pd.date_range("20130101", periods=100, freq="M"),
)
dft
Then:
For every July of each year, we take the value of December of the previous year and apply it through June of the next year.
For the remaining months (from August of this year to June of next year), we take the value of the previous month.
For example: the value from Jul-2000 to Jun-2001 would be the same, equal to the value of Dec-1999.
What I've been trying to do is:
dft['B'] = np.where(dft.index.month == 7,
                    dft['A'].shift(7, freq='M'),
                    dft['A'].shift(1, freq='M'))
However, the result is simply a copy of column A, and I don't know why. But when I tried a single line of code:
dft['C'] = dft['A'].shift(7, freq='M')
then everything is shifted as expected. I don't know what the issue is here.
The issue is index alignment. The shift you performed acts on the index, but numpy.where converts its arguments to plain arrays, so the index is lost.
Use pandas' where or mask instead, everything will remain as Series and the index will be preserved:
dft['B'] = (dft['A'].shift(1, freq='M')
                    .mask(dft.index.month == 7, dft['A'].shift(7, freq='M'))
           )
Output:
A B
2013-01-31 -2.202668 NaN
2013-02-28 0.878792 -2.202668
2013-03-31 -0.982540 0.878792
2013-04-30 0.119029 -0.982540
2013-05-31 -0.119644 0.119029
2013-06-30 -1.038124 -0.119644
2013-07-31 0.177794 -1.038124
2013-08-31 0.206593 -2.202668 <- correct
2013-09-30 0.188426 0.206593
2013-10-31 0.764086 0.188426
... ... ...
2020-12-31 1.382249 -1.413214
2021-01-31 -0.303696 1.382249
2021-02-28 -1.622287 -0.303696
2021-03-31 -0.763898 -1.622287
2021-04-30 0.420844 -0.763898
[100 rows x 2 columns]
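As an aside, this also explains why the numpy.where attempt returned a plain copy of column A: shift(..., freq='M') moves the labels rather than the values, so position-wise nothing changes, and numpy.where selects purely by position. A quick check, using the dft from the question (results shown as comments):
import numpy as np

shifted = dft['A'].shift(7, freq='M')  # same values, labels moved forward 7 month-ends
out = np.where(dft.index.month == 7, shifted, dft['A'].shift(1, freq='M'))
type(out)                       # numpy.ndarray -- the shifted labels are gone
(out == dft['A'].values).all()  # True: positionally, nothing changed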
In Norway we have something called D- and S-numbers. These are national identification numbers in which the day or month of birth is modified.
D-number
[d+4]dmmyy
S-number
dd[m+5]myy
I have a column with dates, some of them normal (ddmmyy) and some of them are formatted as D- or S-numbers. Leading zeroes are also missing.
df = pd.DataFrame({'dates': [241290,  # 24.12.90
                             710586,  # 31.05.86
                             105299,  # 10.02.99
                             56187]   # 05.11.87
                  })
    dates
0  241290
1  710586
2  105299
3   56187
I've written this function to add the leading zero and convert the dates, but this solution doesn't feel that great.
def func(s):
    s = s.astype(str)
    res = []
    for index, value in s.items():
        # Make sure all dates have 6 digits (add leading zero)
        if len(value) == 5:
            value = '0' + value
        # Convert D- and S-numbers to regular dates
        if int(value[0]) > 3:
            # D-number: subtract 4 from the first digit
            res.append(str(int(value[0]) - 4) + value[1:])
        elif int(value[2]) > 1:
            # S-number: subtract 5 from the third digit
            res.append(value[:2] + str(int(value[2]) - 5) + value[3:])
        else:
            res.append(value)
    return pd.Series(res)
Is there a smoother and faster way of accomplishing the same result?
Normalize the dates by padding with 0, then explode them into 3 two-digit columns (day, month, year). Apply your rules and combine the columns with pd.to_datetime:
# Suggested by #HenryEcker
# Changed: .pad(6, fillchar='0') to .zfill(6)
dates = df['dates'].astype(str).str.zfill(6).str.findall(r'(\d{2})') \
          .apply(pd.Series).astype(int) \
          .rename(columns={0: 'day', 1: 'month', 2: 'year'}) \
          .agg({'day': lambda d: d if d <= 31 else d - 40,
                'month': lambda m: m if m <= 12 else m - 50,
                'year': lambda y: 1900 + y})
df['dates2'] = pd.to_datetime(dates)
Output:
>>> df
dates dates2
0 241290 1990-12-24
1 710586 1986-05-31
2 105299 1999-02-10
3 56187 1987-11-05
>>> dates
day month year
0 24 12 1990
1 31 5 1986
2 10 2 1999
3 5 11 1987
You can keep the Series as integers until the final step. The disadvantage of the method below is that the offsets do not match what the specification says, so it may take more mental effort to comprehend:
def func2(s):
    # In arithmetic, digits are counted from the right, so the
    # "first digit" is the sixth from the right and the "third digit"
    # is the fourth from the right in a 6-digit number
    delta = np.select(
        [s // 10**5 % 10 > 3, s // 10**3 % 10 > 1],
        [4 * 10**5, 5 * 10**3],
        0
    )
    return (s - delta).astype('str').str.pad(6, fillchar='0')
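For example, applied to the sample df from the question, func2 returns the normalized ddmmyy strings (output shown as comments):
func2(df['dates'])
# 0    241290
# 1    310586
# 2    100299
# 3    051187
# dtype: object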
I have this CSV file
http://www.sharecsv.com/s/2503dd7fb735a773b8edfc968c6ae906/whatt2.csv
I want to create three columns, 'MT_Value', 'M_Value', and 'T_Data'. The first one should hold the mean of the data grouped by year and month, which I accomplished by doing this:
data.groupby(['Year','Month']).mean()
But for M_Value I need the mean of only the values different from zero, and for T_Data I need the count of the values that are zero divided by the total number of values. I guess that for the last one I have to divide the number of zero values by the size of each group, but honestly I am a bit lost. I looked on Google and found something about transform, but I didn't understand it very well.
Thank you.
You could do something like this:
(data.assign(M_Value=data.Valor.where(data.Valor != 0),
             T_Data=data.Valor.eq(0))
     .groupby(['Year', 'Month'])
     [['Valor', 'M_Value', 'T_Data']]
     .mean()
)
Explanation: assign creates new columns with the respective names.
data.Valor.where(data.Valor != 0) replaces 0 values with NaN, which is ignored when we call mean().
data.Valor.eq(0) maps 0 to True and other values to False, so when you take mean() you compute count(Valor == 0) / total_count().
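A tiny standalone illustration of that boolean-mean trick (results shown as comments):
import pandas as pd

s = pd.Series([0, 5, 0, 3])
s.eq(0)         # True, False, True, False
s.eq(0).mean()  # 0.5 -> two of the four values are zero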
Output:
Valor M_Value T_Data
Year Month
1970 1 2.306452 6.500000 0.645161
2 1.507143 4.688889 0.678571
3 2.064516 7.111111 0.709677
4 11.816667 13.634615 0.133333
5 7.974194 11.236364 0.290323
... ... ... ...
1997 10 3.745161 7.740000 0.516129
11 11.626667 21.800000 0.466667
12 0.564516 4.375000 0.870968
1998 1 2.000000 15.500000 0.870968
2 1.545455 5.666667 0.727273
[331 rows x 3 columns]
A dummy dataset is:
data <- data.frame(
  group = c(1, 1, 1, 1, 1, 2),
  dates = as.Date(c("2005-01-01", "2006-05-01", "2007-05-01", "2004-08-01",
                    "2005-03-01", "2010-02-01")),
  value = c(10, 20, NA, 40, NA, 5)
)
For each group, the missing values need to be filled with the non-missing value corresponding to the nearest date within the same group. In case of a tie, pick either.
I am using dplyr. which.closest from the birk package looks relevant, but it takes a vector and a single value. How can I do the lookup within a vector without writing loops? Even an SQL solution will do.
Any pointers to the solution?
Maybe something like: value = value[match(which.closest(dates, THISdate) & !is.na(value))]
Not sure how to specify THISdate.
Edit: The expected value vector should look like:
value = c(10,20,20,40,10,5)
Using knn1 (nearest neighbor) from the class package (which comes with R, so you don't need to install it) and dplyr, define an na.knn1 function which replaces each NA value in x with the non-NA x value having the closest time:
library(class)
na.knn1 <- function(x, time) {
  is_na <- is.na(x)
  if (sum(is_na) == 0 || all(is_na)) return(x)
  train <- matrix(time[!is_na])
  test <- matrix(time[is_na])
  cl <- x[!is_na]
  x[is_na] <- as.numeric(as.character(knn1(train, test, cl)))
  x
}
data %>% mutate(value = na.knn1(value, dates))
giving:
group dates value
1 1 2005-01-01 10
2 1 2006-05-01 20
3 1 2007-05-01 20
4 1 2004-08-01 40
5 1 2005-03-01 10
6 2 2010-02-01 5
Add an appropriate group_by if the intention was to do this by group.
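For example, the grouped version might look like this (a sketch, using the same data and the na.knn1 defined above):
library(dplyr)

data %>%
  group_by(group) %>%
  mutate(value = na.knn1(value, dates)) %>%
  ungroup()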
You can use sapply to find the closest values, since the x argument of which.closest only takes a single value.
First create a vector vect in which the dates with missing values are replaced with NA, and use it within the which.closest function:
library(birk)
vect <- replace(data$dates, which(is.na(data$value)), NA)
transform(data, value = value[sapply(dates, which.closest, vec = vect)])
group dates value
1 1 2005-01-01 10
2 1 2006-05-01 20
3 1 2007-05-01 20
4 1 2004-08-01 40
5 1 2005-03-01 10
6 2 2010-02-01 5
If which.closest accepted a vector there would be no need for sapply, but that is not the case.
Using the dplyr package:
library(birk)
library(dplyr)
data %>%
  mutate(vect = `is.na<-`(dates, is.na(value)),
         value = value[sapply(dates, which.closest, vect)]) %>%
  select(-vect)
Hoping to get some help here. I have a report that shows 4 fields: current YTD sales, previous YTD sales, the difference between the two in dollars, and the difference between the two in percent. I'm running into a divide-by-zero error and "NaN" as the value for the percent field. I get the divide-by-zero error when I have a value in the current YTD ("OrderInfoConstruction") but 0 in the previous YTD ("OrderInfoClosedConstruction"), since my expression for the % field is:
=(Sum(Fields!PRICE_EXT.Value, "OrderInfoConstruction") -
Sum(Fields!PRICE_EXT.Value, "OrderInfoClosedConstruction")) /
Sum(Fields!PRICE_EXT.Value, "OrderInfoClosedConstruction")
and the value of Sum(Fields!PRICE_EXT.Value, "OrderInfoClosedConstruction") is 0 (the previous YTD value). The NaN value comes from the same expression, but in that case BOTH the current and previous YTDs are 0. How can I make it not divide when the value is 0, to solve the divide-by-zero error? And what is a NaN, and how can I have it just show "0" instead? I've found some help on this but have NO idea how to adapt the IIF statement below to my expression above:
=IIf(Fields!SomeField.Value = 0, 0, Fields!SomeOtherField.Value / IIf(Fields!SomeField.Value = 0, 1, Fields!SomeField.Value))
Thanks in advance for the help!
If you want to display 0 for both 0/0 and #/0, you just need to check the denominator for zero. Basically IIf(PrevYTD = 0, 0, (CurrYTD - PrevYTD) / PrevYTD), or with your actual fields:
=IIf(Sum(Fields!PRICE_EXT.Value, "OrderInfoClosedConstruction") = 0, 0,
(Sum(Fields!PRICE_EXT.Value, "OrderInfoConstruction") -
Sum(Fields!PRICE_EXT.Value, "OrderInfoClosedConstruction")) /
Sum(Fields!PRICE_EXT.Value, "OrderInfoClosedConstruction"))
Also, NaN stands for "not a number", and 0/0 is one operation that produces it.