In my cube I have defined a calculated measure as the sum of two other measures. How can I make this measure show up in one of my measure groups?
Within your cube script where the calculated measure is defined, like so:
CREATE MEMBER CURRENTCUBE.[Measures].[Calculated Measure]
AS ([Measures].[Measure 1] / [Measures].[Measure 2]),
FORMAT_STRING = "Percent",
NON_EMPTY_BEHAVIOR = { [Measure 1], [Measure 2] },
VISIBLE = 1 , DISPLAY_FOLDER = '\Custom Folder' , ASSOCIATED_MEASURE_GROUP = 'Custom Measure Group';
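The properties relevant to your question are on the last line: ASSOCIATED_MEASURE_GROUP assigns the calculated measure to a measure group, and DISPLAY_FOLDER sets the folder it is listed under.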
I have two dataframes: an income df and an fx df. My income df shows income from different accounts on different dates, and it also shows extra income in a different currency. My fx df shows the fx rates for certain currency pairs on the same dates the extra income came into the accounts.
I want to convert the extra income into the currency of the account. For example, account HP on 23/3 has extra income of 35 GBP, and I want to convert that into EUR, as that's the currency of the account. Please note it has to use the fx table, as I have a long history of data points to fill and other accounts, so I do not want to manually code 35 * the fx rate. Finally, I want to create another column in the income df that sums the daily income and the extra income in the same currency.
I'm not sure how to bring both dfs together so I can get the correct fx rate for that specific date and convert the extra income into the currency of the account.
My code is below:
import pandas as pd
income_data = {'date': ['23/3/22', '23/3/22', '24/3/22', '25/3/22'], 'account': ['HP', 'HP', 'JJ', 'JJ'],
'daily_income': [1000, 1000, 2000, 2000], 'ccy of account': ['EUR', 'EUR', 'USD', 'USD'],
'extra_income': [50, 35, 10, 12.5], 'ccy of extra_income': ['EUR', 'GBP', 'EUR', 'USD']}
income_df = pd.DataFrame(income_data)
fx_data = {'date': ['23/3/22', '23/3/22', '24/3/22', '25/3/22'], 'EUR/GBP': [0.833522, 0.833522, 0.833621, 0.833066],
'USD/EUR': [0.90874, 0.90874, 0.91006, 0.90991]}
fx_df = pd.DataFrame(fx_data)
The final df should look like this (I flipped the fx rate, i.e. 1/0.833522, to get some of the values).
I would really appreciate it if someone could help me with this. My initial thought was a merge, but I don't have a common column, and I'm not sure the map function would work either, as I don't have a dictionary. Apologies in advance if any of my code is not great. I am still self-learning, thanks!
Consider creating a common column for merging in both data frames. The code below uses assign to add columns and Series methods such as add and mul rather than the arithmetic operators (+, -, *, /).
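# ADD NEW COLUMN AS CONCAT OF CCY COLUMNS (EXTRA INCOME CCY -> ACCOUNT CCY)
# ASSUMES "A/B" IS THE AMOUNT OF B PER ONE UNIT OF A, AS IN THE QUESTION
income_df = income_df.assign(
    currency_ratio = lambda df: df["ccy of extra_income"] + "/" + df["ccy of account"]
)
# ADD REVERSED CURRENCY RATIOS (RECIPROCAL OF EACH QUOTED RATE)
# RESHAPE WIDE TO LONG FORMAT (DROP THE REPEATED 23/3/22 ROW FIRST)
fx_data_long = pd.melt(
    fx_df.drop_duplicates().assign(**{
        "GBP/EUR": lambda df: df["EUR/GBP"].rdiv(1),
        "EUR/USD": lambda df: df["USD/EUR"].rdiv(1)
    }),
    id_vars = "date",
    var_name = "currency_ratio",
    value_name = "fx_rate"
)
# MERGE AND CALCULATE
income_df = (
    income_df.merge(
        fx_data_long,
        on = ["date", "currency_ratio"],
        how = "left"
    ).assign(
        # SAME-CURRENCY ROWS HAVE NO FX PAIR, SO USE A RATE OF 1
        fx_rate = lambda df: df["fx_rate"].mask(
            df["ccy of extra_income"].eq(df["ccy of account"]), 1
        ),
        total_income = lambda df: df["daily_income"].add(df["extra_income"].mul(df["fx_rate"]))
    )
)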
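On the question of whether a map-style lookup could work without a dictionary: the following is a minimal sketch of that alternative, not the approach above. It assumes a column named "A/B" holds the amount of currency B per one unit of A (consistent with the question's remark that 1/0.833522 was used for GBP to EUR); the names rates, pair, and fx_rate are made up for illustration.

```python
import pandas as pd

income_df = pd.DataFrame({
    "date": ["23/3/22", "23/3/22", "24/3/22", "25/3/22"],
    "account": ["HP", "HP", "JJ", "JJ"],
    "daily_income": [1000, 1000, 2000, 2000],
    "ccy of account": ["EUR", "EUR", "USD", "USD"],
    "extra_income": [50, 35, 10, 12.5],
    "ccy of extra_income": ["EUR", "GBP", "EUR", "USD"],
})
fx_df = pd.DataFrame({
    "date": ["23/3/22", "23/3/22", "24/3/22", "25/3/22"],
    "EUR/GBP": [0.833522, 0.833522, 0.833621, 0.833066],
    "USD/EUR": [0.90874, 0.90874, 0.91006, 0.90991],
})

# Long table of rates, one row per (date, pair); the repeated date is dropped.
fx_long = fx_df.drop_duplicates("date").melt(
    id_vars="date", var_name="pair", value_name="rate"
)
# Add the inverted pairs, e.g. "EUR/GBP" -> "GBP/EUR" with rate 1 / rate.
inverse = fx_long.assign(
    pair=fx_long["pair"].str.split("/").str[::-1].str.join("/"),
    rate=1 / fx_long["rate"],
)
# A Series indexed by (date, pair) acts as the lookup "dictionary".
rates = pd.concat([fx_long, inverse]).set_index(["date", "pair"])["rate"]

# Pair that converts the extra income into the account currency.
pair = income_df["ccy of extra_income"] + "/" + income_df["ccy of account"]
keys = pd.MultiIndex.from_arrays([income_df["date"], pair])
fx_rate = pd.Series(rates.reindex(keys).to_numpy(), index=income_df.index)

# Same-currency rows need no conversion, so their rate is 1.
same_ccy = income_df["ccy of extra_income"] == income_df["ccy of account"]
fx_rate = fx_rate.mask(same_ccy, 1.0)

income_df["total_income"] = (
    income_df["daily_income"] + income_df["extra_income"] * fx_rate
)
print(income_df[["date", "account", "total_income"]])
```

Any pair that is neither quoted nor invertible from the fx table stays NaN in total_income, which makes missing rates easy to spot.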
I have a strange problem with calculating the weighted mean of a pandas dataframe. I want to do the following steps:
(1) calculate the weighted mean of all the data
(2) calculate the weighted mean of each group of data
The issue is that when I do step 2, the mean of the group means (weighted by the number of members in each group) is not the same as the weighted mean of all the data (step 1). Mathematically, the two should be equal. I even thought the issue might be the dtype, so I set everything to float64, but the problem still exists. Below is a simple example that illustrates the problem:
My dataframe has data, weights and groups columns:
import numpy as np
import pandas as pd

data = np.array([
    0.20651903, 0.52607571, 0.60558061, 0.97468593, 0.10253621, 0.23869854,
    0.82134792, 0.47035085, 0.19131938, 0.92288234
])
weights = np.array([
    4.06071562, 8.82792146, 1.14019687, 2.7500913, 0.70261312, 6.27280216,
    1.27908358, 7.80508994, 0.69771745, 4.15550846
])
groups = np.array([1, 1, 2, 2, 2, 2, 3, 3, 4, 4])
df = pd.DataFrame({"data": data, "weights": weights, "groups": groups})
print(df)
data weights groups
0 0.206519 4.060716 1
1 0.526076 8.827921 1
2 0.605581 1.140197 2
3 0.974686 2.750091 2
4 0.102536 0.702613 2
5 0.238699 6.272802 2
6 0.821348 1.279084 3
7 0.470351 7.805090 3
8 0.191319 0.697717 4
9 0.922882 4.155508 4
# Define a weighted mean function to apply to each group
def my_fun(x, y):
    tmp = np.average(x, weights=y)
    return tmp

# Mean of the population
total_mean = np.average(np.array(df["data"], dtype="float64"),
                        weights=np.array(df["weights"], dtype="float64"))
# Group data
group_means = df.groupby("groups").apply(lambda d: my_fun(d["data"], d["weights"]))
# number of members of each group
counts = np.array([2, 4, 2, 2], dtype="float64")
# Total mean calculated from mean of groups mean weighted by counts of each group
total_mean_from_group_means = np.average(np.array(group_means, dtype="float64"),
                                         weights=counts)
print(total_mean)
0.5070955626929458
print(total_mean_from_group_means)
0.5344436242465216
As you can see, the total mean calculated from the group means is not equal to the total mean. What am I doing wrong here?
EDIT: Fixed a typo in the code.
You compute a weighted mean within each group, so when you compute the total mean from the weighted means, the correct weight for each group is the sum of the weights within the group (and not the size of the group).
In [47]: wsums = df.groupby("groups").apply(lambda d: d["weights"].sum())
In [48]: total_mean_from_group_means = np.average(group_means, weights=wsums)
In [49]: total_mean_from_group_means
Out[49]: 0.5070955626929458
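The reason is that each group's weighted mean is (sum of w*x in the group) / (sum of w in the group), so multiplying it by the group's weight sum recovers the group's share of the overall numerator. A small self-contained check of this, using the question's data (the names weighted, wsums, and recombined are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "data": [0.20651903, 0.52607571, 0.60558061, 0.97468593, 0.10253621,
             0.23869854, 0.82134792, 0.47035085, 0.19131938, 0.92288234],
    "weights": [4.06071562, 8.82792146, 1.14019687, 2.7500913, 0.70261312,
                6.27280216, 1.27908358, 7.80508994, 0.69771745, 4.15550846],
    "groups": [1, 1, 2, 2, 2, 2, 3, 3, 4, 4],
})

# Per-group weighted mean = sum(w * x) / sum(w) within each group.
weighted = df["data"] * df["weights"]
wsums = df.groupby("groups")["weights"].sum()
group_means = weighted.groupby(df["groups"]).sum() / wsums

# Weighted mean of all rows, and the same value rebuilt from the group means.
total_mean = np.average(df["data"], weights=df["weights"])
recombined = np.average(group_means, weights=wsums)

# Weighting the group means by group size instead gives a different number.
counts = df.groupby("groups").size()
by_counts = np.average(group_means, weights=counts)

print(total_mean, recombined, by_counts)
```

Weighting by counts only matches when every row carries the same weight.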
I have a list of transactions in a data frame and want to group by Symbols and take the sum of one of the columns. Additionally, I want the first instance of this column (per symbol).
My code:
import pandas as pd

# raw string so the backslashes in the Windows path are not treated as escapes
local_filename = r'C:\Users\nshah\Desktop\Naman\TEMPLATE.xlsx'
data_from_local_file = pd.read_excel(local_filename, sheet_name='JP_A')
data_from_local_file = data_from_local_file[['Symbol','Security Name', 'Counterparty', 'Manager', 'Rate', 'LocatedAmt']]
data_grouped = data_from_local_file.groupby(['Symbol'])
pivoted = data_grouped['LocatedAmt'].sum().reset_index()
Next, I want the first instance of, say, Rate for each symbol.
Thank you in advance!
You can achieve the sum and first observed instance as follows:
data_grouped = data_from_local_file.groupby(['Symbol'], as_index=False).agg({'LocatedAmt': ['sum', 'first']})
To accomplish this for all columns, you can pass the agg function across all columns:
all_cols = ['Symbol','Security Name', 'Counterparty', 'Manager', 'Rate', 'LocatedAmt']
data_grouped_all = data_from_local_file.groupby(['Symbol'], as_index=False)[all_cols].agg(['sum', 'first'])
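If the goal is specifically the sum of LocatedAmt together with the first Rate per symbol, named aggregation keeps the result columns flat. A minimal sketch on made-up rows (the trades frame and the output column names are hypothetical; only the column names Symbol, Rate and LocatedAmt come from the question):

```python
import pandas as pd

# Hypothetical sample standing in for the Excel sheet.
trades = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB", "BBB", "BBB"],
    "Rate": [0.5, 0.7, 1.2, 1.1, 1.3],
    "LocatedAmt": [100, 200, 50, 25, 25],
})

# One output column per (source column, aggregation) pair.
summary = trades.groupby("Symbol", as_index=False).agg(
    LocatedAmt_sum=("LocatedAmt", "sum"),
    Rate_first=("Rate", "first"),
)
print(summary)
```

Note that 'first' follows the row order of the frame, so sort beforehand if "first" should mean earliest by some other column.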
I have an item_code column in my data and another column, sales, which represents the sales quantity for the particular item.
A particular item id can appear in the data many times. Other columns tell these entries apart.
I want to plot only the outlier sales for each item (the data has thousands of different item ids, so plotting every entry would be difficult).
Since I'm very new to this, what is the right way and tool to do this?
You can use pandas. You need to choose a method to detect outliers, but here is an example:
If you want outliers across all sales (not per group), you can use apply with a function (here, a lambda) to build a boolean mask of the outlier rows.
import numpy as np
import pandas as pd
%matplotlib inline
df = pd.DataFrame({'item_id': [1, 1, 2, 1, 2, 1, 2],
'sales': [0, 2, 30, 3, 30, 30, 55]})
df[df.apply(lambda x: np.abs(x.sales - df.sales.mean()) / df.sales.std() > 1, 1)
].set_index('item_id').plot(style='.', color='red')
In this example we generate a data sample and find the points that lie more than one standard deviation from the mean (you can try another method). We then plot them, with the sales count on the y-axis and the item id on the x-axis. This method detects the points 0 and 55. If you want to search for outliers within groups, you can group the data first.
df.groupby('item_id').apply(lambda data: data.loc[
data.apply(lambda x: np.abs(x.sales - data.sales.mean()) / data.sales.std() > 1, 1)
]).set_index('item_id').plot(style='.', color='red')
In this example we get the points 30 and 55, because 0 isn't an outlier for the group where item_id = 1, but 30 is.
Is this what you want to do? I hope it helps you get started.
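For thousands of item ids, the per-group test can also be written with groupby().transform, which avoids the nested apply calls. This is a sketch of the same one-standard-deviation rule on the same sample data, not a recommendation of that threshold; the names grouped, z and outliers are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"item_id": [1, 1, 2, 1, 2, 1, 2],
                   "sales": [0, 2, 30, 3, 30, 30, 55]})

# Broadcast each group's mean and standard deviation back onto its rows.
grouped = df.groupby("item_id")["sales"]
z = (df["sales"] - grouped.transform("mean")) / grouped.transform("std")

# Keep the rows more than one standard deviation from their group mean.
outliers = df[z.abs() > 1]
print(outliers)
```

The resulting frame can then be plotted, for instance with outliers.plot.scatter(x="item_id", y="sales"). Groups with a single row get a NaN standard deviation and are never flagged.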
I'm performing a multidimensional lookup to assign a value in a new column.
I have a table that has some historical employee data by month. There are two unique people in this example, and they can have multiple jobs within a month.
I want to create a new column that tells me if each unique person has an eligible job based on the conditions below. The challenge is each row has to be considered by month/year.
import pandas as pd
import numpy as np
data = {'Month': ["January", "January", "January", "February", "February", "February", "March", "March", "March", "March"],
'Year': [2015,2015,2015,2015,2015,2015,2016,2016,2016,2016],
'Job #': [1,1,2,1,2,1,1,1,2,3],
'Pay Group': ["Excluded","Included","Excluded","Excluded","Included","Included","Excluded","Excluded","Excluded","Included"],
'Name': ["John","Bill","Bill","John","John","Bill","John","Bill","Bill","Bill"]}
df = pd.DataFrame(data, columns=['Month', 'Year', 'Job #', 'Pay Group', 'Name'])
df
Eligible Jobs Conditions:
If (Job # = 1 AND Pay Group = Included), the job is eligible. If that condition is false, look for the next largest Job # within the given month/year that has Pay Group = Included.
IIUC:
For each person, within each month/year, you want to grab the smallest Job # such that Pay Group == Included.
Filter only the rows that are included. Sort by Job #. Group by year, month, and name, taking the index of the minimum observation. Use this to assign a new column.
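dfi = df[df['Pay Group'] == 'Included'].sort_values('Job #')
gc = ['Year', 'Month', 'Name']
# index label of the smallest included Job # per person and month/year
idx = dfi.groupby(gc)['Job #'].idxmin()
df['Eligible Job'] = 'Not Eligible'
# .ix was removed from pandas; use .loc and set only the new column
df.loc[idx.values, 'Eligible Job'] = 'Eligible'
df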
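To see the "smallest included Job #" rule pick between two included jobs in the same month, here is a small self-contained sketch on made-up rows (the data is hypothetical and only reuses the question's column names):

```python
import pandas as pd

# Hypothetical rows: Bill has two Included jobs in January 2015.
df = pd.DataFrame({
    "Month": ["January", "January", "January", "February", "February"],
    "Year": [2015, 2015, 2015, 2015, 2015],
    "Job #": [1, 2, 3, 1, 2],
    "Pay Group": ["Excluded", "Included", "Included", "Included", "Excluded"],
    "Name": ["Bill", "Bill", "Bill", "John", "John"],
})

# Row label of the smallest included Job # per person and month/year.
included = df[df["Pay Group"] == "Included"]
idx = included.groupby(["Year", "Month", "Name"])["Job #"].idxmin()

# Default everything to not eligible, then flag only the selected rows.
df["Eligible Job"] = "Not Eligible"
df.loc[idx.to_numpy(), "Eligible Job"] = "Eligible"
print(df)
```

Here Bill's Job 2 is flagged rather than Job 3, and a person with no Included job in a month ends up with no eligible row for that month.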