How to (idiomatically) use pandas .loc to return an empty dataframe when key is not in index [duplicate]

This question already has answers here:
Pandas .loc without KeyError
(6 answers)
Closed 2 years ago.
Say I have a DataFrame (with a multi-index, for that matter), and I wish to take the values at some index - but, if that index does not exist, I wish for it to return an empty df instead of a KeyError.
I've searched for similar questions, but they are all about pandas returning an empty dataframe when it is not desired at some cases (conversely, I do desire an empty dataframe in return).
For example:
import pandas as pd
df = pd.DataFrame(index=pd.MultiIndex.from_tuples([(1,1),(1,2),(3,1)]),
columns=['a','b'], data=[[1,2],[3,4],[10,20]])
so, df is:
      a   b
1 1   1   2
  2   3   4
3 1  10  20
and df.loc[1] is:
   a  b
1  1  2
2  3  4
df.loc[2] raises a KeyError, and I'd like something that returns
a b
The closest I could get is by calling df.loc[idx:idx] as a slice, which gives the correct result for idx=2, but for idx=1 it returns
      a  b
1 1   1  2
  2   3  4
instead of the desired result.
Of course I can define a function to do it, but is there a more idiomatic way?

One idea with an if-else statement:
def get_val(x):
    return df.loc[x] if x in df.index.levels[0] else pd.DataFrame(columns=df.columns)
Or, more generally, with a try-except statement:
def get_val(x):
    try:
        return df.loc[x]
    except KeyError:
        return pd.DataFrame(columns=df.columns)
print (get_val(1))
a b
1 1 2
2 3 4
print (get_val(2))
Empty DataFrame
Columns: [a, b]
Index: []
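Another option (a sketch, not part of the answer above) is boolean masking on the first index level: an absent key simply produces an all-False mask, and therefore an empty frame, with no exception handling needed. Note that df.loc[x] drops the matched level, so droplevel(0) is applied here to keep the output shapes identical:

import pandas as pd

df = pd.DataFrame(index=pd.MultiIndex.from_tuples([(1, 1), (1, 2), (3, 1)]),
                  columns=['a', 'b'], data=[[1, 2], [3, 4], [10, 20]])

def get_val(x):
    # An absent key gives an all-False mask, hence an empty DataFrame
    # instead of a KeyError; droplevel(0) mimics df.loc[x], which drops
    # the matched index level.
    return df[df.index.get_level_values(0) == x].droplevel(0)

print(get_val(1))   # two rows, indexed 1 and 2
print(get_val(2))   # Empty DataFrame, Columns: [a, b], Index: []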

Related

GroupBy-Apply even for empty DataFrame

I am using groupby-apply to create a new DataFrame from a given DataFrame. But if the given DataFrame is empty, the result looks like the given DataFrame with group keys, not like the target new DataFrame. So to get the shape of the target DataFrame I have to use if..else with a length check and, if the given DataFrame is empty, manually create a DataFrame with the specified columns and indexes.
That breaks the flow of the code. Also, if the structure of the target DataFrame changes in the future, I would have to fix the code in two places instead of one.
Is there a way to get the shape of the target DataFrame even when the given DataFrame is empty, using GroupBy only (or without if..else)?
Simplified example:
def some_func(df: pd.DataFrame):
    return df.values.sum() + pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3]],
                                          columns=['new_col1', 'new_col2', 'new_col3'])
df1 = pd.DataFrame([[1,1], [1,2], [2,1], [2,2]], columns=['col1', 'col2'])
df2 = pd.DataFrame(columns=['col1', 'col2'])
df1_grouped = df1.groupby(['col1'], group_keys=False).apply(lambda df: some_func(df))
df2_grouped = df2.groupby(['col1'], group_keys=False).apply(lambda df: some_func(df))
Result for df1 is ok:
new_col1 new_col2 new_col3
0 6 6 6
1 7 7 7
2 8 8 8
0 8 8 8
1 9 9 9
2 10 10 10
And not ok for df2:
Empty DataFrame
Columns: [col1, col2]
Index: []
If..else to get expected result for df2:
df = df2
if df.empty:
    df_grouped = pd.DataFrame(columns=['new_col1', 'new_col2', 'new_col3'])
else:
    df_grouped = df.groupby(['col1'], group_keys=False).apply(lambda df: some_func(df))
Gives what I need:
Empty DataFrame
Columns: [new_col1, new_col2, new_col3]
Index: []
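GroupBy itself cannot infer the schema some_func would have produced when there are no groups to call it on, so some fallback is hard to avoid; one way to keep it in a single place is a small wrapper (a sketch, not a pandas built-in; grouped_apply and out_columns are names invented here):

def grouped_apply(df, by, func, out_columns):
    # Fall back to an empty frame with the target schema when there is
    # nothing to group; otherwise delegate to groupby-apply as before.
    if df.empty:
        return pd.DataFrame(columns=out_columns)
    return df.groupby(by, group_keys=False).apply(func)

out_cols = ['new_col1', 'new_col2', 'new_col3']
df1_grouped = grouped_apply(df1, ['col1'], some_func, out_cols)
df2_grouped = grouped_apply(df2, ['col1'], some_func, out_cols)

The column list still has to agree with what some_func returns, but callers no longer repeat the if..else.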

Remove row having any value 0 [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 3 years ago.
I have a dataframe with two columns, "service" and "value". I want to remove all the rows having 0 in the value column.
service value
abc 10
def 0
ghi 0
xyz 5
I want my dataframe to look like
service value
abc 10
xyz 5
I tried the following:
df = pd.DataFrame(list(result.items()), columns=['service', 'value'])
df = df[(df != 0).all(1)]
For a small DataFrame with 6-7 rows it works fine, but on another DataFrame with 125 rows I get the following error:
Illegal instruction
PS: I checked all the values in the "value" column and they are all numbers.
You can use the drop function combined with a condition:
df = pd.DataFrame(
    {'service': ['abc', 'def', 'ghi', 'xyz'],
     'value': [10, 0, 0, 5]})
df.drop(df[df.value == 0].index)
Out:
service value
0 abc 10
3 xyz 5
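Plain boolean indexing is an equally common idiom and reads the condition positively (a sketch equivalent to the drop above; as an aside, an "Illegal instruction" crash usually points to a broken NumPy/pandas binary or environment rather than to the data or this code):

df[df.value != 0]            # keep rows whose value is non-zero
df[(df != 0).all(axis=1)]    # or: keep rows where no column at all is 0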

Apply diffs down columns of pandas dataframe [duplicate]

This question already has answers here:
How to replace NaNs by preceding or next values in pandas DataFrame?
(10 answers)
Closed 3 years ago.
I want to apply diffs down columns for a pandas dataframe.
EX:
A B C
23 40000 1
24 nan nan
nan 42000 2
I would want something like:
A B C
23 40000 1
24 40000 1
24 42000 2
I have tried variations of pandas groupby, which I think is probably the right approach (or applying some function down the columns, but I'm not sure that's efficient; correct me if I'm wrong).
I was able to "apply diffs down the column" and get something like:
A B C
24 42000 2
by calling: df = df.groupby('col', as_index=False).last() for each column, but this is not what I am looking for. I am not a pandas expert so apologies if this is a silly question.
What you want is a forward fill rather than a diff. Look at this: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
df = df.fillna(method='ffill')
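A runnable sketch on the example data; note that newer pandas versions spell this df.ffill(), since fillna(method=...) has been deprecated:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [23, 24, np.nan],
                   'B': [40000, np.nan, 42000],
                   'C': [1, np.nan, 2]})

df = df.ffill()  # propagate the last valid value down each column
print(df)
#       A        B    C
# 0  23.0  40000.0  1.0
# 1  24.0  40000.0  1.0
# 2  24.0  42000.0  2.0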

How to create new pandas column by vlookup-like procedure on another data-frame

I have a dataframe that looks like this. It will be used to map values using two categorical variables. Maybe converting this to a dictionary would be better.
The 2nd data-frame is very large with a screenshot shown below. I want to take the values from the categorical variables to create a new attribute (column) based on the 1st data-frame.
For example...
A row with FICO_cat of (700,720] and OrigLTV_cat of (75,80] would receive a value of 5.
A row with FICO_cat of (700,720] and OrigLTV_cat of (85,90] would receive a value of 6.
Is there an efficient way to do this?
If your column labels are the FICO_cat values, and your Index is OrigLTV_cat, this should work:
Given a dataframe df:
         780+  (740,780)  (720,740)
(60,70)     3          3          3
(70,75)     4          5          4
(75,80)     3          1          2
Do:
df = df.unstack().reset_index()
df.rename(columns = {'level_0' : 'FICOCat', 'level_1' : 'OrigLTV', 0 : 'value'}, inplace = True)
Output:
FICOCat OrigLTV value
0 780+ (60,70) 3
1 780+ (70,75) 4
2 780+ (75,80) 3
3 (740,780) (60,70) 3
4 (740,780) (70,75) 5
5 (740,780) (75,80) 1
6 (720,740) (60,70) 3
7 (720,740) (70,75) 4
8 (720,740) (75,80) 2
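From that long format, the vlookup itself can be completed with a left merge onto the large frame (a hedged sketch; big is a stand-in name, and it assumes the large frame's bucket columns are called FICO_cat and OrigLTV_cat and hold the same labels):

lookup = df.rename(columns={'FICOCat': 'FICO_cat', 'OrigLTV': 'OrigLTV_cat'})
# A left join keeps every row of the big frame; bucket pairs with no
# match in the lookup table get NaN in the new value column.
big = big.merge(lookup, on=['FICO_cat', 'OrigLTV_cat'], how='left')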

grouping by column and then doing a boxplot by the index in pandas

I have a large dataframe which I would like to group by some column and examine graphically the distribution per group using a boxplot. I found that df.boxplot() will do it for each column of the dataframe and put it in one plot, just as I need.
The problem is that after a groupby operation, my data is all in one column with the group labels in the index, so I can't call boxplot on the result.
Here is an example:
import pandas as pd
from numpy.random import rand

df = pd.DataFrame({'a': rand(10), 'b': [x % 2 for x in range(10)]})
df
a b
0 0.273548 0
1 0.378765 1
2 0.190848 0
3 0.646606 1
4 0.562591 0
5 0.409250 1
6 0.637074 0
7 0.946864 1
8 0.203656 0
9 0.276929 1
Now I want to group by column b and boxplot the distribution of both groups in one boxplot. How can I do that?
You can use the by argument of boxplot. Is that what you are looking for?
df.boxplot(column='a', by='b')
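In a plain script you would also need matplotlib to render the figure; a minimal sketch using the df from the question:

import matplotlib.pyplot as plt

df.boxplot(column='a', by='b')  # one box per group: b == 0 and b == 1
plt.show()  # needed when running as a script rather than interactively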