I am trying to create a new column in my dataframe based on the condition below:
dataFrame01['final'] = dataFrame01.apply(lambda x: x['Name'] if x['Eval'] == 'NAN' else x['Eval'], axis=1)
but only the else branch ever runs: the new column is always populated with the values from the else branch, never with the values from the if branch.
Please help me understand what mistake I am making here.
Hard to say without seeing the data, but it appears that the expression below never evaluates to True:
x['Eval'] == 'NAN'
As a hunch, check that you are specifying your NaN correctly. In Pandas, missing values are typically specified as np.nan. One way to evaluate missing values in Pandas is with pd.isnull(). Thus, the code would look something like this:
dataFrame01['final'] = dataFrame01.apply(lambda x: x['Name'] if pd.isnull(x['Eval']) else x['Eval'], axis=1)
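As a rough illustration of the point above, here is a minimal sketch on made-up data (the real dataFrame01 was not shown, so the sample values are assumptions). It checks that comparing against the string 'NAN' never matches a real missing value, and it uses Series.fillna as a vectorized alternative to the row-wise apply.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real dataFrame01 was not shown.
dataFrame01 = pd.DataFrame({
    'Name': ['alice', 'bob', 'carol'],
    'Eval': ['good', np.nan, 'poor'],
})

# A real missing value is never equal to the string 'NAN'.
matches_string = (dataFrame01['Eval'] == 'NAN').any()

# Take Eval where it is present, otherwise fall back to Name.
dataFrame01['final'] = dataFrame01['Eval'].fillna(dataFrame01['Name'])
```

If the column actually holds the literal text 'NAN' rather than a missing value, the original string comparison would be the right test, so it is worth inspecting the data first.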
I've scraped a PDF table and it came with an annoying formatting feature.
The table has two columns. In some cases a record was split across two rows: one row holds what should be the column A value and the next row holds what should be the column B value. Like this:
df = pd.DataFrame()
df['names'] = ['John','Mary',np.nan,'George']
df['numbers'] = ['1',np.nan,'2','3']
I want to reformat that dataframe so that wherever there is an empty cell in df['numbers'], it is filled with the value from the next row. Then I apply .dropna() to eliminate the rows that are still wrong.
I tried this:
for i in range(len(df)):
    if df['numbers'][i] == np.nan:
        df['numbers'][i] = df['numbers'][i+1]
No change to the dataframe, though, and no error message either.
What am I missing?
While I don't think this solves all your problems, the reason the dataframe is not being updated is the line
if df['numbers'][i] == np.nan:
which always evaluates to False, because NaN never compares equal to anything, including itself.
To implement a valid test for NaN in this case you must use
if pd.isnull(df['numbers'][i]):
which evaluates to True or False depending on the cell contents.
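To make the comparison issue concrete, here is a small sketch of the loop from the question with the test swapped for pd.isnull. Using .loc for the assignment and stopping one row early (so that i + 1 stays in range) are additions for this sketch, not part of the original code, and it assumes the default 0..n-1 index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'names': ['John', 'Mary', np.nan, 'George'],
    'numbers': ['1', np.nan, '2', '3'],
})

# NaN is not equal to itself, so an == test can never find it.
nan_equals_nan = (np.nan == np.nan)

# Stop one row early so that i + 1 is always a valid row label.
for i in range(len(df) - 1):
    if pd.isnull(df.loc[i, 'numbers']):
        # .loc writes into df directly instead of into a temporary copy.
        df.loc[i, 'numbers'] = df.loc[i + 1, 'numbers']
```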
This is the solution I found:
df[['numbers']] = df[['numbers']].fillna(method='bfill')
df = df[~df['names'].isna()]
It's probably not the most elegant, but it worked.
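For reference, a sketch of the same idea written with the bfill method, since recent pandas releases warn that passing method= to fillna is deprecated. The reset_index call at the end is an optional extra for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'names': ['John', 'Mary', np.nan, 'George'],
    'numbers': ['1', np.nan, '2', '3'],
})

# Fill each missing number with the next available value below it.
df['numbers'] = df['numbers'].bfill()

# Drop the leftover rows that have no name and renumber the index.
df = df[df['names'].notna()].reset_index(drop=True)
```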
I have a dataframe with one column of lists of unequal length, which I want to split into multiple columns (the item values will be the column names). An example is given below.
I have done this with iterrows, iterating through the rows and examining the list in each row. It seems workable as my dataframe has few rows. However, I wonder if there is a cleaner method.
I have also tried additional_df = pd.DataFrame(venue_df.location.values.tolist())
However, the lists break down as shown below.
Thanks for your help.
Can you try this code? It is built assuming venue_df.location contains the lists you have shown in the cells.
venue_df['school'] = venue_df.location.apply(lambda x: ('school' in x)+0)
venue_df['office'] = venue_df.location.apply(lambda x: ('office' in x)+0)
venue_df['home'] = venue_df.location.apply(lambda x: ('home' in x)+0)
venue_df['public_area'] = venue_df.location.apply(lambda x: ('public_area' in x)+0)
Hope this helps!
First, let's explode your location column so we can get to your wanted end result.
import pandas as pd
s = df['Location'].explode()
Then let's use crosstab on that series, with the original row index as the rows, to get your end result.
pd.crosstab(s.index, s)
I didn't test it because I don't know your base df.
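Here is a small, self-contained sketch of the explode-then-crosstab idea on made-up data. The venue_df contents below are assumptions, since the original example was posted as an image. Each cell of the result counts how often a label appears in that row's list, so for lists without repeats it is a 0/1 indicator.

```python
import pandas as pd

# Hypothetical data standing in for the dataframe from the question.
venue_df = pd.DataFrame({
    'location': [
        ['school', 'office'],
        ['home'],
        ['office', 'public_area', 'home'],
    ],
})

# One row per list item; the original row label is repeated in the index.
s = venue_df['location'].explode()

# Rows are the original row labels, columns are the distinct list items.
indicators = pd.crosstab(s.index, s)
```

Unlike one apply call per label, this does not require knowing the set of labels in advance.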
I am new to Pandas. Sorry for using images instead of tables here; I tried to follow the instructions for inserting a table, but I couldn't.
Pandas version: '1.3.2'
Given this dataframe with Close and Volume for stocks, I've managed to calculate OBV, using pandas, like this:
df.groupby('Ticker').apply(lambda x: (np.sign(x['Close'].diff().fillna(0)) * x['Volume']).cumsum())
The above gave me the correct values for OBV, as shown here.
However, I'm not able to assign the calculated values to a new column.
I would like to do something like this:
df['OBV'] = df.groupby('Ticker').apply(lambda x: (np.sign(x['Close'].diff().fillna(0)) * x['Volume']).cumsum())
But simply running the expression above throws this error:
ValueError: Columns must be same length as key
What am I missing?
How can I insert the calculated values into the original dataframe as a single column, df['OBV'] ?
I've checked this thread so I'm sure I should use apply.
This discussion looked promising, but it is not for my case
Use Series.droplevel to remove the first level of the MultiIndex:
df['OBV'] = df.groupby('Ticker').apply(lambda x: (np.sign(x['Close'].diff().fillna(0)) * x['Volume']).cumsum()).droplevel(0)
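As a possible alternative that avoids apply altogether (this is a swapped-in technique, not the approach from the answer above), the same calculation can be sketched with grouped diff and grouped cumsum. Both keep the original index, so the result can be assigned straight to a column. The sample prices and volumes are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two interleaved tickers.
df = pd.DataFrame({
    'Ticker': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Close': [10, 5, 11, 5, 10, 6],
    'Volume': [100, 10, 200, 20, 300, 30],
})

# Sign of the per-ticker price change; the first row of each ticker counts as 0.
direction = np.sign(df.groupby('Ticker')['Close'].diff().fillna(0))

# Signed volume, accumulated separately within each ticker.
df['OBV'] = (direction * df['Volume']).groupby(df['Ticker']).cumsum()
```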
Given a pandas DataFrame (df), where one column (unique_val_col) should have a unique value, what is the best way to extract this value (not as a list)?
So far I've used the following code:
output = list(set(df[unique_val_col]))
if len(output)==1: output = output[0]
Or, if there is a chance of NaNs, change the first line to:
output = [val for val in list(set(df[unique_val_col])) if val == val]
The question is whether there is a more direct way, that would also reflect the fact that the column actually has only one value without needing the 'if' statement.
I think you are trying to find a value that occurs only once. If so, you could achieve it like this:
df['unique_value_counts'].value_counts().sort_values(ascending=False).keys()[0]
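One way to make the "exactly one value" expectation explicit is a small helper that drops missing values, takes the distinct values, and raises if there is not exactly one. The helper name single_value and the sample column contents are hypothetical, chosen for this sketch.

```python
import numpy as np
import pandas as pd


def single_value(series):
    """Return the one distinct non-missing value in series, or raise ValueError."""
    values = series.dropna().unique()
    if len(values) != 1:
        raise ValueError(
            f"expected exactly one distinct value, found {len(values)}"
        )
    return values[0]


# Hypothetical column that holds one value plus a missing entry.
df = pd.DataFrame({'unique_val_col': [7.0, np.nan, 7.0]})
output = single_value(df['unique_val_col'])
```

Raising on a violated assumption replaces the silent 'if' branch from the question, at the cost of needing a try/except where more than one value is acceptable.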
I am attempting to dynamically create a new column based on the values of another column.
Say I have the following dataframe
A|B
11|1
22|0
33|1
44|1
55|0
I want to create a new column.
If the value of column B is 1, insert 'Y' else insert 'N'.
The resulting dataframe should look like this:
A|B|C
11|1|Y
22|0|N
33|1|Y
44|1|Y
55|0|N
I could do this by iterating through the column values:
values = []
for i in dataframe['B'].values:
    if i == 1:
        values.append('Y')
    else:
        values.append('N')
dataframe['C'] = values
However, I am afraid this will severely reduce performance, especially since my dataset contains 500,000+ rows.
Any help will be greatly appreciated.
Thank you.
Avoid chained indexing by using loc. There are some subtleties with returning a view versus a copy in pandas that are related to numpy.
df['C'] = 'N'
df.loc[df.B == 1, 'C'] = 'Y'
Try this:
df['C'] = 'N'
df['C'][df['B']==1] = 'Y'
This should be faster than iterating.
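Another common way to express the same rule in a single vectorized step is numpy.where, sketched here on the sample data from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'A': [11, 22, 33, 44, 55],
    'B': [1, 0, 1, 1, 0],
})

# 'Y' where B equals 1, 'N' everywhere else, computed for the whole column at once.
df['C'] = np.where(df['B'] == 1, 'Y', 'N')
```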