Drop rows that don't have a float value in a column - pandas

I have this df:
My task is to find results matching these conditions:
df[(df.neighbourhood_group == 'Manhattan') & (df.room_type == 'Entire home/apt') & (df.price.between(150.0, 175.0))]
But this is not working. The error message says:
TypeError: '>=' not supported between instances of 'str' and 'float'
Because somewhere in the price column the value Private room is written.
How can I write a piece of code that keeps only the float values and drops all the others?
NOTE
These are not working:
df = df[df['price'].apply(lambda x: type(x) in [float])]
clean['price']=df['price'].str.replace('Private room', '0.0')
clean.price = clean.price.astype(float)
df.select_dtypes(exclude=['str'])
This is the CSV data.

One way to achieve it:
df['price'] = df.apply(lambda r: r['price'] if type(r['price']) == float else np.nan, axis=1)
df.dropna(inplace=True)
This replaces any non-float value in price with np.nan (it requires import numpy as np) and then drops those rows.
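A more idiomatic alternative (a sketch, not part of the original answer) is pd.to_numeric with errors='coerce', which converts every non-numeric entry to NaN in one pass and also fixes the underlying problem that the whole column was read in as strings:
import pandas as pd

# Stray text such as 'Private room' becomes NaN;
# numeric strings become proper floats.
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df = df.dropna(subset=['price'])
After this, df.price.between(150.0, 175.0) compares floats and no longer raises the TypeError.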

Related

Dataframe index with isclose function

I have a dataframe with numerical values between 0 and 1. I am trying to create simple summary statistics (manually). When I use a boolean comparison I can get the index, but when I try to use math.isclose the function does not work and gives an error.
For example:
import pandas as pd
df1 = pd.DataFrame({'col1': [0, .05, 0.74, 0.76, 1],
                    'col2': [0, 0.05, 0.5, 0.75, 1],
                    'x1': [1, 2, 3, 4, 5],
                    'x2': [5, 6, 7, 8, 9]})
result75 = df1.index[round(df1['col2'],2) == 0.75].tolist()
value75 = df1['x2'][result75]
print(value75.mean())
This gives the correct result, but occasionally the result is NaN, so I tried:
result75 = df1.index[math.isclose(round(df1['col2'],2), 0.75, abs_tol = 0.011)].tolist()
value75 = df1['x2'][result75]
print(value75.mean())
This results in the following error message:
TypeError: cannot convert the series to <class 'float'>
Both are of type "bool", so I'm not sure what is going wrong here...
This works, because math.isclose expects scalar floats (it raises TypeError when handed a whole Series), while element-wise comparisons operate on every row:
rows_meeting_condition = df1[(df1['col2'] > 0.74) & (df1['col2'] < 0.76)]
print(rows_meeting_condition['x2'].mean())
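If you specifically want isclose-style tolerance rather than hard bounds, numpy's vectorized np.isclose accepts a whole Series (a sketch, reusing the tolerance from the question):
import numpy as np

# np.isclose works element-wise and returns a boolean array
# that can index the DataFrame directly.
mask = np.isclose(df1['col2'], 0.75, atol=0.011)
print(df1.loc[mask, 'x2'].mean())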

groupby with transform minmax

For every city, I want to create a new column that is the min-max scaling of another column (age).
I tried this and get: Input contains infinity or a value too large for dtype('float64').
from sklearn import preprocessing

cols = ['age']

def f(x):
    scaler1 = preprocessing.MinMaxScaler()
    x[['age_minmax']] = scaler1.fit_transform(x[cols])
    return x

df = df.groupby(['city']).apply(f)
From the comments:
df['age'].replace([np.inf, -np.inf], np.nan, inplace=True)
Or
df['age'] = df['age'].replace([np.inf, -np.inf], np.nan)
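Once the infinities are replaced, a pure-pandas sketch of the groupby-with-transform min-max scaling from the title (no sklearn needed; note that a city where all ages are equal yields a zero denominator and NaN):
# Min-max scale age within each city.
df['age_minmax'] = df.groupby('city')['age'].transform(
    lambda s: (s - s.min()) / (s.max() - s.min())
)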

Creating a new column in pandas using existing column values as a filter - .isin() fails with AttributeError

Error: AttributeError: 'int' object has no attribute 'isin'
Question: There are no null values, and the logic works in an individual code block. I tried changing the data type of series R to object; the error then becomes: 'str' object has no attribute 'isin'
What am I missing?
Code:
X = [1, 2, 3, 4]
if dg['RFM_Segment'] == '111':
    return 'Core'
elif (dg['R'].isin(X) & dg['F'].isin([1]) & dg['M'].isin(X) & (dg['RFM_Segment'] != '111')).any():
    return 'Loyal'
elif (dg['R'].isin(X) & dg['F'].isin(X) & dg['M'].isin([1]) & (dg['RFM_Segment'] != '111')).any():
    return 'Whales'
elif (dg['R'].isin(X) & dg['F'].isin([1]) & dg['M'].isin([3, 4])).any():
    return 'Promising'
elif (dg['R'].isin([1]) & dg['F'].isin([4]) & dg['M'].isin(X)).any():
    return 'Rookies'
elif (dg['R'].isin([4]) & dg['F'].isin([4]) & dg['M'].isin(X)).any():
    return 'Slipping'
else:
    return 'NA'

dg['user_segment'] = dg.apply(user_segment, axis=1)
I will assume that you accidentally cut off the top of your code snippet, in which you define user_segment.
The issue lies in the way you tried to use apply. Note that with axis=1, apply operates on each row as a Series rather than on the DataFrame. So by indexing into an element of that Series you do not receive a Series object (as you would when indexing into a DataFrame column), but rather an object of the given column's type (int, str, etc.). An example:
import pandas as pd

X = ['a', 'c']
df = pd.DataFrame([['a', 'b'], ['c', 'd'], ['e', 'f']], columns=['col1', 'col2'])
df['col1'].isin(X)  # this works, because I'm applying `isin` on the entire column

def test_apply(x):
    print(x['col1'].isin(X))
    return x

df.apply(test_apply, axis=1)  # this doesn't work, because I'm applying
                              # `isin` on a non-pandas object, in this example `str`
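A sketch of how the row-wise function might look instead (abridged to the first few rules; X, dg and the column names are taken from the question's snippet). Inside apply(axis=1) each field is a scalar, so plain in and == replace .isin, and the .any() calls become unnecessary:
def user_segment(row):
    # every field of `row` is a scalar here
    if row['RFM_Segment'] == '111':
        return 'Core'
    elif row['R'] in X and row['F'] == 1 and row['M'] in X:
        return 'Loyal'
    elif row['R'] in X and row['F'] in X and row['M'] == 1:
        return 'Whales'
    else:
        return 'NA'

dg['user_segment'] = dg.apply(user_segment, axis=1)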

Pandas Data frame column condition check based on length of the value

I have a pandas DataFrame created by reading an Excel file. The Excel file has a column called Serial Number. I then pass each serial number to another function, which connects to an API and fetches the result set for that serial number.
My code:
def create_excel(filename):
    try:
        data = pd.read_excel(filename, usecols=[4, 18, 19, 20, 26, 27, 28],
                             converters={'Serial Number': '{:0>32}'.format})
    except Exception as e:
        sys.exit("Error reading %s: %s" % (filename, e))
    data["Subject Organization"].fillna("N/A", inplace=True)
    df = data[data['Subject Organization'].str.contains("Fannie", case=False)]
    #df['Serial Number'].apply(lambda x: '000'+x if len(x) == 29 else '00'+x if len(x) == 30 else '0'+x if len(x) == 31 else x)
    print(df)
    df.to_excel(r'Data.xlsx', index=False)
    output = df['Serial Number'].apply(lambda x: fetch_by_ser_no(x))
    df2 = pd.DataFrame(output)
    df2.columns = ['Output']
    df5 = pd.concat([df, df2], axis=1)
The problem I am facing: if the result returned by fetch_by_ser_no() is blank, I want to pad the serial number to 34 characters by adding two more leading zeros and then call the function again.
How can I do this without creating multiple DataFrames?
Any help appreciated. Thanks.
You can try to use a conditional expression (if ... else ...) inside the lambda:
output = df['Serial Number'].apply(lambda x: 'ok' if fetch_by_ser_no(x) else 'badly')
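Extending that idea to the retry-with-padding requirement (a sketch; it assumes fetch_by_ser_no returns a falsy value such as '' or None when the lookup fails, and the helper name fetch_with_retry is hypothetical):
def fetch_with_retry(ser_no):
    result = fetch_by_ser_no(ser_no)
    if not result:
        # pad to 34 characters with leading zeros and try once more
        result = fetch_by_ser_no(ser_no.rjust(34, '0'))
    return result

df['Output'] = df['Serial Number'].apply(fetch_with_retry)
Because the retry happens per value inside apply, no extra DataFrame is needed.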

Dataframe loc with multiple string value conditions

Hi, given this dataframe, is it possible to fetch the Number value associated with certain conditions using df.loc? This is what I came up with so far.
if df.loc[(df["Tags"]=="Brunei") & (df["Type"]=="Host"),"Number"]:
I want the output to be 1. Is this the correct way to do it?
You're on the right track, but you have to append .values[0] to the end of the .loc statement to extract the single value from the pandas Series.
df = pd.DataFrame({
    'Tags': ['Brunei', 'China'],
    'Type': ['Host', 'Address'],
    'Number': [1, 1192]
})
display(df)
series = df.loc[(df["Tags"]=="Brunei") & (df["Type"]=="Host"),"Number"]
print(type(series))
value = df.loc[(df["Tags"]=="Brunei") & (df["Type"]=="Host"),"Number"].values[0]
print(type(value))
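As a side note (not from the original answer): Series.item() does the same extraction but raises a ValueError when there is not exactly one match, which surfaces empty or duplicate matches early:
# .item() fails loudly if the mask matches zero or multiple rows.
value = df.loc[(df["Tags"] == "Brunei") & (df["Type"] == "Host"), "Number"].item()
print(value)  # 1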