TypeError: 'float' object is not iterable - pandas

The output of the column df['emp_years'] is:
NaN
< 1 year
3 years
10+ years
10+ years
...
9 years
10+ years
1 year
3 years
3 years
Name: emp_years, Length: 10000, dtype: object
Now, when I try to apply this function to the column:
def change(col):
    for x in col:
        print(x)
df['emp_years'].apply(change)
I get TypeError: 'float' object is not iterable.
Can someone tell me how to solve this?

You should consider using vectorisation.
If you do need to loop over rows, have a look at df.iterrows() or df.itertuples() as described in the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.itertuples.html#pandas.DataFrame.itertuples

This error mostly appears when there are NaN values in the column. Series.apply calls the function once per element, so `for x in col` tries to iterate over a single value; a string can be iterated, but NaN is a float and cannot, which raises this error. I would advise you to clean the data a little. One solution is to drop the rows where that column is NaN, using df.dropna(subset=['emp_years']). I don't know whether you have changed any data types, but I would advise you to do that. Next time, please provide more information about the dataset or a link to your code so that we can understand the issue better.
Happy Coding!
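A minimal sketch of both options, using a small made-up sample in place of the real 10000-row column. The function name describe and its upper-casing step are placeholders for illustration only; the point is that the function receives one cell at a time and has to cope with NaN itself unless those rows are dropped first.

```python
import numpy as np
import pandas as pd

# Small made-up sample that mimics the column in the question.
df = pd.DataFrame({"emp_years": [np.nan, "< 1 year", "3 years", "10+ years"]})


def describe(value):
    """Handle one cell at a time, which is what Series.apply passes in."""
    if pd.isna(value):  # NaN is a float, so it must not be iterated over
        return "missing"
    return value.upper()  # placeholder transformation for illustration


# Option 1: keep every row and handle NaN inside the function.
labels = df["emp_years"].apply(describe)

# Option 2: drop the rows where emp_years is NaN before applying anything.
cleaned = df.dropna(subset=["emp_years"])

print(labels.tolist())
print(len(cleaned))
```

Option 1 keeps the row count unchanged, which matters if the result is assigned back to the DataFrame; option 2 is simpler when rows with a missing emp_years are not needed at all.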

apply function causing SettingWithCopyWarning error -? [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 7 days ago.
A DataFrame called condition produces the output below:
   SUBJID    LBSPCCND                                    LBSPCCND_OTHER
0  0292-104  Adequate specimen
1  1749-101  Other                                       Limited Sample
2  1733-104  Paraffin block; paraffin-embedded specimen
3  0587-102  Other                                       Pathology Report
4  0130-101  Adequate specimen
5  0587-101  Adequate specimen
6  0609-102  Other                                       Unacceptable
When I run the code below, I get a SettingWithCopyWarning:
condition["LBSPCCND"] = condition["LBSPCCND"].apply(convert_condition)
condition
   SUBJID    LBSPCCND           LBSPCCND_OTHER
0  0292-104  ADEQUATE
1  1749-101  Other              Limited Sample
2  1733-104  PARAFFIN-EMBEDDED
3  0587-102  Other              Pathology Report
4  0130-101  ADEQUATE
5  0587-101  ADEQUATE
6  0609-102  Other              Unacceptable
The values are converted, but the assignment also produces this warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Calling .copy() when creating my DataFrame got rid of the warning:
columns = ["SUBJID", "LBSPCCND", "LBSPCCND_OTHER"]
condition = igan[columns].copy()
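A small self-contained sketch of the same fix. The contents of igan, the extra SITE column, and the body of convert_condition are all made up for illustration, since the question shows neither the source frame nor the function. Presumably the warning appeared because condition was originally a plain selection from igan, so pandas could not tell whether the assignment was meant to reach igan as well; an explicit copy removes that ambiguity.

```python
import pandas as pd

# Made-up stand-in for the source frame `igan`, with one extra column.
igan = pd.DataFrame({
    "SUBJID": ["0292-104", "1749-101", "1733-104"],
    "LBSPCCND": ["Adequate specimen", "Other",
                 "Paraffin block; paraffin-embedded specimen"],
    "LBSPCCND_OTHER": ["", "Limited Sample", ""],
    "SITE": ["A", "B", "C"],
})


def convert_condition(value):
    """Hypothetical mapping; the real convert_condition is not shown."""
    mapping = {
        "Adequate specimen": "ADEQUATE",
        "Paraffin block; paraffin-embedded specimen": "PARAFFIN-EMBEDDED",
    }
    return mapping.get(value, value)  # leave unmapped values unchanged


columns = ["SUBJID", "LBSPCCND", "LBSPCCND_OTHER"]

# .copy() makes `condition` an independent DataFrame, so assigning to it
# is unambiguous and leaves `igan` untouched.
condition = igan[columns].copy()
condition["LBSPCCND"] = condition["LBSPCCND"].apply(convert_condition)

print(condition)
```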

pandas.Series.str.replace returns nan

I have some text stored in a pandas.Series. For example:
df.loc[496]
'therapist and friend died in ~2006 Parental/Caregiver obligations:\n'
I need to replace the number in the text with a full date, so I wrote:
df.str.replace(
    pat=r'(?:[^/])(\d{4}\b)',
    repl=lambda m: ''.join('Jan/1/', m.groups()[0]),
    regex=True
)
but the output is NaN, even though testing the regular expression with findall shows no issue:
df.str.findall(r'(?:[^/])(\d{4}\b)')
496    [2006]
I don't understand what the issue is. Most of the similar questions are about cases where the Series holds numbers rather than str, but my case is different: the data type is clearly str. Nonetheless, I tried .astype(str) and still get the same result, NaN.
A possible solution:
df = pd.Series({496: 'therapist and friend died in ~2006 Parental/Caregiver obligations:\n'})
df.replace(r'~?(\d{4})\b', r'Jan 1, \1', regex=True)
Output:
496 therapist and friend died in Jan 1, 2006 Paren...
dtype: object
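If the callable form of str.replace is preferred, the sketch below is one way it could be written. The likely cause of the NaN in the question is the repl function itself: str.join accepts a single iterable, so ''.join('Jan/1/', ...) raises a TypeError inside the callable, and pandas appears to report that as NaN rather than re-raising it. The pattern used here is my own variation, not the asker's: a lookbehind checks for a preceding slash without consuming a character, and an optional ~ is dropped as in the solution above. The variable names s and result are illustrative.

```python
import pandas as pd

s = pd.Series(
    {496: "therapist and friend died in ~2006 Parental/Caregiver obligations:\n"}
)

# (?<!/)  : the match must not be preceded by a slash (nothing is consumed)
# ~?      : an optional tilde, removed by the replacement
# (\d{4}) : the four-digit year, captured as group 1
result = s.str.replace(
    pat=r"(?<!/)~?(\d{4})\b",
    repl=lambda m: "Jan/1/" + m.group(1),  # plain concatenation instead of join
    regex=True,
)

print(result.loc[496])
```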

[pandas]Dividing all elements of columns in df with elements in another column (Same df)

I'm sorry, I know this is basic but I've tried to figure it out myself for 2 days by sifting through documentation to no avail.
My code:
import numpy as np
import pandas as pd
name = ["bob","bobby","bombastic"]
age = [10,20,30]
price = [111,222,333]
share = [3,6,9]
list = [name,age,price,share]
list2 = np.transpose(list)
dftest = pd.DataFrame(list2, columns = ["name","age","price","share"])
print(dftest)
        name age price share
0        bob  10   111     3
1      bobby  20   222     6
2  bombastic  30   333     9
I want to divide all elements in the 'price' column by all elements in the 'share' column. I've tried:
print(dftest[['price']/['share']]) - Failed
dftest['price']/dftest['share'] - Failed, unsupported operand type
dftest.loc[:,'price']/dftest.loc[:,'share'] - Failed
Wondering if I could just change everything to int or float, I tried:
dftest.astype(float) - can't convert from str to float
I've tried the iter and items methods but could not understand the printouts...
My only suspicion is that I need to use something called iterate, which I am unable to wrap my head around despite reading other old posts...
Please help me T_T
Apologies in advance for the somewhat protracted answer, but the question is somewhat unclear with regards to what exactly you're attempting to accomplish.
If you simply want price[0]/share[0], price[1]/share[1], etc. you can just do:
dftest['price_div_share'] = dftest['price'] / dftest['share']
The issue with the operand types can be solved by:
dftest['price_div_share'] = dftest['price'].astype(float) / dftest['share'].astype(float)
You're getting the "can't convert from str to float" error because you're calling astype(float) on the ENTIRE dataframe, which also contains a string column (name).
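As a rough sketch of the row-by-row case, the snippet below rebuilds the frame the same way as in the question and converts only the numeric columns by passing a per-column mapping to astype. The choice of int for age and float for the other two is my own assumption, not something stated in the question.

```python
import numpy as np
import pandas as pd

name = ["bob", "bobby", "bombastic"]
age = [10, 20, 30]
price = [111, 222, 333]
share = [3, 6, 9]

# Mixing strings and numbers in one NumPy array turns every value into a
# string, which is why the division in the question failed.
rows = np.transpose([name, age, price, share])
dftest = pd.DataFrame(rows, columns=["name", "age", "price", "share"])

# Convert only the columns that hold numbers; "name" is left alone.
dftest = dftest.astype({"age": int, "price": float, "share": float})

# Element-wise division: price[0]/share[0], price[1]/share[1], ...
dftest["price_div_share"] = dftest["price"] / dftest["share"]

print(dftest)
```

Building the frame from a dict of lists, e.g. pd.DataFrame({"name": name, "age": age, ...}), would avoid the string conversion in the first place, because each column then keeps its own type.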
If you want to divide each item by each item, i.e. price[0] / share[0], price[0] / share[1], price[0] / share[2], price[1] / share[0], etc., you would need to iterate through each item and append the result to a new list. You can do that pretty easily with a for loop, although it may take some time if you're working with a large dataset. It would look something like this if you simply want the result:
new_list = []
for p in dftest['price'].astype(float):
    for s in dftest['share'].astype(float):
        new_list.append(p / s)
If you want to get this in a new dataframe, you can pass the list to pd.DataFrame():
new_df = pd.DataFrame(new_list, columns=['price_divided_by_share'])
This new dataframe would only have one column (the result, as mentioned above). If you want the information from the original dataframe as well, then you would do something like the following:
new_list = []
for n, a, p in zip(dftest['name'], dftest['age'], dftest['price'].astype(float)):
    for s in dftest['share'].astype(float):
        new_list.append([n, a, p, s, p / s])
new_df = pd.DataFrame(new_list, columns=['name', 'age', 'price', 'share', 'price_div_by_share'])
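For larger data, the every-price-by-every-share division can also be done without Python loops. The sketch below uses NumPy broadcasting, which is a swapped-in alternative to the nested loops rather than part of the original answer; the names ratios and table are illustrative.

```python
import numpy as np
import pandas as pd

dftest = pd.DataFrame({
    "name": ["bob", "bobby", "bombastic"],
    "price": [111.0, 222.0, 333.0],
    "share": [3.0, 6.0, 9.0],
})

price = dftest["price"].to_numpy()
share = dftest["share"].to_numpy()

# A column vector divided by a row vector broadcasts to a full grid:
# ratios[i, j] == price[i] / share[j]
ratios = price[:, None] / share[None, :]

# Label the grid so each cell can be looked up by price and share.
table = pd.DataFrame(
    ratios,
    index=pd.Index(price, name="price"),
    columns=pd.Index(share, name="share"),
)

print(table)
```

Flattening the grid with ratios.ravel() gives the values in the same order as the nested loops, with price in the outer loop and share in the inner one.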
If you check the data types of your dataframe, you will realise that they are all strings (object dtype):
dftest.dtypes
name     object
age      object
price    object
share    object
dtype: object
The first step is to change the relevant columns to numbers. This is one way:
dftest = dftest.set_index("name").astype(float)
dftest.dtypes
age      float64
price    float64
share    float64
dtype: object
This way you make the names a useful index and separate them from the numeric data. This is just a suggestion; you may have other reasons to leave the names as a column, in which case you have to change the data type of each column individually.
Once that is done, you can safely execute your code:
dftest.div(dftest.share,axis=0)
                age  price  share
name
bob        3.333333   37.0    1.0
bobby      3.333333   37.0    1.0
bombastic  3.333333   37.0    1.0
I assume this is what you expect as your outcome. If not, you can tweak it. The main point is to convert your data to numeric types before any computation or division can occur.

Dataframe countif function issue [duplicate]

This question already has an answer here:
counting the amount of True/False values in a pandas row
(1 answer)
Closed 5 years ago.
Return
0.0000
-0.0116
0.0000
0.0100
I have a dataframe in the format above, and I am trying to count the values >0 and <0 with the following code:
print ("Positive Returns:")
print((df['Return']>0.0).count())
print ("Negative Returns:")
print((df['Return']<0.0).count())
However, both return 5119, which is the length of my whole dataframe.
It is not counting correctly. Can anyone advise, please?
Thank you
*Not really a duplicate, since I am not asking about True/False values; the threshold could be >0.1, for example.
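Use sum to count the boolean True values, which are treated as 1s. count() returns the number of non-null entries whether they are True or False, which is why both calls returned the full length: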
print((df['Return']>0.0).sum())
print((df['Return']<0.0).sum())
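A minimal sketch of the difference, using only the four sample values shown in the question rather than the full 5119 rows. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Return": [0.0, -0.0116, 0.0, 0.0100]})

is_positive = df["Return"] > 0.0  # boolean Series: False, False, False, True
is_negative = df["Return"] < 0.0  # boolean Series: False, True, False, False

# count() counts every non-null entry, regardless of True or False.
n_entries = int(is_positive.count())

# sum() adds the booleans up as 0s and 1s, so it counts only the True values.
n_positive = int(is_positive.sum())
n_negative = int(is_negative.sum())

print(n_entries, n_positive, n_negative)
```

The same pattern works for any threshold, for example (df["Return"] > 0.1).sum().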

pd.datetime is failing to convert to date

I have a data frame with a column 'Date' of string type. Since I want to use 'Date' as the index, I first converted it to datetime:
data['Date'] = pd.to_datetime(data['Date'])
then I did,
data = data.set_index('Date')
but when I tried
data = data.loc['01/06/2006':'09/06/2006',]
the slicing did not happen: there was no error, but the data was not sliced. I then tried iloc:
data = data.iloc['01/06/2006':'09/06/2006',]
and got the following error message:
TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [01/06/2006] of <type 'str'>
So I came to the conclusion that pd.to_datetime didn't work, even though no error was raised?
Can anybody clarify what is going on? Thanks in advance.
It seems you need to change the order of the datetime string to YYYY-MM-DD:
data = data.loc['2006-06-01':'2006-06-09']
Sample:
data = pd.DataFrame({'col':range(15)}, index=pd.date_range('2006-06-01','2006-06-15'))
print (data)
            col
2006-06-01    0
2006-06-02    1
2006-06-03    2
2006-06-04    3
2006-06-05    4
2006-06-06    5
2006-06-07    6
2006-06-08    7
2006-06-09    8
2006-06-10    9
2006-06-11   10
2006-06-12   11
2006-06-13   12
2006-06-14   13
2006-06-15   14
data = data.loc['2006-06-01':'2006-06-09']
print (data)
            col
2006-06-01    0
2006-06-02    1
2006-06-03    2
2006-06-04    3
2006-06-05    4
2006-06-06    5
2006-06-07    6
2006-06-08    7
2006-06-09    8
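One possible reason the original slice seemed to do nothing: pandas reads a slash-separated string such as '01/06/2006' month-first by default, i.e. as 6 January 2006. The sketch below assumes the dates in the question were meant day-first (1 June to 9 June 2006), which is only inferred from the YYYY-MM-DD strings used above. It parses the bounds explicitly with dayfirst=True and slices with the resulting timestamps.

```python
import pandas as pd

data = pd.DataFrame(
    {"col": range(15)},
    index=pd.date_range("2006-06-01", "2006-06-15"),
)

# Default parsing is month-first: this is 6 January, not 1 June.
default_parse = pd.Timestamp("01/06/2006")

# Assumption: the strings are day-first, so say so explicitly.
start = pd.to_datetime("01/06/2006", dayfirst=True)
end = pd.to_datetime("09/06/2006", dayfirst=True)

# Slicing a DatetimeIndex with .loc includes both end points.
subset = data.loc[start:end]

print(default_parse.date(), start.date(), end.date())
print(subset)
```

Writing the bounds in ISO form (YYYY-MM-DD), as in the sample above, sidesteps the ambiguity entirely.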
Since what I want is to create a new DataFrame with specific dates from the original DataFrame, I set the column 'Date' as the index:
data = data.set_index(data['Date'])
and then just create the new DataFrame using loc:
data1 = data.loc['01/06/2006':'09/06/2006']
I am quite new to Python and thought I needed to convert the string column 'Date' to datetime, but apparently that is not necessary. Thanks for your help, @jezrael.