apply function causing SettingWithCopyWarning [duplicate] - pandas

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 7 days ago.
A dataframe called condition produces the below output:
SUBJID LBSPCCND LBSPCCND_OTHER
0 0292-104 Adequate specimen
1 1749-101 Other Limited Sample
2 1733-104 Paraffin block; paraffin-embedded specimen
3 0587-102 Other Pathology Report
4 0130-101 Adequate specimen
5 0587-101 Adequate specimen
6 0609-102 Other Unacceptable
When I run the code below, I get a SettingWithCopyWarning:
condition["LBSPCCND"] = condition["LBSPCCND"].apply(convert_condition)
condition
SUBJID LBSPCCND LBSPCCND_OTHER
0 0292-104 ADEQUATE
1 1749-101 Other Limited Sample
2 1733-104 PARAFFIN-EMBEDDED
3 0587-102 Other Pathology Report
4 0130-101 ADEQUATE
5 0587-101 ADEQUATE
6 0609-102 Other Unacceptable
This generates the following warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Calling .copy() on my dataframe got rid of the warning:
columns = ["SUBJID", "LBSPCCND", "LBSPCCND_OTHER"]
condition = igan[columns].copy()
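Putting the pieces together, a minimal sketch of the fixed pattern; igan and convert_condition here are stand-ins for the asker's actual data and mapping function:

```python
import pandas as pd

# Stand-in data for the asker's igan dataframe.
igan = pd.DataFrame({
    "SUBJID": ["0292-104", "1749-101"],
    "LBSPCCND": ["Adequate specimen", "Other"],
    "LBSPCCND_OTHER": ["", "Limited Sample"],
})

def convert_condition(value):
    # Hypothetical mapping, standing in for the original function.
    return "ADEQUATE" if value == "Adequate specimen" else value

# Without .copy(), `condition` may be a view of `igan`, and the
# assignment below then triggers SettingWithCopyWarning.
columns = ["SUBJID", "LBSPCCND", "LBSPCCND_OTHER"]
condition = igan[columns].copy()
condition["LBSPCCND"] = condition["LBSPCCND"].apply(convert_condition)
```

Because condition now owns its data, the column assignment no longer writes into a possible view of igan.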

Related

python pandas - Add a new column by merging 2 columns based on a condition

Here is the data frame I am working with:
df = igan[["SUBJID", "LBSPCCND", "LBSPCCND_OTHER"]]
df.head(12)
I need to merge LBSPCCND and LBSPCCND_OTHER into a new column called LBSPCCND_ALL. I want to keep all values in LBSPCCND except where it equals "Other", and take every non-blank value from LBSPCCND_OTHER and merge those values into the new column (a blank there should mean that LBSPCCND has a valid value). I can't have "Other" in my data set. SUBJID is the unique identifier I use to merge this data back into my main data frame, which you don't see here.
I put together these conditions, but I'm unsure how to build the new column from them.
condition1 = df["LBSPCCND"] != "Other"
condition2 = df["LBSPCCND_OTHER"] != ""
df["LBSPCCND_ALL"] = df[df[condition1 & condition2]]
# This is not working; I get: Expected a 1D array, got an array with shape (13, 3)
I would do it this way:
df["LBSPCCND_ALL"] = df["LBSPCCND_OTHER"].replace("", None).fillna(df["LBSPCCND"])
Another variant,
df["LBSPCCND_ALL"] = df["LBSPCCND"].replace("Other", None).fillna(df["LBSPCCND_OTHER"])
Output :
print(df["LBSPCCND_ALL"])
0 Adequate specimen
1 Adequate specimen
2 Adequate specimen
3 Limited Sample
4 Paraffin block; paraffin-embedded specimen
5 Adequate specimen
6 Adequate specimen
7 Adequate specimen
8 Pathology Report
9 Adequate specimen
10 Adequate specimen
11 Unacceptable
Name: LBSPCCND_ALL, dtype: object
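An equivalent way to build the combined column, sketched with numpy.where on a few made-up rows using the same column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "LBSPCCND": ["Adequate specimen", "Other", "Other"],
    "LBSPCCND_OTHER": ["", "Limited Sample", "Pathology Report"],
})

# Take LBSPCCND_OTHER wherever LBSPCCND is "Other"; otherwise keep LBSPCCND.
df["LBSPCCND_ALL"] = np.where(df["LBSPCCND"].eq("Other"),
                              df["LBSPCCND_OTHER"],
                              df["LBSPCCND"])
```

This avoids the replace/fillna round trip and states the condition directly.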

TypeError: 'float' object is not iterable

The output of the column df['emp_years'] is:
NaN
< 1 year
3 years
10+ years
10+ years
...
9 years
10+ years
1 year
3 years
3 years
Name: emp_years, Length: 10000, dtype: object
Now, when I try to apply this function to the column:
def change(col):
    for x in col:
        print(x)

df['emp_years'].apply(change)
I get: TypeError: 'float' object is not iterable
So can someone tell me how to solve this?
You should consider using vectorisation.
Have a look at df.iterrows() or df.itertuples() as described in the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.itertuples.html#pandas.DataFrame.itertuples
This error mostly appears when there are NaN values in the column; when you try to access or print a NaN value, this error is raised. I would advise you to clean the data a little. One solution is to drop the NaN values of that column with df.dropna(subset=['emp_years']). I don't know whether you have changed some data types, but I would advise you to check, and next time please provide more information about the dataset or a link to your code so that we can understand the issue better.
Happy Coding!
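For context: Series.apply passes each cell to the function one at a time, so col inside change is a single value, not the column, and a NaN cell is a float that cannot be iterated. A minimal reproduction with made-up data, and one possible fix:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, "< 1 year", "3 years"], name="emp_years")

def change(value):
    # `value` is one cell, not the whole column.
    for ch in value:  # fails with TypeError when value is the float NaN
        pass
    return value

# s.apply(change)                   # raises TypeError: 'float' object is not iterable
cleaned = s.dropna().apply(change)  # works once the NaN rows are removed
```

Dropping (or filling) the NaN rows before the apply sidesteps the error entirely.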

Merging two dataframes on the same type column gives me wrong result

I have two dataframes, call them A and B, which were created by reading the sheets of an Excel file and performing some basic operations. I need to right-merge the two dataframes on a column named ID, which has first been converted with astype(str) in both dataframes.
The ID column of the left Dataframe (A) is:
0 5815518813016
1 5835503994014
2 5835504934023
3 5845535359006
4 5865520960012
5 5865532845006
6 5875531550008
7 5885498289039
8 5885498289039_A2
9 5885498289039_A3
10 5885498289039_X2
11 5885498289039_X3
12 5885509768698
13 5885522349999
14 5895507791025
Name: ID, dtype: object
The ID column of the right Dataframe (B) is:
0 5835503994014
1 5845535359006
2 5835504934023
3 5815518813016
4 5885498289039_A1
5 5885498289039_A2
6 5885498289039_A3
7 5885498289039_X1
8 5885498289039_X2
9 5885498289039_X3
10 5885498289039
11 5865532845006
12 5875531550008
13 5865520960012
14 5885522349998
15 5895507791025
16 5885509768698
Name: ID, dtype: object
However, when I merge the two, all the other columns of the left (A) dataframe come out empty (np.nan), except for the rows where the ID contains letters as well as numbers. This is the pd.merge() I do:
A_B=A.merge(B[['ID','col_B']], left_on='ID', right_on='ID', how='right')
Do you have any idea what might be wrong? Your input is valuable.
Try turning all values in both columns into strings:
A['ID'] = A['ID'].astype(str)
B['ID'] = B['ID'].astype(str)
Generally, when a merge like this doesn't work, I would try to debug by printing out the unique values in each column to check if anything pops out (usually dtype issues).
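One likely culprit, sketched with made-up IDs: Excel often reads purely numeric IDs as floats, and astype(str) on a float yields a trailing ".0" that no longer matches the plain string on the other side. merge's indicator=True flag makes such mismatches visible:

```python
import pandas as pd

A = pd.DataFrame({"ID": ["5815518813016", "5885498289039_A2"], "col_A": [1, 2]})
B = pd.DataFrame({"ID": [5815518813016.0, "5885498289039_A2"], "col_B": [10, 20]})

B["ID"] = B["ID"].astype(str)  # the float becomes '5815518813016.0'
merged = A.merge(B[["ID", "col_B"]], on="ID", how="right", indicator=True)
# The purely numeric ID shows up as 'right_only' (its col_A is NaN),
# while the letter-suffixed ID matches on both sides - exactly the
# symptom described in the question.

# One possible fix: normalise floats before converting, e.g.
# B["ID"] = B["ID"].map(lambda v: str(int(v)) if isinstance(v, float) else str(v))
```

Printing merged["_merge"].value_counts() quickly shows how many rows failed to match.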

How to change values in a column from object type to float, for example "€220M" to 220,000,000? [duplicate]

This question already has answers here:
Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe
(6 answers)
Closed 3 years ago.
I have a column with data as:
1 77M
2 118.5M
3 72M
4 102M
5 93M
6 67M
I need to change this to its numerical value as:
77,000,000
and so on.
I have tried different ways but could not come up with a definite solution.
Okay, this should work:
(df[1].str.replace('M','').astype(float) * 1000000).astype(int).astype(str).apply(lambda x : x[:-6]+','+x[-6:-3]+','+x[-3:])
Output
0 77,000,000
1 118,500,000
2 72,000,000
3 102,000,000
4 93,000,000
5 67,000,000
Name: 1, dtype: object
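Since the title asks for a numeric value, a variant on a small made-up series that first produces real floats and only then formats them for display:

```python
import pandas as pd

s = pd.Series(["77M", "118.5M", "72M"])

# Strip the 'M' suffix and scale to an actual number.
numbers = s.str.rstrip("M").astype(float) * 1_000_000

# Comma-grouped display, if the string form is wanted afterwards:
formatted = numbers.map("{:,.0f}".format)
```

Keeping the intermediate numeric Series means it can still be used in arithmetic, unlike the comma-formatted strings.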

Oracle PL/SQL number to_char ignores 0's even when decimal part exist [duplicate]

This question already has an answer here:
Fetching value from a number column removes 0 before decimal
(1 answer)
Closed 5 months ago.
I'm using the following code to convert a NUMBER(38,2) to the format "XXX.XXX.XXX,XX":
TO_CHAR(number, '999G999G999G999G999G999G999D99')
However, when the number is 0 or 0.XX, the leading 0 is eaten up:
Input: 0
Result: ",00"
Expected Result: "0,00"
Input: 0.23
Result: ",23"
Expected Result: "0,23"
Why does this happen, and is there a way to solve it without manipulating the result string?
Use 0 instead of 9:
TO_CHAR(0.00, '999G999G999G999G999G999G990D99')
A 0 in the format model ensures that if there is no digit in that position, a 0 is displayed anyway, whereas 9 suppresses leading zeros.