Replace transformation: What is the right way to do it? - pandas

I do this to replace a character in a string:
df['msg'] = df['msg'].str.replace(u'X','W')
And I get this warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Then I tried to do the same transformation the right way (or so I thought) to avoid that warning:
df.loc[:,'msg'] = df.loc[:,'msg'].str.replace(u'X','W')
But I still get the same warning, even though both versions work fine.
What is the correct way to do this kind of transformation?

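This warning usually means df was itself created as a slice of another DataFrame (for example df = df_original[mask], where df_original and mask stand for however df was built). Calling copy() on the right-hand side does not change that; make the slice an explicit copy where it is created instead:
df = df_original[mask].copy()
df['msg'] = df['msg'].str.replace(u'X','W')
Or use assign(), which returns a new DataFrame:
df = df.assign(msg=df['msg'].str.replace(u'X','W'))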
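A minimal sketch of both fixes, assuming df was produced by boolean indexing. df_original, mask and the sample strings are hypothetical. The slice is copied where it is created (or assign() is used), and the snippet then checks that no "copy of a slice" warning was recorded. regex=False is passed explicitly because the default of str.replace changed between pandas versions.

```python
import warnings

import pandas as pd

# Hypothetical stand-ins for however the question's df was derived.
df_original = pd.DataFrame({"msg": ["aXb", "XX", "none"], "keep": [True, True, False]})
mask = df_original["keep"]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    # Option 1: copy the slice where it is created, then assign normally.
    df1 = df_original[mask].copy()
    df1["msg"] = df1["msg"].str.replace("X", "W", regex=False)

    # Option 2: assign() builds and returns a new DataFrame.
    df2 = df_original[mask].assign(
        msg=lambda d: d["msg"].str.replace("X", "W", regex=False)
    )

# Keep only the SettingWithCopy-style warnings, if any were raised.
slice_warnings = [w for w in caught if "copy of a slice" in str(w.message)]
print(df1["msg"].tolist())  # ['aWb', 'WW']
print(len(slice_warnings))  # 0
```

Either way, df_original itself is left unchanged.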

Related

Change Column Values in a Dataframe column using Pandas

The data type of the column is object, but I still convert it to string using astype(str). I even used temp['Injury Severity'].str.strip() to remove spaces from the column values.
I want to replace all of "Fatal(0)", "Fatal(1)", ... with just "Fatal", so I used temp['Injury Severity'] = temp['Injury Severity'].replace('Fatal(0)','Fatal',inplace = True).
But it did not work. I also tried temp.loc[temp['Injury Severity'] == 'Fatal(0)','Injury Severity'] = temp['Injury Severity'].replace('Fatal(0)','Fatal',inplace = True)
I also tried str.replace, but that did not work either. Lastly, I used regex = True, but no change was observed; the values still remain the same.
I think it is solved. It seems that the values had leading and trailing spaces. Thanks a lot for the help, everyone!
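Try this:
temp['Injury Severity'].replace('Fatal(0)','Fatal',inplace = True)
No need to assign it again: with inplace=True, replace() returns None, so assigning its result back (as in the question) fills the column with None.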
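Two details likely explain the failed attempts. Without a regex, Series.replace only replaces cells whose whole value equals the pattern, so " Fatal(0)" with a stray space is never matched. With regex=True, the unescaped parentheses in 'Fatal(0)' form a group, so the pattern matches the text Fatal0 rather than Fatal(0). The sketch below uses hypothetical sample values. It strips the padding first, then applies an escaped, anchored regex and assigns the result back instead of using inplace:

```python
import pandas as pd

# Hypothetical sample data with stray spaces, as described in the question.
temp = pd.DataFrame(
    {"Injury Severity": [" Fatal(0)", "Fatal(1) ", "Non-Fatal", "Incident"]}
)

# Strip padding, then collapse every "Fatal(<digits>)" into "Fatal".
# The parentheses are escaped so they match literally; ^ and $ anchor
# the pattern so values like "Non-Fatal" are left alone.
temp["Injury Severity"] = (
    temp["Injury Severity"]
    .str.strip()
    .str.replace(r"^Fatal\(\d+\)$", "Fatal", regex=True)
)
print(temp["Injury Severity"].tolist())  # ['Fatal', 'Fatal', 'Non-Fatal', 'Incident']
```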

pandas copy vs slice view

I am fully aware of the pandas DataFrame view vs. copy issue (see "Pandas dataframe index slice view vs copy").
I would think the code below (approach 1) is "safe" and robust:
mydf = mydf[mydf.something == some condition]
mydf['some column'] = something else
Note that by doing the above, I replace the parent DataFrame altogether rather than creating a separate view.
Approach 2: I make an explicit .copy():
mydf = mydf[mydf.something == some condition].copy()
mydf['some column'] = something else
In fact, wouldn't the latter be unnecessary overhead?
However, occasionally (not consistently), I still receive the warning below when using the first approach (without .copy()):
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Am I missing some subtlety in approach 1? Or should one always use approach 2 for robustness? Is .copy() going to be a meaningful overhead?
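As far as I understand pandas' behaviour before copy-on-write, a boolean-indexed result keeps a weak reference to its parent. The warning fires only while that parent is still alive somewhere else, for example when a caller or another variable still holds it. That would explain why approach 1 warns only occasionally. The extra cost of .copy() is one more copy of the already-filtered rows.

The sketch below shows approach 2 behaving consistently even while the parent stays referenced. The data and the condition are hypothetical stand-ins for mydf and "some condition":

```python
import warnings

import pandas as pd

# Hypothetical data standing in for the question's mydf.
original = pd.DataFrame({"something": [1, 2, 3, 4], "some column": [0, 0, 0, 0]})
mydf = original  # a second name keeps the parent alive, as a caller's variable would

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Approach 2: the explicit copy no longer tracks its parent.
    mydf = mydf[mydf.something > 2].copy()
    mydf["some column"] = 99

slice_warnings = [w for w in caught if "copy of a slice" in str(w.message)]
print(mydf["some column"].tolist())      # [99, 99]
print(original["some column"].tolist())  # [0, 0, 0, 0]  (parent untouched)
print(len(slice_warnings))               # 0
```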

Proper Syntax When Using dot Operator in String Interpolation in Dart

In Dart/Flutter, suppose you have an instance a of Class Y.
Class Y has a property, property1.
You want to print that property using string interpolation like so:
print('the thing I want to see in the console is: $a.property1');
But you can't even finish typing that in without getting an error.
The only way I can get it to work is by doing this:
var temp = a.property1;
print ('the thing I want to see in the console is: $temp');
I haven't found the answer online, and I think there must be a way to do it directly without having to create a variable first.
You need to enclose the property in curly braces:
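print('the thing I want to see in the console is: ${a.property1}');
That will then print the value of a.property1.
Note that print('..... $a.$property1'); is not equivalent. The $ shorthand only takes a simple identifier, so this interpolates a, a literal dot, and a separate variable named property1 (which must exist in scope) rather than reading a's property.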

What's the cleanest way for assigning a new pandas dataframe column to a single value?

Working with a DataFrame df, I wanted to create a new column A and set it to a single value (a string in my case):
df['A'] = value
This gave a warning suggesting I use loc;
however, the solution below still gave the same warning:
df.loc[:,'A']=value
After some research, I found the solution below, which does not generate a warning:
df = df.assign(A=value)
Is this the generally accepted way of creating a new column and assigning a single value to it? Are there other possibilities using loc?
pandas version '0.20.1'
EDIT: this is the warning message I get for the first two methods:
"A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead"
As explained by @EdChum and @ScottBoston:
since df was derived by applying a boolean mask to some original DataFrame,
df = df_original[boolean_mask]
to avoid the warning with the first two methods, use df = df_original[boolean_mask].copy() instead.
df.assign does not need this because it returns a new DataFrame (a copy of the original with the column added).
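A minimal sketch of the options discussed, with df_original, boolean_mask and value as hypothetical stand-ins. Once the masked frame is copied, .loc can create the new scalar column without the warning. assign() gives the same result without an explicit copy:

```python
import warnings

import pandas as pd

# Hypothetical stand-ins for the question's df_original, boolean_mask and value.
df_original = pd.DataFrame({"x": [1, 2, 3]})
boolean_mask = df_original["x"] >= 2
value = "some string"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Copy the masked rows once; afterwards the frame is independent.
    df = df_original[boolean_mask].copy()
    df.loc[:, "A"] = value  # .loc creates the new column on the copy
    # assign() returns a new DataFrame, so no explicit copy is needed.
    df2 = df_original[boolean_mask].assign(A=value)

slice_warnings = [w for w in caught if "copy of a slice" in str(w.message)]
print(df["A"].tolist())  # ['some string', 'some string']
```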

Pandas and Fuzzy Match

Currently I have two DataFrames. I am trying to fuzzy-match client names using fuzzywuzzy's process.extractOne function. When I run the following script on sample data, I get good results and no errors, but when I run it on my actual DataFrames I get both an AttributeError and a TypeError. I can't share the data for security reasons, but if anyone can figure out from the script below why I am getting these errors, I would be much obliged.
import numpy as np
from fuzzywuzzy import process

names2 = list(dftr3['Common Name'])
names3 = dict(zip(names2, names2))

def get_fuzz_match(row):
    # Best match among the known names, or None if no score reaches 80
    match = process.extractOne(row['CLIENT_NAME'], choices=names3.keys(), score_cutoff=80)
    if match:
        return names3[match[0]]
    return np.nan

dfmi4['Match Name'] = dfmi4.apply(get_fuzz_match, axis=1)
I know that not having examples makes this more difficult to troubleshoot, so I will answer any questions and edit the post to help the process along. The specific errors are:
1. AttributeError: 'dict_keys' object has no attribute 'items'
2. TypeError: expected string or buffer
The AttributeError is straightforward and to be expected, I think. Fuzzywuzzy's process.extract function, which does most of the actual work in process.extractOne, uses a try:... except: clause to determine whether to process the choices parameter as dict-like or list-like. I think you are seeing the exception because the TypeError is raised during the except: clause.
The TypeError is trickier to pin down, but I suspect it occurs somewhere in the StringProcessor class, used in the processor module, again called by extract, which uses several string methods and doesn't catch exceptions. So it seems likely that your apply call is passing something that is not a string. Is it possible that you have any empty cells?
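To test the empty-cell hypothesis, you could guard against non-string values before matching. A blank cell read by pandas arrives as NaN, which is a float and exactly the kind of non-string value that would explain the TypeError.

The sketch below swaps in the standard library's difflib.get_close_matches for fuzzywuzzy, so it runs without that package. Its cutoff=0.8 is only a rough analogue of score_cutoff=80, and the names and data are hypothetical:

```python
import difflib

import numpy as np
import pandas as pd

# Hypothetical stand-ins for dftr3['Common Name'] and dfmi4['CLIENT_NAME'].
names = ["Acme Corporation", "Globex Inc", "Initech LLC"]
dfmi4 = pd.DataFrame({"CLIENT_NAME": ["Acme Corporaton", np.nan, "Umbrella"]})


def get_fuzz_match(row):
    name = row["CLIENT_NAME"]
    # Empty cells arrive as NaN (a float); skip anything that is not a string.
    if not isinstance(name, str):
        return np.nan
    # difflib stands in for fuzzywuzzy here; cutoff=0.8 loosely mirrors score_cutoff=80.
    matches = difflib.get_close_matches(name, names, n=1, cutoff=0.8)
    return matches[0] if matches else np.nan


dfmi4["Match Name"] = dfmi4.apply(get_fuzz_match, axis=1)
print(dfmi4["Match Name"].tolist())
```

The same isinstance guard can be dropped into the original get_fuzz_match in front of the process.extractOne call.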