Conditional If Statement: If value in row starts with letter in string … set another column with some corresponding value - pandas

I have the 'Field_Type' column filled with strings and I want to derive the values in the 'Units' column using an if statement.
So Units shows the desired result. Essentially I want to call out what type of activity is occurring.
I tried to do this using my code below but it won't run (please see screen shot below for error). Any help is greatly appreciated!
create_table['Units'] = pd.np.where(create_table['Field_Name'].str.startswith("W"), "MW",
pd.np.where(create_table['Field_Name'].str.contains("R"), "MVar",
pd.np.where(create_table['Field_Name'].str.contains("V"), "Per Unit")))```
ValueError: either both or neither of x and y should be given

You can write a function to define your conditionals, then use apply on the dataframe and pass the funtion
def unit_mapper(row):
if row['Field_Type'].startswith('W'):
return 'MW'
elif 'R' in row['Field_Type']:
return 'MVar'
elif 'V' in row['Field_Type']:
return 'Per Unit'
else:
return 'N/A'
And then
create_table['Units'] = create_table.apply(unit_mapper, axis=1)

In your text you talk about Field_Type but you are using Field_Name in your example. Which one is good ?
You want to do something like:
create_table[create_table['Field_Type'].str.startwith('W'), 'Units'] = 'MW'
create_table[create_table['Field_Type'].str.startwith('R'), 'Units'] = 'MVar'
create_table[create_table['Field_Type'].str.startwith('V'), 'Units'] = 'Per Unit'

Related

Start working on pandas and getting erroe on thos

I have just started my work on pandas. Currently I'm working on a dataset of NETFLIX.
In this dataset I want to add a new column which contains the total number of cast members in that particular movie or tv show. I can calculate the cast individually but I want to calculate all of them. Can someone help me to write this code ?
Here is what I'm trying to do:
link https://www.kaggle.com/datasets/shivamb/netflix-shows?
def set_cast(val):
if val is None:
return 0
if val == 'None':
return 0
return len(val.split(', '))
data['num_of_cast'] = data['cast'].apply(set_cast)
getting these error
return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
'float' object has no attribute 'split'
In your dataset the first column are some statistics. Maybe they are the flout values that causes the error.
Try skipping the first column. Maybe like this:
data['num_of_cast'] = [set_cast(x) for x in data['cast'] if data['cast'].index(x) >=1]
Instead of this:
data['num_of_cast'] = data['cast'].apply(set_cast)

Error while transforming series with series.map() function

def locate (code):
string1 = str(code)
floor = string1[3]
if floor == '1':
return 'Ground Floor'
else:
if int(string1[5]) < 1:
lobby = 'G'
elif int(string1[5]) < 2:
lobby = 'F'
else:
lobby = 'E'
return floor + lobby
print(locate('S191009'))
print(locate('S087525'))
This function works fine with individual input code as above with Output
Ground Floor
7E
But when I use this to map a series in data frame, it shows error.
error_data1['location'] = error_data1['status'].map(locate)
Error message: string index out of range.
How can I fix this??
Your problem is with your series values:
se = pd.Series(['S191009', 'rt'])
se.map(locate)
produces the same error you reported. You can ignore these rows using try...except in function if it does not hurt you.
The problem is you are indexing an index on a string that doesn't exist (i.e the string is shorter than what you expect). As the other answer mentioned, if you try and use
my_string="foo"
print(my_string[5])
You will get the same error. To solve this you should add a try except statement, or for simplicity an initial if statement that returns "NotValid" or something like that. Your data probably has strings that do not follow the standard form you expect.

I want to replace outlier instead of completely removing it... Any Suggestion?

I have data-set in which one column has outlier and these outlier somehow dependent on one more column which has 12 different categories.
So, I want to replace these outlier with mean of those categories.
for example:
column A has market_01, market_02, ..., market_12 and
column B has int values 984, 678, 1326, 887, ....., 710, .....
so, here, I want to replace 1326 outlier value with respect to its corresponding market_02.mean() rather than simply values.mean()
Try:
via mask() + groupby()+transform()
#Firstly find mean:
m=df.groupby('market')['values'].transform('mean').round(2)
#Finally replace outlier:
df['values']=df['values'].mask(df['values'].eq(1326),m)
OR
via np.where() with groupby()+transform():
m=df.groupby('market')['values'].transform('mean').round(2)
df['values']=np.where(df['values'].eq(1326),m,df['values'])
Step 1: Calculate the mean of the values corresponding to the market.
mean_val = df[df['market'] = 'market_02']['values'].mean()
Step 2: Now replace all values greater (or equal to) the value that you believe is outlier with the above calculated mean.
df['values'] = df['values'].apply(lambda x: mean_val if x >= mean_val else x)
Thank you #Pratik and #Anurag for your answers.
I have solved this by below approach: By this way my all outlier are replaced.
t = df.groupby('market').values
df['values'] = (
df.values.where(
t.transform('quantile', q=0.75) > df.values,
t.transform('median')))

Using to_datetime several columns names

I am working with several CSV's that first N columns are information and then the next Ms (M is big) columns are information regarding a date.
This is the dataframe picture
I need to set just the columns between N+1 to N+M - 1 columns name to date format.
I tried this, in this case N+1 = 5, no matter M, I suppose that I can use -1 to not affect the last column name.
ContDiarios.columns[5:-1] = pd.to_datetime(ContDiarios.columns[5:-1])
but I get the following error:
TypeError: Index does not support mutable operations
The way you are doing is not feasable. Please try this way
def convert(x):
try:
return pd.to_datetime(x)
except:
return x
x.columns = map(convert,x.columns)
Or you can also use df.rename property to convert it.

How do I reverse each value in a column bit wise for a hex number?

I have a dataframe which has a column called hexa which has hex values like this. They are of dtype object.
hexa
0 00802259AA8D6204
1 00802259AA7F4504
2 00802259AA8D5A04
I would like to remove the first and last bits and reverse the values bitwise as follows:
hexa-rev
0 628DAA592280
1 457FAA592280
2 5A8DAA592280
Please help
I'll show you the complete solution up here and then explain its parts below:
def reverse_bits(bits):
trimmed_bits = bits[2:-2]
list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
reversed_bits = [list_of_bits[-i] for i in range(1,len(list_of_bits)+1)]
return ''.join(reversed_bits)
df['hexa-rev'] = df['hexa'].apply(lambda x: reverse_bits(x))
There are possibly a couple ways of doing it, but this way should solve your problem. The general strategy will be defining a function and then using the apply() method to apply it to all values in the column. It should look something like this:
df['hexa-rev'] = df['hexa'].apply(lambda x: reverse_bits(x))
Now we need to define the function we're going to apply to it. Breaking it down into its parts, we strip the first and last bit by indexing. Because of how negative indexes work, this will eliminate the first and last bit, regardless of the size. Your result is a list of characters that we will join together after processing.
def reverse_bits(bits):
trimmed_bits = bits[2:-2]
The second line iterates through the list of characters, matches the first and second character of each bit together, and then concatenates them into a single string representing the bit.
def reverse_bits(bits):
trimmed_bits = bits[2:-2]
list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
The second to last line returns the list you just made in reverse order. Lastly, the function returns a single string of bits.
def reverse_bits(bits):
trimmed_bits = bits[2:-2]
list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
reversed_bits = [list_of_bits[-i] for i in range(1,len(list_of_bits)+1)]
return ''.join(reversed_bits)
I explained it in reverse order, but you want to define this function that you want applied to your column, and then use the apply() function to make it happen.