How to get the value of a cell based on filtering efficiently - pandas

If I have a dataframe and want to get the value of a cell based on a condition/filter, it does not seem to be a one-line instruction.
In the example below I do not know the index beforehand, so I need to filter, ask for the index, and then apply it.
Is there an easier way to get the value without knowing the index upfront?
Edit: Simply put, I want the value of the column "Category" in the row where the column "No." has the value 'P1'.

You can use DataFrame.loc, specifying the condition and the column name:
out = dfn.loc[dfn['No.'] == 'P1', 'Category']
Then out is a Series with one or more values; if there is no match, you get an empty Series.
If you need the first value of out as a scalar:
scalar = out.iat[0]
But this fails if the Series is empty:
out = dfn.loc[dfn['No.'] == 'aaaa', 'Category']
Then use next with a default value:
scalar = next(iter(out), 'no match')
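Put together, a minimal runnable sketch (the column names follow the question; the data is invented for illustration):

```python
import pandas as pd

# Sample data modeled on the question's column names
dfn = pd.DataFrame({'No.': ['P1', 'P2', 'P3'],
                    'Category': ['A', 'B', 'C']})

# Filter and select the column in one step
out = dfn.loc[dfn['No.'] == 'P1', 'Category']

# First match as a scalar, with a default if nothing matched
scalar = next(iter(out), 'no match')
print(scalar)  # → A

# A value that matches nothing falls back to the default
missing = next(iter(dfn.loc[dfn['No.'] == 'aaaa', 'Category']), 'no match')
print(missing)  # → no match
```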

Related

Confusion about modifying column in dataframe with pandas

I'm working on a Bangaluru House Price Data csv from Kaggle. There is a column called 'total_sqft'. In this column, there are values that are a range of numbers (e.g.: 1000-1500), and I want to identify all those entries. I created this function to do so:
def is_float(x):
    try:
        float(x)
    except:
        return False
    return True
I applied it to the column:
df3[~df3['total_sqft'].apply(is_float)]
This works, but I don't understand why this doesn't:
df3['total_sqft'] = ~df3['total_sqft'].apply(is_float)
This just returns False for everything instead of the actual entries.
Answer from comment:
In the first version you are selecting the rows where the (negated) apply result is True. In the second you are overwriting the column with the (negated) boolean values returned by apply. The tilde means negation, by the way.
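The difference can be seen in a small sketch, using made-up data in the same shape as the Kaggle column:

```python
import pandas as pd

def is_float(x):
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True

df3 = pd.DataFrame({'total_sqft': ['1000', '1000-1500', '2400', '850-900']})

# Boolean indexing: keeps only the rows where the mask is True
ranges = df3[~df3['total_sqft'].apply(is_float)]
print(ranges['total_sqft'].tolist())  # → ['1000-1500', '850-900']

# Assignment: replaces the column's contents with the booleans themselves
df3['is_range'] = ~df3['total_sqft'].apply(is_float)
print(df3['is_range'].tolist())  # → [False, True, False, True]
```

Assigning the mask to a separate column (here `is_range`, a name invented for the example) keeps the original values intact.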

Remove a specific string value from the whole dataframe without specifying the column or row

I have a dataframe that has some cells with the value "?". This value causes an error ("could not convert string to float: '?'") whenever I try to use the mutual information metric.
I already found a solution by simply using:
df.replace("?",0,inplace=True)
And it worked. But I'm wondering: if I wanted to remove the whole row when one of its cells has the value "?", how can I do that?
Notice that I don't have the names of the columns that contain this value; it's spread across different columns, and that's why I can't use df.drop directly.
You can check each cell for equality with "?" to get a boolean series marking the rows that contain that value in any one of their cells. Then get the indices of the rows where it is True and drop them:
has_ques_mark = df.eq("?").any(axis=1) # a boolean series
inds = has_ques_mark[has_ques_mark].index # row indices where above is True
new_df = df.drop(inds)
You can do it the following way:
df.drop(df.loc[df['column_name'] == "?"].index, inplace=True)
or in a slightly simpler syntax but maybe a bit less performant:
df = df.loc[df['column_name'] != "?"]
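The first answer condenses to a single boolean-indexing line; a minimal sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({'a': ['1.5', '?', '3.0'],
                   'b': ['2.0', '4.0', '?'],
                   'c': ['5.0', '6.0', '7.0']})

# Keep only the rows where no cell equals "?"
clean = df[~df.eq("?").any(axis=1)]
print(len(clean))  # → 1 (only the first row has no "?")
```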

Extracting the value from a Pandas Dataframe column that has only a unique value

Given a pandas DataFrame (df), where one column (unique_val_col) should have a unique value, what is the best way to extract this value (not as a list)?
So far I've used the following code:
output = list(set(df[unique_val_col]))
if len(output) == 1: output = output[0]
Or, if NaNs might be present, change the first line to:
output = [val for val in set(df[unique_val_col]) if val == val]
The question is whether there is a more direct way, one that would also reflect the fact that the column has only one value without needing the 'if' statement.
I think you are trying to find a value that occurs only once; if that's so, you could achieve it like this:
df['unique_value_counts'].value_counts().sort_values(ascending=False).keys()[0]
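A more direct route (not mentioned in the answer above) is Series.unique, which also lets you assert the uniqueness instead of silently picking a value; a sketch with made-up data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'unique_val_col': ['x', 'x', np.nan, 'x']})

# dropna() handles the NaN case; unique() returns the distinct values
vals = df['unique_val_col'].dropna().unique()
assert len(vals) == 1, f"expected one unique value, got {vals}"
output = vals[0]
print(output)  # → x
```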

Most Efficient Way to Create New Column in Pandas from Existing One Based on Condition

How would I create a new column from an existing column based on condition? I know I can do it with a for loop, but it's very inefficient.
For example, let's say I have a column 'Customer_Left' with a binary value of 0 or 1. I want to create a new column ('Value') that assigns a value of 100 to every customer who left.
My code was this:
value = 100
for i in range(len(df['Customer_Left'])):
    if df['Customer_Left'][i] == 1:
        df['Value'] = value
What's the most efficient way to do this?
A more general answer if you want to set a value to a subset of rows is to use .loc with a condition, such as:
df.loc[df['Customer_Left'] == 1, 'Value'] = value
In your case, you could simply do:
df['Value'] = df['Customer_Left'] * 100
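Both approaches as a runnable sketch (sample data invented); numpy.where is another common idiom when the two branches need different values:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Customer_Left': [1, 0, 1, 0]})
value = 100

# .loc with a condition: rows that don't match are left as NaN
df.loc[df['Customer_Left'] == 1, 'Value'] = value

# Arithmetic shortcut, valid because the column is 0/1
df['Value2'] = df['Customer_Left'] * value

# np.where for an explicit two-branch choice
df['Value3'] = np.where(df['Customer_Left'] == 1, value, 0)
print(df['Value3'].tolist())  # → [100, 0, 100, 0]
```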

Pandas: How can I check if a pandas dataframe contains a specific value?

Specifically, I want to see if my pandas dataframe contains a False.
It's an nxn dataframe, where the indices are labels.
So if all values were True except for even one cell, then I would want to return False.
Your question is a bit confusing, but assuming you want to know whether there is at least one False in a particular column, you could simply use:
mask = df.mycol == False
mask.value_counts()
mask.sum()
mask.sum() > 0
Any of these will tell you.
If you just want to scan your whole dataframe looking for a value, check df.values. It returns an array of all the rows in the dataframe.
value = False  # This is the value you're searching for
df_contains_value = any([value in row for row in df.values])
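For an all-boolean frame, a vectorized check (not from the answers above) avoids the Python-level loop; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame([[True, True], [True, False]],
                  index=['r1', 'r2'], columns=['c1', 'c2'])

# df.all().all() is True iff every cell is True; negate it to ask
# "does any False exist anywhere in the frame?"
contains_false = not df.all().all()
print(contains_false)  # → True

# Equivalent elementwise check that also works for non-boolean frames
print(df.eq(False).any().any())  # → True
```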