numpy.select and assign new column in df with condition from values of two columns - pandas

df = df.assign[test = np.select[df.trs = 'iw' & df.rp == 'yu'],[1,0],'null']
I want a new column that is 1 where df.trs == 'iw' and df.rp == 'yu' and 0 for every other row, i.e. set based on the rows fulfilling the condition, not the same value for every row.
I tried np.select with a condition array, but I am not getting the desired output.

You don't need numpy.select, a simple boolean operator is sufficient:
df['test'] = (df['trs'].eq('iw') & df['rp'].eq('yu')).astype(int)
If you really want to use numpy, this would require numpy.where:
df['test'] = np.where(df['trs'].eq('iw') & df['rp'].eq('yu'), 1, 0)
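For example, on a small frame (values invented for illustration), both approaches give:
import numpy as np
import pandas as pd

df = pd.DataFrame({'trs': ['iw', 'iw', 'xx'], 'rp': ['yu', 'no', 'yu']})
df['test'] = np.where(df['trs'].eq('iw') & df['rp'].eq('yu'), 1, 0)
print(df)
#   trs  rp  test
# 0  iw  yu     1
# 1  iw  no     0
# 2  xx  yu     0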

Related

to_string(index = False) results in non-empty string even when dataframe is empty

I am doing the following in my Python script, and I want to hide the index column when I print the dataframe. So I used .to_string(index = False) and then len() to see whether the result is empty. However, when I do to_string(), len() doesn't return zero even if the dataframe is empty. If I print procinject1 it says "Empty DataFrame". Any help to fix this would be greatly appreciated.
procinject1 = dfmalfind[dfmalfind["Hexdump"].str.contains("MZ") == True].to_string(index = False)
if len(procinject1) == 0:
    print(Fore.GREEN + "[✓]No MZ header detected in malfind preview output")
else:
    print(Fore.RED + "[!]MZ header detected within malfind preview (Process Injection indicator)")
    print(procinject1)
That's the expected behaviour in Pandas DataFrame.
In your case, procinject1 stores the string representation of the dataframe, which is non-empty even if the corresponding dataframe is empty.
For example, check the code snippet below, where I create an empty dataframe df and check its string representation:
df = pd.DataFrame()
print(df.to_string(index = False))
print(df.to_string(index = True))
For both index = False and index = True cases, the output will be the same, which is given below (and that is the expected behaviour). So your corresponding len() will always return non-zero.
Empty DataFrame
Columns: []
Index: []
But if you use a non-empty dataframe, then the outputs for index = False and index = True cases will be different as given below:
data = [{'A': 10, 'B': 20, 'C':30}, {'A':5, 'B': 10, 'C': 15}]
df = pd.DataFrame(data)
print(df.to_string(index = False))
print(df.to_string(index = True))
Then the outputs for the index = False and index = True cases respectively will be:
 A   B   C
10  20  30
 5  10  15
and:
    A   B   C
0  10  20  30
1   5  10  15
Since pandas handles empty dataframes differently, to solve your problem, you should first check whether your dataframe is empty or not, using pandas.DataFrame.empty.
Then if the dataframe is actually non-empty, you could print the string representation of that dataframe, while keeping index = False to hide the index column.
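Applied to the snippet above, a minimal sketch of that fix (dfmalfind and Fore come from the asker's setup) could be:
matches = dfmalfind[dfmalfind["Hexdump"].str.contains("MZ")]
if matches.empty:
    print(Fore.GREEN + "[✓]No MZ header detected in malfind preview output")
else:
    print(Fore.RED + "[!]MZ header detected within malfind preview (Process Injection indicator)")
    print(matches.to_string(index = False))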

Matching conditions in columns

I am trying to match conditions so that if text is present in both columns A and B and a 0 is in column C, the code should return 'new' in column C (overwriting the 0). Example dataframe below:
import pandas as pd
df = pd.DataFrame({"A":['something',None,'filled',None], "B":['test','test','test',None], "C":['rt','0','0','0']})
I have tried the following; however, it only seems to apply the last condition, so that any '0' entries in column C become 'new' regardless of None in columns A or B. (In this example I only expect 'new' to appear in row 2.)
import numpy as np
conditions = [(df['A'] is not None) & (df['B'] is not None) & (df['C'] == '0')]
values = ['new']
df['C'] = np.select(conditions, values, default=df["C"])
Appreciate any help!
You will need to use .isna() and filter where it is not NaN/None (using ~) as below. (df['A'] is not None compares the Series object itself to None, which is always True, so only the df['C'] == '0' condition had any effect.)
conditions = [~(df['A'].isna()) & ~(df['B'].isna()) & (df['C'] == '0')]
Output:
           A     B    C
0  something  test   rt
1       None  test    0
2     filled  test  new
3       None  None    0
Use Series.notna to test for None or NaN:
conditions = [df['A'].notna() & df['B'].notna() & (df['C'] == '0')]
Or:
conditions = [df[['A','B']].notna().all(axis=1) & (df['C'] == '0')]
values = ['new']
df['C'] = np.select(conditions, values, default=df["C"])
print(df)
           A     B    C
0  something  test   rt
1       None  test    0
2     filled  test  new
3       None  None    0
Use
mask = df[['A', 'B']].notna().all(axis=1) & df['C'].eq('0')
df.loc[mask, 'C'] = 'new'
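For reference, running the mask variant on the question's sample frame should change only row 2:
import pandas as pd

df = pd.DataFrame({"A": ['something', None, 'filled', None],
                   "B": ['test', 'test', 'test', None],
                   "C": ['rt', '0', '0', '0']})
mask = df[['A', 'B']].notna().all(axis=1) & df['C'].eq('0')
df.loc[mask, 'C'] = 'new'
print(df)  # row 2's C is now 'new'; all other rows are unchanged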

Data frame: get row and update it

I want to select a row based on a condition and then update it in the dataframe.
One solution I found is to update df based on the condition, but then I must repeat the condition for every column. Is there a better solution, so that I get the desired row once and change it?
df.loc[condition, "top"] = 1
df.loc[condition, "pred_text1"] = 2
df.loc[condition, "pred1_score"] = 3
something like:
row = df.loc[condition]
row["top"] = 1
row["pred_text1"] = 2
row["pred1_score"] = 3
Extract the boolean mask and set it as a variable.
m = condition
df.loc[m, 'top'] = 1
df.loc[m, 'pred_text1'] = 2
df.loc[m, 'pred1_score'] = 3
But the shortest way is:
df.loc[condition, ['top', 'pred_text1', 'pred1_score']] = [1, 2, 3]
Update
Isn't it possible to retrieve the index of the row and then update it by that index?
idx = df[condition].index
df.loc[idx, 'top'] = 1
df.loc[idx, 'pred_text1'] = 2
df.loc[idx, 'pred1_score'] = 3
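For illustration, both variants on a toy frame (the flag column is invented just to build a condition):
import pandas as pd

df = pd.DataFrame({'top': [0, 0], 'pred_text1': [0, 0],
                   'pred1_score': [0, 0], 'flag': [True, False]})
condition = df['flag']

# one-liner variant
df.loc[condition, ['top', 'pred_text1', 'pred1_score']] = [1, 2, 3]

# index variant, same result
idx = df[condition].index
df.loc[idx, ['top', 'pred_text1', 'pred1_score']] = [1, 2, 3]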

Iterate over an xpath (table row) python

I have xpaths as follows:
/html/body/div[1]/table[3]/tbody/tr[1]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[2]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[3]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[4]/td[3]/a
As you can see, the tr[] values are changing. I want to iterate over these values.
Below is the code I have used
search_input = driver.find_elements_by_xpath('/html/body/div[1]/table[3]/tbody/tr[3]/td[3]/a')
Please let me know how I can iterate over them.
This may not be the exact solution you are looking for, but this is the idea:
tableRows = driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr")
for e in tableRows:
    e.find_element_by_xpath(".//td[3]/a")
If you want the third td of every row, use this:
search_input = driver.find_elements_by_xpath('/html/body/div[1]/table[3]/tbody/tr/td[3]/a')
If you want only the first 3 rows, use this:
search_input = driver.find_elements_by_xpath('/html/body/div[1]/table[3]/tbody/tr[position() < 4]/td[3]/a')
For looping through tds, see e.g. this answer.
Another alternative, assuming 4 elements:
for elem in range(1, 5):
    element = f"/html/body/div[1]/table[3]/tbody/tr[{elem}]/td[3]/a"
    # e = driver.find_element_by_xpath(element)
    # e.click()
    print(element)
Prints:
/html/body/div[1]/table[3]/tbody/tr[1]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[2]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[3]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[4]/td[3]/a
You could do whatever you want with the elements in the loop; I just printed them to show the values.
Option 1: a fixed column number (3 here) needs to be iterated:
rows = len(self.driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr"))
for row in range(1, rows + 1):
    local_xpath = "/html/body/div[1]/table[3]/tbody/tr[" + str(row) + "]/td[3]"
    # do something with the element
    # cell_text = self.driver.find_element_by_xpath(local_xpath).text
Option 2: both rows and columns need to be iterated:
rows = len(self.driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr"))
# count the columns of the first row only; counting all tds would give rows * columns
columns = len(self.driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr[1]/td"))
for row in range(1, rows + 1):
    for column in range(1, columns + 1):
        local_xpath = "/html/body/div[1]/table[3]/tbody/tr[" + str(row) + "]/td[" + str(column) + "]"
        # do something with the element
        # cell_text = self.driver.find_element_by_xpath(local_xpath).text
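Note that the find_element(s)_by_xpath helpers were removed in Selenium 4; with the current API, the row loop would look roughly like this (assuming driver is the WebDriver instance from the question):
from selenium.webdriver.common.by import By

rows = driver.find_elements(By.XPATH, "/html/body/div[1]/table[3]/tbody/tr")
for row in rows:
    link = row.find_element(By.XPATH, ".//td[3]/a")
    print(link.get_attribute("href"))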

Drop rows from pandas dataframe

I need to drop some rows from a pandas dataframe aa based on a query as follows:
aa.loc[(aa['_merge'] == 'right_only') & (aa['Context Interpretation'] == 'Topsoil')]
How do I drop this selection from the dataframe aa?
You can add ~ to negate the condition:
out = aa.loc[~((aa['_merge'] == 'right_only') & (aa['Context Interpretation'] == 'Topsoil'))]
Or
idx = aa.index[(aa['_merge'] == 'right_only') & (aa['Context Interpretation'] == 'Topsoil')]
out = aa.drop(idx)
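For reference, both variants on a made-up frame (values invented for the example) drop the same rows:
import pandas as pd

aa = pd.DataFrame({'_merge': ['both', 'right_only', 'right_only'],
                   'Context Interpretation': ['Topsoil', 'Topsoil', 'Subsoil']})
mask = (aa['_merge'] == 'right_only') & (aa['Context Interpretation'] == 'Topsoil')
print(aa.loc[~mask])            # keep everything except the matches
print(aa.drop(aa.index[mask]))  # same result via drop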