all() is hitting the else branch every time - pandas

Basically the conditions are being applied to the entire DataFrame, not to each row individually, which is why execution always falls into the else branch. We need to apply the check to each row.
I got the proper output when applying it to a one-row frame. When applying it to the entire DataFrame, I got "No Keys" for every row. Some rows of res contain None; only those rows are expected to be "No Keys".
sample dataframe
res,url1,url2
{'bool': True, 'val':False},{'bool': False, 'val':False},{'bool': True, 'val':False}
None,{'bool': True, 'val':False},{'bool': False, 'val':False}
{'bool': False, 'val':False},{'bool': True, 'val':False},{'bool': True, 'val':False}
Code
def func1():
    return 'url1'

def func2():
    return 'url2'

def test_func():
    if df['res'].str['bool'].all() and df['url1'].str['bool'].all():
        return func1()
    elif df['res'].str['bool'].all() and df['url2'].str['bool'].all():
        return func2()
    else:
        return "No Keys"
Expected Out
output
url1
No Keys
url2
My out
No Keys
No Keys
No Keys
I need to apply the code below to a DataFrame with more than 5000 urls:
df['output'] = df.apply(test_func)
When applying it I got "No Keys" for every row.
If I use any() instead, it passes False because the first row of the url1 bools is False.
The issue is that all() checks all the rows at once; since None is present in the second row, it prints "No Keys" for everything.
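To see why, note that all() reduces the whole column to one scalar, so test_func computes a single answer for the entire frame. A minimal sketch of the effect, using a cut-down version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'res': [{'bool': True}, None, {'bool': False}]})

flags = df['res'].str['bool']   # element-wise lookup: True, NaN, False
print(flags.all())              # one scalar for the whole column, not one per row
```

Because the result is a single scalar, every row of the output ends up with the same branch.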

Recreating DataFrame
res url1 \
0 {'bool': True, 'val': False} {'bool': False, 'val': False}
1 None {'bool': True, 'val': False}
2 {'bool': False, 'val': False} {'bool': True, 'val': False}
url2
0 {'bool': True, 'val': False}
1 {'bool': False, 'val': False}
2 {'bool': True, 'val': False}
use pd.apply with axis=1 so the check runs row by row:
df.apply(lambda x: 'url1' if (x['res'] is not None and x['res'].get('bool') and x['url1'].get('bool'))
                   else 'url2' if (x['res'] is not None and x['res'].get('bool') and x['url2'].get('bool'))
                   else 'No Keys', axis=1)
Output
0 url2
1 No Keys
2 No Keys
dtype: object
Note - for the third row, the res bool value is False, so the and gives False and hence "No Keys"
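The same row-wise logic can also be written as a named function, which reads more easily at 5000+ rows; pick_url is just an illustrative name, not from the original:

```python
import pandas as pd

def pick_url(row):
    # a row only qualifies when res exists and its 'bool' flag is set
    res = row['res']
    if res is not None and res.get('bool') and row['url1'].get('bool'):
        return 'url1'
    if res is not None and res.get('bool') and row['url2'].get('bool'):
        return 'url2'
    return 'No Keys'

# sample data from the question
df = pd.DataFrame({
    'res': [{'bool': True, 'val': False}, None, {'bool': False, 'val': False}],
    'url1': [{'bool': False, 'val': False}, {'bool': True, 'val': False}, {'bool': True, 'val': False}],
    'url2': [{'bool': True, 'val': False}, {'bool': False, 'val': False}, {'bool': True, 'val': False}],
})
df['output'] = df.apply(pick_url, axis=1)
```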

You can also use a nested np.where:
import pandas as pd
import numpy as np
#Recreate dataframe
df = pd.DataFrame(data={
    'res': [{'bool': True, 'val': False}, None, {'bool': False, 'val': False}],
    'url1': [{'bool': False, 'val': False}, {'bool': True, 'val': False}, {'bool': True, 'val': False}],
    'url2': [{'bool': True, 'val': False}, {'bool': False, 'val': False}, {'bool': True, 'val': False}]})
# Define logic
df['Output'] = np.where(df['res'].str['bool'] & df['url1'].str['bool'], 'url1',
                        np.where(df['res'].str['bool'] & df['url2'].str['bool'], 'url2',
                                 'No Keys'))
# Check Result
df
res ... Output
0 {'bool': True, 'val': False} ... url2
1 None ... No Keys
2 {'bool': False, 'val': False} ... No Keys
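If more url columns get added later, the nested np.where can also be sketched with np.select; this is a variant, not part of the original answer, and fillna(False) makes the None rows explicit:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={
    'res': [{'bool': True, 'val': False}, None, {'bool': False, 'val': False}],
    'url1': [{'bool': False, 'val': False}, {'bool': True, 'val': False}, {'bool': True, 'val': False}],
    'url2': [{'bool': True, 'val': False}, {'bool': False, 'val': False}, {'bool': True, 'val': False}]})

res_ok = df['res'].str['bool'].fillna(False).astype(bool)  # None rows become False
conditions = [res_ok & df['url1'].str['bool'].astype(bool),
              res_ok & df['url2'].str['bool'].astype(bool)]
df['Output'] = np.select(conditions, ['url1', 'url2'], default='No Keys')
```

np.select checks the conditions in order, so adding a url3 is just one more entry in each list.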

Related

What is the best approach to remove in pandas all columns with all values equals to False?

I've seen in this question how to drop columns with all nan, but I'm looking for a way to remove all columns with all False values.
Using the info in that question, I'm thinking of replacing False with nan, dropping them, and then replacing nan back with False, but I don't know if that is the best approach.
A working piece of code with my approach would be as follows:
df = pd.DataFrame(data={'A':[True, True, False], 'B': [False, True, False], 'C':[False, False, False], 'D': [True, True, True]})
df.replace(to_replace=False, value=np.nan, inplace=True)
df.dropna(axis=1, how='all', inplace=True)
df.fillna(False, inplace=True)
You could use:
df.loc[:,~df.eq(False).all()]
Output:
A B D
0 True False True
1 True True True
2 False False True
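For an all-boolean frame, an equivalent spelling (a minor variant, not from the answer above) keeps only the columns that contain at least one True:

```python
import pandas as pd

df = pd.DataFrame(data={'A': [True, True, False],
                        'B': [False, True, False],
                        'C': [False, False, False],
                        'D': [True, True, True]})
# df.any() is a per-column Series: True where the column has any True value
out = df.loc[:, df.any()]
```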

Pandas: interpreting linked boolean conditions without using a for loop

I would like to achieve the following results in the column stop (based on columns price, limit and strength) without using a stupidly slow for loop.
The difficulty: the direction of the switch (False to True, or True to False) of the first condition (price and limit) affects how the remaining one (strength) is interpreted.
Here is a screenshot of the desired result with my comments and explanations:
Here is the code to replicate the above DataFrame:
import pandas as pd
# initialise data of lists.
data = {'price': [1, 3, 2, 5, 3, 3, 4, 5, 6, 5, 3],
        'limit': [1.2, 3.3, 2.1, 4.5, 3.5, 3.8, 3, 4.5, 6.3, 4.5, 3.5],
        'strength': [False, False, False, False, False, True, True, True, True, False, False],
        'stop': [True, True, True, True, True, True, False, False, False, False, True]}
# Create DataFrame
df = pd.DataFrame(data)
Many thanks in advance for your help.

Splitting a numpy array / pandas dataframe by boolean delimiters

Assume a numpy array (actually Pandas) of the form:
[value, included,
0.123, False,
0.127, True,
0.140, True,
0.111, False,
0.159, True,
0.321, True,
0.444, True,
0.323, True,
0.432, False]
I'd like to split the array such that False elements are excluded and successive runs of True elements are split into their own array. So for the above case, we'd end up with:
[[0.127, True,
0.140, True],
[0.159, True,
0.321, True,
0.444, True,
0.323, True]]
I can certainly do this by pushing individual elements onto lists, but surely there must be a more numpy-ish way to do this.
You can create groups from the inverted mask (~) with Series.cumsum, filter only the True rows by boolean indexing, then create a list of DataFrames with DataFrame.groupby:
dfs = [v for k, v in df.groupby((~df['included']).cumsum()[df['included']])]
print (dfs)
[ value included
1 0.127 True
2 0.140 True, value included
4 0.159 True
5 0.321 True
6 0.444 True
7 0.323 True]
It is also possible to convert the DataFrames to arrays with DataFrame.to_numpy:
dfs = [v.to_numpy() for k, v in df.groupby((~df['included']).cumsum()[df['included']])]
print (dfs)
[array([[0.127, True],
[0.14, True]], dtype=object), array([[0.159, True],
[0.321, True],
[0.444, True],
[0.32299999999999995, True]], dtype=object)]
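For completeness, a self-contained sketch that recreates the DataFrame assumed above and applies the same grouping idea:

```python
import pandas as pd

df = pd.DataFrame({
    'value': [0.123, 0.127, 0.140, 0.111, 0.159, 0.321, 0.444, 0.323, 0.432],
    'included': [False, True, True, False, True, True, True, True, False],
})

# each run of True rows shares one cumulative count of preceding False rows,
# so that count works as a group key once the False rows are filtered out
groups = (~df['included']).cumsum()
dfs = [v for _, v in df[df['included']].groupby(groups[df['included']])]
```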

Fill forward a DataFrame with matching values

I have a DataFrame of booleans. I would like to replace with True the 2 False values directly after a True value. I thought the .replace() method would do it, since the 5th example seems to be what I am looking for.
Here is what I do:
dataIn = pd.DataFrame([False, False, False, True, False, False, False, False])
dataOut = dataIn.replace(to_replace=False, method='ffill', limit=2)
>>> TypeError: No matching signature found
Here is the output I am looking for:
dataOut = pd.DataFrame([False, False, False, True, True, True, False, False])
# create a series not a dateframe
# if you have a dataframe then assign to a new variable as a series
# s = df['bool_col']
s = pd.Series([False, True, False, True, False, False, False, False])
# create a mask based on the logic using shift
mask = (s == False) & (((s.shift(1) == True) & (s.shift(-1) == False))
                       | ((s.shift(2) == True) & (s.shift(1) == False)))
# numpy.where to create the new output
np.where(mask, True, s)
# array([False, True, False, True, True, True, False, False])
# assign to a new column in the frame (if you want)
# df['new_col'] = np.where(mask, True, s)
Define a function which conditionally replaces 2 first elements with True:
def condRepl(grp):
    rv = grp.copy()
    if grp.size >= 2 and grp.eq(False).all():
        rv.iloc[0:2] = [True] * 2
    return rv
The condition triggering this replace is:
group has 2 elements or more,
the group is composed solely of False values.
Then, using this function, transform each group of "new" values
(each change in the value starts a new group):
dataIn[0] = dataIn[0].groupby(dataIn[0].ne(dataIn[0].shift()).cumsum()).transform(condRepl)
Thanks for both answers above. Actually, it seems .replace() can be used, but it does not fully handle booleans.
By converting them temporarily to int, it is possible to use it:
dataIn = pd.DataFrame([False, False, False, True, False, False, False, False])
dataOut = dataIn.astype(int).replace(to_replace=False, method='ffill', limit=2).astype(bool)
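On newer pandas versions, where Series.replace no longer accepts method=, a similar effect can be sketched with mask and ffill (an alternative approach, assuming the same input):

```python
import pandas as pd

s = pd.Series([False, False, False, True, False, False, False, False])
# hide the False values, forward-fill at most two positions from each True,
# then turn the remaining gaps back into False
out = s.mask(~s).ffill(limit=2).fillna(False).astype(bool)
```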

How to convert my out to dataframe type not series

How to convert my out to dataframe
def new_func(x):
    d1 = (x['response'])
new_func(df)
0 {'bool': False, 'is_doc': True}
1 {'bool': False, 'is_doc': True}
Name: response_dl_back_url, dtype: object
If I change it to d1 = df['response'], it still does not work.
My Expected out
{'bool': False, 'is_doc': True}
{'bool': False, 'is_doc': True}
Basically, when reading from the DataFrame, the values have to be of dict type, not object.
I guess you have a string representation of a dictionary. Here is one way; we will still need to hold the dictionaries in something, so I'll nest them within a dictionary.
print(df)
response
0 {'bool': False, 'is_doc': True}
1 {'bool': False, 'is_doc': True}
from ast import literal_eval
d = {}
counter = 0
for x in df['response'].tolist():
    counter += 1
    d[counter] = literal_eval(x)
print(d)
{1: {'bool': False, 'is_doc': True}, 2: {'bool': False, 'is_doc': True}}
type(d)
dict
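Equivalently, assuming the column really holds string-encoded dicts, the loop can be sketched as a column-wise apply that parses in place:

```python
from ast import literal_eval

import pandas as pd

df = pd.DataFrame({'response': ["{'bool': False, 'is_doc': True}",
                                "{'bool': False, 'is_doc': True}"]})
# parse each string into a real dict; the column dtype stays object,
# but each element is now a dict
df['response'] = df['response'].apply(literal_eval)
```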
You can try this:
d1 = df['response'] # Series
d1 = df[['response']] # DataFrame