Pandas loop from leftmost column and change values by dictionary - pandas

I have the following dictionary and dataframe:
import pandas as pd
val_dict = {
    'key1': ['val1', 'val2', 'val3'],
    'key2': ['val4', 'val5']
}
df = pd.DataFrame(data={'val5': [True, False, False],
                        'val2': [False, True, False],
                        'val3': [True, True, False],
                        'val1': [True, False, True],
                        'val4': [True, True, False],
                        'val6': [False, False, True]},
                  index=pd.Series([1, 2, 3], name='index'))
index  val5   val2   val3   val1   val4   val6
1      True   False  True   True   True   False
2      False  True   True   False  True   False
3      False  False  False  True   False  True
How do I go through the dataframe from the left so that if the column is True, other columns in the val_dict values turn to False?
index  val5   val2   val3   val1   val4   val6
1      True   False  True   FALSE  FALSE  False
2      False  True   FALSE  False  True   False
3      False  False  False  True   False  True
For example, index 1 has val5 as True, so val4 switches to False because they are both assigned to the same val_dict key. Similarly, val2 is False but val3 is True, so val1 gets turned to False. Note that it should skip over val6.
I tried converting df to a dictionary with df.to_dict('index') to work with two dictionaries. However, dictionaries are unordered and the order of the columns is important, so I thought it might make the code buggy.

One way is with a combination of assign and mask:
# either val2 or val3 can be True:
com = df.filter(['val2', 'val3']).sum(1).ge(1)
# val2 is the leftmost, so start with that
(df.assign(**df.filter(['val1', 'val3']).mask(df.val2, False))
   # next is the combination of val2 and val3
   .assign(val1 = lambda df: df.val1.mask(com, False),
           val4 = lambda df: df.val4.mask(df.val5, False))
)
Out[84]:
val5 val2 val3 val1 val4 val6
index
1 True False True False False False
2 False True False False True False
3 False False False True False True
Note that val6 is untouched, so the values remain the same.

Here's what I have with trying to convert to a dictionary:
def section_filter(df, section_dict):
    result = {}
    for index, vals in df.to_dict('index').items():
        lst = []
        for val in section_dict.values():
            lst.append({k:v for k, v in vals.items() if k in val})
        for k, v in vals.items():
            if k not in [m for mi in section_dict.values() for m in mi]:
                lst.append({k: v})
        for l in lst:
            for i in l:
                if l[i]:
                    l.update({k:False for k in l.keys()})
                    l[i] = True
                    break
        result[index] = {k: v for d in lst for k, v in d.items()}
    return pd.DataFrame.from_dict(result, orient='index', columns=df.columns)
print(df)
print()
print(section_filter(df, val_dict))
val5 val2 val3 val1 val4 val6
index
1 True False True True True False
2 False True True False True False
3 False False False True False True
val5 val2 val3 val1 val4 val6
1 True False True False False False
2 False True False False True False
3 False False False True False True
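A more general sketch of the same idea, not taken from either answer above: for each group in val_dict, take the group's columns in the order they appear in the dataframe and keep only the leftmost True in each row. The helper name keep_first_true is hypothetical, and the approach assumes all grouped columns are boolean.

```python
import pandas as pd

val_dict = {
    'key1': ['val1', 'val2', 'val3'],
    'key2': ['val4', 'val5'],
}
df = pd.DataFrame(
    data={'val5': [True, False, False],
          'val2': [False, True, False],
          'val3': [True, True, False],
          'val1': [True, False, True],
          'val4': [True, True, False],
          'val6': [False, False, True]},
    index=pd.Series([1, 2, 3], name='index'))


def keep_first_true(frame, groups):
    """Hypothetical helper: in each group keep only the leftmost True per row."""
    out = frame.copy()
    for members in groups.values():
        # Order the group's columns as they appear in the frame (left to right).
        cols = [c for c in frame.columns if c in members]
        sub = frame[cols]
        # A cell is the first True in its row when it is True and the running
        # count of True values up to and including it equals 1.
        first_true = sub & sub.astype(int).cumsum(axis=1).eq(1)
        out[cols] = first_true
    return out


result = keep_first_true(df, val_dict)
print(result)
```

Columns that do not appear in any group (val6 here) are never selected, so they are copied through unchanged.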

Related

Getting boolean columns based on value presence in other columns

I have this table:
df1 = pd.DataFrame(data={'col1': ['a', 'e', 'a', 'e'],
'col2': ['e', 'a', 'c', 'b'],
'col3': ['c', 'b', 'b', 'a']},
index=pd.Series([1, 2, 3, 4], name='index'))
index  col1  col2  col3
1      a     e     c
2      e     a     b
3      a     c     b
4      e     b     a
and this list:
all_vals = ['a', 'b', 'c', 'd', 'e', 'f']
How do I make boolean columns from df1 such that it includes all columns from the all_vals list, even if the value is not in df1?
index  a     b      c      d      e      f
1      TRUE  FALSE  TRUE   FALSE  TRUE   FALSE
2      TRUE  TRUE   FALSE  FALSE  TRUE   FALSE
3      TRUE  TRUE   TRUE   FALSE  FALSE  FALSE
4      TRUE  TRUE   FALSE  FALSE  TRUE   FALSE
You can iterate over all_vals, check whether each value occurs in a row, and create a new column from the result:
for val in all_vals:
    df1[val] = (df1 == val).any(axis=1)
Use get_dummies, aggregate with max per column name, and then use DataFrame.reindex to add the values that are missing from df1:
df1 = (pd.get_dummies(df1, dtype=bool, prefix='', prefix_sep='')
         .groupby(axis=1, level=0).max()
         .reindex(all_vals, axis=1, fill_value=False))
print(df1)
a b c d e f
index
1 True False True False True False
2 True True False False True False
3 True True True False False False
4 True True False False True False
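As a sketch of an alternative that avoids groupby(axis=1), which is deprecated in recent pandas releases, the boolean frame can be built directly from a dict comprehension over all_vals. The name flags is only illustrative, and all_vals is assumed to contain six separate strings.

```python
import pandas as pd

df1 = pd.DataFrame(data={'col1': ['a', 'e', 'a', 'e'],
                         'col2': ['e', 'a', 'c', 'b'],
                         'col3': ['c', 'b', 'b', 'a']},
                   index=pd.Series([1, 2, 3, 4], name='index'))
all_vals = ['a', 'b', 'c', 'd', 'e', 'f']

# One boolean column per candidate value: True where the value occurs
# anywhere in the row. Values absent from df1 become all-False columns.
flags = pd.DataFrame({val: df1.eq(val).any(axis=1) for val in all_vals})
print(flags)
```

Unlike the loop above, this leaves df1 itself untouched and returns only the boolean columns.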

Consolidating columns by the number before the decimal point in the column name

I have the following dataframe (three example columns below):
import pandas as pd
array = {'25.2': [False, True, False], '25.4': [False, False, True], '27.78': [True, False, True]}
df = pd.DataFrame(array)
25.2 25.4 27.78
0 False False True
1 True False False
2 False True True
I want to create a new dataframe with consolidated column names, i.e. merge 25.2 and 25.4 into a new column 25. If any of the values in the separate columns is True, the value in the new column is True.
Expected output:
25 27
0 False True
1 True False
2 True True
Any ideas?
Use rename() + groupby() + sum():
df = (df.rename(columns=lambda x: x.split('.')[0])
        .groupby(axis=1, level=0).sum().astype(bool))
OR
In 2 steps:
df.columns = [x.split('.')[0] for x in df]
# OR
# df.columns = df.columns.str.replace(r'\.\d+', '', regex=True)
df = df.groupby(axis=1, level=0).sum().astype(bool)
output:
25 27
0 False True
1 True False
2 True True
Note: If the column labels are numeric (floats) rather than strings, truncate them with int() or math.floor() instead of split(); round() would map 27.78 to 28.
Another way:
>>> import numpy as np
>>> df.T.groupby(np.floor(df.columns.astype(float))).sum().astype(bool).T
25.0 27.0
0 False True
1 True False
2 True True
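A minimal sketch of the same consolidation that keeps the labels as strings and avoids groupby(axis=1), which is deprecated in recent pandas releases. It assumes every column label is a string containing a decimal point; the name consolidated is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'25.2': [False, True, False],
                   '25.4': [False, False, True],
                   '27.78': [True, False, True]})

# Group the transposed frame by the text before the decimal point, then
# transpose back; any() is True when at least one member column is True.
consolidated = df.T.groupby(lambda label: label.split('.')[0]).any().T
print(consolidated)
```

Using any() instead of sum().astype(bool) states the "True if any column is True" rule directly.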

TensorFlow: how to produce a mask where the first few rows are 1 and the rest are 0

I want to create a mask matrix. Is there a function like this in TensorFlow?
function(true_line=2, row_length=5, column_length=3)
Output:
True True True
True True True
False False False
False False False
False False False
The first true_line rows are filled with True; the matrix has 3 columns and 5 rows in total.
You can use tf.stack to build the mask matrix above row by row.
Example:
import tensorflow as tf
# two rows of True followed by three rows of False (true_line=2, row_length=5)
a = tf.constant([True, True, True])
b = tf.constant([True, True, True])
c = tf.constant([False, False, False])
d = tf.constant([False, False, False])
e = tf.constant([False, False, False])
mask = tf.stack([a, b, c, d, e])
print(mask)
Output:
tf.Tensor(
[[ True  True  True]
 [ True  True  True]
 [False False False]
 [False False False]
 [False False False]], shape=(5, 3), dtype=bool)
Reference: https://www.tensorflow.org/api_docs/python/tf/stack

Pandas True False Matching

For this table:
I would like to generate the 'desired_output' column. One way to achieve this may be:
All the True values from col_1 are transferred straight across to desired_output (red arrow)
In desired_output, place a True value above any existing True value (green arrow)
Code I have tried:
df['desired_output']=df.col_1.apply(lambda x: True if x.shift()==True else False)
Thank you
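You can combine the original column with its shifted values using | (bitwise OR) and Series.shift. Passing fill_value=False keeps the shifted column boolean instead of introducing a missing value in the last row:
d = {"col1":[False,True,True,True,False,True,False,False,True,False,False,False]}
df = pd.DataFrame(d)
df['new'] = df.col1 | df.col1.shift(-1, fill_value=False)
print(df)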
col1 new
0 False True
1 True True
2 True True
3 True True
4 False True
5 True True
6 False False
7 False True
8 True True
9 False False
10 False False
11 False False
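Try this:
df['desired_output'] = df['col_1']
# OR each row with the row below it, so a True is placed above every existing True
df.loc[df.index[:-1], 'desired_output'] = df['col_1'].values[:-1] | df['col_1'].values[1:]
print(df)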
In case the values are saved as strings (adjust the comparison if they are all caps, TRUE / FALSE):
Input:
col_1
0 True
1 True
2 False
3 True
4 True
5 False
6 False
7 True
8 False
Code:
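df['desired']=df['col_1']
for i, e in enumerate(df['col_1']):
    if e=='True':
        # copy the 'True' one row up; for i == 0 this creates a temporary row labelled -1
        df.at[i-1,'desired']=df.at[i,'col_1']
# drop the last row, i.e. the temporary -1 row appended when row 0 is 'True'
df = df[:(len(df)-1)]
df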
Output:
col_1 desired
0 True True
1 True True
2 False True
3 True True
4 True True
5 False False
6 False True
7 True True
8 False False
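For string data, a loop-free sketch is to convert the strings to real booleans first and then reuse the shift approach from the first answer. This assumes the column only holds spellings of true and false, and that comparing case-insensitively is acceptable; the name as_bool is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['True', 'True', 'False', 'True', 'True',
                             'False', 'False', 'True', 'False']})

# Normalise the strings so 'True', 'TRUE' and ' true ' all count as True.
as_bool = df['col_1'].str.strip().str.lower().eq('true')

# A row is True if it is True itself or if the row below it is True.
df['desired'] = as_bool | as_bool.shift(-1, fill_value=False)
print(df)
```

Note that desired holds real booleans here, while col_1 keeps the original strings.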

use series to select rows from df pandas

Continued from this thread: get subsection of df based on multiple conditions
I would like to pull given rows based on multiple conditions which are stored in a Series object.
columns = ['is_net', 'is_pct', 'is_mean', 'is_wgted', 'is_sum']
index = ['a','b','c','d']
data = [['True','True','False','False', 'False'],
['True','True','True','False', 'False'],
['True','True','False','False', 'True'],
['True','True','False','True', 'False']]
df = pd.DataFrame(columns=columns, index=index, data=data)
df
is_net is_pct is_mean is_wgted is_sum
a True True False False False
b True True True False False
c True True False False True
d True True False True False
My conditions:
d={'is_net': 'True', 'is_sum': 'True'}
s=pd.Series(d)
Expected output:
is_net is_pct is_mean is_wgted is_sum
c True True False False True
My failed attempt:
(df == s).all(axis=1)
a False
b False
c False
d False
dtype: bool
Not sure why 'c' is False when the two conditions were met.
Note, I can achieve the desired results like this but I would rather use the Series method.
df[(df['is_net']=='True') & (df['is_sum']=='True')]
As you only have 2 conditions, you can sum the matches and filter the df:
In [55]:
df[(df == s).sum(axis=1) == 2]
Out[55]:
is_net is_pct is_mean is_wgted is_sum
c True True False False True
This works because True and False are counted as 1 and 0 when summed:
In [56]:
(df == s).sum(axis=1)
Out[56]:
a 1
b 1
c 2
d 1
dtype: int64
You could modify your solution slightly by subsetting the comparison to the relevant columns:
In [219]: df[(df == s)[['is_net', 'is_sum']].all(axis=1)]
Out[219]:
is_net is_pct is_mean is_wgted is_sum
c True True False False True
or:
In [219]: df[(df == s)[s.index].all(axis=1)]
Out[219]:
is_net is_pct is_mean is_wgted is_sum
c True True False False True
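The original attempt returns False for every row because the comparison is made against all five columns, and the three columns that have no entry in s can never match, so all(axis=1) fails. A sketch that selects the relevant columns before comparing is shown below; it assumes every label in s exists as a column of df, and it is written this way because, depending on the pandas version, comparing a DataFrame with a Series that has different labels may raise an error instead of aligning. The names matches and selected are only illustrative.

```python
import pandas as pd

columns = ['is_net', 'is_pct', 'is_mean', 'is_wgted', 'is_sum']
index = ['a', 'b', 'c', 'd']
data = [['True', 'True', 'False', 'False', 'False'],
        ['True', 'True', 'True', 'False', 'False'],
        ['True', 'True', 'False', 'False', 'True'],
        ['True', 'True', 'False', 'True', 'False']]
df = pd.DataFrame(columns=columns, index=index, data=data)
s = pd.Series({'is_net': 'True', 'is_sum': 'True'})

# Restrict df to the columns named in s before comparing, so columns
# without a condition cannot turn the row-wise all() into False.
matches = df[s.index].eq(s).all(axis=1)
selected = df[matches]
print(selected)
```

This scales to any number of conditions in s without hard-coding the count, unlike the sum(axis=1) == 2 variant.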