Continued from this thread: get subsection of df based on multiple conditions
I would like to select rows based on multiple conditions that are stored in a Series object.
import pandas as pd
columns = ['is_net', 'is_pct', 'is_mean', 'is_wgted', 'is_sum']
index = ['a','b','c','d']
data = [['True','True','False','False', 'False'],
['True','True','True','False', 'False'],
['True','True','False','False', 'True'],
['True','True','False','True', 'False']]
df = pd.DataFrame(columns=columns, index=index, data=data)
df
is_net is_pct is_mean is_wgted is_sum
a True True False False False
b True True True False False
c True True False False True
d True True False True False
My conditions:
d={'is_net': 'True', 'is_sum': 'True'}
s=pd.Series(d)
Expected output:
is_net is_pct is_mean is_wgted is_sum
c True True False False True
My failed attempt:
(df == s).all(axis=1)
a False
b False
c False
d False
dtype: bool
I'm not sure why 'c' is False when both conditions are met.
Note that I can get the desired result like this, but I would rather use the Series approach:
df[(df['is_net']=='True') & (df['is_sum']=='True')]
As you only have two conditions, you can sum the matches per row and filter the df:
In [55]:
df[(df == s).sum(axis=1) == 2]
Out[55]:
is_net is_pct is_mean is_wgted is_sum
c True True False False True
This works because booleans are summed as 1 for True and 0 for False, so a row matching both conditions sums to 2:
In [56]:
(df == s).sum(axis=1)
Out[56]:
a 1
b 1
c 2
d 1
dtype: int64
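The `== 2` above is hard-coded to the number of conditions. A minimal sketch that derives the count from the Series instead, assuming the same string-valued df and s as in the question; it also compares only the columns named in s, so the labels line up one-to-one:

```python
import pandas as pd

columns = ['is_net', 'is_pct', 'is_mean', 'is_wgted', 'is_sum']
data = [['True', 'True', 'False', 'False', 'False'],
        ['True', 'True', 'True', 'False', 'False'],
        ['True', 'True', 'False', 'False', 'True'],
        ['True', 'True', 'False', 'True', 'False']]
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'], columns=columns)
s = pd.Series({'is_net': 'True', 'is_sum': 'True'})

# Compare only the columns that have a condition attached
matches = df[s.index].eq(s)
# A row qualifies when every condition matched: its True count equals len(s)
result = df[matches.sum(axis=1) == len(s)]
print(result)
```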
You could modify your solution slightly by selecting only the columns that appear in your conditions:
In [219]: df[(df == s)[['is_net', 'is_sum']].all(axis=1)]
Out[219]:
is_net is_pct is_mean is_wgted is_sum
c True True False False True
or:
In [219]: df[(df == s)[s.index].all(axis=1)]
Out[219]:
is_net is_pct is_mean is_wgted is_sum
c True True False False True
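As for why the original attempt returns False for every row: s only has entries for is_net and is_sum, so in `df == s` the other three columns have nothing to compare against and come out False, which makes `all(axis=1)` fail for every row (some pandas versions also warn about this misaligned comparison). Selecting `s.index` before comparing avoids the problem. A minimal sketch, assuming the same df and s as in the question:

```python
import pandas as pd

columns = ['is_net', 'is_pct', 'is_mean', 'is_wgted', 'is_sum']
data = [['True', 'True', 'False', 'False', 'False'],
        ['True', 'True', 'True', 'False', 'False'],
        ['True', 'True', 'False', 'False', 'True'],
        ['True', 'True', 'False', 'True', 'False']]
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'], columns=columns)
s = pd.Series({'is_net': 'True', 'is_sum': 'True'})

# Restrict to the conditioned columns, then require every condition to match
mask = df[s.index].eq(s).all(axis=1)
print(df[mask])
```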
Related
I have this table:
df1 = pd.DataFrame(data={'col1': ['a', 'e', 'a', 'e'],
'col2': ['e', 'a', 'c', 'b'],
'col3': ['c', 'b', 'b', 'a']},
index=pd.Series([1, 2, 3, 4], name='index'))
      col1 col2 col3
index
1        a    e    c
2        e    a    b
3        a    c    b
4        e    b    a
and this list:
all_vals = ['a', 'b', 'c', 'd', 'e', 'f']
How do I build boolean columns from df1 that include every value in the all_vals list, even values that never appear in df1?
          a      b      c      d      e      f
index
1      TRUE  FALSE   TRUE  FALSE   TRUE  FALSE
2      TRUE   TRUE  FALSE  FALSE   TRUE  FALSE
3      TRUE   TRUE   TRUE  FALSE  FALSE  FALSE
4      TRUE   TRUE  FALSE  FALSE   TRUE  FALSE
You can iterate over all_vals, check whether each value appears anywhere in the row, and create a new column for it:
for val in all_vals:
df1[val] = (df1 == val).any(axis=1)
Use get_dummies with aggregate max per columns and DataFrame.reindex:
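df1 = (pd.get_dummies(df1, dtype=bool, prefix='', prefix_sep='')
       # groupby(axis=1) is deprecated in recent pandas, so transpose, merge the
       # duplicate labels on the rows with max, and transpose back
       .T.groupby(level=0).max().T
       .reindex(all_vals, axis=1, fill_value=False))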
print (df1)
a b c d e f
index
1 True False True False True False
2 True True False False True False
3 True True True False False False
4 True True False False True False
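If you would rather not modify df1 in place (the loop above adds the new columns next to col1 to col3), here is a minimal sketch that builds a separate frame with one column per value in all_vals, assuming the six-item all_vals list:

```python
import pandas as pd

df1 = pd.DataFrame(data={'col1': ['a', 'e', 'a', 'e'],
                         'col2': ['e', 'a', 'c', 'b'],
                         'col3': ['c', 'b', 'b', 'a']},
                   index=pd.Series([1, 2, 3, 4], name='index'))
all_vals = ['a', 'b', 'c', 'd', 'e', 'f']

# For each value, flag the rows where it appears in any of the original columns.
# Values that never occur (here 'd' and 'f') simply give an all-False column.
flags = pd.DataFrame({val: df1.eq(val).any(axis=1) for val in all_vals})
print(flags)
```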
I want to create a mask matrix. Is there a function like this in TensorFlow?
function(true_line=2, row_length=5, column_length=3)
Output:
True True True
True True True
False False False
False False False
False False False
The first true_line rows are filled with True; each row has column_length = 3 entries, and there are row_length = 5 rows in total.
You can use tf.stack to build the mask matrix above row by row.
Example:
import tensorflow as tf
a = tf.constant([True, True, True])
b = tf.constant([True, True, True])
c = tf.constant([False, False, False])
d = tf.constant([False, False, False])
e = tf.constant([False, False, False])
# tf.stack joins the five rows along a new first axis, giving shape (5, 3)
mask = tf.stack([a, b, c, d, e])
print(mask)
Output:
tf.Tensor(
[[ True True True]
[ True True True]
[False False False]
[False False False]
[False False False]], shape=(5, 3), dtype=bool)
Reference: https://www.tensorflow.org/api_docs/python/tf/stack
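With tf.stack you write one constant per row, so the shape is fixed by hand. A minimal sketch of the parameterized function from the question, built from tf.range and broadcasting; the name make_row_mask is hypothetical, not a TensorFlow API:

```python
import tensorflow as tf

def make_row_mask(true_line, row_length, column_length):
    """Hypothetical helper: first `true_line` rows True, remaining rows False."""
    # Shape (row_length,): True where the row index is below true_line
    row_is_true = tf.range(row_length) < true_line
    # Add a column axis, then broadcast each row flag across column_length columns
    return tf.broadcast_to(row_is_true[:, tf.newaxis], [row_length, column_length])

mask = make_row_mask(true_line=2, row_length=5, column_length=3)
print(mask)
```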
For this table:
I would like to generate the 'desired_output' column. One way to achieve this might be:
All the True values from col_1 are transferred straight across to desired_output (red arrow)
In desired_output, place a True value above any existing True value (green arrow)
Code I have tried:
df['desired_output']=df.col_1.apply(lambda x: True if x.shift()==True else False)
You can combine the original values with the values shifted up by one row (Series.shift) using | for bitwise OR:
d = {"col1":[False,True,True,True,False,True,False,False,True,False,False,False]}
df = pd.DataFrame(d)
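# shift(-1) pulls the next row's value up; fill_value=False avoids a NaN in the last row
df['new'] = df.col1 | df.col1.shift(-1, fill_value=False)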
print (df)
col1 new
0 False True
1 True True
2 True True
3 True True
4 False True
5 True True
6 False False
7 False True
8 True True
9 False False
10 False False
11 False False
Try this:
df['desired_output'] = df['col_1']
# Row i becomes True if row i or the row below it (i + 1) is True
df.loc[df.index[:-1], 'desired_output'] = df['col_1'].values[:-1] | df['col_1'].values[1:]
print(df)
In case those are saved as strings (for example all caps, TRUE / FALSE), compare against the string instead; this example uses 'True' / 'False':
Input:
col_1
0 True
1 True
2 False
3 True
4 True
5 False
6 False
7 True
8 False
Code:
df['desired'] = df['col_1']
for i, e in enumerate(df['col_1']):
    # copy each 'True' one row up; row 0 has no row above it, so skip it
    if e == 'True' and i > 0:
        df.at[i - 1, 'desired'] = 'True'
df
Output:
col_1 desired
0 True True
1 True True
2 False True
3 True True
4 True True
5 False False
6 False True
7 True True
8 False False
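A vectorized sketch of the same string-based idea, assuming col_1 holds the strings 'True' / 'False' as in the input above; note that it produces real booleans rather than strings:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['True', 'True', 'False', 'True', 'True',
                             'False', 'False', 'True', 'False']})

# Convert the strings to booleans; anything other than 'True' counts as False
is_true = df['col_1'].eq('True')
# A row is True if it, or the row directly below it, is True.
# fill_value=False keeps the last row from picking up a NaN after the shift.
df['desired'] = is_true | is_true.shift(-1, fill_value=False)
print(df)
```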
I'm using Pandas DataFrames. I'm looking to identify all rows where both columns A and B are True, and then mark in column C all the points on either side of that intersection where only one of A or B is still True. For example:
A B C
0 False False False
1 True False True
2 True True True
3 True True True
4 False True True
5 False False False
6 True False False
7 True False False
I can find the direct overlaps quite easily:
df.loc[(df['A'] == True) & (df['B'] == True), 'C'] = True
... however, this does not capture the rows on either side of the overlap.
I considered creating column 'C' in this way, then grouping each column:
grp_a = df.loc[(df['A'] == True), 'A'].groupby(df['A'].astype('int').diff().ne(0).cumsum())
grp_b = df.loc[(df['B'] == True), 'B'].groupby(df['B'].astype('int').diff().ne(0).cumsum())
grp_c = df.loc[(df['C'] == True), 'C'].groupby(df['C'].astype('int').diff().ne(0).cumsum())
From there I thought to iterate over the indexes in grp_c.indices and test the indices in grp_a and grp_b against those, find the min/max index of A and B and update column C. This feels like an inefficient way of getting to the result I want though.
Ideas?
Try this:
#Input df just columns 'A' and 'B'
df = df[['A','B']].copy()  # copy avoids a SettingWithCopyWarning when adding C
df['C'] = df.assign(C=df.min(1)).groupby((df[['A','B']].max(1) == 0).cumsum())['C']\
.transform('max').mask(df.max(1)==0, False)
print(df)
Output:
A B C
0 False False False
1 True False True
2 True True True
3 True True True
4 False True True
5 False False False
6 True False False
7 True False False
Explanation:
First, create a temporary column 'C' holding the row-wise minimum of A and B; this assigns True to C only where both A and B are True. Next, using
df[['A','B']].max(1) == 0
0 True
1 False
2 False
3 False
4 False
5 True
6 False
7 False
dtype: bool
we can find all of the records where A and B are both False. cumsum then keeps a running count of those False/False records, which groups the data: each False/False record starts a new group number, and every following record keeps that number until the next False/False record increments it.
(df[['A','B']].max(1) == 0).cumsum()
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
dtype: int32
Now group the DataFrame (with the temporary column C) by the grouping created with cumsum, and take the maximum value of C within each group. So if a group contains a True/True record, every record in that group is assigned True. Lastly, use mask to set the False/False record at the start of each group back to False.
df.assign(C=df.min(1)).groupby((df[['A','B']].max(1) == 0).cumsum())['C']\
.transform('max').mask(df.max(1)==0, False)
0 False
1 True
2 True
3 True
4 True
5 False
6 False
7 False
Name: C, dtype: bool
Finally, assign that Series to df['C']. Because assign returns a new DataFrame, the temporary C never touches df; this assignment is what actually creates the column.
df['C'] = df.assign(C=df.min(1)).groupby((df[['A','B']].max(1) == 0).cumsum())['C']\
.transform('max').mask(df.max(1)==0, False)
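The same idea can also be phrased in terms of runs: consecutive rows where A or B is True form one run, and C is True for the whole run if any row in it has both True. This is a swapped-in formulation rather than the answer's code; a minimal sketch, assuming the example A and B columns from the question:

```python
import pandas as pd

df = pd.DataFrame({'A': [False, True, True, True, False, False, True, True],
                   'B': [False, False, True, True, True, False, False, False]})

active = df['A'] | df['B']   # at least one of A/B is True
both = df['A'] & df['B']     # the intersection rows
run_id = (~active).cumsum()  # a new id starts at every False/False row
# Spread "this run contains an intersection" to every row of the run,
# then clear the False/False rows themselves
df['C'] = both.groupby(run_id).transform('any') & active
print(df)
```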
I have three boolean fields; their counts are shown below:
I want to draw a bar chart that shows
Offline_RetentionByTime with 37528
Offline_RetentionByCount with 29640
Offline_RetentionByCapacity with 3362
How to achieve that?
I think you can apply value_counts to each column to create a new df1, and then plot it with DataFrame.plot.bar:
df = pd.DataFrame({'Offline_RetentionByTime':[True,False,True, False],
'Offline_RetentionByCount':[True,False,False,True],
'Offline_RetentionByCapacity':[True,True,True, False]})
print (df)
Offline_RetentionByCapacity Offline_RetentionByCount Offline_RetentionByTime
0 True True True
1 True False False
2 True False True
3 False True False
df1 = df.apply(pd.Series.value_counts)  # top-level pd.value_counts is deprecated in recent pandas
print (df1)
Offline_RetentionByCapacity Offline_RetentionByCount \
True 3 2
False 1 2
Offline_RetentionByTime
True 2
False 2
df1.plot.bar()
If you only need to plot the True counts, select that row with loc:
df1.loc[True].plot.bar()
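Because True counts as 1, summing each boolean column gives the number of True values directly, without value_counts. A minimal sketch, assuming the same three-column example df (drawing the chart itself needs matplotlib):

```python
import pandas as pd

df = pd.DataFrame({'Offline_RetentionByTime': [True, False, True, False],
                   'Offline_RetentionByCount': [True, False, False, True],
                   'Offline_RetentionByCapacity': [True, True, True, False]})

# Sum of a boolean column = number of True values in it
true_counts = df.sum()
print(true_counts)
```

Calling `true_counts.plot.bar()` then draws one bar per column, like `df1.loc[True].plot.bar()`.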