Change every n-th element of a row in a 2d numpy array depending on the row number - numpy

I have a 2d array:
import numpy as np

H = 12
a = np.ones([H, H])
print(a.astype(int))
[[1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1]]
The goal is, for every row r, to substitute every (r+1)-th element of that row (starting with the 0th) with 0.
Namely, for the 0th row substitute every element (i.e. all of them) with 0; for the 1st row substitute every 2nd element with 0; and so on.
It can trivially be done in a loop (the printed array is the desired output):
for i in np.arange(H):
    a[i, ::i+1] = 0
print(a.astype(int))
[[0 0 0 0 0 0 0 0 0 0 0 0]
[0 1 0 1 0 1 0 1 0 1 0 1]
[0 1 1 0 1 1 0 1 1 0 1 1]
[0 1 1 1 0 1 1 1 0 1 1 1]
[0 1 1 1 1 0 1 1 1 1 0 1]
[0 1 1 1 1 1 0 1 1 1 1 1]
[0 1 1 1 1 1 1 0 1 1 1 1]
[0 1 1 1 1 1 1 1 0 1 1 1]
[0 1 1 1 1 1 1 1 1 0 1 1]
[0 1 1 1 1 1 1 1 1 1 0 1]
[0 1 1 1 1 1 1 1 1 1 1 0]
[0 1 1 1 1 1 1 1 1 1 1 1]]
Can I make use of the vectorisation power of numpy here and avoid looping, or is that not possible?

You can use np.arange and broadcast a modulo of it over a column view of itself:
import numpy as np
H = 12
a = np.arange(H)
((a % (a+1)[:, None]) != 0).astype('int')
Output
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
[0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
[0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
[0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1],
[0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
[0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
[0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
[0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
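If you need to zero out an existing array of ones rather than build the 0/1 pattern from scratch, the same broadcasted modulo can serve as a boolean mask. A minimal sketch along the lines of the answer above:
import numpy as np

H = 12
a = np.ones([H, H])

idx = np.arange(H)
# True wherever the column index is a multiple of (row index + 1)
mask = (idx % (idx + 1)[:, None]) == 0
a[mask] = 0
print(a.astype(int))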

Related

How can I create a new column to identify the sequence between Zero Values

I would like to create a new column in order to figure out how many different sequences I have, where each sequence runs from one zero value until the next zero value, with 1 values in between.
I am using R to develop this code.
I have two scenarios: in both I have the Conversions column and I'd like to create the New Column.
First Scenario (when my Conversions Column starts with 1):
Conversions   New Column (The Sequence)
1             1
1             1
0             2
1             2
1             2
1             2
0             3
1             3
1             3
0             4
0             4
0             4
1             4
1             4
1             4
0             5
0             5
Second Scenario (when my Conversions Column starts with 0):
Conversions   New Column (The Sequence)
0             1
0             1
0             1
1             1
0             2
1             2
1             2
1             2
0             3
0             3
1             3
0             4
1             4
1             4
0             5
1             5
1             5
Thanks
library(dplyr)

dt1 <- tibble(
  conversion = c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0),
  sequence   = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5),
  id = 1:17
)
dt2 <- tibble(
  conversion = c(0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1),
  sequence   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5),
  id = 1:17
)

build_seq <- function(df) {
  df %>%
    mutate(
      # mark the rows where conversion drops from 1 to 0 (a new sequence starts there)
      new_col = ifelse((conversion - lag(conversion, 1)) == -1, id, NA),
      # number those markers 1, 2, 3, ... in order of appearance
      new_col = as.numeric(as.factor(new_col))
    ) %>%
    tidyr::fill(new_col, .direction = "down") %>%
    mutate(
      # rows before the first drop belong to sequence 1; the rest are shifted up by 1
      new_col = ifelse(is.na(new_col), 1, new_col + 1)
    )
}
new_dt1 <- build_seq(dt1)
new_dt2 <- build_seq(dt2)
all(new_dt1$new_col == new_dt1$sequence)
all(new_dt2$new_col == new_dt2$sequence)
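For what it's worth, the underlying idea (a new sequence starts whenever conversion drops from 1 to 0, i.e. the lagged difference is -1) boils down to a cumulative count of those drops. A small illustrative sketch of just that idea in Python/pandas, not part of the original R answer:
import pandas as pd

conversions = pd.Series([1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0])
# flag the rows where the value drops from 1 to 0, then count the drops seen so far
new_sequence = 1 + (conversions.diff() == -1).cumsum()
print(new_sequence.tolist())
# [1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5]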

How to choose 2D diagonals of a 3D NumPy array

I define an array as:
import numpy as np

XRN = np.array([[[0,1,0,1,0,1,0,1,0,1],
                 [0,1,1,0,0,1,0,1,0,1],
                 [0,1,0,0,1,1,0,1,0,1],
                 [0,1,0,1,0,0,1,1,0,1]],
                [[0,1,0,1,0,1,1,0,0,1],
                 [0,1,0,1,0,1,0,1,1,0],
                 [1,1,1,0,0,0,0,1,0,1],
                 [0,1,0,1,0,0,1,1,0,1]],
                [[0,1,0,1,0,1,1,1,0,0],
                 [0,1,0,1,1,1,0,1,0,0],
                 [0,1,0,1,1,0,0,1,0,1],
                 [0,1,0,1,0,0,1,1,0,1]]])
print(XRN.shape,XRN)
XRN_LEN = XRN.shape[1]
I can obtain the sum of the inner matrices with:
XRN_UP = XRN.sum(axis=1)
print("XRN_UP",XRN_UP.shape,XRN_UP)
XRN_UP (3, 10) [[0 4 1 2 1 3 1 4 0 4]
[1 4 1 3 0 2 2 3 1 3]
[0 4 0 4 2 2 2 4 0 2]]
I want to get the sum of all diagonals with the same shape (3,10)
I tested the code :
RIGHT = [XRN.diagonal(i,axis1=0,axis2=1).sum(axis=1) for i in range(XRN_LEN)]
np_RIGHT = np.array(RIGHT)
print("np_RIGHT=",np_RIGHT.shape,np_RIGHT)
but got
np_RIGHT= (4, 10) [[0 3 0 3 1 2 0 3 1 2]
[1 3 2 1 0 1 1 3 0 3]
[0 2 0 1 1 1 1 2 0 2]
[0 1 0 1 0 0 1 1 0 1]]
I checked all combinations of axis1 and axis2 but never got the shape (3, 10). How can I do this?
axis1 axis2 shape
0 1 (4,10)
0 2 (4,4)
1 0 (4,10)
1 2 (4,3)
2 0 (4,4)
2 1 (4,3)
If I understand correctly, you want to sum all possible diagonals of each of the three inner elements separately. If that's the case, then you must apply np.diagonal with axis1=1 and axis2=2. This way, you end up with 10 diagonals per element, which you sum down to 10 values per element. There are 3 elements, so the resulting shape is (10, 3):
>>> np.array([XRN.diagonal(i, 1, 2).sum(1) for i in range(XRN.shape[-1])])
array([[2, 3, 2],
[2, 1, 2],
[1, 1, 2],
[3, 2, 3],
[2, 2, 2],
[2, 2, 2],
[2, 3, 3],
[2, 2, 2],
[1, 0, 0],
[1, 1, 0]])
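This stacks one row per diagonal offset, which is why the shape comes out as (10, 3). If you need it shaped (3, 10) like XRN_UP, a final transpose should line it up; a small addition on top of the answer above, not part of the original:
RIGHT = np.array([XRN.diagonal(i, 1, 2).sum(1) for i in range(XRN.shape[-1])]).T
print(RIGHT.shape)  # (3, 10): one row of 10 summed diagonals per 4x10 inner matrix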

Pandas index clause across multiple columns in a multi-column header

I have a data frame with multi-column headers.
import pandas as pd
headers = pd.MultiIndex.from_tuples([("A", "u"), ("A", "v"), ("B", "x"), ("B", "y")])
f = pd.DataFrame([[1, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]], columns = headers)
f
A B
u v x y
0 1 1 0 1
1 1 0 0 0
2 0 0 1 1
3 1 0 1 0
I want to select the rows in which any of the A columns or any of the B columns are true.
I can do so explicitly.
f[f["A"]["u"].astype(bool) | f["A"]["v"].astype(bool)]
A B
u v x y
0 1 1 0 1
1 1 0 0 0
3 1 0 1 0
f[f["B"]["x"].astype(bool) | f["B"]["y"].astype(bool)]
A B
u v x y
0 1 1 0 1
2 0 0 1 1
3 1 0 1 0
I want to write a function select(f, top_level_name) where the indexing clause applies to all the columns under the same top level name such that
select(f, "A") == f[f["A"]["u"].astype(bool) | f["A"]["v"].astype(bool)]
select(f, "B") == f[f["B"]["x"].astype(bool) | f["B"]["y"].astype(bool)]
I want this function to work with arbitrary numbers of sub-columns with arbitrary names.
How do I write select?
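One way to write select, offered here as a sketch rather than a definitive answer: indexing f with the top-level name returns the sub-frame of all its child columns, so a row-wise any over that sub-frame reproduces the chained | of the explicit expressions above (swap in all(axis=1) if you really do need every sub-column to be true):
def select(f, top_level_name):
    # boolean mask: True where at least one sub-column under the name is non-zero
    mask = f[top_level_name].astype(bool).any(axis=1)
    return f[mask]

select(f, "A")   # rows 0, 1, 3
select(f, "B")   # rows 0, 2, 3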

a list as a sublist of a list from group into list

I have a dataframe which has 2 columns:
a b
0 1 2
1 1 1
2 1 1
3 1 2
4 1 1
5 2 0
6 2 1
7 2 1
8 2 2
9 2 2
10 2 1
11 2 1
12 2 2
Is there a direct way to make a third column as below
a b c
0 1 2 0
1 1 1 1
2 1 1 0
3 1 2 1
4 1 1 0
5 2 0 0
6 2 1 1
7 2 1 0
8 2 2 1
9 2 2 0
10 2 1 0
11 2 1 0
12 2 2 0
in which, taking [1, 2] as a target sublist (gaps allowed) of each group of df.groupby('a').b.apply(list), the first 2 rows that fit the target are marked in every group?
df.groupby('a').b.apply(list) gives
1 [2, 1, 1, 2, 1]
2 [0, 1, 1, 2, 2, 1, 1, 2]
[1, 2] is a (gapped) sublist of both [2, 1, 1, 2, 1] and [0, 1, 1, 2, 2, 1, 1, 2].
So far, I have a function:
def is_sub_with_gap(sub, lst):
    '''
    check if sub is a sublist of lst
    '''
    ln, j = len(sub), 0
    ans = []
    for i, ele in enumerate(lst):
        if ele == sub[j]:
            j += 1
            ans.append(i)
            if j == ln:
                return True, ans
    return False, []
A test of the function:
In [55]: is_sub_with_gap([1,2], [2, 1, 1, 2, 1])
Out[55]: (True, [1, 3])
You can change the output to return the index values of the matched rows within each group, flatten it with Series.explode, and then test the index values with Index.isin:
L = [1, 2]
L = [1, 2]

def is_sub_with_gap(sub, lst):
    '''
    check if sub is a sublist of lst
    '''
    ln, j = len(sub), 0
    ans = []
    for i, ele in enumerate(lst):
        if ele == sub[j]:
            j += 1
            ans.append(i)
            if j == ln:
                return lst.index[ans]
    return []
idx = df.groupby('a').b.apply(lambda x: is_sub_with_gap(L, x)).explode()
df['c'] = df.index.isin(idx).view('i1')
print (df)
a b c
0 1 2 0
1 1 1 1
2 1 1 0
3 1 2 1
4 1 1 0
5 2 0 0
6 2 1 1
7 2 1 0
8 2 2 1
9 2 2 0
10 2 1 0
11 2 1 0
12 2 2 0
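A side note on the last step, not from the original answer: df.index.isin(idx) returns a boolean array, and .view('i1') simply reinterprets it as int8 zeros and ones; .astype(int) is an equivalent and perhaps more familiar spelling:
df['c'] = df.index.isin(idx).astype(int)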

Create tensors where all elements up to a given index are 1s, the rest are 0s

I have a placeholder lengths = tf.placeholder(tf.int32, [10]). Each of the 10 values assigned to this placeholder is <= 25. I now want to create a 2-dimensional tensor, called masks, of shape [10, 25], where each of the 10 vectors of length 25 has the first n elements set to 1 and the rest set to 0, with n being the corresponding value in lengths.
What is the easiest way to do this using TensorFlow's built in methods?
For example:
lengths = [4, 6, 7, ...]
-> masks = [[1, 1, 1, 1, 0, 0, 0, 0, ..., 0],
[1, 1, 1, 1, 1, 1, 0, 0, ..., 0],
[1, 1, 1, 1, 1, 1, 1, 0, ..., 0],
...
]
You can reshape lengths to a (10, 1) tensor, then compare it with a sequence of indices 0, 1, 2, ..., 24, which due to broadcasting will result in True where the indices are smaller than lengths and False otherwise; then you can cast the boolean result to 1s and 0s:
import tensorflow as tf

lengths = tf.constant([4, 6, 7])
n_features = 25

masks = tf.cast(tf.range(n_features) < tf.reshape(lengths, (-1, 1)), tf.int8)

with tf.Session() as sess:
    print(sess.run(masks))
#[[1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
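As a further note, TensorFlow also provides a built-in for exactly this pattern, tf.sequence_mask; a short sketch under the same TF 1.x session-style assumptions as above:
import tensorflow as tf

lengths = tf.constant([4, 6, 7])
# boolean mask with the first `length` entries True in each row of width 25
masks = tf.cast(tf.sequence_mask(lengths, maxlen=25), tf.int8)

with tf.Session() as sess:
    print(sess.run(masks))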