Suppose I have a 1D input tensor and I want to get the indices of its unique elements, i.e. the index at which each value first occurs.
input 1D tensor
[ 1 3 0 0 0 3 5 6 8 9 12 2 5 7 0 11 6 7 0 0]
expected output
Values: [1, 3, 0, 5, 6, 8, 9, 12, 2, 7, 11]
indices: [0, 1, 2, 6, 7, 8, 9, 10, 11, 13, 15]
Here is my strategy now.
input = [ 1, 3, 0, 0, 0, 3, 5, 6, 8, 9, 12, 2, 5, 7, 0, 11, 6, 7, 0, 0,]
unique_value_in_input, _ = tf.unique(input) # [1 3 0 5 6 8 9 12 2 7 11]
number_of_unique_value = tf.shape(unique_value_in_input)[0] #11
y = tf.reshape(unique_value_in_input, (number_of_unique_value, 1)) # [[1], [3], [0], [5], [6], [8], [9], ..]
input_matrix = tf.tile(input, [number_of_unique_value]) # repeat the tensor for tf.equal()
input_matrix = tf.reshape(input_matrix, [number_of_unique_value, -1])
cols = tf.where(tf.equal(input_matrix, y))[:,-1] #[[ 0 0] [ 1 1] [ 1 5] [ 2 6] [ 2 12] ...]
Because tf.where() reports every match, a value that repeats in the input produces several True entries in its row, so the result contains duplicated indices.
Is there a function I can use to get only the first index for each unique value?

You should be able to get the desired output as follows. For each value in the unique values, build a boolean tensor marking where the input equals that value, cast it to integers, and take tf.argmax, which returns the index of the maximum (only the first one when the maximum occurs several times).
import tensorflow as tf
input = tf.constant([1, 3, 0, 0, 0, 3, 5, 6, 8, 9, 12, 2, 5, 7, 0, 11, 6, 7, 0, 0], tf.int64)
unique_vals, _ = tf.unique(input)
# For each unique value, find the first position where the input equals it
res = tf.map_fn(
    lambda x: tf.argmax(tf.cast(tf.equal(input, x), tf.int64)),
    unique_vals)
with tf.Session() as sess:
    print(sess.run(res))
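For comparison, the same first-occurrence lookup can be sketched outside the TensorFlow graph with NumPy. This is an alternative to the tf.map_fn approach rather than part of it, and the helper name first_occurrence is made up for illustration. np.unique returns sorted values, so the sketch reorders them by first index to recover the order of appearance.

```python
import numpy as np

def first_occurrence(values):
    """Return unique values in order of appearance and their first indices."""
    arr = np.asarray(values)
    # return_index gives the index of the first occurrence of each sorted unique value
    uniq, first_idx = np.unique(arr, return_index=True)
    # Reorder by first index so the output follows the order of appearance
    order = np.argsort(first_idx)
    return uniq[order], first_idx[order]

data = [1, 3, 0, 0, 0, 3, 5, 6, 8, 9, 12, 2, 5, 7, 0, 11, 6, 7, 0, 0]
vals, idx = first_occurrence(data)
print(vals.tolist())  # [1, 3, 0, 5, 6, 8, 9, 12, 2, 7, 11]
print(idx.tolist())   # [0, 1, 2, 6, 7, 8, 9, 10, 11, 13, 15]
```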


Find duplicated sequences in numpy.array or pandas column

For example, I have got an array like this:
([ 1, 5, 7, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5 ])
I need to find all duplicated sequences: not duplicated values, but runs of at least two consecutive values that occur more than once.
The result should be like this:
of length 2: [1, 5] with indexes (0, 16);
of length 3: [3, 3, 7] with indexes (6, 12); [7, 9, 4] with indexes (2, 8)
The long sequences should be excluded, if they are not duplicated. ([5, 5, 5, 5]) should NOT be taken as [5, 5] on indexes (0, 1, 2)! It's not a duplicate sequence, it's one long sequence.
I can do it with the pandas apply function, but it is too slow, and swifter did not help.
In real life I need to find all of them, with lengths from 10 up to 100 consecutive values, in a database with 1500 columns of 700 000 values each. So I really do need a vectorized solution.
Is there a vectorized solution for finding all of them at once? Or at least for finding only 10-value sequences? Or only 4-value sequences? Anything that is fully vectorized?
One possible implementation (although not fully vectorized) that finds all sequences of size n that appear more than once is the following:
import numpy as np
def repeated_sequences(arr, n):
    Na = arr.size
    r_seq = np.arange(n)
    # All windows of length n, one per row
    n_seqs = arr[np.arange(Na - n + 1)[:, None] + r_seq]
    unique_seqs = np.unique(n_seqs, axis=0)
    # M[u, w] is True where window w equals unique sequence u
    comp = n_seqs == unique_seqs[:, None]
    M = np.all(comp, axis=-1)
    if M.any():
        matches = np.array(
            [np.convolve(M[i], np.ones((n), dtype=int)) for i in range(M.shape[0])]
        )
        repeated_inds = np.count_nonzero(matches, axis=-1) > n
        repeated_matches = matches[repeated_inds]
        idxs = np.argwhere(repeated_matches > 0)[::n]
        grouped_idxs = np.split(
            idxs[:, 1], np.unique(idxs[:, 0], return_index=True)[1][1:]
        )
    else:
        return [], []
    return unique_seqs[repeated_inds], grouped_idxs
In theory, you could replace
    matches = np.array(
        [np.convolve(M[i], np.ones((n), dtype=int)) for i in range(M.shape[0])]
    )
with
    matches = scipy.signal.convolve(
        M, np.ones((1, n), dtype=int), mode="full"
    )
which would make the whole thing "fully vectorized", but my tests showed that this was 3 to 4 times slower than the for-loop. So I'd stick with that. Or simply,
matches = np.apply_along_axis(np.convolve, -1, M, np.ones((n), dtype=int))
which does not give any significant speed-up, since it's basically a hidden loop.
This is based on @Divakar's answer to a very similar problem, in which the sequence to look for was provided. I simply adapted the procedure so that it runs for all possible sequences of size n, which are found inside the function with n_seqs = arr[np.arange(Na - n + 1)[:, None] + r_seq]; unique_seqs = np.unique(n_seqs, axis=0).
For example,
>>> n = 3
>>> a = np.array([1, 5, 7, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5])
>>> repeated_seqs, inds = repeated_sequences(a, n)
>>> for i, seq in enumerate(repeated_seqs[:10]):
...     print(f"{seq} with indexes {inds[i]}")
[3 3 7] with indexes [ 6 12]
[7 9 4] with indexes [2 8]
The long sequences should be excluded, if they are not duplicated. ([5, 5, 5, 5]) should NOT be taken as [5, 5] on indexes (0, 1, 2)! It's not a duplicate sequence, it's one long sequence.
This is not directly taken into account, and the sequence [5, 5] would appear more than once according to this algorithm. You could do something like the following, based on @Paul's answer, but it involves a loop:
import numpy as np
repeated_matches = np.array([[0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
                             [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]])
idxs = np.argwhere(repeated_matches > 0)
grouped_idxs = np.split(
    idxs[:, 1], np.unique(idxs[:, 0], return_index=True)[1][1:]
)
>>> print(grouped_idxs)
[array([ 6, 7, 8, 12, 13, 14], dtype=int64),
 array([ 7, 8, 9, 10], dtype=int64)]
# If all the numbers in a group of grouped_idxs are consecutive, that means that there
# is one long sequence that should be excluded. So, you'd have to check for consecutive numbers
filtered_idxs = []
for idx in grouped_idxs:
    if not all((idx[1:] - idx[:-1]) == 1):
        filtered_idxs.append(idx)
>>> print(filtered_idxs)
[array([ 6, 7, 8, 12, 13, 14], dtype=int64)]
Some tests:
>>> n = 3
>>> a = np.array([1, 5, 7, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5])
>>> %timeit repeated_sequences(a, n)
414 µs ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> n = 4
>>> a = np.random.randint(0, 10, (10000,))
>>> %timeit repeated_sequences(a, n)
3.88 s ± 54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> result, _ = repeated_sequences(a, n)
>>> result.shape
(2637, 4)
This is not the most efficient implementation by far, but it works as a 2D approach. Plus, if there aren't any repeated sequences, it returns empty lists.
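Most of the cost above comes from the comparison matrix between every window and every unique sequence. As a rough sketch of a cheaper grouping step (a different technique from the convolution approach, with the hypothetical helper name repeated_windows), np.unique can return the group label of each window directly through return_inverse. Note the assumption: overlapping occurrences are counted as separate matches, so the long-sequence exclusion from the question is not handled here.

```python
import numpy as np

def repeated_windows(arr, n):
    """Map each length-n window that occurs more than once to its start indices."""
    arr = np.asarray(arr)
    starts = np.arange(arr.size - n + 1)
    # One row per window of length n
    windows = arr[starts[:, None] + np.arange(n)]
    uniq, inverse, counts = np.unique(
        windows, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()  # keep the labels 1-D across NumPy versions
    result = {}
    # Loop only over the unique windows that actually repeat
    for k in np.flatnonzero(counts > 1):
        result[tuple(uniq[k].tolist())] = starts[inverse == k].tolist()
    return result

a = np.array([1, 5, 7, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5])
found = repeated_windows(a, 3)
print(found)  # {(3, 3, 7): [6, 12], (7, 9, 4): [2, 8]}
```

The remaining Python loop runs once per repeated sequence rather than once per unique sequence, and no array of shape (unique sequences × windows × n) is materialized.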
EDIT: Full implementation
I vectorized the routine I added in the Disclaimer section as a possible solution to the long sequence problem and ended up with the following:
import numpy as np
# Taken from:
def stack_padding(it):
    def resize(row, size):
        new = np.array(row)
        new.resize(size)  # pad the row with zeros up to `size`
        return new
    row_length = max(it, key=len).__len__()
    mat = np.array([resize(row, row_length) for row in it])
    return mat
def repeated_sequences(arr, n):
    Na = arr.size
    r_seq = np.arange(n)
    n_seqs = arr[np.arange(Na - n + 1)[:, None] + r_seq]
    unique_seqs = np.unique(n_seqs, axis=0)
    comp = n_seqs == unique_seqs[:, None]
    M = np.all(comp, axis=-1)
    repeated_seqs = []
    idxs_repeated_seqs = []
    if M.any():
        matches = np.apply_along_axis(np.convolve, -1, M, np.ones((n), dtype=int))
        repeated_inds = np.count_nonzero(matches, axis=-1) > n
        if repeated_inds.any():
            repeated_matches = matches[repeated_inds]
            idxs = np.argwhere(repeated_matches > 0)
            grouped_idxs = np.split(
                idxs[:, 1], np.unique(idxs[:, 0], return_index=True)[1][1:]
            )
            # Additional routine
            # Pad this uneven array with zeros so that we can use it normally
            grouped_idxs = np.array(grouped_idxs, dtype=object)
            padded_idxs = stack_padding(grouped_idxs)
            # Find the indices where there are padded zeros
            pad_positions = padded_idxs == 0
            # Perform the "consecutive-numbers check" (this will take one
            # item off the original array, so we have to correct for its shape).
            # The extra column must be True, otherwise .all() below could never be True.
            idxs_to_remove = np.pad(
                (padded_idxs[:, 1:] - padded_idxs[:, :-1]) == 1,
                [(0, 0), (0, 1)],
                constant_values=True
            )
            pad_positions = np.argwhere(pad_positions)
            i = pad_positions[:, 0]
            j = pad_positions[:, 1] - 1  # Shift by one (shape correction)
            idxs_to_remove[i, j] = True  # Masking, since we don't want pad indices
            # Obtain a final mask (boolean opposite of indices to remove)
            final_mask = ~idxs_to_remove.all(axis=-1)
            grouped_idxs = grouped_idxs[final_mask]  # Filter the long sequences
            repeated_seqs = unique_seqs[repeated_inds][final_mask]
            # In order to get the correct indices, we must first limit the
            # search to a shape (on axis=1) of the closest multiple of n.
            # This will avoid taking more indices than we should to show where
            # each repeated sequence begins.
            # (Integer division is used because `& (-n)` only rounds down
            # correctly when n is a power of two.)
            to = (padded_idxs.shape[1] // n) * n
            # Build the final list of indices (that goes from 0 to `to` with
            # a step of n)
            idxs_repeated_seqs = [
                grouped_idxs[i][:to:n] for i in range(grouped_idxs.shape[0])
            ]
    return repeated_seqs, idxs_repeated_seqs
For example,
n = 2
examples = [
    # First example is your original example array.
    np.array([1, 5, 7, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5]),
    # Second example has a long sequence of 5's, and since there aren't
    # any [5, 5] anywhere else, it's not taken into account and therefore
    # should not come out.
    np.array([1, 5, 5, 5, 5, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5]),
    # Third example has the same long sequence but since there is a [5, 5]
    # later, then it should take it into account and this sequence should
    # be found.
    np.array([1, 5, 5, 5, 5, 6, 5, 5, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5]),
    # Fourth example has a [5, 5] first and later it has a long sequence of
    # 5's which are uneven and the previous implementation got confused with
    # the indices to show as the starting indices. In this case, it should be
    # 1, 13 and 15 for [5, 5].
    np.array([1, 5, 5, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 5, 5, 5, 5, 5]),
]
for a in examples:
    print(f"\nExample: {a}")
    repeated_seqs, inds = repeated_sequences(a, n)
    for i, seq in enumerate(repeated_seqs):
        print(f"\t{seq} with indexes {inds[i]}")
Output (as expected):
Example: [1 5 7 9 4 6 3 3 7 9 4 0 3 3 7 8 1 5]
[1 5] with indexes [0 16]
[3 3] with indexes [6 12]
[3 7] with indexes [7 13]
[7 9] with indexes [2 8]
[9 4] with indexes [3 9]
Example: [1 5 5 5 5 6 3 3 7 9 4 0 3 3 7 8 1 5]
[1 5] with indexes [0 16]
[3 3] with indexes [6 12]
[3 7] with indexes [7 13]
Example: [1 5 5 5 5 6 5 5 7 9 4 0 3 3 7 8 1 5]
[1 5] with indexes [ 0 16]
[5 5] with indexes [1 3 6]
Example: [1 5 5 9 4 6 3 3 7 9 4 0 3 5 5 5 5 5]
[5 5] with indexes [ 1 13 15]
[9 4] with indexes [3 9]
You can test it out yourself with more examples and more cases. Keep in mind this is what I understood from your disclaimer. If you want to count the long sequences as one, even if multiple sequences are in there (for example, [5, 5] appears twice in [5, 5, 5, 5]), this won't work for you and you'd have to come up with something else.

A question about numpy ndarray transformation

Is there any simple way to change this array
[[ 3 4 0 1 2]
[ 8 9 5 6 7]
[13 14 10 11 12]]
into this one?
[[ 0 0 0 1 2]
[ 0 0 5 6 7]
[ 0 0 10 11 12]]
Edit: maximum supported dimension for an ndarray is 32, found 306 for transpose
Use Slicing:
>>> a[:,:2] = 0
>>> a
array([[ 0, 0, 0, 1, 2],
[ 0, 0, 5, 6, 7],
[ 0, 0, 10, 11, 12]])
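Slice assignment modifies the array in place. As a small sketch, assuming you also want to keep the original values around, you can assign into a copy instead (the names a and b are just illustrative):

```python
import numpy as np

a = np.array([[3, 4, 0, 1, 2],
              [8, 9, 5, 6, 7],
              [13, 14, 10, 11, 12]])

# Work on a copy so that `a` keeps its original values
b = a.copy()
# Every row (:), first two columns (:2) are set to zero
b[:, :2] = 0

print(b.tolist())  # [[0, 0, 0, 1, 2], [0, 0, 5, 6, 7], [0, 0, 10, 11, 12]]
print(a[0, 0])     # 3
```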

Efficiently construct numpy matrix from offset ranges of 1D array [duplicate]

Let's say I have a Python NumPy array a.
a = numpy.array([1,2,3,4,5,6,7,8,9,10,11])
I want to create a matrix of subsequences of length 5 from this array, with stride 3. The resulting matrix will hence look as follows:
numpy.array([[1,2,3,4,5],[4,5,6,7,8],[7,8,9,10,11]])
One possible way of implementing this would be using a for-loop.
result_matrix = np.zeros((3, 5))
# Step through the window start positions; `row` is the output row for each start
for row, i in enumerate(range(0, len(a) - 5 + 1, 3)):
    result_matrix[row] = a[i:i+5]
Is there a cleaner way to implement this in Numpy?
Approach #1 : Using broadcasting -
def broadcasting_app(a, L, S ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    return a[S*np.arange(nrows)[:,None] + np.arange(L)]
Approach #2 : Using more efficient NumPy strides -
def strided_app(a, L, S ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))
Sample run -
In [143]: a
Out[143]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [144]: broadcasting_app(a, L = 5, S = 3)
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
In [145]: strided_app(a, L = 5, S = 3)
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
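Both approaches rely on the same row count, nrows = ((a.size-L)//S)+1. The following plain-Python sketch (the helper name windows is hypothetical) spells that arithmetic out with list slices, which can help when checking which trailing elements are dropped because they do not fill a whole window.

```python
def windows(seq, L, S):
    """Return the length-L windows of seq taken every S elements."""
    # Number of complete windows that fit: the last start is S*(nrows-1)
    nrows = (len(seq) - L) // S + 1
    return [seq[S * r : S * r + L] for r in range(nrows)]

a = list(range(1, 12))          # 1..11
rows = windows(a, L=5, S=3)
print(rows)  # [[1, 2, 3, 4, 5], [4, 5, 6, 7, 8], [7, 8, 9, 10, 11]]

# With 12 elements there is still room for only 3 complete windows,
# so the final element (12) does not appear in any row.
rows12 = windows(list(range(1, 13)), L=5, S=3)
print(len(rows12))  # 3
```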
Starting in Numpy 1.20, we can make use of the new sliding_window_view to slide/roll over windows of elements.
And coupled with a stepping [::3], it simply becomes:
from numpy.lib.stride_tricks import sliding_window_view
# values = np.array([1,2,3,4,5,6,7,8,9,10,11])
sliding_window_view(values, window_shape = 5)[::3]
# array([[ 1, 2, 3, 4, 5],
# [ 4, 5, 6, 7, 8],
# [ 7, 8, 9, 10, 11]])
where the intermediate result of the sliding is:
sliding_window_view(values, window_shape = 5)
# array([[ 1, 2, 3, 4, 5],
# [ 2, 3, 4, 5, 6],
# [ 3, 4, 5, 6, 7],
# [ 4, 5, 6, 7, 8],
# [ 5, 6, 7, 8, 9],
# [ 6, 7, 8, 9, 10],
# [ 7, 8, 9, 10, 11]])
Here is a modified version of @Divakar's code that checks that the memory is contiguous and returns an array that cannot be modified. (Variable names changed for my DSP application.)
def frame(a, framelen, frameadv):
    """frame - Frame a 1D array
    a - 1D array
    framelen - Samples per frame
    frameadv - Samples between starts of consecutive frames
        Set to framelen for non-overlapping consecutive frames
    Modified from Divakar's 10/17/16 11:20 solution:
    Assumes array is contiguous
    Output is not writable as there are multiple views on the same memory
    """
    if not isinstance(a, np.ndarray) or \
       not (a.flags['C_CONTIGUOUS'] or a.flags['F_CONTIGUOUS']):
        raise ValueError("Input array a must be a contiguous numpy array")
    # Output
    nrows = ((a.size-framelen)//frameadv)+1
    oshape = (nrows, framelen)
    # Size of each element in a
    n = a.strides[0]
    # Indexing in the new object will advance by frameadv * element size
    ostrides = (frameadv*n, n)
    return np.lib.stride_tricks.as_strided(a, shape=oshape,
                                           strides=ostrides, writeable=False)
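To illustrate why writeable=False matters, here is a minimal sketch that builds the same kind of overlapping view directly with as_strided (the shapes are hard-coded for the 11-element example, and the variable names are illustrative). Because overlapping frames share memory, a write through one frame would silently change the others; with the read-only flag NumPy rejects the assignment instead.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(1, 12)            # 1..11, contiguous
n = a.strides[0]                # size of one element in bytes
# 3 frames of 5 samples, advancing 3 samples per frame, read-only
frames = as_strided(a, shape=(3, 5), strides=(3 * n, n), writeable=False)

try:
    frames[0, 0] = 99           # attempt to write through the view
    write_succeeded = True
except ValueError:
    # NumPy refuses to assign into a read-only array
    write_succeeded = False

print(write_succeeded)          # False
print(a[0])                     # 1, the source array is untouched
```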

tensorflow expand counts into ranges

We have a Tensor of unknown length N, containing some int32 values.
How can we generate another Tensor that will contain N ranges concatenated together, each one between 0 and the int32 value from the original tensor ?
For example, if we have [4, 4, 5, 3, 1], the output Tensor should look like [0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 0].
Thank you for any advice.
You can make this work with a tensor as input by using a tf.RaggedTensor which can contain dimensions of non-uniform length.
import tensorflow as tf
# Or any other N length tensor
tf_counts = tf.convert_to_tensor([4, 4, 5, 3, 1])
# [4 4 5 3 1]
# Create a ragged tensor, each row is a sequence of length tf_counts[i]
tf_ragged = tf.ragged.range(tf_counts)
# <tf.RaggedTensor [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2], [0]]>
# Read values
tf.print(tf_ragged.flat_values, summarize=-1)
# [0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 0]
For this 2-dimensional case the ragged tensor tf_ragged is a “matrix“ of rows with varying length:
[[0, 1, 2, 3],
 [0, 1, 2, 3],
 [0, 1, 2, 3, 4],
 [0, 1, 2],
 [0]]
Check tf.ragged.range for more options on how to create the sequences on each row: starts for inclusive lower limits, limits for exclusive upper limit, deltas for increment. Each may vary for each sequence.
Also mind that the dtype of the tf_counts tensor will propagate to the final values.
If you want to have everything as a tensorflow object, then use tf.range() along with tf.concat().
In [88]: vals = [4, 4, 5, 3, 1]
In [89]: tf_range = [tf.range(0, limit=item, dtype=tf.int32) for item in vals]
# concat all `tf_range` objects into a single tensor
In [90]: concatenated_tensor = tf.concat(tf_range, 0)
In [91]: concatenated_tensor.eval()
Out[91]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
There are other approaches to do this as well. Here, I assume that you want a constant tensor, but you can construct any tensor once you have the full range list.
First, we construct the full range list using a list comprehension, make a flat list out of it, and then construct a tensor.
In [78]: from itertools import chain
In [79]: vals = [4, 4, 5, 3, 1]
In [80]: range_list = list(chain(*[range(item) for item in vals]))
In [81]: range_list
Out[81]: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0]
In [82]: const_tensor = tf.constant(range_list, dtype=tf.int32)
In [83]: const_tensor.eval()
Out[83]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
On the other hand, we can also use tf.range() but then it returns an array when you evaluate it. So, you'd have to construct the list from the arrays and then make a flat list out of it and finally construct the tensor as in the following example.
list_of_arr = [tf.range(0, limit=item, dtype=tf.int32).eval() for item in vals]
range_list = list(chain(*[arr.tolist() for arr in list_of_arr]))
# output
[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0]
const_tensor = tf.constant(range_list, dtype=tf.int32)
#output tensor as numpy array
array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
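If the counts are available as plain values rather than graph tensors, the same expansion can also be sketched without any Python-level loop in NumPy. This is a different technique from the ones above, shown under that assumption; the helper name expand_counts is hypothetical. The idea is to number all output positions 0..total-1 and subtract, from each position, the offset at which its range starts.

```python
import numpy as np

def expand_counts(counts):
    """Concatenate range(c) for every c in counts, without a Python loop."""
    counts = np.asarray(counts)
    # Offset of the first output position belonging to each range
    starts = np.cumsum(counts) - counts
    # Global position minus the start offset of its own range
    return np.arange(counts.sum()) - np.repeat(starts, counts)

out = expand_counts([4, 4, 5, 3, 1])
print(out.tolist())  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0]
```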

Deleting Chained Duplicates

Let's say I have a list:
lits = [1, 1, 1, 2, 0, 0, 0, 0, 3, 3, 1, 4, 5, 2, 2, 2, 0, 0, 0]
and I need this to become [1, 1, 2, 0, 0, 3, 3, 1, 4, 5, 2, 2, 0, 0]
(Delete duplicates, but only inside a chain of duplicates, keeping the first and last element of each chain.) I am going to do this on a huge HDF5 file, with pandas and numpy, and would rather not use a for loop iterating through all elements.
table = table.drop_duplicates(cols='[SPEED OVER GROUND [kts]]', take_last=True)
Is there a modification I can do to this code?
In pandas you can build a boolean mask that selects a row only if it differs from either the preceding or the succeeding value:
>>> df=pd.DataFrame({ 'lits':lits })
>>> df[ (df.lits != df.lits.shift(1)) | (df.lits != df.lits.shift(-1)) ]
    lits
0      1
2      1
3      2
4      0
7      0
8      3
9      3
10     1
11     4
12     5
13     2
15     2
16     0
18     0
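The same mask can be sketched in plain NumPy, which may be convenient when the column has already been read from the HDF5 file as an array. The helper name trim_runs is made up for illustration; the logic mirrors the shift(1) / shift(-1) comparison, treating the first and last elements as always differing from their missing neighbour.

```python
import numpy as np

def trim_runs(values):
    """Keep only the first and last element of every run of equal values."""
    arr = np.asarray(values)
    # True where an element differs from its predecessor (first element: True)
    differs_prev = np.ones(arr.size, dtype=bool)
    differs_prev[1:] = arr[1:] != arr[:-1]
    # True where an element differs from its successor (last element: True)
    differs_next = np.ones(arr.size, dtype=bool)
    differs_next[:-1] = arr[:-1] != arr[1:]
    return arr[differs_prev | differs_next]

lits = [1, 1, 1, 2, 0, 0, 0, 0, 3, 3, 1, 4, 5, 2, 2, 2, 0, 0, 0]
trimmed = trim_runs(lits)
print(trimmed.tolist())  # [1, 1, 2, 0, 0, 3, 3, 1, 4, 5, 2, 2, 0, 0]
```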