Decomposing pairs of run-lengths in NumPy

I am wondering whether the below-described function would be possible to implement quickly and efficiently in NumPy, or whether I would have to resort to Cython.
Let us say I have two vectors representing the runs in compressed run-length encodings. These do not have to be the same length. Furthermore, a run-length can never be zero.
r1 = np.array([1, 1, 3])
r2 = np.array([1, 3, 2])
I want a function that decomposes these into the run lengths they share and returns, for each shared run, the index of the run in the respective array that it was cut from.
For example:
shared_runs, idx1, idx2 = f(r1, r2)
print(shared_runs)
# [1, 1, 2, 1, 1]
print(idx1)
# [0, 1, 2, 2, nan] # nan since the runs in r1 only sum to 5, while those in r2 sum to 6
print(idx2)
# [0, 1, 1, 2, 2]
idx2 explained:
The first shared run has length 1, the same as the first run of r2, so shared_runs[0] corresponds to r2[0]. shared_runs[1] has the value 1, but r2[1] has the value 3. Therefore both idx2[1] and idx2[2] point to 1, since shared_runs[1] and shared_runs[2] together decompose the value 3.
Further examples:
r1 = np.array([5])
r2 = np.array([1])
# [1, 4], [0, 0], [0, nan]
r1 = np.array([3, 2, 1])
r2 = np.array([1, 4])
# [1, 2, 2, 1], [0, 0, 1, 2], [0, 1, 1, nan]
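One way to approach this without Cython is to work on cumulative run ends: every position where either encoding ends a run is the end of a shared run, and np.searchsorted can tell which original run each shared run falls in. Below is a minimal sketch under that assumption. The name decompose_runs and its helper are made up for illustration, and the code has only been checked against the three examples above.

```python
import numpy as np

def decompose_runs(r1, r2):
    """Split two run-length vectors into their shared runs (illustrative sketch)."""
    ends1 = np.cumsum(r1)            # exclusive end position of every run in r1
    ends2 = np.cumsum(r2)            # same for r2
    ends = np.union1d(ends1, ends2)  # sorted, unique ends of the shared runs
    shared_runs = np.diff(ends, prepend=0)

    def owner(run_ends):
        # Index of the first original run whose end is >= the shared run's end.
        idx = np.searchsorted(run_ends, ends, side="left").astype(float)
        idx[idx >= len(run_ends)] = np.nan   # past the last run: no owner
        return idx

    return shared_runs, owner(ends1), owner(ends2)

shared_runs, idx1, idx2 = decompose_runs(np.array([1, 1, 3]), np.array([1, 3, 2]))
print(shared_runs)  # [1 1 2 1 1]
print(idx1)
print(idx2)
```

The index arrays come back as floats because nan has no integer representation. If integer indices are needed, a sentinel such as -1 could be used instead.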

Related

Given two arrays, `a` and `b`, how to find efficiently all combinations of elements in `b` that have equal value in `a`?

Given two arrays, a and b, how to find efficiently all combinations of elements in b that have equal value in a?
here is an example:
Given
a = [0, 0, 0, 1, 1, 2, 2, 2, 2]
b = [1, 2, 4, 5, 9, 3, 7, 22, 10]
how would you calculate
c = [[1, 2],
[1, 4],
[2, 4],
[5, 9],
[3, 7],
[3, 22],
[3, 10],
[7, 22],
[7, 10],
[22, 10]]
?
a can be assumed to be sorted.
I can do this with loops, a la:
import torch
a = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2])
b = torch.tensor([1, 2, 4, 5, 9, 3, 7, 22, 10])
jumps = torch.cat((torch.tensor([0]),
                   torch.where(a.diff() > 0)[0] + 1,
                   torch.tensor([len(a)])))
cs = []
for i in range(len(jumps) - 1):
    cs.append(torch.combinations(b[jumps[i]:jumps[i + 1]]))
c = torch.cat(cs)
Is there any efficient way to avoid the loop? The solution should work for CPU and CUDA.
Also, the solution should have runtime O(m * m), where m is the largest number of equal elements in a, and not O(n * n), where n is the length of a.
I prefer solutions for PyTorch, but I am curious about solutions for NumPy as well.

I think the overhead of using torch is only justified for bigger datasets, as there is basically no computational difficulty in the function. IMHO you can achieve the same results with:
from collections import Counter
def find_combinations1(a, b):
    count_a = Counter(a)
    combinations = []
    for x in set(b):
        if count_a[x] == b.count(x):
            combinations.append(x)
    return combinations
or, even simpler:
def find_combinations2(a, b):
    return list(set(a) & set(b))
With PyTorch I assume the simplest approach is:
import torch
def find_combinations3(a, b):
    a = torch.tensor(a)
    b = torch.tensor(b)
    # eq[i, j] is True where b[i] == a[j]
    eq = torch.eq(a, b.view(-1, 1))
    indices = torch.nonzero(eq)
    return indices[:, 1]
This option has a time complexity of O(n*m), where n is the size of a and m is the size of b. The input tensors take O(n+m) memory, but the intermediate comparison matrix eq takes O(n*m).
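For the pair list c that the question asks for, one loop-free option is to let every element pair up with the elements that follow it inside its own group, and to build both index vectors with repeat and cumsum. Below is a minimal NumPy sketch, assuming a is sorted as the question states. The name group_pairs and all local names are made up for illustration. PyTorch has counterparts of these operations (for example torch.repeat_interleave and torch.cumsum), but that translation is not shown or tested here.

```python
import numpy as np

def group_pairs(a, b):
    """All pairs of elements of b whose entries in sorted a are equal (sketch)."""
    a = np.asarray(a)
    b = np.asarray(b)
    n = len(a)
    # Start index and size of every group of equal values in a.
    starts = np.flatnonzero(np.r_[True, a[1:] != a[:-1]])
    counts = np.diff(np.r_[starts, n])
    # Exclusive end of the group that each element belongs to.
    group_end = np.repeat(starts + counts, counts)
    # Each element pairs with the elements after it in its own group.
    partners = group_end - np.arange(n) - 1
    first = np.repeat(np.arange(n), partners)
    # Position inside each element's partner list: 0, 1, ..., partners[i] - 1.
    block_start = np.cumsum(partners) - partners
    within = np.arange(first.size) - np.repeat(block_start, partners)
    second = first + 1 + within
    return np.stack([b[first], b[second]], axis=1)

a = [0, 0, 0, 1, 1, 2, 2, 2, 2]
b = [1, 2, 4, 5, 9, 3, 7, 22, 10]
c = group_pairs(a, b)
print(c.shape)  # (10, 2)
```

The work and memory are proportional to the number of output pairs plus n, so no n-by-n intermediate is built.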

Elegantly generate result array in numpy

I have my X and Y numpy arrays:
X = np.array([0,1,2,3])
Y = np.array([0,1,2,3])
And my function which maps x,y values to Z points:
def z(x,y):
    return x+y
I wish to produce the obvious thing required for a 3D plot: the 2-dimensional numpy array for the corresponding Z-values. I believe it should look like:
Z = np.array([[0, 1, 2, 3],
              [1, 2, 3, 4],
              [2, 3, 4, 5],
              [3, 4, 5, 6]])
I can do this in several lines, but I'm looking for the briefest most elegant piece of code.

For a function that is array aware it is more economical to use an open grid:
>>> import numpy as np
>>>
>>> X = np.array([0,1,2,3])
>>> Y = np.array([0,1,2,3])
>>>
>>> def z(x,y):
...     return x+y
...
>>> XX, YY = np.ix_(X, Y)
>>> XX, YY
(array([[0],
       [1],
       [2],
       [3]]), array([[0, 1, 2, 3]]))
>>> z(XX, YY)
array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6]])
If your grid axes are ranges, you can directly create the grid using np.ogrid:
>>> XX, YY = np.ogrid[:4, :4]
>>> XX, YY
(array([[0],
       [1],
       [2],
       [3]]), array([[0, 1, 2, 3]]))
If the function is not array aware you can make it so using np.vectorize:
>>> def f(x, y):
...     if x > y:
...         return x
...     else:
...         return -x
...
>>> np.vectorize(f)(*np.ogrid[-3:4, -3:4])
array([[ 3,  3,  3,  3,  3,  3,  3],
       [-2,  2,  2,  2,  2,  2,  2],
       [-1, -1,  1,  1,  1,  1,  1],
       [ 0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1, -1, -1, -1],
       [ 2,  2,  2,  2,  2, -2, -2],
       [ 3,  3,  3,  3,  3,  3, -3]])

One very short way to achieve what you want is to produce a meshgrid from your coordinates:
X,Y = np.meshgrid(x,y)
z = X+Y
or, more generally:
z = f(X,Y)
or even in one line:
z = f(*np.meshgrid(x,y))
EDIT:
If your function may also return a constant, you have to somehow infer the dimensions that the result should have. If you want to continue using meshgrids, one very simple way would be to rewrite your function like this:
def f(x,y):
    return x*0+y*0+a
where a would be your constant. NumPy would then take care of the dimensions for you. This is of course a bit weird looking, so instead you could write
def f(x,y):
    return np.full(x.shape, a)
If you really want to go with functions that work both on scalars and arrays, it's probably best to go with np.vectorize as in @PaulPanzer's answer.
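To compare the grid approaches from both answers side by side, here is a small sketch using the X, Y and z from the question. The helper constant and its default a=7 are made up for illustration. It takes the result shape from np.broadcast, so it should work with dense meshgrids as well as with open grids, where x.shape alone would only give one axis.

```python
import numpy as np

X = np.array([0, 1, 2, 3])
Y = np.array([0, 1, 2, 3])

def z(x, y):
    return x + y

# Broadcasting a column against a row gives the same open grid as np.ix_.
Z_broadcast = z(X[:, None], Y[None, :])

# indexing="ij" makes rows follow X and columns follow Y, matching the open grid.
Z_mesh = z(*np.meshgrid(X, Y, indexing="ij"))

def constant(x, y, a=7):
    # np.broadcast works out the common shape for dense and open grids alike.
    return np.full(np.broadcast(x, y).shape, a)

Z_const = constant(X[:, None], Y[None, :])
print(Z_broadcast.shape, Z_mesh.shape, Z_const.shape)  # (4, 4) (4, 4) (4, 4)
```

Note that np.meshgrid defaults to indexing="xy", which swaps the roles of rows and columns. For a symmetric function such as x+y the result is the same, but for other functions it is transposed.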

tensorflow expand counts into ranges

We have a Tensor of unknown length N, containing some int32 values.
How can we generate another Tensor that will contain N ranges concatenated together, each one running from 0 up to (but not including) the int32 value from the original tensor?
For example, if we have [4, 4, 5, 3, 1], the output Tensor should look like [0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 0].
Thank you for any advice.

You can make this work with a tensor as input by using a tf.RaggedTensor which can contain dimensions of non-uniform length.
import tensorflow as tf

# Or any other N length tensor
tf_counts = tf.convert_to_tensor([4, 4, 5, 3, 1])
tf.print(tf_counts)
# [4 4 5 3 1]
# Create a ragged tensor, each row is a sequence of length tf_counts[i]
tf_ragged = tf.ragged.range(tf_counts)
tf.print(tf_ragged)
# <tf.RaggedTensor [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2], [0]]>
# Read values
tf.print(tf_ragged.flat_values, summarize=-1)
# [0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 0]
For this 2-dimensional case the ragged tensor tf_ragged is a "matrix" of rows with varying length:
[[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3, 4],
[0, 1, 2],
[0]]
Check tf.ragged.range for more options on how to create the sequences on each row: starts for inclusive lower limits, limits for exclusive upper limit, deltas for increment. Each may vary for each sequence.
Also mind that the dtype of the tf_counts tensor will propagate to the final values.

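If you want to have everything as a TensorFlow object, then use tf.range() along with tf.concat(). The .eval() calls below are TensorFlow 1.x style and need a default session (for example tf.InteractiveSession()); in TensorFlow 2.x eager mode, .numpy() plays that role.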
In [88]: vals = [4, 4, 5, 3, 1]
In [89]: tf_range = [tf.range(0, limit=item, dtype=tf.int32) for item in vals]
# concat all `tf_range` objects into a single tensor
In [90]: concatenated_tensor = tf.concat(tf_range, 0)
In [91]: concatenated_tensor.eval()
Out[91]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
There are other approaches to do this as well. Here, I assume that you want a constant tensor, but you can construct any tensor once you have the full range list.
First, we construct the full range list using a list comprehension, make a flat list out of it, and then construct a tensor.
In [78]: from itertools import chain
In [79]: vals = [4, 4, 5, 3, 1]
In [80]: range_list = list(chain(*[range(item) for item in vals]))
In [81]: range_list
Out[81]: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0]
In [82]: const_tensor = tf.constant(range_list, dtype=tf.int32)
In [83]: const_tensor.eval()
Out[83]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
On the other hand, we can also use tf.range(), but it returns a NumPy array when you evaluate it. So you'd have to build a list from the arrays, flatten it, and finally construct the tensor, as in the following example.
list_of_arr = [tf.range(0, limit=item, dtype=tf.int32).eval() for item in vals]
range_list = list(chain(*[arr.tolist() for arr in list_of_arr]))
# output
[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0]
const_tensor = tf.constant(range_list, dtype=tf.int32)
const_tensor.eval()
#output tensor as numpy array
array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
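The same expansion can also be written without any Python-level loop or list comprehension: number all output positions consecutively, then subtract the start offset of the range each position belongs to. Below is a minimal sketch of that idea in NumPy. The name expand_counts is made up for illustration. The same steps should carry over to TensorFlow via tf.cumsum, tf.repeat and tf.range, but that translation is an assumption and is not tested here.

```python
import numpy as np

def expand_counts(counts):
    """Concatenate range(c) for every c in counts (illustrative sketch)."""
    counts = np.asarray(counts)
    # Offset in the output at which each range starts.
    starts = np.cumsum(counts) - counts
    # Global position minus the start of its own range gives the local index.
    return np.arange(counts.sum()) - np.repeat(starts, counts)

print(expand_counts([4, 4, 5, 3, 1]))
# [0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 0]
```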

numpy: get indices where condition holds per row

I have an array such as the following:
In [70]: x
Out[70]:
array([[0, 1, 2],
[3, 4, 5]])
I am trying to get the indices per row where a condition holds, for example, x > 1.
Expected output is like ([2], [0, 1, 2])
I have tried numpy.where, numpy.nonzero, but they give strange results.

One approach -
r,c = np.where(x>1)
out = np.split(c, np.flatnonzero(r[1:] > r[:-1])+1)
Sample run -
In [140]: x
Out[140]:
array([[0, 2, 0, 1, 1],
[2, 2, 1, 2, 0],
[0, 2, 1, 1, 0],
[1, 0, 0, 2, 2]])
In [141]: r,c = np.where(x>1)
In [142]: np.split(c, np.flatnonzero(r[1:] > r[:-1])+1)
Out[142]: [array([1]), array([0, 1, 3]), array([1]), array([3, 4])]
Alternatively, we could use np.unique on the final step, like so -
np.split(c, np.unique(r, return_index=1)[1][1:])
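Both variants above split only where the row index changes, so a row without any match does not appear in the output at all. If one list per row is wanted, including empty ones, the split points can be taken from the number of hits per row instead. Below is a minimal sketch of that variant. The name indices_per_row is made up for illustration.

```python
import numpy as np

def indices_per_row(mask):
    """Column indices of the True entries, one array per row (sketch)."""
    _, cols = np.nonzero(mask)          # indices come back in row-major order
    hits_per_row = mask.sum(axis=1)
    # Cut the flat column list after each row's hits; empty rows give empty arrays.
    return np.split(cols, np.cumsum(hits_per_row)[:-1])

x = np.array([[0, 1, 2],
              [3, 4, 5]])
out = indices_per_row(x > 1)
print([o.tolist() for o in out])  # [[2], [0, 1, 2]]
```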

referencing rows in a matrix using index from another matrix

You have an original sparse matrix X:
>>print type(X)
>>print X.todense()
<class 'scipy.sparse.csr.csr_matrix'>
[[1,4,3]
[3,4,1]
[2,1,1]
[3,6,3]]
You have a second sparse matrix Z, which is derived from some rows of X (say the values are doubled so we can see the difference between the two matrices). In pseudo-code:
>>Z = X[[0,2,3]]
>>print Z.todense()
[[1,4,3]
[2,1,1]
[3,6,3]]
>>Z = Z*2
>>print Z.todense()
[[2, 8, 6]
[4, 2, 2]
[6, 12,6]]
What's the best way of retrieving the rows in Z using the ORIGINAL indices from X? For instance, in pseudo-code:
>>print Z[[0,3]]
[[2,8,6]   # row 0 of Z, which was row **0** of X
[6,12,6]]  # row 2 of Z, which was row **3** of X
That is, how can you retrieve rows from Z using indices that refer to the rows' positions in the original matrix X? You can't modify X in any way to do this (you can't add an index column to the matrix X), but there are no other limits.

If you have the original indices in an array i, and the values in i are in increasing order (as in your example), you can use numpy.searchsorted(i, [0, 3]) to find the indices in Z that correspond to indices [0, 3] in the original X. Here's a demonstration in an IPython session:
In [39]: X = csr_matrix([[1,4,3],[3,4,1],[2,1,1],[3,6,3]])
In [40]: X.todense()
Out[40]:
matrix([[1, 4, 3],
[3, 4, 1],
[2, 1, 1],
[3, 6, 3]])
In [41]: i = array([0, 2, 3])
In [42]: Z = 2 * X[i]
In [43]: Z.todense()
Out[43]:
matrix([[ 2, 8, 6],
[ 4, 2, 2],
[ 6, 12, 6]])
In [44]: Zsub = Z[searchsorted(i, [0, 3])]
In [45]: Zsub.todense()
Out[45]:
matrix([[ 2, 8, 6],
[ 6, 12, 6]])
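One caveat with this approach is that searchsorted returns an insertion point even for an original index that was never copied into Z (row 1 in this example), so a wrong row would be returned silently. Below is a minimal sketch that adds a check for that. It uses a dense NumPy array as a stand-in for the csr_matrix so that it runs without SciPy, and the name rows_by_original_index is made up for illustration.

```python
import numpy as np

X = np.array([[1, 4, 3], [3, 4, 1], [2, 1, 1], [3, 6, 3]])
kept = np.array([0, 2, 3])      # rows of X that went into Z, in increasing order
Z = 2 * X[kept]

def rows_by_original_index(Z, kept, wanted):
    """Rows of Z addressed by their row numbers in the original X (sketch)."""
    wanted = np.asarray(wanted)
    pos = np.searchsorted(kept, wanted)
    # searchsorted also answers for values that are absent, so verify each hit.
    found = (pos < len(kept)) & (kept[np.minimum(pos, len(kept) - 1)] == wanted)
    if not found.all():
        raise KeyError(f"rows not present in Z: {wanted[~found].tolist()}")
    return Z[pos]

print(rows_by_original_index(Z, kept, [0, 3]))
```

If the kept indices are not sorted, a dictionary from original index to position in Z would be an alternative to searchsorted.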
