Reshape a variable numpy array - numpy

Suppose I have a numpy array u with a given shape, and a divisor d of the total number of entries in u. How can I quickly reshape u to have shape (something, d)?
The case where u is just a double (a single scalar) should be included as well and give shape (1, 1).
The case where u is empty should give a (0, d) shaped array.

You want to use reshape:
u.reshape(-1, d)
There is no double in Python; do you mean float?
In short:
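import numpy as np
def div_reshape(arr, div):
    arr = np.asarray(arr)  # a plain Python float has no .size or .reshape
    if arr.size == 0:
        return np.empty(shape=(0, div), dtype=arr.dtype)
    elif arr.size == 1:
        return arr.reshape(1, 1)
    else:
        return arr.reshape(-1, div)  # was `d`, which is undefined inside the function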
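As a rough check, and assuming NumPy's usual rule that the single -1 dimension is inferred as size // d, the plain np.reshape call seems to cover all three cases from the question on its own, including a bare Python float (np.reshape converts array-likes before reshaping):

```python
import numpy as np

d = 3
full = np.arange(12).reshape(2, 6)   # 12 entries, divisible by d
scalar = 2.5                         # a plain Python float
empty = np.array([])                 # zero entries

# -1 is inferred from the total size; for size 0 it becomes 0
r_full = np.reshape(full, (-1, d))
r_scalar = np.reshape(scalar, (-1, 1))  # d must divide 1, so d == 1 here
r_empty = np.reshape(empty, (-1, d))

print(r_full.shape, r_scalar.shape, r_empty.shape)  # (4, 3) (1, 1) (0, 3)
```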

Related

How to get the indices of x smallest elements in a large numpy matrix/multi-dimensional array (works for any number of dimensions)?

Given a large numpy matrix/multi-dimensional array, what is the best and fastest way to get the indices of the x smallest elements?
from typing import Tuple
import numpy as np
def get_indices_of_k_smallest_as_array(arr: np.ndarray, k: int) -> list:
    # argpartition moves the k smallest values (unordered) to the front of the flat array
    idx = np.argpartition(arr.ravel(), k)
    return np.array(np.unravel_index(idx, arr.shape))[:, range(k)].transpose().tolist()
def get_indices_of_k_smallest_as_tuple(arr: np.ndarray, k: int) -> Tuple:
    idx = np.argpartition(arr.ravel(), k)
    # range(min(k, 0), max(k, 0)) is range(k) for k > 0; a negative k selects the |k| largest
    return tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])
This answer gives the correct indices, but they aren't sorted by the size of the elements. That is simply how introselect, the algorithm np.argpartition uses under the hood, works: https://en.wikipedia.org/wiki/Introselect.
It would be nice if the result were also sorted by element size, e.g. index 0 of the result points to the smallest element, index 1 to the 2nd smallest element, etc.
Here's how to do it with sorting. Keep in mind that sorting only the k results after np.argpartition (O(k log k)) is much cheaper than sorting the entire multi-dimensional array (O(n log n)).
def get_indices_of_k_smallest_as_array(arr: np.ndarray, k: int) -> list:
    ravel_array = arr.ravel()
    # keep only the k smallest candidates, then sort just those by value
    indices_on_ravel = np.argpartition(ravel_array, k)[:k]
    sorted_indices_on_ravel = sorted(indices_on_ravel, key=lambda x: ravel_array[x])
    sorted_indices_on_original = np.array(np.unravel_index(sorted_indices_on_ravel, arr.shape)).transpose().tolist()
    # for the fun of numpy indexing, you can do it this way too
    # indices_on_original = np.array(np.unravel_index(indices_on_ravel, arr.shape)).transpose().tolist()
    # sorted_indices_on_original = sorted(indices_on_original, key=lambda x: arr[tuple(x)])
    return sorted_indices_on_original
def get_indices_of_k_smallest_as_tuple(arr: np.ndarray, k: int) -> Tuple:
    ravel_array = arr.ravel()
    # as in the unsorted version, a negative k selects the |k| largest
    indices_on_ravel = np.argpartition(ravel_array, k)[min(k, 0):max(k, 0)]
    sorted_indices_on_ravel = sorted(indices_on_ravel, key=lambda x: ravel_array[x])
    # unravel_index already returns a tuple with one index array per dimension
    sorted_indices_on_original = np.unravel_index(sorted_indices_on_ravel, arr.shape)
    return sorted_indices_on_original
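For a quick sanity check, here is a self-contained sketch of the same argpartition-then-sort idea on a small 3-D array, with the final sort done by np.argsort instead of Python's sorted. The helper name k_smallest_sorted is hypothetical. It passes kth=k-1 rather than k, which is an assumption on my part: it still puts the k smallest values in the first k slots, but also allows k == arr.size.

```python
import numpy as np

def k_smallest_sorted(arr: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: (k, arr.ndim) array of indices of the k smallest
    elements of arr, ordered from smallest to largest value."""
    flat = arr.ravel()
    # O(n) selection: the k smallest land in the first k slots, unordered
    part = np.argpartition(flat, k - 1)[:k]
    # O(k log k) sort of just those k candidates by their values
    part = part[np.argsort(flat[part])]
    # convert flat positions back to n-dimensional coordinates
    return np.stack(np.unravel_index(part, arr.shape), axis=1)

rng = np.random.default_rng(0)
a = rng.permutation(24).reshape(2, 3, 4)  # distinct values 0..23
idx = k_smallest_sorted(a, 3)
print(a[tuple(idx.T)])  # [0 1 2]
```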

Why do I have to use a.any() or a.all() in this code?

With the code below, passing a number works, but passing an ndarray raises an error.
Why do I have to use a.any() or a.all() in this case?
import numpy as np
def ht(x):
    if x%2 == 1:
        return 1
    else:
        return 0
ht(1)
[Example]
ht(1): 1
ht(np.array([1,2,3,4])): ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
When evaluating an if statement, Python reduces the condition to a single bool:
if var:
    pass
so var must have one unambiguous truth value.
if x is a number, then x%2 == 1 is a bool.
if x is a np.array, then x%2 == 1 is a np.array which isn't a bool, but rather an array of bool, in which each cell states whether *that cell* %2 == 1.
You can check whether all elements in it are truthy (True), or whether any of them are, with np.all or np.any.
This is because np.array([1,2,3,4])%2 also returns a NumPy array, array([1, 0, 1, 0]). To reduce these individual elements to a single truth value, one has to use the any() or all() function. There is no problem when we pass a single element.
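A small sketch of what happens under the hood: the comparison produces one bool per element, the implicit bool() call made by an if statement refuses such an array, and .any()/.all() reduce it to one value explicitly.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
cond = x % 2 == 1            # element-wise: one bool per element
print(cond)                  # [ True False  True False]

try:
    bool(cond)               # what `if cond:` does implicitly
    message = None
except ValueError as err:
    message = str(err)       # the "truth value ... is ambiguous" error
print(message)

# reduce the array to one truth value explicitly
print(cond.any(), cond.all())          # True False
# a scalar comparison already has a single truth value
print(bool(np.int64(3) % 2 == 1))      # True
```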
So, here is the modified code:
import numpy as np
def ht(x):
    # np.all also accepts a plain number, unlike the built-in all()
    if np.all(x%2 == 1):  # True only when every modulus result is 1
        return 1
    else:
        return 0
ht(np.array([1,2,3,4]))
Output for the above code is 0
import numpy as np
def ht(x):
    # np.any also accepts a plain number, unlike the built-in any()
    if np.any(x%2 == 1):  # True when at least one modulus result is 1
        return 1
    else:
        return 0
ht(np.array([1,2,3,4]))
Output for the above code is 1

Numpy masking 3d array

I'm not sure how to achieve the following (preferably without a loop).
I have a numpy array A having dimensions 100*100*3.
I also have a numpy array M having the same dimensions (100*100*3). M is actually a mask, and M[i,j] is [0,0,0] for most pairs (i,j) but for some pairs (i,j) it is not equal to [0,0,0].
What I would like to do is the following:
A[i,j] = M[i,j] when M[i,j] != [0,0,0]
A[M != [0,0,0]] = M[M != [0,0,0]] doesn't seem to work.
How can this be done efficiently with numpy?
You need to look for an ALL match along the last axis (one test per (i, j) pixel) and use that 2-D mask for boolean indexing/masking:
mask = ~(M==0).all(-1) # or (M!=0).any(-1)
A[mask] = M[mask]
Or use np.where:
mask = ~(M==0).all(-1, keepdims=True)
Aout = np.where(mask, M, A)
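To see why the element-wise mask from the question falls short, here is a small sketch on 2x2x3 arrays (the values are made up for illustration). A pixel such as [5, 0, 0] has only one non-zero channel, so the element-wise mask copies that channel alone and keeps A's values in the other two, while the per-pixel mask copies the whole [5, 0, 0].

```python
import numpy as np

A = np.full((2, 2, 3), 9)
M = np.zeros((2, 2, 3), dtype=int)
M[0, 1] = [5, 0, 0]    # only one channel non-zero
M[1, 0] = [1, 2, 3]

# element-wise mask from the question: copies only the non-zero channels
naive = A.copy()
naive[M != [0, 0, 0]] = M[M != [0, 0, 0]]

# per-pixel mask: a pixel counts if ANY of its channels is non-zero
mask = (M != 0).any(-1)            # shape (2, 2)
fixed = A.copy()
fixed[mask] = M[mask]

# np.where variant with a broadcastable (2, 2, 1) mask
out = np.where((M != 0).any(-1, keepdims=True), M, A)

print(naive[0, 1], fixed[0, 1])    # [5 9 9] [5 0 0]
```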

Cython Typing List of Strings

I'm trying to use cython to improve the performance of a loop, but I'm running
into some issues declaring the types of the inputs.
How do I include a field in my typed struct which is a string that can be
either 'front' or 'back'?
I have a np.recarray that looks like the following (note the length of the
recarray is unknown at compile time):
import numpy as np
weights = np.recarray(4, dtype=[('a', np.int64), ('b', np.str_, 5), ('c', np.float64)])
weights[0] = (0, "front", 0.5)
weights[1] = (0, "back", 0.5)
weights[2] = (1, "front", 1.0)
weights[3] = (1, "back", 0.0)
as well as inputs of a list of strings and a pandas.Timestamp
import pandas as pd
ts = pd.Timestamp("2015-01-01")
contracts = ["CLX16", "CLZ16"]
I am trying to cythonize the following loop:
def ploop(weights, contracts, timestamp):
    cwts = []
    for gen_num, position, weighting in weights:
        if weighting != 0:
            if position == "front":
                cntrct_idx = gen_num
            elif position == "back":
                cntrct_idx = gen_num + 1
            else:
                raise ValueError("transition.columns must contain "
                                 "'front' or 'back'")
            cwts.append((gen_num, contracts[cntrct_idx], weighting, timestamp))
    return cwts
My attempt involved typing the weights input as a struct in cython,
in a file struct_test.pyx as follows:
import numpy as np
cimport numpy as np
cdef packed struct tstruct:
    np.int64_t gen_num
    char[5] position
    np.float64_t weighting
def cloop(tstruct[:] weights_array, contracts, timestamp):
    cdef tstruct weights
    cdef int i
    cdef int cntrct_idx
    cwts = []
    for k in xrange(len(weights_array)):
        w = weights_array[k]
        if w.weighting != 0:
            if w.position == "front":
                cntrct_idx = w.gen_num
            elif w.position == "back":
                cntrct_idx = w.gen_num + 1
            else:
                raise ValueError("transition.columns must contain "
                                 "'front' or 'back'")
            cwts.append((w.gen_num, contracts[cntrct_idx], w.weighting,
                         timestamp))
    return cwts
But I am receiving runtime errors, which I believe are related to the
char[5] position.
import pyximport
pyximport.install()
import struct_test
struct_test.cloop(weights, contracts, ts)
ValueError: Does not understand character buffer dtype format string ('w')
In addition I am a bit unclear how I would go about typing contracts as well
as timestamp.
Your ploop (without the timestamp variable) produces:
In [226]: ploop(weights, contracts)
Out[226]: [(0, 'CLX16', 0.5), (0, 'CLZ16', 0.5), (1, 'CLZ16', 1.0)]
Equivalent function without a loop:
def ploopless(weights, contracts):
    arr_contracts = np.array(contracts)  # to allow array indexing
    wgts1 = weights[weights['c'] != 0]
    mask = wgts1['b'] == 'front'
    wgts1['b'][mask] = arr_contracts[wgts1['a'][mask]]
    mask = wgts1['b'] == 'back'
    wgts1['b'][mask] = arr_contracts[wgts1['a'][mask] + 1]
    return wgts1.tolist()
In [250]: ploopless(weights, contracts)
Out[250]: [(0, 'CLX16', 0.5), (0, 'CLZ16', 0.5), (1, 'CLZ16', 1.0)]
I'm taking advantage of the fact that the returned list of tuples has the same (int, str, float) layout as the input weights array. So I'm just making a copy of weights and replacing selected values of the b field.
Note that I use the field selection index before the mask one. The boolean mask produces a copy, so we have to be careful about indexing order.
I'm guessing that the loop-less array version will be competitive in time with the cloop (on realistic arrays). The string and list operations in cloop probably limit its speedup.
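One plausible reading of the ('w') error in the question, assuming Cython matches struct fields against the PEP 3118 format string NumPy exports: the 'b' field was created with np.str_, which NumPy stores as 4-byte-per-character Unicode and exports with the format code 'w', and that cannot match a char[5] member. A fixed-width bytes field (np.bytes_, 5 bytes wide) is exported with the code 's', which is what char[5] describes. This has not been tried against the cloop above; the snippet only inspects the two formats from plain Python.

```python
import numpy as np

u5 = np.array(["front", "back"], dtype=(np.str_, 5))      # Unicode, 4 bytes per char
s5 = np.array([b"front", b"back"], dtype=(np.bytes_, 5))  # raw bytes, 1 byte per char

# PEP 3118 format codes and item sizes a typed memoryview would see
fmt_u = memoryview(u5).format
fmt_s = memoryview(s5).format
print(fmt_u, u5.itemsize)
print(fmt_s, s5.itemsize)
```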

Tensorflow indexing into 2d tensor with 1d tensor

I have a 2D tensor A with shape [batch_size, D], and a 1D tensor B with shape [batch_size]. Each element of B is a column index of A, for each row of A, e.g. B[i] in [0, D).
What is the best way in TensorFlow to get the values A[i, B[i]] for every row i (informally, A[B])?
For example:
A = tf.constant([[0,1,2],
                 [3,4,5]])
B = tf.constant([2,1])
with desired output:
some_slice_func(A, B) -> [2,4]
There is another constraint. In practice, batch_size is actually None.
Thanks in advance!
I was able to get it working using a linear index:
def vector_slice(A, B):
    """ Returns values of rows i of A at column B[i]
    where A is a 2D Tensor with shape [None, D]
    and B is a 1D Tensor with shape [None]
    with type int32 elements in [0,D)
    Example:
        A = [[1,2],   B = [0,1],   vector_slice(A,B) -> [1,4]
             [3,4]]
    """
    linear_index = (tf.shape(A)[1]
                    * tf.range(0, tf.shape(A)[0]))
    linear_A = tf.reshape(A, [-1])
    return tf.gather(linear_A, B + linear_index)
This feels slightly hacky though.
If anyone knows a better way (as in clearer or faster), please also leave an answer! (I won't accept my own for a while.)
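The arithmetic behind the linear index can be checked without TensorFlow. In the row-major flattened A, row i starts at position i * D, so column B[i] of that row sits at i * D + B[i]. The sketch below mirrors vector_slice in NumPy; the name vector_slice_np is hypothetical and plain array indexing stands in for tf.gather.

```python
import numpy as np

def vector_slice_np(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Hypothetical NumPy mirror of vector_slice: returns A[i, B[i]] per row."""
    n_rows, n_cols = A.shape
    # flat position of (row i, column 0) is i * n_cols
    linear_index = n_cols * np.arange(n_rows)
    # adding B[i] moves to column B[i] within row i
    return A.reshape(-1)[B + linear_index]

A = np.array([[0, 1, 2],
              [3, 4, 5]])
B = np.array([2, 1])
print(vector_slice_np(A, B))  # [2 4]
```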
Code for what @Eugene Brevdo said:
def vector_slice(A, B):
    """ Returns values of rows i of A at column B[i]
    where A is a 2D Tensor with shape [None, D]
    and B is a 1D Tensor with shape [None]
    with type int32 elements in [0,D)
    Example:
        A = [[1,2],   B = [0,1],   vector_slice(A,B) -> [1,4]
             [3,4]]
    """
    B = tf.expand_dims(B, 1)
    # row numbers as a column; named row_idx to avoid shadowing Python's range
    row_idx = tf.expand_dims(tf.range(tf.shape(B)[0]), 1)
    ind = tf.concat([row_idx, B], 1)
    return tf.gather_nd(A, ind)
The least hacky way is probably to build a proper 2-D index by concatenating range(batch_size) and B to get a batch_size x 2 matrix, then pass this to tf.gather_nd.
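The batch_size x 2 index can be mirrored in NumPy, where indexing with a tuple of coordinate arrays plays roughly the role of tf.gather_nd. This is an analogy for checking the index layout, not TensorFlow itself:

```python
import numpy as np

A = np.array([[0, 1, 2],
              [3, 4, 5]])
B = np.array([2, 1])

# batch_size x 2 matrix of (row, column) pairs, like the tf.concat/tf.stack result
ind = np.stack([np.arange(B.shape[0]), B], axis=1)   # [[0, 2], [1, 1]]

# gather_nd-style lookup: each row of ind is one full coordinate into A
values = A[tuple(ind.T)]
print(values)  # [2 4]
```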
The simplest approach is to do:
def tensor_slice(target_tensor, index_tensor):
    indices = tf.stack([tf.range(tf.shape(index_tensor)[0]), index_tensor], 1)
    return tf.gather_nd(target_tensor, indices)
Consider using tf.one_hot, tf.math.multiply and tf.reduce_sum to solve it.
For example:
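def vector_slice(inputs, inds, axis=None):
    # default: the axis right after the index dimensions (axis 1 for a 1-D inds)
    axis = axis if axis is not None else len(inds.shape)
    # inputs.shape[axis] (here D) must be known statically for tf.one_hot's depth
    inds = tf.one_hot(inds, inputs.shape[axis])
    # add trailing dims so the one-hot mask broadcasts against inputs
    for _ in range(len(inputs.shape) - len(inds.shape)):
        inds = tf.expand_dims(inds, axis=-1)
    inds = tf.cast(inds, dtype=inputs.dtype)
    x = tf.multiply(inputs, inds)
    return tf.reduce_sum(x, axis=axis)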
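The one-hot idea is easy to check in NumPy, with np.eye(D)[B] standing in for tf.one_hot. Each row of the mask holds a single 1 in column B[i], so multiplying zeroes out every other column and the row sum leaves A[i, B[i]]. One trade-off worth noting: this touches all D columns of every row, so it presumably does more arithmetic than a gather; its appeal is that it uses only element-wise ops and a reduction.

```python
import numpy as np

A = np.array([[0, 1, 2],
              [3, 4, 5]])
B = np.array([2, 1])

# one-hot mask: row i has a single 1 in column B[i]
mask = np.eye(A.shape[1], dtype=A.dtype)[B]   # [[0, 0, 1], [0, 1, 0]]
# the multiply zeroes the other columns; the row sum keeps the selected value
values = (A * mask).sum(axis=1)
print(values)  # [2 4]
```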