I have a for loop that creates vectors (tf tensors) of equal length, say
a1 = [0, 2, 4 ... ]
a2 = [1, 4, 6 ... ]
...
and I want to concatenate these vectors into a matrix, along the 0th axis
matrix = [[0,2,4...] , [1,4,6...] ... ]
I can do
matrix = tf.concat([matrix, a], axis=0)
inside the for loop. However, the first iteration does not work, since matrix does not exist yet, and if I initialize it to a dummy vector, I'm stuck with that vector at the top of the final matrix. Is there a quick way of doing this?
You can use tf.stack:
matrix = tf.stack([a1, a2, ...])
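For the loop in the question, a common pattern is to collect the row tensors in a Python list and stack them once after the loop. A minimal sketch (the body of the loop is a stand-in for however each vector is actually computed):

import tensorflow as tf

rows = []
for i in range(3):
    a = tf.range(i, i + 8, 2)  # stand-in for each computed vector of equal length
    rows.append(a)

matrix = tf.stack(rows, axis=0)  # shape: (num_rows, vector_length)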
I want to fill an empty 4D array. I have created a pre-allocated array (data_4d_smoothed) of shape 80 x 80 x 44 x 50. I want to loop through all (50) volumes of the data (data_4d), smooth each one separately, and store the results in data_4d_smoothed. Basically:
import numpy as np
from scipy.ndimage import gaussian_filter

data_4d_smoothed = np.zeros(data_4d.shape)
sigma = 0.7
for i in range(data_4d.shape[-1]):
    smoothed_vol = gaussian_filter(data_4d[..., i], sigma=sigma)
    data_4d_smoothed[..., i] = smoothed_vol
The gaussian_filter should take every volume (the last dimension of the 4D array), do the operation, and save it into data_4d_smoothed. The loop above works, but is there a way to do this without looping over the volumes?
I think this should work without looping:
import numpy as np
from scipy.ndimage import gaussian_filter

sigma = 0.7
data_4d = np.random.rand(80, 80, 44, 50)
data_4d_smoothed = gaussian_filter(data_4d, sigma=(sigma, sigma, sigma, 0))
Basically, set the last dimension's sigma to 0 so that no convolution is done along that axis.
Checking:
data_4d_0 = gaussian_filter(data_4d[..., 0], sigma=sigma)  # filter first volume directly
np.allclose(data_4d_0, data_4d_smoothed[..., 0])  # first volume from the global filter
True
Fairly new to numpy/python here, trying to figure out some less c-like, more numpy-like coding styles.
Background
I've got some code that takes a fixed set of x values and multiple sets of corresponding y values, and tries to find which set of y values is the "most linear".
It does this by looping through each set of y values, calculating and storing the residual from a straight-line fit of those y's against the x's, and then, once the loop has finished, finding the index of the minimum residual value.
...sorry this might make a bit more sense with the code below.
import numpy as np
import numpy.polynomial.polynomial as poly
# set of x values
xs = [1,22,33,54]
# multiple sets of y values for each of the x values in 'xs'
ys = np.array([[ 1, 22,  3,  4],
               [ 2,  3,  1,  5],
               [ 3,  2,  1,  1],
               [34, 23,  5,  4],
               [23, 24, 29, 33],
               [ 5, 19, 12,  3]])
# array to store the residual from a linear fit of each of the y's against x
residuals = np.empty(ys.shape[0])
# loop through the ys and calculate the residual of a linear fit for each
for i in range(ys.shape[0]):
    _, stats = poly.polyfit(xs, ys[i], 1, full=True)
    residuals[i] = stats[0][0]
# the 'most linear' of the ys is at np.argmin:
print('most linear at', np.argmin(residuals))
Question
I'd like to know if it's possible to "numpy'ize" that into a single expression, something like
residuals = get_residuals(xs, ys)
...I've tried:
I've tried the following, but no luck (it always passes the full arrays in, not row by row):
# ------ ok try to do it without a loop --------
def wrap(x, y):
    _, stats = poly.polyfit(x, y, 1, full=True)
    return stats[0][0]

res = wrap(xs, ys)  # <- fails as it passes ys as a full 2D array
res = wrap(np.broadcast_to(xs, ys.shape), ys)  # <- fails as it passes both as 2D arrays
Could anyone give any tips on how to numpy'ize that?
From the numpy.polynomial.polynomial.polyfit docs (not to be confused with numpy.polyfit, which is not interchangeable):
x : array_like, shape (M,)
y : array_like, shape (M,) or (M, K)
Your ys needs to be transposed so that ys.T.shape[0] equals len(xs):
def wrap(x, y):
    _, stats = poly.polyfit(x, y.T, 1, full=True)
    return stats[0]

res = wrap(xs, ys)
res
Out[]: array([284.57337884, 5.54709898, 0.41399317, 91.44641638,
6.34982935, 153.03515358])
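With the residuals computed in one call, the index of the "most linear" row follows exactly as in the loop version:

print('most linear at', np.argmin(res))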
I have 3 tensorflow arrays (a, b, valid_entries) which share their first two dimensions, [T, N, ?]. One of these arrays, valid_entries, has shape [T, N, 1] with boolean values. I want to randomly sample T*M 2-tuples of indices (M < N) such that valid_entries[t, m] == 1 for all of these indices.
In other words, for each time step, I want to randomly select M valid entries from a and b.
I presume that in numpy, this task would be solved by doing the following (let's skip the first dimension T for simplicity):
M = 3
N = 5
valid_entries = np.array([[0], [1], [0], [1], [0]])
valid_indices = np.where(valid_entries[:, 0] == 1)[0]
valid_indices = np.random.choice(valid_indices, min(len(valid_indices), M), replace=False)
a_new = a[valid_indices]
b_new = b[valid_indices]
valid_new = valid_entries[valid_indices]
However, all this needs to happen in Tensorflow.
Thanks a ton in advance for any help!
Here is a function that does that:
import tensorflow as tf

def sample_indices(valid, m, seed=None):
    valid = tf.convert_to_tensor(valid)
    n = tf.size(valid)
    # Flatten boolean tensor
    valid_flat = tf.reshape(valid, [n])
    # Get flat indices where the tensor is true
    valid_idx = tf.boolean_mask(tf.range(n), valid_flat)
    # Shuffle the valid indices
    valid_idx_shuffled = tf.random.shuffle(valid_idx, seed=seed)
    # Pick the sample from the shuffled indices
    valid_idx_sample = valid_idx_shuffled[:m]
    # Unravel flat indices into multi-dimensional coordinates
    return tf.transpose(tf.unravel_index(valid_idx_sample, tf.shape(valid)))
with tf.Graph().as_default(), tf.Session() as sess:
    valid = [[ True,  True, False,  True],
             [False,  True,  True, False],
             [False,  True, False, False]]
    m = 4
    print(sess.run(sample_indices(valid, m, seed=0)))
    # [[1 1]
    #  [1 2]
    #  [0 1]
    #  [2 1]]
This sample_indices is generic for any shape of boolean tensor. If in your case valid_entries has shape (T, N, 1) then you will get a tensor with shape (M, 3) as output, although you can ignore the last column since it is always going to be zero (or you can pass tf.squeeze(valid_entries, axis=2) instead).
Note: The last tf.transpose is just to have as output a tensor with shape (sample_size, num_dimensions) instead of the other way around. However, if m is rather big and you don't mind the order of the dimensions, you may skip it to save a bit of time and memory, since (unlike its NumPy counterpart) tf.transpose produces a whole new tensor.
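For example, applied to the (T, N, 1) case from the question (assuming valid_entries is already a boolean tensor of that shape):

idx = sample_indices(tf.squeeze(valid_entries, axis=2), m)  # shape (m, 2): (t, n) index pairs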
I would like to pad my labels so that they would be of equal length to be passed into the ctc_loss function. Apparently, -1 is not allowed. If I were to apply padding, should the padding value be part of the labels for ctc?
Update
I have this code that converts dense labels into sparse ones to be passed to the ctc_loss function, which I think is related to the problem:
def dense_to_sparse(dense_tensor, out_type):
    indices = tf.where(tf.not_equal(dense_tensor, tf.constant(0, dense_tensor.dtype)))
    values = tf.gather_nd(dense_tensor, indices)
    shape = tf.shape(dense_tensor, out_type=out_type)
    return tf.SparseTensor(indices, values, shape)
Actually, -1 values are allowed to be present in the y_true argument of ctc_batch_cost, with one limitation: they must not appear within the actual label "content", which is specified by label_length (the i-th label's "content" occupies indices 0 through label_length[i] - 1).
So it is perfectly fine to pad labels with -1 so that they are of equal length, as you intended. The only thing you need to take care of is to calculate and pass the corresponding label_length values correctly.
Here is sample code, a modified version of the test_ctc unit test from Keras:
import numpy as np
from tensorflow.keras import backend as K
number_of_categories = 4
number_of_timesteps = 5
labels = np.asarray([[0, 1, 2, 1, 0], [0, 1, 1, 0, -1]])
label_lens = np.expand_dims(np.asarray([5, 4]), 1)
# dimensions are batch x time x categories
inputs = np.zeros((2, number_of_timesteps, number_of_categories), dtype=np.float32)
input_lens = np.expand_dims(np.asarray([5, 5]), 1)
k_labels = K.variable(labels, dtype="int32")
k_inputs = K.variable(inputs, dtype="float32")
k_input_lens = K.variable(input_lens, dtype="int32")
k_label_lens = K.variable(label_lens, dtype="int32")
res = K.eval(K.ctc_batch_cost(k_labels, k_inputs, k_input_lens, k_label_lens))
It runs perfectly fine even with -1 as the last element of the second labels sequence, because the corresponding (second) label_lens entry specifies that its length is 4.
If we change it to 5, or if we change some other in-range label value to -1, we get the All labels must be nonnegative integers exception that you've mentioned. But that just means our label_lens is invalid.
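For instance, reusing the variables from the sample code above, here is a sketch of the failing case just described (not part of the original test):

# Claiming the second label has length 5 puts the -1 inside the label
# "content", which triggers the exception:
k_label_lens = K.variable(np.expand_dims(np.asarray([5, 5]), 1), dtype="int32")
K.eval(K.ctc_batch_cost(k_labels, k_inputs, k_input_lens, k_label_lens))
# -> All labels must be nonnegative integers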
Here's how I do it. I have a dense tensor labels that includes padding with -1, so that all targets in a batch have the same length. Then I use
labels_sparse = dense_to_sparse(labels, sparse_val=-1)
where
import tensorflow as tf

def dense_to_sparse(dense_tensor, sparse_val=0):
    """Inverse of tf.sparse_to_dense.

    Parameters:
        dense_tensor: The dense tensor. Duh.
        sparse_val: The value to "ignore": Occurrences of this value in the
            dense tensor will not be represented in the sparse tensor.
            NOTE: When/if later restoring this to a dense tensor, you
            will probably want to choose this as the default value.

    Returns:
        SparseTensor equivalent to the dense input.
    """
    with tf.name_scope("dense_to_sparse"):
        sparse_inds = tf.where(tf.not_equal(dense_tensor, sparse_val),
                               name="sparse_inds")
        sparse_vals = tf.gather_nd(dense_tensor, sparse_inds,
                                   name="sparse_vals")
        dense_shape = tf.shape(dense_tensor, name="dense_shape",
                               out_type=tf.int64)
        return tf.SparseTensor(sparse_inds, sparse_vals, dense_shape)
This creates a sparse tensor of the labels, which is what you need to pass to the CTC loss; that is, you call tf.nn.ctc_loss(labels=labels_sparse, ...). The padding (i.e. all values equal to -1 in the dense tensor) is simply not represented in the sparse tensor.
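Putting it together, a minimal sketch of the call (logits and sequence_length are assumed placeholder names for your network outputs and per-example input lengths, not names from the question):

labels_sparse = dense_to_sparse(labels, sparse_val=-1)
# logits: float tensor of shape (max_time, batch_size, num_classes) -- assumed
# sequence_length: int32 vector of per-example input lengths -- assumed
loss = tf.nn.ctc_loss(labels=labels_sparse, inputs=logits,
                      sequence_length=sequence_length)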
I was trying to concatenate a 3-by-n 3D coordinate matrix called VTrans with a 1-by-n all-ones vector called lr, to augment the coordinate matrix into a 4-by-n homogeneous matrix. n in my case is the vertex count, 141669, which is pretty big.
The code below is not working, although it does work on a very small dataset.
lr = np.ones(vertexNum).reshape((1, vertexNum))
VtransAppend = np.concatenate((VTrans, lr), axis=0)
Update 2:
Just found the problem: my vertexNum is wrong! It is actually 47223 instead of 141669; 141669 is the total size of the array (3 × 47223), which is what VTrans.size returns. All solutions work and I will accept the first one. Thank you all!
The error says "all the input array dimensions except for the concatenation axis must match exactly"
I further verified that lr and VTrans have the same size by printing:
print(lr.size)
print(VTrans.size)
Has anyone had this weird problem before, and do you know how to solve it?
Here is the update:
My VTrans matrix is attached, where vertexNum is 141669.
This is the code following YXD's suggestion, but the issue still exists:
vertexNum = VTrans.size # Total vertex in current model
lr = np.ones(vertexNum)
VtransAppend = np.concatenate((VTrans, lr.reshape(1, -1)), axis=0)
You have to fiddle with lr so it has the same number of dimensions as vTrans:
>>> n = 4
>>> vTrans = np.random.random_sample((3, n))
>>> lr = np.ones(n)
>>> np.concatenate((vTrans, lr.reshape(1, -1)), axis=0)
array([[ 0.65769116, 0.41008341, 0.66046706, 0.86501781],
[ 0.51584699, 0.60601466, 0.93800371, 0.25077702],
[ 0.16696658, 0.41839794, 0.0938594 , 0.48484606],
[ 1. , 1. , 1. , 1. ]])
>>>
i.e. after the reshape, the non-concatenation dimension matches that of vTrans:
>>> lr.shape
(4,)
>>> lr.reshape(1, -1).shape
(1, 4)
>>>
Try vstack instead of concatenate:
a = np.random.random((3,5))
b = np.random.random(5)
np.vstack((a, b))
Alternatively:
np.concatenate((a, b[None, :]))
The None adds an axis to the 1D array b.
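For instance, with a and b as above:

>>> b.shape
(5,)
>>> b[None, :].shape
(1, 5)
>>> np.vstack((a, b)).shape
(4, 5)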