Alternative implementation of sparse convolution in TensorFlow

I have a special convolution kernel: 1) it has a big size (600x600); 2) it is a sparse filter and consists of mostly 0 values and some 1s. I want to apply this kernel to another big image (2000x2000). Since the filter only has 0s and 1s, the convolution operation in this special case is equivalent to the following steps:
1) Compute the coordinates of the 1s relative to the centre point of the convolution filter; 2) Translate the image by each of the relative coordinates; 3) Sum the resulting translations together.
Assume there are in total n 1s, then the above will result in n translations. We can't have n images present in the memory, since the space needed will be nx2000x2000 and that will result in OOM. I tried using a while loop to save memory space and the code looks like the following:
def sum_conv(acc, curr):
    """
    Apply translation to implement convolution. Summation is done in while loop.
    :param acc: (batch, size, size, 1)
    :param curr: (batch, 2)
    :return:
    """
    translation = tf.reshape(curr, [batch_size, 2])
    trans_x = tf.expand_dims(translation[:, 1], axis=-1)
    trans_y = tf.expand_dims(translation[:, 0], axis=-1)
    ones, zeros = tf.ones_like(trans_x), tf.zeros_like(trans_x)
    transform = tf.concat([ones, zeros, trans_x, zeros, ones, trans_y, zeros, zeros], axis=-1)
    # Image: (batch, size, size, 1)
    conv_image = tf.contrib.image.transform(image, transform)
    return acc + conv_image

# Points: (num_points, batch, 2)
# Loop on points (i.e. positions of 1s) to get the results of convolution
i0 = tf.constant(0)
conv_image0 = tf.zeros((batch_size, size, size, 1))
c = lambda i, prev: i < num_points
b = lambda i, prev: (i + 1, sum_conv(prev, tf.gather(points, i, axis=0)))
i, conv_images = tf.while_loop(c, b, (i0, conv_image0))
I have two questions.
1) Can anyone help me come up with a simpler implementation of the above algorithm?
2) Is there any operation in TensorFlow that supports sparse convolution?
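The three steps described in the question (find the offsets of the 1s, shift the image, accumulate) can be sketched in plain NumPy. This is only an illustration, not a TensorFlow op and not an answer from the thread: the name `sparse_binary_filter` is hypothetical, zero padding at the borders and a same-size output are assumptions, and it computes a cross-correlation (flip the kernel for a strict convolution). Only the padded image and one accumulator are held in memory, since each shift is a slice view.

```python
import numpy as np

def sparse_binary_filter(image, kernel):
    """Cross-correlate a 2-D image with a 0/1 kernel by summing shifted views.

    Assumes zero padding at the borders and an output the same size as `image`.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    cy, cx = kh // 2, kw // 2  # centre point of the kernel
    # Pad once so that every shift becomes a plain slice of the same array.
    padded = np.pad(image, ((cy, kh - 1 - cy), (cx, kw - 1 - cx)), mode="constant")
    out = np.zeros((h, w), dtype=np.float64)
    # One slice (a view, not a copy) per nonzero kernel entry, summed in place.
    for ky, kx in zip(*np.nonzero(kernel)):
        out += padded[ky:ky + h, kx:kx + w]
    return out

# Small demo: three 1s in a 5x5 kernel applied to a 6x6 image.
image = np.arange(36, dtype=np.float64).reshape(6, 6)
kernel = np.zeros((5, 5))
kernel[0, 0] = kernel[2, 2] = kernel[4, 3] = 1
result = sparse_binary_filter(image, kernel)
```

The loop runs once per nonzero entry, so the cost scales with the number of 1s rather than with the 600x600 kernel area.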

Related

How do I discover the values for variables of an equation with keras/tensorflow?

I have an equation that describes a curve in two dimensions. This equation has 5 variables. How can I find their values with Keras/TensorFlow for a given set of data? Is it possible? Does anyone know a tutorial for something similar?
I generated some data to train the network that has the format:
samples => [150, 66, 2]: 150 sets of 66 points x 2 values, something like "time" x "acceleration"
targets => [150, 5]: 150 sets of 5 variable values.
Obs: I know the range of the variables. I also know that 150 sets of data are too few samples, but once the code works I need to train a new network with experimental data, which is limited too. Visually, the curve is simple: it has a descending linear part at the beginning, and at the end it drops "like an exponential".
My code is as follows:
def build_model():
    model = models.Sequential()
    model.add(layers.Dense(512, activation='relu', input_shape=(66*2,)))
    model.add(layers.Dense(5, activation='softmax'))
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['mae'])
    return model

def smooth_curve(points, factor=0.9):
    [...]
    return smoothed_points
#load the generated data
train_data = np.load('samples00.npy')
test_data = np.load('samples00.npy')
train_targets = np.load('labels00.npy')
test_targets = np.load('labels00.npy')
#normalizing the data
mean = train_data.mean()
train_data -= mean
std = train_data.std()
train_data /= std
test_data -= mean
test_data /= std
#k-fold validation:
k = 3
num_val_samples = len(train_data)//k
num_epochs = 100
all_mae_histories = []
for i in range(k):
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis=0)
    model = build_model()
    #reshape the data to get the format (100, 66*2)
    partial_train_data = partial_train_data.reshape(100, 66 * 2)
    val_data = val_data.reshape(50, 66 * 2)
    history = model.fit(partial_train_data,
                        partial_train_targets,
                        validation_data = (val_data, val_targets),
                        epochs = num_epochs,
                        batch_size = 1,
                        verbose = 1)
    mae_history = history.history['val_mean_absolute_error']
    all_mae_histories.append(mae_history)
average_mae_history = [
    np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]
smooth_mae_history = smooth_curve(average_mae_history[10:])
plt.plot(range(1, len(smooth_mae_history) + 1), smooth_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
Obviously I need the best accuracy possible, but as it is I am getting a "mean absolute error (MAE)" of around 96%, which is unacceptable.
I see some basic bugs in this methodology. The final layer of your network is a softmax layer. This means it outputs 5 values that sum to 1 and behave as a probability distribution. What you actually want to predict are real numbers, or rather floating point values (under some fixed precision arithmetic).
If you know the range, then using a sigmoid and rescaling the final layer to match the range (just multiply by the max value) would probably help. By default a sigmoid ensures you get 5 numbers between 0 and 1.
The other thing is to remove the cross-entropy loss and use a regression loss such as RMS (root mean square) error or mean squared error, so that you predict your numbers well. You could also use 1D convolutions instead of fully connected layers.
There has been some work here: https://julialang.org/blog/2017/10/gsoc-NeuralNetDiffEq which tries to solve DEs and might be relevant to your work.
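The sigmoid-plus-rescaling idea from this answer can be illustrated with a small NumPy sketch. It is not Keras code: `scale_to_range` and `mse` are hypothetical helper names, the `low`/`high` ranges are made up for the example, and using a lower bound in addition to the maximum is a slight generalisation of "multiply by the max value".

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def scale_to_range(raw, low, high):
    """Map unbounded network outputs to the interval [low, high], per variable."""
    return low + (high - low) * sigmoid(raw)

def mse(pred, target):
    """Mean squared error, a regression loss (unlike cross-entropy)."""
    return np.mean((pred - target) ** 2)

# Hypothetical ranges for the 5 variables (not taken from the question).
low = np.array([0.0, 0.0, -1.0, 10.0, 0.5])
high = np.array([1.0, 5.0, 1.0, 20.0, 2.0])

raw = np.zeros((3, 5))                 # stand-in for the last layer's pre-activations
pred = scale_to_range(raw, low, high)  # sigmoid(0) = 0.5, so every value is the midpoint
loss = mse(pred, (low + high) / 2)
```

In a Keras model the same effect would presumably come from a sigmoid activation on the 5-unit output layer, targets normalised to [0, 1] (or a rescaling step after the layer), and a mean-squared-error loss.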

Implementing backpropagation gradient descent using scipy.optimize.minimize

I am trying to train an autoencoder NN (3 layers - 2 visible, 1 hidden) using numpy and scipy for the MNIST digits images dataset. The implementation is based on the notation given here. Below is my code:
def autoencoder_cost_and_grad(theta, visible_size, hidden_size, lambda_, data):
    """
    The input theta is a 1-dimensional array because scipy.optimize.minimize expects
    the parameters being optimized to be a 1d array.
    First convert theta from a 1d array to the (W1, W2, b1, b2)
    matrix/vector format, so that this follows the notation convention of the
    lecture notes and tutorial.
    You must compute the:
        cost : scalar representing the overall cost J(theta)
        grad : array representing the corresponding gradient of each element of theta
    """
    training_size = data.shape[1]
    # unroll theta to get (W1,W2,b1,b2) #
    W1 = theta[0:hidden_size*visible_size]
    W1 = W1.reshape(hidden_size,visible_size)
    W2 = theta[hidden_size*visible_size:2*hidden_size*visible_size]
    W2 = W2.reshape(visible_size,hidden_size)
    b1 = theta[2*hidden_size*visible_size:2*hidden_size*visible_size + hidden_size]
    b2 = theta[2*hidden_size*visible_size + hidden_size: 2*hidden_size*visible_size + hidden_size + visible_size]
    #feedforward pass
    a_l1 = data
    z_l2 = W1.dot(a_l1) + numpy.tile(b1,(training_size,1)).T
    a_l2 = sigmoid(z_l2)
    z_l3 = W2.dot(a_l2) + numpy.tile(b2,(training_size,1)).T
    a_l3 = sigmoid(z_l3)
    #backprop
    delta_l3 = numpy.multiply(-(data-a_l3),numpy.multiply(a_l3,1-a_l3))
    delta_l2 = numpy.multiply(W2.T.dot(delta_l3),
                              numpy.multiply(a_l2, 1 - a_l2))
    b2_derivative = numpy.sum(delta_l3,axis=1)/training_size
    b1_derivative = numpy.sum(delta_l2,axis=1)/training_size
    W2_derivative = numpy.dot(delta_l3,a_l2.T)/training_size + lambda_*W2
    #print(W2_derivative.shape)
    W1_derivative = numpy.dot(delta_l2,a_l1.T)/training_size + lambda_*W1
    W1_derivative = W1_derivative.reshape(hidden_size*visible_size)
    W2_derivative = W2_derivative.reshape(visible_size*hidden_size)
    b1_derivative = b1_derivative.reshape(hidden_size)
    b2_derivative = b2_derivative.reshape(visible_size)
    grad = numpy.concatenate((W1_derivative,W2_derivative,b1_derivative,b2_derivative))
    cost = 0.5*numpy.sum((data-a_l3)**2)/training_size + 0.5*lambda_*(numpy.sum(W1**2) + numpy.sum(W2**2))
    return cost,grad
I have also implemented a function to estimate the numerical gradient and verify the correctness of my implementation (below).
def compute_gradient_numerical_estimate(J, theta, epsilon=0.0001):
    """
    :param J: a loss (cost) function that computes the real-valued loss given parameters and data
    :param theta: array of parameters
    :param epsilon: amount to vary each parameter in order to estimate
                    the gradient by numerical difference
    :return: array of numerical gradient estimate
    """
    gradient = numpy.zeros(theta.shape)
    eps_vector = numpy.zeros(theta.shape)
    for i in range(0,theta.size):
        eps_vector[i] = epsilon
        cost1,grad1 = J(theta+eps_vector)
        cost2,grad2 = J(theta-eps_vector)
        gradient[i] = (cost1 - cost2)/(2*epsilon)
        eps_vector[i] = 0
    return gradient
The norm of the difference between the numerical estimate and the one computed by the function is around 6.87165125021e-09 which seems to be acceptable. My main problem seems to be to get the gradient descent algorithm "L-BFGS-B" working using the scipy.optimize.minimize function as below:
# theta is the 1-D array of(W1,W2,b1,b2)
J = lambda x: utils.autoencoder_cost_and_grad(theta, visible_size, hidden_size, lambda_, patches_train)
options_ = {'maxiter': 4000, 'disp': False}
result = scipy.optimize.minimize(J, theta, method='L-BFGS-B', jac=True, options=options_)
I get the below output from this:
scipy.optimize.minimize() details:
fun: 90.802022224079778
hess_inv: <16474x16474 LbfgsInvHessProduct with dtype=float64>
jac: array([ -6.83667742e-06, -2.74886002e-06, -3.23531941e-06, ...,
1.22425735e-01, 1.23425062e-01, 1.28091250e-01])
message: b'ABNORMAL_TERMINATION_IN_LNSRCH'
nfev: 21
nit: 0
status: 2
success: False
x: array([-0.06836677, -0.0274886 , -0.03235319, ..., 0. ,
0. , 0. ])
Now, this post seems to indicate that the error could mean that the gradient function implementation could be wrong? But my numerical gradient estimate seems to confirm that my implementation is correct. I have tried varying the initial weights by using a uniform distribution as specified here but the problem still persists. Is there anything wrong with my backprop implementation?
Turns out the issue was a (very silly) mistake in this line:
J = lambda x: utils.autoencoder_cost_and_grad(theta, visible_size, hidden_size, lambda_, patches_train)
The lambda takes a parameter x, but I never use it in the function call. So the fixed initial theta was evaluated every time J was invoked, and the parameters proposed by the optimizer were ignored.
This fixed it:
J = lambda x: utils.autoencoder_cost_and_grad(x, visible_size, hidden_size, lambda_, patches_train)
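Why the broken lambda stalls the line search can be seen with a toy cost function. This is a minimal sketch with a hypothetical quadratic `cost_and_grad` standing in for the autoencoder cost; it does not call scipy, it only shows what the optimizer would observe when it probes a new point.

```python
import numpy as np

def cost_and_grad(theta, data):
    """Toy stand-in for the autoencoder cost: returns (cost, gradient)."""
    residual = theta - data
    return 0.5 * np.sum(residual ** 2), residual

theta0 = np.zeros(3)
data = np.array([1.0, 2.0, 3.0])

J_broken = lambda x: cost_and_grad(theta0, data)  # ignores x, always evaluates theta0
J_fixed = lambda x: cost_and_grad(x, data)        # forwards the optimizer's candidate

probe = np.ones(3)  # a point the optimizer might try during its line search
broken_costs = (J_broken(theta0)[0], J_broken(probe)[0])  # identical values
fixed_costs = (J_fixed(theta0)[0], J_fixed(probe)[0])     # cost actually changes
```

With `J_broken` the cost never changes no matter which point is tried, which is consistent with the `ABNORMAL_TERMINATION_IN_LNSRCH` message and `nit: 0` in the output above.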

Tensorflow symmetric matrix

I want to create a symmetric matrix of n*n and train this matrix in TensorFlow. Effectively I should only train (n+1)*n/2 parameters. How should I do this?
I saw some previous threads which suggest doing the following:
X = tf.Variable(tf.random_uniform([d,d], minval=-.1, maxval=.1, dtype=tf.float64))
X_symm = 0.5 * (X + tf.transpose(X))
However, this means I have to train n*n variables, not n*(n+1)/2 variables.
Even if there is no built-in function to achieve this, a patch of self-written code would help!
Thanks!
You can use tf.matrix_band_part(input, 0, -1) to create an upper triangular matrix from a square one, so this code would allow you to train on n(n+1)/2 variables although it has you create n*n:
X = tf.Variable(tf.random_uniform([d,d], minval=-.1, maxval=.1, dtype=tf.float64))
X_upper = tf.matrix_band_part(X, 0, -1)
X_symm = 0.5 * (X_upper + tf.transpose(X_upper))
Regarding gdelab's answer: in TensorFlow 2.x, you have to use the following code instead.
X_upper = tf.linalg.band_part(X, 0, -1)
gdelab's answer is correct and will work, since a neural network can adjust the 0.5 factor by itself. I aimed for a solution, where the neural network actually only has (n+1)*n/2 output neurons. The following function transforms these into a symmetric matrix:
def create_symmetric_matrix(x,n):
    x_rev = tf.reverse(x[:, n:], [1])
    xc = tf.concat([x, x_rev], axis=1)
    x_res = tf.reshape(xc, [-1, n, n])
    x_upper_triangular = tf.linalg.band_part(x_res, 0, -1)
    x_lower_triangular = tf.linalg.set_diag( tf.transpose(x_upper_triangular, perm=[0, 2, 1]), tf.zeros([tf.shape(x)[0], n], dtype=tf.float32))
    return x_upper_triangular + x_lower_triangular
Here x is a tensor of shape [batch, n*(n+1)/2] and n is the size of the n x n output matrix.
The code is inspired by tfp.math.fill_triangular.
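The underlying idea, keeping only n*(n+1)/2 free parameters and expanding them into a symmetric matrix, can be sketched in NumPy for a single (unbatched) vector. `symmetric_from_vector` is a hypothetical helper, and the row-major ordering of the upper triangle used here is an assumption that differs from the ordering produced by the TensorFlow function above.

```python
import numpy as np

def symmetric_from_vector(v, n):
    """Expand n*(n+1)/2 free parameters into an n x n symmetric matrix."""
    upper = np.zeros((n, n), dtype=v.dtype)
    upper[np.triu_indices(n)] = v          # fill the upper triangle row by row
    # Mirror to the lower triangle; subtract the diagonal so it is not doubled.
    return upper + upper.T - np.diag(np.diag(upper))

n = 3
params = np.arange(1.0, 7.0)               # 6 = n*(n+1)/2 free parameters
S = symmetric_from_vector(params, n)
```

Each off-diagonal parameter appears twice in `S` and each diagonal parameter once, so a gradient with respect to `params` only ever touches the n*(n+1)/2 free values.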

Using numpy roll in Keras

I'm trying to make a custom regularizer in Keras and I need to be able to roll the coefficient array.
I know this may be impossible; however, any mechanism that can replicate this roll function would be greatly appreciated.
```
def __call__(self, x):
    regularization = 0.
    # Add components if they are given
    if self.l1:
        # \lambda ||x||
        regularization += self.l1 * K.sum(K.abs(x))
    if self.fuse:
        # \lambda \sum{ |x - x_+1| }
        regularization += self.fuse * K.sum(K.abs(x - np.roll(x, 1)))
    if self.abs_fuse:
        # \lambda \sum{ ||x| - |x_+1|| }
        regularization += self.abs_fuse * K.sum(K.abs(K.abs(x) - K.abs(np.roll(x, 1))))
```
Given that x is of shape (m, 1), a possible solution is to use tile:
def roll_reg(x):
    length = K.int_shape(x)[0]
    x_tile = K.tile(x, [2, 1])
    x_roll = x_tile[length - 1:-1]
    return K.sum(K.abs(x - x_roll))
It will result in some extra memory usage, but if x is a 1-dim vector, I guess the overhead won't be too bad.
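That the tile-and-slice trick really reproduces `np.roll(x, 1)` can be checked in NumPy on a small column vector. The sketch also shows a second variant, moving the last row to the front with a concatenation, which avoids the doubled copy; whether that carries over unchanged to the Keras backend (for example via `K.concatenate`) is an assumption, not something tested here.

```python
import numpy as np

x = np.arange(5.0).reshape(5, 1)   # column vector of shape (m, 1)
length = x.shape[0]

# Tile trick from the answer: stack two copies and cut out a shifted window.
x_tile = np.tile(x, (2, 1))
x_roll_tile = x_tile[length - 1:-1]

# Alternative without the doubled copy: move the last row to the front.
x_roll_concat = np.concatenate([x[-1:], x[:-1]], axis=0)

# Fused penalty: sum of |x - roll(x, 1)|.
fused_penalty = np.sum(np.abs(x - x_roll_concat))
```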

Visualizing output of convolutional layer in tensorflow

I'm trying to visualize the output of a convolutional layer in tensorflow using the function tf.image_summary. I'm already using it successfully in other instances (e. g. visualizing the input image), but have some difficulties reshaping the output here correctly. I have the following conv layer:
img_size = 256
x_image = tf.reshape(x, [-1,img_size, img_size,1], "sketch_image")
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
So the output of h_conv1 would have the shape [-1, img_size, img_size, 32]. Just using tf.image_summary("first_conv", tf.reshape(h_conv1, [-1, img_size, img_size, 1])) doesn't account for the 32 different kernels, so I'm basically slicing through different feature maps here.
How can I reshape them correctly? Or is there another helper function I could use for including this output in the summary?
I don't know of a helper function but if you want to see all the filters you can pack them into one image with some fancy uses of tf.transpose.
So if you have a tensor that's images x ix x iy x channels
>>> V = tf.Variable()
>>> print V.get_shape()
TensorShape([Dimension(-1), Dimension(256), Dimension(256), Dimension(32)])
So in this example ix = 256, iy=256, channels=32
First slice off one image and remove the image dimension:
V = tf.slice(V,(0,0,0,0),(1,-1,-1,-1)) #V[0,...]
V = tf.reshape(V,(iy,ix,channels))
Next add a couple of pixels of zero padding around the image
ix += 4
iy += 4
V = tf.image.resize_image_with_crop_or_pad(V, iy, ix)
Then reshape so that instead of 32 channels you have 4x8 channels, lets call them cy=4 and cx=8.
V = tf.reshape(V,(iy,ix,cy,cx))
Now the tricky part. tf seems to return results in C-order, numpy's default.
The current order, if flattened, would list all the channels for the first pixel (iterating over cx and cy), before listing the channels of the second pixel (incrementing ix). Going across the rows of pixels (ix) before incrementing to the next row (iy).
We want the order that would lay out the images in a grid.
So you go across a row of an image (ix), before stepping along the row of channels (cx). When you hit the end of the row of channels you step to the next row in the image (iy), and when you run out of rows in the image you increment to the next row of channels (cy). So:
V = tf.transpose(V,(2,0,3,1)) #cy,iy,cx,ix
Personally I prefer np.einsum for fancy transposes, for readability, but it's not in tf yet.
newtensor = np.einsum('yxYX->YyXx',oldtensor)
anyway, now that the pixels are in the right order, we can safely flatten it into a 2d tensor:
# image_summary needs 4d input
V = tf.reshape(V,(1,cy*iy,cx*ix,1))
try tf.image_summary on that, you should get a grid of little images.
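The reshape/transpose/reshape sequence above can be checked in NumPy on tiny feature maps. This sketch leaves out the zero-padding step, and the sizes are arbitrary example values; it only confirms that the tile at grid row r, column c is channel r * cx + c.

```python
import numpy as np

iy, ix, cy, cx = 3, 2, 4, 8            # tiny feature maps, 32 = 4 x 8 channels
channels = cy * cx
v = np.random.rand(iy, ix, channels)

grid = v.reshape(iy, ix, cy, cx)       # split the channels into a cy x cx grid
grid = grid.transpose(2, 0, 3, 1)      # reorder to cy, iy, cx, ix
grid = grid.reshape(cy * iy, cx * ix)  # flatten into one 2-D image

# Cut out one tile and compare it with the matching channel.
r, c = 2, 5
tile = grid[r * iy:(r + 1) * iy, c * ix:(c + 1) * ix]
```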
Below is an image of what one gets after following all the steps here.
In case someone would like to "jump" to numpy and visualize "there" here is an example how to display both Weights and processing result. All transformations are based on prev answer by mdaoust.
# to visualize 1st conv layer Weights
vv1 = sess.run(W_conv1)
# to visualize 1st conv layer output
vv2 = sess.run(h_conv1,feed_dict = {img_ph:x, keep_prob: 1.0})
vv2 = vv2[0,:,:,:] # in case of bunch out - slice first img
def vis_conv(v,ix,iy,ch,cy,cx, p = 0) :
    v = np.reshape(v,(iy,ix,ch))
    ix += 2
    iy += 2
    npad = ((1,1), (1,1), (0,0))
    v = np.pad(v, pad_width=npad, mode='constant', constant_values=p)
    v = np.reshape(v,(iy,ix,cy,cx))
    v = np.transpose(v,(2,0,3,1)) #cy,iy,cx,ix
    v = np.reshape(v,(cy*iy,cx*ix))
    return v
# W_conv1 - weights
ix = 5 # data size
iy = 5
ch = 32
cy = 4 # grid from channels: 32 = 4x8
cx = 8
v = vis_conv(vv1,ix,iy,ch,cy,cx)
plt.figure(figsize = (8,8))
plt.imshow(v,cmap="Greys_r",interpolation='nearest')
# h_conv1 - processed image
ix = 30 # data size
iy = 30
v = vis_conv(vv2,ix,iy,ch,cy,cx)
plt.figure(figsize = (8,8))
plt.imshow(v,cmap="Greys_r",interpolation='nearest')
You may try to get the convolution layer activation image this way:
h_conv1_features = tf.unpack(h_conv1, axis=3)
h_conv1_imgs = tf.expand_dims(tf.concat(1, h_conv1_features), -1)
This gives one vertical stripe with all images concatenated vertically.
If you want them padded (in my case, for ReLU activations, with a white line):
h_conv1_features = tf.unpack(h_conv1, axis=3)
h_conv1_max = tf.reduce_max(h_conv1)
h_conv1_features_padded = map(lambda t: tf.pad(t-h_conv1_max, [[0,0],[0,1],[0,0]])+h_conv1_max, h_conv1_features)
h_conv1_imgs = tf.expand_dims(tf.concat(1, h_conv1_features_padded), -1)
I personally try to tile every 2D filter in a single image.
To do this (if I'm not terribly mistaken, since I'm quite new to DL) I found that it could be helpful to exploit the depth_to_space function, since it takes a 4D tensor
[batch, height, width, depth]
and produces an output of shape
[batch, height*block_size, width*block_size, depth/(block_size*block_size)]
where block_size is the number of "tiles" along each side of the output image. The only limitation is that the depth should be the square of block_size (an integer); otherwise it cannot "fill" the resulting image correctly.
A possible solution could be to pad the depth of the input tensor up to a depth that is accepted by the method, but I still haven't tried this.
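Before relying on this, it may help to see how depth_to_space rearranges values. The NumPy function below is a sketch of my reading of the documented NHWC behaviour of `tf.nn.depth_to_space` (worth confirming against the TensorFlow docs); `depth_to_space_nhwc` is a hypothetical name. Under that reading, each input pixel's depth values become one block_size x block_size patch, so the feature maps end up interleaved pixel by pixel rather than placed side by side as whole tiles.

```python
import numpy as np

def depth_to_space_nhwc(x, block_size):
    """NumPy sketch of depth_to_space for a [batch, height, width, depth] array."""
    b, h, w, d = x.shape
    c = d // (block_size * block_size)
    x = x.reshape(b, h, w, block_size, block_size, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)   # b, h, block_row, w, block_col, c
    return x.reshape(b, h * block_size, w * block_size, c)

# Four 2x2 "feature maps"; map k is filled with the constant k.
maps = np.stack([np.full((2, 2), k) for k in range(4)], axis=-1)[None]
out = depth_to_space_nhwc(maps, block_size=2)[0, :, :, 0]
# Rows of `out` alternate between 0 1 0 1 and 2 3 2 3: the maps are interleaved.
```

If whole feature maps are wanted next to each other, a reshape/transpose layout like the one in the first answer is probably the better fit.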
Another way, which I think is very easy, is using the get_operation_by_name function. I had a hard time visualizing the layers with other methods but this helped me.
#first, find out the operations, many of those are micro-operations such as add etc.
graph = tf.get_default_graph()
graph.get_operations()
#choose relevant operations
op_name = '...'
op = graph.get_operation_by_name(op_name)
out = sess.run([op.outputs[0]], feed_dict={x: img_batch, is_training: False})
#img_batch is a single image whose dimensions are (1,n,n,1).
# out is the output of the layer, do whatever you want with the output
#in my case, I wanted to see the output of a convolution layer
out2 = np.array(out)
print(out2.shape)
# determine, row, col, and fig size etc.
for each_depth in range(out2.shape[4]):
    fig.add_subplot(rows, cols, each_depth+1)
    plt.imshow(out2[0,0,:,:,each_depth], cmap='gray')
For example below is the input(colored cat) and output of the second conv layer in my model.
Note that I am aware this question is old and there are easier methods with Keras but for people who use an old model from other people (such as me), this may be useful.