Tensorflow symmetric matrix - tensorflow

I want to create a symmetric matrix of n*n and train this matrix in TensorFlow. Effectively I should only train (n+1)*n/2 parameters. How should I do this?
I saw some previous threads which suggest do the following:
X = tf.Variable(tf.random_uniform([d,d], minval=-.1, maxval=.1, dtype=tf.float64))
X_symm = 0.5 * (X + tf.transpose(X))
However, this means I have to train n*n variables, not n*(n+1)/2 variables.
Even there is no function to achieve this, a patch of self-written code would help!
Thanks!

You can use tf.matrix_band_part(input, 0, -1) to create an upper triangular matrix from a square one, so this code would allow you to train on n(n+1)/2 variables although it has you create n*n:
X = tf.Variable(tf.random_uniform([d,d], minval=-.1, maxval=.1, dtype=tf.float64))
X_upper = tf.matrix_band_part(X, 0, -1)
X_symm = 0.5 * (X_upper + tf.transpose(X_upper))

Referring to answer of gdelab: in Tensorflow 2.x, you have to use following code.
X_upper = tf.linalg.band_part(X, 0, -1)

gdelab's answer is correct and will work, since a neural network can adjust the 0.5 factor by itself. I aimed for a solution, where the neural network actually only has (n+1)*n/2 output neurons. The following function transforms these into a symmetric matrix:
def create_symmetric_matrix(x,n):
x_rev = tf.reverse(x[:, n:], [1])
xc = tf.concat([x, x_rev], axis=1)
x_res = tf.reshape(xc, [-1, n, n])
x_upper_triangular = tf.linalg.band_part(x_res, 0, -1)
x_lower_triangular = tf.linalg.set_diag( tf.transpose(x_upper_triangular, perm=[0, 2, 1]), tf.zeros([tf.shape(x)[0], n], dtype=tf.float32))
return x_upper_triangular + x_lower_triangular
with x as a vector of rank [batch,n*(n+1)/2] and n as the rank of the output matrix.
The code is inspired by tfp.math.fill_triangular.

Related

How to implement custom Keras ordinal loss function with tensor evaluation without disturbing TF>2.0 Model Graph?

I am trying to implement a custom loss function in Tensorflow 2.4 using the Keras backend.
The loss function is a ranking loss; I found the following paper with a somewhat log-likelihood loss: Chen et al. Single-Image Depth Perception in the Wild.
Similarly, I wanted to sample some (in this case 50) points from an image to compare the relative order between ground-truth and predicted depth maps using the NYU-Depth dataset. Being a fan of Numpy, I started working with that but came to the following exception:
ValueError: No gradients provided for any variable: [...]
I have learned that this is caused by the arguments not being filled when calling the loss function but instead, a C function is compiled which is then used later. So while I know the dimensions of my tensors (4, 480, 640, 1), I cannot work with the data as wanted and have to use the keras.backend functions on top so that in the end (if I understood correctly), there is supposed to be a path between the input tensors from the TF graph and the output tensor, which has to provide a gradient.
So my question now is: Is this a feasible loss function within keras?
I have already tried a few ideas and different approaches with different variations of my original code, which was something like:
def ranking_loss_function(y_true, y_pred):
# Chen et al. loss
y_true_np = K.eval(y_true)
y_pred_np = K.eval(y_pred)
if y_true_np.shape[0] != None:
num_sample_points = 50
total_samples = num_sample_points ** 2
err_list = [0 for x in range(y_true_np.shape[0])]
for i in range(y_true_np.shape[0]):
sample_points = create_random_samples(y_true, y_pred, num_sample_points)
for x1, y1 in sample_points:
for x2, y2 in sample_points:
if y_true[i][x1][y1] > y_true[i][x2][y2]:
#image_relation_true = 1
err_list[i] += np.log(1 + np.exp(-1 * y_pred[i][x1][y1] + y_pred[i][x2][y2]))
elif y_true[i][x1][y1] < y_true[i][x2][y2]:
#image_relation_true = -1
err_list[i] += np.log(1 + np.exp(y_pred[i][x1][y1] - y_pred[i][x2][y2]))
else:
#image_relation_true = 0
err_list[i] += np.square(y_pred[i][x1][y1] - y_pred[i][x2][y2])
err_list = np.divide(err_list, total_samples)
return K.constant(err_list)
As you can probably tell, the main idea was to first create the sample points and then based on the existing relation between them in y_true/y_pred continue with the corresponding computation from the cited paper.
Can anyone help me and provide some more helpful information or tips on how to correctly implement this loss using keras.backend functions? Trying to include the ordinal relation information really confused me compared to standard regression losses.
EDIT: Just in case this causes confusion: create_random_samples() just creates 50 random sample points (x, y) coordinate pairs based on the shape[1] and shape[2] of y_true (image width and height)
EDIT(2): After finding this variation on GitHub, I have tried out a variation using only TF functions to retrieve data from the tensors and compute the output. The adjusted and probably more correct version still throws the same exception though:
def ranking_loss_function(y_true, y_pred):
#In the Wild ranking loss
y_true_np = K.eval(y_true)
y_pred_np = K.eval(y_pred)
if y_true_np.shape[0] != None:
num_sample_points = 50
total_samples = num_sample_points ** 2
bs = y_true_np.shape[0]
w = y_true_np.shape[1]
h = y_true_np.shape[2]
total_samples = total_samples * bs
num_pairs = tf.constant([total_samples], dtype=tf.float32)
output = tf.Variable(0.0)
for i in range(bs):
sample_points = create_random_samples(y_true, y_pred, num_sample_points)
for x1, y1 in sample_points:
for x2, y2 in sample_points:
y_true_sq = tf.squeeze(y_true)
y_pred_sq = tf.squeeze(y_pred)
d1_t = tf.slice(y_true_sq, [i, x1, y1], [1, 1, 1])
d2_t = tf.slice(y_true_sq, [i, x2, y2], [1, 1, 1])
d1_p = tf.slice(y_pred_sq, [i, x1, y1], [1, 1, 1])
d2_p = tf.slice(y_pred_sq, [i, x2, y2], [1, 1, 1])
d1_t_sq = tf.squeeze(d1_t)
d2_t_sq = tf.squeeze(d2_t)
d1_p_sq = tf.squeeze(d1_p)
d2_p_sq = tf.squeeze(d2_p)
if d1_t_sq > d2_t_sq:
# --> Image relation = 1
output.assign_add(tf.math.log(1 + tf.math.exp(-1 * d1_p_sq + d2_p_sq)))
elif d1_t_sq < d2_t_sq:
# --> Image relation = -1
output.assign_add(tf.math.log(1 + tf.math.exp(d1_p_sq - d2_p_sq)))
else:
output.assign_add(tf.math.square(d1_p_sq - d2_p_sq))
return output/num_pairs
EDIT(3): This is the code for create_random_samples():
(FYI: Because it was weird to get the shape from y_true in this case, I first proceeded to hard-code it here as I know it for the dataset which I am currently using.)
def create_random_samples(y_true, y_pred, num_points=50):
y_true_shape = (4, 480, 640, 1)
y_pred_shape = (4, 480, 640, 1)
if y_true_shape[0] != None:
num_samples = num_points
population = [(x, y) for x in range(y_true_shape[1]) for y in range(y_true_shape[2])]
sample_points = random.sample(population, num_samples)
return sample_points

Slow computation on google colab while solving partial differential equation

I 'm using google colab to solve the homogeneous heat equation. I had made a program earlier with scipy using sparse matrices which worked upto N = 10(hyperparameter) but I need to run it for like N = 4... 1000 and thus it won't work on my pc. I therefore converted the code to tensorflow and here I 'm unable to use sparse matrices like I could in sympy but even the GPU/TPU computation is also slow and slower than my pc. Problems that I'm facing in the code and require solution for
1) tf.contrib is removed and thus I 've to use an older version of tensorflow for odeint function. Where is it in 2.0?
2)If the computation can be computed with sparse matrices it could be good since matrices are tridiagonal.I know about sparse_dense_mul() function but that returns dense tensor and it wouldn't do the job. The "func" function applies time independent boundary conditions and then requires matrix multiplication of (nxn) with (nX1) which gives (nX1) with multiple matrices.
Also the program was running faster without I created the class.
Also it's giving this
WARNING: Logging before flag parsing goes to stderr.
W0829 09:12:24.415445 139855355791232 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W0829 09:12:24.645356 139855355791232 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/integrate/python/ops/odes.py:233: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
when I run code for loop in range(2, 10) and tqdm does not display and cell keeps running forever but it works fine for in (2, 5) and tqdm bar does appears.
#find a way to use sparse matrices
class Heat:
def __init__(self, N):
self.N = N
self.H = 1/N
self.A = ts.to_dense(ts.SparseTensor(indices=[[0, 0], [0, 1]] + \
[[i, i+j] for i in range(1, N) for j in [-1, 0, 1]] +[[N, N-1], [N, N]],
values=self.H*np.array([1/3, 1/6] + [1/6, 2/3, 1/6]*(N-1) + [1/6, 1/3], dtype=np.float32),
dense_shape=(N+1, N+1 )))
self.D = ts.to_dense(ts.SparseTensor(indices=[[0, 0], [0, 1]] + [[i, i+j] \
for i in range(1, N) for j in [-1, 0, 1]] +[[N, N-1], [N, N]],
values=N*np.array([1-(1), -1 -(-1)] + [-1, 2, -1]*(N-1) + [-1-(-1), 1-(1)], dtype=np.float32),
dense_shape=(N+1, N+1)))
self.domain = tf.linspace(0.0, 1.0, N+1)
def f(k):
if k == 0:
return (1 + math.pi**2)*(math.pi*self.H - math.sin(math.pi*self.H))/(math.pi**2*self.H)
elif k == N:
return -(1 + math.pi**2)*(-math.pi*self.H + math.sin(math.pi*self.H))/(math.pi**2*self.H)
else:
return -2*(1 + math.pi**2)*(math.cos(math.pi*self.H) - 1)*math.sin(math.pi*self.H*k)/(math.pi**2*self.H)
self.F = tf.constant([f(k) for k in range(N+1)], shape=(N+1,), dtype=tf.float32) #caution! shape changed caution caution 1, N+1(problem) is different from N+1,
self.exact = tm.scalar_mul(scalar=np.exp(1), x=tf.sin(math.pi*self.domain))
def error(self):
return np.linalg.norm(self.exact.numpy() - self.approx, 2)
def func (self, y, t):
y = tf.Variable(y)
y = y[0].assign(0.0)
y = y[self.N].assign(0.0)
if self.N**2> 100:
y_dash = tl.matvec(tf.linalg.inv(self.A), tl.matvec(a=tm.negative(self.D), b=y, a_is_sparse=True) + tm.scalar_mul(scalar=math.exp(t), x=self.F)) #caution! shape changed F is (1, N+1) others too
else:
y_dash = tl.matvec(tf.linalg.inv(self.A), tl.matvec(a=tm.negative(self.D), b=y) + tm.scalar_mul(scalar=math.exp(t), x=self.F)) #caution! shape changed F is (1, N+1) others too
y_dash = tf.Variable(y_dash) #!!y_dash performs Hadamard product like multiplication not matrix-like multiplication;returns 2-D
y_dash = y_dash[0].assign(0.0)
y_dash = y_dash[self.N].assign(0.0)
return y_dash
def algo_1(self):
self.approx = tf.contrib.integrate.odeint(
func=self.func,
y0=tf.sin(tm.scalar_mul(scalar=math.pi, x=self.domain)),
t=tf.constant([0.0, 1.0]),
rtol=1e-06,
atol=1e-12,
method='dopri5',
options={"max_num_steps":10**10},
full_output=False,
name=None
).numpy()[1]
def algo_2(self):
self.approx = tf.contrib.integrate.odeint_fixed(
func=self.func,
y0=tf.sin(tm.scalar_mul(scalar=math.pi, x=self.domain)),
t=tf.constant([0.0, 1.0]),
dt=tf.constant([self.H**2], dtype=tf.float32),
method='rk4',
name=None
).numpy()[1]
df = pd.DataFrame(columns=["NumBasis", "Errors"])
Ns = [2**r for r in range(2, 10)]
l =[]
for i in tqdm_notebook(Ns):
heateqn = Heat(i)
heateqn.algo_1()
l.append([i, heateqn.error()])
df.append({"NumBasis":i, "Errors":heateqn.error()}, ignore_index=True)
tf.keras.backend.clear_session()

How do I discover the values for variables of an equation with keras/tensorflow?

I have an equation that describes a curve in two dimensions. This equation has 5 variables. How do I discover the values of them with keras/tensorflow for a set of data? Is it possible? Someone know a tutorial of something similar?
I generated some data to train the network that has the format:
sample => [150, 66, 2] 150 sets with 66*2 with the data something like "time" x "acceleration"
targets => [150, 5] 150 sets with 5 variable numbers.
Obs: I know the range of the variables. I know too, that 150 sets of data are too few sample, but I need, after the code work, to train a new network with experimental data, and this is limited too. Visually, the curve is simple, it has a descendent linear part at the beggining and at the end it gets down "like an exponential".
My code is as follows:
def build_model():
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(66*2,)))
model.add(layers.Dense(5, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['mae'])
return model
def smooth_curve(points, factor=0.9):
[...]
return smoothed_points
#load the generated data
train_data = np.load('samples00.npy')
test_data = np.load('samples00.npy')
train_targets = np.load('labels00.npy')
test_targets = np.load('labels00.npy')
#normalizing the data
mean = train_data.mean()
train_data -= mean
std = train_data.std()
train_data /= std
test_data -= mean
test_data /= std
#k-fold validation:
k = 3
num_val_samples = len(train_data)//k
num_epochs = 100
all_mae_histories = []
for i in range(k):
val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]
partial_train_data = np.concatenate(
[train_data[:i * num_val_samples],
train_data[(i + 1) * num_val_samples:]],
axis=0)
partial_train_targets = np.concatenate(
[train_targets[:i * num_val_samples],
train_targets[(i + 1) * num_val_samples:]],
axis=0)
model = build_model()
#reshape the data to get the format (100, 66*2)
partial_train_data = partial_train_data.reshape(100, 66 * 2)
val_data = val_data.reshape(50, 66 * 2)
history = model.fit(partial_train_data,
partial_train_targets,
validation_data = (val_data, val_targets),
epochs = num_epochs,
batch_size = 1,
verbose = 1)
mae_history = history.history['val_mean_absolute_error']
all_mae_histories.append(mae_history)
average_mae_history = [
np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]
smooth_mae_history = smooth_curve(average_mae_history[10:])
plt.plot(range(1, len(smooth_mae_history) + 1), smooth_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
Obviously as it is, I need to get the best accuracy possible, but I am getting an "median absolute error(MAE)" like 96%, and this is inaceptable.
I see some basic bugs in this methodology. Your final layer of the network has a softmax layer. This would mean it would output 5 values, which sum to 1, and behave as a probability distribution. What you actually want to predict is true numbers, or rather floating point values (under some fixed precision arithmetic).
If you have a range, then probably using a sigmoid and rescaling the final layer would to match the range (just multiply with the max value) would help you. By default sigmoid would ensure you get 5 numbers between 0 and 1.
The other thing should be to remove the cross entropy loss and use a loss like RMS, so that you predict your numbers well. You could also used 1D convolutions instead of using Fully connected layers.
There has been some work here: https://julialang.org/blog/2017/10/gsoc-NeuralNetDiffEq which tries to solve DEs and might be relevant to your work.

Implementing backpropagation gradient descent using scipy.optimize.minimize

I am trying to train an autoencoder NN (3 layers - 2 visible, 1 hidden) using numpy and scipy for the MNIST digits images dataset. The implementation is based on the notation given here Below is my code:
def autoencoder_cost_and_grad(theta, visible_size, hidden_size, lambda_, data):
"""
The input theta is a 1-dimensional array because scipy.optimize.minimize expects
the parameters being optimized to be a 1d array.
First convert theta from a 1d array to the (W1, W2, b1, b2)
matrix/vector format, so that this follows the notation convention of the
lecture notes and tutorial.
You must compute the:
cost : scalar representing the overall cost J(theta)
grad : array representing the corresponding gradient of each element of theta
"""
training_size = data.shape[1]
# unroll theta to get (W1,W2,b1,b2) #
W1 = theta[0:hidden_size*visible_size]
W1 = W1.reshape(hidden_size,visible_size)
W2 = theta[hidden_size*visible_size:2*hidden_size*visible_size]
W2 = W2.reshape(visible_size,hidden_size)
b1 = theta[2*hidden_size*visible_size:2*hidden_size*visible_size + hidden_size]
b2 = theta[2*hidden_size*visible_size + hidden_size: 2*hidden_size*visible_size + hidden_size + visible_size]
#feedforward pass
a_l1 = data
z_l2 = W1.dot(a_l1) + numpy.tile(b1,(training_size,1)).T
a_l2 = sigmoid(z_l2)
z_l3 = W2.dot(a_l2) + numpy.tile(b2,(training_size,1)).T
a_l3 = sigmoid(z_l3)
#backprop
delta_l3 = numpy.multiply(-(data-a_l3),numpy.multiply(a_l3,1-a_l3))
delta_l2 = numpy.multiply(W2.T.dot(delta_l3),
numpy.multiply(a_l2, 1 - a_l2))
b2_derivative = numpy.sum(delta_l3,axis=1)/training_size
b1_derivative = numpy.sum(delta_l2,axis=1)/training_size
W2_derivative = numpy.dot(delta_l3,a_l2.T)/training_size + lambda_*W2
#print(W2_derivative.shape)
W1_derivative = numpy.dot(delta_l2,a_l1.T)/training_size + lambda_*W1
W1_derivative = W1_derivative.reshape(hidden_size*visible_size)
W2_derivative = W2_derivative.reshape(visible_size*hidden_size)
b1_derivative = b1_derivative.reshape(hidden_size)
b2_derivative = b2_derivative.reshape(visible_size)
grad = numpy.concatenate((W1_derivative,W2_derivative,b1_derivative,b2_derivative))
cost = 0.5*numpy.sum((data-a_l3)**2)/training_size + 0.5*lambda_*(numpy.sum(W1**2) + numpy.sum(W2**2))
return cost,grad
I have also implemented a function to estimate the numerical gradient and verify the correctness of my implementation (below).
def compute_gradient_numerical_estimate(J, theta, epsilon=0.0001):
"""
:param J: a loss (cost) function that computes the real-valued loss given parameters and data
:param theta: array of parameters
:param epsilon: amount to vary each parameter in order to estimate
the gradient by numerical difference
:return: array of numerical gradient estimate
"""
gradient = numpy.zeros(theta.shape)
eps_vector = numpy.zeros(theta.shape)
for i in range(0,theta.size):
eps_vector[i] = epsilon
cost1,grad1 = J(theta+eps_vector)
cost2,grad2 = J(theta-eps_vector)
gradient[i] = (cost1 - cost2)/(2*epsilon)
eps_vector[i] = 0
return gradient
The norm of the difference between the numerical estimate and the one computed by the function is around 6.87165125021e-09 which seems to be acceptable. My main problem seems to be to get the gradient descent algorithm "L-BGFGS-B" working using the scipy.optimize.minimize function as below:
# theta is the 1-D array of(W1,W2,b1,b2)
J = lambda x: utils.autoencoder_cost_and_grad(theta, visible_size, hidden_size, lambda_, patches_train)
options_ = {'maxiter': 4000, 'disp': False}
result = scipy.optimize.minimize(J, theta, method='L-BFGS-B', jac=True, options=options_)
I get the below output from this:
scipy.optimize.minimize() details:
fun: 90.802022224079778
hess_inv: <16474x16474 LbfgsInvHessProduct with dtype=float64>
jac: array([ -6.83667742e-06, -2.74886002e-06, -3.23531941e-06, ...,
1.22425735e-01, 1.23425062e-01, 1.28091250e-01])
message: b'ABNORMAL_TERMINATION_IN_LNSRCH'
nfev: 21
nit: 0
status: 2
success: False
x: array([-0.06836677, -0.0274886 , -0.03235319, ..., 0. ,
0. , 0. ])
Now, this post seems to indicate that the error could mean that the gradient function implementation could be wrong? But my numerical gradient estimate seems to confirm that my implementation is correct. I have tried varying the initial weights by using a uniform distribution as specified here but the problem still persists. Is there anything wrong with my backprop implementation?
Turns out the issue was a syntax error (very silly) with this line:
J = lambda x: utils.autoencoder_cost_and_grad(theta, visible_size, hidden_size, lambda_, patches_train)
I don't even have the lambda parameter x in the function declaration. So the theta array wasn't even being passed whenever J was being invoked.
This fixed it:
J = lambda x: utils.autoencoder_cost_and_grad(x, visible_size, hidden_size, lambda_, patches_train)

Visualizing output of convolutional layer in tensorflow

I'm trying to visualize the output of a convolutional layer in tensorflow using the function tf.image_summary. I'm already using it successfully in other instances (e. g. visualizing the input image), but have some difficulties reshaping the output here correctly. I have the following conv layer:
img_size = 256
x_image = tf.reshape(x, [-1,img_size, img_size,1], "sketch_image")
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
So the output of h_conv1 would have the shape [-1, img_size, img_size, 32]. Just using tf.image_summary("first_conv", tf.reshape(h_conv1, [-1, img_size, img_size, 1])) Doesn't account for the 32 different kernels, so I'm basically slicing through different feature maps here.
How can I reshape them correctly? Or is there another helper function I could use for including this output in the summary?
I don't know of a helper function but if you want to see all the filters you can pack them into one image with some fancy uses of tf.transpose.
So if you have a tensor that's images x ix x iy x channels
>>> V = tf.Variable()
>>> print V.get_shape()
TensorShape([Dimension(-1), Dimension(256), Dimension(256), Dimension(32)])
So in this example ix = 256, iy=256, channels=32
first slice off 1 image, and remove the image dimension
V = tf.slice(V,(0,0,0,0),(1,-1,-1,-1)) #V[0,...]
V = tf.reshape(V,(iy,ix,channels))
Next add a couple of pixels of zero padding around the image
ix += 4
iy += 4
V = tf.image.resize_image_with_crop_or_pad(image, iy, ix)
Then reshape so that instead of 32 channels you have 4x8 channels, lets call them cy=4 and cx=8.
V = tf.reshape(V,(iy,ix,cy,cx))
Now the tricky part. tf seems to return results in C-order, numpy's default.
The current order, if flattened, would list all the channels for the first pixel (iterating over cx and cy), before listing the channels of the second pixel (incrementing ix). Going across the rows of pixels (ix) before incrementing to the next row (iy).
We want the order that would lay out the images in a grid.
So you go across a row of an image (ix), before stepping along the row of channels (cx), when you hit the end of the row of channels you step to the next row in the image (iy) and when you run out or rows in the image you increment to the next row of channels (cy). so:
V = tf.transpose(V,(2,0,3,1)) #cy,iy,cx,ix
Personally I prefer np.einsum for fancy transposes, for readability, but it's not in tf yet.
newtensor = np.einsum('yxYX->YyXx',oldtensor)
anyway, now that the pixels are in the right order, we can safely flatten it into a 2d tensor:
# image_summary needs 4d input
V = tf.reshape(V,(1,cy*iy,cx*ix,1))
try tf.image_summary on that, you should get a grid of little images.
Below is an image of what one gets after following all the steps here.
In case someone would like to "jump" to numpy and visualize "there" here is an example how to display both Weights and processing result. All transformations are based on prev answer by mdaoust.
# to visualize 1st conv layer Weights
vv1 = sess.run(W_conv1)
# to visualize 1st conv layer output
vv2 = sess.run(h_conv1,feed_dict = {img_ph:x, keep_prob: 1.0})
vv2 = vv2[0,:,:,:] # in case of bunch out - slice first img
def vis_conv(v,ix,iy,ch,cy,cx, p = 0) :
v = np.reshape(v,(iy,ix,ch))
ix += 2
iy += 2
npad = ((1,1), (1,1), (0,0))
v = np.pad(v, pad_width=npad, mode='constant', constant_values=p)
v = np.reshape(v,(iy,ix,cy,cx))
v = np.transpose(v,(2,0,3,1)) #cy,iy,cx,ix
v = np.reshape(v,(cy*iy,cx*ix))
return v
# W_conv1 - weights
ix = 5 # data size
iy = 5
ch = 32
cy = 4 # grid from channels: 32 = 4x8
cx = 8
v = vis_conv(vv1,ix,iy,ch,cy,cx)
plt.figure(figsize = (8,8))
plt.imshow(v,cmap="Greys_r",interpolation='nearest')
# h_conv1 - processed image
ix = 30 # data size
iy = 30
v = vis_conv(vv2,ix,iy,ch,cy,cx)
plt.figure(figsize = (8,8))
plt.imshow(v,cmap="Greys_r",interpolation='nearest')
you may try to get convolution layer activation image this way:
h_conv1_features = tf.unpack(h_conv1, axis=3)
h_conv1_imgs = tf.expand_dims(tf.concat(1, h_conv1_features_padded), -1)
this gets one vertical stripe with all images concatenated vertically.
if you want them padded (in my case of relu activations to pad with white line):
h_conv1_features = tf.unpack(h_conv1, axis=3)
h_conv1_max = tf.reduce_max(h_conv1)
h_conv1_features_padded = map(lambda t: tf.pad(t-h_conv1_max, [[0,0],[0,1],[0,0]])+h_conv1_max, h_conv1_features)
h_conv1_imgs = tf.expand_dims(tf.concat(1, h_conv1_features_padded), -1)
I personally try to tile every 2d-filter in a single image.
For doing this -if i'm not terribly mistaken since I'm quite new to DL- I found out that it could be helpful to exploit the depth_to_space function, since it takes a 4d tensor
[batch, height, width, depth]
and produces an output of shape
[batch, height*block_size, width*block_size, depth/(block_size*block_size)]
Where block_size is the number of "tiles" in the output image. The only limitation to this is that the depth should be the square of block_size, which is an integer, otherwise it cannot "fill" the resulting image correctly.
A possible solution could be of padding the depth of the input tensor up to a depth that is accepted by the method, but I sill havn't tried this.
Another way, which I think very easy, is using the get_operation_by_name function. I had hard time visualizing the layers with other methods but this helped me.
#first, find out the operations, many of those are micro-operations such as add etc.
graph = tf.get_default_graph()
graph.get_operations()
#choose relevant operations
op_name = '...'
op = graph.get_operation_by_name(op_name)
out = sess.run([op.outputs[0]], feed_dict={x: img_batch, is_training: False})
#img_batch is a single image whose dimensions are (1,n,n,1).
# out is the output of the layer, do whatever you want with the output
#in my case, I wanted to see the output of a convolution layer
out2 = np.array(out)
print(out2.shape)
# determine, row, col, and fig size etc.
for each_depth in range(out2.shape[4]):
fig.add_subplot(rows, cols, each_depth+1)
plt.imshow(out2[0,0,:,:,each_depth], cmap='gray')
For example below is the input(colored cat) and output of the second conv layer in my model.
Note that I am aware this question is old and there are easier methods with Keras but for people who use an old model from other people (such as me), this may be useful.