I am using the TensorFlow backend.
I am applying a convolution, max-pooling, flatten and a dense layer sequentially. The convolution requires a 3D input (height, width, color_channels_depth).
After the convolution, this becomes (height, width, Number_of_filters).
After applying max-pooling, the height and width change. But after applying the flatten layer, what happens exactly? For example, if the input before flatten is (24, 24, 32), how does it flatten it out?
Is it sequential, like (24 * 24) for height and width for each filter number in turn, or in some other way? An example with actual values would be appreciated.
The Flatten() operator unrolls the values beginning at the last dimension (at least for Theano, which is "channels first", not "channels last" like TF. I can't run TensorFlow in my environment). This is equivalent to numpy.reshape with 'C' ordering:
'C' means to read / write the elements using C-like index order, with
the last axis index changing fastest, back to the first axis index
changing slowest.
Here is a standalone example illustrating the Flatten operator with the Keras Functional API. You should be able to adapt it easily for your environment.
import numpy as np
from keras.layers import Input, Flatten
from keras.models import Model
inputs = Input(shape=(3,2,4))
# Define a model consisting only of the Flatten operation
prediction = Flatten()(inputs)
model = Model(inputs=inputs, outputs=prediction)
X = np.arange(0,24).reshape(1,3,2,4)
print(X)
#[[[[ 0 1 2 3]
# [ 4 5 6 7]]
#
# [[ 8 9 10 11]
# [12 13 14 15]]
#
# [[16 17 18 19]
# [20 21 22 23]]]]
model.predict(X)
#array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.,
# 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21.,
# 22., 23.]], dtype=float32)
Flattening a tensor means removing all of the dimensions except one.
A Flatten layer in Keras reshapes the tensor so that its shape (excluding the batch dimension) equals the number of elements contained in the tensor.
This is the same thing as making a 1-D array of the elements.
For example in the VGG16 model you may find it easy to understand:
>>> model.summary()
Layer (type)                 Output Shape              Param #
=================================================================
vgg16 (Model)                (None, 4, 4, 512)         14714688
_________________________________________________________________
flatten_1 (Flatten)          (None, 8192)              0
_________________________________________________________________
dense_1 (Dense)              (None, 256)               2097408
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 257
=================================================================
Note how the flatten_1 layer's shape is (None, 8192), where 8192 is actually 4*4*512.
PS: None means any dimension (a dynamic dimension, usually the batch size), so for a single sample you can read it as 1. You can find more details here.
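As a concrete illustration of the whole pipeline from the question, here is a minimal sketch; the layer sizes are my own choice, picked so that the tensor entering Flatten is (24, 24, 32):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Sizes chosen (hypothetically) so the tensor entering Flatten is (24, 24, 32)
model = Sequential([
    Conv2D(32, (3, 3), input_shape=(51, 51, 3)),  # -> (None, 49, 49, 32)
    MaxPooling2D(pool_size=(2, 2)),               # -> (None, 24, 24, 32)
    Flatten(),                                    # -> (None, 18432), i.e. 24*24*32
    Dense(10),
])
model.summary()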
It flattens sequentially over the 24*24*32 values and reshapes them, as shown in the following code from the Keras backend.
def batch_flatten(x):
    """Turn an nD tensor into a 2D tensor with the same 0th dimension.

    In other words, it flattens each data sample of a batch.

    # Arguments
        x: A tensor or variable.

    # Returns
        A tensor.
    """
    # `prod` and `shape` here are Keras backend helpers
    x = tf.reshape(x, tf.stack([-1, prod(shape(x)[1:])]))
    return x
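To tie this back to the (24, 24, 32) shape from the question, here is a small check of my own (not from the original answer) showing that, with the TensorFlow backend, Flatten gives the same result as a C-order numpy reshape:
import numpy as np
from keras.layers import Input, Flatten
from keras.models import Model

# One sample of shape (24, 24, 32), filled with a running index so the order is visible
X = np.arange(24 * 24 * 32).reshape(1, 24, 24, 32).astype('float32')

inputs = Input(shape=(24, 24, 32))
model = Model(inputs=inputs, outputs=Flatten()(inputs))

flat_keras = model.predict(X)   # shape (1, 18432)
flat_numpy = X.reshape(1, -1)   # C-order reshape, also (1, 18432)
print(np.array_equal(flat_keras, flat_numpy))  # True with the TensorFlow backend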
Given an array of sentence embeddings (each of length 512) with a shape of (1000000, 512), how do I calculate the cosine similarity of every one of the 1 million sentence embeddings against every other embedding in the array, ideally using TensorFlow so I can try to speed it up with a GPU?
You can calculate the cosine distance this way:
import numpy as np
import tensorflow as tf

X = np.random.uniform(0, 10, (100, 512)).astype('float32')
X = tf.constant(X)

def compute_cosine_distances(a, b):
    # L2-normalize the rows; the dot product of unit vectors is the cosine similarity
    normalize_a = tf.nn.l2_normalize(a, 1)
    normalize_b = tf.nn.l2_normalize(b, 1)
    distance = 1 - tf.matmul(normalize_a, normalize_b, transpose_b=True)
    return distance

compute_cosine_distances(X, X)
which is equal to
from sklearn.metrics.pairwise import pairwise_distances
pairwise_distances(X.numpy(), metric='cosine')
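As a quick sanity check (my own addition, assuming TF 2.x eager execution so .numpy() is available), the two results agree up to floating-point error:
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

tf_distances = compute_cosine_distances(X, X).numpy()
sk_distances = pairwise_distances(X.numpy(), metric='cosine')

# Small numerical differences between the two implementations are expected
print(np.allclose(tf_distances, sk_distances, atol=1e-5))  # True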
Cosine similarity is a metric used to measure how similar documents are, irrespective of their size. Mathematically, it measures the cosine of the angle between two vectors projected in a multi-dimensional space, so the cosine similarity of an array with itself is always 1. tf.keras.losses.CosineSimilarity is a loss, so it returns the negative of the similarity, which is why the example below prints -1.
import tensorflow as tf
y_true = [[2., 8.], [1., 7.]]
y_pred = [[2., 8.], [1., 7.]]
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
print(cosine_loss(y_true, y_pred).numpy())
output: -1.0000001
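Because the Keras class is a loss, flipping the sign recovers the usual cosine similarity of +1 for identical inputs (reusing the values defined above):
similarity = -cosine_loss(y_true, y_pred).numpy()
print(similarity)  # ~1.0 for identical vectors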
I'm interested in using the Networkx Python package to perform network analysis on convolutional neural networks. To achieve this I want to extract the edge and weight information from Keras model objects and put them into a Networkx Digraph object where it can be (1) written to a graphml file and (2) be subject to the graph analysis tools available in Networkx.
Before jumping in further, let me clarify how I want pooling to be treated. Pooling (max or average, for example) means that the entries within a pooling window are aggregated, creating an ambiguity about 'which' entry would be used in the graph I want to create. To resolve this, I would like every possible choice included in the graph, as I can account for this later as needed.
For the sake of example, let's consider doing this with VGG16. Keras makes it pretty easy to access the weights while looping over the layers.
from keras.applications.vgg16 import VGG16

model = VGG16()

for layer_index, layer in enumerate(model.layers):
    GW = layer.get_weights()
    if layer_index == 0:
        print(layer_index, layer.get_config()['name'], layer.get_config()['batch_input_shape'])
    elif GW:
        W, B = GW
        print(layer_index, layer.get_config()['name'], W.shape, B.shape)
    else:
        print(layer_index, layer.get_config()['name'])
Which will print the following:
0 input_1 (None, 224, 224, 3)
1 block1_conv1 (3, 3, 3, 64) (64,)
2 block1_conv2 (3, 3, 64, 64) (64,)
3 block1_pool
4 block2_conv1 (3, 3, 64, 128) (128,)
5 block2_conv2 (3, 3, 128, 128) (128,)
6 block2_pool
7 block3_conv1 (3, 3, 128, 256) (256,)
8 block3_conv2 (3, 3, 256, 256) (256,)
9 block3_conv3 (3, 3, 256, 256) (256,)
10 block3_pool
11 block4_conv1 (3, 3, 256, 512) (512,)
12 block4_conv2 (3, 3, 512, 512) (512,)
13 block4_conv3 (3, 3, 512, 512) (512,)
14 block4_pool
15 block5_conv1 (3, 3, 512, 512) (512,)
16 block5_conv2 (3, 3, 512, 512) (512,)
17 block5_conv3 (3, 3, 512, 512) (512,)
18 block5_pool
19 flatten
20 fc1 (25088, 4096) (4096,)
21 fc2 (4096, 4096) (4096,)
22 predictions (4096, 1000) (1000,)
For the convolutional layers, I've read that the tuples represent (filter_x, filter_y, filter_z, num_filters), where filter_x, filter_y, filter_z give the shape of the filter and num_filters is the number of filters. There's one bias term for each filter, so the last value in each of these tuples also equals the number of filters.
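For instance, reusing the model loaded above, slicing the block1_conv1 weights makes that layout concrete (my own illustrative check):
# W has shape (3, 3, 3, 64) for block1_conv1 (see the loop output above)
W, B = model.layers[1].get_weights()

single_kernel = W[:, :, 0, 0]   # the 3x3 kernel connecting input channel 0 to filter 0
single_bias = B[0]              # the bias for filter 0
print(single_kernel.shape)      # (3, 3)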
While I've read explanations of how the convolutions within a convolutional neural network behave conceptually, I seem to be having a mental block when I get to handling the shapes of the layers in the model object.
Once I know how to loop over the edges of the Keras model, with Networkx I should be able to easily code the construction of the Networkx object. The code for this might loosely resemble something like this, where keras_edges is an iterable that contains tuples formatted as (in_node, out_node, edge_weight).
import networkx as nx
g = nx.DiGraph()
g.add_weighted_edges_from(keras_edges)
nx.write_graphml(g, 'vgg16.graphml')
So to be specific, how do I loop over all the edges in a way that accounts for the shape of the layers and the pooling in the way I described above?
Keras doesn't have an edge element, and a Keras node seems to be something totally different (a Keras node is an entire layer as it is used; it's the layer as presented in the graph of the model).
So, assuming you are using the smallest image possible (one equal to the kernel size), and that you're creating nodes manually (sorry, I don't know how it works in networkx):
For a convolution that:
Has i input channels (channels in the image that comes in)
Has o output channels (the selected number of filters in keras)
Has kernel_size = (x, y)
You already know the weights, which are shaped (x, y, i, o).
You would have something like:
#assuming a node here is one pixel from one channel only:

#kernel sizes x and y
kSizeX = weights.shape[0]
kSizeY = weights.shape[1]

#in and out channels
inChannels = weights.shape[2]
outChannels = weights.shape[3]

#slide steps x
stepsX = image.shape[0] - kSizeX + 1
stepsY = image.shape[1] - kSizeY + 1

#stores the final results
all_filter_results = []

for ko in range(outChannels): #for each output filter
    one_image_results = np.zeros((stepsX, stepsY))

    #for each position of the sliding window
    #if you used the smallest size image, start here
    for pos_x in range(stepsX):
        for pos_y in range(stepsY):

            #storing the results of a single step of a filter here:
            one_slide_nodes = []

            #for each weight in the filter
            for kx in range(kSizeX):
                for ky in range(kSizeY):
                    for ki in range(inChannels):

                        #the input node is a pixel in a single channel
                        in_node = image[pos_x + kx, pos_y + ky, ki]

                        #one multiplication, single weight x single pixel
                        one_slide_nodes.append(weights[kx, ky, ki, ko] * in_node)

                        #so, here, you have in_node and weights

            #the results of each step in the slide is the sum of one_slide_nodes:
            slide_result = sum(one_slide_nodes)
            one_image_results[pos_x, pos_y] = slide_result

    all_filter_results.append(one_image_results)
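If the goal is the (in_node, out_node, edge_weight) tuples for networkx, one possible sketch is to emit an edge for every multiplication in the innermost loop. The node-naming scheme below is my own invention (not something Keras provides), and the snippet reuses the variables from the loop above:
# Hypothetical string node ids: layerName_side_x_y_channel
keras_edges = []
layer_name = "block1_conv1"   # example label; in practice take it from layer.get_config()['name']

for ko in range(outChannels):
    for pos_x in range(stepsX):
        for pos_y in range(stepsY):
            out_node = "%s_out_%d_%d_%d" % (layer_name, pos_x, pos_y, ko)
            for kx in range(kSizeX):
                for ky in range(kSizeY):
                    for ki in range(inChannels):
                        in_node = "%s_in_%d_%d_%d" % (layer_name, pos_x + kx, pos_y + ky, ki)
                        # one edge per single weight application
                        keras_edges.append((in_node, out_node, float(weights[kx, ky, ki, ko])))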
According to the code in https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10.py, the same name is used for tensor variables, for example:
conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME') # Under conv1, line: 208
and,
conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME') # Under conv2, line 227
Therefore, why is this allowed in TensorFlow? If, for some reason, I tried to say:
sess.run([conv], feed_dict={x: some_data})
then which conv tensor would be evaluated?
Second, if the conv tensor under the CONV1 layer refers to the first tf.nn.conv2d operation, how can another conv tensor under CONV2 refer to the second tf.nn.conv2d operation? In other words, how are they treated separately?
Any help is much appreciated!!
For your question: the latest "conv" is evaluated.
For example:
import tensorflow as tf

a = tf.constant(5)
b = tf.constant(6)

c = tf.multiply(a, b)
print(c)

c = tf.multiply(c, b)
print(c)

sess = tf.Session()
c_val = sess.run(c)
print(c_val)
Output :
Tensor("Mul:0", shape=(), dtype=int32)
Tensor("Mul_1:0", shape=(), dtype=int32)
180
You can see that TF names them differently. Whenever you call a TF operator, it creates a graph node independent of the Python variable name. The Python variable name simply points to the latest tensor you assigned to it.
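If you ever need the earlier tensor, you can still fetch it by its graph name even though the Python variable now points at the newer one (a small sketch, assuming TF 1.x graph mode as in the example above):
# "Mul:0" is the first multiplication's output, "Mul_1:0" the second one's
first_mul = tf.get_default_graph().get_tensor_by_name("Mul:0")
print(sess.run(first_mul))  # 30, i.e. 5 * 6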
I hope this helps.
I'm trying to write a layer to merge 2 tensors with a particular formula.
The shapes of x[0] and x[1] are both (?, 1, 500).
M is a 500*500 Matrix.
I want the output to be (?, 500, 500), which is theoretically feasible in my opinion. The layer should output (1, 500, 500) for every pair of inputs of shape (1, 1, 500) and (1, 1, 500). As the batch_size is variable (dynamic), the output must be (?, 500, 500).
However, I know little about axes, and I have tried all the combinations of axes, but it doesn't make sense.
I tried numpy.tensordot and keras.backend.batch_dot (TensorFlow backend). If the batch_size is fixed, taking a = (100, 1, 500) for example, batch_dot(a, M, (2, 0)) gives an output of (100, 1, 500).
I'm new to Keras; sorry for such a basic question, but I have spent 2 days trying to figure it out and it's driving me crazy :(
def call(self, x):
    input1 = x[0]
    input2 = x[1]
    #self.M is defined in build function
    output = K.batch_dot(...)
    return output
Update:
Sorry for the late update. I tried Daniel's answer with TensorFlow as the Keras backend, and it still raises a ValueError about unequal dimensions.
I tried the same code with Theano as the backend, and now it works.
>>> import numpy as np
>>> import keras.backend as K
Using Theano backend.
>>> from keras.layers import Input
>>> x1 = Input(shape=[1,500,])
>>> M = K.variable(np.ones([1,500,500]))
>>> firstMul = K.batch_dot(x1, M, axes=[1,2])
I don't know how to print a tensor's shape in Theano; it's definitely harder than TensorFlow for me... However, it works.
To see why, I scanned the backend code for both TensorFlow and Theano. The differences are as follows.
In this case, x = (?, 1, 500), y = (1, 500, 500), axes = [1, 2]
In tensorflow_backend:
return tf.matmul(x, y, adjoint_a=True, adjoint_b=True)
In theano_backend:
return T.batched_tensordot(x, y, axes=axes)
(Assuming the subsequent changes to out._keras_shape don't influence out's value.)
Your multiplications should select which axes to use in the batch_dot function.
Axis 0 - the batch dimension, it's your ?
Axis 1 - the dimension you say has length 1
Axis 2 - the last dimension, of size 500
You won't change the batch dimension, so you will use batch_dot always with axes=[1,2]
But for that to work, you must adjust M to be (?, 500, 500).
For that define M not as (500,500), but as (1,500,500) instead, and repeat it in the first axis for the batch size:
import keras.backend as K

#Being M with shape (1,500,500), we repeat it.
BatchM = K.repeat_elements(x=M, rep=batch_size, axis=0)
#Not sure if repeating is really necessary; leaving M as (1,500,500) gives the same
#output shape at the end, but I haven't checked the actual numbers for correctness.
#I believe it's totally ok.

#Now we can use batch_dot properly:
firstMul = K.batch_dot(x[0], BatchM, axes=[1,2]) #will result in (?,500,500)

#we also need to transpose x[1]:
x1T = K.permute_dimensions(x[1], (0,2,1))

#and the second multiplication:
result = K.batch_dot(firstMul, x1T, axes=[1,2])
I prefer using TensorFlow, so I tried to figure it out with TensorFlow over the past few days.
The first approach is very similar to Daniel's solution.
x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(None,3,3))
tf.matmul(x, M)
# return: <tf.Tensor 'MatMul_22:0' shape=(?, 1, 3) dtype=float32>
It requires feeding M values with matching shapes.
sess = tf.Session()
sess.run(tf.matmul(x,M), feed_dict = {x: [[[1,2,3]]], M: [[[1,2,3],[0,1,0],[0,0,1]]]})
# return : array([[[ 1., 4., 6.]]], dtype=float32)
Another simple way is tf.einsum.
x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(3,3))
tf.einsum('ijk,kl->ijl', x, M)
# return: <tf.Tensor ... shape=(?, 1, 3) dtype=float32>
Let's feed some values.
sess.run(tf.einsum('ijk,kl->ijl', x, M), feed_dict = {x: [[[1,2,3]]], M: [[1,2,3],[0,1,0],[0,0,1]]})
# return: array([[[ 1., 4., 6.]]], dtype=float32)
Now M is a 2D tensor and there's no need to feed batch_size into M.
What's more, it now seems such a question can be solved in TensorFlow with tf.einsum. Does that mean it's Keras's job to invoke tf.einsum in some situations? At least, I can find nowhere that Keras calls tf.einsum. And in my opinion, when batch_dot is given a 3D tensor and a 2D tensor, Keras behaves oddly. In Daniel's answer, he pads M to (1, 500, 500), but in K.batch_dot() M will be adjusted to (500, 500, 1) automatically. I find that TF adjusts it with broadcasting rules, and I'm not sure Keras does the same.
In the expert MNIST tutorial on the TensorFlow website, there is something like this:
x_image = tf.reshape(x, [-1,28,28,1])
I know that the reshape is like
tf.reshape(input, [batch_size, width, height, channel])
Q1: why does batch_size equal -1? What does the -1 mean?
And when I go further down the code, there's one more thing I cannot understand:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
Q2: what does the image_size * 64 mean?
Q1: why does batch_size equal -1? What does the -1 mean?
-1 means "figure this part out for me". For example, if I run:
reshape([1, 2, 3, 4, 5, 6, 7, 8], [-1, 2])
It creates two columns, and whatever number of rows it needs to get everything to fit:
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
Q2: what does the image_size * 64 mean?
It is the number of filters in that particular filter activation. Shapes of filters in conv layers follow the format [height, width, # of input channels (number of filters in the previous layer), # of filters].
When you pass -1 as a dimension in tf.reshape, that dimension's size is computed for you so that the total number of elements stays the same. From the docs:
If one component of shape is the special value -1, the size of that
dimension is computed so that the total size remains constant. In
particular, a shape of [-1] flattens into 1-D. At most one component
of shape can be -1.
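For instance, in the MNIST case from the question, a batch of flattened images gets its batch dimension inferred automatically (a small sketch with illustrative shapes):
import numpy as np
import tensorflow as tf

x = np.zeros((50, 784), dtype=np.float32)  # 50 flattened 28x28 grayscale images
x_image = tf.reshape(x, [-1, 28, 28, 1])   # -1 is computed as 50*784 / (28*28*1) = 50
print(x_image.shape)                       # (50, 28, 28, 1)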
The reference to 7 x 7 x 64 is because the convolutional layer being applied prior to this example has reduced the image to a shape of [7, 7, 64], and the input to the next fully connected layer needs to be a single dimension, so in the next line of the example, the tensor is reshaped from [7,7,64] to [7*7*64] so it can connect to the FC layer.
For more info on how convolutions and max pooling work, the Wikipedia page on convolutional neural networks has some helpful graphics, e.g. of the overall network architecture and of pooling.