I have an idea for a tensor operation that would not be difficult to implement via iteration, with batch size one. However I would like to parallelize it as much as possible.
I have two tensors with shape (n, 5) called X and Y. X is actually supposed to represent 5 one-dimensional tensors with shape (n, 1): (x_1, ..., x_5). Ditto for Y.
I would like to compute a tensor with shape (n, 25) where each column represents the output of the tensor operation f(x_i, y_j), where f is fixed for all 1 <= i, j <= 5. The operation f has output shape (n, 1), just like x_i and y_j.
I feel it is important to clarify that f is essentially a fully-connected layer from the concatenated [...x_i, ...y_j] tensor, with shape (2n, 1), to an output layer with shape (n, 1), as in the numpy example below.
Again, it is easy to see how to do this manually with iteration and slicing. However this is probably very slow. Performing this operation in batches, where the tensors X, Y now have shape (n, 5, batch_size) is also desirable, particularly for mini-batch gradient descent.
It is difficult to really articulate here why I desire to create this network; I feel it is suited for my domain of 'itemized tabular data' and cuts down significantly on the number of weights per operation, compared to a fully connected network.
Is this possible using tensorflow? Certainly not using just keras.
Below is an example in numpy, per AloneTogether's request:
import numpy as np
features = 16
batch_size = 256
X_batch = np.random.random((features, 5, batch_size))
Y_batch = np.random.random((features, 5, batch_size))
# one tensor operation to reduce weights in this custom 'layer'
f = np.random.random((features, 2 * features))
for b in range(batch_size):
    X = X_batch[:, :, b]
    Y = Y_batch[:, :, b]
    for i in range(5):
        x_i = X[:, i:i+1]
        for j in range(5):
            y_j = Y[:, j:j+1]
            x_i_y_j = np.concatenate([x_i, y_j], axis=0)
            # f(x_i, y_j)
            # implemented by a fully-connected layer
            f_i_j = np.matmul(f, x_i_y_j)
All the operations you need (concatenation and matrix multiplication) can be batched.
The difficult part here is that you want to concatenate the features of every item in X with the features of every item in Y (all combinations).
My recommended solution is to expand the dimensions of X to [batch, features, 5, 1] and the dimensions of Y to [batch, features, 1, 5] (note that the batch axis comes first here, unlike in the question).
Then tf.repeat() both tensors so that their shapes become [batch, features, 5, 5].
Now you can concatenate X and Y along the feature axis. You will have a tensor of shape [batch, 2*features, 5, 5]. Observe that this way all combinations are built.
The next step is matrix multiplication. tf.matmul() can also do batch matrix multiplication, but I use tf.einsum() here because I want more control over which dimensions are treated as batch dimensions.
Full code:
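import tensorflow as tf
import numpy as np
batch_size = 3
features = 6
items = 5
x = np.random.uniform(size=[batch_size, features, items])
y = np.random.uniform(size=[batch_size, features, items])
f = np.random.uniform(size=[2 * features, features])
# x_reps[b, :, i, j] == x[b, :, i] and y_reps[b, :, i, j] == y[b, :, j]
x_reps = tf.repeat(x[:, :, :, tf.newaxis], items, axis=3)
y_reps = tf.repeat(y[:, :, tf.newaxis, :], items, axis=2)
# all (i, j) combinations, shape [batch, 2*features, items, items]
xy_conc = tf.concat([x_reps, y_reps], axis=1)
# apply the same weights f to every (i, j) pair, shape [batch, features, items, items]
f_i_j = tf.einsum("bfij,fg->bgij", xy_conc, f)
# flatten the (i, j) grid into items*items columns
f_i_j = tf.reshape(f_i_j, [batch_size, features, items * items])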
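As a sanity check, the batched construction can be compared against explicit loops. The sketch below is NumPy-only and is not part of the original answer; it assumes the same layout as the answer's code (batch axis first, f of shape [2*features, features]) and uses np.broadcast_to in place of tf.repeat, which is my substitution.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, features, items = 3, 6, 5
x = rng.uniform(size=(batch_size, features, items))
y = rng.uniform(size=(batch_size, features, items))
f = rng.uniform(size=(2 * features, features))  # same layout as in the answer

# Reference: explicit loops over the batch and all (i, j) pairs.
looped = np.zeros((batch_size, features, items, items))
for b in range(batch_size):
    for i in range(items):
        for j in range(items):
            xy = np.concatenate([x[b, :, i], y[b, :, j]])  # shape (2*features,)
            looped[b, :, i, j] = xy @ f                    # shape (features,)

# Batched: broadcast both inputs to the (i, j) grid, then one einsum.
grid = (batch_size, features, items, items)
x_b = np.broadcast_to(x[:, :, :, None], grid)
y_b = np.broadcast_to(y[:, :, None, :], grid)
xy_conc = np.concatenate([x_b, y_b], axis=1)      # [batch, 2*features, items, items]
batched = np.einsum("bfij,fg->bgij", xy_conc, f)  # [batch, features, items, items]

assert np.allclose(looped, batched)
result = batched.reshape(batch_size, features, items * items)
print(result.shape)  # (3, 6, 25)
```

After the reshape, column i * items + j holds f(x_i, y_j).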
I just completed an ANN course and have started learning CNNs. I have a basic understanding of how padding and stride work in a CNN.
However, I have difficulty mapping the input image to the neurons in the first conv layer, although I have a basic
understanding of how input features are mapped to the first hidden layer in an ANN.
What is the best way of understanding the mapping between the input image and the neurons in the first conv layer?
How can I clarify my doubts about the code example below? The code is taken from a DL course on Coursera.
def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [4, 4, 3, 8]
                        W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """
    tf.set_random_seed(1)  # so that your "random" numbers match ours
    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    ### END CODE HERE ###
    parameters = {"W1": W1,
                  "W2": W2}
    return parameters
def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters
    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    W2 = parameters['W2']
    ### START CODE HERE ###
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 8x8, stride 8, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function (do not call softmax).
    # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None"
    Z3 = tf.contrib.layers.fully_connected(P2, 6, activation_fn=None)
    ### END CODE HERE ###
    return Z3
with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(Z3, {X: np.random.randn(1, 64, 64, 3), Y: np.random.randn(1, 6)})
    print("Z3 = " + str(a))
How is this input image of size 64*64*3 processed by 8 filters, each of size 4*4*3?
stride = 1, padding = same and batch_size = 1.
What I have understood so far is that each neuron in the first conv layer has 8 filters, each of size 4*4*3. Each neuron in the first convolution layer takes a portion of the input image of the same size as the filter (here 4*4*3), applies the convolution operation, and produces eight 64*64 feature maps.
If my understanding is correct then:
1> Why do we need a striding operation, since the kernel size and the portion of the input image processed by each neuron are the same? If we apply stride = 1 (or 2), the boundary of that portion of the input image is crossed, which is something we don't want, right?
2> How do we know which portion of the input image (same size as the kernel) is mapped to which neuron in the first conv layer?
If not then:
3> How is the input image passed on to the neurons in the first convolution layer? Is the complete input image passed on to each neuron (like in a fully connected ANN, where all the input features are mapped to each neuron in the first hidden layer)?
Or only a portion of the input image? How do we know which portion of the input image is mapped to which neuron in the first conv layer?
4> Is the number of kernels specified in the example above (W1 = [4, 4, 3, 8]) per neuron, or is it the total number of kernels in the first conv layer?
5> How do we know how many neurons are used by the example above in the first convolution layer?
6> Is there any relationship between the number of neurons and the number of kernels in the first conv layer?
I found relevant answers to my questions and am posting them here.
First of all, the concept of a neuron exists in a conv layer as well, but only indirectly. Basically, each neuron in a conv layer deals with a portion of the input image that is the same size as the kernel used in that conv layer.
Each neuron focuses only on a particular portion of the input image (whereas in a fully connected ANN each neuron looks at the whole image), and each neuron uses n filters/kernels to extract more information from that portion of the image.
These n filters/kernels are shared by all the neurons in a given conv layer. Because of this weight (kernel/filter) sharing, a conv layer has fewer parameters to learn, whereas in a fully connected ANN each neuron has its own weights and hence the number of parameters to learn is larger.
The number of neurons in a given conv layer L depends on the input size (the output of the previous layer L-1), the kernel size used in layer L, the padding used in layer L and the stride used in layer L.
Now let me answer each of the questions specified above.
1> How do we know which portion of the input image (same size as the kernel) is mapped to which neuron in the first conv layer?
From the code example above, for conv layer 1:
Batch size = 1
Input image size = 64*64*3
Kernel size = 4*4*3 ==> Taken from W1
Number of kernels = 8 ==> Taken from W1
Padding = same
Stride = 1
Stride = 1 means that you slide the kernel one pixel at a time. Let's consider the x axis and number the pixels 1, 2, 3, 4, ..., 64.
With valid padding, the first neuron sees pixels 1, 2, 3 and 4; then the kernel is shifted by one pixel and the next neuron sees pixels 2, 3, 4 and 5; the last neuron sees pixels 61, 62, 63 and 64.
With same padding, TensorFlow adds 3 zero-padded pixels in total for a kernel of size 4 (one before pixel 1 and two after pixel 64). The first neuron sees pixels 0 (zero padded), 1, 2 and 3, the second neuron sees pixels 1, 2, 3 and 4, and the last neuron sees pixels 63, 64 and two zero-padded pixels.
With same padding, you end up with an output of the same spatial size as the image (64 x 64 x 8). With valid padding, the output is 61 x 61 x 8.
The 8 in the output is the number of filters.
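The mapping between output neurons and input pixels along one axis can be sketched in plain Python. The function below is a hypothetical helper (its name and the use of None for zero padding are my choices, not from the course); it assumes TensorFlow's convention for 'SAME' padding, where any odd extra padded pixel is placed at the end.

```python
def receptive_fields(in_size, kernel, stride, padding):
    """Return, for each output position, the 1-based input pixels it sees.

    Positions outside 1..in_size are zero padding and are reported as None.
    """
    if padding == "VALID":
        out_size = (in_size - kernel) // stride + 1
        pad_before = 0
    elif padding == "SAME":
        out_size = -(-in_size // stride)  # ceil(in_size / stride)
        pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
        pad_before = pad_total // 2       # any odd extra pixel goes after
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    fields = []
    for o in range(out_size):
        start = o * stride - pad_before + 1  # 1-based index of the first pixel
        window = [p if 1 <= p <= in_size else None
                  for p in range(start, start + kernel)]
        fields.append(window)
    return fields

valid = receptive_fields(64, 4, 1, "VALID")
same = receptive_fields(64, 4, 1, "SAME")
print(len(valid), valid[0], valid[-1])  # 61 [1, 2, 3, 4] [61, 62, 63, 64]
print(len(same), same[0], same[-1])     # 64 [None, 1, 2, 3] [63, 64, None, None]
```

The same windows apply independently along the y axis, and every one of the 8 filters is applied to each window.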
2> How is the input image passed on to the neurons in the first convolution layer? Is the complete input image passed on to each neuron (like in a fully connected ANN, where all the input features are mapped to each neuron in the first hidden layer)?
Each neuron looks at only a portion of the input image. Please refer to the answer to the first question to see how the input image maps to the neurons.
3> Is the number of kernels specified in the example above (W1 = [4, 4, 3, 8]) per neuron, or is it the total number of kernels in the first conv layer?
It is the total number of kernels for that layer, and all the neurons in that layer share the same kernels to learn about different portions of the input image. Hence, in a convnet the number of parameters to be learned is smaller than in a fully connected ANN.
4> How do we know how many neurons are used by the example above in the first convolution layer?
It depends on the input size (the output of the previous layer L-1), the kernel size used in layer L, the padding used in layer L and the stride used in layer L. Please refer to the answer to the first question above for more detail.
5> Is there any relationship between the number of neurons and the number of kernels in the first conv layer?
There is no relationship with respect to the numbers, but each neuron uses n filters/kernels (these kernels are shared among all the neurons in a particular layer) to learn more about a particular portion of the input image.
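To put numbers on the weight-sharing argument, here is a small arithmetic sketch for the first conv layer of the example. The fully connected layer it compares against is hypothetical (it is not in the course code) and is only there to show what the same number of outputs would cost without sharing.

```python
# Shapes taken from the example: a 64 x 64 x 3 input and W1 = [4, 4, 3, 8].
in_h, in_w, in_c = 64, 64, 3
k_h, k_w, n_filters = 4, 4, 8

# Convolution: one 4 x 4 x 3 kernel per filter, shared by every position.
conv_weights = k_h * k_w * in_c * n_filters
conv_biases = n_filters

# Output of the layer with stride 1 and 'SAME' padding: 64 x 64 x 8 values.
out_units = in_h * in_w * n_filters

# Hypothetical fully connected layer with the same number of outputs:
# every output unit gets its own weight for every input value.
dense_weights = (in_h * in_w * in_c) * out_units

print(conv_weights, conv_biases)  # 384 8
print(out_units)                  # 32768
print(dense_weights)              # 402653184
```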
The sample code below helps clarify the internal implementation of the convolution operation.
import numpy as np
# zero_pad and conv_single_step are course helpers that are not shown in this post;
# the two definitions below are assumed minimal versions so that the example runs.
def zero_pad(X, pad):
    # pad only the height and width axes of a (m, n_H, n_W, n_C) array with zeros
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode='constant', constant_values=0)
def conv_single_step(a_slice_prev, W, b):
    # element-wise product of one slice with one filter, summed, plus the bias
    return np.sum(a_slice_prev * W) + b.item()
def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function
    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"
    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    """
    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape
    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']
    # Compute the dimensions of the CONV output volume: floor((n_prev - f + 2*pad) / stride) + 1
    n_H = int(np.floor((n_H_prev - f + 2 * pad) / stride)) + 1
    n_W = int(np.floor((n_W_prev - f + 2 * pad) / stride)) + 1
    # Initialize the output volume Z with zeros
    Z = np.zeros((m, n_H, n_W, n_C))
    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)
    for i in range(m):                      # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]          # select the ith training example's padded activation
        for h in range(n_H):                # loop over the vertical axis of the output volume
            for w in range(n_W):            # loop over the horizontal axis of the output volume
                for c in range(n_C):        # loop over channels (= #filters) of the output volume
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the (3D) slice of a_prev_pad
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Convolve the (3D) slice with filter c and its bias to get one output neuron
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])
    return Z
A_prev = np.random.randn(1, 64, 64, 3)
W = np.random.randn(4, 4, 3, 8)
# Don't worry about the bias, TensorFlow will take care of this.
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad": 1,
               "stride": 1}
Z = conv_forward(A_prev, W, b, hparameters)
# With a symmetric pad of 1 and f = 4, Z has shape (1, 63, 63, 8);
# this is not identical to TensorFlow's 'SAME' padding, which gives 64 x 64.
Assume we have two TensorFlow tensors:
input and weights.
input is a tensor of n images, say. So its shape is [n, H, W, C].
weights is a simple list of n scalar weights: [w1 w2 ... wn]
The aim is to scalar-multiply each image by its corresponding weight.
How would one do that?
I tried to use tf.nn.conv2d with 1x1 kernels, but I do not know how to reshape our rank-1 weight tensor into the required rank-4 kernel tensor.
Any help would be appreciated.
Thanks to user zihaozhihao:
The answer is to change the shape of weights to (-1, 1, 1, 1) and then multiply it with input.
weights = tf.reshape(weights, (-1, 1, 1, 1))
weighted_input = input * weights
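The reshape works because of broadcasting: the size-1 axes of the weights are stretched over H, W and C. A minimal NumPy sketch of the same idea, with small made-up shapes chosen only for illustration:

```python
import numpy as np

n, H, W, C = 3, 2, 2, 1
images = np.ones((n, H, W, C))       # stand-in for `input`
weights = np.array([1.0, 2.0, 3.0])  # one scalar per image

# Shape (n,) -> (n, 1, 1, 1); the size-1 axes broadcast over H, W and C.
weighted = images * weights.reshape(-1, 1, 1, 1)

print(weighted.shape)        # (3, 2, 2, 1)
print(weighted[:, 0, 0, 0])  # [1. 2. 3.]
```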
When taking the one dimensional convolution of a one dimensional array, I receive an error which suggests my second dimension is not big enough.
Here is the overview of the relevant code:
inputs_ = tf.placeholder(tf.float32 ,(None, 45), name='inputs')
x1 = tf.expand_dims(inputs_, axis=1)
x1 = tf.layers.conv1d(x1, filters=64, kernel_size=1, strides=1, padding='valid')
I am hoping to increase the kernel size to 3 such that neighbouring points also influence the output of each input node, however I get the following error:
ValueError: Negative dimension size caused by subtracting 3 from 1 for
'conv1d_4/convolution/Conv2D' (op: 'Conv2D') with input shapes:
[?,1,1,45], [1,3,45,64].
My guess is that TensorFlow expects me to reshape my input into two dimensions so that some depth can be used for the kernel multiplication. The question is why this is the case, and what layer behaviour to expect based on the input dimensions.
You need to add a channel dimension as the last dimension, even if you only have one channel.
So this code works:
inputs_ = tf.placeholder(tf.float32 ,(None, 45), name='inputs')
x1 = tf.expand_dims(inputs_, axis=-1)
x1 = tf.layers.conv1d(x1, filters=64, kernel_size=3, strides=1, padding='valid')
So basically, the error was caused because your tensor looked as if it had a width of 1 with 45 channels. TensorFlow was trying to convolve with a kernel of size 3 along a dimension of size 1.
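The shape arithmetic behind the error can be sketched without TensorFlow. The function below is a hypothetical helper, not a TensorFlow API; it assumes the (batch, width, channels) layout and 'valid' padding used in the question.

```python
def conv1d_output_shape(input_shape, filters, kernel_size, strides=1):
    """Shape after a 'valid' 1-D convolution on (batch, width, channels)."""
    batch, width, _channels = input_shape
    if width < kernel_size:
        raise ValueError(
            f"kernel_size {kernel_size} does not fit into width {width}")
    out_width = (width - kernel_size) // strides + 1
    return (batch, out_width, filters)

# expand_dims(axis=-1): 45 positions, 1 channel, so a kernel of size 3 fits.
print(conv1d_output_shape((None, 45, 1), filters=64, kernel_size=3))  # (None, 43, 64)

# expand_dims(axis=1): 1 position, 45 channels, so a kernel of size 3 does not fit.
try:
    conv1d_output_shape((None, 1, 45), filters=64, kernel_size=3)
except ValueError as err:
    print("error:", err)
```

With kernel_size=1 the second layout does not fail, which is why the original code ran until the kernel size was increased.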
A Keras Dense layer needs an input_dim or input_shape to be specified. What value do I put in there?
My input is a matrix of 1,000,000 rows and only 3 columns. My output is 1,600 classes.
What do I put there?
dimensionality of the inputs (1000000, 1600)
2 because it's a 2D matrix
input_dim is the number of features per sample, which in your case is just 3. The equivalent notation for input_shape, which is an actual shape tuple, is (3,).
In your case,
let's assume x and y (the target variable) look as follows after feature engineering:
x.shape
(1000000, 3)
y.shape
(1000000, 1600)
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(x.shape[1],))) # Input layer; input_shape must be a tuple
# now the model will take as input arrays of shape (*, 3)
# and output arrays of shape (*, 32)
...
...
model.add(Dense(y.shape[1], activation='softmax')) # Output layer
y.shape[1] = 1600 is the number of outputs, which is the number of classes you have, since you are dealing with classification.
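To see how these shapes flow through the two layers, here is a NumPy-only sketch of the underlying matrix products. It is not Keras code: the weights are random placeholders, 8 rows stand in for the 1,000,000 samples, and the hidden activation is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_hidden, n_classes = 8, 3, 32, 1600

x = rng.normal(size=(n_samples, n_features))

# Dense(32, input_shape=(3,)): kernel of shape (3, 32), bias of shape (32,)
W1, b1 = rng.normal(size=(n_features, n_hidden)), np.zeros(n_hidden)
# Dense(1600, activation='softmax'): kernel of shape (32, 1600), bias of shape (1600,)
W2, b2 = rng.normal(size=(n_hidden, n_classes)), np.zeros(n_classes)

h = x @ W1 + b1                               # (8, 32)
logits = h @ W2 + b2                          # (8, 1600)
logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

print(h.shape, probs.shape)  # (8, 32) (8, 1600)
```

The number of rows never appears in a layer definition; only the per-sample feature count (3) and the number of classes (1600) do.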
X = dataset.iloc[:, 3:13]
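means that X contains all the rows and the columns from index 3 up to and including index 12 (index 13 is excluded), i.e. 10 columns.
Keras adds the bias term (X0) inside the layer itself, so it does not count towards input_dim; input_dim must equal the number of columns actually fed to the layer.
The value 11 used below therefore only fits if X ends up with 11 columns after preprocessing.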
Dense(input_dim = 11, activation = 'relu', kernel_initializer = 'he_uniform')