I'm using Keras with TensorFlow 2. I have a trained model and can read out the weights of each layer, but the shape of some Conv1D layers confused me.
I set the convolutional layers to have 64 filters of length 16, but the shape of my weight tensor ends up being (16, 64, 64).
Can someone explain this to me? I suppose 16 is the length of every filter and the last 64 is my num_filters, but what is the other dimension? How is this 3-dimensional? It should be (16, 64) or something.
And besides, isn't it odd to specify the length of every filter on the z-axis? (Assuming, of course, the computer-science convention of ordering dimensions as (z, x, y) instead of (x, y, z).)
What I get is something like this:
name: conv1d/kernel:0  shape: (16, 64, 64)  dtype: <dtype: 'float32'>  numpy=...
Thank you guys in advance.
To answer my own question: the first 64 corresponds to the depth (number of input channels) of the data the layer receives. For instance, if you want 5 filters of length 32 for data that has 10 features (in other words, the input depth of the conv layer is 10), your weight variable will have shape (32, 10, 5).
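A minimal sketch that reproduces this layout (the layer sizes mirror the example above; all names are mine):

import tensorflow as tf

# Sequences of length 100 with 10 features (input channels) per step.
inputs = tf.keras.Input(shape=(100, 10))
outputs = tf.keras.layers.Conv1D(filters=5, kernel_size=32)(inputs)
model = tf.keras.Model(inputs, outputs)

# Kernel layout is (kernel_size, in_channels, filters):
print(model.layers[1].kernel.shape)  # (32, 10, 5)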
I am trying to understand how values/losses are calculated in the following scenario while training an LSTM network:
The input shape for the LSTM is (64, 5, 66, 150, 3): sequences of 5 RGB images (66×150 each), with a batch size of 64.
The corresponding output shape is (64, 5): each input (5 images) has an output of 5 values, i.e., one integer value per image.
Question:
Training runs successfully for the following last-layer scenarios:
1) Dense(5) and 2) Dense(1)
It should ideally fail for any other Dense(n) layer with a shape-mismatch error.
How does Dense(1) work if I have 5 fixed reference values per input? How are the losses and the final value calculated? What is this final value in this scenario? Is it just the average of all 5 (or n) values?
Kindly help me better understand this.
I'm using Python 2 with Keras and TensorFlow.
x = Input((32,), name="input1")
I think x's shape should be (32,), but print(x) reports shape=(?, 32).
What does shape=(?, 32) mean? What do the ? and the 32 stand for?
When you define your input with Input((32,), name="input1"), you are telling Keras that each individual input sample will be 1-dimensional with size 32. However, you might send in more than one sample during training/prediction. For example, if you send in 10 samples, each of length 32, you will actually send in a tensor with shape (10, 32).
Since the topology of the network does not depend on the number of samples you send in, that dimension may vary, so the shape is presented as (?, 32), where ? stands for the number of samples (the batch size).
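A small sketch of this in action (the Dense layer is just there to make a runnable model; all names besides input1 are illustrative):

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

x = Input((32,), name="input1")    # per-sample shape: (32,)
y = Dense(1)(x)
model = Model(x, y)

# Ten samples of length 32 form one tensor of shape (10, 32).
batch = np.random.rand(10, 32)
print(model.predict(batch).shape)  # (10, 1)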
I'm working on a problem using Keras that has been presenting me with issues:
My X data is all of shape (num_samples, 8192, 8), but my Y data is of shape (num_samples, 4), where 4 is a one-hot encoded vector.
Both X and Y data will be run through LSTM layers, but the layers are rejecting the Y data because it doesn't match the shape of the X data.
Is padding the Y data with 0s so that it matches the dimensions of the X data unreasonable? What kind of effects would that have? Is there a better solution?
Edited for clarification:
As requested, here is more information:
My Y data represents the expected output of passing the X data through my model. This is my first time working with LSTMs, so I don't have an architecture in mind, but I'd like to use one that works well for classifying long (8192-element) sequences of words into one of several categories. Additionally, my dataset is immense when fed through an LSTM, so I'm currently using batch training.
Technologies being used:
Keras (TensorFlow backend)
TL;DR Is padding one tensor with zeroes in all dimensions to match another tensor's shape a bad idea? What could be a better approach?
First of all, let's make sure your representation is actually what you think it is; the input to an LSTM (or any recurrent layer, for that matter) must have per-sample dimensionality (timesteps, features). That is, if you have 1000 training samples, each consisting of 100 timesteps with 10 values per timestep, each sample's input shape will be (100, 10) and the whole set will be (1000, 100, 10). Therefore I assume from your question that each input sample in your X set has 8192 steps and 8 values per step. Great; a single LSTM layer can iterate over these and produce 4-dimensional representations with absolutely no problem, just like so:
from keras.layers import Input, LSTM

myLongInput = Input(shape=(8192, 8))  # (timesteps, features) per sample
myRecurrentFunction = LSTM(4)         # returns only the last hidden state
myShortOutput = myRecurrentFunction(myLongInput)
myShortOutput.shape
# TensorShape([Dimension(None), Dimension(4)])
I assume your problem stems from trying to apply yet another LSTM on top of the first one; the next LSTM expects a tensor that has a time dimension, but your output has none. If that is the case, you'll need to let your first LSTM also output the intermediate representations at each time step, like so:
myNewRecurrentFunction = LSTM(4, return_sequences=True)  # keep per-timestep outputs
myLongOutput = myNewRecurrentFunction(myLongInput)
myLongOutput.shape
# TensorShape([Dimension(None), Dimension(None), Dimension(4)])
As you can see, the new output is now a 3rd-order tensor, with the second dimension being the (yet unassigned) timesteps. You can repeat this process until your final output, where you usually don't need the intermediate representations but only the last one. (Side note: make sure to set the activation of your last layer to softmax if your output is in one-hot format.)
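For completeness, a minimal sketch of such a stack for the (8192, 8) → 4-class setup in your question (the layer widths are arbitrary):

from keras.layers import Input, LSTM, Dense
from keras.models import Model

inputs = Input(shape=(8192, 8))
hidden = LSTM(32, return_sequences=True)(inputs)  # keeps the time dimension
hidden = LSTM(32)(hidden)                         # returns only the last state
outputs = Dense(4, activation='softmax')(hidden)  # one-hot targets -> softmax

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')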
On to your original question: zero-padding has very little negative impact on the network. It will strain a bit at the beginning while it figures out the extra values you have thrown at it, but it will very soon learn that they are meaningless. This comes at the cost of a larger parameter space (and therefore more time and memory), but it doesn't really affect predictive power most of the time.
I hope that was helpful.
I've been going through the docs recently, and in many different functions such as tf.layers.dense or tf.nn.conv2d I came across the arguments units and filters respectively, and I can't understand the point of them. Can someone clearly describe the meaning of
dimensionality of the output space
in the above cases or maybe more general terms? Thanks in advance.
In my opinion:
units in tf.layers.dense:
means how many output nodes the dense layer should return.
A fully connected (dense) layer consists of inputs and outputs,
so the "dimensionality of the output space" translates to the number of output nodes.
If units = 1, all the input nodes are connected to one output node.
In Inception v3 and other classifier models, you will find that the units of the final dense layer is always the number of classes.
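For example (using the tf.keras equivalent of tf.layers.dense; the sizes are arbitrary):

import tensorflow as tf

x = tf.random.normal([8, 20])          # batch of 8 samples, 20 features each
y = tf.keras.layers.Dense(units=3)(x)  # units = 3 output nodes
print(y.shape)                         # (8, 3)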
filters in tf.nn.conv2d:
As stated in the API doc:
filter: A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]
Maybe the confusing point is out_channels.
I understand out_channels as how many filters we want to scan the input tensor with,
so out_channels is the number of kernels.
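A short sketch of that reading of out_channels (all shapes are made up):

import tensorflow as tf

images = tf.random.normal([1, 28, 28, 3])  # NHWC: one 28x28 RGB image
# filter shape: [filter_height, filter_width, in_channels, out_channels]
kernel = tf.random.normal([5, 5, 3, 16])   # 16 filters scanning 3 channels
out = tf.nn.conv2d(images, kernel, strides=1, padding='SAME')
print(out.shape)                           # (1, 28, 28, 16)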
I'm following the Udacity MNIST tutorial, where each MNIST image is originally a 28×28 matrix. However, right before feeding the data in, they flatten it into a 1-D array with 784 columns (784 = 28 × 28).
For example,
the original training set shape was (200000, 28, 28):
200000 rows (samples), each a 28×28 matrix.
They converted this into a training set of shape (200000, 784).
Can someone explain why they flatten the data before feeding it to TensorFlow?
Because when you're adding a fully connected layer, you always want your data to be a (1 or) 2 dimensional matrix, where each row is the vector representing your data. That way, the fully connected layer is just a matrix multiplication between your input (of size (batch_size, n_features)) and the weights (of shape (n_features, n_outputs)) (plus the bias and the activation function), and you get an output of shape (batch_size, n_outputs). Plus, you really don't need the original shape information in a fully connected layer, so it's OK to lose it.
It would be more complicated and less efficient to get the same result without reshaping first, which is why we always do it before a fully connected layer. For a convolutional layer, by contrast, you'll want to keep the data in its original format (width, height).
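A tiny sketch of the reshape (1000 samples stand in for the 200000 in the question):

import numpy as np

train = np.random.rand(1000, 28, 28)      # (samples, 28, 28)
flat = train.reshape(train.shape[0], -1)  # flatten each 28x28 image to 784 values
print(flat.shape)                         # (1000, 784)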
That is a convention with fully connected layers. Fully connected layers connect every node in the previous layer with every node in the successive layer so locality is not an issue for this type of layer.
Additionally, by defining the layer like this we can efficiently calculate the next step with the formula f(Wx + b) = y. This would not be as easy with multidimensional input, and reshaping the input is low-cost and simple to accomplish.
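A quick shape check of that formula (the numbers are arbitrary; with row-vector samples the product is written xW):

import numpy as np

batch_size, n_features, n_outputs = 32, 784, 10
x = np.random.rand(batch_size, n_features)
W = np.random.rand(n_features, n_outputs)
b = np.random.rand(n_outputs)

y = x @ W + b   # f(Wx + b) without the activation f
print(y.shape)  # (32, 10)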