Changing the number of interneurons in Dense - tensorflow

I am a novice in TensorFlow and Keras. I have the code below, but I do not know why I get an error when I change the 1 in Dense to 10 (Dense(10)). I think I should be able to change the number of neurons in each layer arbitrarily. How should I change the number of neurons in a Dense layer? And if I want to add more Dense layers, is there any rule for the number of neurons in them?
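from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import SGD
model=Sequential()
model.add(Dense(1029, input_dim=29))
model.add(Activation('tanh'))
model.add(Dense(1))  # changing this to Dense(10) triggers the error below
model.add(Activation('sigmoid'))
#model.add(Dropout(0.2))
sgd=SGD(lr=0.1)
model.compile(loss='binary_crossentropy', optimizer=sgd)
model.fit(input, target, steps_per_epoch=4, epochs=1000)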
error:
ValueError: Error when checking target: expected activation_65 to have shape (10,) but got array with shape (1,)

I figured out the problem and am posting it here for anyone who might face the same issue. The reason is that the last layer must have 1 neuron to match my output. My input has 1029 rows and 29 columns, and my target has 1029 rows (one value per row). I can add more Dense layers with an arbitrary number of neurons before the last one.
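The rule in the answer above can be sketched without Keras: each Dense layer replaces only the last axis of its input with its number of units, so the units of the final layer have to equal the size of one target entry. The helper name `dense_stack_output_shape` is made up for this illustration and is not a Keras function.

```python
def dense_stack_output_shape(input_shape, units_per_layer):
    """Per-sample output shape after a stack of Dense layers.

    Each Dense layer replaces only the last axis with its number of units.
    """
    shape = tuple(input_shape)
    for units in units_per_layer:
        shape = shape[:-1] + (units,)
    return shape


input_shape = (29,)   # 29 features per row, as in the question
target_shape = (1,)   # one label per row

# Hidden layer sizes are free; only the last entry has to match the target.
for units in ([1029, 1], [1029, 10], [1029, 64, 32, 1]):
    out = dense_stack_output_shape(input_shape, units)
    status = "matches target" if out == target_shape else "mismatch"
    print(units, "->", out, status)
```

The second configuration is the one that produced the ValueError in the question; the third shows that extra hidden layers of any width are fine as long as the stack ends in a single unit.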

Related

Tensorflow Keras output layer shape weird error

I am fairly new to TF, Keras and ML in general.
I am trying to implement a very simple MLP with an input shape of (batch_size,3,2) and an output shape of (batch_size,3), that is (if I got it right): for every 3x2 feature, there is a corresponding 3 value array label.
Here is how I create the model:
model = tf.keras.Sequential([
tf.keras.layers.Dense(50,tf.keras.activations.relu,input_shape=(3,2)),
tf.keras.layers.Dense(3)
])
and these are the X and y shapes:
X_train.shape,y_train.shape
TensorShape([64,3,2]),TensorShape([64,3])
On model.fit I am facing a weird error I cannot understand:
ValueError: Dimensions must be equal, but are 3 and 32 for ... with input shapes: [32,3,3] and [32,3]
I have no clue what's going on. I understand the batch size is 32, but where does that [32,3,3] come from?
Moreover, if from the original 64, I lower the number (shapes) of X_train and y_train, say, to: (19,3,2) and (19,3), I get the following error instead:
InvalidArgumentError: required broadcastable shapes at loc(unknown)
What's even weirder to me is that if I specify a single unit for the output (last) layer instead of 3, like this:
model = tf.keras.Sequential([
tf.keras.layers.Dense(50,tf.keras.activations.relu,input_shape=(3,2)),
tf.keras.layers.Dense(1)
])
model.fit works, but the predictions have shape (1,3,1) instead of my expected (3,)
I am very confused.
Whenever you are unsure how data flows through your model, use model.summary() to see the details and how the shape of the data changes at each layer.
In this case, each input sample is a 2D array and each output is a 1D array, and you used only Dense layers. Dense layers do not collapse 2D features on their own: they act on the last axis only. For example, you cannot feed an image directly to a Dense layer and expect a vector back. Instead, use other layers such as Conv2D, or Flatten your input (make it 1D) before feeding your data to the Dense layer. Otherwise the extra dimension carries through to the output.
Takeaway: if your input and output dimensions differ, the shape needs to change somewhere in your model. The most common ways to do so are a Flatten layer, GlobalAveragePooling, and so on.
When you pass an input to a Dense layer, the input should be flattened first. There are two ways to deal with this:
Way 1: Adding a Flatten layer as the first layer of your model:
model = Sequential()
model.add(Flatten(input_shape=(3,2)))
model.add(Dense(50, 'relu'))
model.add(Dense(3))
Way 2: Converting each 2D sample to 1D before passing the inputs to your model (the -1 keeps the batch axis):
X_train = tf.reshape(X_train, shape=[-1, 6])
or
X_train = tf.reshape(X_train, shape=(-1, 6))
Then change the input shape of the first layer to:
model.add(Dense(50, 'relu', input_shape=(6,)))
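For the second approach, the reshape has to keep the batch axis and merge only the per-sample axes. A minimal NumPy sketch follows; the array is a stand-in for the question's `X_train`, not the asker's data.

```python
import numpy as np

# Stand-in for X_train from the question: 64 samples of shape (3, 2).
X_train = np.arange(64 * 3 * 2, dtype=np.float32).reshape(64, 3, 2)

# Keep axis 0 (the batch) and merge the remaining axes: (64, 3, 2) -> (64, 6).
X_flat = X_train.reshape(len(X_train), -1)
print(X_flat.shape)  # (64, 6)

# Each row holds the six values of one original 3x2 sample, in row-major order.
assert np.array_equal(X_flat[0], X_train[0].ravel())

# Reshaping the whole array to one 6-vector cannot work: 384 values do not fit.
try:
    X_train.reshape(6)
except ValueError as err:
    print("reshape to (6,) fails:", err)
```

With the data in this layout, a first layer declared with input_shape=(6,) receives one flat vector per sample.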

1D convolution on flat, one-dimensional data (i.e. no timeseries)

I am training on a dataset in which (some of) the neighboring features exhibit very strong correlations. To help the neural network, I am thinking of adding some 1D convolutions as the first layers. Even though 1D convolutions are mostly used for time series/NLP data, I see no theoretical reason why they cannot be applied vector-wise to any type of data.
But I am not able to make keras.layers.Conv1D work, since it's apparently designed for time-series data. A minimal reproducible example is the following:
model = keras.Sequential([
keras.layers.Input(10,),
keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name="conv_1"),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss=losses.categorical_crossentropy, metrics=['accuracy'])
ValueError: Input 0 of layer conv_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 10]
In that, I believe the "found ndim=2" corresponds to a tensor of [batch_size, 10] while it expects a tensor of shape [series_length, batch_size, 10] (or some other way around).
My question is: Is there a way to make 1D convolutions work in this situation in keras?
Note 1: this SO question has the same problem, though without elaborating and the accepted answer does not solve the problem.
Note 2: I suppose I can convert each datapoint of my dataset to a 2D tensor of two rows where the second would be just 0's and use Conv2D's, but I would like to avoid that.
All Conv layers in Keras reserve one dimension for the number of channels. For example, an image has 2 dimensions, but Conv2D needs 3 dimensions (not counting the batch). The reason is simply that the image can have one channel (grayscale) or, for example, three (color). The same is true for a 1D signal, which can have any number of channels. You can simply add one dimension to your data. If you have a NumPy array:
data = data[:, np.newaxis, :]
and set channels_first:
keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name="conv_1", data_format="channels_first")
You can do the same by adding the extra dimension at the end and setting data_format="channels_last".
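To see what the added channel axis does to the shapes, here is a small sketch of the usual output-size arithmetic for a 1D convolution. It assumes 'valid' padding, stride 1 and a channels-last layout; `conv1d_valid` is a hypothetical helper written for this illustration, not part of Keras.

```python
def conv1d_valid(steps, channels, filters, kernel_size, stride=1):
    """Output shape and parameter count of a 1D convolution.

    Assumes 'valid' padding and channels-last input of shape (steps, channels).
    """
    if kernel_size > steps:
        raise ValueError("kernel_size must not exceed the number of steps")
    out_steps = (steps - kernel_size) // stride + 1
    params = kernel_size * channels * filters + filters  # weights + biases
    return (out_steps, filters), params


# The question's case: 10 features treated as 10 steps with a single channel.
shape, params = conv1d_valid(steps=10, channels=1, filters=64, kernel_size=3)
print(shape, params)  # (8, 64) 256
```

Under these assumptions the convolution output is still two-dimensional per sample, so a Flatten or pooling layer would be needed before the Dense layers if a single prediction vector per sample is wanted.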

np array shape for conv1d input

I have a model with conv1d as the first layer.
My data is time series data where each sample consists of 41 time steps where each time step has 4 features.
I have about 1000 samples.
I have specified the input shape of the Conv1D layer to be (41,4), as it is supposed to be.
However, I keep getting the following error: Input 0 is incompatible with layer conv1d_48: expected ndim=3, found ndim=2.
I suspect that the problem is that the shape of X is (1000,) while the shape of X[0] is (41,4). Has anyone encountered this problem?
Thanks.
l1=Input(shape=(41,4))
x=Conv1D(64,(4))(l1)
x=GlobalMaxPooling1D()(x)
x=Dense(1)(x)
model=Model(l1,x)
model.compile('rmsprop','binary_crossentropy',metrics=['acc'])
model.fit(X,y,32,10)
You defined the expected input of your Conv1D to be 2D -> (41, 4)
But you give it an input of shape (41,). Be consistent in your definitions!
If you specify the input_shape in your Conv1D layer, you don't need to feed an Input layer to it.
Or you can change the shape of this Input layer to be consistent with this input_shape.
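If the asker's suspicion is right and X is a 1-D object array whose elements are separate (41, 4) arrays, one way to get a single (1000, 41, 4) array is to stack the samples. The data below is a zero-filled stand-in built only to reproduce that layout.

```python
import numpy as np

# Hypothetical stand-in for the data in the question: a 1-D object array
# whose 1000 elements are each a (41, 4) array.
samples = [np.zeros((41, 4), dtype=np.float32) for _ in range(1000)]
X = np.empty(len(samples), dtype=object)
for i, sample in enumerate(samples):
    X[i] = sample

print(X.shape, X[0].shape)  # (1000,) (41, 4)

# Stack the samples along a new leading axis to get one dense 3D array.
X_stacked = np.stack(list(X))
print(X_stacked.shape, X_stacked.dtype)  # (1000, 41, 4) float32
```

This only works when every sample has the same shape; samples of different lengths would first need padding or truncation.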

Why not use Flatten followed by a Dense layer instead of TimeDistributed?

I am trying to understand the Keras layers better. I am working on a sequence-to-sequence model where I embed a sentence and pass it to an LSTM that returns sequences. After that, I want to apply a Dense layer to each timestep (word) in the sentence, and it seems like TimeDistributed does the job for three-dimensional tensors like in this case.
In my understanding, Dense layers only work for two-dimensional tensors and TimeDistributed just applies the same dense on every timestep in three dimensions. Could one then not simply flatten the timesteps, apply a dense layer and perform a reshape to obtain the same result or are these not equivalent in some way that I am missing?
Imagine you have a batch of 4 time steps, each containing a 3-element vector. Let's represent that with this:
Now you want to transform this batch using a dense layer, so you get 5 features per time step. The output of the layer can be represented as something like this:
You consider two options, a TimeDistributed dense layer, or reshaping as a flat input, apply a dense layer and reshaping back to time steps.
In the first option, you would apply a dense layer with 3 inputs and 5 outputs to every single time step. This could look like this:
Each blue circle here is a unit in the dense layer. By doing this with every input time step you get the total output. Importantly, these five units are the same for all the time steps, so you only have the parameters of a single dense layer with 3 inputs and 5 outputs.
The second option would involve flattening the input into a 12-element vector, applying a dense layer with 12 inputs and 20 outputs, and then reshaping that back. This is how it would look:
Here the input connections of only one unit are drawn for clarity, but every unit would be connected to every input. Here, obviously, you have many more parameters (those of a dense layer with 12 inputs and 20 outputs), and also note that each output value is influenced by every input value, so values in one time step would affect outputs in other time steps. Whether this is something good or bad depends on your problem and model, but it is an important difference with respect to the previous, where each time step input and output were independent. In addition to that, this configuration requires you to use a fixed number of time steps on each batch, whereas the previous works independently of the number of time steps.
You could also consider the option of having four dense layers, each applied independently to each time step (I didn't draw it but hopefully you get the idea). That would be similar to the previous one, only each unit would receive input connections only from its respective time step inputs. I don't think there is a straightforward way to do that in Keras, you would have to split the input into four, apply dense layers to each part and merge the outputs. Again, in this case the number of time steps would be fixed.
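The parameter counts of the three options described above (4 time steps, 3 input features, 5 output features per step) can be worked out directly. This is plain arithmetic, assuming every dense layer has one weight per input-unit pair plus one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


timesteps, in_features, out_features = 4, 3, 5

# Option 1: one dense layer (3 -> 5) shared by all time steps.
shared = dense_params(in_features, out_features)

# Option 2: flatten to 12 inputs, dense to 20 outputs, reshape back.
flattened = dense_params(timesteps * in_features, timesteps * out_features)

# Option 3: a separate dense layer (3 -> 5) for each of the 4 time steps.
per_step = timesteps * dense_params(in_features, out_features)

print(shared, flattened, per_step)  # 20 260 80
```

Only the first count is independent of the number of time steps, which matches the remark that the other two options need a fixed sequence length.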
A Dense layer can act on any tensor, not necessarily one of rank 2. And I think that the TimeDistributed wrapper does not change anything in the way the Dense layer acts. Applying a Dense layer to a tensor of rank 3 does exactly the same as applying the TimeDistributed wrapper around the Dense layer. Here is an illustration:
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
model = Sequential()
model.add(Dense(5,input_shape=(50,10)))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_5 (Dense)              (None, 50, 5)             55
=================================================================
Total params: 55
Trainable params: 55
Non-trainable params: 0
_________________________________________________________________
model1 = Sequential()
model1.add(TimeDistributed(Dense(5),input_shape=(50,10)))
model1.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
time_distributed_3 (TimeDist (None, 50, 5)             55
=================================================================
Total params: 55
Trainable params: 55
Non-trainable params: 0
_________________________________________________________________
Adding to the above answers, here are a few pictures comparing the output shapes of the two layers. Using one of these layers after an LSTM (for example) would therefore give different behavior.
"Could one then not simply flatten the timesteps, apply a dense layer and perform a reshape to obtain the same result"
No, flattening timesteps into input dimensions (input_dim) is the wrong operation. As illustrated by yuva-rajulu, if you flatten a 3D input (batch_size,timesteps,input_dim) = (1000,50,10), you end up with a flattened input (batch_size,input_dim)=(1000,500), resulting in a network architecture in which timesteps interact with each other (see jdehesa). This is not what is intended (i.e., we want to apply the same dense layer to each timestep independently).
What needs to be done instead is to reshape the 3D input as (batch_size * timesteps, input_dim) = (50000,10), then apply the dense layer to this 2D input. That way the same dense layer operates on each of the 50000 input vectors (of length 10) independently. You end up with a (50000,n_units) output that you should reshape back to a (1000,50,n_units) output. Fortunately, when you pass a 3D input to a dense layer, Keras does this automatically for you. See the official reference:
"If the input to the layer has a rank greater than 2, then Dense computes the dot product between the inputs and the kernel along the last axis of the inputs and axis 0 of the kernel (using tf.tensordot). For example, if input has dimensions (batch_size, d0, d1), then we create a kernel with shape (d1, units), and the kernel operates along axis 2 of the input, on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units)."
Another way to see it is that Dense() computes the output simply by applying the kernel, i.e., the weight matrix of size (input_dim, n_units), to the last dimension of your 3D input, treating all other dimensions like batch dimensions, and then sizes the output accordingly.
I think that there may have been a time when the TimeDistributed layer was needed in Keras with Dense() (discussion here). Today, we do not need the TimeDistributed wrapper, as Dense() and TimeDistributed(Dense()) do exactly the same thing; see Andrey Kite Gorin or mujjiga.
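The equivalence described in this answer can be checked numerically. The sketch below re-implements the two routes in NumPy rather than calling Keras, so it illustrates the arithmetic only; the sizes are chosen to match the (50, 10) input and 5 units of the summaries above, with a smaller batch.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, timesteps, input_dim, n_units = 8, 50, 10, 5

x = rng.normal(size=(batch_size, timesteps, input_dim))
kernel = rng.normal(size=(input_dim, n_units))
bias = rng.normal(size=(n_units,))

# Route 1: contract the last axis of x with axis 0 of the kernel,
# which is what the quoted documentation describes for rank-3 inputs.
direct = np.tensordot(x, kernel, axes=([2], [0])) + bias

# Route 2: merge batch and time axes, apply a 2D dense layer, split again.
flat = x.reshape(batch_size * timesteps, input_dim)
manual = (flat @ kernel + bias).reshape(batch_size, timesteps, n_units)

print(direct.shape)                 # (8, 50, 5)
print(np.allclose(direct, manual))  # True
print(kernel.size + bias.size)      # 55
```

The parameter count of 55 agrees with both model summaries shown earlier, since the kernel and bias do not depend on the number of time steps.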

Feed tensorflow or keras neural nets input with custom dimensions

I would like to feed a neural net inputs of the following shape:
Each training entry is a 2D array with dimensions 700x10. There are 204 training entries in total.
The labels are just a 1-dimensional array of size 204 (binary output).
I tried to just use Dense layers:
model = Sequential()
model.add(Dense(300, activation='relu', input_shape=(700, 10)))
model.add(Dense(1, activation='sigmoid'))
But then I get the following error (not related to input_shape on the first layer, but raised during validation of the output):
ValueError: Error when checking target: expected dense_2 to have 3 dimensions, but got array with shape (204, 1)
204 - amount of training data.
Stacktrace:
model.fit(xTrain, yTrain, epochs=4, batch_size=6)
File "keras\models.py", line 867, in fit
initial_epoch=initial_epoch)
File "keras\engine\training.py", line 1522, in fit
batch_size=batch_size)
File "keras\engine\training.py", line 1382, in _standardize_user_data
exception_prefix='target')
File "keras\engine\training.py", line 132, in _standardize_input_data
What I found out while debugging Keras code:
It fails during validation before training, while validating the output array.
According to the neural network structure, the first Dense layer somehow produces a (700, 1)-dimensional output, and it fails afterwards, since my output is just a 1D array with 204 entries.
How do I overcome this issue? I tried adding Flatten() after the Dense() layer, but that probably hurts accuracy: I would like to keep the information specific to each point of the 700-element array grouped together.
A Dense layer works on only one dimension, the last.
If you're inputting (700,10) to it, it will output (700,units). Check your model.summary() to see this.
A simple solution is to flatten your data before applying dense:
model.add(Flatten(input_shape=(700,10)))
model.add(Dense(300,...))
model.add(Dense(1,...))
This way, the Dense layer will see a simple (7000,) input.
Now, if you do want your model to understand those 2 dimensions separately, you should perhaps try more elaborate structures. What to do will depend a lot on what your data is, what you want to do, how you want your model to understand it, and so on.
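To make the difference between the two layouts in this answer concrete, the sketch below traces per-sample shapes and parameter counts for the question's 700x10 input with and without flattening. The helpers `dense` and `flatten` are simplified stand-ins written for this illustration, not the Keras layers themselves.

```python
def dense(shape, units):
    """Return (output_shape, parameter_count) for a Dense layer.

    Only the last axis of the per-sample shape is replaced by `units`.
    """
    params = shape[-1] * units + units
    return shape[:-1] + (units,), params


def flatten(shape):
    """Merge all per-sample axes into one."""
    size = 1
    for dim in shape:
        size *= dim
    return (size,)


# Without Flatten: the 700 axis survives to the output.
shape, p1 = dense((700, 10), 300)
shape, p2 = dense(shape, 1)
print("no flatten:", shape, p1 + p2)

# With Flatten first: one prediction per sample, but far more weights.
shape_f, q1 = dense(flatten((700, 10)), 300)
shape_f, q2 = dense(shape_f, 1)
print("flatten:   ", shape_f, q1 + q2)
```

The unflattened stack ends in a (700, 1) output, which is why the target check rejects labels of shape (204, 1); the flattened stack ends in (1,) at the cost of roughly 2.1 million weights in the first layer.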