Dropout only on specific column in Keras - tensorflow

I am training an autoencoder using Keras, with the encoder part as:
self.encoder = tf.keras.Sequential()
self.encoder.add(tf.keras.layers.Dropout(rate=0.2))
self.encoder.add(layers.Dense(14, activation='relu'))
self.encoder.add(layers.Dense(10, activation='relu'))
I am using Dropout at the start to create noise. My input is a 14-dimensional dataset. What Dropout does now is randomly drop 20% of the nodes each time, meaning 20% of the features are dropped at each step. What I would like to do is drop one specific feature, let's say feature_3 (I suppose this means dropping a specific node), with a probability of 20% in each training step.
Could this be done using Keras?
If yes then how?

I do think you misunderstand how Dropout works.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout
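What you describe is close to what Dropout already does: during training it zeroes each input feature independently with probability rate (and rescales the rest), so feature_3 is already dropped in about 20% of cases, along with the others. Also, keras.layers.Dropout does not "create noise".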
If you'd like to set the dropout mask:
noise_shape: 1D integer tensor representing the shape of the binary dropout mask that will be multiplied with the input. For instance, if your inputs have shape (batch_size, timesteps, features) and you want the dropout mask to be the same for all timesteps, you can use noise_shape=(batch_size, 1, features).
Note that noise_shape describes how the dropout mask is shared across the input's axes and is not related to adding/subtracting noise to your features.
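Note that noise_shape only controls how the mask is shared across axes; it does not let you pick which column is dropped. A minimal sketch of one way to target a single column follows, assuming a small custom layer. The name SingleFeatureDropout and its arguments are hypothetical, not part of Keras, and the draw is made per sample rather than once per training step:

```python
import tensorflow as tf


class SingleFeatureDropout(tf.keras.layers.Layer):
    """Hypothetical layer: zeroes one feature column with probability `rate`.

    The draw is made independently for every sample in the batch, and only
    while training; at inference the input passes through unchanged.
    """

    def __init__(self, feature_index, rate=0.2, **kwargs):
        super().__init__(**kwargs)
        self.feature_index = feature_index
        self.rate = rate

    def call(self, inputs, training=None):
        if not training:
            return inputs
        batch_size = tf.shape(inputs)[0]
        num_features = inputs.shape[-1]
        # 1.0 where this sample loses the chosen feature, else 0.0.
        uniform = tf.random.uniform(tf.stack([batch_size, 1]))
        drop = tf.cast(uniform < self.rate, inputs.dtype)
        # One-hot row selecting the target column, shape (num_features,).
        column = tf.one_hot(self.feature_index, num_features, dtype=inputs.dtype)
        # Mask is 0 only at (dropped sample, target column) and 1 elsewhere.
        return inputs * (1.0 - drop * column)
```

Such a layer could take the place of the Dropout layer at the start of the encoder, e.g. self.encoder.add(SingleFeatureDropout(feature_index=3, rate=0.2)). Unlike Dropout, this sketch does not rescale the surviving values.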

Related

Dropout implementation in tf.Keras

Consider the following model:
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Dense(30, activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
I understand the idea behind Dropout for regularization. According to my understanding, Dropout is applied per layer with a rate p which determines the probability of a neuron being dropped. In the above example, I cannot understand whether the first dropout layer is applied to the first hidden layer or the second hidden layer. Because as I have mentioned before, the dropout is applied per layer, and what confuses me here is that Keras deals with dropout as a layer on its own. Moreover, if the first dropout layer is applied to the second hidden layer, what about the second dropout layer? Is it applied to the output layer (which is not valid at all to apply dropout to output neurons)? So please can someone clarify these points?
As per documentation in keras:
Applies Dropout to the input.
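Therefore the Dropout layer acts on the output of the layer that feeds it: each of those activations is dropped with probability p. In your case that means the first Dropout applies to the first hidden layer. In your example, each of the 60 neurons of the first layer is dropped with probability 0.2, so about 20% of them are dropped on average at each training step.
It also would not make sense for dropout to act on the layer that follows it, because then you would drop units from the last layer, which in classification is the result itself.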
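To make the "applies to its input" point concrete, here is a rough NumPy sketch of what a Dropout(0.2) layer does during training to the activations it receives. The helper name inverted_dropout is made up for illustration; Keras' actual implementation differs in detail, but it likewise scales the surviving values by 1/(1 - rate):

```python
import numpy as np


def inverted_dropout(activations, rate, rng):
    """Zero each element with probability `rate`; scale survivors by 1/(1-rate)."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
hidden1 = np.ones((4, 60))  # stand-in for the output of the first Dense(60) layer
dropped = inverted_dropout(hidden1, rate=0.2, rng=rng)

# Number of zeroed units per sample: about 12 of 60 on average, not exactly 12.
print((dropped == 0).sum(axis=1))
```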

Improve multiclass text classification model with LSTM and Glove, Keras and Tensorflow

I have spent some time trying to improve the F1-score for my multiclass text classification task. I am extracting aspects and sentiments from laptop reviews. Therefore there are 3 labels, B_A / I_A / O etc. I would really appreciate any suggestions to improve my network, for example additional layers or another embedding. (Maybe I should also try something other than multiclass classification for my task.)
So far I have an F1-score of about 60% with the following code:
#vocab_size=4840, embedding is glove6B, max_seq_length=100
model = Sequential()
model.add(Embedding(vocab_size, 300, weights=[embedding_vectors], input_length=max_seq_length,
trainable= False))
model.add(Dropout(0.1))
model.add(Conv1D(3000, 1, activation='relu'))
model.add(Bidirectional(LSTM(units=150, recurrent_dropout=0, return_sequences=True)))
model.add(Dense(32, activation='relu'))
model.add(Dense(n_tags, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["categorical_accuracy"])
model.summary()
# fit model on train data
model.fit(x_train, y_train,
batch_size=64,
epochs=10)
I don't know about the data, but I do have several general suggestions for multiclass text classification with Keras:
Instead of a single Conv1D layer with 3000 filters, try stacking several Conv1D layers with fewer filters each.
For the 32-neuron Dense layer, try increasing the number of neurons. Often, when there are not enough neurons in the layer before the output layer, the model loses accuracy.
Instead of passing activation='relu' to the layers, try a LeakyReLU, which would fix the dying ReLU problem if it is present.
Instead of adding the Dropout after the Embedding layer, add it after the Conv1D layer. I don't see the need for a Dropout after an untrainable layer that only vectorizes the inputs.
If you haven't tried any of these suggestions already, I would recommend trying them. I would especially try the 4th one, as a Dropout after an Embedding layer doesn't seem necessary.
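Put together, the four suggestions might look roughly like the sketch below. The filter counts, kernel sizes and the width of the Dense layer are guesses for illustration, n_tags = 3 is assumed from the three labels mentioned in the question, and the pretrained GloVe weights are left out, so treat it as a starting point rather than a tuned model:

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, max_seq_length, n_tags = 4840, 100, 3  # n_tags is an assumption

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_seq_length,)),
    # Frozen embedding as in the question; GloVe weights omitted in this sketch.
    layers.Embedding(vocab_size, 300, trainable=False),
    # Two smaller convolutions instead of one 3000-filter layer (sizes are guesses).
    layers.Conv1D(256, 3, padding="same"),
    layers.LeakyReLU(),
    layers.Conv1D(128, 3, padding="same"),
    layers.LeakyReLU(),
    # Dropout moved from after the Embedding to after the convolutions.
    layers.Dropout(0.1),
    layers.Bidirectional(layers.LSTM(150, return_sequences=True)),
    # Wider than the original 32 units (128 is a guess).
    layers.Dense(128),
    layers.LeakyReLU(),
    layers.Dense(n_tags, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop",
              metrics=["categorical_accuracy"])
```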

Tensorflow-lite PReLU Fusion and TransposeConv Bias

When we convert a tf.keras model with PReLU using tf 1.15, the PReLU layers become ReLU and seem to get fused with the previous operators. As a result, the 28 MB Keras h5 file shrinks to 1.3 MB. It looks like the number of parameters drops significantly, since I did not use the shared_axes option with PReLU. So, does this conversion work properly without any accuracy loss? Are the weights of PReLU discarded altogether? Similarly, does the fusion take into account the bias of transpose convolution layers (bias is not listed as an input property in Netron)? Do these fusions preserve the trained weight parameters internally, and do they affect the inference accuracy of tflite?
Prelu Fusion:-
input = Input(shape=(512,512,3), name='ip')
x = Conv2D(filters=8, kernel_size=2, strides=2, padding='valid')(input)
x = PReLU()(x) # shared_axes not used
It shows prelu/ReLU in output property
Transpose conv:-
cout1 = Conv2DTranspose(filters=8, kernel_size=2, strides=2, padding = 'same' )(pout1) # Bias is true by default
It does not show bias in output property
So, does the fusion work properly by combining weights or are they being discarded?
If all the values in the weights are zeros, they are automatically discarded during fusion/conversion. So PReLU became ReLU after fusion, and transpose conv + bias became transpose conv. The problem arises when you convert a model to tflite format before training, since the weights still have their default values (zeros).
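One way to check whether this applies to a given model is to inspect the weights before converting. The sketch below builds a small untrained model (the 8x8 input size and the layer names are arbitrary choices for illustration) and shows that the PReLU slopes and the Conv2DTranspose bias both start at zero with the default initializers:

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(8, 8, 3))
x = tf.keras.layers.Conv2D(8, 2, strides=2, padding="valid")(inputs)
x = tf.keras.layers.PReLU(name="prelu")(x)  # shared_axes not used
x = tf.keras.layers.Conv2DTranspose(8, 2, strides=2, padding="same",
                                    name="deconv")(x)
model = tf.keras.Model(inputs, x)

# PReLU holds a single weight array (alpha); Conv2DTranspose holds [kernel, bias].
alpha = model.get_layer("prelu").get_weights()[0]
bias = model.get_layer("deconv").get_weights()[1]
print("PReLU alpha all zero:", bool(np.all(alpha == 0)))
print("Conv2DTranspose bias all zero:", bool(np.all(bias == 0)))
```

If the same check on a trained model reports non-zero values, the explanation above would not account for the missing parameters.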

Dropout layer before or after LSTM. What is the difference?

Suppose that we have an LSTM model for time series forecasting. Also, this is a multivariate case, so we're using more than one feature for training the model.
ipt = Input(shape=(shape[0], shape[1]))
x = Dropout(0.3)(ipt) ## Dropout before LSTM.
x = CuDNNLSTM(10, return_sequences = False)(x)
out = Dense(1, activation='relu')(x)
We can add Dropout layer before LSTM (like the above code) or after LSTM.
If we add it before LSTM, is it applying dropout on timesteps (different lags of time series), or different input features, or both of them?
If we add it after LSTM and because return_sequences is False, what is dropout doing here?
Is there any difference between the dropout option in LSTM and a Dropout layer before the LSTM layer?
By default, Dropout creates a random tensor of zeros and ones. No pattern, no privileged axis. So you can't say a specific thing is being dropped, just random coordinates in the tensor. (Well, it drops features, but different features for each step, and differently for each sample.)
You can, if you want, use the noise_shape property, which will define the shape of the random tensor. Then you can select if you want to drop steps, features or samples, or maybe a combination.
Dropping time steps: noise_shape = (1,steps,1)
Dropping features: noise_shape = (1,1, features)
Dropping samples: noise_shape = (None, 1, 1)
There is also the SpatialDropout1D layer, which uses noise_shape = (input_shape[0], 1, input_shape[2]) automatically. This drops the same feature for all time steps, but treats each sample individually (each sample will drop a different group of features).
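The effect of these noise_shape choices can be illustrated with a small NumPy sketch that draws a mask of the given shape and broadcasts it over the input. The helper dropout_with_noise_shape is made up for illustration and only mimics the masking idea, not Keras' implementation:

```python
import numpy as np


def dropout_with_noise_shape(x, rate, noise_shape, rng):
    """Draw a keep-mask of shape `noise_shape` and broadcast it over `x`."""
    keep_mask = rng.random(noise_shape) >= rate
    return x * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
batch, steps, features = 2, 5, 4
x = np.ones((batch, steps, features))

# Whole time steps dropped, same steps for every sample and feature.
by_step = dropout_with_noise_shape(x, 0.5, (1, steps, 1), rng)
# Whole features dropped, same features for every sample and step.
by_feature = dropout_with_noise_shape(x, 0.5, (1, 1, features), rng)
# SpatialDropout1D pattern: whole features dropped, chosen per sample.
spatial = dropout_with_noise_shape(x, 0.5, (batch, 1, features), rng)
```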
After the LSTM you have shape = (None, 10). So, you use Dropout the same way you would use in any fully connected network. It drops a different group of features for each sample.
A dropout as an argument to the LSTM has a lot of differences. It generates 4 different dropout masks, for creating different inputs for each of the different gates. (You can see the LSTMCell code to check this).
Also, there is the option of recurrent_dropout, which will generate 4 dropout masks, but to be applied to the states instead of the inputs, each step of the recurrent calculations.
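For comparison with the Dropout-layer version in the question, a sketch using the LSTM's own arguments could look like this. The input dimensions and the rates are placeholders, and it uses tf.keras.layers.LSTM rather than CuDNNLSTM, on the assumption that the cuDNN-only layer does not expose these arguments:

```python
import tensorflow as tf

steps, features = 20, 5  # hypothetical input dimensions

inputs = tf.keras.Input(shape=(steps, features))
# dropout masks the inputs to the gates; recurrent_dropout masks the
# recurrent state passed from one time step to the next.
x = tf.keras.layers.LSTM(10, dropout=0.3, recurrent_dropout=0.2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
```

One trade-off to be aware of: with recurrent_dropout greater than 0, tf.keras.layers.LSTM does not use its fast cuDNN kernel, so training on a GPU may be noticeably slower.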
You are confusing Dropout with its variant SpatialDropoutND (either 1D, 2D or 3D). See the documentation (apparently you can't link to a specific class).
Dropout applies a random binary mask to the input, whatever its shape beyond the first (batch) dimension, so it applies to both features and timesteps in this case.
Here, with return_sequences=False, you only get the output from the last timestep, so it would be of size [batch, 10] in your case. Dropout will randomly drop values along the second dimension.
Yes, there is a difference: the dropout argument of the LSTM is applied at every time step as the LSTM processes the sequence (e.g. a sequence of length 10 goes through the unrolled LSTM and some of the features are dropped before going into the next cell). Dropout would drop random elements (except along the batch dimension). SpatialDropout1D would drop entire channels; in this case some features would be dropped across all timesteps (in the convolution case, you could use SpatialDropout2D to drop channels, either at the input or further along the network).

make the output more sparse

I trained an MLP-type neural network for a prediction model. The predicted values are shown as follows. Is it possible to make the predicted values more sparse? I would like the points corresponding to small peaks (painted in yellow) to be pushed to even smaller values. In other words, I would like the predicted sequence to have fewer peaks. I could add a threshold to get a similar effect, but I would prefer the model to learn it automatically. I tried an L1 activity regularizer, but it did not help much.
model= Sequential()
model.add(Conv1D(60,32, strides=1, activation='relu',padding='causal',input_shape=(64,1)))
model.add(Conv1D(80,10, strides=1, activation='relu',padding='causal'))
#model.add(Conv1D(100,5, strides=1, activation='relu',padding='causal'))
model.add(MaxPooling1D(2))
model.add(Dense(300,activity_regularizer=regularizers.l1(0.01),activation='relu'))
model.add(Flatten())
model.add(Dense(1,activation='linear'))
If you think that the smaller peaks are caused by overfitting to the training data, you can try to add a Dropout layer instead of your activity regularizer. For example:
model.add(Dropout(0.2))  # placed right after the Dense(300) layer; input_shape is not needed on a non-first layer