Why use Dense in a CNN? - TensorFlow 2.0

When I looked at example code, the code used a Dense layer.
As I understand it, a Dense layer just sets the number of nodes and applies an activation function.
So I think using Conv2D is better than using Dense, because Conv2D can apply a filter.
Why isn't Conv2D used instead in the code?
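For reference, a minimal sketch (with illustrative layer sizes, not taken from any specific example) of the common pattern the question refers to, where Conv2D layers extract features and a final Dense layer maps the flattened features to class scores:

import tensorflow as tf
from tensorflow.keras import layers, models

# Conv2D layers learn spatial filters; the Dense layer at the end maps the
# flattened feature maps to class scores (all sizes are illustrative).
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])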

Related

Multiple Activation Functions for multiple Layers (Neural Networks)

I have a binary classification problem for my neural network.
I already got good results using the ReLU activation function in my hidden layer and the sigmoid function in the output layer.
Now I'm trying to get even better results.
I added a second hidden layer with the ReLU activation function, and the results got even better.
I tried to use the leaky ReLU function for the second hidden layer instead of the ReLU function and got even better results, but I'm not sure if this is even allowed.
So I have something like this:
Hidden layer 1: ReLU activation function
Hidden layer 2: leaky ReLU activation function
Output layer: sigmoid activation function
I can't find many resources on it, and those I found always use the same activation function on all hidden layers.
If you mean the leaky ReLU, I can say that, in fact, the Parametric ReLU (PReLU) is the activation function that generalizes the traditional rectified unit as well as the leaky ReLU. And yes, PReLU improves model fitting with no significant extra computational cost and little overfitting risk.
For more details, you can check out this link: Delving Deep into Rectifiers
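As an illustration (a sketch with made-up layer and input sizes, not code from the question), mixing activations per hidden layer in Keras could look like this, with PReLU available as a drop-in layer:

import tensorflow as tf
from tensorflow.keras import layers, models

# Layer widths and the input width of 20 features are illustrative.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),  # hidden layer 1: ReLU
    layers.Dense(64),                                        # hidden layer 2 ...
    layers.LeakyReLU(),                                      # ... with leaky ReLU
    # layers.PReLU(),                                        # or swap in the parametric ReLU
    layers.Dense(1, activation="sigmoid"),                   # output layer for binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])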

Keras input layer

I am unsure if I need to add a Dense input layer before adding LSTM layers in my model. For example, with the following model:
# Model (imports added for completeness)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(128, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(Dense(5, activation="linear"))
Will the LSTM layer be the input layer, and the Dense layer the output layer (meaning no hidden layers)? Or does Keras create an input layer meaning the LSTM layer will be a hidden layer?
You don't need to; it depends on what you want to accomplish.
Check here for some example cases.
In your case, yes, the LSTM will be the first layer and the Dense layer will be the output layer.
The current configuration is fine for simple examples. Everything depends on the results you want to get; the model and its layers are subject to change based on the target goal. So, if the problem is complex, you can build a mixed model with different layers and shapes. See the reference: Mix model layering
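To illustrate the point (the shape (50, 10) below is just a placeholder for (train_x.shape[1], train_x.shape[2])), the model above is equivalent to one with an explicit Input layer, which Keras otherwise creates implicitly from input_shape; the LSTM stays the first real layer and Dense the output:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

# Placeholder shape: 50 time steps with 10 features each.
model = Sequential([
    Input(shape=(50, 10)),          # created implicitly when input_shape is given to the LSTM
    LSTM(128),                      # first (and only) hidden layer
    Dense(5, activation="linear"),  # output layer
])
model.summary()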

Tensorflow: What is the equivalent of dense layer from Tensorflow in Caffe?

In TensorFlow, in order to add a dense layer I have:
model.add(Dense(1024, activation='selu'))
How can I translate this Keras Dense layer to Caffe?
The Dense layer is called InnerProduct in Caffe. Look at this for a sample code.
Also check this link to help you with other types of layers.

Does TensorFlow's tf.layers.dense flatten input dimensions?

I'm searching for a data leak in my model. I'm using tf.layers.dense before a masking operation and am concerned that the model could just learn to switch positions in the middle dimension of my input tensor.
When I have an input tensor x = tf.ones((2,3,4)), would tf.layers.dense(x,8) flatten x into a fully connected layer with 2*3*4=24 input neurons and 2*3*8=48 output neurons and then reshape it again to [2,3,8], or would it create 2*3=6 fully connected layers with 4 input and 8 output neurons each and then concatenate them?
As for the Keras Dense layer, it has already been mentioned in another answer that its input is not flattened; instead, it is applied on the last axis of its input.
As for the TensorFlow Dense layer, it actually inherits from the Keras Dense layer, and as a result it is likewise applied on the last axis of its input.
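A quick check (using tf.keras.layers.Dense, the Keras layer the answer refers to) makes this visible: the weight matrix has shape (4, 8), so the layer transforms only the last axis and cannot mix positions along the middle dimension:

import tensorflow as tf

x = tf.ones((2, 3, 4))
dense = tf.keras.layers.Dense(8)
y = dense(x)

print(y.shape)             # (2, 3, 8): only the last axis is transformed
print(dense.kernel.shape)  # (4, 8): one weight matrix shared across the middle axis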

Dense final layer vs. another rnn layer

It is common to add a dense fully-connected layer as the last layer on top of a recurrent neural network (which has one or more layers) in order to learn the reduction to the final output dimensionality.
Let's say I need one output with a -1 to 1 range, in which case I would use a dense layer with a tanh activation function.
My question is: Why not add another recurrent layer instead with an internal size of 1?
It will be different (in the sense of propagating that through time) but will it have a disadvantage over the dense layer?
If I understand correctly, the two alternatives you present do the exact same computation, so they should behave identically.
In TensorFlow, if you're using dynamic_rnn, it's much easier if all time steps are identical, though, hence processing the output with a dense layer instead of having a different last step.
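As a sketch of the two alternatives in Keras (layer widths and the input shape are illustrative, not from the question): a recurrent stack with a Dense(1, tanh) head versus a second recurrent layer with an internal size of 1:

import tensorflow as tf
from tensorflow.keras import layers, models

# Option A: RNN followed by a dense head; tanh keeps the output in [-1, 1].
dense_head = models.Sequential([
    layers.LSTM(64, input_shape=(None, 10)),
    layers.Dense(1, activation="tanh"),
])

# Option B: a second recurrent layer with a single unit; its output is also
# in [-1, 1], but the 1-dimensional state is propagated through time.
rnn_head = models.Sequential([
    layers.LSTM(64, return_sequences=True, input_shape=(None, 10)),
    layers.SimpleRNN(1, activation="tanh"),
])

dense_head.summary()
rnn_head.summary()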