If I wanted to make a model that would take a single number and then just output a single number (not a linear relationship, not sure what kind), how would I shape the input and output layers, and what kind of loss/optimizer/activation functions should I use? Thanks.
Your question includes several things. What I would highly recommend is that you first understand
Regression-based problems
Classification-based problems
Based on that, you can figure out which activation function, loss function, and optimizer you need, because those differ between regression and classification. Try to figure things out one step at a time.
For how to shape the input/output layers, see the examples below.
You have only one feature as input; the rest of the model then depends on the type of problem:
Classification based Problem,
Loss Function - categorical_crossentropy || sparse_categorical_crossentropy
optimizer - Adam
output layer - number of classes to predict
output activation - softmax
model = tf.keras.Sequential()
model.add(layers.Dense(8, activation='relu', input_shape = (1, ))) #input shape as 1
model.add(layers.Dense(3, activation='softmax')) # 3 = number of classes
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
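One practical note on the two losses listed above: categorical_crossentropy expects one-hot targets, while sparse_categorical_crossentropy works directly with integer class indices. A minimal sketch (the 3 classes and 100 samples are arbitrary choices for illustration):
import numpy as np
import tensorflow as tf

# Toy integer labels for 3 classes (0, 1, 2); sizes are arbitrary.
labels = np.random.randint(0, 3, size=(100,))

# For loss='categorical_crossentropy' the targets must be one-hot vectors:
y_onehot = tf.keras.utils.to_categorical(labels, num_classes=3)  # shape (100, 3)

# For loss='sparse_categorical_crossentropy' the integer labels are used as-is:
y_sparse = labels  # shape (100,)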
Regression based Problem,
Loss Function - mean_squared_error
optimizer - Adam
output layer - 1
output activation - default (linear, i.e. no activation)
model = tf.keras.Sequential()
model.add(layers.Dense(8, activation='relu', input_shape = (1, ))) #input shape as 1
model.add(layers.Dense(1)) # 1 = single output unit, linear activation by default
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='mean_squared_error', metrics=['mae']) # accuracy is not meaningful for regression
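For the original question (a single number in, a single number out, with an unknown non-linear relationship), this regression setup is the one to start from. Below is a minimal sketch that fits y = x^2 on synthetic data; the extra hidden layer, the layer sizes and the epoch count are my own choices, not requirements:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Synthetic non-linear data, chosen only for illustration
x = np.random.uniform(-1.0, 1.0, size=(1000, 1))
y = x ** 2

model = tf.keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=(1,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(1)  # single linear output for regression
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='mean_squared_error')
model.fit(x, y, epochs=200, verbose=0)

print(model.predict(np.array([[0.5]])))  # should be close to 0.25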
Binary based Problem (0 or 1),
Loss Function - binary_crossentropy
optimizer - Adam
output activation - sigmoid
output layer - 1
model = tf.keras.Sequential()
model.add(layers.Dense(8, activation='relu', input_shape = (1, ))) #input shape as 1
model.add(layers.Dense(1, activation='sigmoid')) #1 is number of output
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])
Related
I am trying to implement a dense layer in Keras. The input is an EEG recording with 2 channels, each consisting of a vector of 8 points, and the total number of training samples is 17. The target y also has 17 points.
I used
x=x.reshape(17,2,8,1)
y=y.reshape(17,1,1,1)
model.add(Dense(1, input_shape=(2,8,1), activation='relu'))
print(model.summary())
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
print(model.compile)
model.fit(x, y, batch_size = 17,epochs=500, verbose=1)
but I get the following error:
Error when checking target: expected dense_57 to have shape (2, 8, 1) but got array with shape (17, 1, 1)
The Dense layer acts only on the last axis, so with input shape (2, 8, 1) and a single unit its output (and therefore the expected target) has shape (2, 8, 1) per sample, while your y has shape (1, 1, 1). An easy fix would be to flatten each sample into a single feature vector:
x = x.reshape(17, 16)
y = y.reshape(17, 1)
model.add(Dense(1, input_shape=(16,), activation='relu'))
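For completeness, here is a small end-to-end sketch of the fixed version with dummy stand-in data. It assumes the 17 targets are continuous values, so it uses 'mse'; the sparse_categorical_crossentropy from the original snippet cannot work with a single output unit, so pick the loss that matches what y actually represents:
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Dummy stand-ins for the EEG data: 17 samples, 2 channels x 8 points each
x = np.random.rand(17, 2, 8, 1).astype('float32').reshape(17, 16)
y = np.random.rand(17, 1).astype('float32')  # assumed: one continuous target per sample

model = Sequential()
model.add(Dense(1, input_shape=(16,), activation='relu'))
# 'mse' is an assumption that fits a single-value target
model.compile(loss='mse', optimizer='adam')
model.fit(x, y, batch_size=17, epochs=5, verbose=1)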
I tried to apply ReLU and PReLU with a CNN layer to compare the results, and I tried this code:
with ReLU:
model.add(Conv1D(filters, kernel_size, activation='relu'))
with PReLU:
model.add(Conv1D(filters, kernel_size))
model.add(PReLU())
Does the Conv1D layer use the PReLU as the activation function?
I have doubts because the model summary shows the CNN and PReLU as separate layers with different numbers of parameters, whereas the CNN layer with the ReLU function appears as a single layer.
If I used the wrong code, how can I correct it?
Yes, the Conv1D layer will use the PReLU activation function. When you define a Conv2D layer like,
x = tf.keras.layers.Conv2D( 13 , kernel_size=( 3 , 3 ) , strides=1 , activation='relu' )( inputs )
The above statement is equivalent to,
x = tf.keras.layers.Conv2D( 13 , kernel_size=( 3 , 3 ) , strides=1 )( inputs )
x = tf.keras.layers.Activation( 'relu' )( x )
The reason for providing activation functions as separate layers is that sometimes we need to apply additional operations to the feature maps before they reach the activation function.
For instance, a BatchNormalization layer is added before passing the feature maps to the activation function,
x = tf.keras.layers.Conv2D( 13 , kernel_size=( 3 , 3 ) , strides=1 )( inputs )
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Activation( 'relu' )( x )
Coming back to your question,
Some activation functions, such as LeakyReLU and PReLU, are implemented as separate layers and cannot be passed to the Conv1D layer through the activation= argument.
Regarding the trainable parameters, the conv1d_18 layer has 15050 parameters which form the kernel in 1D convolution. These parameters have nothing to do with the activation function.
The 4900 parameters of PReLU are the slope parameters which are optimized with backpropagation. These parameters, along with kernel weights, will update with every batch and hence are included in trainable parameters.
So, the outputs ( unactivated ) of the Conv1D layer will pass through the PReLU activation which indeed uses the slope parameter to calculate the activated outputs.
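As a quick sanity check, here is a toy model whose summary reproduces parameter counts like the ones mentioned above. The input shape (100 timesteps, 100 channels), the 50 filters and kernel_size=3 are assumptions chosen so that Conv1D has 3*100*50 + 50 = 15050 parameters and the following PReLU has 98*50 = 4900 slope parameters (by default PReLU learns one slope per output element):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100, 100)),  # assumed input: 100 timesteps, 100 channels
    tf.keras.layers.Conv1D(50, 3),     # 3*100*50 + 50 = 15050 parameters
    tf.keras.layers.PReLU(),           # 98*50 = 4900 slope parameters
])
model.summary()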
Why the number of parameters of the GRU layer is 9600?
Shouldn't it be ((16+32)*32 + 32) * 3 * 2 = 9,408 ?
or, rearranging,
32*(16 + 32 + 1)*3*2 = 9408
model = tf.keras.Sequential([
tf.keras.layers.Embedding(input_dim=4500, output_dim=16, input_length=200),
tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
tf.keras.layers.Dense(6, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()
The key is that TensorFlow uses separate biases for the input and recurrent kernels when the parameter reset_after=True in GRUCell. You can look at some of the source code of GRUCell, as follows:
if self.use_bias:
    if not self.reset_after:
        bias_shape = (3 * self.units,)
    else:
        # separate biases for input and recurrent kernels
        # Note: the shape is intentionally different from CuDNNGRU biases
        # `(2 * 3 * self.units,)`, so that we can distinguish the classes
        # when loading and converting saved weights.
        bias_shape = (2, 3 * self.units)
Taking the reset gate as an example, we generally see the following formula:
r_t = sigmoid(x_t W_xr + h_(t-1) W_hr + b_r)
But if we set reset_after=True, the actual formula is as follows, with separate input and recurrent biases:
r_t = sigmoid(x_t W_xr + b_xr + h_(t-1) W_hr + b_hr)
As you can see, the default in TensorFlow 2 is reset_after=True, whereas the default in TensorFlow 1.x is reset_after=False.
So the number of parameters of the Bidirectional GRU layer should be ((16+32)*32 + 32 + 32) * 3 * 2 = 9600 in TensorFlow 2.
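A quick way to confirm both counts is to build the same Bidirectional GRU(32) on 16-dimensional inputs with reset_after toggled (the sizes are taken from the question); a small sketch:
import tensorflow as tf

for reset_after in (True, False):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(200, 16)),
        tf.keras.layers.Bidirectional(
            tf.keras.layers.GRU(32, reset_after=reset_after)),
    ])
    print(reset_after, model.count_params())  # True -> 9600, False -> 9408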
I figured out a little bit more about this, as an addition to the accepted answer. What Keras does in GRUCell.call() is:
With reset_after=False (default in TensorFlow 1):
z_t = sigmoid(x_t W_xz + h_(t-1) W_hz + b_z)
r_t = sigmoid(x_t W_xr + h_(t-1) W_hr + b_r)
h'_t = tanh(x_t W_xh + (r_t ∘ h_(t-1)) W_hh + b_h)
With reset_after=True (default in TensorFlow 2):
z_t = sigmoid(x_t W_xz + b_xz + h_(t-1) W_hz + b_hz)
r_t = sigmoid(x_t W_xr + b_xr + h_(t-1) W_hr + b_hr)
h'_t = tanh(x_t W_xh + b_xh + r_t ∘ (h_(t-1) W_hh + b_hh))
(∘ denotes element-wise multiplication.) With reset_after=False, an input bias and a recurrent bias for the same gate would only ever appear as a plain sum (b_xz + b_hz, b_xr + b_hr, b_xh + b_hh), so (I assume) TensorFlow combines each of these pairs into one single parameter vector - just like the OP pointed out in a comment above. However, with reset_after=True that's not the case for b_xh and b_hh: b_hh is multiplied by the reset gate before being added, so the two vectors can and will be different, they cannot be combined into one vector, and that's why the total parameter count is higher.
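One more way to see the difference is to inspect the stored weight shapes; a small sketch (the GRU size and input dimension follow the question, the sequence length of 10 is arbitrary):
import tensorflow as tf

inp = tf.keras.Input(shape=(10, 16))

gru_true = tf.keras.layers.GRU(32, reset_after=True)
gru_true(inp)  # calling on a symbolic input builds the layer
print([w.shape for w in gru_true.get_weights()])
# [(16, 96), (32, 96), (2, 96)]  <- separate input and recurrent biases

gru_false = tf.keras.layers.GRU(32, reset_after=False)
gru_false(inp)
print([w.shape for w in gru_false.get_weights()])
# [(16, 96), (32, 96), (96,)]    <- one combined bias per gate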
Keras Dense layer needs an input_dim or input_shape to be specified. What value do I put in there?
My input is a matrix of 1,000,000 rows and only 3 columns. My output is 1,600 classes.
What do I put there?
The dimensionality of the inputs, i.e. (1000000, 1600)?
2, because it's a 2D matrix?
input_dim is the number of dimensions of the features, in your case that is just 3. The equivalent notation for input_shape, which is an actual dimensional shape, is (3,)
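A minimal sketch of the equivalence (the 32 units are an arbitrary example size):
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# These two first layers declare the same thing: 3 input features per sample.
m1 = Sequential([Dense(32, input_dim=3)])
m2 = Sequential([Dense(32, input_shape=(3,))])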
In your case,
let's assume that x and the target variable y look as follows after feature engineering:
x.shape
(1000000, 3)
y.shape
(1000000, 1600)
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(x.shape[1],))) # Input layer; input_shape must be a tuple, i.e. (3,)
# now the model will take as input arrays of shape (*, 3)
# and output arrays of shape (*, 32)
...
...
model.add(Dense(y.shape[1],activation='softmax')) # Output layer
y.shape[1] = 1600 is the number of outputs, which equals the number of classes you have, since you are dealing with classification.
X = dataset.iloc[:, 3:13]
meaning that X takes all the rows and columns 3 through 12 inclusive (column 13 is exclusive), i.e. 10 feature columns.
We will also have an X0 parameter to be given to the neural network, so the total number of inputs becomes 10 + 1 = 11.
Dense(input_dim = 11, activation = 'relu', kernel_initializer = 'he_uniform')
I am trying to rewrite a piece of tflearn code using Keras.
The goal is to combine two inputs, where one input skips the first layer. The following code works in tflearn:
# Two different inputs.
inputs = tflearn.input_data(shape=[None, 10])
action = tflearn.input_data(shape=[None, 10])
#First layer used only by the inputs
net = tflearn.fully_connected(inputs, 400, activation='relu')
# Add the action tensor in the 2nd hidden layer
# Use two temp layers to get the corresponding weights and biases
t1 = tflearn.fully_connected(net, 300)
t2 = tflearn.fully_connected(action, 300)
# Combine the two layers using the weights from t1 and t2 and the bias from t2
net = tflearn.activation(tf.matmul(net,t1.W) + tf.matmul(action, t2.W) + t2.b, activation='relu')
I am trying to replicate this code in Keras using the following code:
# Two different inputs.
inputs = tf.placeholder(tf.float32, [None, 10])
action = tf.placeholder(tf.float32, [None, 10])
#First layer used only by the inputs
t1 = Sequential()
t1.add(Dense(400, activation='relu', input_shape=(1,10)))
# Add the action tensor in the 2nd hidden layer
# Use two temp layers to get the corresponding weights and biases
t1.add(Dense(300))
t2 = Sequential()
t2.add(Dense(300, input_shape=(1,10)))
# Combine the two layers
critnet = Sequential()
critnet.add(Merge([t1, t2], mode='sum'))
critnet.add(Activation('relu'))
# Create the net using the inputs and action placeholder
net = critnet([inputs, action])
The code in Keras behaves differently. How can I combine two layers in Keras to get the same result as in tflearn?
You could use a Lambda layer that takes your 2 layers as input and use keras.backend to merge them in the same way. I think there is K.dot for matmul.
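For reference, here is a minimal sketch of one way to express that merge with the Keras functional API, using Dense layers plus Add and Activation instead of a Lambda. The sizes follow the tflearn snippet; use_bias=False on t1 mirrors using only t1.W, while t2 keeps its bias, mirroring t2.b. This is an assumption of intent, not the exact original code:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(10,))
action = keras.Input(shape=(10,))

# First layer used only by `inputs`
net = layers.Dense(400, activation='relu')(inputs)

# t1 contributes only its weights (no bias), t2 contributes weights and bias,
# mirroring tf.matmul(net, t1.W) + tf.matmul(action, t2.W) + t2.b
t1 = layers.Dense(300, use_bias=False)(net)
t2 = layers.Dense(300)(action)

out = layers.Activation('relu')(layers.Add()([t1, t2]))
critnet = keras.Model([inputs, action], out)
critnet.summary()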