CNTK Python: Dense Layer output size doesn't match expectation? - cntk

I'm training the tutorials/language understanding model in CNTK/Python
def create_model():
    with C.layers.default_options(initial_state=0.1):
        return C.layers.Sequential([
            C.layers.Embedding(emb_dim, name='embed'),
            C.layers.Recurrence(C.layers.LSTM(hidden_dim), go_backwards=False),
            C.layers.Dense(num_labels, name='classify')
        ])
model = model_func(x)
For some reason, model.eval(data)[0].shape is (2 * 16) not (1 * 16), where num_labels = 16. I'm very confused. Why is it 2 * 16 instead of 1 * 16, given the last layer is a dense layer with size = num_labels=16?
Thanks!

Most likely the data element that you are passing in has a shape (2, x), i.e. you are passing in multiple values for evaluation, so eval() is returning a prediction for each of the values you passed in to the model.
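A minimal NumPy sketch of that effect, assuming the input holds two items (for example, a two-token sequence). The weights are random stand-ins and hidden_dim is a made-up value; none of this comes from the actual CNTK model:

```python
import numpy as np

hidden_dim, num_labels = 8, 16  # hidden_dim is a hypothetical value
rng = np.random.default_rng(0)
W = rng.standard_normal((hidden_dim, num_labels))  # stand-in for the Dense weights
b = np.zeros(num_labels)                           # stand-in for the Dense bias

two_items = rng.standard_normal((2, hidden_dim))   # two inputs
one_item = two_items[:1]                           # a single input

# The dense layer is applied to each row independently,
# so the output has one row of num_labels scores per input row.
out_two = two_items @ W + b
out_one = one_item @ W + b
print(out_two.shape, out_one.shape)
```

Checking data.shape (or the length of each sequence in data) before calling eval() should show whether this is what is happening.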

Related

How to specify input layer with Keras

I came across this code for tuning the topology of a neural network. However, I am unsure how to instantiate the first layer without flattening the input.
My input is like this:
With M features (the rows) and N samples (the columns).
How can I create the first (input) layer?
# Initialize sequential API and start building model.
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28,28)))
# Tune the number of hidden layers and units in each.
# Number of hidden layers: 1 - 5
# Number of Units: 32 - 512 with stepsize of 32
for i in range(1, hp.Int("num_layers", 2, 6)):
    model.add(
        keras.layers.Dense(
            units=hp.Int("units_" + str(i), min_value=32, max_value=512, step=32),
            activation="relu")
    )
    # Tune dropout layer with values from 0 - 0.3 with stepsize of 0.1.
    model.add(keras.layers.Dropout(hp.Float("dropout_" + str(i), 0, 0.3, step=0.1)))
# Add output layer.
model.add(keras.layers.Dense(units=10, activation="softmax"))
I know that Keras usually instantiates the first hidden layer along with the input layer, but I don't see how I can do it in this framework. Below is the code for instantiating input + first hidden layer at once.
model.add(Dense(100, input_shape=(CpG_num,), kernel_initializer='normal', activation='relu'))
If you have multiple input features and want to set your input shape, suppose you have a dataframe with m rows and n columns. Then simply do this:
import numpy as np
import tensorflow as tf

m = 1000  # number of rows (samples)
n = 10    # number of columns (features)
no_of_units = 64  # number of units in the Dense layer
# We do not pass m, because m is treated as the batch dimension here.
_input = tf.keras.layers.Input(shape=(n,))  # note the trailing comma: shape must be a tuple
dense = tf.keras.layers.Dense(no_of_units)(_input)
output = tf.keras.backend.function(_input, dense)
# Now check whether it works:
x = np.random.randn(m, n)
print(output(x).shape)

ValueError: Dimensions must be equal in Tensorflow/Keras

My code is as follows:
v = tf.Variable(initial_value=v, trainable=True)
v.shape is (1, 768)
In the model:
inputs_sents = keras.Input(shape=(50,3))
inputs_events = keras.Input(shape=(50,768))
x_1 = tf.matmul(v,tf.transpose(inputs_events))
x_2 = tf.matmul(x_1,inputs_sents)
But I got an error,
ValueError: Dimensions must be equal, but are 768 and 50 for
'{{node BatchMatMulV2_3}} =
BatchMatMulV2[T=DT_FLOAT,
adj_x=false,
adj_y=false](BatchMatMulV2_3/ReadVariableOp,
Transpose_3)' with input shapes: [1,768], [768,50,?]
I think it takes the batch dimension into account? But how should I deal with this?
v is a trainable vector (or a 2D array whose first dimension is 1); I want it to be trained during the training process.
PS: This is the result I got using the code provided by the first answer. I think it is incorrect because Keras already takes the first (batch) dimension into account.
Plus, from the keras documentation,
shape: A shape tuple (integers), not including the batch size. For instance, shape=(32,) indicates that the expected input will be batches of 32-dimensional vectors. Elements of this tuple can be None; 'None' elements represent dimensions where the shape is not known.
https://keras.io/api/layers/core_layers/input/
Should I rewrite my code without Keras?
The shape of a batch is denoted by None:
import numpy as np
inputs_sents = keras.Input(shape=(None,1,3))
inputs_events = keras.Input(shape=(None,1,768))
v = np.ones(shape=(1,768), dtype=np.float32)
v = tf.Variable(initial_value=v, trainable=True)
x_1 = tf.matmul(v,tf.transpose(inputs_events))
x_2 = tf.matmul(x_1,inputs_sents)
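As a way to sanity-check the intended shapes outside Keras, here is a NumPy sketch. It assumes the goal is to multiply v with each sample's transposed events matrix and then with that sample's sents matrix; the batch size of 2 is arbitrary. Transposing without an explicit permutation reverses all axes, including the batch axis, which is consistent with the [768,50,?] shape in the error message. Swapping only the last two axes avoids that.

```python
import numpy as np

batch = 2  # arbitrary batch size, for illustration only
v = np.ones((1, 768), dtype=np.float32)
events = np.ones((batch, 50, 768), dtype=np.float32)
sents = np.ones((batch, 50, 3), dtype=np.float32)

# Reversing every axis moves the batch axis to the end.
reversed_all = np.transpose(events)          # shape (768, 50, batch)

# Swap only the last two axes so the batch axis stays first.
events_t = np.transpose(events, (0, 2, 1))   # shape (batch, 768, 50)

x_1 = np.matmul(v, events_t)                 # v broadcasts over the batch axis
x_2 = np.matmul(x_1, sents)                  # one (1, 3) result per sample
print(reversed_all.shape, x_1.shape, x_2.shape)
```

In TensorFlow the corresponding permutation would be tf.transpose(inputs_events, perm=[0, 2, 1]); whether the rest of the model then behaves as intended is not verified here.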

Two input layers for LSTM Neural Network?

I am building a neural network, and I need to add another input layer (until now I only needed one).
In particular, this was the code previously:
###...
if(self.net_embedding==0):
    l_input = Input(shape=self.win_size, dtype='int32', name='input_act')
    emb_input = Embedding(output_dim=params["output_dim_embedding"], input_dim=unique_events + 1, input_length=self.win_size)(l_input)
    toBePassed=emb_input
elif(self.net_embedding==1):
    self.getWord2VecEmbeddings(params['word2vec_size'])
    X_train=self.encodePrefixes(params['word2vec_size'],X_train)
    l_input = Input(shape = (self.win_size, params['word2vec_size']), name = 'input_act')
    toBePassed=l_input
l1 = LSTM(params["shared_lstm_size"],return_sequences=True, kernel_initializer='glorot_uniform',dropout=params['dropout'])(toBePassed)
l1 = BatchNormalization()(l1)
#and so on with the rest of the layers...
The input of the model (X_train) was just an array of arrays (with size = self.win_size) of integers (e.g. [[0 1 2 3] [1 2 3 4]...] if self.win_size = 4), where the integers represent categorical elements.
As you can see, I also have two types of embeddings for this input:
Embedding layer
Word2Vec encoding
Now, I need to add another input to the net, which is also an array of arrays (with size = self.win_size again) of integers (e.g. [[0 123 334 2212][123 334 2212 4888]...]), but this time I don't need to apply any embedding (I think) because the elements here are not categorical (they represent elapsed time in seconds).
I tried by simply changing the net to:
#...
if(self.net_embedding==0):
    l_input = Input(shape=self.win_size, dtype='int32', name='input_act')
    emb_input = Embedding(output_dim=params["output_dim_embedding"], input_dim=unique_events + 1, input_length=self.win_size)(l_input)
    toBePassed=emb_input
elif(self.net_embedding==1):
    self.getWord2VecEmbeddings(params['word2vec_size'])
    X_train=self.encodePrefixes(params['word2vec_size'],X_train)
    l_input = Input(shape = (self.win_size, params['word2vec_size']), name = 'input_act')
    toBePassed=l_input
elapsed_time_input = Input(shape=self.win_size, name='input_time')
input_concat = Concatenate(axis=1)([toBePassed, elapsed_time_input])
l1 = LSTM(params["shared_lstm_size"],return_sequences=True, kernel_initializer='glorot_uniform',dropout=params['dropout'])(input_concat)
l1 = BatchNormalization()(l1)
#and so on with other layers...
but I get the error:
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 4, 12), (None, 4)]
Do you please have any solution for this? Any kind of help would be really appreciated, since I have a deadline in a few days and I'm smashing my head on this for so long now! Thanks :)
There are two problems with your approach.
First, inputs to LSTM should have a shape of (batch_size, num_steps, num_feats), yet your elapsed_time_input has shape (None, 4). You need to expand its dimension to get the proper shape (None, 4, 1).
elapsed_time_input = tf.keras.layers.Reshape((-1, 1))(elapsed_time_input)
or
elapsed_time_input = tf.expand_dims(elapsed_time_input, axis=-1)
With this, "elapsed time in seconds" will be seen as just another feature of a timestep.
Secondly, you'll want to concatenate the two inputs in the feature dimension (not the timestep dimension).
input_concat = Concatenate(axis=-1)([toBePassed, elapsed_time_input])
or
input_concat = Concatenate(axis=2)([toBePassed, elapsed_time_input])
After this, you'll get a Keras tensor with a shape of (None, 4, 13). It represents a batch of time series, each having 4 timesteps and 13 features per step (12 original features + elapsed time in seconds for each step).
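The two shape fixes can be checked with plain NumPy arrays. This is only a sketch of the shape arithmetic: the arrays are zero-filled stand-ins for the Keras tensors and the batch size of 5 is arbitrary.

```python
import numpy as np

batch, steps, feats = 5, 4, 12               # batch size is arbitrary
embedded = np.zeros((batch, steps, feats))   # stand-in for toBePassed
elapsed = np.zeros((batch, steps))           # stand-in for elapsed_time_input

# Step 1: add a trailing feature axis so elapsed time becomes one feature per timestep.
elapsed_3d = np.expand_dims(elapsed, axis=-1)

# Step 2: concatenate along the feature axis (last axis), not the timestep axis.
combined = np.concatenate([embedded, elapsed_3d], axis=-1)
print(elapsed_3d.shape, combined.shape)
```

Concatenating along axis=1 instead would fail here as well, because the last axes (12 and 1) would not match.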

calculating the number of parameters of a GRU layer (Keras)

Why is the number of parameters of the GRU layer 9600?
Shouldn't it be ((16+32)*32 + 32) * 3 * 2 = 9,408 ?
or, rearranging,
32*(16 + 32 + 1)*3*2 = 9408
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=4500, output_dim=16, input_length=200),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
    tf.keras.layers.Dense(6, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()
The key is that TensorFlow keeps separate biases for the input and recurrent kernels when reset_after=True in GRUCell. You can see this in the GRUCell source code:
if self.use_bias:
    if not self.reset_after:
        bias_shape = (3 * self.units,)
    else:
        # separate biases for input and recurrent kernels
        # Note: the shape is intentionally different from CuDNNGRU biases
        # `(2 * 3 * self.units,)`, so that we can distinguish the classes
        # when loading and converting saved weights.
        bias_shape = (2, 3 * self.units)
Taking the reset gate as an example, we generally see the following formulas.
But if we set reset_after=True, the actual formula is as follows:
Note that GRU defaults to reset_after=True in TensorFlow 2, whereas it defaults to reset_after=False in TensorFlow 1.x.
So the number of parameters of a GRU layer should be ((16+32)*32 + 32 + 32) * 3 * 2 = 9600 in tensorflow2.
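The arithmetic can be written as a small helper. The function gru_param_count is a hypothetical name introduced here for illustration, not part of Keras, and it assumes biases are enabled:

```python
def gru_param_count(input_dim, units, reset_after=True, bidirectional=False):
    """Count trainable parameters of a GRU layer, assuming use_bias=True."""
    kernel = input_dim * units                   # input weights per block
    recurrent = units * units                    # recurrent weights per block
    bias = 2 * units if reset_after else units   # two bias vectors vs. one
    # Three blocks: update gate, reset gate, candidate state.
    per_direction = 3 * (kernel + recurrent + bias)
    return per_direction * (2 if bidirectional else 1)

# Embedding output of 16 features into a bidirectional GRU with 32 units.
print(gru_param_count(16, 32, reset_after=True, bidirectional=True))   # 9600
print(gru_param_count(16, 32, reset_after=False, bidirectional=True))  # 9408
```

The difference of 192 is exactly the extra bias vectors: 32 units * 3 blocks * 2 directions.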
I figured out a little bit more about this, as an addition to the accepted answer. What Keras does in GRUCell.call() is:
With reset_after=False (default in TensorFlow 1):
With reset_after=True (default in TensorFlow 2):
After training with reset_after=False, b_xz equals b_hz, b_xr equals b_hr and b_xh equals b_hh, because (I assume) TensorFlow realizes that each of these pairs of vectors can be combined into one single parameter vector - just like the OP pointed out in a comment above. However, with reset_after=True, that's not the case for b_xh and b_hh - they can and will be different, so they cannot be combined into one vector, and that's why the total parameter count is higher.
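For reference, the two variants can be written out as below. This is a sketch reconstructed from the Keras GRU implementation, not the answer's original formulas; the weight names W_x*, W_h* are assumed here to match the b_x*, b_h* bias notation used above.

```latex
% reset_after=False: one bias vector per block; reset gate applied before the recurrent matmul
\begin{aligned}
z_t &= \sigma\left(W_{xz} x_t + W_{hz} h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_{xr} x_t + W_{hr} h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_{xh} x_t + W_{hh}\,(r_t \odot h_{t-1}) + b_h\right)
\end{aligned}

% reset_after=True: separate input and recurrent biases; reset gate applied after the recurrent matmul
\begin{aligned}
z_t &= \sigma\left(W_{xz} x_t + b_{xz} + W_{hz} h_{t-1} + b_{hz}\right) \\
r_t &= \sigma\left(W_{xr} x_t + b_{xr} + W_{hr} h_{t-1} + b_{hr}\right) \\
\tilde{h}_t &= \tanh\left(W_{xh} x_t + b_{xh} + r_t \odot \left(W_{hh} h_{t-1} + b_{hh}\right)\right)
\end{aligned}

% In both cases the new state is
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
```

In the second variant b_hh sits inside the product with r_t, so it cannot be folded into b_xh, while the bias pairs of the update and reset gates only ever appear as sums.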

How to calculate input_dim for a keras sequential model?

Keras Dense layer needs an input_dim or input_shape to be specified. What value do I put in there?
My input is a matrix of 1,000,000 rows and only 3 columns. My output is 1,600 classes.
What do I put there?
dimensionality of the inputs (1000000, 1600)
2 because it's a 2D matrix
input_dim is the number of features per sample; in your case that is just 3. The equivalent notation for input_shape, which is an actual shape tuple, is (3,).
In your case, assume x holds the features and y the target variable, and that after feature engineering they look like this:
x.shape
(1000000, 3)
y.shape
(1000000, 1600)
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(x.shape[1],)))  # input layer + first hidden layer
# now the model will take as input arrays of shape (*, 3)
# and output arrays of shape (*, 32)
...
...
model.add(Dense(y.shape[1], activation='softmax'))  # output layer
y.shape[1] = 1600 is the number of outputs, which is the number of classes you have, since you are dealing with classification.
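A quick way to read these values off the data, sketched with NumPy. The arrays are small zero-filled stand-ins (1,000 rows instead of 1,000,000), since only the second dimension matters:

```python
import numpy as np

# Small stand-ins for the real data; the row count does not affect the result.
x = np.zeros((1000, 3))      # 3 feature columns
y = np.zeros((1000, 1600))   # one-hot targets over 1,600 classes

input_dim = x.shape[1]        # integer form, for input_dim
input_shape = (x.shape[1],)   # tuple form, for input_shape
num_classes = y.shape[1]      # units of the output layer
print(input_dim, input_shape, num_classes)
```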
X = dataset.iloc[:, 3:13]
This means X contains all rows and the columns at index 3 through 12 (index 13 is excluded), i.e. 10 columns.
We will also have an X0 parameter to be given to the neural network, so the total number of inputs becomes 10 + 1 = 11.
Dense(input_dim = 11, activation = 'relu', kernel_initializer = 'he_uniform')