Dependent hyperparameters with keras tuner - tensorflow

My goal is to tune over possible network architectures that meet the following criteria:
Layer 1 can have any number of hidden units from this list: [32, 64, 128, 256, 512]
Then, the number of hidden units to be explored for the rest of the layers should always depend on the particular selection that was made in the layer above it, specifically:
Layer 2 can have the same or half as many units as layer 1.
Layer 3 can have the same or half as many units as layer 2.
Layer 4 can have the same or half as many units as layer 3.
As I am currently implementing it, the hp.Choice options for layers 2, 3 and 4 never update once they have been established for the first time.
For example, suppose that on the first pass of the tuner num_layers = 4, which means all four layers will get created. If layer 1 then selects 256 hidden units, the options become:
Layer 2 --> [128, 256]
Layer 3 --> [64, 128]
Layer 4 --> [32, 64]
Layers 2, 3 and 4 stay stuck with these choices for every iteration that follows, rather than updating to adapt to future selections for layer 1.
This means in future iterations when the number of hidden units in layer 1 changes, the options for layers 2, 3 and 4 no longer meet the intended goal of exploring options where each subsequent layer can either contain the same or half as many hidden units as the previous layer.
def build_and_tune_model(hp, train_ds, normalize_features, ohe_features, max_tokens, passthrough_features):
    all_inputs, encoded_features = get_all_preprocessing_layers(train_ds,
                                                                normalize_features=normalize_features,
                                                                ohe_features=ohe_features,
                                                                max_tokens=max_tokens,
                                                                passthrough=passthrough_features)

    # Possible values for the number of hidden units in layer 1.
    # Defining here because we will always have at least 1 layer.
    layer_1_hidden_units = hp.Choice('layer1_hidden_units', values=[32, 64, 128, 256, 512])

    # Possible number of layers to include
    num_layers = hp.Choice('num_layers', values=[1, 2, 3, 4])

    print("================= starting new round =====================")
    print(f"Layer 1 hidden units = {hp.get('layer1_hidden_units')}")
    print(f"Num layers is {hp.get('num_layers')}")

    all_features = layers.concatenate(encoded_features)
    x = layers.Dense(layer_1_hidden_units,
                     activation="relu")(all_features)

    if hp.get('num_layers') >= 2:
        with hp.conditional_scope("num_layers", [2, 3, 4]):
            # Layer 2 hidden units can either be half the layer 1 hidden units or the same.
            layer_2_hidden_units = hp.Choice('layer2_hidden_units',
                                             values=[int(hp.get('layer1_hidden_units') / 2),
                                                     hp.get('layer1_hidden_units')])

            print("\n==========================================================")
            print(f"In layer 2")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print("==============================================================\n")

            x = layers.Dense(layer_2_hidden_units,
                             activation="relu")(x)

    if hp.get('num_layers') >= 3:
        with hp.conditional_scope("num_layers", [3, 4]):
            # Layer 3 hidden units can either be half the layer 2 hidden units or the same.
            layer_3_hidden_units = hp.Choice('layer3_hidden_units',
                                             values=[int(hp.get('layer2_hidden_units') / 2),
                                                     hp.get('layer2_hidden_units')])

            print("\n==========================================================")
            print(f"In layer 3")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print(f"layer_3_hidden_units = {hp.get('layer3_hidden_units')}")
            print("==============================================================\n")

            x = layers.Dense(layer_3_hidden_units,
                             activation="relu")(x)

    if hp.get('num_layers') >= 4:
        with hp.conditional_scope("num_layers", [4]):
            # Layer 4 hidden units can either be half the layer 3 hidden units or the same.
            # Extra stipulation applied here: layer 4 hidden units can never be less than 8.
            layer_4_hidden_units = hp.Choice('layer4_hidden_units',
                                             values=[max(int(hp.get('layer3_hidden_units') / 2), 8),
                                                     hp.get('layer3_hidden_units')])

            print("\n==========================================================")
            print(f"In layer 4")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print(f"layer_3_hidden_units = {hp.get('layer3_hidden_units')}")
            print(f"layer_4_hidden_units = {hp.get('layer4_hidden_units')}")
            print("==============================================================\n")

            x = layers.Dense(layer_4_hidden_units,
                             activation="relu")(x)

    output = layers.Dense(1, activation='sigmoid')(x)
    model = tf.keras.Model(all_inputs, output)
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'],
                  loss='binary_crossentropy')

    print(">>>>>>>>>>>>>>>>>>>>>>>>>>>> End of round <<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
    return model
Does anyone know the correct way to tell Keras Tuner to explore all possible options for each layer's hidden units, where the space to explore satisfies the criterion that each layer after the first can have the same or half as many hidden units as the previous layer, and the first layer can have a number of hidden units from the list [32, 64, 128, 256, 512]?

For this we first need to understand how the hyperparameters and their values get selected. Before control reaches our application, Keras Tuner selects all the active hyperparameters from the hyperparameter space (an active hyperparameter is one whose associated condition is satisfied; note that by default hyperparameters don't have any condition assigned to them), and then generates a random value from the list of values associated with each active hyperparameter. That means the selection of a hyperparameter and its value is already done before control reaches our application; our application just pulls the already-generated value. That's why you will always see the hyperparameters never updating once they have been established for the first time.
In your case, let's consider a scenario: say that in the first trial the tuner generates 256 as the unit count for the first layer. Then the code below will create a hyperparameter 'layer2_hidden_units' for the second layer with the possible set of values [128, 256]:
layer_2_hidden_units = hp.Choice('layer2_hidden_units', values=[(int(hp.get('layer1_hidden_units') / 2)), hp.get('layer1_hidden_units')])
In the second trial, before control reaches your application, the tuner has already taken a value from the list [128, 256], say 128, so the value of the hyperparameter 'layer2_hidden_units' will be 128, and your application just pulls that already-generated value.
The solution to your query is to generate the hyperparameters dynamically, like below:
hidden_units = hp.Choice('units_layer_' + str(layer_index), values=[(int(hp.get('layer1_hidden_units') / 2)), hp.get('layer1_hidden_units')])
# where
# hp.get('layer1_hidden_units') = 256 and layer_index = 2
# or hp.get('layer1_hidden_units') = 128 and layer_index = 1
# and so on...
Now let's take the scenario already discussed, where Keras Tuner selected 256 as the unit count for the first layer in the first trial. For that same trial, the code above will allow Keras Tuner to set hyperparameters for the remaining layers as hidden_units_layer_2 = [128, 256], hidden_units_layer_1 = [64, 128], hidden_units_layer_0 = [32, 64].
But now we face a second challenge: all of these hyperparameters will be activated in forthcoming trials even though some of them will not be required. For example, if in the second trial the selected unit count for the first layer is 64, the tuner will still activate hidden_units_layer_2 = [128, 256] and hidden_units_layer_1 = [64, 128]. That means we need to disable them by putting them under a conditional scope, as below:
with hp.conditional_scope(parent_units_name, parent_units_value):
    hidden_units = hp.Choice(child_units_name, values=child_units_value)
The final code will look as below
# List possible units
possible_units = [32, 64, 128, 256, 512]

possible_layer_units = []
for index, item in enumerate(possible_units[:-1]):
    possible_layer_units.append([item, possible_units[index + 1]])

# possible_layer_units = [[32, 64], [64, 128], [128, 256], [256, 512]]
# where the list index represents the layer number
# and each element is the list of unit possibilities for that layer

first_layer_units = hp.Choice('first_layer_units', values=possible_units)

# Then add the first layer
all_features = layers.concatenate(encoded_features)
x = layers.Dense(first_layer_units, activation="relu")(all_features)

# Get the number of hidden layers based on the first layer unit count
hidden_layer_count = possible_units.index(first_layer_units)

if 0 < hidden_layer_count:
    iter_count = 0
    for hidden_layer_index in range(hidden_layer_count - 1, -1, -1):
        if iter_count == 0:
            # Collect HP 'units' details for the second layer.
            # Suppose first_layer_units = 512, then
            # HP example: <units_layer_43=[256, 512] condition={first_layer_units:[256, 512]}>
            # where in units_layer_43, 4 indicates there will be 5 layers in total and 3 indicates the 4th layer from the last.
            # We use the total hidden layer count in the HP name to avoid an issue while getting the unit count value.
            parent_units_name = 'first_layer_units'
            parent_units_value = possible_layer_units[hidden_layer_index]
            child_units_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index)
            child_units_value = parent_units_value
        else:
            # Collect HP 'units' details for the next layers.
            # Suppose units_layer_43 = 256, then
            # HP example: <units_layer_42=[128, 256] condition={units_layer_43:[256, 512]}>
            parent_units_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index + 1)
            parent_units_value = possible_layer_units[hidden_layer_index + 1]
            child_units_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index)
            child_units_value = possible_layer_units[hidden_layer_index]

        # Add and activate the child HP under the parent HP using a conditional scope
        with hp.conditional_scope(parent_units_name, parent_units_value):
            hidden_units = hp.Choice(child_units_name, values=child_units_value)

            # Add the remaining NN layers one by one
            x = layers.Dense(hidden_units, activation="relu")(x)

        iter_count += 1
This way, only those hyperparameters whose associated condition is satisfied will be activated. So in our case, if in the second trial the selected unit count for the first layer is 64, the hyperparameters 'units_layer_2' and 'units_layer_1' will be disabled by the conditional scope and only the hyperparameter 'units_layer_0' will remain active.
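For completeness, here is a minimal sketch of how such a build function might be handed to a tuner. The wrapper, dataset names and search settings below are illustrative (not from the original post), and the body of build_and_tune_model is assumed to have been rewritten with the conditional-scope approach above:
import keras_tuner as kt

# assume train_ds / val_ds are tf.data.Dataset objects and the remaining
# arguments are already defined, as in the question
def build_model(hp):
    return build_and_tune_model(hp, train_ds, normalize_features,
                                ohe_features, max_tokens, passthrough_features)

tuner = kt.RandomSearch(build_model,
                        objective='val_accuracy',
                        max_trials=20,
                        overwrite=True,
                        directory='tuning',
                        project_name='dependent_units')
tuner.search(train_ds, validation_data=val_ds, epochs=5)
best_hps = tuner.get_best_hyperparameters(1)[0]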

Related

How to specify input layer with Keras

I came across this code for tuning the topology of the neural network. However, I am unsure of how I can instantiate the first layer without flattening the input.
My input is a matrix with M features (the rows) and N samples (the columns).
How can I create the first (input) layer?
# Initialize sequential API and start building model.
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28,28)))

# Tune the number of hidden layers and units in each.
# Number of hidden layers: 1 - 5
# Number of Units: 32 - 512 with stepsize of 32
for i in range(1, hp.Int("num_layers", 2, 6)):
    model.add(
        keras.layers.Dense(
            units=hp.Int("units_" + str(i), min_value=32, max_value=512, step=32),
            activation="relu")
    )

    # Tune dropout layer with values from 0 - 0.3 with stepsize of 0.1.
    model.add(keras.layers.Dropout(hp.Float("dropout_" + str(i), 0, 0.3, step=0.1)))

# Add output layer.
model.add(keras.layers.Dense(units=10, activation="softmax"))
I know that Keras usually instantiates the first hidden layer along with the input layer, but I don't see how I can do it in this framework. Below is the code for instantiating input + first hidden layer at once.
model.add(Dense(100, input_shape=(CpG_num,), kernel_initializer='normal', activation='relu'))
If you have multiple input features and want to set your input shape, let's suppose you have a dataframe with m rows and n columns... then simply do this:
m = no_of_rows     # 1000
n = no_of_columns  # 10
no_of_units = 64   # units in the first Dense layer

# We do not pass m because m will be taken as the batch dimension here.
_input = tf.keras.layers.Input(shape=(n,))
dense = tf.keras.layers.Dense(no_of_units)(_input)
output = tf.keras.backend.function(_input, dense)

# Now, let's check that it is working...!
x = np.random.randn(1000, 10)
print(output(x).shape)  # (1000, 64)
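If the goal is just to feed the M-dimensional feature vectors into the tuner's Sequential model without flattening, a minimal sketch would be the following (assuming the data has been transposed to shape (N, M), i.e. one sample per row; M is just the feature count from the question and the value below is hypothetical):
M = 20  # hypothetical number of features per sample
model = keras.Sequential()
model.add(keras.layers.InputLayer(input_shape=(M,)))   # replaces Flatten: inputs are already 1-D vectors
model.add(keras.layers.Dense(100, activation="relu"))  # first hidden layer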

Two input layers for LSTM Neural Network?

I am now building a neural network, and I am facing the task of adding another input layer (until now I just needed one).
In particular, this was the code previously:
###...
if (self.net_embedding == 0):
    l_input = Input(shape=self.win_size, dtype='int32', name='input_act')
    emb_input = Embedding(output_dim=params["output_dim_embedding"], input_dim=unique_events + 1, input_length=self.win_size)(l_input)
    toBePassed = emb_input
elif (self.net_embedding == 1):
    self.getWord2VecEmbeddings(params['word2vec_size'])
    X_train = self.encodePrefixes(params['word2vec_size'], X_train)
    l_input = Input(shape=(self.win_size, params['word2vec_size']), name='input_act')
    toBePassed = l_input

l1 = LSTM(params["shared_lstm_size"], return_sequences=True, kernel_initializer='glorot_uniform', dropout=params['dropout'])(toBePassed)
l1 = BatchNormalization()(l1)
# and so on with the rest of the layers...
The input of the model (X_train) was just an array of arrays (with size = self.win_size) of integers (e.g. [[0 1 2 3] [1 2 3 4]...] if self.win_size = 4), where the integers represent categorical elements.
As you can see, I also have two types of embeddings for this input:
Embedding layer
Word2Vec encoding
Now, I need to add another input to the net, which is likewise an array of arrays (again with size = self.win_size) of integers (e.g. [[0 123 334 2212] [123 334 2212 4888]...]), but this time I don't need to apply any embedding (I think) because the elements here are not categorical (they represent elapsed time in seconds).
I tried by simply changing the net to:
#...
if (self.net_embedding == 0):
    l_input = Input(shape=self.win_size, dtype='int32', name='input_act')
    emb_input = Embedding(output_dim=params["output_dim_embedding"], input_dim=unique_events + 1, input_length=self.win_size)(l_input)
    toBePassed = emb_input
elif (self.net_embedding == 1):
    self.getWord2VecEmbeddings(params['word2vec_size'])
    X_train = self.encodePrefixes(params['word2vec_size'], X_train)
    l_input = Input(shape=(self.win_size, params['word2vec_size']), name='input_act')
    toBePassed = l_input

elapsed_time_input = Input(shape=self.win_size, name='input_time')
input_concat = Concatenate(axis=1)([toBePassed, elapsed_time_input])

l1 = LSTM(params["shared_lstm_size"], return_sequences=True, kernel_initializer='glorot_uniform', dropout=params['dropout'])(input_concat)
l1 = BatchNormalization()(l1)
# and so on with other layers...
but I get the error:
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 4, 12), (None, 4)]
Do you have any solution for this, please? Any kind of help would be really appreciated, since I have a deadline in a few days and I've been smashing my head on this for so long now! Thanks :)
There are two problems with your approach.
First, inputs to LSTM should have a shape of (batch_size, num_steps, num_feats), yet your elapsed_time_input has shape (None, 4). You need to expand its dimension to get the proper shape (None, 4, 1).
elapsed_time_input = tf.keras.layers.Reshape((-1, 1))(elapsed_time_input)
or
elapsed_time_input = tf.expand_dims(elapsed_time_input, axis=-1)
With this, "elapsed time in seconds" will be seen as just another feature of a timestep.
Secondly, you'll want to concatenate the two inputs in the feature dimension (not the timestep dimension).
input_concat = Concatenate(axis=-1)([toBePassed, elapsed_time_input])
or
input_concat = Concatenate(axis=2)([toBePassed, elapsed_time_input])
After this, you'll get a Keras tensor with a shape of (None, 4, 13). It represents a batch of time series, each having 4 timesteps and 13 features per step (12 original features plus the elapsed time in seconds for each step).
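Putting both fixes together, here is a small self-contained sketch; the sizes are illustrative (win_size = 4, embedding dim 12, a vocabulary of 100 events, 32 LSTM units), not taken from the original code:
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Concatenate, LSTM

win_size, emb_dim, lstm_units = 4, 12, 32

act_input = Input(shape=(win_size,), dtype='int32', name='input_act')
emb = Embedding(output_dim=emb_dim, input_dim=100)(act_input)   # (None, 4, 12)

time_input = Input(shape=(win_size,), name='input_time')        # (None, 4)
time_feat = tf.expand_dims(time_input, axis=-1)                 # (None, 4, 1)

merged = Concatenate(axis=-1)([emb, time_feat])                 # (None, 4, 13)
l1 = LSTM(lstm_units, return_sequences=True)(merged)

model = tf.keras.Model([act_input, time_input], l1)
model.summary()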

How to map input image with neurons in first conv layer in CNN?

I just completed an ANN course and started learning CNNs. I have a basic understanding of how padding and stride operations work in a CNN, but I have difficulty mapping the input image to the neurons in the first conv layer, even though I do understand how input features are mapped to the first hidden layer in an ANN.
What is the best way to understand the mapping between the input image and the neurons in the first conv layer?
How can I clarify my doubts about the code example below? The code is taken from the DL course on Coursera.
def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
    W1 : [4, 4, 3, 8]
    W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """
    tf.set_random_seed(1)  # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "W2": W2}

    return parameters


def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters
    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    W2 = parameters['W2']

    ### START CODE HERE ###
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 8x8, stride 8, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function (do not call softmax).
    # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None"
    Z3 = tf.contrib.layers.fully_connected(P2, 6, activation_fn=None)
    ### END CODE HERE ###

    return Z3


with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(Z3, {X: np.random.randn(1, 64, 64, 3), Y: np.random.randn(1, 6)})
    print("Z3 = " + str(a))
How is this input image of size 64*64*3 processed by 8 filters, each of size 4*4*3?
stride = 1, padding = same and batch_size = 1.
What I have understood so far is that each neuron in the first conv layer has 8 filters, each of size 4*4*3. Each neuron in the first conv layer takes a portion of the input image the same size as the filter (here 4*4*3), applies the convolution operation, and produces eight 64*64 feature maps.
If my understanding is correct, then:
1> Why do we need the striding operation, since the kernel size and the portion of the input image processed by each neuron are the same? If we apply stride = 1 (or 2), the boundary of that image portion is crossed, which is something we don't need, right?
2> How do we know which portion of the input image (same size as the kernel) is mapped to which neuron in the first conv layer?
If not, then:
3> How is the input image passed to the neurons in the first conv layer? Is the complete input image passed to each neuron (like in a fully connected ANN, where all the input features are mapped to each neuron in the first hidden layer),
or only a portion of the input image? How do we know which portion of the input image is mapped to which neuron in the first conv layer?
4> Is the number of kernels specified in the above example (W1 = [4, 4, 3, 8]) per neuron, or the total number of kernels in the first conv layer?
5> How do we know how many neurons are used by the above example in the first conv layer?
6> Is there any relationship between the number of neurons and the number of kernels in the first conv layer?
I found relevant answers to my questions and am posting them here.
First of all, the concept of a neuron exists in a conv layer as well, but indirectly. Basically, each neuron in a conv layer deals with a portion of the input image that is the same size as the kernel used in that conv layer.
Each neuron focuses on only a particular portion of the input image (whereas in a fully-connected ANN each neuron focuses on the whole image), and each neuron uses n filters/kernels to get more insight into that particular portion of the image.
These n filters/kernels are shared by all the neurons in a given conv layer. Because of this weight (kernel/filter) sharing, a conv layer has fewer parameters to learn, whereas in a fully connected ANN each neuron has its own weight matrix and hence the number of parameters to learn is larger.
Now, the number of neurons in a given conv layer L depends on the input size (the output of the previous layer L-1), the kernel size used in layer L, the padding used in layer L, and the stride used in layer L.
Now let's answer each of the questions above.
1> How do we know which portion of the input image (same size as the kernel) is mapped to which neuron in the first conv layer?
From the above code example, for conv layer 1:
Batch size = 1
Input image size = 64*64*3
Kernel size = 4*4*3 ==> Taken from W1
Number of kernel = 8 ==> Taken from W1
Padding = same
stride = 1
Stride = 1 means that you are sliding the kernel one pixel at a time. Let's consider the x axis and number the pixels 1, 2, 3, 4, ... 64.
The first neuron will see pixels 1, 2, 3 and 4, then the kernel is shifted by one pixel and the next neuron will see pixels 2, 3, 4 and 5, and the last neuron will see pixels 61, 62, 63 and 64. This happens if you use valid padding.
In the case of same padding, the first neuron will see pixels 0, 1, 2 and 3, the second neuron will see pixels 1, 2, 3 and 4, and the last neuron will see pixels 62, 63, 64 and one zero-padded pixel.
In the same padding case, you end up with an output of the same size as the image (64 x 64 x 8). In the valid padding case, the output is (61 x 61 x 8).
The 8 in the output represents the number of filters.
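As a quick check of those output sizes with the usual output-length formula (a sketch, using TensorFlow's conventions for 'valid' and 'same' with stride 1):
import math

n_in, f, stride = 64, 4, 1
valid_out = math.floor((n_in - f) / stride) + 1  # 61 -> output (61, 61, 8)
same_out = math.ceil(n_in / stride)              # 64 -> output (64, 64, 8)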
2> How is the input image passed to the neurons in the first conv layer? Is the complete input image passed to each neuron (like in a fully connected ANN, where all the input features are mapped to each neuron in the first hidden layer)?
Each neuron looks at only a portion of the input image. Please refer to the answer to the first question to map between the input image and the neurons.
3> Is the number of kernels specified in the above example (W1 = [4, 4, 3, 8]) per neuron, or the total number of kernels in the first conv layer?
It is the total number of kernels for that layer, and all the neurons in that layer share the same kernels for learning different portions of the input image. Hence in a convnet the number of parameters to learn is smaller compared to a fully-connected ANN.
4> How do we know how many neurons are used by the above example in the first conv layer?
It depends on the input size (the output of the previous layer L-1), the kernel size used in layer L, the padding used in layer L, and the stride used in layer L. Please refer to the answer to the first question for more clarification.
5> Is there any relationship between the number of neurons and the number of kernels in the first conv layer?
There is no relationship in terms of numbers, but each neuron uses n filters/kernels (shared among all the neurons in a particular layer) to learn more about its particular portion of the input image.
The sample code below will help clarify the internal implementation of the convolution operation.
def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function
    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"
    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """
    # Retrieve dimensions from A_prev's shape (≈1 line)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    pad = hparameters['pad']

    # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
    n_H = int(np.floor((n_H_prev - f + 2 * pad) / stride)) + 1
    n_W = int(np.floor((n_W_prev - f + 2 * pad) / stride)) + 1

    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m, n_H, n_W, n_C))

    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)

    for i in range(m):                  # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]      # Select ith training example's padded activation
        for h in range(n_H):            # loop over vertical axis of the output volume
            for w in range(n_W):        # loop over horizontal axis of the output volume
                for c in range(n_C):    # loop over channels (= #filters) of the output volume
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the (3D) slice of a_prev_pad (See Hint above the cell). (≈1 line)
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈1 line)
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])

    return Z


A_prev = np.random.randn(1, 64, 64, 3)
W = np.random.randn(4, 4, 3, 8)
# Don't worry about bias, tensorflow will take care of this.
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad": 1,
               "stride": 1}

Z = conv_forward(A_prev, W, b, hparameters)
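Running this end to end (the sketch assumes zero_pad and conv_single_step from the same Coursera assignment are defined) confirms the output size given by the formula inside conv_forward:
print(Z.shape)  # (1, 63, 63, 8): floor((64 - 4 + 2*1) / 1) + 1 = 63, with 8 filters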

Hidden state tensors have a different order than the returned tensors

As part of GRU training, I want to retrieve the hidden state tensors.
I have defined a GRU with two layers:
self.lstm = nn.GRU(params.vid_embedding_dim, params.hidden_dim , 2)
The forward function is defined as follows (the following is just a part of the implementation):
def forward(self, s, order, batch_size, where, anchor_is_phrase=False):
    """
    Forward prop.
    """
    # s is of shape [128, 1, 300], 128 is the batch size
    output, (a, b) = self.lstm(s.cuda())
    output.data.contiguous()
And output is of shape [128, 400] (128 is the number of samples, each embedded in a 400-dimensional vector).
I understand that output holds the last hidden states, so I expect it to be equal to b. However, after checking the values I saw that it is indeed equal, but b contains the tensors in a different order; for example, output[0] is b[49]. Am I missing something here?
Thanks.
I understand your confusion. Have a look at the example below and the comments:
# [Batch size, Sequence length, Embedding size]
inputs = torch.rand(128, 5, 300)

gru = nn.GRU(input_size=300, hidden_size=400, num_layers=2, batch_first=True)

with torch.no_grad():
    # output holds the hidden state at every timestep, for each element in the batch, from the last layer of the RNN
    # a is the last hidden state of the first layer
    # b is the last hidden state of the second (last) layer
    output, (a, b) = gru(inputs)
If we print out the shapes, they will confirm our understanding:
print(output.shape) # torch.Size([128, 5, 400])
print(a.shape) # torch.Size([128, 400])
print(b.shape) # torch.Size([128, 400])
Also, we can test whether the last hidden state, for each element in the batch, of the last layer, obtained from output is equal to b:
np.testing.assert_almost_equal(b.numpy(), output[:,-1,:].numpy())
Finally, we can create an RNN with 3 layers, and run the same tests:
gru = nn.GRU(input_size=300, hidden_size=400, num_layers=3, batch_first=True)

with torch.no_grad():
    output, (a, b, c) = gru(inputs)

np.testing.assert_almost_equal(c.numpy(), output[:,-1,:].numpy())
Again, the assertion passes but only if we do it for c, which is now the last layer of the RNN. Otherwise:
np.testing.assert_almost_equal(b.numpy(), output[:,-1,:].numpy())
Raises an error:
AssertionError: Arrays are not almost equal to 7 decimals
I hope that this makes things clear for you.
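As a side note, nn.GRU actually returns the final hidden states as one stacked tensor of shape (num_layers, batch, hidden) (even with batch_first=True); unpacking it as (a, b, c) simply splits that tensor along the layer dimension. An equivalent way to grab the last layer's final state, using the 3-layer gru above (a small sketch):
with torch.no_grad():
    output, h_n = gru(inputs)   # h_n has shape (3, 128, 400)

# h_n[-1] is the final hidden state of the last layer, i.e. the same tensor as c
np.testing.assert_almost_equal(h_n[-1].numpy(), output[:, -1, :].numpy())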

How does a 1D multi-channel convolutional layer (Keras) train?

I am working with time series EEG data recorded from 10 individual locations on the body to classify future behavior in terms of increasing heart activity. I would like to better understand how my labeled data corresponds to the training inputs.
So far, several RNN configurations as well as countless combinations of vanilla dense networks have not gotten me great results, so I figured a 1D convnet is worth a try.
The things I'm having trouble understanding are:
1.) Feeding data into the model.
orig shape = (30000 timesteps, 10 channels)
array fed to layer = (300 slices, 100 timesteps, 10 channels)
Are the slices separated by 1 time step, giving me 300 slices of timesteps at either end of the original array, or are they separated end to end? If the second is true, how could I create an array of (30000 - 100) slices separated by one ts and is also compatible with the 1D CNN layer?
2) Matching labels with the training and testing data
My understanding is that when you feed in a sequence of train_x_shape = (30000, 10), there are 30000 labels with train_y_shape = (30000, 2) (2 classes) associated with the train_x data.
So, when (300 slices of) 100 timesteps of train_x data with shape = (300, 100, 10) are fed into the model, does the label value correspond to the entire 100 ts (one label per 100 ts, with this label being equal to the last timestep's label), or is each of the 100 rows/vectors in the slice labeled, one for each ts?
Train input:
train_x = train_x.reshape(train_x.shape[0], 1, train_x.shape[1])
n_timesteps = 100
n_channels = 10
layer : model.add(Convolution1D(filters = n_channels * 2, padding = 'same', kernel_size = 3, input_shape = (n_timesteps, n_channels)))
final layer : model.add(Dense(2, activation = 'softmax'))
I use categorical_crossentropy for loss.
Answer 1
This will really depend on "how did you get those slices"?
The answer is totally dependent on what "you're doing". So, what do you want?
If you have simply reshaped (array.reshape(...)) the original array from shape (30000,10) to shape (300,100,10), the model will see:
300 individual (and not connected) sequences
100 timesteps in each sequence
Sequence 1 goes from step 0 to 99;
Sequence 2 goes from step 100 to 199, and so on.
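A quick sanity check of what a plain reshape does (a small sketch):
import numpy as np

x = np.arange(30000 * 10).reshape(30000, 10)   # original (timesteps, channels)
slices = x.reshape(300, 100, 10)               # 300 sequences of 100 steps
print(np.array_equal(slices[0], x[0:100]))     # True: slice 0 = timesteps 0..99
print(np.array_equal(slices[1], x[100:200]))   # True: slice 1 = timesteps 100..199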
Creating overlapping slices - Sliding window
If you want to create sequences shifted by only one timestep, make a loop for that.
import numpy as np

originalSequence = someArrayWithShape((30000, 10))

newSlices = []  # empty list
start = 0
end = start + 100  # window length = 100 timesteps, as in the question

while end <= 30000:
    newSlices.append(originalSequence[start:end])
    start += 1
    end += 1

newSlices = np.asarray(newSlices)
Beware: if you do this in the input data, you will have to do a similar thing in your output data as well.
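Equivalently, on NumPy 1.20+ the same overlapping slices can be built without the explicit loop (a sketch; note that sliding_window_view appends the window axis last, hence the transpose):
import numpy as np

windows = np.lib.stride_tricks.sliding_window_view(originalSequence, 100, axis=0)
newSlices = windows.transpose(0, 2, 1)   # shape (29901, 100, 10), one slice per start position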
Answer2
Again, that's totally up to you. What do you want to achieve?
Convolutional layers will keep the timesteps with these options:
If you use padding='same', the final length will be the same as the input
If you don't, the final length will be reduced depending on the kernel size you choose
Recurrent layers will keep the timesteps or not depending on:
Whether you use return_sequences=True - Output has timesteps
Or you use return_sequences=False - Output has no timesteps
If you want only one output for each sequence (not per timestep):
Recurrent models:
Use LSTM(...., return_sequences=True) until the last LSTM
The last LSTM will be LSTM(..., return_sequences=False)
Convolutional models:
At some point after the convolutions, choose one of these to add:
GlobalMaxPooling1D
GlobalAveragePooling1D
Flatten (but treat the number of channels later with a Dense(2))
Reshape((2,))
I think I'd go with GlobalMaxPooling1D if using convolutions, but recurrent models seem better for this. (Not a rule, though.)
You can choose to use intermediate MaxPooling1D layers to gradually reduce the length from 100 to 50, then to 25 and so on. This will probably reach a better output.
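As a rough sketch of that convolutional option (the filter counts and pooling sizes below are illustrative, not tuned; the input shape matches the 100-timestep, 10-channel slices from the question):
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv1D(20, kernel_size=3, padding='same', activation='relu',
                  input_shape=(100, 10)),      # 100 timesteps, 10 channels
    layers.MaxPooling1D(2),                    # 100 -> 50 timesteps
    layers.Conv1D(40, kernel_size=3, padding='same', activation='relu'),
    layers.GlobalMaxPooling1D(),               # one vector per sequence
    layers.Dense(2, activation='softmax'),     # one label per 100-step slice
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])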
Remember to keep X and Y paired:
import numpy as np

train_x = someArrayWithShape((30000, 10))
train_y = someArrayWithShape((30000, 2))

newXSlices = []  # empty list
newYSlices = []  # empty list
start = 0
end = start + 100  # window length = 100 timesteps

while end <= 30000:
    newXSlices.append(train_x[start:end])
    newYSlices.append(train_y[end-1:end])  # label of the last timestep in each window
    start += 1
    end += 1

newXSlices = np.asarray(newXSlices)
newYSlices = np.asarray(newYSlices)