How to specify input layer with Keras - tensorflow

I came across this code for tuning the topology of a neural network. However, I am unsure how to instantiate the first layer without flattening the input.
My input is like this:
With M features (the rows) and N samples (the columns).
How can I create the first (input) layer?
# Initialize sequential API and start building model.
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28,28)))
# Tune the number of hidden layers and units in each.
# Number of hidden layers: 1 - 5
# Number of Units: 32 - 512 with stepsize of 32
for i in range(1, hp.Int("num_layers", 2, 6)):
    model.add(
        keras.layers.Dense(
            units=hp.Int("units_" + str(i), min_value=32, max_value=512, step=32),
            activation="relu")
    )
    # Tune dropout layer with values from 0 - 0.3 with stepsize of 0.1.
    model.add(keras.layers.Dropout(hp.Float("dropout_" + str(i), 0, 0.3, step=0.1)))
# Add output layer.
model.add(keras.layers.Dense(units=10, activation="softmax"))
I know that Keras usually instantiates the first hidden layer along with the input layer, but I don't see how I can do it in this framework. Below is the code for instantiating input + first hidden layer at once.
model.add(Dense(100, input_shape=(CpG_num,), kernel_initializer='normal', activation='relu'))

If you have several input features and want to set the input shape, suppose you have a dataframe with m rows and n columns. Then you can do this:
import numpy as np
import tensorflow as tf
m = 1000 # number of rows
n = 10 # number of columns
no_of_layers = 64 # number of units in the Dense layer
# We do not pass m, because m is treated as the batch dimension here.
_input = tf.keras.layers.Input(shape=(n,))
dense = tf.keras.layers.Dense(no_of_layers)(_input)
output = tf.keras.backend.function(_input , dense)
# Now check whether it works.
x = np.random.randn(m , n)
print(output(x).shape)
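If the data really is stored with the M features as rows and the N samples as columns, one option is to transpose it first, because Keras treats the first axis as the batch (sample) axis. A minimal NumPy sketch, assuming small illustrative sizes (`M = 5`, `N = 8`) and a hypothetical array `raw` standing in for the real data:

```python
import numpy as np

M, N = 5, 8  # illustrative sizes: M features (rows), N samples (columns)
raw = np.arange(M * N, dtype=np.float32).reshape(M, N)  # stand-in for the real data

# Keras expects samples on the first axis, so put one sample per row.
X = raw.T  # shape (N, M)

# The per-sample shape excludes the batch axis; this is what input_shape expects.
input_shape = X.shape[1:]  # (M,)

print(X.shape, input_shape)
```

With the data in this layout, the `Flatten(input_shape=(28,28))` layer in the tuning code is no longer needed; something like `keras.Input(shape=input_shape)` could take its place as the first layer.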

Related

Dependent hyperparameters with keras tuner

My goal is to tune over possible network architectures that meet the following criteria:
Layer 1 can have any number of hidden units from this list: [32, 64, 128, 256, 512]
Then, the number of hidden units to be explored for the rest of the layers should always depend on the particular selection that was made in the layer above it, specifically:
Layer 2 can have the same or half as many units as layer 1.
Layer 3 can have the same or half as many units as layer 2.
Layer 4 can have the same or half as many units as layer 3.
As I am currently implementing it, the hp.Choice options for layers 2, 3 and 4 are never updating once they have been established for the first time.
For example, pretend on the first pass of the tuner num_layers = 4 which means all four layers will get created. If, for example, layer 1 selects 256 hidden units, the options become:
Layer 2 --> [128, 256]
Layer 3 --> [64, 128]
Layer 4 --> [32, 64]
Layers 2, 3 and 4 stay stuck with these choices for every iteration that follows, rather than updating to adapt to future selections for layer 1.
This means in future iterations when the number of hidden units in layer 1 changes, the options for layers 2, 3 and 4 no longer meet the intended goal of exploring options where each subsequent layer can either contain the same or half as many hidden units as the previous layer.
def build_and_tune_model(hp, train_ds, normalize_features, ohe_features, max_tokens, passthrough_features):
    all_inputs, encoded_features = get_all_preprocessing_layers(
        train_ds,
        normalize_features=normalize_features,
        ohe_features=ohe_features,
        max_tokens=max_tokens,
        passthrough=passthrough_features)
    # Possible values for the number of hidden units in layer 1.
    # Defining here because we will always have at least 1 layer.
    layer_1_hidden_units = hp.Choice('layer1_hidden_units', values=[32, 64, 128, 256, 512])
    # Possible number of layers to include
    num_layers = hp.Choice('num_layers', values=[1, 2, 3, 4])
    print("================= starting new round =====================")
    print(f"Layer 1 hidden units = {hp.get('layer1_hidden_units')}")
    print(f"Num layers is {hp.get('num_layers')}")
    all_features = layers.concatenate(encoded_features)
    x = layers.Dense(layer_1_hidden_units,
                     activation="relu")(all_features)
    if hp.get('num_layers') >= 2:
        with hp.conditional_scope("num_layers", [2, 3, 4]):
            # Layer 2 hidden units can either be half the layer 1 hidden units or the same.
            layer_2_hidden_units = hp.Choice('layer2_hidden_units', values=[(int(hp.get('layer1_hidden_units') / 2)),
                                                                            hp.get('layer1_hidden_units')])
            print("\n==========================================================")
            print(f"In layer 2")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print("==============================================================\n")
            x = layers.Dense(layer_2_hidden_units,
                             activation="relu")(x)
    if hp.get('num_layers') >= 3:
        with hp.conditional_scope("num_layers", [3, 4]):
            # Layer 3 hidden units can either be half the layer 2 hidden units or the same.
            layer_3_hidden_units = hp.Choice('layer3_hidden_units', values=[(int(hp.get('layer2_hidden_units') / 2)),
                                                                            hp.get('layer2_hidden_units')])
            print("\n==========================================================")
            print(f"In layer 3")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print(f"layer_3_hidden_units = {hp.get('layer3_hidden_units')}")
            print("==============================================================\n")
            x = layers.Dense(layer_3_hidden_units,
                             activation="relu")(x)
    if hp.get('num_layers') >= 4:
        with hp.conditional_scope("num_layers", [4]):
            # Layer 4 hidden units can either be half the layer 3 hidden units or the same.
            # Extra stipulation applied here, layer 4 hidden units can never be less than 8.
            layer_4_hidden_units = hp.Choice('layer4_hidden_units', values=[max(int(hp.get('layer3_hidden_units') / 2), 8),
                                                                            hp.get('layer3_hidden_units')])
            print("\n==========================================================")
            print(f"In layer 4")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print(f"layer_3_hidden_units = {hp.get('layer3_hidden_units')}")
            print(f"layer_4_hidden_units = {hp.get('layer4_hidden_units')}")
            print("==============================================================\n")
            x = layers.Dense(layer_4_hidden_units,
                             activation="relu")(x)
    output = layers.Dense(1, activation='sigmoid')(x)
    model = tf.keras.Model(all_inputs, output)
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'],
                  loss='binary_crossentropy')
    print(">>>>>>>>>>>>>>>>>>>>>>>>>>>> End of round <<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
    return model
Does anyone know the correct way to tell Keras Tuner to explore all possible options for each layers hidden units, where the area to explore satisfies the criteria that each layer after the first is allowed to have the same or half as many hidden units as the previous layer, and the first layer can have a number hidden units from the list [32, 64, 128, 256, 512]?
To see why, we first need to understand how hyperparameters and their values are selected. Before control reaches your build function, Keras Tuner selects all the active hyperparameters from the hyperparameter space. A hyperparameter is active when its associated condition is satisfied (note: by default, hyperparameters have no condition assigned to them). Keras Tuner then generates a random value for each active hyperparameter from the list of values associated with it. So both the hyperparameter and its value are already fixed before your code runs, and your code just pulls the already generated value. That is why the hyperparameter options never update once they have been established for the first time.
In your case, suppose the first trial generates 256 as the unit count for the first layer. The code below then creates a hyperparameter 'layer2_hidden_units' for the second layer with the possible values [128, 256]:
layer_2_hidden_units = hp.Choice('layer2_hidden_units', values=[(int(hp.get('layer1_hidden_units') / 2)), hp.get('layer1_hidden_units')])
In the second trial, before control reaches your code, the tuner has already picked a value from the list [128, 256], say 128. So the value of 'layer2_hidden_units' will be 128, and your code just pulls that already generated value.
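The "first registration wins" behaviour described above can be imitated with a few lines of plain Python. This is only a toy model for intuition: `ToyHyperParameters`, `choice` and `new_trial` are hypothetical names made up for this sketch, not Keras Tuner's API or implementation.

```python
class ToyHyperParameters:
    """Toy stand-in that mimics 'first registration wins'; not Keras Tuner code."""

    def __init__(self):
        self.space = {}   # name -> list of allowed values (fixed at first call)
        self.values = {}  # name -> value chosen for the current trial

    def choice(self, name, values):
        # Register the option list only the first time this name is seen.
        if name not in self.space:
            self.space[name] = list(values)
        # A later call with a different list is ignored: the value comes
        # from the list that was stored first.
        if name not in self.values:
            self.values[name] = self.space[name][0]
        return self.values[name]

    def new_trial(self, overrides):
        # Simulate the tuner fixing values before the build function runs.
        self.values = dict(overrides)


hp = ToyHyperParameters()

# Trial 1: layer 1 is 256, so layer 2 is registered with [128, 256].
hp.new_trial({"layer1": 256})
hp.choice("layer2", [hp.values["layer1"] // 2, hp.values["layer1"]])

# Trial 2: layer 1 is 64, but the stored list for layer 2 is unchanged.
hp.new_trial({"layer1": 64})
layer2 = hp.choice("layer2", [32, 64])

print(hp.space["layer2"], layer2)
```

In the second trial the call asks for `[32, 64]`, yet the value still comes from the list stored during the first trial.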
The solution is to generate the hyperparameter names dynamically, as below:
hidden_units = hp.Choice('units_layer_' + str(layer_index), values=[(int(hp.get('layer1_hidden_units') / 2)), hp.get('layer1_hidden_units')])
# where
# hp.get('layer1_hidden_units') = 256 and layer_index = 2
# or hp.get('layer1_hidden_units') = 128 and layer_index = 1
# and so on...
Take the scenario already discussed, where Keras Tuner selected 256 units for the first layer in the first trial. For that trial, the code above lets Keras Tuner register hyperparameters for the remaining layers as hidden_units_layer_2 = [128, 256], hidden_units_layer_1 = [64, 128], hidden_units_layer_0 = [32, 64].
This leads to a second problem: all of these hyperparameters stay active in later trials, even when some of them are not needed. For example, if the second trial selects 64 units for the first layer, hidden_units_layer_2 = [128, 256] and hidden_units_layer_1 = [64, 128] are still activated. We therefore need to disable them by placing them under a conditional scope, as below:
with hp.conditional_scope(parent_units_name, parent_units_value):
    hidden_units = hp.Choice(child_units_name, values=child_units_value)
The final code will look as below
# List possible units
possible_units = [32, 64, 128, 256, 512]
possible_layer_units = []
for index, item in enumerate(possible_units[:-1]):
    possible_layer_units.append([item, possible_units[index + 1]])
# possible_layer_units = [[32, 64], [64, 128], [128, 256], [256, 512]]
# where list index represent layer number
# and list element represent list of unit possibilities for each layer
first_layer_units = hp.Choice('first_layer_units', values=possible_units)
# Then add first layer
all_features = layers.concatenate(encoded_features)
x = layers.Dense(first_layer_units, activation="relu")(all_features)
# Get the number of hidden layers based on first layer unit count
hidden_layer_count = possible_units.index(first_layer_units)
if 0 < hidden_layer_count:
    iter_count = 0
    for hidden_layer_index in range(hidden_layer_count - 1, -1, -1):
        if iter_count == 0:
            # Collect HP 'units' details for the second layer
            # Suppose first_layer_units = 512, then
            # HP example: <units_layer_43=[256, 512] condition={first_layer_units:[256, 512]}>
            # where for units_layer_43, 4 indicates there will be total 5 layers and 3 indicates 4th layer from last
            # we are using total hidden layer count in HP name to avoid an issue while getting the unit count value.
            parent_units_name = 'first_layer_units'
            parent_units_value = possible_layer_units[hidden_layer_index]
            child_units_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index)
            child_units_value = parent_units_value
        else:
            # Collect HP 'units' details for the next layers
            # Suppose units_layer_43 = 256, then
            # HP example: <units_layer_42=[128, 256] condition={units_layer_43:[256, 512]}>
            parent_units_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index + 1)
            parent_units_value = possible_layer_units[hidden_layer_index + 1]
            child_units_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index)
            child_units_value = possible_layer_units[hidden_layer_index]
        # Add and Activate child HP under parent HP using conditional scope
        with hp.conditional_scope(parent_units_name, parent_units_value):
            hidden_units = hp.Choice(child_units_name, values=child_units_value)
        # Add remaining NN layers one by one
        x = layers.Dense(hidden_units, activation="relu")(x)
        iter_count += 1
This way, only the hyperparameters whose associated condition is satisfied get activated. In our case, if the second trial selects 64 units for the first layer, the hyperparameters 'units_layer_2' and 'units_layer_1' are disabled by the conditional scope, and only 'units_layer_0' stays active.
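To sanity-check what a tuner should be exploring, it can help to enumerate the target search space from the question directly: every layer after the first has either the same or half as many units as the layer before it. The sketch below is plain Python and independent of Keras Tuner. The function name `enumerate_architectures` is hypothetical, and applying the floor of 8 units to every layer (rather than only layer 4) is an assumption that makes no difference for first-layer sizes of 32 and above.

```python
def enumerate_architectures(first_units, num_layers, min_units=8):
    """Return every unit sequence where each layer keeps or halves the previous size."""
    architectures = [[first_units]]
    for _ in range(num_layers - 1):
        extended = []
        for arch in architectures:
            previous = arch[-1]
            # Allowed sizes: half the previous layer (not below min_units) or the same.
            options = sorted({max(previous // 2, min_units), previous})
            for units in options:
                extended.append(arch + [units])
        architectures = extended
    return architectures


for arch in enumerate_architectures(first_units=64, num_layers=3):
    print(arch)
```

Comparing the unit counts printed during tuning against this list is a quick way to confirm that no trial falls outside the intended space.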

How to map input image with neurons in first conv layer in CNN?

I just completed an ANN course and have started learning about CNNs. I have a basic understanding of how padding and stride work in a CNN.
However, I have difficulty mapping the input image to the neurons in the first conv layer, although I have a basic understanding of how input features are mapped to the first hidden layer in an ANN.
What is the best way to understand the mapping between the input image and the neurons in the first conv layer?
How can I clarify my doubts about the code example below? The code is taken from a DL course on Coursera.
def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
    W1 : [4, 4, 3, 8]
    W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """
    tf.set_random_seed(1) # so that your "random" numbers match ours
    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable("W1",[4,4,3,8],initializer = tf.contrib.layers.xavier_initializer(seed = 0))
    W2 = tf.get_variable("W2",[2,2,8,16],initializer = tf.contrib.layers.xavier_initializer(seed = 0))
    ### END CODE HERE ###
    parameters = {"W1": W1,
                  "W2": W2}
    return parameters

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters
    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    W2 = parameters['W2']
    ### START CODE HERE ###
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X,W1, strides = [1,1,1,1], padding = 'SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 8x8, stride 8, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize = [1,8,8,1], strides = [1,8,8,1], padding = 'SAME')
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1,W2, strides = [1,1,1,1], padding = 'SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize = [1,4,4,1], strides = [1,4,4,1], padding = 'SAME')
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function (do not call softmax).
    # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None"
    Z3 = tf.contrib.layers.fully_connected(P2, 6,activation_fn=None)
    ### END CODE HERE ###
    return Z3

with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(Z3, {X: np.random.randn(1,64,64,3), Y: np.random.randn(1,6)})
    print("Z3 = " + str(a))
How is this input image of size 64*64*3 processed by 8 filters, each of size 4*4*3?
Here stride = 1, padding = same and batch_size = 1.
What I have understood so far is that each neuron in the first conv layer has 8 filters, each of size 4*4*3. Each neuron in the first conv layer takes a portion of the input image that is the same size as the filter (here 4*4*3), applies the convolution operation, and the layer produces eight 64*64 feature maps.
If my understanding is correct, then:
1> Why do we need striding at all, since the kernel size and the portion of the input image processed by each neuron are the same? If we apply stride = 1 (or 2), the boundary of that portion of the input image is crossed, which is something we don't want, right?
2> How do we know which portion of the input image (the same size as the kernel) is mapped to which neuron in the first conv layer?
If not, then:
3> How is the input image passed to the neurons in the first conv layer? Is the complete input image passed to each neuron (as in a fully connected ANN, where all the input features are mapped to each neuron in the first hidden layer)?
Or only a portion of the input image? How do we know which portion of the input image is mapped to which neuron in the first conv layer?
4> Is the number of kernels specified in the example above (W1 = [4, 4, 3, 8]) per neuron, or is it the total number of kernels in the first conv layer?
5> How do we know how many neurons the example above uses in the first conv layer?
6> Is there any relationship between the number of neurons and the number of kernels in the first conv layer?
I found relevant answers to my questions and am posting them here.
First of all, the concept of a neuron exists in a conv layer as well, but indirectly. Each neuron in a conv layer deals with a portion of the input image that is the same size as the kernel used in that layer.
Each neuron focuses on only one particular portion of the input image (whereas in a fully connected ANN each neuron looks at the whole image), and n filters/kernels are applied to that portion to extract different features from it.
These n filters/kernels are shared by all the neurons in a given conv layer. Because of this weight sharing, a conv layer has fewer parameters to learn, whereas in a fully connected ANN each neuron has its own weight vector and hence there are more parameters to learn.
The number of neurons in a given conv layer L depends on the input size (the output of the previous layer L-1) and on the kernel size, padding and stride used in layer L.
Now let me answer each of the questions above.
1> How do we know which portion of input image (same as kernel size) is mapped which neuron in first conv layer?
From above code example for conv layer 1:
Batch size = 1
Input image size = 64*64*3
Kernel size = 4*4*3 ==> Taken from W1
Number of kernel = 8 ==> Taken from W1
Padding = same
stride = 1
Stride = 1 means that you slide the kernel one pixel at a time. Consider the x axis, with pixels numbered 1, 2, 3, 4, ... up to 64.
With valid padding, the first neuron sees pixels 1, 2, 3 and 4. The kernel is then shifted by one pixel, so the next neuron sees pixels 2, 3, 4 and 5, and the last neuron sees pixels 61, 62, 63 and 64.
With same padding, the first neuron sees pixels 0 (zero padding), 1, 2 and 3, the second neuron sees pixels 1, 2, 3 and 4, and the last neuron sees pixels 63 and 64 plus two zero-padded positions, because a 4-wide kernel needs three padded pixels in total and TensorFlow places the extra one at the end.
With same padding, the output has the same spatial size as the image (64 x 64 x 8). With valid padding, the output is 61 x 61 x 8.
The 8 in the output shape is the number of filters.
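The pixel-to-neuron mapping along one axis can be tabulated with a short helper. This is a sketch in plain Python that follows the SAME/VALID padding rules as I understand them from the TensorFlow documentation; the function name `conv_windows_1d` is hypothetical, and pixel numbers below 1 or above the input size stand for zero-padded positions.

```python
import math


def conv_windows_1d(input_size, kernel_size, stride, padding):
    """Return (output_size, pad_before, pad_after, windows) along one axis.

    Pixels are numbered 1..input_size as in the text; numbers outside that
    range are zero-padded positions.
    """
    if padding == "VALID":
        output_size = (input_size - kernel_size) // stride + 1
        pad_total = 0
    elif padding == "SAME":
        output_size = math.ceil(input_size / stride)
        pad_total = max((output_size - 1) * stride + kernel_size - input_size, 0)
    else:
        raise ValueError("padding must be 'VALID' or 'SAME'")
    pad_before = pad_total // 2  # any odd extra pixel goes at the end
    pad_after = pad_total - pad_before
    windows = []
    for i in range(output_size):
        first = i * stride - pad_before + 1  # 1-based index of the first pixel seen
        windows.append(list(range(first, first + kernel_size)))
    return output_size, pad_before, pad_after, windows


valid = conv_windows_1d(64, 4, 1, "VALID")
same = conv_windows_1d(64, 4, 1, "SAME")
print(valid[0], valid[3][0], valid[3][-1])
print(same[0], same[1:3], same[3][0], same[3][-1])
```

Each entry of `windows` lists the pixels one output position (neuron) looks at, so `windows[k]` answers "which portion of the input goes to neuron k" for that axis.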
2> How is the input image passed to the neurons in the first conv layer? Is the complete input image passed to each neuron (as in a fully connected ANN, where all the input features are mapped to each neuron in the first hidden layer)?
Each neuron looks at only a portion of the input image. Please refer to the answer to the first question, which shows how to map between the input image and the neurons.
3> Is the number of kernels specified in the example above (W1 = [4, 4, 3, 8]) per neuron, or is it the total number of kernels in the first conv layer?
It is the total number of kernels for that layer, and all the neurons in that layer share the same kernels to learn about different portions of the input image. Hence a convnet has fewer parameters to learn than a fully connected ANN.
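The saving from weight sharing can be made concrete with a little arithmetic. The sketch below counts weights only (biases are left out, as in the `W1` example) and compares the first conv layer with a hypothetical fully connected layer that maps the same 64*64*3 input to the same 64*64*8 output; the helper names are made up for this illustration.

```python
def conv_weight_count(filter_height, filter_width, in_channels, num_filters):
    """Weights in a conv layer: one small kernel per filter, shared by all positions."""
    return filter_height * filter_width * in_channels * num_filters


def dense_weight_count(num_inputs, num_outputs):
    """Weights in a fully connected layer: every output has its own weight per input."""
    return num_inputs * num_outputs


conv_weights = conv_weight_count(4, 4, 3, 8)                     # shape of W1
dense_weights = dense_weight_count(64 * 64 * 3, 64 * 64 * 8)     # same input/output sizes

print(conv_weights)
print(dense_weights)
print(dense_weights // conv_weights)
```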
4> How do we know how many neurons the example above uses in the first conv layer?
It depends on the input size (the output of the previous layer L-1) and on the kernel size, padding and stride used in layer L. Please refer to the answer to the first question above for more detail.
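For the network in the question, where every layer uses padding 'SAME', the spatial output size is the input size divided by the stride, rounded up. A short sketch that traces the shapes layer by layer and counts the output values ("neurons") of each; the helper name `same_output_size` and the `layers` list are illustrative, with strides and filter counts read off the question's code.

```python
import math


def same_output_size(size, stride):
    """Spatial output size for padding='SAME' (independent of the kernel size)."""
    return math.ceil(size / stride)


shape = (64, 64, 3)  # height, width, channels of one input image
layers = [
    ("conv1", 1, 8),     # (name, stride, output channels); W1 has 8 filters
    ("pool1", 8, None),  # pooling keeps the channel count
    ("conv2", 1, 16),    # W2 has 16 filters
    ("pool2", 4, None),
]

trace = {}
for name, stride, out_channels in layers:
    height, width, channels = shape
    shape = (
        same_output_size(height, stride),
        same_output_size(width, stride),
        channels if out_channels is None else out_channels,
    )
    trace[name] = shape
    print(name, shape, "values:", shape[0] * shape[1] * shape[2])
```

Under this reading, the first conv layer has 64 * 64 * 8 = 32768 output values, and the flattened input to the final fully connected layer has 2 * 2 * 16 = 64 values.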
5> Is there any relationship between the number of neurons and the number of kernels in the first conv layer?
There is no relationship in terms of numbers, but each neuron uses n filters/kernels (shared among all the neurons in that layer) to learn more about its particular portion of the input image.
The sample code below helps clarify how the convolution operation is implemented internally.
def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function
    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"
    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """
    # Retrieve dimensions from A_prev's shape (≈1 line)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = W.shape
    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    pad = hparameters['pad']
    # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
    n_H = int(np.floor((n_H_prev-f+2*pad)/stride)) + 1
    n_W = int(np.floor((n_W_prev-f+2*pad)/stride)) + 1
    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m,n_H,n_W,n_C))
    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev,pad)
    for i in range(m): # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i] # Select ith training example's padded activation
        for h in range(n_H): # loop over vertical axis of the output volume
            for w in range(n_W): # loop over horizontal axis of the output volume
                for c in range(n_C): # loop over channels (= #filters) of the output volume
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h*stride
                    vert_end = vert_start+f
                    horiz_start = w*stride
                    horiz_end = horiz_start+f
                    # Use the corners to define the (3D) slice of a_prev_pad (See Hint above the cell). (≈1 line)
                    a_slice_prev = a_prev_pad[vert_start:vert_end,horiz_start:horiz_end,:]
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈1 line)
                    Z[i, h, w, c] = conv_single_step(a_slice_prev,W[:,:,:,c],b[:,:,:,c])
    return Z

A_prev = np.random.randn(1,64,64,3)
W = np.random.randn(4,4,3,8)
#Don't worry about bias , tensorflow will take care of this.
b = np.random.randn(1,1,1,8)
hparameters = {"pad" : 1,
               "stride": 1}
Z = conv_forward(A_prev, W, b, hparameters)
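The snippet above calls two helpers, `zero_pad` and `conv_single_step`, that are not shown. A minimal NumPy sketch of what they could look like follows. Their signatures are inferred from how `conv_forward` calls them, so treat the bodies as assumptions rather than the course's own code.

```python
import numpy as np


def zero_pad(X, pad):
    """Pad height and width of a (m, n_H, n_W, n_C) batch with zeros."""
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")


def conv_single_step(a_slice_prev, W, b):
    """Apply one filter to one slice: elementwise product, sum, then add the bias."""
    return float(np.sum(a_slice_prev * W) + np.asarray(b).item())


padded = zero_pad(np.ones((1, 64, 64, 3)), pad=1)
value = conv_single_step(np.ones((4, 4, 3)), np.ones((4, 4, 3)), np.zeros((1, 1, 1)))
n_H = (64 - 4 + 2 * 1) // 1 + 1  # output height for pad=1, stride=1, f=4
print(padded.shape, value, n_H)
```

Note that with `pad = 1` and a 4x4 kernel the output is 63 x 63 x 8, not 64 x 64 x 8: symmetric padding cannot supply the odd total of three padded pixels that padding 'SAME' uses in TensorFlow.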

calculating the number of parameters of a GRU layer (Keras)

Why the number of parameters of the GRU layer is 9600?
Shouldn't it be ((16+32)*32 + 32) * 3 * 2 = 9,408 ?
or, rearranging,
32*(16 + 32 + 1)*3*2 = 9408
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=4500, output_dim=16, input_length=200),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
    tf.keras.layers.Dense(6, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()
The key is that TensorFlow keeps separate biases for the input and recurrent kernels when reset_after=True in GRUCell. You can see this in the GRUCell source code:
if self.use_bias:
    if not self.reset_after:
        bias_shape = (3 * self.units,)
    else:
        # separate biases for input and recurrent kernels
        # Note: the shape is intentionally different from CuDNNGRU biases
        # `(2 * 3 * self.units,)`, so that we can distinguish the classes
        # when loading and converting saved weights.
        bias_shape = (2, 3 * self.units)
Taking the reset gate as an example, we generally see the following formulas.
But if we set reset_after=True, the actual formula is as follows:
In TensorFlow 2, GRU defaults to reset_after=True, whereas in TensorFlow 1.x it defaults to reset_after=False.
So in TensorFlow 2 the number of parameters of this GRU layer is ((16+32)*32 + 32 + 32) * 3 * 2 = 9600.
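The two counts can be reproduced with a small helper. This is a plain-Python sketch of the arithmetic only; `gru_param_count` is a hypothetical name, and the formula assumes three gates, each with an input kernel, a recurrent kernel, and either one or two bias vectors.

```python
def gru_param_count(input_dim, units, reset_after=True, bidirectional=False):
    """Count trainable GRU parameters for the given sizes."""
    kernel = input_dim * units                   # input weights per gate
    recurrent = units * units                    # recurrent weights per gate
    bias = 2 * units if reset_after else units   # two bias vectors when reset_after=True
    per_direction = 3 * (kernel + recurrent + bias)
    return per_direction * (2 if bidirectional else 1)


print(gru_param_count(16, 32, reset_after=True, bidirectional=True))
print(gru_param_count(16, 32, reset_after=False, bidirectional=True))
```

The first call matches the 9600 reported by `model.summary()`, and the second matches the 9408 expected in the question.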
I figured out a little bit more about this, as an addition to the accepted answer. What Keras does in GRUCell.call() is:
With reset_after=False (default in TensorFlow 1):
With reset_after=True (default in TensorFlow 2):
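For reference, the two variants can be written out as below. This is a reconstruction based on the Keras GRU documentation as I understand it, not a copy of the answer's original figures. The bias names b_{xz}, b_{hz}, b_{xr}, b_{hr}, b_{xh}, b_{hh} follow the next paragraph, while W (input kernel), U (recurrent kernel) and the single biases b_z, b_r, b_h are assumed notation.

```latex
% reset_after = False: one bias vector per gate
\begin{align*}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr)
\end{align*}

% reset_after = True: separate input and recurrent biases
\begin{align*}
z_t &= \sigma(W_z x_t + b_{xz} + U_z h_{t-1} + b_{hz}) \\
r_t &= \sigma(W_r x_t + b_{xr} + U_r h_{t-1} + b_{hr}) \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + b_{xh} + r_t \odot (U_h h_{t-1} + b_{hh})\bigr)
\end{align*}
```

In the second variant the reset gate multiplies U_h h_{t-1} + b_{hh} as a whole, so b_{hh} cannot be folded into b_{xh}.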
After training with reset_after=False, b_xz equals b_hz, b_xr equals b_hr and b_xh equals b_hh, because (I assume) TensorFlow realizes that each of these pairs of vectors can be combined into one single parameter vector, just as the OP pointed out in a comment above. However, with reset_after=True, that is not the case for b_xh and b_hh: they can and will be different, so they cannot be combined into one vector, and that is why the total parameter count is higher.

How does a 1D multi-channel convolutional layer (Keras) train?

I am working with time series EEG data recorded from 10 individual locations on the body to classify future behavior in terms of increasing heart activity. I would like to better understand how my labeled data corresponds to the training inputs.
So far, several RNN configurations as well as countless combinations of vanilla dense networks have not gotten me great results and I'd figure a 1D convnet is worth a try.
The things I'm having trouble understanding are:
1.) Feeding data into the model.
orig shape = (30000 timesteps, 10 channels)
array fed to layer = (300 slices, 100 timesteps, 10 channels)
Are the slices separated by 1 time step, giving me 300 slices of timesteps at either end of the original array, or are they separated end to end? If the second is true, how could I create an array of (30000 - 100) slices separated by one ts and is also compatible with the 1D CNN layer?
2) Matching labels with the training and testing data
My understanding is that when you feed in a sequence of train_x_shape = (30000, 10), there are 30000 labels with train_y_shape = (30000, 2) (2 classes) associated with the train_x data.
So, when (300 slices of) 100 timesteps of train_x data with shape = (300, 100, 10) are fed into the model, does the label value correspond to the entire 100 ts (one label per 100 ts, with this label being equal to the last time step's label), or are each 100 rows/vectors in the slice labeled- one for each ts?
Train input:
train_x = train_x.reshape(train_x.shape[0], 1, train_x.shape[1])
n_timesteps = 100
n_channels = 10
layer : model.add(Convolution1D(filters = n_channels * 2, padding = 'same', kernel_size = 3, input_shape = (n_timesteps, n_channels)))
final layer : model.add(Dense(2, activation = 'softmax'))
I use categorical_crossentropy for loss.
Answer 1
This will really depend on "how did you get those slices"?
The answer is totally dependent on what "you're doing". So, what do you want?
If you have simply reshaped (array.reshape(...)) the original array from shape (30000,10) to shape (300,100,10), the model will see:
300 individual (and not connected) sequences
100 timesteps in each sequence
Sequence 1 goes from step 0 to 99;
Sequence 2 goes from step 100 to 199 and so on.
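A quick way to check which timesteps land in each sequence after a plain reshape is to fill the array with the timestep index and look at the first and last entry of every slice. A small NumPy sketch, using the shapes from the question and a synthetic array in place of the real recordings:

```python
import numpy as np

n_steps, n_channels, window = 30000, 10, 100
# Fill every channel with the timestep index so slices are easy to identify.
original = np.repeat(np.arange(n_steps)[:, None], n_channels, axis=1)

sequences = original.reshape(n_steps // window, window, n_channels)

first_steps = sequences[:, 0, 0]   # first timestep contained in each sequence
last_steps = sequences[:, -1, 0]   # last timestep contained in each sequence
print(sequences.shape)
print(first_steps[:3], last_steps[:3])
```

The slices are laid end to end with no overlap: each one starts exactly where the previous one stopped.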
Creating overlapping slices - Sliding window
If you want to create sequences shifted by only one timestep, make a loop for that.
import numpy as np
originalSequence = someArrayWithShape((30000,10)) #placeholder for the real data
windowLength = 100 #timesteps per slice, matching input_shape = (100, 10)
newSlices = [] #empty list
start = 0
end = start + windowLength
while end <= 30000:
    newSlices.append(originalSequence[start:end])
    start+=1
    end+=1
newSlices = np.asarray(newSlices)
Beware: if you do this in the input data, you will have to do a similar thing in your output data as well.
Answer 2
Again, that's totally up to you. What do you want to achieve?
Convolutional layers will keep the timesteps with these options:
If you use padding='same', the final length will be the same as the input
If you don't, the final length will be reduced depending on the kernel size you choose
Recurrent layers will keep the timesteps or not depending on:
Whether you use return_sequences=True - Output has timesteps
Or you use return_sequences=False - Output has no timesteps
If you want only one output for each sequence (not per timestep):
Recurrent models:
Use LSTM(...., return_sequences=True) until the last LSTM
The last LSTM will be LSTM(..., return_sequences=False)
Convolutional models:
At some point after the convolutions, choose one of these to add:
GlobalMaxPooling1D
GlobalAveragePooling1D
Flatten (but treat the number of channels later with a Dense(2))
Reshape((2,))
I think I'd go with GlobalMaxPooling1D if using convolutions, but recurrent models seem better for this. (Not a rule, though).
You can choose to use intermediate MaxPooling1D layers to gradually reduce the length from 100 to 50, then to 25 and so on. This will probably reach a better output.
Remember to keep X and Y paired:
import numpy as np
train_x = someArrayWithShape((30000,10)) #placeholder for the real data
train_y = someArrayWithShape((30000,2)) #placeholder for the real labels
windowLength = 100 #timesteps per slice, matching input_shape = (100, 10)
newXSlices = [] #empty list
newYSlices = [] #empty list
start = 0
end = start + windowLength
while end <= 30000:
    newXSlices.append(train_x[start:end])
    newYSlices.append(train_y[end-1:end])
    start+=1
    end+=1
newXSlices = np.asarray(newXSlices)
newYSlices = np.asarray(newYSlices)
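The same pairing can be built without a Python loop using NumPy's `sliding_window_view` (available in NumPy 1.20 and later). The sketch below uses small made-up sizes and dummy data so that the result is easy to inspect. Unlike the loop above, it keeps one label row per window, giving shape `(n_windows, n_classes)` rather than `(n_windows, 1, n_classes)`, which is an assumption about what the final `Dense(2)` layer should be compared against.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n_steps, n_channels, n_classes, window = 12, 2, 2, 5  # small illustrative sizes
train_x = np.arange(n_steps * n_channels, dtype=np.float32).reshape(n_steps, n_channels)
train_y = np.eye(n_classes)[np.arange(n_steps) % n_classes]  # dummy one-hot labels

# sliding_window_view appends the window axis last: (n_windows, n_channels, window).
x_windows = sliding_window_view(train_x, window_shape=window, axis=0)
# Reorder to (n_windows, window, n_channels), the layout Conv1D expects.
x_windows = x_windows.transpose(0, 2, 1)

# One label per window: the label of the window's last timestep.
y_windows = train_y[window - 1:]

print(x_windows.shape, y_windows.shape)
```

Because the windows are views into the original array, this avoids copying the data until something downstream forces a copy.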

How to calculate input_dim for a keras sequential model?

Keras Dense layer needs an input_dim or input_shape to be specified. What value do I put in there?
My input is a matrix of 1,000,000 rows and only 3 columns. My output is 1,600 classes.
What do I put there?
dimensionality of the inputs (1000000, 1600)
2 because it's a 2D matrix
input_dim is the number of features per sample, which in your case is just 3. The equivalent notation for input_shape, which is an actual shape tuple, is (3,)
In your case
let's assume that x and y (the target variable) look as follows after feature engineering
x.shape
(1000000, 3)
y.shape
(1000000, 1600)
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(x.shape[1],))) # Input layer
# now the model will take as input arrays of shape (*, 3)
# and output arrays of shape (*, 32)
...
...
model.add(Dense(y.shape[1],activation='softmax')) # Output layer
y.shape[1] = 1600 is the number of outputs, which equals the number of classes, since you are dealing with classification.
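The values that go into the first and last layers can be read straight off the array shapes. A small NumPy sketch with random stand-in data: the row count is cut from 1,000,000 to 1,000 to keep the example light, and the integer `labels` array is a hypothetical starting point for building one-hot targets.

```python
import numpy as np

n_rows, n_features, n_classes = 1000, 3, 1600  # n_rows reduced from 1,000,000 for the demo
x = np.random.rand(n_rows, n_features)
labels = np.random.randint(0, n_classes, size=n_rows)  # integer class ids

# One-hot targets: one column per class.
y = np.zeros((n_rows, n_classes), dtype=np.float32)
y[np.arange(n_rows), labels] = 1.0

input_dim = x.shape[1]      # number of features per sample
input_shape = x.shape[1:]   # the same information as a shape tuple
output_units = y.shape[1]   # number of classes

print(input_dim, input_shape, output_units)
```

Neither value depends on the number of rows, which is why the 1,000,000 never appears in the model definition.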
X = dataset.iloc[:, 3:13]
This means X contains all the rows and the columns from index 3 up to index 12 inclusive (index 13 is excluded), which is 10 columns.
We will also have a X0 parameter to be given to the neural network, so total
input layers becomes 10+1 = 11.
Dense(input_dim = 11, activation = 'relu', kernel_initializer = 'he_uniform')