How does a 1D multi-channel convolutional layer (Keras) train? - tensorflow

I am working with time series EEG data recorded from 10 individual locations on the body to classify future behavior in terms of increasing heart activity. I would like to better understand how my labeled data corresponds to the training inputs.
So far, several RNN configurations as well as countless combinations of vanilla dense networks have not gotten me great results and I'd figure a 1D convnet is worth a try.
The things I'm having trouble understanding are:
1.) Feeding data into the model.
orig shape = (30000 timesteps, 10 channels)
array fed to layer = (300 slices, 100 timesteps, 10 channels)
Are the slices separated by 1 time step, giving me 300 slices of timesteps at either end of the original array, or are they separated end to end? If the second is true, how could I create an array of (30000 - 100) slices separated by one ts and is also compatible with the 1D CNN layer?
2) Matching labels with the training and testing data
My understanding is that when you feed in a sequence of train_x_shape = (30000, 10), there are 30000 labels with train_y_shape = (30000, 2) (2 classes) associated with the train_x data.
So, when (300 slices of) 100 timesteps of train_x data with shape = (300, 100, 10) are fed into the model, does the label value correspond to the entire 100 ts (one label per 100 ts, with this label being equal to the last time step's label), or are each 100 rows/vectors in the slice labeled- one for each ts?
Train input:
train_x = train_x.reshape(train_x.shape[0], 1, train_x.shape[1])
n_timesteps = 100
n_channels = 10
layer : model.add(Convolution1D(filters = n_channels * 2, padding = 'same', kernel_size = 3, input_shape = (n_timesteps, n_channels)))
final layer : model.add(Dense(2, activation = 'softmax'))
I use categorical_crossentropy for loss.

Answer 1
This will really depend on "how did you get those slices"?
The answer is totally dependent on what "you're doing". So, what do you want?
If you have simply reshaped (array.reshape(...)) the original array from shape (30000,10) to shape (300,100,10), the model will see:
300 individual (and not connected) sequences
100 timesteps in each sequence
Sequence 1 goes from step 0 to 299;
Sequence 2 goes from step 300 to 599 and so on.
Creating overlapping slices - Sliding window
If you want to create sequences shifted by only one timestep, make a loop for that.
import numpy as np
originalSequence = someArrayWithShape((30000,10))
newSlices = [] #empty list
start = 0
end = start + 300
while end <= 30000:
newSlices.append(originalSequence[start:end])
start+=1
end+=1
newSlices = np.asarray(newSlices)
Beware: if you do this in the input data, you will have to do a similar thing in your output data as well.
Answer2
Again, that's totally up to you. What do you want to achieve?
Convolutional layers will keep the timesteps with these options:
If you use padding='same', the final length will be the same as the input
If you don't, the final length will be reduced depending on the kernel size you choose
Recurrent layers will keep the timesteps or not depending on:
Whether you use return_sequences=True - Output has timesteps
Or you use return_sequences=False - Output has no timesteps
If you want only one output for each sequence (not per timestep):
Recurrent models:
Use LSTM(...., return_sequences=True) until the last LSTM
The last LSTM will be LSTM(..., return_sequences=False)
Convolutional models:
At some point after the convolutions, choose one of these to add:
GlobalMaxPooling1D
GlobalAveragePooling1D
Flatten (but treat the number of channels later with a Dense(2)
Reshape((2,))
I think I'd go with GlobalMaxPooling2D if using convoltions, but recurrent models seem better for this. (Not a rule, though).
You can choose to use intermediate MaxPooling1D layers to gradually reduce the length from 100 to 50, then to 25 and so on. This will probably reach a better output.
Remember to keep X and Y paired:
import numpy as np
train_x = someArrayWithShape((30000,10))
train_y = someArrayWithShape((30000,2))
newXSlices = [] #empty list
newYSlices = [] #empty list
start = 0
end = start + 300
while end <= 30000:
newXSlices.append(train_x[start:end])
newYSlices.append(train_y[end-1:end])
start+=1
end+=1
newXSlices = np.asarray(newXSlices)
newYSlices = np.asarray(newYSlices)

Related

How to train network in tensorflow with different number of samples in input and output?

I am trying to implement the Diet network (https://arxiv.org/pdf/1611.09340.pdf) in TensorFlow. To give you a brief about the network, the Diet network tries to predict the weights of the input layer of the main network by using an auxiliary network, and I am trying to train both networks simultaneously. The input of the auxiliary network does not depend on the number of samples of the main network. Thus, both the input may have different numbers of samples. In my implementation, the number of samples in the auxiliary layer will always be 1.
class BinryMSECustom(keras.losses.Loss):
def __init__(self):
super().__init__()
def __call__(self, y_true, y_pred):
bn = binary_crossentropy(y_true[0], y_pred[0])
ms = mse(y_true[1], y_pred[1])
return bn + ms
input1 = keras.Input(shape=(n_features,), name='classification_input_NxNd')
input2 = keras.Input(shape=(n_features,), name='auxilary_input_NdxN')
#Common
emb = layers.Embedding(input_dim=n_features, output_dim=extracted_features_count,
embeddings_initializer=tf.keras.initializers.Constant(pca_data),
trainable=False, name='feature_Embedding_NdxNf')(input2)
#Aux network as encoder
aux_enc_mlp = layers.Dense(100, activation='tanh', name='aux1_mlp_NdxNh1')(emb)
#Aux network as decoder
aux_dec_mlp = layers.Dense(100, activation='tanh', name='aux2_mlp_NdxNh1')(emb)
#Classifier network
x = layers.Dot(axes=1)([input1, aux_enc_mlp]) #Insert aux encoder
x_branch = layers.Dense(hidden_layer_1, activation='relu', name='classification_mlp_1_Nh1xNh2')(x)
x = layers.Dense(hidden_layer_2, activation='relu', name='classification_mlp_2_Nh2x1')(x_branch)
clf_out = layers.Dense(1, activation='sigmoid', name='Output')(x)
decode_branch = layers.Dot(axes=-1)([aux_dec_mlp, x_branch])
model = keras.Model(inputs=[input1, input2], outputs=[clf_out, decode_branch])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
loss=BinryMSECustom(),
metrics=[tf.keras.metrics.BinaryAccuracy()])
Training data :
np.random.seed(0)
n_samples = 200 #N
n_features = 10000 #Nd
data = np.random.random((n_samples, n_features))
labels = np.random.permutation([0]*50+[1]*50)
pca_data = skde.PCA(n_components=0.95).fit_transform(data.T)
extracted_features_count = pca_data.shape[1] #Nf
hidden_layer_1 = 100 #Nh1
hidden_layer_2 = 50 #Nh2
The training step looks like this:
emb_inp = np.array(list(range(0, n_features))).reshape(1, -1)
model.fit(x=(data, emb_inp), y=(labels, data), epochs=1)
When I call the above command for training, I get the below error:
ValueError: Data cardinality is ambiguous:
x sizes: 200, 1
y sizes: 200, 200
Make sure all arrays contain the same number of samples.
One possible solution could be to repeat the second input 200 times (for this example). But I am curious to know why TensorFlow has such a check and what could be the potential solution without changing the input?
#princethewinner, I could suggest two possible solutions for the problem. First, we'll have a look at the architecture of the your model (made with tf.keras.utils.plot_model),
Assuming that N is the number of samples, equal to 10000 in this case, the shapes of inputs and the outputs are,
# Shape of input tensors
(200, N )
(1, N )
# Shape of output tensors
(100,)
(200, N )
Clearly, the number of samples are not equal ( 200 vs 1 ). The second input, that has a shape of ( 1 , N ), meaning that this tensor would remain a constant input for all 200 samples of the first input, of shape ( 200 , N ). Moreover, the feature_Embedding_NdxNf layer is also set to trainable=False making its outputs constant for any given inputs.
If you observe, you'll realize that the output of the feature_Embedding_NdxNf is constant for all 200 samples because of constant ( 1 , N ) input and trainable=False nature.
As you've mentioned in the question, the appropriate solution to the the problem would be to repeat the constant vector ( 1 , N ) 200 times so that the shape becomes ( 200 , N ) hence resolving the ambiguity around the number of samples. This shouldn't affect the model, as a constant embedding output is provided to aux1_mlp_NdxNh1andaux2_mlp_NdxNh1`.
You can pre-compute the output of feature_Embedding_NdxNf as it remains constant for all 200 samples. By doing so, you will get emb_output = Embedding( ... )( input2 ) of shape ( 1 , N , 187 ). Repeat the tensor along axis 0, for 200 times, to get a tensor of shape ( 200 , N , 187 ) and feed this tensor as input to your model in model.fit( ... ). This solution inherits from the 1st solution, the difference being that we're eliminating layers from the model which would always deal with constant inputs.

How to specify input layer with Keras

I came across this code for tuning the topology of the neural network. However I am unsure of how I can instantiate the first layer without flatening the input.
My input is like this:
With M features (the rows) and N samples (the columns).
How can I create the first (input) layer?
# Initialize sequential API and start building model.
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28,28)))
# Tune the number of hidden layers and units in each.
# Number of hidden layers: 1 - 5
# Number of Units: 32 - 512 with stepsize of 32
for i in range(1, hp.Int("num_layers", 2, 6)):
model.add(
keras.layers.Dense(
units=hp.Int("units_" + str(i), min_value=32, max_value=512, step=32),
activation="relu")
)
# Tune dropout layer with values from 0 - 0.3 with stepsize of 0.1.
model.add(keras.layers.Dropout(hp.Float("dropout_" + str(i), 0, 0.3, step=0.1)))
# Add output layer.
model.add(keras.layers.Dense(units=10, activation="softmax"))
I know that Keras usually instantiates the first hidden layer along with the input layer, but I don't see how I can do it in this framework. Below is the code for instantiating input + first hidden layer at once.
model.add(Dense(100, input_shape=(CpG_num,), kernel_initializer='normal', activation='relu')
If you have multiple inputs and want to set your input shape, let's suppose you have a dataframe with m-> rows, n-> columns... then simply do this...
m = no_of_rows #1000
n = no_of_columns #10
no_of_layers = 64
#we will not write m because m will be taken as a batch here.
_input = tf.keras.layers.Input(shape=(n))
dense = tf.keras.layers.Dense(no_of_layers)(_input)
output = tf.keras.backend.function(_input , dense)
#Now, I can see that it is working or not...!
x = np.random.randn(1000 , 10)
print(output(x).shape)

Two input layers for LSTM Neural Network?

I am now building a neural network, and I am facing the task of adding another input layer (since now I just needed one).
In particular, this was the code previously:
###...
if(self.net_embedding==0):
l_input = Input(shape=self.win_size, dtype='int32', name='input_act')
emb_input = Embedding(output_dim=params["output_dim_embedding"], input_dim=unique_events + 1, input_length=self.win_size)(l_input)
toBePassed=emb_input
elif(self.net_embedding==1):
self.getWord2VecEmbeddings(params['word2vec_size'])
X_train=self.encodePrefixes(params['word2vec_size'],X_train)
l_input = Input(shape = (self.win_size, params['word2vec_size']), name = 'input_act')
toBePassed=l_input
l1 = LSTM(params["shared_lstm_size"],return_sequences=True, kernel_initializer='glorot_uniform',dropout=params['dropout'])(toBePassed)
l1 = BatchNormalization()(l1)
#and so on with the rest of the layers...
The input of the model (X_train) was just an array of arrays (with size = self.win_size) of integers (e.g. [[0 1 2 3] [1 2 3 4]...] if self.win_size = 4), where the integers represent categorical elements.
As you can see, I also have two types of embeddings for this input:
Embedding layer
Word2Vec encoding
Now, I need to add another input to the net, which is as well an array of arrays (with size = self.win_size again) of integers (eg. [[0 123 334 2212][123 334 2212 4888]...], but this time I don't need to apply any embedding (I think) because the elements here are not categorical (they represent elapsed time in seconds).
I tried by simply changing the net to:
#...
if(self.net_embedding==0):
l_input = Input(shape=self.win_size, dtype='int32', name='input_act')
emb_input = Embedding(output_dim=params["output_dim_embedding"], input_dim=unique_events + 1, input_length=self.win_size)(l_input)
toBePassed=emb_input
elif(self.net_embedding==1):
self.getWord2VecEmbeddings(params['word2vec_size'])
X_train=self.encodePrefixes(params['word2vec_size'],X_train)
l_input = Input(shape = (self.win_size, params['word2vec_size']), name = 'input_act')
toBePassed=l_input
elapsed_time_input = Input(shape=self.win_size, name='input_time')
input_concat = Concatenate(axis=1)([toBePassed, elapsed_time_input])
l1 = LSTM(params["shared_lstm_size"],return_sequences=True, kernel_initializer='glorot_uniform',dropout=params['dropout'])(input_concat)
l1 = BatchNormalization()(l1)
#and so on with other layers...
but I get the error:
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 4, 12), (None, 4)]
Do you please have any solution for this? Any kind of help would be really appreciated, since I have a deadline in a few days and I'm smashing my head on this for so long now! Thanks :)
There are two problems with your approach.
First, inputs to LSTM should have a shape of (batch_size, num_steps, num_feats), yet your elapsed_time_input has shape (None, 4). You need to expand its dimension to get the proper shape (None, 4, 1).
elapsed_time_input = tf.keras.layers.Reshape((-1, 1))(elapsed_time_input)
or
elapsed_time_input = tf.expand_dims(elapsed_time_input, axis=-1)
With this, "elapsed time in seconds" will be seen as just another feature of a timestep.
Secondly, you'll want to concatenate the two inputs in the feature dimension (not the timestep dimension).
input_concat = Concatenate(axis=-1)([toBePassed, elapsed_time_input])
or
input_concat = Concatenate(axis=2)([toBePassed, elapsed_time_input])
After this, you'll get a keras tensor with a shape of (None, 4, 13). It represents a batch of time series, each having 4 timesteps and 13 features per step (12 original features + elapsed time in second for each step).

How to correctly ignore padded or missing timesteps at decoding time in multi-feature sequences with LSTM autonecoder

I am trying to learn a latent representation for text sequence (multiple features (3)) by doing reconstruction USING AUTOENCODER. As some of the sequences are shorter than the maximum pad length or a number of time steps I am considering (seq_length=15), I am not sure if reconstruction will learn to ignore the timesteps or not for calculating loss or accuracies.
I followed suggestions from this answer to crop the outputs but my losses are nan and several of accuracies as well.
input1 = keras.Input(shape=(seq_length,),name='input_1')
input2 = keras.Input(shape=(seq_length,),name='input_2')
input3 = keras.Input(shape=(seq_length,),name='input_3')
input1_emb = layers.Embedding(70,32,input_length=seq_length,mask_zero=True)(input1)
input2_emb = layers.Embedding(462,192,input_length=seq_length,mask_zero=True)(input2)
input3_emb = layers.Embedding(84,36,input_length=seq_length,mask_zero=True)(input3)
merged = layers.Concatenate()([input1_emb, input2_emb,input3_emb])
activ_func = 'tanh'
encoded = layers.LSTM(120,activation=activ_func,input_shape=(seq_length,),return_sequences=True)(merged) #
encoded = layers.LSTM(60,activation=activ_func,return_sequences=True)(encoded)
encoded = layers.LSTM(15,activation=activ_func)(encoded)
# Decoder reconstruct inputs
decoded1 = layers.RepeatVector(seq_length)(encoded)
decoded1 = layers.LSTM(60, activation= activ_func , return_sequences=True)(decoded1)
decoded1 = layers.LSTM(120, activation= activ_func , return_sequences=True,name='decoder1_last')(decoded1)
Decoder one has an output shape of (None, 15, 120).
input_copy_1 = layers.TimeDistributed(layers.Dense(70, activation='softmax'))(decoded1)
input_copy_2 = layers.TimeDistributed(layers.Dense(462, activation='softmax'))(decoded1)
input_copy_3 = layers.TimeDistributed(layers.Dense(84, activation='softmax'))(decoded1)
For each output, I am trying to crop the O padded timesteps as suggested by this answer. padding has 0 where actual input was missing (had zero due to padding) and 1 otherwise
#tf.function
def cropOutputs(x):
#x[0] is softmax of respective feature (time distributed) on top of decoder
#x[1] is the actual input feature
padding = tf.cast( tf.not_equal(x[1][1],0), dtype=tf.keras.backend.floatx())
print(padding)
return x[0]*tf.tile(tf.expand_dims(padding, axis=-1),tf.constant([1,x[0].shape[2]], tf.int32))
Applying crop function to all three outputs.
input_copy_1 = layers.Lambda(cropOutputs, name='input_copy_1', output_shape=(None, 15, 70))([input_copy_1,input1])
input_copy_2 = layers.Lambda(cropOutputs, name='input_copy_2', output_shape=(None, 15, 462))([input_copy_2,input2])
input_copy_3 = layers.Lambda(cropOutputs, name='input_copy_3', output_shape=(None, 15, 84))([input_copy_3,input3])
My logic is to crop timesteps of each feature (all 3 features for sequence have the same length, meaning they miss timesteps together). But for timestep, they have been applied softmax as per their feature size (70,462,84) so I have to zero out timestep by making a multi-dimensional mask array of zeros or ones equal to this feature size with help of mask padding, and multiply by respective softmax representation using this using multi-dimensional mask array.
I am not sure I am doing this right or not as I have Nan losses for these inputs as well as other accuracies have that I am learning jointly with this task (it happens only with this cropping thing).
If it helps someone, I end up cropping the padded entries from the loss directly (taking some keras code pointer from these answers).
#tf.function
def masked_cc_loss(y_true, y_pred):
mask = tf.keras.backend.all(tf.equal(y_true, masked_val_hotencoded), axis=-1)
mask = 1 - tf.cast(mask, tf.keras.backend.floatx())
loss = tf.keras.losses.CategoricalCrossentropy()(y_true, y_pred) * mask
return tf.keras.backend.sum(loss) / tf.keras.backend.sum(mask) # averaging by the number of unmasked entries

Variable batch size in tensorflow and CNN

I want to feed in a 1-D CNN a sequence of fixed length and want it to make a prediction (regression), but I want to have a variable batch size during training. The tutorials are not really helpful.
In my input layer I have something like this:
input = tf.placeholder(tf.float32, [None, sequence_length], name="input")
y = tf.placeholder(tf.float32, [None, 1], name="y")
so I assume the None dimension, can be the a variable batch size of any number, so the current input dimension is batch_size * sequence_length and I am supposed to feed the network a 2d np array with dimensions any * sequence_length
tf.nn.conv1d expects 3-D, since my input is a single channel that is 1 np array of sequence_length observations the input I will need to feed to the cnn should be 1*batch_size * sequence_length, if I had on the other hand 2 different sequences that I combine to predict a single value in the end it would have been 2*batch_size * sequence_length and I would also need to concatenate the 2 different channels. So in my case I need
input = tf.expand_dims(input, -1)
and then the filter also follow the same:
filter_size = 5
channel_size = 1
num_filters = 10
filter_shape = [filter_size, channel_size, num_filters]
filters = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="filters")
tf.nn.conv1d(value=input, filters=filters, stride=1)
After that I add a FC layer, but the network isn't able to learn anything, even the a basic function such as sin(x), does the code above look correct?
Also how can I do a maxpooling?