Convolutional layer without summation over different channels - Keras - tensorflow

Assuming I have a 5x5x3 image and a different filter for each channel, for example 3x3x3.
In Conv2D, each of the kernels in the filter is first applied to one of the three channels in the input layer separately (which gives 3x3x3, without padding and with stride 1), and then these three channels are summed together (element-wise addition), which gives 3x3x1.
Instead of summing over the channels (3x3x1), I want to concatenate the three channels (3x3x3).
Thanks for your help.

What you are referring to is depthwise convolution, where the outputs' channels are concatenated rather than summed.
import numpy as np
import tensorflow as tf
x = np.random.rand(1,5,5,3)
l = tf.keras.layers.DepthwiseConv2D(3, depth_multiplier=1)
print(l(x).shape)
(1, 3, 3, 3)
You can use depth_multiplier to control the number of depthwise kernels applied to each channel.
l2 = tf.keras.layers.DepthwiseConv2D(3, depth_multiplier=2)
print(l2(x).shape)
(1, 3, 3, 6)
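To make the difference between summing and concatenating concrete, here is a minimal NumPy sketch. It is not how Keras implements either layer; the helper correlate2d_valid and all variable names are made up for illustration, and it assumes no padding and stride 1 as in the question.

```python
import numpy as np

def correlate2d_valid(channel, kernel):
    """Slide one 2D kernel over one 2D channel (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = channel.shape[0] - kh + 1
    out_w = channel.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(channel[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))    # height x width x channels
kernels = rng.random((3, 3, 3))  # one 3x3 kernel per input channel

# One 3x3 result per channel
per_channel = [correlate2d_valid(image[:, :, c], kernels[:, :, c]) for c in range(3)]

depthwise_out = np.stack(per_channel, axis=-1)  # channels kept side by side
conv2d_out = np.sum(per_channel, axis=0)        # channels summed element-wise

print(depthwise_out.shape)  # (3, 3, 3)
print(conv2d_out.shape)     # (3, 3)
```

Summing depthwise_out over its last axis gives back conv2d_out, which is the only difference between the two operations in this single-filter case.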


Two input layers for LSTM Neural Network?

I am building a neural network, and I am facing the task of adding another input layer (until now I needed just one).
In particular, this was the code previously:
l_input = Input(shape=self.win_size, dtype='int32', name='input_act')
emb_input = Embedding(output_dim=params["output_dim_embedding"], input_dim=unique_events + 1, input_length=self.win_size)(l_input)
l_input = Input(shape = (self.win_size, params['word2vec_size']), name = 'input_act')
l1 = LSTM(params["shared_lstm_size"],return_sequences=True, kernel_initializer='glorot_uniform',dropout=params['dropout'])(toBePassed)
l1 = BatchNormalization()(l1)
#and so on with the rest of the layers...
The input of the model (X_train) was just an array of arrays (with size = self.win_size) of integers (e.g. [[0 1 2 3] [1 2 3 4]...] if self.win_size = 4), where the integers represent categorical elements.
As you can see, I also have two types of embeddings for this input:
Embedding layer
Word2Vec encoding
Now, I need to add another input to the net, which is also an array of arrays (again with size = self.win_size) of integers (e.g. [[0 123 334 2212] [123 334 2212 4888]...]), but this time I don't need to apply any embedding (I think) because the elements here are not categorical (they represent elapsed time in seconds).
I tried by simply changing the net to:
l_input = Input(shape=self.win_size, dtype='int32', name='input_act')
emb_input = Embedding(output_dim=params["output_dim_embedding"], input_dim=unique_events + 1, input_length=self.win_size)(l_input)
l_input = Input(shape = (self.win_size, params['word2vec_size']), name = 'input_act')
elapsed_time_input = Input(shape=self.win_size, name='input_time')
input_concat = Concatenate(axis=1)([toBePassed, elapsed_time_input])
l1 = LSTM(params["shared_lstm_size"],return_sequences=True, kernel_initializer='glorot_uniform',dropout=params['dropout'])(input_concat)
l1 = BatchNormalization()(l1)
#and so on with other layers...
but I get the error:
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 4, 12), (None, 4)]
Do you please have any solution for this? Any kind of help would be really appreciated, since I have a deadline in a few days and I'm smashing my head on this for so long now! Thanks :)
There are two problems with your approach.
First, inputs to LSTM should have a shape of (batch_size, num_steps, num_feats), yet your elapsed_time_input has shape (None, 4). You need to expand its dimension to get the proper shape (None, 4, 1).
elapsed_time_input = tf.keras.layers.Reshape((-1, 1))(elapsed_time_input)
# or, alternatively (use only one of the two):
elapsed_time_input = tf.expand_dims(elapsed_time_input, axis=-1)
With this, "elapsed time in seconds" will be seen as just another feature of a timestep.
Secondly, you'll want to concatenate the two inputs in the feature dimension (not the timestep dimension).
input_concat = Concatenate(axis=-1)([toBePassed, elapsed_time_input])
# or, equivalently for 3-D inputs:
input_concat = Concatenate(axis=2)([toBePassed, elapsed_time_input])
After this, you'll get a Keras tensor with a shape of (None, 4, 13). It represents a batch of time series, each having 4 timesteps and 13 features per step (12 original features + elapsed time in seconds for each step).
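The shape bookkeeping can be checked without building the model. This is a minimal NumPy sketch, assuming a batch of 2 windows, 4 timesteps and 12 features per step; the zero array only stands in for toBePassed, whose definition is not shown in the question.

```python
import numpy as np

batch, win_size, n_feats = 2, 4, 12

# Stand-in for the embedded sequence (toBePassed in the question)
embedded = np.zeros((batch, win_size, n_feats))

# Elapsed times, one value per timestep: shape (batch, win_size)
elapsed = np.array([[0, 123, 334, 2212],
                    [123, 334, 2212, 4888]], dtype=float)

# Step 1: add a trailing feature axis -> (batch, win_size, 1)
elapsed_3d = np.expand_dims(elapsed, axis=-1)

# Step 2: concatenate along the feature axis -> (batch, win_size, n_feats + 1)
combined = np.concatenate([embedded, elapsed_3d], axis=-1)

print(elapsed_3d.shape)  # (2, 4, 1)
print(combined.shape)    # (2, 4, 13)
```

The elapsed time ends up as the last feature of every timestep, which is what the LSTM then sees.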

In pytorch, how can I sum some elements, and get a tensor of smaller shape?

Specifically, I have a tensor of dimension 298x160x160 (faces in 298 frames), and I need to sum every 4x4 block of elements in the last two dimensions so that I get a 298x40x40 tensor.
How can I achieve that?
You could create a convolutional layer with a single 4x4 kernel (one input channel, one output channel), set its weights to 1, and use a stride of 4 (see also the Conv2d documentation):
import torch
a = torch.ones((298,160,160))
# add a dimension for the channels. Conv2d expects the input to be: (N,C,H,W)
# where N=number of samples, C=number of channels, H=height, W=width
a = a.unsqueeze(1)
print(a.shape)
Out: torch.Size([298, 1, 160, 160])
with torch.no_grad(): # I assume you don't need to backprop, otherwise remove this context manager
    m = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=4, stride=4, bias=False)
    # set the kernel values to 1
    m.weight.data = m.weight.data * 0. + 1.
    # apply the kernel and squeeze the channel dim out again
    res = m(a).squeeze()
print(res.shape)
Out: torch.Size([298, 40, 40])
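A different route to the same result, not the convolution approach above, is to split each 160-long axis into 40 blocks of 4 and sum over the two block axes. The sketch below uses NumPy so it runs without PyTorch; the variable names are made up for illustration, and the same reshape-then-sum idea should carry over to torch tensors.

```python
import numpy as np

frames = np.ones((298, 160, 160))

# Split each spatial axis into (40 blocks, 4 elements per block)
blocks = frames.reshape(298, 40, 4, 40, 4)

# Sum over the two "within block" axes to collapse every 4x4 patch
pooled = blocks.sum(axis=(2, 4))

print(pooled.shape)     # (298, 40, 40)
print(pooled[0, 0, 0])  # 16.0, the sum of sixteen ones
```

This only works when both spatial sizes are exact multiples of the block size, which holds for 160 and 4.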

Correlation 2D vector fields

Having multiple 2D flow maps, i.e. vector fields, how would one find the statistical correlation between pairs of these?
The problem:
One should not (?) reshape two flow maps of shape (x,y,2), flow1 and flow2, to 1D vectors and compute the correlation of those, since the x,y entries are connected.
[Figure: plot of the flow maps, for visualization purposes only]
I am thinking about comparing magnitudes and directions.
How would one ideally compare those (cosine distance, ...)?
How would one compare covariance between vector fields?
I am aware that np.corrcoef(flow1.reshape(2,-1), flow2.reshape(2,-1)) would return a 4,4 correlation coefficient matrix but find it unintuitive to interpret.
For some measures of similarity it may indeed be desirable to take the spatial structure of the domain into account. But a coefficient of correlation does not do that: it is invariant under any permutations of the domain. For example, the correlation between (0, 1, 2, 3, 4) and (1, 2, 4, 8, 16) is the same as between (1, 4, 2, 0, 3) and (2, 16, 4, 1, 8) where both arrays were reshuffled in the same way.
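The permutation invariance is easy to verify numerically. A minimal sketch using the two arrays from the example; perm is the reshuffling that turns (0, 1, 2, 3, 4) into (1, 4, 2, 0, 3):

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([1, 2, 4, 8, 16])

# Apply the same reshuffling to both arrays
perm = [1, 4, 2, 0, 3]
a_shuffled = a[perm]  # [1, 4, 2, 0, 3]
b_shuffled = b[perm]  # [2, 16, 4, 1, 8]

r_original = np.corrcoef(a, b)[0, 1]
r_shuffled = np.corrcoef(a_shuffled, b_shuffled)[0, 1]

print(np.isclose(r_original, r_shuffled))  # True
```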
So, the coefficient of correlation would be obtained by:
Centering both arrays, i.e., subtracting their mean. Say, we get FC1 and FC2.
Taking the inner product of FC1 and FC2: this is just the sum of the products of matching entries.
Dividing by the square roots of the inner products FC1*FC1 and FC2*FC2.
flow1 = np.random.uniform(size=(10, 10, 2)) # the 3rd dimension is for the components
flow2 = flow1 + np.random.uniform(size=(10, 10, 2))
flow1_centered = flow1 - np.mean(flow1, axis=(0, 1))
flow2_centered = flow2 - np.mean(flow2, axis=(0, 1))
inner_product = np.sum(flow1_centered*flow2_centered)
r = inner_product/np.sqrt(np.sum(flow1_centered**2) * np.sum(flow2_centered**2))
Here the flows have some positive correlation because I included flow1 in flow2. Specifically, it's a number around 1/sqrt(2), subject to random noise.
If this is not what you want, then you don't want the coefficient of correlation but some other measure of similarity.

How to average input samples by group in Keras?

I want to implement a neural network in Keras with this architecture: say I have some inputs and they belong to some groups. Then the neural network looks like this:
input -> some layers -> separate inputs by groups -> average inputs by groups -> output
In brief, I want to separate the inputs by group and then take the average of the inputs in each group.
For example, if I have an input tensor [1, 2, 3, 4, 5, 6] whose elements belong to two groups [0, 1, 1, 0, 0, 1], then I want the output tensor to be [3.333, 3.666, 3.666, 3.333, 3.333, 3.666]. Here 3.333 is the average of group 0 [1, 4, 5] and 3.666 is the average of group 1 [2, 3, 6].
I am not sure whether you can separate the inputs as you described directly in Keras or TensorFlow. Here is what I could come up with:
Create a mask for each class, with 1 where the element at that index is in the class and 0 where it belongs to another class. So in your example, you would use [0,1,1,0,0,1] for one class and [1,0,0,1,1,0] for the other. (If you have more classes, you will correspondingly have more masks.)
Stack those vectors to get a 3-D tensor and do 1D convolution with 0 stride. Use tf.nn.conv1d(). Think of those masks as the filters of a convolution operation that separates the classes. Be sure to reshape your tensors to match the operation's requirements.
After the convolution, you will have a 3-D tensor where each vector contains one class's elements. For your example you should get a tensor with two vectors, [0,2,3,0,0,6] and [1,0,0,4,5,0]. To get the average of each class, divide the tf.reduce_sum() of each vector by the sum of its mask (a plain tf.reduce_mean() over the vector would also count the zeros left by the mask).
Multiply the tensor of means, [[3.333], [3.666]], by the masks using tf.multiply() and add the vectors using tf.reduce_sum() on the correct axis. This should result in the vector you want.
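The masking idea can be sketched in NumPy with plain element-wise products instead of a convolution. This is only an illustration of the arithmetic under that simplification, not a Keras layer; the variable names are made up, and the group sizes are taken from the mask sums so that masked-out zeros do not distort the averages.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6], dtype=float)
groups = np.array([0, 1, 1, 0, 0, 1])
n_groups = 2

# One mask per group, shape (n_groups, n_values): row g is 1 where groups == g
masks = (groups[None, :] == np.arange(n_groups)[:, None]).astype(float)

# Per-group sum divided by per-group count gives the group means
group_sums = (masks * values).sum(axis=1)
group_counts = masks.sum(axis=1)
group_means = group_sums / group_counts

# Broadcast each mean back to the positions of its group
result = (masks * group_means[:, None]).sum(axis=0)

print(np.round(result, 3))
```

Each position of result holds the mean of the group that position belongs to, 10/3 for group 0 and 11/3 for group 1.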
I have figured out a method. It can be achieved by matrix manipulation. First turn the cluster vector into a categorical (one-hot) matrix. For example, if the batch size is 6, the categorical matrix (cluster) looks like:
1, 0
1, 0
0, 1
0, 1
1, 0
0, 1
then we generate a cluster_mean matrix:
1/3, 0
1/3, 0
0, 1/3
0, 1/3
1/3, 0
0, 1/3
If we have an input matrix of shape b*n (b is the batch size and n is the number of features), then we can get the average by cluster using
cluster * t(cluster_mean) * input
The transpose, the average and the dot product can be achieved using TensorFlow functions.
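A minimal NumPy sketch of this matrix formulation, using the groups and values from the question. It assumes the input has one row per sample, i.e. shape (batch, features); the variable names mirror the description but are otherwise made up.

```python
import numpy as np

groups = np.array([0, 1, 1, 0, 0, 1])
inputs = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # (batch, features)

# One-hot categorical matrix, shape (batch, n_groups)
cluster = np.eye(2)[groups]

# Same matrix with every column divided by the size of its group
cluster_mean = cluster / cluster.sum(axis=0)

# (batch, batch) averaging matrix applied to the inputs
averaged = cluster @ cluster_mean.T @ inputs

print(np.round(averaged.ravel(), 3))
```

Entry (i, j) of cluster @ cluster_mean.T is 1/group size when samples i and j share a group and 0 otherwise, so each output row is the mean of its group.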

How does a 1D multi-channel convolutional layer (Keras) train?

I am working with time series EEG data recorded from 10 individual locations on the body to classify future behavior in terms of increasing heart activity. I would like to better understand how my labeled data corresponds to the training inputs.
So far, several RNN configurations as well as countless combinations of vanilla dense networks have not gotten me great results, so I figured a 1D convnet is worth a try.
The things I'm having trouble understanding are:
1.) Feeding data into the model.
orig shape = (30000 timesteps, 10 channels)
array fed to layer = (300 slices, 100 timesteps, 10 channels)
Are the slices separated by 1 time step, giving me 300 slices of timesteps at either end of the original array, or are they separated end to end? If the second is true, how could I create an array of (30000 - 100) slices separated by one ts and is also compatible with the 1D CNN layer?
2) Matching labels with the training and testing data
My understanding is that when you feed in a sequence of train_x_shape = (30000, 10), there are 30000 labels with train_y_shape = (30000, 2) (2 classes) associated with the train_x data.
So, when (300 slices of) 100 timesteps of train_x data with shape = (300, 100, 10) are fed into the model, does the label value correspond to the entire 100 ts (one label per 100 ts, with this label being equal to the last time step's label), or is each of the 100 rows/vectors in the slice labeled, one label per ts?
Train input:
train_x = train_x.reshape(train_x.shape[0], 1, train_x.shape[1])
n_timesteps = 100
n_channels = 10
layer : model.add(Convolution1D(filters = n_channels * 2, padding = 'same', kernel_size = 3, input_shape = (n_timesteps, n_channels)))
final layer : model.add(Dense(2, activation = 'softmax'))
I use categorical_crossentropy for loss.
Answer 1
This will really depend on "how did you get those slices"?
The answer is totally dependent on what "you're doing". So, what do you want?
If you have simply reshaped (array.reshape(...)) the original array from shape (30000,10) to shape (300,100,10), the model will see:
300 individual (and not connected) sequences
100 timesteps in each sequence
Sequence 1 goes from step 0 to 99;
Sequence 2 goes from step 100 to 199 and so on.
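A small NumPy check of what a plain reshape does, on a toy series of 12 timesteps and 2 channels cut into 3 slices of 4 timesteps (sizes chosen only to keep the output readable):

```python
import numpy as np

# Toy stand-in for the (30000, 10) array: 12 timesteps, 2 channels
series = np.arange(12 * 2).reshape(12, 2)

# Same operation as reshaping (30000, 10) to (300, 100, 10)
slices = series.reshape(3, 4, 2)

# Each slice is a block of consecutive timesteps, with no overlap
print(np.array_equal(slices[0], series[0:4]))  # True
print(np.array_equal(slices[1], series[4:8]))  # True
```

So the slices are laid end to end, not shifted by one timestep.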
Creating overlapping slices - Sliding window
If you want to create sequences shifted by only one timestep, make a loop for that.
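import numpy as np
originalSequence = np.random.rand(30000, 10)  # stand-in for your (30000, 10) array
window = 100  # timesteps per slice, as in the question
newSlices = [] #empty list
start = 0
end = start + window
while end <= 30000:
    newSlices.append(originalSequence[start:end])
    start += 1
    end += 1
newSlices = np.asarray(newSlices)  # shape: (29901, 100, 10)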
Beware: if you do this in the input data, you will have to do a similar thing in your output data as well.
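As an alternative to the explicit loop, NumPy 1.20 and newer provide sliding_window_view, which builds the overlapping windows as a view. This is a sketch on toy sizes (12 timesteps, 3 channels, windows of 4), not the approach from the answer; note that the window axis comes out last and has to be moved to the middle for a Conv1D-style (slices, timesteps, channels) layout.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(12 * 3, dtype=float).reshape(12, 3)  # (timesteps, channels)
window = 4

# Result has shape (n_windows, channels, window): the window axis is appended last
windows = sliding_window_view(series, window_shape=window, axis=0)

# Reorder to (n_windows, window, channels)
windows = windows.transpose(0, 2, 1)

print(windows.shape)  # (9, 4, 3)
```

The result is a read-only view on the original data, so copy it (for example with np.ascontiguousarray) if it needs to be modified.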
Answer 2
Again, that's totally up to you. What do you want to achieve?
Convolutional layers will keep the timesteps with these options:
If you use padding='same', the final length will be the same as the input
If you don't, the final length will be reduced depending on the kernel size you choose
Recurrent layers will keep the timesteps or not depending on:
Whether you use return_sequences=True - Output has timesteps
Or you use return_sequences=False - Output has no timesteps
If you want only one output for each sequence (not per timestep):
Recurrent models:
Use LSTM(...., return_sequences=True) until the last LSTM
The last LSTM will be LSTM(..., return_sequences=False)
Convolutional models:
At some point after the convolutions, choose one of these to add:
Flatten (but treat the number of channels later with a Dense(2))
I think I'd go with GlobalMaxPooling1D if using convolutions, but recurrent models seem better for this. (Not a rule, though.)
You can choose to use intermediate MaxPooling1D layers to gradually reduce the length from 100 to 50, then to 25 and so on. This will probably reach a better output.
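The length bookkeeping above can be written down as two small helper functions. They are hypothetical helpers for illustration only, assuming the usual output-length rules for 'same' and 'valid' padding and a pooling stride equal to the pool size with no padding.

```python
def conv1d_output_length(length, kernel_size, padding="valid", stride=1):
    """Number of timesteps left after a 1D convolution."""
    if padding == "same":
        return -(-length // stride)  # ceil(length / stride)
    return (length - kernel_size) // stride + 1

def pool1d_output_length(length, pool_size=2):
    """Number of timesteps left after pooling with stride == pool_size."""
    return length // pool_size

print(conv1d_output_length(100, kernel_size=3, padding="same"))   # 100
print(conv1d_output_length(100, kernel_size=3, padding="valid"))  # 98

length = 100
for _ in range(2):
    length = pool1d_output_length(length)
    print(length)  # 50, then 25
```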
Remember to keep X and Y paired:
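import numpy as np
train_x = np.random.rand(30000, 10)  # stand-in for your inputs
train_y = np.random.rand(30000, 2)   # stand-in for your labels
window = 100  # timesteps per slice, as in the question
newXSlices = [] #empty list
newYSlices = [] #empty list
start = 0
end = start + window
while end <= 30000:
    newXSlices.append(train_x[start:end])
    # assumption: one label per slice, taken from the slice's last timestep
    newYSlices.append(train_y[end - 1])
    start += 1
    end += 1
newXSlices = np.asarray(newXSlices)
newYSlices = np.asarray(newYSlices)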