Given multiple 2D flow maps, i.e. vector fields, how would one find the statistical correlation between pairs of them?
The problem:
One should not (?) flatten two flow maps flow1 and flow2 of shape (x, y, 2) into 1D vectors and run
np.corrcoef(flow1.reshape(1,-1), flow2.reshape(1,-1))
since the x and y entries are connected.
Plotting yields, for visualization purposes only: [plots of flow1 and flow2]
I am thinking about comparing magnitudes and direction.
How would one ideally compare those (cosine distance, ...)?
How would one compare covariance between vector fields?
Edit:
I am aware that np.corrcoef(flow1.reshape(2,-1), flow2.reshape(2,-1)) would return a 4x4 correlation coefficient matrix, but I find it unintuitive to interpret.
For some measures of similarity it may indeed be desirable to take the spatial structure of the domain into account. But a coefficient of correlation does not do that: it is invariant under any permutation of the domain. For example, the correlation between (0, 1, 2, 3, 4) and (1, 2, 4, 8, 16) is the same as between (1, 4, 2, 0, 3) and (2, 16, 4, 1, 8), where both arrays were reshuffled in the same way.
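A quick check of the permutation claim, as a minimal NumPy sketch (the variable names are only illustrative):

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([1, 2, 4, 8, 16])

# Apply the same reshuffling to both arrays.
perm = np.array([1, 4, 2, 0, 3])
a_shuffled = a[perm]  # [1, 4, 2, 0, 3]
b_shuffled = b[perm]  # [2, 16, 4, 1, 8]

r_original = np.corrcoef(a, b)[0, 1]
r_shuffled = np.corrcoef(a_shuffled, b_shuffled)[0, 1]

# The two coefficients agree up to floating-point error.
print(np.isclose(r_original, r_shuffled))  # True
```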
So, the coefficient of correlation would be obtained by:
Centering both arrays, i.e., subtracting their mean. Say, we get FC1 and FC2.
Taking the inner product of FC1 and FC2: this is just the sum of the products of matching entries.
Dividing by the square roots of the inner products FC1*FC1 and FC2*FC2.
Example:
import numpy as np

flow1 = np.random.uniform(size=(10, 10, 2)) # the 3rd dimension is for the components
flow2 = flow1 + np.random.uniform(size=(10, 10, 2))
# center each component by subtracting its mean over the spatial axes
flow1_centered = flow1 - np.mean(flow1, axis=(0, 1))
flow2_centered = flow2 - np.mean(flow2, axis=(0, 1))
inner_product = np.sum(flow1_centered*flow2_centered)
r = inner_product/np.sqrt(np.sum(flow1_centered**2) * np.sum(flow2_centered**2))
Here the flows have some positive correlation because flow2 includes flow1. Specifically, r is a number around 1/sqrt(2), subject to random noise.
If this is not what you want, then you don't want the coefficient of correlation but some other measure of similarity.
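As one possible "other measure", here is a minimal sketch that compares direction and magnitude separately, along the lines the question mentions. The test data, the eps guard and all variable names are assumptions for illustration, not a recommended metric:

```python
import numpy as np

rng = np.random.default_rng(0)
flow1 = rng.uniform(-1, 1, size=(10, 10, 2))
flow2 = flow1 + 0.1 * rng.uniform(-1, 1, size=(10, 10, 2))

# Per-pixel vector magnitudes, shape (10, 10).
mag1 = np.linalg.norm(flow1, axis=-1)
mag2 = np.linalg.norm(flow2, axis=-1)

# Per-pixel cosine similarity of the two vectors; eps avoids division by zero.
eps = 1e-12
cos_sim = np.sum(flow1 * flow2, axis=-1) / (mag1 * mag2 + eps)

mean_cos = cos_sim.mean()                                  # average directional agreement
mag_corr = np.corrcoef(mag1.ravel(), mag2.ravel())[0, 1]   # correlation of magnitudes

print(mean_cos, mag_corr)
```

Unlike the single coefficient above, cos_sim is a map of the same spatial shape as the flows, so it can be plotted to see where the two fields disagree.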
Assume I have a 5x5x3 image and a filter with a different kernel for each channel, for example 3x3x3.
In Conv2D, each of the kernels in the filter is first applied to its channel of the input layer separately (which gives 3x3x3, without padding and with stride 1), and then these three channels are summed together (element-wise addition), which gives 3x3x1.
Instead of summing over the channels (3x3x1), I want to concatenate the three channels (3x3x3).
Thanks for help.
What you are referring to is depthwise convolution where outputs' channels are concatenated rather than summed. (See https://www.tensorflow.org/api_docs/python/tf/keras/layers/DepthwiseConv2D for details)
Demonstration:
import numpy as np
import tensorflow as tf
x = np.random.rand(1,5,5,3)
l = tf.keras.layers.DepthwiseConv2D(3, depth_multiplier=1)
print(l(x).shape)
'''
(1, 3, 3, 3)
'''
You can use depth_multiplier to control the number of depthwise kernels applied to each channel.
l2 = tf.keras.layers.DepthwiseConv2D(3, depth_multiplier=2)
print(l2(x).shape)
'''
(1, 3, 3, 6)
'''
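To make the difference between summing and concatenating concrete, here is a plain NumPy sketch of the arithmetic for the 5x5x3 image and 3x3x3 filter from the question. It is not how TensorFlow implements the layer, and the array names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))    # height, width, channels
kernels = rng.random((3, 3, 3))  # one 3x3 kernel per channel (last axis)

out_h = image.shape[0] - kernels.shape[0] + 1  # 3 with no padding, stride 1
out_w = image.shape[1] - kernels.shape[1] + 1  # 3

per_channel = np.zeros((out_h, out_w, image.shape[2]))
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3, :]
        # Multiply each channel's patch by its own kernel; sum over the window only.
        per_channel[i, j, :] = np.sum(patch * kernels, axis=(0, 1))

# Summing over channels gives what one ordinary convolution filter outputs (no bias).
summed = per_channel.sum(axis=-1)

print(per_channel.shape)  # (3, 3, 3)
print(summed.shape)       # (3, 3)
```

Keeping per_channel as it is corresponds to the depthwise (concatenated) result; summed corresponds to the usual Conv2D behaviour described in the question.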
I know that a 4D vector has shape (4, 1): it lives in 4D space, but its ndim is 2. For an ndarray to be 4-dimensional, its shape should be something like (2, 3, 4, 5).
So, does the concept of dimension differ between vectors and matrices (or arrays)? I'm trying to understand this from a mathematical perspective and how it carries over to pandas programming.
The dimensionality of a mathematical object is usually determined by the number of independent parameters in that object. For example, a 4-D vector is mathematically 4-dimensional because it contains 4 independent elements (unless some relation between them has been specified). Such a vector, if represented as a column vector in numpy, would have shape (4, 1) because it has 4 rows and 1 column. The transpose of this array, a row vector, has shape (1, 4) because it has 1 row and 4 columns. A plain 1-D array of 4 elements has shape (4,): it has a single axis and is neither a row nor a column.
Note however, that the column vector and row vector are dimensionally equivalent mathematically. Both have 4 dimensions.
For a 3 x 3 matrix, the most general mathematical dimension is 9, because it has 9 independent elements in general. The shape of a corresponding numpy array would be (3, 3). If you're looking for the total number of elements in a numpy array, ndarray.size is the way to go.
ndarray.ndim, however, yields the number of axes in a numpy array. That is, the number of directions along which values are placed (sloppy terminology!). So for the 3 x 3 matrix, ndim yields 2. For an array of shape (3, 7, 2, 1), ndim would yield 4. But, as we already discussed, the mathematical dimension would generally be 3 x 7 x 2 x 1 = 42 (so this is an array in 42-dimensional space, even though the numpy array has just 4 axes). As you might have noticed, ndarray.size is just the product of the numbers in ndarray.shape.
Note that these are not just concepts of programming. We are used to saying "2-D matrices" in mathematics, but that is not to be confused with the space in which the matrices reside.
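A small NumPy sketch of the distinction between shape, ndim and size (the array names are only illustrative):

```python
import numpy as np

column = np.arange(4).reshape(4, 1)  # a 4-D vector stored as a column
flat = np.arange(4)                  # the same 4 numbers as a 1-D array
matrix = np.zeros((3, 3))
big = np.zeros((3, 7, 2, 1))

for name, arr in [("column", column), ("flat", flat), ("matrix", matrix), ("big", big)]:
    print(name, arr.shape, arr.ndim, arr.size)
# column (4, 1) 2 4
# flat (4,) 1 4
# matrix (3, 3) 2 9
# big (3, 7, 2, 1) 4 42
```

In each case size equals the product of the entries of shape, while ndim is just the length of shape.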
How can I squeeze all but specific axis(es) of a Tensor?
The input size (and how many size-1 dimensions there are) is unknown in advance, but I know that I want to keep only the first and second axes (i.e. all the other axes can only have size 1).
e.g. If I have a Tensor of shape (1, 4, 1, 1, 1) and I want it to be squeezed into (1, 4)
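The question does not say which framework the Tensor comes from, so the following is only a NumPy stand-in for the idea: squeeze exactly the axes after the ones to keep. The helper name squeeze_all_but and its keep parameter are hypothetical:

```python
import numpy as np

def squeeze_all_but(x, keep=2):
    """Drop every axis after the first `keep` axes.

    np.squeeze raises a ValueError if any of those axes does not have size 1.
    """
    trailing_axes = tuple(range(keep, x.ndim))
    return np.squeeze(x, axis=trailing_axes)

x = np.zeros((1, 4, 1, 1, 1))
print(squeeze_all_but(x).shape)  # (1, 4)
```

Because only the trailing axes are named, a leading axis of size 1 (as in the example) is preserved, which a plain squeeze without an axis argument would not do.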
I want to implement a neural network in Keras with the following architecture. Say I have some inputs and they belong to some groups. Then the network looks like this:
input -> some layers -> separate inputs by groups -> average inputs by groups -> output
In brief, I want to separate the inputs by group and then take the average of the inputs within each group.
For example, suppose I have an input tensor [1, 2, 3, 4, 5, 6] whose elements belong to two groups [0, 1, 1, 0, 0, 1]. Then I want the output tensor to be [3.333, 3.666, 3.666, 3.333, 3.333, 3.666]. Here 3.333 is the average of group 0 [1, 4, 5] and 3.666 is the average of group 1 [2, 3, 6].
I am not sure if you can separate the inputs as you described above directly in Keras or Tensorflow. Here is what I could come up with:
Create a mask for each class, with 1 where the element at that index belongs to the class and 0 where it belongs to another class. So in your example, you would use [0,1,1,0,0,1] for one class and [1,0,0,1,1,0] for the other. (If you have more classes, you will correspondingly have more masks.)
Stack those masks into a 2-D tensor and multiply it element-wise with the input using tf.multiply() (broadcasting handles the shapes). Think of the masks as filters that separate the classes. A convolution such as tf.nn.conv1d() is not needed for this, and a stride of 0 is not valid in any case.
After the multiplication, you will have a 2-D Tensor where each row contains one class's elements. For your example you should get a Tensor with the two vectors [0,2,3,0,0,6] and [1,0,0,4,5,0]. To get the average of each class, divide tf.reduce_sum() of each row by tf.reduce_sum() of the corresponding mask; a plain tf.reduce_mean() would also count the zeros.
Multiply the Tensor of the means, [[3.333], [3.666]], with the matching masks using tf.multiply() and add the resulting vectors using tf.reduce_sum() on the correct axis. This should give the vector you want.
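The masking idea can be sketched in NumPy as follows. This uses element-wise products rather than a convolution and is only an illustration of the arithmetic, not a Keras layer; the variable names are assumptions:

```python
import numpy as np

values = np.array([1., 2., 3., 4., 5., 6.])
groups = np.array([0, 1, 1, 0, 0, 1])

# One mask per class, stacked as rows: shape (num_classes, num_elements).
masks = np.stack([(groups == g).astype(float) for g in np.unique(groups)])

# Each row keeps only the elements of one class.
separated = masks * values

# Divide by the class size (sum of the mask), not by the full length.
class_means = separated.sum(axis=1) / masks.sum(axis=1)

# Put each class mean back at the positions of that class.
result = (class_means[:, None] * masks).sum(axis=0)
print(np.round(result, 3))  # [3.333 3.667 3.667 3.333 3.333 3.667]
```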
I have figured out a method. It can be achieved by matrix manipulation. First turn the cluster vector into a categorical (one-hot) matrix. For example, if the batch size is 6, the categorical matrix (cluster) looks like:
1, 0
1, 0
0, 1
0, 1
1, 0
0, 1
then we generate a cluster_mean matrix:
1/3, 0
1/3, 0
0, 1/3
0, 1/3
1/3, 0
0, 1/3
If we have an input matrix b*n (b is the batch size and n is the number of features), then we can get the average by cluster using
cluster * t(cluster_mean) * input
Transpose, average and dot product can be achieved by using tensorflow functions.
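A minimal NumPy sketch of this matrix formulation, using the inputs and groups from the question. It assumes the input is laid out as batch by features so that the shapes line up; the names mirror the text but the code is only illustrative:

```python
import numpy as np

x = np.array([[1.], [2.], [3.], [4.], [5.], [6.]])  # shape (b, n) = (6, 1)
groups = np.array([0, 1, 1, 0, 0, 1])

cluster = np.eye(2)[groups]                   # one-hot matrix, shape (b, k)
cluster_mean = cluster / cluster.sum(axis=0)  # divide each column by its cluster size

# (b, k) @ (k, b) @ (b, n) -> (b, n): every row is replaced by its cluster's mean.
averaged = cluster @ cluster_mean.T @ x
print(np.round(averaged.ravel(), 3))  # [3.333 3.667 3.667 3.333 3.333 3.667]
```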
Does sklearn PCA consider the columns of the dataframe as the vectors to reduce or the rows as vectors to reduce ?
Because when doing this:
import pandas as pd
from sklearn.decomposition import PCA
df=pd.DataFrame([[1,-21,45,3,4],[4,5,89,-5,6],[7,-4,58,1,19],[10,11,74,20,12],[13,14,15,45,78]]) #5 rows 5 columns
pca=PCA(n_components=3)
pca.fit(df)
df_pcs=pd.DataFrame(data=pca.components_, index = df.index)
I get the following error:
ValueError: Shape of passed values is (5, 3), indices imply (5, 5)
Rows represent samples and columns represent features. PCA reduces the dimensionality of the data, i.e. the number of features. So, the columns.
In terms of vectors, it treats each row as a single feature vector and reduces its size.
If you have a dataframe of shape, say, [100, 6] and n_components is set to 3, your output will have shape [100, 3].
# You need this
df_pcs=pca.transform(df)
# This produces an error because the shapes don't match.
df_pcs=pd.DataFrame(data=pca.components_, index = df.index)
pca.components_ is an array of [3,5] and your index parameter is using the df.index which is of shape [5,]. Hence the error. pca.components_ represents a completely different thing.
According to the documentation:
components_ : array, [n_components, n_features]
Principal axes in feature space, representing the
directions of maximum variance in the data.
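To see where the two shapes come from, here is a NumPy-only sketch of PCA via the SVD on the data from the question. It is a stand-in for illustration, not sklearn's implementation, and the variable names are assumptions:

```python
import numpy as np

# 5 samples (rows) and 5 features (columns), as in the question.
X = np.array([[1, -21, 45, 3, 4],
              [4, 5, 89, -5, 6],
              [7, -4, 58, 1, 19],
              [10, 11, 74, 20, 12],
              [13, 14, 15, 45, 78]], dtype=float)

n_components = 3
X_centered = X - X.mean(axis=0)  # center each feature (column)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt[:n_components]         # principal axes, shape (n_components, n_features)
X_reduced = X_centered @ components.T  # projected samples, shape (n_samples, n_components)

print(components.shape)  # (3, 5)
print(X_reduced.shape)   # (5, 3)
```

Only the projected samples have one row per original sample, which is why the original index fits the transformed data but not the components array.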