I know that for a 4-D vector the shape should be (4, 1), which is actually represented in 4-D space but has ndim 2, while for some ndarray to be 4-dimensional its shape should be something like (2, 3, 4, 5).
So, does the concept of dimension differ between vectors and matrices (or arrays)? I'm trying to understand this from a mathematical perspective and how it carries over to pandas programming.
The dimensionality of a mathematical object is usually determined by the number of independent parameters in that object. For example, a 4-D vector is mathematically 4-dimensional because it contains 4 independent elements (unless some relation between them has been specified). Such a vector, if represented as a column vector in numpy, would have shape (4, 1) because it has 4 rows and 1 column. The transpose of that column vector, a row vector, has shape (1, 4). A plain 1-D numpy array holding the same 4 values has shape (4,): it has a single axis, so numpy makes no row/column distinction for it.
Note, however, that the column vector and the row vector are dimensionally equivalent mathematically: both have 4 dimensions.
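For example, in numpy (a quick sketch; the values are arbitrary):

import numpy as np
col = np.array([[1], [2], [3], [4]])   # column vector: 4 rows, 1 column
col.shape      # (4, 1)
col.T.shape    # (1, 4): the transpose is a row vector
flat = np.array([1, 2, 3, 4])          # plain 1-D array: a single axis
flat.shape     # (4,)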
For a 3 x 3 matrix, the most general mathematical dimension is 9, because it has 9 independent elements in general. The shape of the corresponding numpy array would be (3, 3). If you're looking for the total number of elements in a numpy array, ndarray.size is the way to go.
ndarray.ndim, however, yields the number of axes in a numpy array, that is, the number of directions along which values are laid out (sloppy terminology!). So for the 3 x 3 matrix, ndim yields 2. For an array of shape (3, 7, 2, 1), ndim would yield 4. But, as we already discussed, the mathematical dimension would generally be 3 x 7 x 2 x 1 = 42 (so this is an object in a 42-dimensional space, while the numpy array has just 4 axes). As you might have already noticed, ndarray.size is just the product of the numbers in ndarray.shape.
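To make that concrete (a small sketch reusing the shapes above):

import numpy as np
a = np.zeros((3, 7, 2, 1))
a.ndim    # 4: the number of axes
a.shape   # (3, 7, 2, 1)
a.size    # 42: the product of the entries in a.shape
m = np.zeros((3, 3))
m.ndim    # 2
m.size    # 9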
Note that these are not just concepts of programming. We are used to saying "2-D matrices" in mathematics, but that is not to be confused with the space in which the matrices reside.
Having multiple 2-D flow maps (i.e., vector fields), how would one find the statistical correlation between pairs of them?
The problem:
One should not (?) reshape two flow maps of shape (x, y, 2), flow1 and flow2, into 1-D vectors and run
np.corrcoef(flow1.reshape(1, -1), flow2.reshape(1, -1))
since the x, y entries are connected.
[Plots of flow1 and flow2, shown for visualization purposes only.]
I am thinking about comparing magnitudes and direction.
How would one ideally compare those (cosine distance, ...)?
How would one compare covariance between vector fields?
Edit:
I am aware that np.corrcoef(flow1.reshape(2,-1), flow2.reshape(2,-1)) would return a 4 x 4 correlation coefficient matrix, but I find it unintuitive to interpret.
For some measures of similarity it may indeed be desirable to take the spatial structure of the domain into account. But a coefficient of correlation does not do that: it is invariant under any permutations of the domain. For example, the correlation between (0, 1, 2, 3, 4) and (1, 2, 4, 8, 16) is the same as between (1, 4, 2, 0, 3) and (2, 16, 4, 1, 8) where both arrays were reshuffled in the same way.
So, the coefficient of correlation would be obtained by:
1. Centering both arrays, i.e., subtracting their means. Say we get FC1 and FC2.
2. Taking the inner product of FC1 and FC2: this is just the sum of the products of matching entries.
3. Dividing by the square roots of the inner products of FC1 with itself and of FC2 with itself.
Example:
import numpy as np

flow1 = np.random.uniform(size=(10, 10, 2))  # the 3rd dimension is for the components
flow2 = flow1 + np.random.uniform(size=(10, 10, 2))

flow1_centered = flow1 - np.mean(flow1, axis=(0, 1))  # subtract the per-component mean
flow2_centered = flow2 - np.mean(flow2, axis=(0, 1))

inner_product = np.sum(flow1_centered*flow2_centered)  # sum of products of matching entries
r = inner_product/np.sqrt(np.sum(flow1_centered**2) * np.sum(flow2_centered**2))
Here the flows have some positive correlation because flow1 was included in flow2. Specifically, r is a number around 1/sqrt(2) (subject to random noise), since cov(flow1, flow2) equals var(flow1) while var(flow2) is about twice var(flow1).
If this is not what you want, then you don't want the coefficient of correlation but some other measure of similarity.
I want to implement a neural network in Keras with this architecture: say I have some inputs and they belong to some groups. Then the neural network is like this:
input -> some layers -> separate inputs by groups -> average inputs by groups -> output
In brief, I want to separate inputs by groups then take the average of inputs by groups.
For example, if I have an input tensor [1, 2, 3, 4, 5, 6] whose elements belong to two groups, [0, 1, 1, 0, 0, 1], then I want the output tensor to be [3.333, 3.666, 3.666, 3.333, 3.333, 3.666]. Here 3.333 is the average of group 0 ([1, 4, 5]) and 3.666 is the average of group 1 ([2, 3, 6]).
I am not sure you can separate the inputs as described above directly in Keras or Tensorflow. Here is what I could come up with:
Create a mask for each class, where the entry is 1 if the element at that index belongs to the class and 0 otherwise. In your example you would have [0, 1, 1, 0, 0, 1] for one class and [1, 0, 0, 1, 1, 0] for the other. (If you have more classes, you will correspondingly have more masks.)
Stack those vectors to get a 3-D tensor and do a 1-D convolution with stride 1 using tf.nn.conv1d(). Think of those masks as the filters of a convolution operation that separates the classes. Be sure to reshape your tensors to match the operation's requirements.
After that step you will have a 3-D tensor in which each vector contains one class's elements. For your example you should get a tensor with two vectors, [0, 2, 3, 0, 0, 6] and [1, 0, 0, 4, 5, 0]. Sum each vector with tf.reduce_sum() on the correct axis and divide by the number of elements in that class to get the class averages (a plain tf.reduce_mean() over the full length would also count the zeros from the other class).
Multiply the tensor of means, [[3.333], [3.666]], by the masks using tf.multiply() and add the resulting vectors with tf.reduce_sum() on the correct axis. This should produce the vector you want.
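A rough sketch of this idea with plain TensorFlow ops (I replace the conv1d step with an elementwise multiplication by the stacked masks, and divide each masked sum by the class count, since the zeros from the other class would otherwise be counted in the average; the values are from the example above):

import tensorflow as tf

x = tf.constant([1., 2., 3., 4., 5., 6.])            # inputs
masks = tf.constant([[1., 0., 0., 1., 1., 0.],        # mask for group 0
                     [0., 1., 1., 0., 0., 1.]])       # mask for group 1

masked = masks * x                                    # shape (2, 6): each row keeps one group
sums = tf.reduce_sum(masked, axis=1)                  # per-group sums: [10., 11.]
counts = tf.reduce_sum(masks, axis=1)                 # per-group sizes: [3., 3.]
means = sums / counts                                 # per-group means: [3.333, 3.666]
out = tf.reduce_sum(masks * means[:, None], axis=0)   # scatter the means back:
                                                      # [3.333, 3.666, 3.666, 3.333, 3.333, 3.666]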
I have figured out a method. It can be achieved by matrix manipulation. First turn the cluster vector into a categorical (one-hot) matrix; for example, if the batch size is 6, the categorical matrix (cluster) looks like:
1, 0
1, 0
0, 1
0, 1
1, 0
0, 1
Then we generate a cluster_mean matrix, in which each column of cluster is divided by the size of that cluster:
1/3, 0
1/3, 0
0, 1/3
0, 1/3
1/3, 0
0, 1/3
If we have an input matrix of shape n*b (n is the number of features and b is the batch size), then we can get the average by cluster using
cluster * t(cluster_mean) * input
Transposition, averaging and the dot products can be achieved with tensorflow functions.
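In numpy terms, the whole pipeline looks roughly like this (a sketch only; I assume the input is arranged batch-first, shape (b, n), so with the n*b layout above you would transpose it first):

import numpy as np

cluster = np.array([[1., 0.],    # one-hot cluster membership, shape (b, k)
                    [1., 0.],
                    [0., 1.],
                    [0., 1.],
                    [1., 0.],
                    [0., 1.]])
cluster_mean = cluster / cluster.sum(axis=0)    # divide each column by its cluster size

x = np.array([[1.], [2.], [3.], [4.], [5.], [6.]])   # input of shape (b, n), here n = 1
out = cluster @ cluster_mean.T @ x                   # each row is the mean of its cluster

In TensorFlow/Keras the same three products could be expressed with tf.matmul, e.g. inside a Lambda layer.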
Let's take a very simple case: an array with shape (2,3,4), ignoring the values.
>>> a.shape
(2, 3, 4)
When we transpose it and print the dimensions:
>>> a.transpose([1,2,0]).shape
(3, 4, 2)
So I'm saying: take axis index 2 and make it the first, then take axis index 0 and make it the second and finally take axis index 1 and make it the third. I should get (4,2,3), right?
Well, I thought perhaps I don't understand the logic fully. So I read the documentation, and it says:
Use transpose(a, argsort(axes)) to invert the transposition of tensors
when using the axes keyword argument.
So I tried
>>> c = np.transpose(a, [1,2,0])
>>> c.shape
(3, 4, 2)
>>> np.transpose(a, np.argsort([1,2,0])).shape
(4, 2, 3)
and got yet another, completely different shape!
Could someone please explain this? Thanks.
In [259]: a = np.zeros((2,3,4))
In [260]: idx = [1,2,0]
In [261]: a.transpose(idx).shape
Out[261]: (3, 4, 2)
What this has done is take the a.shape[1] dimension and put it first; a.shape[2] comes second, and a.shape[0] third:
In [262]: np.array(a.shape)[idx]
Out[262]: array([3, 4, 2])
transpose without a parameter is a complete reversal of the axis order. It's an extension of the familiar 2-D transpose (rows become columns, columns become rows):
In [263]: a.transpose().shape
Out[263]: (4, 3, 2)
In [264]: a.transpose(2,1,0).shape
Out[264]: (4, 3, 2)
And the do-nothing transpose:
In [265]: a.transpose(0,1,2).shape
Out[265]: (2, 3, 4)
You have an initial axis order and a final one; describing the swap can be hard to visualize if you don't regularly work with arrays of 3 or more dimensions.
Some people find it easier to use swapaxes, which exchanges just two axes. rollaxis is yet another way.
I prefer to use transpose, since it can do anything the others can; that way I only have to develop an intuition for one tool.
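For comparison, a quick sketch of those alternatives on the same (2, 3, 4) array:

a.swapaxes(0, 2).shape    # (4, 3, 2): exchange just axes 0 and 2
np.rollaxis(a, 2).shape   # (4, 2, 3): roll axis 2 to the front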
The argsort comment operates this way:
In [278]: a.transpose(idx).transpose(np.argsort(idx)).shape
Out[278]: (2, 3, 4)
That is, apply it to the result of one transpose to get back the original order.
np.argsort([1,2,0]) returns the array [2, 0, 1].
So
np.transpose(a, np.argsort([1,2,0])).shape
acts like
np.transpose(a, [2,0,1]).shape
not
np.transpose(a, [1,2,0]).shape
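A quick check of that, reusing the a of shape (2, 3, 4) from above:

idx = [1, 2, 0]
np.argsort(idx)                          # array([2, 0, 1])
np.transpose(a, idx).shape               # (3, 4, 2)
np.transpose(a, np.argsort(idx)).shape   # (4, 2, 3)
np.transpose(np.transpose(a, idx), np.argsort(idx)).shape   # (2, 3, 4): back to the original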
I'm now trying to use tf.losses.sigmoid_cross_entropy on an unbalanced dataset. However, I'm a little confused on the parameter weights. Here are the comments in the documentation:
weights: Optional Tensor whose rank is either 0, or the same rank as
labels, and must be broadcastable to labels (i.e., all dimensions must
be either 1, or the same as the corresponding losses dimension).
I know that in tf.losses.softmax_cross_entropy the weights parameter can be a rank-1 tensor with a weight for each sample. Why must the weights in tf.losses.sigmoid_cross_entropy have the same rank as the labels?
Can anybody explain this, preferably with an example?
You want your loss to be weighted, so tensorflow expects that you will provide a weight for each of your labels. Consider the following example:
Labels: [0, 0, 0, 1, 0]
possible_weights1: [1]
possible_weights2: [1, 2, 1, 1, 1]
illegal_weights1: [1, 2]
illegal_weights2: [[1], [2]]
Here your labels have rank 1 (only one dimension), so tensorflow expects that you will either provide a weight for each element of labels (as demonstrated in possible_weights2) or a single weight per dimension (as demonstrated in possible_weights1, which is broadcast to [1, 1, 1, 1, 1]).
But if you pass illegal_weights2 as your weights, tensorflow does not know how to handle the two dimensions in the weights, since there is only one dimension in labels; so the ranks should always match.
illegal_weights1 is a case where the rank matches but the weights are neither the same length as labels nor of length 1 (which could be broadcast); they have length 2, which cannot be broadcast and is hence illegal.
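For illustration, a minimal sketch against the TF 1.x tf.losses API (the logits values are arbitrary):

import tensorflow as tf

labels = tf.constant([0., 0., 0., 1., 0.])
logits = tf.constant([0.2, -1.0, 0.5, 2.0, 0.1])

# rank-0 weight: broadcast to every element of labels
loss_a = tf.losses.sigmoid_cross_entropy(labels, logits, weights=1.0)

# same rank and length as labels: one weight per element
loss_b = tf.losses.sigmoid_cross_entropy(labels, logits,
                                         weights=[1., 2., 1., 1., 1.])

# weights=[1., 2.] or weights=[[1.], [2.]] would be rejected, since they cannot
# be broadcast elementwise against labels of shape (5,)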