Bayesian inference in feature-based categorization

Here is my problem, which I hope you can help me with:
Let's say we live in a world where there are only two categories, each with some features. The objects in this world are different permutations of these features.
cat1: {a, b, c, d, e, f}
cat2: {g, h, i, j}
Now we have an object with these features:
obj: {a, b, c, d, g, h}
What is the probability that this object gets categorized as Cat.1?
p(cat1|a, b, c, d, g, h)?
In general, how can I model an equation for n categories, each with a different number of features, and objects with different permutations of features?

You can use a Bayesian classifier to calculate these probabilities. The normal (Gaussian) distribution is the most commonly used likelihood for Bayesian classifiers. However, to use this solution you need to assume that every object holds a value for all of your features, for example:
obj1:{a=1, b=1, c=1, d=1, e=0, f=0, g=1, h=1, i=0, j=0}
Note that feature values must be normalized before calculating the parameters of your distribution.
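As a minimal sketch of the calculation for the object in the question, here is one way to score it, assuming binary present/absent features with a Bernoulli likelihood (rather than the Gaussian mentioned above), equal priors, and a hypothetical smoothing value eps for features outside a category's own set:

features = list("abcdefghij")
cat_features = {
    "cat1": set("abcdef"),
    "cat2": set("ghij"),
}
obj = set("abcdgh")

eps = 0.1  # assumed probability of seeing a feature the category does not "own"

def likelihood(obj, cat):
    # P(obj | cat), treating each feature as an independent Bernoulli variable
    p = 1.0
    for f in features:
        p_f = 1.0 - eps if f in cat_features[cat] else eps  # P(feature present | cat)
        p *= p_f if f in obj else (1.0 - p_f)
    return p

prior = {c: 0.5 for c in cat_features}  # equal priors over the two categories
unnormalized = {c: prior[c] * likelihood(obj, c) for c in cat_features}
total = sum(unnormalized.values())
posterior = {c: p / total for c, p in unnormalized.items()}
print(posterior)  # posterior probability of cat1 vs cat2 given the object's features

With these particular assumptions the object comes out far more likely to be cat1, since it shares four of cat1's six features but only two of cat2's four.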

Related

Self-Attention Explainability of the Output Score Matrix

I am learning about attention models, and following along with Jay Alammar's amazing blog tutorial on The Illustrated Transformer. He gives a great walkthrough for how the attention scores are calculated, but I get a bit lost at a certain point, and am not seeing how the attention score Z matrix he explains is used to interpret strength of associations between different words within an input sequence.
He mentions that given some input matrix X, with shape N x D, where N is the number of elements in an input sequence, and D is the input dimensionality, we multiply X with three separate weight matrices of shape D x d, where d is some lower dimensionality that represents the projected space of the query, key, and value matrices:
The query and key matrices are dotted, then divided by a scaling factor (usually the square root of the projected dimensionality), and then run through a softmax function. This produces a weight matrix of size N x N, which is multiplied by the value matrix to get an output Z of shape N x d, about which Jay says:
That concludes the self-attention calculation. The resulting vector is
one we can send along to the feed-forward neural network.
A screenshot from his blog shows this calculation.
However, this is where I'm confused. Z is N x d, and I don't particularly understand what I'm supposed to do with this matrix from an interpretability standpoint. As far as I understand, for a particular sequence element (i.e. the word "cats" in the sequence "I love pets, especially cats"), self-attention is supposed to score other parts of the sequence highly when they are relevant or strongly associated with that word's embedding. I'd therefore expect Z to be N x N, so that I could select Z[i, j] and say that, for the i-th word in the sequence, the j-th word relates or associates with it this or that much.
In fact, wouldn't it make much more sense to use only the softmax output of the weights (without multiplying them by the value matrix), since it is already N x N? In essence, how is Jay determining the strength of these associations with the word "it" in this particular sequence?
This is an N by 1 relationship he is showing: there are N values that correspond to the strength of association with the word "it".
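A minimal NumPy sketch may make this concrete (random matrices stand in for the learned projections, and the dimensions are made up for illustration). The row-softmaxed N x N weight matrix, not Z, is what carries the per-word association strengths:

import numpy as np

rng = np.random.default_rng(0)
N, D, d = 5, 16, 8                       # sequence length, input dim, projected dim
X = rng.standard_normal((N, D))          # input embeddings
W_q, W_k, W_v = (rng.standard_normal((D, d)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d)            # raw scores, shape N x N
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax: N x N attention weights
Z = weights @ V                          # N x d output passed on to the feed-forward layer

# weights[i, j] says how much token i attends to token j; row i (N values)
# is the association profile being visualized, while Z mixes the value vectors.
print(weights[-1])                       # e.g. how the last token attends over the sequence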

How to best use Numpy/Scipy to find optimal common coefficients for a set of different linear equations?

I have n (around 5 million) sets of specific (k, m, v, z)* parameters that describe some linear relationships. I want to find the optimal positive a, b, and c coefficients that minimize the sum of their absolute values, as shown below:
I know beforehand the range for each of a, b, and c, so I could use that to make things a bit faster. However, I do not know how to properly implement this problem to best take advantage of Numpy (or Scipy, etc.).
I was thinking of iteratively checking different a, b, and c coefficients (based on a step size) and in the end keeping the combination that provides the minimum sum. But properly implementing this in Numpy is another matter.
* k, m, v are either 0 or positive (and are in fact k, m, v, i, j, p); z can be negative too.
Any tips are welcome!
Either I am missing something, or a == b == c == 0 is optimal. So, a positive solution for (a,b,c) does not exist in general. You can verify this explicitly by posing the minimization problem as a quantile regression of 0 on (k, m, v) with the quantile set to 0.5.
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

# Median (q = 0.5) quantile regression of 0 on (k, m, v): it minimizes the sum of
# absolute residuals |0 - (a*k + b*m + c*v)|, i.e. exactly the stated objective.
x = np.random.rand(1000, 3)               # stand-in for the nonnegative (k, m, v) data
a, b, c = QuantReg(np.zeros(x.shape[0]), x).fit(q=0.5).params
assert np.allclose([a, b, c], 0)          # the optimum is a = b = c = 0

Is it possible to calculate covariance between 2 variables based on their individual covariance with a third variable

If one knows the covariance between A and B and the covariance between B and C, is it then possible to calculate the covariance between A and C?
Or what would you additionally need?
Thanks
No, it isn't.
If you know Cov(A, B) and Cov(B, C) then Cov(A, C) is constrained, but not uniquely defined.
The constraints can be found by asserting that the covariance matrix (of A, B, and C) is positive semi-definite: i.e. none of its eigenvalues can be negative.
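A quick numerical illustration (the numbers are made up for the example): with unit variances, Cov(A, B) = 0.8 and Cov(B, C) = 0.6, more than one value of Cov(A, C) gives a valid (positive semi-definite) covariance matrix, while other values are ruled out:

import numpy as np

# Unit variances, Cov(A,B) = 0.8, Cov(B,C) = 0.6; try several candidate Cov(A,C) values.
for cov_ac in (0.9, 0.1, -0.5):
    S = np.array([[1.0, 0.8, cov_ac],
                  [0.8, 1.0, 0.6],
                  [cov_ac, 0.6, 1.0]])
    valid = np.linalg.eigvalsh(S).min() >= -1e-12   # PSD check: no negative eigenvalues
    print(cov_ac, valid)   # 0.9 and 0.1 are both admissible, -0.5 is not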

Finding Rotation and Translation between three coordinate systems

If we have three coordinate systems, namely A, B, and C, and we know the [R|t] from A to B and from A to C, how can we find the [R|t] between B and C?
From B to C is from B to A to C, so you need to invert the first transformation and combine it with the second.
I assume that by [R|t] you mean the rotation matrix plus translation vector. It might be easier to treat these two as a single square matrix operating on homogeneous coordinates. For planar operations that would be a 3×3 matrix; for 3D operations it would be 4×4. That way you can use regular matrix inversion and multiplication to describe the combined result.
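A small NumPy sketch of that composition, assuming the 3D case and that "[R|t] from A to B" means the transform taking A-coordinates to B-coordinates (the rotations and translations below are placeholder values):

import numpy as np

def to_homogeneous(R, t):
    # Pack a 3x3 rotation R and 3-vector translation t into a 4x4 homogeneous matrix.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Placeholder transforms: T_AB maps A to B, T_AC maps A to C.
T_AB = to_homogeneous(np.eye(3), np.array([1.0, 0.0, 0.0]))
T_AC = to_homogeneous(np.eye(3), np.array([0.0, 2.0, 0.0]))

# B to C = (A to C) composed with (B to A), i.e. invert the first transform.
T_BC = T_AC @ np.linalg.inv(T_AB)
R_BC, t_BC = T_BC[:3, :3], T_BC[:3, 3]
print(R_BC, t_BC)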

define / declare variable in Scilab

I would like to ask how I can define / declare a variable in Scilab. In some PDFs that I read, it says that I can just type it in and Scilab will take care of the declaration. Not so. I want to set up a matrix equation of something like:
Ax + By + Cz = D
Mx + Ny + Pz = E
Rx + Sy + Tz = F
And then I want to get the general values of x, y, z in terms of A, B, C, D, E, F, M, N, P, R, S, T. I remember this is possible with Matlab. Later on, I want to plug in actual values to get numbers. Please help.
Scilab is much more oriented toward numerical computation than symbolic algebra, but you can still do it.
In your case you should first define the system in the form M1*x = M2, with M1 upper triangular.
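For the system in the question, that M1*x = M2 form would be (before triangularization):

[ A  B  C ]   [ x ]   [ D ]
[ M  N  P ] * [ y ] = [ E ]
[ R  S  T ]   [ z ]   [ F ]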
I suggest you look at the help for solve() and trianfml(); there are nice examples.
After that you can evaluate the expressions, giving any values you want for A, B, C, ..., using evstr().
For symbolic algebra, I recommend Wolfram Mathematica, Maple, or Maxima (the last one is open-source, like Scilab).
OK, this is what I found. Scilab requires a "symbolic math toolbox" in order to do symbolic math. The scimax/overload toolbox (by Calixte Denizet) can do that by integrating Maxima with Scilab; however, it is only available on Linux/Unix. Another option is the OVLD/SYM toolbox (by the late Jean-François Magni), which works on Windows (even Windows 7); however, support for this toolbox has ceased due to its author's passing, and the installation guide on spoken-tutorial.org no longer exists. Thus, I am left with using Maxima by itself to solve symbolic equations and calculus problems.