Difference between number of states and number of mixtures in Gaussian Mixture Model? - hidden-markov-models

I am using the GMMHMM library and I cannot figure out the difference between
n_components : Number of states in the model.
n_mix : Number of states in the GMM.

n_components is the number of hidden states in the Hidden Markov Model; n_mix is the number of Gaussian components in the mixture distribution that models the emissions of each state.
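Assuming GMMHMM here refers to hmmlearn's hmmlearn.hmm.GMMHMM, a minimal sketch looks like this (the numbers of states, mixture components and features are arbitrary):

    import numpy as np
    from hmmlearn.hmm import GMMHMM

    # 3 hidden states, each emitting from a 4-component Gaussian mixture
    # over 2-dimensional observations.
    model = GMMHMM(n_components=3, n_mix=4, covariance_type="diag", n_iter=100)

    X = np.random.randn(500, 2)        # toy observations, shape (n_samples, n_features)
    model.fit(X)
    states = model.predict(X)          # one of the 3 hidden states per observation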

Related

Can features be given as input to hidden markov model?

I want to train an HMM classifier with features as input. Considering two observation symbols (o1, o2), two hidden states (h1, h2) and some initial probabilities, I apply a supervised algorithm and, based on the classifier output, calculate the following:
Transition probabilities:
    [ P(h1|h1), P(h1|h2);
      P(h2|h1), P(h2|h2) ]
Emission probabilities:
    [ P(o1|h1), P(o1|h2);
      P(o2|h1), P(o2|h2) ]
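For concreteness, something like this counting-based estimate (a sketch with made-up labels for one short sequence):

    import numpy as np

    h = np.array([0, 0, 1, 1, 0, 1, 1, 0])   # hidden-state labels: h1=0, h2=1
    o = np.array([0, 1, 1, 0, 0, 1, 0, 1])   # observations: o1=0, o2=1

    A = np.zeros((2, 2))                      # A[i, j] = P(h_{t+1}=j | h_t=i)
    for t in range(len(h) - 1):
        A[h[t], h[t + 1]] += 1
    A /= A.sum(axis=1, keepdims=True)

    B = np.zeros((2, 2))                      # B[i, k] = P(o_t=k | h_t=i)
    for t in range(len(h)):
        B[h[t], o[t]] += 1
    B /= B.sum(axis=1, keepdims=True)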
Is this the correct way to calculate the probabilities?

how to generate different samples using PixelCNN?

I am trying PixelCNN, which is an auto-regressive generative model. After training, the model receives an all-zero tensor and generates pixels one at a time starting from the top-left corner. Now that the model parameters are fixed, can the model only produce the same output when starting from the same zero tensor? How can I produce different samples?
Yes, you always provide an all-zero tensor. However, in PixelCNN each pixel location is represented by a distribution, so during the forward pass you sample from that distribution at the end. That is how the pixel values turn out different on each run.
This is because PixelCNN is a probabilistic neural network. The pixels, as mentioned before, are all represented by conditional probability distributions computed by the layers below, not by point estimates.
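As an illustration, a sketch of the sampling loop; model here stands for a hypothetical trained network that maps the current canvas to per-pixel logits over the 256 intensity levels (it is not tied to any particular implementation):

    import numpy as np

    def sample_pixelcnn(model, height, width, levels=256, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        img = np.zeros((height, width), dtype=np.int64)    # the all-zero starting canvas
        for i in range(height):
            for j in range(width):
                logits = model(img)[i, j]                  # conditional distribution for pixel (i, j)
                probs = np.exp(logits - logits.max())
                probs /= probs.sum()
                img[i, j] = rng.choice(levels, p=probs)    # sampling (not argmax) gives different images
        return img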

Tensorflow, difference between tf.nn.softmax_cross_entropy_with_logits and tf.nn.sparse_softmax_cross_entropy_with_logits

I have read the docs of both functions, but as far as I can tell, for tf.nn.softmax_cross_entropy_with_logits(logits, labels, dim=-1, name=None) the result is the cross-entropy loss, and the dimensions of logits and labels are the same.
But for tf.nn.sparse_softmax_cross_entropy_with_logits, the dimensions of logits and labels are not the same?
Could you give a more detailed example of tf.nn.sparse_softmax_cross_entropy_with_logits?
The difference is in how the labels are supplied. tf.nn.softmax_cross_entropy_with_logits expects labels as a full probability distribution over the classes for each example (typically a one-hot vector, although soft labels are also valid), while tf.nn.sparse_softmax_cross_entropy_with_logits expects a single integer class index per example. Both functions treat the classes as mutually exclusive; from the docs:
Measures the probability error in discrete classification tasks in
which the classes are mutually exclusive (each entry is in exactly one
class). For example, each CIFAR-10 image is labeled with one and only
one label: an image can be a dog or a truck, but not both.
(The often-quoted description of classes that are "independent and not mutually exclusive", where a picture can contain both an elephant and a dog, belongs to tf.nn.sigmoid_cross_entropy_with_logits, not to either of these functions.)
As such, with the sparse function the dimensions of logits and labels are not the same: labels contain one integer per example, whereas logits contain one unnormalized score per class per example.
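A minimal sketch of the shape difference, using the eager TF 2 tf.nn API (the numbers are arbitrary):

    import tensorflow as tf

    logits = tf.constant([[2.0, 0.5, -1.0],     # one score per class per example
                          [0.1, 1.5,  0.3]])    # shape (batch, num_classes) = (2, 3)

    # Dense version: labels are full distributions over the classes (here one-hot).
    dense_labels = tf.constant([[1.0, 0.0, 0.0],
                                [0.0, 1.0, 0.0]])
    loss_dense = tf.nn.softmax_cross_entropy_with_logits(labels=dense_labels, logits=logits)

    # Sparse version: labels are a single class index per example, shape (batch,).
    sparse_labels = tf.constant([0, 1])
    loss_sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=sparse_labels, logits=logits)

    # Both return one loss per example and agree when the dense labels are one-hot.
    print(loss_dense.numpy(), loss_sparse.numpy())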

Rating prediction in non negative matrix factorization

I was following this blog http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/ (also attaching the matrix here) for rating prediction using matrix factorization. Initially we have a sparse user-movie matrix R.
We then apply the MF algorithm to create a new matrix R', which is the product of two matrices P (UxK) and Q (DxK), i.e. R' = P Q^T. We then minimize the error between the values in R and R'. So far so good. But in the final step, when the matrix is filled up, I am not convinced that these are the ratings the user would actually give. Here is the final matrix:
What is the justification that these are in fact the "predicted" ratings? Also, I am planning to use the P matrix (UxK) as the users' latent features. Can we somehow justify that these are in fact the users' latent features?
The justification for using the obtained vectors for each user as latent trait vectors is that these values of the latent traits minimize the error between the predicted ratings and the actual known ratings.
If you take a look at the predicted ratings and the known ratings in the two diagrams you posted, you can see that the difference between the two matrices in the cells that are common to both is very small. Example: U1D4 is 1 in the first diagram and 0.98 in the second.
Since the user latent trait vectors produce good results on the known ratings, we expect them to also do a good job of predicting the unknown ratings. Of course, we use regularisation to avoid overfitting the training data, but that is the general idea.
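As an illustration, a minimal stochastic-gradient sketch in the spirit of that tutorial (a toy matrix where 0 marks a missing rating; K, the learning rate and the regularization strength are arbitrary). Only the observed cells drive the updates, and the filled-in cells are simply what P Q^T predicts once the error on the known cells is small:

    import numpy as np

    R = np.array([[5, 3, 0, 1],        # 0 = unknown rating
                  [4, 0, 0, 1],
                  [1, 1, 0, 5],
                  [1, 0, 0, 4],
                  [0, 1, 5, 4]], dtype=float)

    U, D, K = R.shape[0], R.shape[1], 2
    rng = np.random.default_rng(0)
    P, Q = rng.random((U, K)), rng.random((D, K))
    alpha, beta = 0.002, 0.02                      # learning rate, regularization

    for _ in range(5000):
        for u, d in zip(*R.nonzero()):             # only observed ratings contribute
            e = R[u, d] - P[u] @ Q[d]              # error on a known cell
            grad_p = e * Q[d] - beta * P[u]
            grad_q = e * P[u] - beta * Q[d]
            P[u] += alpha * grad_p
            Q[d] += alpha * grad_q

    R_hat = P @ Q.T    # known cells come out close to R; the remaining cells are the predictions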
To evaluate how good your latent feature vectors are, you should split your data into training, validation and test sets.
The training set contains the observed ratings that you use to learn your latent features. The validation set is used during learning to tune your model's hyper-parameters, and the test set is used to evaluate the learnt latent features once learning is finished. You can simply set aside a percentage of the observed samples for validation and test. If your ratings are time-stamped, a natural way to select them is to use the most recent samples as validation and test.
More details on splitting your data are here: https://link.medium.com/mPpwhdhjknb
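For example, a time-based split could look like this (hypothetical ratings with columns user, item, rating, timestamp; 80/10/10 proportions):

    import numpy as np

    ratings = np.array([[0, 1, 5.0, 1001],
                        [1, 3, 1.0, 1005],
                        [2, 0, 1.0, 1010],
                        [0, 3, 1.0, 1020],
                        [3, 0, 1.0, 1030],
                        [4, 2, 5.0, 1040]])

    order = np.argsort(ratings[:, 3])                       # oldest first
    n_train, n_valid = int(0.8 * len(order)), int(0.9 * len(order))
    train = ratings[order[:n_train]]                        # earliest 80%: learn the latent features
    valid = ratings[order[n_train:n_valid]]                 # next 10%: tune K, alpha, beta
    test  = ratings[order[n_valid:]]                        # most recent 10%: final evaluation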

Seq2Seq for prediction of complex states

My problem:
I have a sequence of complex states and I want to predict the future states.
Input:
I have a sequence of states. Each sequence can be of variable length. Each state is a moment in time and is described by several attributes [att1, att2, ...], where each attribute is a number within an interval ([0..5], [1..3651], ...).
The example (and paper) of Seq2Seq assumes that each state (a word) is taken from a dictionary, so each state has around 80,000 possibilities. But how would you represent each state when it is taken from a set of vectors, and that set is every possible combination of the attributes?
Is there any method to work with more complex states in TensorFlow? Also, what is a good method to decide the boundaries of your buckets when the relation between input length and output length is unclear?
May I suggest rephrasing and splitting your question into two parts? The first is really a general machine learning/LSTM question that is independent of TensorFlow: how to use an LSTM to predict when the sequence elements are general vectors. The second is how to represent this in TensorFlow. For the former, there is nothing really magical to do.
But a very quick answer: you have really just skipped the embedding-lookup part of seq2seq. You can feed dense tensors into a suitably modified version of it -- your state is just a dense vector representation of the state, which is exactly what comes out of an embedding lookup.
The vector representation tutorial discusses the preprocessing that turns, e.g., words into embeddings for use in later parts of the learning pipeline.
If you look at line 139 of seq2seq.py you will see that embedding_rnn_decoder takes in a 1D batch of things to decode (the dimension is the number of elements in the batch), and then uses the embedding lookup to turn it into a batch_size * cell.input_size tensor. You want to input a batch_size * cell.input_size tensor into the RNN directly, skipping the embedding step.
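For illustration, a minimal encoder-decoder sketch using the current Keras API rather than the legacy seq2seq.py, feeding dense state vectors straight into the RNNs with no embedding lookup (all sizes are made up, and padding/masking for variable-length sequences is omitted):

    import numpy as np
    import tensorflow as tf

    state_dim = 4                 # number of attributes per state
    enc_len, dec_len = 10, 5      # input and output sequence lengths

    encoder_in = tf.keras.Input(shape=(enc_len, state_dim))
    decoder_in = tf.keras.Input(shape=(dec_len, state_dim))

    # The encoder summarizes the input sequence into its final LSTM state.
    _, h, c = tf.keras.layers.LSTM(64, return_state=True)(encoder_in)
    # The decoder starts from that state and predicts the next states.
    dec_out = tf.keras.layers.LSTM(64, return_sequences=True)(decoder_in, initial_state=[h, c])
    pred = tf.keras.layers.Dense(state_dim)(dec_out)      # regression over the attributes

    model = tf.keras.Model([encoder_in, decoder_in], pred)
    model.compile(optimizer="adam", loss="mse")

    # Toy batches of dense state sequences instead of token ids.
    x_enc = np.random.rand(32, enc_len, state_dim).astype("float32")
    x_dec = np.random.rand(32, dec_len, state_dim).astype("float32")
    y = np.random.rand(32, dec_len, state_dim).astype("float32")
    model.fit([x_enc, x_dec], y, epochs=1, verbose=0)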