How to pass groups in pointwise learning to rank - xgboost

In pairwise or listwise learning to rank with xgboost, I can pass a parameter called group during model training, which essentially means fitting the model per query group. But how do I pass groups for pointwise learning to rank, using say ordinal regression?
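For reference, here is a minimal sketch (with made-up data) of the pairwise setup described above, using the standard xgboost Python API, where group lists how many consecutive rows belong to each query:

import numpy as np
import xgboost as xgb

# Illustrative toy data: 8 documents described by 5 features each.
X = np.random.rand(8, 5)
y = np.random.randint(0, 3, size=8)      # graded relevance labels
group = [3, 5]                           # first 3 rows = query 1, next 5 rows = query 2

dtrain = xgb.DMatrix(X, label=y)
dtrain.set_group(group)                  # group sizes are needed for the ranking objectives

params = {"objective": "rank:pairwise", "eta": 0.1}
bst = xgb.train(params, dtrain, num_boost_round=10)

A pointwise objective (plain regression, classification, or an ordinal-regression formulation) scores each row independently, so there is nothing comparable to group to pass during training; the query grouping only matters again when you evaluate with a per-query ranking metric.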

Related

Loss function for differences between two tensors

I'm training a convolutional neural network (using TensorFlow) with the method called 'Knowledge Distillation (KD)', which in a few words consists of first training a big model (the teacher) on the task you want to achieve, and then training a smaller model (the student) so that it reproduces the teacher's results with fewer parameters, making it faster at test time.
The problem I'm facing is how to build, in an effective way, the loss function between the output of the student model and the output of the teacher model on the same input (the output is a tensor of the same size for both the student and the teacher).
I don't have a classification task, so I don't have a label for the input; I only have the teacher's output that I want the student to reproduce.
At the moment the loss function is defined like this:
loss_value = tf.nn.l2_loss(student_prediction - teacher_prediction)
The 'student_prediction' and 'teacher_prediction' are computed at runtime for each input in the dataset.
With this definition I'm still not able to reach convergence with my student model.
Thank you.
I found the answer myself (using MSE). It is this:
loss = tf.reduce_mean(tf.squared_difference(tensor_1, tensor_2))
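For context, the practical difference between the two formulations is the reduction: tf.nn.l2_loss sums (and halves) the squared differences, so its magnitude grows with the batch and tensor size, while the MSE version averages and therefore stays on a scale that is easier to pair with a fixed learning rate. A small TF 1.x-style sketch, matching the snippets above, with made-up tensors:

import tensorflow as tf

# Stand-ins for the two model outputs on the same input batch.
student_prediction = tf.constant([[0.2, 0.8], [0.6, 0.4]])
teacher_prediction = tf.constant([[0.1, 0.9], [0.5, 0.5]])

# Sum of squared differences divided by 2: scales with the number of elements.
l2 = tf.nn.l2_loss(student_prediction - teacher_prediction)

# Mean squared error: independent of batch/tensor size.
mse = tf.reduce_mean(tf.squared_difference(student_prediction, teacher_prediction))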

Batch structure for training a ranking model with contrastive loss?

How do I choose my batch if I train a deep ranking model with, e.g., a contrastive loss, where per query I have 1 positive document and 2 negative samples?
So, it is about a ranking loss, which applies to e.g. the Quora question-pairs data or any other question/answer pairs that I want to rank using a deep learning ranking model or just a Siamese network.
The data would look like this: https://github.com/NTMC-Community/MatchZoo/blob/master/matchzoo/datasets/toy/train.csv
Now, I assume that how the batch is built is crucial, right? Since for every question all of its corresponding positive and negative answers need to be contained inside one batch, right?
Different strategies can be used to build the batches and the triplets or pairs. Usually, the batches are built randomly, and then the hardest negative, or one of the hardest negatives in the batch, is picked.
So yes, positive and negative examples need to be contained inside a batch, and picking the negatives is crucial. But usually the effort goes into picking the proper negatives within the batch, rather than into building the batches in a specific way.
This blog post explaining how ranking losses work may be useful: https://gombru.github.io/2019/04/03/ranking_loss/
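As a concrete illustration of in-batch hard-negative mining, here is a minimal sketch with hypothetical names (not code from a specific library): for each query, keep its positive and take the closest (hardest) of its negatives before applying a margin-based loss.

import tensorflow as tf

def hardest_negative_triplet_loss(anchor, positive, negatives, margin=0.2):
    # anchor, positive: [batch, dim]; negatives: [batch, n_neg, dim]
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)               # [batch]
    neg_dist = tf.reduce_sum(tf.square(anchor[:, None, :] - negatives), axis=-1)  # [batch, n_neg]
    hardest = tf.reduce_min(neg_dist, axis=-1)       # smallest distance = hardest negative
    return tf.reduce_mean(tf.maximum(pos_dist - hardest + margin, 0.0))

# Toy usage: 4 queries, 1 positive and 2 negatives each, 8-dim embeddings.
q = tf.random.normal([4, 8])
p = tf.random.normal([4, 8])
n = tf.random.normal([4, 2, 8])
loss = hardest_negative_triplet_loss(q, p, n)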

How to compute a per-class parameter from a mini-batch in TensorFlow?

I am starting to learn TensorFlow and I have a seemingly simple modeling question. Suppose I have a C-class problem and data arrives into TensorFlow in mini-batches containing B samples each. Each sample x is a D-dimensional vector that comes with its label y (non-negative integer between 0 and C-1). I want to estimate a class-specific parameter (for example the sample mean) for each class. The estimation takes place after each sample independently undergoes a TensorFlow-defined transformation pipeline. The per-class parameter/sample-mean is then utilized in the computation of other tensors.
Intuitively, I would group the samples in each mini-batch by label, sum-combine them, and add the total of each label group to the corresponding class parameter, with appropriate normalization.
How can I implement such a simple procedure (group by label, perform a per-group operation, then use the labels as indices for writing into a tensor) or an equivalent one, using TensorFlow? What TensorFlow operations do I need to learn about to achieve this? Is it advisable to do it outside TensorFlow?
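One way to express the "group by label, reduce per group" step directly in TensorFlow is with the segment operations; a minimal sketch with illustrative names and shapes:

import tensorflow as tf

B, D, C = 8, 4, 3                                       # batch size, feature dim, number of classes
x = tf.random.normal([B, D])                            # samples after the transformation pipeline
y = tf.random.uniform([B], maxval=C, dtype=tf.int32)    # labels in [0, C)

# Groups the rows of x by their label and averages each group; the result has
# shape [C, D], with row c holding the mean of the samples labelled c
# (rows for classes absent from the batch come out as zeros).
class_means = tf.math.unsorted_segment_mean(x, y, num_segments=C)

# tf.math.unsorted_segment_sum plus a per-class count gives the same result
# with explicit control over the normalization.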

Rating prediction in non negative matrix factorization

I was following this blog http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/ (also attaching the matrix here) for rating prediction using matrix factorization. Initially we have a sparse user-movie matrix R.
We then apply the MF algorithm to create a new matrix R', which is the product P·Qᵀ of two matrices P (UxK) and Q (DxK). We then "minimize" the error between the values given in R and R'. So far so good. But in the final step, when the matrix is filled up, I am not so convinced that these are the ratings the user would actually give. Here is the final matrix:
What is the basis for claiming that these are in fact the "predicted" ratings? Also, I am planning to use the P matrix (UxK) as the users' latent features. Can we somehow "justify" that these are in fact the users' latent features?
The justification for using the obtained vectors for each user as latent trait vectors is that these values of the latent traits minimize the error between the predicted ratings and the actual known ratings.
If you take a look at the predicted ratings and the known ratings in the two diagrams you posted, you can see that the difference between the two matrices in the cells common to both is very small. Example: U1D4 is 1 in the first diagram and 0.98 in the second.
Since the user latent trait vectors produce good results on the known ratings, we expect them to also do a good job of predicting the unknown ratings. Of course, we use regularisation to avoid overfitting the training data, but that is the general idea.
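A minimal sketch of that idea (loosely following the linked tutorial, with the example matrix from the question; not meant as the blog's exact code): P and Q are fit by stochastic gradient descent on the known entries only, and the product P·Qᵀ then fills in the missing cells as predictions.

import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)    # 0 marks an unknown rating
U, D, K = R.shape[0], R.shape[1], 2
P = np.random.rand(U, K)                     # user latent traits
Q = np.random.rand(D, K)                     # item latent traits
alpha, lam = 0.002, 0.02                     # learning rate, regularisation

for _ in range(5000):
    for u in range(U):
        for d in range(D):
            if R[u, d] > 0:                  # train on observed entries only
                e = R[u, d] - P[u] @ Q[d]
                pu = P[u].copy()
                P[u] += alpha * (e * Q[d] - lam * P[u])
                Q[d] += alpha * (e * pu - lam * Q[d])

R_hat = P @ Q.T                              # predicted ratings, including the unknown cells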
To evaluate how good your latent feature vectors are, you should split your data into training, validation and test sets.
The training set is the observed ratings that you use to learn your latent features. The validation set is used during learning to tune your model parameters, and your test set is used to evaluate your learnt latent features once learning is done. You can simply set aside a percentage of the observed samples for validation and test. If your ratings are time-stamped, a natural way to select them is to use the most recent samples as validation and test.
More details on splitting your data are here: https://link.medium.com/mPpwhdhjknb
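For the time-stamped case, a small illustrative split (hypothetical data) just sorts the observed ratings by timestamp and holds out the most recent ones:

import numpy as np

# (user, item, rating, timestamp) tuples; purely illustrative.
ratings = np.array([(0, 1, 5.0, 1000), (1, 3, 1.0, 1005), (2, 0, 1.0, 1010),
                    (3, 3, 4.0, 1020), (4, 2, 5.0, 1030), (0, 0, 5.0, 1040)],
                   dtype=[("user", int), ("item", int), ("rating", float), ("ts", int)])

order = np.argsort(ratings["ts"])            # oldest first
n = len(order)
train = ratings[order[:int(0.7 * n)]]        # oldest 70% to learn the latent features
val = ratings[order[int(0.7 * n):int(0.85 * n)]]
test = ratings[order[int(0.85 * n):]]        # most recent ratings held out for evaluation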

Not passing a value for a placeholder in tensorflow, why isn't it allowed?

I had 3 models that use the same input but produce 3 distinct outputs (1 classifier, 2 regressions). I combined 2 of the 3 into 1 model with 2 loss functions and saw a significant improvement in accuracy/RMSE.
I'm trying to combine the 3rd loss function into the model, so I have 1 model with 3 loss functions that share many parameters.
The 3rd loss function only applies to half the data though. I tested standardizing the labels to 0-mean-unit-variance and using 0 for the labels where they don't apply to loss function C, but that biased results towards 0 in some cases.
I'm now experimenting with alternating optimization: loss functions A & B together on a batch from the full dataset, versus all 3 loss functions A, B & C on a batch appropriate for loss C (and for A & B). In the context of my problem this is logical to do.
My Question:
Tensorflow requires all placeholders that are defined in the graph to be passed in. However, I'm not using that tensor in this particular optimization step. Is this expected behavior? And should I just pass in a dummy variable to appease TF here? I wonder if I'm missing an important detail.
The dependency came from TensorBoard: I had a summary operation covering all the loss functions, which forced them all to be executed.
I split out my summary operations into groups using tf.add_to_collection() to gather different summary ops, then used a for loop to add them to the list of tensors to process as appropriate.
It was that, plus one other dependency that was simply a bug I found. #Sygi and #Fake are correct: you shouldn't need to pass in a value that isn't used in a particular computation just because it exists in the graph.
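A sketch of that arrangement in TF 1.x graph mode (names are illustrative, not the poster's actual code): each loss registers its summaries under its own collection, and a training step only merges and fetches the group it needs, so the unused placeholder never becomes a dependency.

import numpy as np
import tensorflow as tf   # TF 1.x-style graph mode

x_ab = tf.placeholder(tf.float32, [None, 4], name="features_ab")
x_c = tf.placeholder(tf.float32, [None, 4], name="features_c")

loss_a = tf.reduce_mean(tf.square(x_ab))
loss_b = tf.reduce_mean(tf.abs(x_ab))
loss_c = tf.reduce_mean(tf.square(x_c))

# Register each summary op under a named collection instead of relying on
# tf.summary.merge_all(), which would tie every loss together.
tf.add_to_collection("summaries_ab", tf.summary.scalar("loss_a", loss_a))
tf.add_to_collection("summaries_ab", tf.summary.scalar("loss_b", loss_b))
tf.add_to_collection("summaries_c", tf.summary.scalar("loss_c", loss_c))

# Merge only the group relevant to this optimization step.
merged_ab = tf.summary.merge(tf.get_collection("summaries_ab"))

with tf.Session() as sess:
    # Only the placeholder actually used by the fetched ops must be fed.
    summary_str = sess.run(merged_ab, feed_dict={x_ab: np.ones((2, 4), np.float32)})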