DL4J with ComputationGraph: Cosine similarity between layers

Context
I am trying to create a model with DL4J.
There are two embeddings: one for the user and one for the item.
val conf = new NeuralNetConfiguration.Builder()
.updater(new Sgd(0.01))
.graphBuilder()
.addInputs("item_input", "user_input")
.addLayer("item_embedding", new DenseLayer.Builder().nIn(5).nOut(5).build(), "item_input")
.addLayer("user_embedding", new DenseLayer.Builder().nIn(5).nOut(5).build(), "user_input")
// Something
.build()
val net = new ComputationGraph(conf)
net.init()
Problem
At the end, I would like to compute the cosine similarity between these two embeddings.
I then want to train the model to maximize the similarity on positive examples and minimize it on negative ones.
Positive example = the user is interested in the item
Negative example = the user is not interested in the item
Possible solutions
I have found two possible solutions.
1) Create a custom layer class.
2) Create a custom LossFunction to apply cosine similarity on the output layers.
Questions
1) Is there an existing layer that implements cosine similarity between two layers?
2) If not, how can I implement my own layer?
The only example I found is the following : https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/customlayers/CustomLayerExampleReadme.md

You'll want to make a custom vertex. Check out the Vertex implementations here: https://github.com/deeplearning4j/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl . I think the L2Vertex will be the most similar to what you want.
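For illustration, here is a hedged sketch of how such a vertex could be wired into the configuration from the question. CosineSimilarityVertex is hypothetical: a custom vertex you would implement yourself (modelled on L2Vertex), not something that ships with DL4J; the LossLayer/MSE choice is likewise just one way to push the score towards +1 for positive pairs and -1 for negative ones.
// Sketch only, building on the snippet above. "CosineSimilarityVertex" is a hypothetical
// custom vertex (not part of DL4J) that reduces its two inputs to a single similarity score.
val conf = new NeuralNetConfiguration.Builder()
  .updater(new Sgd(0.01))
  .graphBuilder()
  .addInputs("item_input", "user_input")
  .addLayer("item_embedding", new DenseLayer.Builder().nIn(5).nOut(5).build(), "item_input")
  .addLayer("user_embedding", new DenseLayer.Builder().nIn(5).nOut(5).build(), "user_input")
  // Custom vertex combining the two embeddings into one cosine-similarity value
  .addVertex("cosine_sim", new CosineSimilarityVertex(), "item_embedding", "user_embedding")
  // Identity activation + MSE loss: train towards +1 for positive pairs, -1 for negative pairs
  .addLayer("out", new LossLayer.Builder(LossFunctions.LossFunction.MSE)
    .activation(Activation.IDENTITY).build(), "cosine_sim")
  .setOutputs("out")
  .build()
val net = new ComputationGraph(conf)
net.init()
An alternative using only built-in pieces would be to L2-normalize both embeddings (L2NormalizeVertex) and take their element-wise product (ElementWiseVertex with Op.Product), since the dot product of unit vectors is the cosine similarity; you would still need something, such as a custom vertex, to sum the resulting components into a single score.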

Related

Creating a new random operation in computation graph of Tensorflow

How can I create a new random operator (something like tf.random_normal) which is part of the graph? I want to add a Cauchy random variable to the output of one layer of my network.
I found tf.contrib.distributions.Cauchy, but how can I make it work inside a layer (as we have with random_normal)?
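As a hedged sketch (TensorFlow 1.x, assuming tf.contrib.distributions.Cauchy is available in the contrib module), the distribution can be sampled inside the graph and the sample added to a layer's output, much like tf.random_normal; the layer shape below is illustrative only.
import tensorflow as tf

# Sketch only: Cauchy noise drawn inside the graph, analogous to tf.random_normal.
layer_out = tf.placeholder(tf.float32, shape=[None, 128])   # output of some layer (illustrative shape)
cauchy = tf.contrib.distributions.Cauchy(loc=0.0, scale=1.0)
noise = cauchy.sample(tf.shape(layer_out))                   # a fresh draw on every session run
noisy_out = layer_out + noise                                # the noise op is now part of the graph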

Tensorflow: pattern training and generation

Imagine I have hundreds of rectangular patterns that look like the following:
_yx_0zzyxx
_0__yz_0y_
x0_0x000yx
_y__x000zx
zyyzx_z_0y
Say the only variables for the different patterns are dimension (width by height in characters) and values at a given cell within the rectangle with possible characters _ y x z 0. So another pattern might look like this:
yx0x_x
xz_x0_
_yy0x_
zyy0__
and another like this:
xx0z00yy_z0x000
zzx_0000_xzzyxx
_yxy0y__yx0yy_z
_xz0z__0_y_xz0z
y__x0_0_y__x000
xz_x0_z0z__0_x0
These simplified examples were randomly generated, but imagine there is a deeper structure and relation between dimensions and layout of characters.
I want to train on this dataset in an unsupervised fashion (no labels) in order to generate similar output. Assuming I have created my dataset appropriately with tf.data.Dataset and categorical identity columns:
what is a good general purpose model for unsupervised training (no labels)?
is there a Tensorflow premade estimator that would represent such a model well enough?
once I've trained the model, what is a general approach to using it to generate patterns based on what it has learned? I have in mind Google Magenta, which can be trained on a dataset of musical melodies in order to generate similar ones from a kind of seed/primer melody
I'm not looking for a full implementation (that's the fun part!), just some suggested tutorials and next steps to follow. Thanks!

Torch, neural nets - forward function on gmodule object - nngraph class

I am a newbie to torch and lua (as anyone who has been following my latest posts could attest) and have the following question about the forward function for the gmodule object (class nngraph).
As per the source code (https://github.com/torch/nn/blob/master/Module.lua; the class gmodule inherits from nn.Module), the syntax is:
function Module:forward(input)
return self:updateOutput(input)
end
However, I have found cases where a table is passed as input, as in:
local lst = clones.rnn[t]:forward{x[{{}, t}], unpack(rnn_state[t-1])}
where:
clones.rnn[t]
is itself a gmodule object. In turn, rnn_state[t-1] is a table with 4 tensors. So in the end, we have something akin to
result_var = gmodule:forward{[1]=tensor_1,[2]=tensor_2,[3]=tensor_3,...,[5]=tensor_5}
The question is: depending on the network architecture, can you pass input, formatted as a table, not only to the input layer but also to the hidden layers?
In that case, do you have to check that you pass exactly one input per layer (with the exception of the output layer)?
Thanks so much.
I finally found the answer. The module class (as well as the derived class gmodule) has an input and an output.
However, the input (as well as the output) need not be a single vector; it can be a collection of vectors, depending on the neural net configuration. In this particular case it is a pretty complex recurrent neural net.
So if the net has more than one input vector, you can do:
result_var = gmodule:forward{[1]=tensor_1,[2]=tensor_2,[3]=tensor_3,...,[5]=tensor_5}
where each tensor/vector is one of the input vectors. Only one of those vectors is the X vector, or the feature vector. The others could serve as input to other intermediate nodes.
In turn, result_var (the output) can be a single tensor (the prediction) or a collection of tensors, depending on the network configuration.
If the latter is the case, one of those output tensors is the prediction, and the remainder is usually used as input to the intermediate nodes at the next time step, but that again depends on the net configuration.
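A minimal nngraph sketch of what this answer describes (illustrative module sizes, not the original recurrent net): the gmodule is built with two input nodes, so forward expects a table with exactly one tensor per declared input, and it returns a table with one tensor per declared output.
require 'nn'
require 'nngraph'

-- Two declared input nodes; forward will expect a table {tensor_1, tensor_2}
local x1 = nn.Identity()()
local x2 = nn.Identity()()
local h = nn.Tanh()(nn.CAddTable()({nn.Linear(4, 3)(x1), nn.Linear(5, 3)(x2)}))
local out1 = nn.Linear(3, 2)(h)    -- e.g. the prediction
local out2 = nn.Identity()(h)      -- e.g. state passed on to the next time step
local g = nn.gModule({x1, x2}, {out1, out2})

-- One entry in the input table per input node, in declaration order
local res = g:forward{torch.randn(4), torch.randn(5)}
-- res is a table: res[1] is the prediction, res[2] the extra output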

sklearn: get feature names after L1-based feature selection

This question and answer demonstrate that when feature selection is performed using one of scikit-learn's dedicated feature selection routines, then the names of the selected features can be retrieved as follows:
np.asarray(vectorizer.get_feature_names())[featureSelector.get_support()]
For example, in the above code, featureSelector might be an instance of sklearn.feature_selection.SelectKBest or sklearn.feature_selection.SelectPercentile, since these classes implement the get_support method which returns a boolean mask or integer indices of the selected features.
When one performs feature selection via linear models penalized with the L1 norm, it's unclear how to accomplish this. sklearn.svm.LinearSVC has no get_support method and the documentation doesn't make clear how to retrieve the feature indices after using its transform method to eliminate features from a collection of samples. Am I missing something here?
For sparse estimators you can generally find the support by checking where the non-zero entries are in the coefficients vector (provided the coefficients vector exists, which is the case for, e.g., linear models):
support = np.flatnonzero(estimator.coef_)
For your LinearSVC with l1 penalty it would accordingly be
from sklearn.svm import LinearSVC
svc = LinearSVC(C=1., penalty='l1', dual=False)
svc.fit(X, y)
selected_feature_names = np.asarray(vectorizer.get_feature_names())[np.flatnonzero(svc.coef_)]
I've been using sklearn 0.15.2, and according to the LinearSVC documentation, coef_ is an array, shape = [n_features] if n_classes == 2 else [n_classes, n_features].
So first, np.flatnonzero doesn't work for the multi-class case; you'll get an index-out-of-range error. Second, it should be np.where(svc.coef_ != 0)[1] instead of np.where(svc.coef_ != 0)[0]: axis 0 indexes the classes, not the features. I ended up using np.asarray(vectorizer.get_feature_names())[list(set(np.where(svc.coef_ != 0)[1]))]
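Putting the two comments together, a hedged sketch of the multi-class case (reusing the vectorizer, X and y from the snippets above) might look like this; axis 1 of coef_ is the feature axis, and the union over classes gives every feature kept by at least one class.
import numpy as np
from sklearn.svm import LinearSVC

# Multi-class case: coef_ has shape (n_classes, n_features), so features live on axis 1.
svc = LinearSVC(C=1.0, penalty='l1', dual=False)
svc.fit(X, y)                                      # X, y as in the snippet above

feature_names = np.asarray(vectorizer.get_feature_names())
selected = np.unique(np.where(svc.coef_ != 0)[1])  # features non-zero for at least one class
selected_feature_names = feature_names[selected]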

Scikit Naive Bayes Classification for text

I am trying to use scikit-learn for Naive Bayes classification. I have a couple of questions (also, I am new to scikit-learn).
1) Scikit-learn algorithms want the input as numpy arrays and the labels as arrays. For text classification, should I map each word to a number (id), by maintaining a hash of the words in the vocabulary and a unique id associated with each? Is this standard practice in scikit-learn?
2) If the same text is assigned to more than one class, how should I proceed? One obvious way is to replicate each training example, once for each associated label. Does a better representation exist?
3) Similarly, for the test data, how will I get more than one class associated with a test example?
I am using http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html
as my base.
1) Yes. Use DictVectorizer or HashingVectorizer from the feature_extraction module.
2) This is a multilabel problem. Maybe use the OneVsRestClassifier from the multiclass module. It will train a separate classifier for each class.
3) Using a multilabel classifier / one classifier per class will do that.
Take a look at http://scikit-learn.org/dev/auto_examples/grid_search_text_feature_extraction.html
and http://scikit-learn.org/dev/auto_examples/plot_multilabel.html
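A minimal sketch tying the three answers together (toy data, illustrative names): a vectorizer handles the word-to-id mapping, MultiLabelBinarizer turns the label sets into an indicator matrix, and OneVsRestClassifier trains one MultinomialNB per class, so predictions can carry several labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

# Toy corpus: each document can carry several labels (multilabel).
docs = ["cheap flights to paris", "learn python for data science", "paris travel guide"]
labels = [["travel"], ["programming", "data"], ["travel"]]

# 1) Text -> sparse count matrix (the vectorizer maintains the word->id mapping for you).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# 2) Label sets -> binary indicator matrix, one column per class.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# 3) One Naive Bayes classifier per class.
clf = OneVsRestClassifier(MultinomialNB())
clf.fit(X, Y)

# 4) Prediction returns an indicator matrix; map it back to label names.
pred = clf.predict(vectorizer.transform(["python travel blog"]))
print(mlb.inverse_transform(pred))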