word embeddings in tensorflow (no pre_trained) [closed]

I am new to TensorFlow and have been looking at different examples to understand it better.
I have seen the following lines used in many TensorFlow examples without any mention of which embedding algorithm is used to obtain the word embeddings.
embeddings = tf.Variable(tf.random_uniform((vocab_size, embed_dim), -1, 1))
embed = tf.nn.embedding_lookup(embeddings, input_data)
Here are some examples:
https://github.com/Decalogue/dlnd_tv_script_generation/blob/master/dlnd_tv_script_generation.py
https://github.com/ajmaradiaga/cervantes-text-generation/blob/master/cervants_nn.py
I understand that the first line initializes the word embeddings from a random uniform distribution. But are the embedding vectors trained further in the model, so that the initial random values become more accurate representations of the words? And if so, what method is actually being used when the code mentions no obvious embedding technique such as word2vec or GloVe (and no pre-trained vectors from those methods are fed in instead of the random initial values)?

Yes, those embeddings are trained further, just like weights and biases; otherwise representing words with random values would not make any sense. The embeddings are updated during training the same way a weight matrix is updated, i.e. by an optimization method such as gradient descent or the Adam optimizer.
Pre-trained embeddings like word2vec have already been trained on very large datasets and are already fairly accurate representations of words, so they do not need further training. If you are asking how those are trained, there are two main training algorithms for learning embeddings from text: Continuous Bag of Words (CBOW) and Skip-Gram. Explaining them fully is not possible here, but a quick search will turn up good introductions. This article might get you started.
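As a minimal sketch of how this works end to end (TF 1.x style to match the snippets above; the sizes and the averaging classifier on top are made up for illustration), the embedding matrix is just another trainable variable that the optimizer updates:
import tensorflow as tf

vocab_size, embed_dim = 10000, 128                     # hypothetical sizes
input_data = tf.placeholder(tf.int32, [None, 20])      # batch of word ids
labels = tf.placeholder(tf.int32, [None])

embeddings = tf.Variable(tf.random_uniform((vocab_size, embed_dim), -1, 1))
embed = tf.nn.embedding_lookup(embeddings, input_data)

# Any model can sit on top of the embeddings; here a tiny averaging classifier.
logits = tf.layers.dense(tf.reduce_mean(embed, axis=1), 2)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# `embeddings` is part of tf.trainable_variables(), so minimize() computes
# gradients for it and updates the looked-up rows along with the other weights.
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)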

Related

Why are predicted values towards the center? [closed]

It looks like most predicted values are close to 0.5. How can I make the predicted values follow the original values more closely?
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

normalizer = layers.Normalization()
normalizer.adapt(np.array(X_train))

model = keras.Sequential([
    normalizer,
    layers.Dense(8, activation='relu'),
    layers.Dense(1, activation='linear'),
    layers.Normalization()
])
There might be many issues here, but one thing is certain: you cannot normalize the data at the output. You are literally saying "on average, I expect my output to be 0 with unit variance". That makes sense if and only if your target is a standard normalized Gaussian, and from the plot you can clearly tell it is not. Normalizing inputs or internal activations is fine, because there is always a final layer that can apply the final affine mapping; but if you normalize at the very end of the network, you make it impossible to learn most targets/signals.
Once this is solved: a network with 8 hidden neurons is extremely tiny, and there is no guarantee it can learn anything; your training loss is very far from 0. Make the model much more expressive and try to drive the training loss to 0. If you cannot, you have a bug somewhere else in the code (or the model is still not expressive enough).
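For instance, a sketch of the same model with the output normalization removed and a bit more capacity (the layer sizes here are only an illustration, not a recommendation):
normalizer = layers.Normalization()
normalizer.adapt(np.array(X_train))

model = keras.Sequential([
    normalizer,
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)              # plain linear output, no Normalization here
])
model.compile(optimizer='adam', loss='mse')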

How do I get cross entropy of two distributions with Tensorflow? [closed]

In TensorFlow we have tf.nn.softmax_cross_entropy_with_logits, which only allows you to use your predicted logits together with the index of the gold label (one-hot). However, sometimes we want to compute the cross entropy of two distributions, i.e. the gold standard is not one-hot. How can I achieve this?
Actually, tf.nn.softmax_cross_entropy_with_logits does not impose the restriction that labels must be one-hot encoded, so you can go ahead and use non-one-hot label vectors. You might be confusing it with tf.nn.sparse_softmax_cross_entropy_with_logits, which does require a single hard label per example (as an integer class index).
As for the other part of your question: if you want to compute the cross-entropy between two normalized distributions in tensors p and q, you can apply the formula yourself, making sure to use tf.math.xlogy so that you get 0 when x = 0 (even if y = 0). So, letting p and q be two tensors representing distributions normalized across axis 1, you would have:
ce = - tf.reduce_sum(tf.math.xlogy(p, q), axis=1)
On the other hand, it's likely that you actually have logits output by a model (rather than a normalized distribution q computed from those logits). In that case it is better to compute the cross-entropy from the log-softmax of the logits:
ce = - tf.reduce_sum(p * tf.nn.log_softmax(logits, axis=1), axis=1)
(thereby avoiding the numerical instability of explicitly computing a softmax distribution and then immediately taking its log). In the typical ML setting, p would be your "labels" and q / logits would be the output of your model. Note that this works fine for non-one-hot labels p.
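As a small self-contained sketch (the numbers are made up), both forms agree when q is the softmax of the logits, and they also match the built-in op, which accepts soft labels:
import tensorflow as tf

p = tf.constant([[0.2, 0.8], [1.0, 0.0]])          # soft and one-hot labels
logits = tf.constant([[1.0, 2.0], [0.5, -0.5]])
q = tf.nn.softmax(logits, axis=1)

ce_from_q = -tf.reduce_sum(tf.math.xlogy(p, q), axis=1)
ce_from_logits = -tf.reduce_sum(p * tf.nn.log_softmax(logits, axis=1), axis=1)
ce_builtin = tf.nn.softmax_cross_entropy_with_logits(labels=p, logits=logits)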

RNN with simultaneous POS tagging and sentiment classification? [closed]

I'm working on a problem where I need to perform simultaneous part-of-speech (POS) tagging and sentiment analysis. I'm using Tensorflow and am considering Keras.
I have a large data set of English sentences that have been labelled with both POS tags and with sentiment (negative, neutral, positive).
Is it possible to train a recurrent neural network (vanilla RNN, GRU, or LSTM) to learn both POS tagging and sentiment classification? Of course, during test time, I'd like to enter a sentence and have the RNN generate predictions for both the POS tags and the sentiment together.
I was thinking of the following RNN architecture. I'm not sure if it's possible with Tensorflow (which I've been using) or with Keras (which I'm just learning now). I've previously implemented RNNs that do one task, not two.
Thanks for any help.
A really simple Keras model that could work for POS tagging might look like this:
from keras.layers import Dense, LSTM
from keras.models import Model, Sequential
model = Sequential()
model.add(
    LSTM(
        hidden_layer_size,
        return_sequences=True,
        input_shape=(seq_length, nb_words),
        unroll=True
    )
)
model.add(Dense(nb_pos_types, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
Here I assume various parameters:
hidden_layer_size: the dimension of the internal recurrent layer.
seq_length: the input sequence length.
nb_words: the vocabulary size, for one-hot encoded inputs indicating which word occurs at each sequence position.
nb_pos_types: the number of different possible POS labels (for the one-hot encoded labels).
The goal is to modify a simple network like this so that it also makes a prediction about sentiment (not clear if your sentiment is a score or a category label, but I will assume a category label), and so that the loss function includes a penalty term for that sentiment prediction.
There are many ways to do this, but one common way is to "fork" a new spoke of the model off of some early layer, and have this spoke produce the additional prediction (often referred to as "multi-task" or "joint-task" learning).
To do this, we'll start off the same with Sequential, but rename it as base_model to make it clear that it serves as a base set of layers before branching for multiple tasks. Then we'll use Keras's functional syntax to do what we need with each branch before combining them together as multiple outputs of a final_model, in which we can express part of the overall loss function for each output.
Here's how we could modify the above example to do it:
base_model = Sequential()
base_model.add(
    LSTM(
        hidden_layer_size,
        return_sequences=True,
        input_shape=(seq_length, nb_words),
        unroll=True
    )
)
# Get a handle to the output of the recurrent layer.
rec_output = base_model.outputs[0]
# Create a layer representing the POS prediction.
pos_spoke = Dense(nb_pos_types, activation="softmax",
                  name="pos")(rec_output)
# Create a layer representing the sentiment prediction.
# I assume `nb_sentiments` is the number of sentiment categories.
sentiment_spoke = Dense(nb_sentiments, activation="softmax",
                        name="sentiment")(rec_output)
# Reunify into a single model which takes the same inputs as
# determined for `base_model`, and provides a list of 2 outputs,
# one for each spoke (POS and sentiment).
final_model = Model(inputs=base_model.inputs,
                    outputs=[pos_spoke, sentiment_spoke])
# Finally, use a dictionary for the loss function to specify the
# loss for each output, and optionally separate weights for when
# the losses are added as a weighted sum for the total loss.
final_model.compile(
    optimizer='rmsprop',
    loss={'pos': 'categorical_crossentropy',
          'sentiment': 'categorical_crossentropy'},
    loss_weights={'pos': 1.0, 'sentiment': 1.0}
)
Finally, when calling final_model.fit, you supply the labels as a list in output order, or as a dictionary keyed by output name, with one array of labels per output.
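For example, a sketch of the fit call (X_train, pos_labels and sentiment_labels are hypothetical, already-prepared arrays; note that with return_sequences=True both sets of labels are expected per timestep):
final_model.fit(
    X_train,
    {"pos": pos_labels, "sentiment": sentiment_labels},   # keyed by output name
    batch_size=32,
    epochs=10
)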
You can read more about multi-output losses and architectures at the Keras docs on multi-input and multi-output models.
Finally, note that this is an exceedingly simple model (and would likely not perform well; it is only meant for illustration). You could build on the spokes we created, pos_spoke and sentiment_spoke, with additional layers and more sophisticated network topologies if you have particular POS-specific or sentiment-specific architectures.
Instead of defining them straight away as Dense, they could be additional recurrent layers, perhaps even convolutional, etc., followed by some eventual Dense layer whose variable name and layer name would be used for the appropriate places in the outputs and losses.
Also be aware of the use of return_sequences=True here. This allows for POS and sentiment prediction at each step in the sequence, even though you likely would only care about sentiment prediction at the end. One likely option would be to modify sentiment_spoke to operate only on the final sequence element from rec_output, or another (less likely) option would be to repeat the sentence's overall sentiment label for every word in the input sequence.
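A hedged sketch of that first option: slice out the last timestep of rec_output (a Lambda layer is one way to do it) and feed only that to the sentiment branch, in place of the earlier sentiment_spoke definition. The sentiment labels would then be one per sentence rather than one per timestep.
from keras.layers import Lambda

# Use only the final timestep of the recurrent output for the sentiment spoke.
last_step = Lambda(lambda t: t[:, -1, :])(rec_output)
sentiment_spoke = Dense(nb_sentiments, activation="softmax",
                        name="sentiment")(last_step)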

Initializing the weights of a layer from the output of another layer in tensorflow/keras [closed]

I'm trying to implement the paper "Learning to Segment Everything", and I need to set the weights of a layer in the segmentation network using the output of a weight transfer function.
The output of the last layer of the weight transfer function, fetched with layer.output in Keras, is of type tensorflow.python.framework.ops.Tensor, while the weights should be initialized as a numpy array. Any idea how I can set the weights?
From what I got from the paper, the weights should be connected to the output of this transfer layer; let's call that output X. So what you want is not to create "weights" and then initialize them with X using tf.assign or some other method, since that would not be differentiable. What you want is to connect the output X directly so that it acts as the weights in the other graph.
The problem is that you can't do this through Keras layers or even tf.layers, because this high-level API doesn't give you that kind of control: as soon as you create a layer in tf.layers or Keras, it creates its own weights, and you don't want that; you want to use the output X as the weights rather than creating new ones. So what you can do is re-implement the layer you need yourself and use X directly as its weights. This lets the gradient flow back through X.
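As a rough sketch of that idea with plain TensorFlow ops (the names here are made up; X is assumed to already have the shape of the kernel you want, e.g. (in_features, out_features)):
import tensorflow as tf

def transferred_dense(h, X, bias=None):
    # X is the output of the weight-transfer branch and acts as the kernel,
    # so gradients flow back into the branch that produced it.
    y = tf.matmul(h, X)
    if bias is not None:
        y = y + bias
    return y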
Weights are typically stored in Variables. The tf.assign operation can be used to assign values (represented as Tensors) to variables; you can see some basic examples of its use in the session tests, where it appears as state_ops.assign().
Just be aware that, like other TensorFlow operations, it does not update the value of the variable immediately (unless you are using eager execution). It returns a tensor that, when evaluated (e.g. via session.run()), will update the variable.
From your question, I suspect you might not be 100% clear about TensorFlow's computation model. The Tensor type is a symbolic representation of a value that is produced only when the computation is actually run (via session.run()). You can't really talk about "converting a Tensor to a numpy array", because you can't convert "the result of operation foo" to concrete floats without running the computation that produces it. tf.assign works in this symbolic space: when you use it, you are saying "whatever the value of this tensor (the output of some layer) turns out to be when I run the computation, assign it to this variable".
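A tiny TF 1.x graph-mode sketch of that pattern (the tensor x below just stands in for something like layer.output):
import tensorflow as tf

v = tf.Variable(tf.zeros((2, 2)))      # the weight variable
x = tf.ones((2, 2)) * 3.0              # stands in for the layer output tensor
assign_op = tf.assign(v, x)            # purely symbolic; nothing runs yet

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(assign_op)                # only now does v actually get the value
    print(sess.run(v))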

How to perform a multi label classification with tensorflow? [closed]

I'm new to TensorFlow and would like to know if there is any tutorial or example of multi-label classification with multiple network outputs.
I'm asking because I have a collection of images in which each image can belong to several classes, and my output needs to give a score for each class.
I also do not know whether TensorFlow expects a particular file layout for the images and classes, so if someone has an example it would help a lot.
Thank you.
You can also try transforming your problem from multi-label to multi-class classification using a Label Powerset approach. The Label Powerset transformation treats every label combination attested in the training set as a distinct class and trains one multi-class classifier on it; after prediction, the assigned classes are converted back to the multi-label case. It is provided in scikit-multilearn; with a scikit-learn-compatible wrapper over a TensorFlow classifier (for example skflow), you can just plug the wrapped classifier into an instance of LabelPowerset.
The code could go as follows:
from skmultilearn.problem_transform import LabelPowerset
import tensorflow.contrib.learn as skflow
# assume data is loaded and available in
# X_train/X_test, y_train/y_test
# initialize LabelPowerset multi-label classifier
# with tensor flow DNN base classifier
classifier = LabelPowerset(skflow.TensorFlowDNNClassifier(OPTIONS))
# train
classifier.fit(X_train, y_train)
# predict
predictions = classifier.predict(X_test)
The most naive (and reasonable) approach would be to train a classification network, remove the softmax layer, and replace it with a sigmoid on each output unit. This way multiple units can have an activation close to 1 at the same time.
You can look at the TF-slim examples for classification networks. Under the datasets path you will find examples of how to prepare the TFExample "file pattern" for images and classes.
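A minimal sketch of the sigmoid approach with made-up numbers (2 examples, 5 possible labels), using the standard per-label sigmoid cross-entropy:
import tensorflow as tf

labels = tf.constant([[0., 1., 0., 1., 0.],
                      [1., 0., 0., 0., 1.]])           # multi-hot targets
logits = tf.constant([[-1.2, 2.3, -0.4, 1.1, -2.0],
                      [ 0.8, -1.5, -0.3, -0.9, 1.7]])  # raw network outputs

per_label = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
loss = tf.reduce_mean(per_label)       # scalar training loss
scores = tf.sigmoid(logits)            # independent per-class scores in [0, 1]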
Most solutions refer to a sigmoid loss, and sigmoid does solve multi-label classification well in my case, via tf.nn.sigmoid_cross_entropy_with_logits(labels, logits) in TensorFlow.
However, when handling a class-imbalance problem, where negative cases vastly outnumber positive ones, I found that my edited softsign loss worked much better than sigmoid. The adjustment coefficient gamma is applied to the labels to lower the negative class's gradient by 3/4.
def unbalance_softsign_loss(labels, logits):
    gamma = 1.25 * labels - 0.25
    res = 1 - tf.log1p(gamma * logits / (1 + tf.abs(logits)))
    return res
where labels are multi-hot encoded vectors like [0, 1, 0, 1, 0], and logits range over (-inf, inf).
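As a usage sketch (assuming labels and logits are shaped (batch, num_classes) and the logits come from a trainable model; the optimizer choice here is just an assumption), this per-element loss would typically be reduced to a scalar before training:
loss = tf.reduce_mean(unbalance_softsign_loss(labels, logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)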