Get a sentence encoding from word encodings using an LSTM in Keras - tensorflow

I have a 1×300 encoding vector for each word and want to get one sentence encoding (ideally also 1×300) from all the words in the sentence. I am trying to use a bi-directional LSTM in Keras.
Is there a good solution for this?
Or is there another way I should go about it?
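A minimal sketch of what this could look like, assuming sentences are padded to a fixed length and each word is already a 300-dim vector (layer sizes and shapes here are illustrative, not prescriptive):

```python
import numpy as np
import tensorflow as tf

# Hypothetical setup: sentences padded to 30 words, each word a 300-dim vector.
max_len, word_dim = 30, 300

inputs = tf.keras.Input(shape=(max_len, word_dim))
# return_sequences=False (the default) collapses the word sequence into a
# single vector; forward and backward states are concatenated, so
# 150 + 150 = 300 dims, matching the word-vector size.
outputs = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(150))(inputs)
model = tf.keras.Model(inputs, outputs)

sentences = np.random.rand(4, max_len, word_dim).astype("float32")
sentence_encodings = model(sentences)  # shape (4, 300)
```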

Sentence classification focused on specified word

I have taken some online courses on sentence classification problems using TensorFlow.
But I don't understand how to start the following problem.
I am interested in the binary classification of sentences based on the meaning of a specified word. This word can have two meanings, and I want to train a model that will classify which one is being used.
I have training data. All sentences contain this word, and each sentence has a label of 0 or 1.
Do I need a neural network for this, or can it be done using the NLTK library?
How do I implement such a project? I have learned about word embeddings but have no idea how to use them in this project.
Where can I read about this?
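For illustration, one possible starting point, assuming the sentences have already been converted to padded sequences of word indices (all sizes below are hypothetical):

```python
import tensorflow as tf

# Hypothetical sizes: vocabulary of 10,000 words, sentences padded to 40 tokens.
vocab_size, max_len, embed_dim = 10000, 40, 100

inputs = tf.keras.Input(shape=(max_len,))
x = tf.keras.layers.Embedding(vocab_size, embed_dim, mask_zero=True)(inputs)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # one 0/1 label per sentence

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, ...)  # x_train: padded word-index sequences
```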

How to do tokenization from a predefined vocab in tensorflow or pytorch or keras?

I have a predefined vocab built from the 3,500 most commonly used Chinese characters. Now I want to tokenize a dataset with this vocab, mapping each character to a fixed index. Is there a mature class or function I can inherit from to build the data-reading pipeline?
Rather than go through the how-to details here, I suggest you watch a tutorial on YouTube located here. The author demonstrates how to use the tokenizer to encode text characters into sequences, which can then be used as input to an embedding layer. The part you will be interested in starts at 23:30 in the video.
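As an alternative to the tokenizer shown in the video, a minimal sketch using tf.keras.layers.StringLookup (available in recent TF 2.x versions) with a fixed character vocabulary might look like this (the vocab below is truncated for illustration):

```python
import tensorflow as tf

# Hypothetical vocab; in practice, the list of 3,500 common characters.
vocab = ["的", "一", "是", "不", "了"]

# StringLookup maps each character to its fixed index; by default anything
# outside the vocab goes to the single OOV index 0 ("[UNK]"), and vocab
# entries start at index 1.
lookup = tf.keras.layers.StringLookup(vocabulary=vocab)

# Split sentences into individual characters, then map to indices.
chars = tf.strings.unicode_split(tf.constant(["一是不的"]), "UTF-8")
ids = lookup(chars)  # RaggedTensor of character indices
print(ids)
```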

How to use Transformers for text classification?

I have two questions about how to use the TensorFlow implementation of Transformers for text classification.
First, it seems people mostly use only the encoder layer to do the text classification task. However, the encoder layer generates one output for each input word. Based on my understanding of Transformers, the input to the encoder at each step is one word from the input sentence. Then the attention weights and the output are calculated using the current input word, and we can repeat this process for all of the words in the input sentence. As a result, we end up with pairs of (attention weights, output) for each word in the input sentence. Is that correct? If so, how would you use these pairs to perform a text classification?
Second, based on the TensorFlow implementation of the Transformer here, they embed the whole input sentence to one vector and feed a batch of these vectors to the Transformer. However, I expected the input to be a batch of words instead of sentences, based on what I've learned from The Illustrated Transformer.
Thank you!
There are two approaches you can take:
Just average the states you get from the encoder;
Prepend a special token [CLS] (or whatever you like to call it) and use the hidden state for the special token as input to your classifier.
The second approach is used by BERT. During pre-training, the hidden state corresponding to this special token is used for predicting whether two sentences are consecutive. In downstream tasks, it is also used for sentence classification. However, my experience is that sometimes averaging the hidden states gives a better result.
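A minimal sketch of both pooling options, using a random tensor as a stand-in for real encoder states (shapes are illustrative):

```python
import tensorflow as tf

# Stand-in for real encoder states: batch of 8 sentences, 20 tokens, d_model 64.
encoder_outputs = tf.random.normal((8, 20, 64))

# Approach 1: average the encoder states over the token axis.
mean_pooled = tf.reduce_mean(encoder_outputs, axis=1)  # (8, 64)

# Approach 2: take the state at position 0, assuming a [CLS] token
# was prepended to every input sequence before encoding.
cls_pooled = encoder_outputs[:, 0, :]                  # (8, 64)

# Either pooled vector feeds a small classification head.
logits = tf.keras.layers.Dense(2)(mean_pooled)         # 2 classes, illustrative
```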
Instead of training a Transformer model from scratch, it is probably more convenient to use (and possibly fine-tune) a pre-trained model (BERT, XLNet, DistilBERT, ...) from the transformers package. It has pre-trained models ready to use in PyTorch and TensorFlow 2.0.
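For example, a sketch of loading a pre-trained checkpoint from the transformers package for two-class classification (the checkpoint name is just one choice; the classification head is freshly initialized, so it still needs fine-tuning on your labeled data):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Any pre-trained checkpoint works; DistilBERT is small and fast.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

inputs = tokenizer("This movie was great!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2); fine-tune before trusting these
```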
Transformers are designed to take the whole input sentence at once. The main motive for designing the Transformer was to enable parallel processing of the words in a sentence. This parallel processing is not possible in LSTMs, RNNs, or GRUs, as they take the words of the input sentence one by one.
So in the encoder part of a Transformer, the very first layer contains a number of units equal to the number of words in a sentence, and each unit converts its word into the corresponding embedding vector. The rest of the processing is then carried out. For more details, you can go through the article: http://jalammar.github.io/illustrated-transformer/
How to use this Transformer for text classification: since in text classification our output is a single number, not a sequence of numbers or vectors, we can remove the decoder part and just use the encoder part. The output of the encoder is a set of vectors, the same in number as the words in the input sentence. We can then feed these output vectors into a CNN, or add an LSTM or RNN model, and perform classification (see the sketch below).
The input is the whole sentence, or a batch of sentences, not word by word; you may have misunderstood this.
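A small sketch of the CNN option mentioned above, again with a random tensor standing in for real encoder outputs:

```python
import tensorflow as tf

# Stand-in for encoder output: batch of 8 sentences, 20 tokens, d_model 64.
encoder_outputs = tf.random.normal((8, 20, 64))

# A small CNN over the encoder outputs, reduced to one prediction per sentence.
head = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(2),
])
logits = head(encoder_outputs)  # (8, 2)
```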

How to use fasttext vectors in a tensorflow embedding layer?

I am struggling to find out how I can use fastText word vectors for OOV words in a Keras/TensorFlow embedding layer. There is nothing out there on this. Maybe someone has thought about this too and has some hints for me?
The usual word-embedding lookup works via indices, like
tf.nn.embedding_lookup(word_embeddings, x)
and you could reserve an index for OOV. But how can I assign a specific vector (from a different, custom source like fastText) at runtime?
I imagine a function that can assign a custom vector to the UNK index for an OOV word.
Related to that:
Assign custom word vector to UNK token during prediction?
Using subword information in OOV token from fasttext in word embedding layer (keras/tensorflow)
You can do the embedding lookup / computation outside of TensorFlow and use the embedded text as input to the model (so the input won't be a sequence of word indices, but a sequence of vectors).
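A sketch of that approach using gensim's fastText loader (the model path below is hypothetical); because fastText composes vectors from character n-grams, OOV words get a vector too:

```python
import numpy as np
from gensim.models.fasttext import load_facebook_vectors

# Hypothetical path to pre-trained fastText vectors in Facebook's .bin format.
ft = load_facebook_vectors("cc.en.300.bin")

def embed_sentence(sentence, max_len=30, dim=300):
    # fastText builds OOV vectors from character n-grams, so ft[word]
    # works even for words never seen during training.
    vecs = [ft[w] for w in sentence.split()][:max_len]
    vecs += [np.zeros(dim, dtype=np.float32)] * (max_len - len(vecs))  # pad
    return np.stack(vecs).astype("float32")

x = embed_sentence("an unseeeen word survives here")  # shape (30, 300)
```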

Word2vec classification and clustering tensorflow

I am trying to cluster some sentences using similarity (maybe cosine) and then maybe use a classifier to put texts into predefined classes.
My idea is to use TensorFlow to generate the word embeddings, then average them for each sentence. Next, use a clustering/classification algorithm.
Does TensorFlow provide a ready-to-use word2vec generation algorithm?
Would a bag-of-words model generate good output?
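A sketch of the averaging idea from the question, with a toy dictionary standing in for real word vectors and scikit-learn's KMeans for the clustering step:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical: word_vectors maps word -> 300-dim vector (word2vec, GloVe, ...).
word_vectors = {"cats": np.random.rand(300), "dogs": np.random.rand(300)}

def sentence_vector(sentence, dim=300):
    # Average the vectors of the in-vocabulary words of a sentence.
    vecs = [word_vectors[w] for w in sentence.split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

sentences = ["cats dogs", "dogs", "cats"]
X = np.stack([sentence_vector(s) for s in sentences])
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
```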
No, TensorFlow does not provide a ready-to-use word2vec, but it does have a tutorial on word2vec.
Yes, a bag of words can generate surprisingly good output (though not state-of-the-art), and has the benefit of being much faster. I have a small amount of data (tens of thousands of sentences) and have achieved F1 scores above 0.90 for classification.
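A minimal bag-of-words baseline along those lines, with toy data standing in for a real labeled corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data; in practice, tens of thousands of labeled sentences.
texts = ["great product", "terrible service", "really great", "awful and terrible"]
labels = [1, 0, 1, 0]

# Bag-of-words baseline: plain word counts, no embeddings at all.
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["great service"]))
```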