I want to use the textsum model for tagging named entities, so the target vocab size is very small. While training, there doesn't seem to be an option to provide different vocabs on the encoder and decoder side, or is there?
See these code lines on GitHub:
if hps.mode == 'train':
  model = seq2seq_attention_model.Seq2SeqAttentionModel(hps, vocab, num_gpus=FLAGS.num_gpus)
Hrishikesh, I don't believe there is a way to provide separate vocab files, though I'm not fully understanding why you need it. The vocab simply provides a numerical way of representing a word, so while the model is working with these words, it uses their numerical representations. Once the hypothesis is complete and the statistical choices for words have been made, it simply uses the vocab file to convert each vocab index back to its associated word. I hope this helps answer your question and explains why you shouldn't need separate vocab files. That said, I may just be misunderstanding your need for it, and I apologize if that is the case.
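For illustration, here is a toy sketch of the point above (the names are made up, not from the textsum code): the vocab is just a word-to-id mapping that gets inverted at decode time, regardless of how few distinct target words there are.

# Toy illustration (hypothetical names): the vocab maps words to ids and back.
vocab = {'<PAD>': 0, '<UNK>': 1, 'O': 2, 'B-ENT': 3, 'I-ENT': 4}
reverse_vocab = {idx: word for word, idx in vocab.items()}

decoded_ids = [2, 3, 4, 2]                        # ids chosen by the decoder
print([reverse_vocab[i] for i in decoded_ids])    # ['O', 'B-ENT', 'I-ENT', 'O']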
No, there is no out-of-the-box option to use textsum in this way. I don't see any reason why it shouldn't be possible to modify the architecture to achieve this, though. I'd be interested if you could point toward some literature on using seq2seq-with-attention models for NER.
I'm training a neural network using Keras, but I'm not sure how to feed the training data into the model in the way that I want.
My training dataset is effectively infinite: I have code to generate training examples as needed, so I just want to pipe a continuous stream of novel data into the network. Keras seems to want me to specify my entire dataset in advance by creating a numpy array with everything in it, but this obviously won't work with my approach.
I've experimented with creating a generator class based on keras.utils.Sequence, which seems like a better fit, but it still requires me to specify a length via the __len__ method, which makes me think it will only create that many examples before recycling them. Can someone suggest a better approach?
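One workaround, sketched below under the assumption of a recent Keras where model.fit accepts a plain Python generator (older versions use fit_generator): the generator yields freshly generated batches forever, and steps_per_epoch only controls how often Keras reports an epoch, so nothing is recycled. generate_batch() and the tiny model are illustrative placeholders.

import numpy as np
from tensorflow import keras

def generate_batch(batch_size=32):
    # Placeholder for the real example-generating code.
    x = np.random.rand(batch_size, 10)
    y = np.random.randint(0, 2, size=batch_size)
    return x, y

def infinite_stream():
    # Yields novel batches indefinitely; no fixed dataset length is needed.
    while True:
        yield generate_batch()

model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(10,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# steps_per_epoch only defines how many batches count as one "epoch";
# the generator itself never repeats data.
model.fit(infinite_stream(), steps_per_epoch=100, epochs=10)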
I'm trying to train a model for a sentence classification task. The input is a sentence (a vector of integers) and the output is a label (0 or 1). I've seen some articles here and there about using BERT and GPT2 for text classification tasks. However, I'm not sure which one to pick to start with. Which of these recent NLP models would you start with: the original Transformer model, BERT, GPT2, or XLNet? And why? I'd rather implement it in TensorFlow, but I'm flexible enough to go with PyTorch too.
Thanks!
It highly depends on your dataset, and it is part of the data scientist's job to find which model is more suitable for a particular task in terms of the selected performance metric, training cost, model complexity, etc.
When you work on the problem you will probably test all of the above models and compare them. Which one should you choose first? Andrew Ng in "Machine Learning Yearning" suggests starting with a simple model so you can quickly iterate and test your ideas, data preprocessing pipeline, etc.:
Don’t start off trying to design and build the perfect system. Instead, build and train a basic system quickly—perhaps in just a few days.
Following this suggestion, you can start with a simpler model such as ULMFiT as a baseline, verify your ideas, and then move on to more complex models and see how they improve your results.
Note that modern NLP models contain a large number of parameters and are difficult to train from scratch without a large dataset. That's why you may want to use transfer learning: you can download a pre-trained model, use it as a basis, and fine-tune it on your task-specific dataset to achieve better performance and reduce training time.
I agree with Max's answer, but if the constraint is to use a state-of-the-art large pretrained model, there is a really easy way to do this: the library by HuggingFace called pytorch-transformers. Whether you choose BERT, XLNet, or whatever, they're easy to swap out. Here is a detailed tutorial on using that library for text classification.
EDIT: I just came across this repo, pytorch-transformers-classification (Apache 2.0 license), which is a tool for doing exactly what you want.
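For a rough sense of what the fine-tuning loop looks like, here is a minimal sketch using the current HuggingFace transformers API (the successor to pytorch-transformers); the model name, learning rate, and toy data are illustrative and not taken from either repo above.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=2)

sentences = ['the movie was great', 'the movie was terrible']
labels = torch.tensor([1, 0])

# Tokenize into the input ids and attention masks the model expects.
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)   # returns loss and logits
outputs.loss.backward()
optimizer.step()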
Well, like others mentioned, it depends on the dataset; multiple models should be tried and the best one chosen.
However, sharing my experience: XLNet beats all other models so far by a good margin. Hence, if learning is not the objective, I would simply start with XLNet, then try a few more down the line and conclude. It just saves time in exploring.
The repo below is excellent for doing all of this quickly. Kudos to them.
https://github.com/microsoft/nlp-recipes
It uses HuggingFace transformers and makes them dead simple. 😃
I have used XLNet, BERT, and GPT2 for summarization tasks (English only). Based on my experience, GPT2 works best among all three on short, paragraph-size notes, while BERT performs better for longer texts (up to 2-3 pages). You can use XLNet as a benchmark.
I'm coming from a scikit-learn background.
I'm having difficulty understanding how to preprocess datasets for TensorFlow.
I'm trying to implement an SVM with the iris dataset.
If I have two numpy arrays, one containing a list of the features and the other containing the list of the labels, which functions would I use to create the classifier?
estimator = SVM(
    example_id_column='example_id',
    feature_columns=[real_feature_column, sparse_feature_column],
    l2_regularization=10.0)
I'm assuming the example_id_column would be something like
example_id_column = '0,1,2'
but I'm not sure how to obtain the feature_columns.
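For what it's worth, here is a hedged sketch of how the two numpy arrays might be wired into the old tf.contrib.learn.SVM estimator via an input_fn (TF 1.x; the column names, shapes, and step count are illustrative):

import numpy as np
import tensorflow as tf

features = np.random.rand(150, 4).astype(np.float32)  # e.g. the iris features
labels = np.random.randint(0, 2, size=150)            # SVM expects binary labels

def input_fn():
    return {
        # example_id is a unique string per row, not the feature values.
        'example_id': tf.constant([str(i) for i in range(len(labels))]),
        'x': tf.constant(features),
    }, tf.constant(labels)

x_column = tf.contrib.layers.real_valued_column('x', dimension=4)

estimator = tf.contrib.learn.SVM(
    example_id_column='example_id',
    feature_columns=[x_column],
    l2_regularization=10.0)

estimator.fit(input_fn=input_fn, steps=30)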
I think the most effective way is to use TFRecords files. There's a comprehensive tutorial available that's still mostly relevant, too. This also has the advantage of letting you define much more of your pipeline as part of the graph, do concurrent reads from the source files, and avoid having to fit your dataset in memory. It's definitely worth the effort.
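As a rough illustration of that route, here is a minimal sketch of serializing numpy features and labels into a TFRecords file (the feature names and file path are made up for the example):

import numpy as np
import tensorflow as tf

features = np.random.rand(150, 4).astype(np.float32)
labels = np.random.randint(0, 3, size=150)

with tf.io.TFRecordWriter('iris.tfrecords') as writer:
    for x, y in zip(features, labels):
        # One tf.train.Example per row: a float feature vector and an int label.
        example = tf.train.Example(features=tf.train.Features(feature={
            'features': tf.train.Feature(
                float_list=tf.train.FloatList(value=x.tolist())),
            'label': tf.train.Feature(
                int64_list=tf.train.Int64List(value=[int(y)])),
        }))
        writer.write(example.SerializeToString())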
From TensorFlow's documentation, there seems to be a large array of options for running, serving, testing, and predicting with a TensorFlow model. I've made a model very similar to MNIST, where it outputs a distribution from an image. For a beginner, what would be the easiest way to take one or a few images and send them through the model to get an output prediction? It is mostly for experimentation purposes. Sorry if this is redundant, but all my research has led me to so many different ways of doing this, and the documentation doesn't really give any info on the pros and cons of the different methods. Thanks.
I guess you are using placeholders for your model input and then using feed_dict to feed values into your model.
If that's the case, the simplest way would be: after you have a trained model, save it using tf.train.Saver. Then you can have a test script where you restore your model and call sess.run on your output tensor with a feed_dict of whatever you want your input to be.
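A minimal sketch of that restore-and-predict flow, assuming a TF 1.x graph and a checkpoint written by tf.train.Saver; the tensor names, checkpoint path, and input shape are illustrative:

import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    # Rebuild the graph from the .meta file and restore the trained weights.
    saver = tf.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')

    graph = tf.get_default_graph()
    x = graph.get_tensor_by_name('input:0')
    predictions = graph.get_tensor_by_name('predictions:0')

    # Feed a single image and read back the predicted distribution.
    image = np.zeros((1, 28, 28, 1), dtype=np.float32)
    print(sess.run(predictions, feed_dict={x: image}))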
I'm trying to work out the best model to adapt for an open named entity recognition problem (biology/chemistry, so no dictionary of entities exists; they have to be identified by context).
Currently my best guess is to adapt Syntaxnet so that instead of tagging words as N, V, ADJ, etc., it learns to tag them as BEGINNING, INSIDE, OUT (IOB notation).
However, I am not sure which of these approaches is best:
Syntaxnet
word2vec
seq2seq (I think this is not the right one, as I need it to learn on two aligned sequences, whereas seq2seq is designed for sequences of differing lengths, as in translation)
I would be grateful for a pointer to the right method. Thanks!
Syntaxnet can be used for named entity recognition, e.g. see: Named Entity Recognition with Syntaxnet.
word2vec alone isn't very effective for named entity recognition, and I don't think seq2seq is commonly used for that task either.
As drpng mentions, you may want to look at tensorflow/tree/master/tensorflow/contrib/crf. Adding an LSTM before the CRF layer would help a bit, which gives something like:
LSTM+CRF code in TensorFlow: https://github.com/Franck-Dernoncourt/NeuroNER
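To make the "LSTM before the CRF layer" idea concrete, here is a hedged sketch using the TF 1.x contrib CRF ops; the shapes, placeholder names, and hyperparameters are illustrative and not taken from NeuroNER:

import tensorflow as tf

num_tags = 3          # e.g. B / I / O
hidden_size = 100

# batch x time x embedding_dim word representations, plus lengths and gold tags.
word_embeddings = tf.placeholder(tf.float32, [None, None, 300])
sequence_lengths = tf.placeholder(tf.int32, [None])
gold_tags = tf.placeholder(tf.int32, [None, None])

# Bidirectional LSTM over the word embeddings.
cell_fw = tf.nn.rnn_cell.LSTMCell(hidden_size)
cell_bw = tf.nn.rnn_cell.LSTMCell(hidden_size)
(out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, word_embeddings,
    sequence_length=sequence_lengths, dtype=tf.float32)
lstm_output = tf.concat([out_fw, out_bw], axis=-1)

# Per-token tag scores, then a linear-chain CRF on top.
logits = tf.layers.dense(lstm_output, num_tags)
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    logits, gold_tags, sequence_lengths)
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)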