Can .tflite capture tf.hub.text_embedding_column() processes? - tensorflow

Just a general question here, no reproducible example but thought this might be the right place anyway since its very software specific.
I am building a model which I want to convert to .tflite. It relies on tf.hub.text_embedding_collumn() for feature generation. When I convert to .tflite will this be captured such that the resulting model will take raw text as input rather than a sparse vector representation?
Would be good to know just generally before I invest too much time in this approach. Thanks in advance!

Currently I don't imagine this would work, as we do not support enough string ops to implement that. One approach would be to do this handling through a custom op, but implementing this custom op would require domain knowledge and mitigate the ease-of-use advance of using tf hub in the first place.
There is some interest in defining a set of hub operators that are verified to work well with tflite, but this is not yet ready.

Related

Is TensorFlow the way to go for this optimization problem?

I have to optimize the result of a process that depends on a large number of variables, i.e. a laser engraving system where the engraving depth depends on the laser speed, distance, power and so on.
The final objective is the minimization of the engraving time, or the maximization of the laser speed. All the other parameters can vary, but must stay within safe bounds.
I have never used any machine learning tools, but to my very limited knowledge this seems like a good use case for TensorFlow or any other machine learning library.
I would experimentally gather data points to train the algorithm, test it and then use a gradient descent optimizer to find the parameters (within bounds) that maximize the laser travel velocity.
Does this sound feasible? How would you approach such a problem? Can you link to any examples available online?
Thank you,
Riccardo
I’m not quite sure if I understood the problem correctly, would you add some example data and a desired output?
As far as I understood, It could be feasible to use TensorFlow, but I believe there are better solutions to that problem. Let me expand on this.
TensorFlow is a framework focused in the development of Deep Learning models. These usually require lots of data (the number really depends on the problem) but I don’t believe that just you manually gathering this data would be enough unless your team is quite big or already have some data gathered.
Also, as you have a minimization (or maximization) problem given variables that lay within a known range, I think this can be a case of Operations Research optimization instead of Machine Learning. Check this example of OR.

What Tensorflow API to use for Seq2Seq

This year Google produced 5 different packages for seq2seq:
seq2seq (claimed to be general purpose but
inactive)
nmt (active but supposed to be just
about NMT probably)
legacy_seq2seq
(clearly legacy)
contrib/seq2seq
(not complete probably)
tensor2tensor (similar purpose, also
active development)
Which package is actually worth to use for the implementation? It seems they are all different approaches but none of them stable enough.
I've had too a headache about some issue, which framework to choose? I want to implement OCR using Encoder-Decoder with attention. I've been trying to implement it using legacy_seq2seq (it was main library that time), but it was hard to understand all that process, for sure it should not be used any more.
https://github.com/google/seq2seq: for me it looks like trying to making a command line training script with not writing own code. If you want to learn Translation model, this should work but in other case it may not (like for my OCR), because there is not enough of documentation and too little number of users
https://github.com/tensorflow/tensor2tensor: this is very similar to above implementation but it is maintained and you can add more of own code for ex. reading own dataset. The basic usage is again Translation. But it also enable such task like Image Caption, which is nice. So if you want to try ready to use library and your problem is txt->txt or image->txt then you could try this. It should also work for OCR. I'm just not sure it there is enough documentation for each case (like using CNN at feature extractor)
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/seq2seq: apart from above, this is just pure library, which can be useful when you want to create a seq2seq by yourself using TF. It have a function to add Attention, Sequence Loss etc. In my case I chose that option as then I have much more freedom of choosing the each step of framework. I can choose CNN architecture, RNN cell type, Bi or Uni RNN, type of decoder etc. But then you will need to spend some time to get familiar with all the idea behind it.
https://github.com/tensorflow/nmt : another translation framework, based on tf.contrib.seq2seq library
From my perspective you have two option:
If you want to check the idea very fast and be sure that you are using very efficient code, use tensor2tensor library. It should help you to get early results or even very good final model.
If you want to make a research, not being sure how exactly the pipeline should look like or want to learn about idea of seq2seq, use library from tf.contrib.seq2seq.

HEVC Transformation Optimization

I have been working on the HEVC project. I have recently asked some details with regards to Intra Prediction and it is more or less clarified. I was reading a book suggested by someone and it gives the details of the Transformation algorithm implemented in HEVC. I know that it uses Partial Butterfly in order to process data. However, would it be possible to implement via a different approach like say matrix multiplication and still the HEVC stream would be generated with no faults. What my question is that, whether if I modify the processing method for Transformation Module will it affect the process flow of HEVC Encoder on the whole.
Yes, it is possible. I have implemented the Matrix Multiplication in place of Partial Butterfly for my solution. It works fine, the output looks good. I have confirmed that the PSNR SSIM and other parameters are good enough to confirm this for you.
The transform has to comply with the HEVC standard, but how you implement it exactly is up to you.
It just implementation issue, Matrix Multiplication can optimization by using other method like SIMD.

Exporting Tensorflow Models for Eigen Only Environments

Has anyone seen any work done on this? I'd think this would be a reasonably common use-case. Train model in python, export the graph and map to a sequence of eigen instructions?
I don't believe anything like this is available, but it is definitely something that would be useful. There are some obstacles to overcome though:
Not all operations are implemented by Eigen.
We'd need to know how to generate code for all operations we want to support.
The glue code to allocate buffers and schedule work can get pretty gnarly.
It's still a good idea though, and it might get more attention posted as a feature request on https://github.com/tensorflow/tensorflow/issues/

How do you find the most discriminant terms in binary document classification?

I want to use feature selection to find the terms in a document that are most useful for a binary classification task.
I've been looking around:
This mentions Mutual Information and the chi-squared test metric
http://nlp.stanford.edu/IR-book/html/htmledition/feature-selection-1.html
MATLAB has a number of functions as well:
http://www.mathworks.com/help/toolbox/stats/brj0qbu.html
Feature Selection in MATLAB
Of the above, relieff and rankfeatures look promising.
I do not know if my data follows a normal distribution. Any thoughts on which technique performs the best? Are there any newer methods you would suggest? The focus is to increase classification accuracy.
Thank you!
Since the answer is highly dependent on the nature of your data, I'd suggest playing with several options, possibly using a hold-out set for verification.
The easiest path would probably be to use Weka or RapidMiner for experimenting. Choosing from the plethora of options provided by them, you'll probably get acquainted with several other methods.
Having said that, I have found Mutual Information/Infogain to be useful on a large variety of problems.