Is it possible to get the semantics of an unbounded arc in TensorFlow without directly enqueuing in the op itself?
For example, if I want to write an operation that takes a scalar string my_string and "emits" tuples of ("string", num_occurrences_in_my_string), I have to resort to one of the following output options (as far as I know):
return the values necessary to construct a SparseTensor (see the sketch after these options)
take a queue reference (of the correct type) and directly enqueue the input myself (like the tf.TextLineReader does)
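To illustrate the kind of output I mean, here is a rough TF 2.x-style Python approximation (a composition of existing ops, not the custom op I would write; emit_word_counts is just an illustrative name):

import tensorflow as tf

def emit_word_counts(my_string):
    # Split the scalar string and count occurrences of each token; the number of
    # emitted ("string", count) pairs depends on the data, i.e. it is "unbounded".
    words = tf.strings.split(tf.reshape(my_string, [1])).values
    unique_words, _, counts = tf.unique_with_counts(words)
    return unique_words, counts

words, counts = emit_word_counts(tf.constant("to be or not to be"))
# words -> [b'to', b'be', b'or', b'not'], counts -> [2, 2, 1, 1]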
As far as I can tell from Google's paper on the TensorFlow "programming language", these are the only ways to accomplish it.
Is there a way in TensorFlow to emit an arbitrary number of output "rounds" for a given input, besides the aforementioned workarounds?
Related
I use the Keras fit() method with custom metrics passed to the model.
The metrics are stateful, i.e. they are subclasses of Metric, as described in https://keras.io/api/metrics/#as-subclasses-of-metric-stateful
When I run the code in a multi-GPU environment using tf.distribute.MirroredStrategy(), my metric code is called on every GPU separately with batch_size/no_of_gpus examples passed, which is reasonable to expect.
What happens next is that the multiple scalars (one from each GPU) of the metric value need to be reduced to a single scalar, and what I get all the time is a sum reduction, while I would like to control that.
Keep in mind that the reduction parameter belongs to Loss in Keras; there is no such thing in the Metric class: https://github.com/tensorflow/tensorflow/blob/acbc065f8eb2ed05c7ab5c42b5c5bd6abdd2f91f/tensorflow/python/keras/metrics.py#L87
(The only crazy thing I tried was inheriting from the Mean class, which is a subclass of Metric, but that didn't change anything.)
reduction is mentioned in the metrics code, but that is a reduction over multiple values accumulated in a single metric object. In the multi-GPU setting this is not the case: each metric replica works on its own GPU and is somehow aggregated at the end.
The way I debugged this behaviour was to print the shapes and results inside the metric's update_state method, and then look at the metric's value in the logs object in the on_batch_end callback.
I tried looking at the TF code, but couldn't find the place where this happens.
I would like to be able to control this behaviour: either pick 'mean' or 'sum' for the metric, or at least know where this reduction is done in the code.
Edit: I guess https://github.com/tensorflow/tensorflow/issues/39268 sheds some more light on this issue.
I am facing the same problem as you (and that's why I found your question).
Seeing that it's been 15 days since you asked the question and there are no answers/comments yet, I thought I might share my temporary workaround.
Like you, I also think that a SUM reduction is being performed when combining results from multiple GPUs. What I did was pass the number of GPUs (e.g. given by the num_replicas_in_sync attribute of your tf.distribute strategy object) into the __init__(...) constructor of your sub-classed metric object, and use it to divide the return value in the result() method.
Potentially, you could also use tf.distribute.get_strategy() from within the metric object to make it "strategy aware", and use the information to decide how to modify the values in an ad hoc manner so that the SUM reduction will produce what you want.
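A minimal sketch of that idea (the metric below is just an example of mine, not your actual metric; it assumes TF 2.x):

import tensorflow as tf

class ReplicaAwareMAE(tf.keras.metrics.Metric):
    def __init__(self, num_replicas, name="replica_aware_mae", **kwargs):
        super().__init__(name=name, **kwargs)
        self.num_replicas = num_replicas
        self.total = self.add_weight(name="total", initializer="zeros")
        self.count = self.add_weight(name="count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        err = tf.abs(tf.cast(y_true, tf.float32) - tf.cast(y_pred, tf.float32))
        self.total.assign_add(tf.reduce_sum(err))
        self.count.assign_add(tf.cast(tf.size(err), tf.float32))

    def result(self):
        # Divide by the replica count to compensate for the SUM reduction
        # applied when the per-GPU results are combined.
        return tf.math.divide_no_nan(self.total, self.count) / self.num_replicas

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    metric = ReplicaAwareMAE(strategy.num_replicas_in_sync)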
I hope this helps for now, whether as a suggestion or as a confirmation that you're not alone on this.
When subclassing the Keras Metric class, you have to override the merge_state() method correctly. If you do not override it, the default implementation is used, which is a simple sum.
See: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Metric
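For illustration, a sketch of a metric where the default sum would be wrong and merge_state() is therefore overridden (the class is purely illustrative, and merge_state() is only available in newer TF/Keras versions):

import tensorflow as tf

class MaxValue(tf.keras.metrics.Metric):
    """Tracks the maximum prediction seen; summing per-replica maxima would be wrong."""

    def __init__(self, name="max_value", **kwargs):
        super().__init__(name=name, **kwargs)
        self.max = self.add_weight(name="max", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        self.max.assign(tf.maximum(self.max, tf.reduce_max(tf.cast(y_pred, tf.float32))))

    def result(self):
        return self.max

    def merge_state(self, metrics):
        # Combine the states of the other metric instances with max, not with the
        # default element-wise sum of all weight variables.
        for m in metrics:
            self.max.assign(tf.maximum(self.max, m.max))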
Some raw operations use InputLists, not (only) simple Inputs. I want to add a Placeholder to my Graph, and during TF_SessionRun add the actual array of tensors. I have two problems with it:
TF_SessionRun does not talk about InputLists; it only knows Inputs. I assume (correct me if I am wrong) that from a TF_Session point of view, an InputList is just an Input (giving the first element of the array).
I cannot figure out how to have a Placeholder for this in the Graph. Defining a Placeholder requires giving its data type, but in an InputList every Tensor can have its own data type.
I am looking either for a data type such as "DT_List" indicating that the given Placeholder is a list of different tensors, or for another raw operation, called "ListPlaceholder" or similar, that serves this purpose.
How should this be done?
P.S. Consider the raw operation Save. Its third parameter is an InputList of Tensors to save. I made a Graph that works well for a single Tensor, but I cannot get it to work for multiple Tensors in one go.
After a lot of checking, it seems that I incorrectly guessed that there is (or should be) such a thing as an InputList input. The inputs to Session.Run are always single Tensors, and as such no "Placeholder for a list" exists. In the mentioned Save raw operation, the "data" parameter has to be added - as guessed - using TF_AddInputList, but the list of TF_Output elements in its parameter list has to be assembled from individual TF_Output items and cannot be retrieved as a single TF_OutputList from a "Placeholder"-like node.
If my conclusion is wrong, please correct me.
I have 4 (or more) models (same structure but different training data). Now I want to ensemble them to make a prediction. I want to pre-load the models and then predict one input message (one message at a time) in parallel via multiprocessing. However, the program always gets stuck at the session.run step, and I cannot figure out why.
I tried passing all the arguments to the function in each process, as shown in the code below. I also tried using a Queue object and putting all the data (except the model objects) in the queue. I also tried setting the number of processes to 1. It made no difference.
from multiprocessing import Manager, Process

with Manager() as manager:
    # shared list collecting each process's first-level prediction
    first_level_test_features = manager.list()
    procs = []
    for id in range(4):
        p = Process(target=predict,
                    args=(id, (message, models, configs, vocabs, emoji_dict,
                               first_level_test_features)))
        procs.append(p)
        p.start()
    for p in procs:
        p.join()
I do not get any error message since it just hangs there. I would expect the program to start multiple processes, with each process using the model passed to it to make a prediction.
I am unsure how session sharing across different Processes would work, and this is probably where your issue comes from. Given the way TensorFlow works, I would advise implementing the ensemble call as a graph operation, so that it can be run through a single session.run call, with TF handling the parallelization of computations wherever possible.
In practice, if you have symbolic tensors representing the models' predictions, you could use a TF operation to aggregate them (tf.concat, tf.reduce_mean, tf.add_n... whichever suits your design) and end up with a single symbolic tensor representing the ensemble prediction.
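For example (a TF2/Keras-style sketch; in graph/session code the same tf.stack + tf.reduce_mean composition applies, and the stand-in models below would be your pre-loaded ones):

import tensorflow as tf

# Purely illustrative stand-ins for the pre-loaded models, all fed the same input tensor.
inputs = tf.keras.Input(shape=(16,))
models = [tf.keras.Sequential([tf.keras.layers.Dense(3, activation="softmax")])
          for _ in range(4)]

model_outputs = [m(inputs) for m in models]                          # one prediction per model
ensemble = tf.reduce_mean(tf.stack(model_outputs, axis=0), axis=0)   # single ensemble tensor
ensemble_model = tf.keras.Model(inputs, ensemble)                    # one call evaluates everything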
I hope this helps; if not, please provide some more details as to what your setting is, notably which form your models have.
I'm working with TensorFlow on GPU, and I am curious about the data format of tensors. I thought that data are stored in GPU/CPU memory in row-major order.
However, if I want to store the data in column-major order for one operation (Op), can I change the data format only for that Op? (e.g. by passing some option to the function that indicates the data format)
For instance, the matmul operation has options related to transposing. Does the data format (column-major / row-major) change if I transpose the matrix that way?
Thanks.
Yes, the default data format is row-major, which is the opposite of Eigen's default.
If you are using Python, then you will need to transpose your data when simulating a col-major layout. When using C++, nothing prevents you from employing Eigen::RowMajor instead.
matmul has the options transpose_a and transpose_b because (cu)BLAS can handle both formats without an explicit transpose, e.g. see GEMM. So it will not change your data format. It is only a trick to avoid launching additional CUDA kernels or other functions beforehand, thereby minimizing run-time.
This is part of the BLAS specification, e.g. see LAPACK.
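For example:

import tensorflow as tf

a = tf.random.normal([3, 2])
b = tf.random.normal([3, 4])

# transpose_a only changes how matmul *reads* `a`; the buffer of `a` stays row-major.
c1 = tf.matmul(a, b, transpose_a=True)
# An explicit transpose materializes a transposed copy first, costing an extra kernel.
c2 = tf.matmul(tf.transpose(a), b)   # same result as c1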
I am having some problems understanding how exactly the Baum-Welch algorithm works. I read that it adjusts the parameters of the HMM (the transition and emission probabilities) in order to maximize the probability that my observation sequence may be seen by the given model.
However, what happens if I have multiple observation sequences? I want to train my HMM against a large set of observations (and I think this is what is usually done).
ghmm for example can take both a single observation sequence and a full set of observations for the baumWelch method.
Does it work the same in both situations? Or does the algorithm have to know all observations at the same time?
In Rabiner's paper, the parameters of the GMMs (weights, means and covariances) are re-estimated in the Baum-Welch algorithm using the re-estimation formulas given there.
Those formulas are stated for the single-observation-sequence case. In the multiple-sequence case, the numerators and denominators are simply summed over all observation sequences and then divided to get the parameters. (This can be done since they simply represent occupation counts; see pg. 273 of the paper.)
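In the usual $\xi$/$\gamma$ notation, the transition re-estimate over $K$ observation sequences (sequence $k$ of length $T_k$) becomes, schematically:

    \bar{a}_{ij} = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \xi_t^{(k)}(i, j)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \gamma_t^{(k)}(i)}

i.e. the per-sequence statistics are accumulated first, and the division happens only once at the end.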
So it's not required to know all observation sequences during an invocation of the algorithm. As an example, the HERest tool in HTK has a mechanism that allows splitting up the training data amongst multiple machines. Each machine computes the numerators and denominators and dumps them to a file. In the end, a single machine reads these files, sums up the numerators and denominators and divides them to get the result. See pg. 129 of the HTK book v3.4
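A toy sketch of that accumulate-then-divide pattern (this is not HERest itself; the E-step is stubbed out with random numbers just to show the data flow):

import numpy as np

rng = np.random.default_rng(0)
N = 3  # number of HMM states (illustrative)

def per_sequence_counts(obs):
    # Stand-in for the real E-step: would return the expected transition counts
    # (xi summed over t) and state occupancies (gamma summed over t) for one sequence.
    return rng.random((N, N)), rng.random((N, 1))

def accumulate(sequences):
    # Each worker/machine accumulates numerators and denominators over its own chunk only.
    num, den = np.zeros((N, N)), np.zeros((N, 1))
    for obs in sequences:
        xi_sum, gamma_sum = per_sequence_counts(obs)
        num += xi_sum
        den += gamma_sum
    return num, den

data_chunks = [range(5), range(7)]  # two "machines", each with its own sequences
partials = [accumulate(chunk) for chunk in data_chunks]

# Only at the very end are the partial sums combined and divided.
num = sum(p[0] for p in partials)
den = sum(p[1] for p in partials)
A_new = num / den  # re-estimated transition matrix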