I'm using Weights and Biases to track my deep learning models. To monitor everything I use the WandbCallback in .fit.
In the WandbCallback documentation there is the keyword save_graph which defaults to True. The description is very brief and I wondered what that saved graph is and what it's for? Is saving a graph a costly operation? For what is it needed? (like does it complement something else, like saving the best model?)
That is used to create log a wandb.Graph of the model. This class is typically used for saving and diplaying neural net models. It represents the graph as an array of nodes and edges. The nodes can have labels that can be visualized by wandb. Here's an example of the graph that it produces: https://wandb.ai/l2k2/keras-examples/runs/ieqy2e9h/model
Here's the code that does that within the callback. https://github.com/wandb/client/blob/1609f82c84e2244ed8fe62c746099d2094bd746a/wandb/integration/keras/keras.py#L552
Related
I am using the example "stateful_clients" in tensorflow-federated examples. I want to use my pretrained model weights to initialize the model. I use the function model.load_weights(init_weight). But it seems that it doesn't work. The validation accuracy in the first round is still low. How can I solve the problem?
def tff_model_fn():
"""Constructs a fully initialized model for use in federated averaging."""
keras_model = get_five_layers_cnn([28, 28, 1])
keras_model.load_weights(init_weight)
loss = tf.keras.losses.SparseCategoricalCrossentropy()
return stateful_fedavg_tf.KerasModelWrapper(keras_model,
test_data.element_spec, loss)
A quick primer on state and model weights in TFF
TFF takes a distinct perspective on state in machine learning, generally a consequence of its desire to be purely functional.
Usually in machine learning, a model is conceptually a function which takes data and produces a prediction. However, this notion is a little overloaded at times; does 'model' refer to a trained model (fitting the specification above), or an architecture which is parameterized by its parameters, and therefore needs to accept these parameters as an argument to be considered truly a 'function'? A conception somewhat in the middle is that of a 'stateful function', which I think tends to be what people intend to refer to when they use the term 'model'.
TFF standardizes on the latter understanding. For TFF, a 'model' is a function which accepts parameters along with data as an argument, producing a prediction. This is generally to avoid the notion of a stateful function, which is disallowed by a purely functional perspective (f(x) == f(x) should always be true, so f cannot have any state which affects its output).
On the code in question
I'm not super familiar with this portion of the TFF codebase; in particular I'm a little surprised at the behavior of the keras model wrapper, as usually TFF wants to serialize all logic into TFF-defined data structures as soon as possible (at least, this is how I think about it). Glancing at the code, it looks to me like it could work--but there have been exciting interactions between TFF and Keras in the past.
Briefly, here is how this path should be working:
The model function you define above is invoked while building the initialize computation, in a graph context; the logic to load weights (or assignment of the weights themselves, baked into the graph as a constant) would hopefully be serialized into the graph that TFF generates to represent initialize.
Upon calling iterative_process.initialize, you would find your desired weights populated in the appropriate attributes of the returned data structure. This would serve as your initial starting point for your iterative process, and you would be off to the races.
What I am suspicious of in the above is 1. TFF will silently invoke your model_fn in a TensorFlow graph context, resulting in non program-order semantics; if there is no control dependency between the assignment and the return value of your function (which there isn't in the code above, and in fact it is not obvious how to force this), the assignment may be skipped at initialize time. Therefore the state returned from initialize won't have your specified weights.
If this suspicion is true, the appropriate solution is to run this to run the weight loading logic directly in Python. TFF provides some utilities to help with this kind of thing, like tff.learning.state_with_new_model_weights. This would be used like:
state = iterative_process.initialize()
weights = tf.keras.load_weights(...) # No idea if this call is correct, probably not.
state_with_loaded_weights = tff.learning.state_with_new_model_weights(state, weights)
...
# continue on using state in the iterative process
I am trying to code a simple Neural machine translation using tensorflow. But I am a little stuck regarding the understanding of the embedding on tensorflow :
I do not understand the difference between tf.contrib.layers.embed_sequence(inputs, vocab_size=target_vocab_size,embed_dim=decoding_embedding_size)
and
dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)
In which case should I use one to another ?
The second thing I do not understand is about tf.contrib.seq2seq.TrainingHelper and tf.contrib.seq2seq.GreedyEmbeddingHelper. I know that in the case of translation, we use mainly TrainingHelper for the training step (use the previous target to predict the next target) and GreedyEmbeddingHelper for the inference step (use the previous timestep to predict the next target).
But I do not understand how does it work. In particular the different parameters used. For example why do we need a sequence length in the case of TrainingHelper (why do we not used an EOS)? Why both of them do not use the embedding_lookup or embedding_sequence as input ?
I suppose that you're coming from this seq2seq tutorial. Even though this question is starting to get old, I'll try to answer for the people passing by like me:
For the first question, I looked at the source code behind tf.contrib.layers.embed_sequence, and it is actually using tf.nn.embedding_lookup. So it just wraps it, and creates the embedding matrix (tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))) for you. Although this is convenient and less verbose, by using embed_sequence there doesn't seem to a direct way to access the embeddings. So if you want to, you have to query for the internal variable used as the embedding matrix by using the same name space. I have to admit that the code in the tutorial above is confusing. I even suspect he's using different embeddings in the encoder and the decoder.
For the second question:
I guess it is equivalent to use a sequence length or an embedding.
The TrainingHelper doesn't need the embedding_lookup as it only forwards the inputs to the decoder, GreedyEmbeddingHelper does take as a first input the embedding_lookup as mentioned in the documentation.
If I understand you correctly, the first question is about the differences between tf.contrib.layers.embed_sequence and tf.nn.embedding_lookup.
According to the official docs (https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence),
Typical use case would be reusing embeddings between an encoder and decoder.
I think tf.contrib.layers.embed_sequence is designed for seq2seq models.
I found the following post:
https://github.com/tensorflow/tensorflow/issues/17417
where #ispirmustafa mentioned:
embedding_lookup doesn't support invalid ids.
Also, in another post: tf.contrib.layers.embed_sequence() is for what?
#user1930402 said:
When building a neural network model that has multiple gates that take features as input, by using tensorflow.contrib.layers.embed_sequence, you can reduce the number of parameters in your network while preserving depth. For example, it eliminates the need for each gates of the LSTM to perform its own linear projection of features.
It allows for arbitrary input shapes, which helps the implementation be simple and flexible.
For the second question, sorry that I didn't use TrainingHelper and can't answer your question.
I was going through the tensorflow-model-analysis documentation evaluating TensorFlow models. The getting started guide talks about a special SavedModel called the EvalSavedModel.
Quoting the getting started guide:
This EvalSavedModel contains additional information which allows TFMA
to compute the same evaluation metrics defined in your model in a
distributed manner over a large amount of data, and user-defined
slices.
My question is how can I convert an already existing saved_model.pb to an EvalSavedModel?
EvalSavedModel is exported as SavedModel message, thus there is no need in such conversion.
EvalSavedModel uses SavedModelBuilder under the hood. It populates the estimator graph with several placeholders, creates some additional metric collections. Later on, it performs simple SavedModelBuilder procedure.
Source - https://github.com/tensorflow/model-analysis/blob/master/tensorflow_model_analysis/eval_saved_model/export.py#L228
P.S. I suppose you want to run model-analysis on your model, exported by SavedModelBuilder. Since SavedModel doesn't have neither metric nodes nor related collections, which are created in EvalSavedModel, it's useless to do so - model-analysis just simply couldn't find any metric related to your estimator.
If I understand your question correctly, you have saved_model.pb generated, either by using tf.saved_model.simple_save or tf.saved_model.builder.SavedModelBuilderor by estimator.export_savedmodel.
If my understanding is correct, then, you are exporting Training and Inference Graphs to saved_model.pb.
The Point you mentioned from the Guide of TF Org Website states that, in addition to Exporting Training Graph, we need to Export Evaluation Graph as well. That is called EvalSavedModel.
The Evaluation Graph comprises the Metrics for that Model, so that you can Evaluate the Model's performance using Visualizations.
Before we Export EvalSaved Model, we should prepare eval_input_receiver_fn, similar to serving_input_receiver_fn.
We can mention other functionalities as well, like, if you want the Metrics to be defined in a Distributed Manner or if we want to Evaluate our Model using Slices of Data, rather than the Entire Dataset. Such Options can be mentioned in eval_input_receiver_fn.
Then we can Export the EvalSavedModel using the Code below:
tfma.export.export_eval_savedmodel(estimator=estimator,export_dir_base=export_dir,
eval_input_receiver_fn=eval_input_receiver_fn)
I have looked around everywhere but could not find the way to do this.
Basically I want to feed input to some intermediate layer in a keras model and want to the backpropagation for the full graph (i.e. including layer before the intermediate layer). To understand this I refer you to the figure as mentioned in the paper "Multi-view Convolutional Neural Networks for 3D Shape Recognition".
From the figure you can see that the feature are maxpooled in view pooling layer and then the resultant vector is passed to the rest of the network.
From the paper they further did he back propagation using the view pooling features.
To achieve this I am trying a simple approach. There will not be any viewpooling layer in my model. This pooling I will do offline by taking the features for multiple views and then taking the max of it. Finally the aggregated feature will be passed to rest of the network. However I am not able to figure out how to do the back propagation to the full network by passing input to intermediate layer directly.
Any help would be appreciated. Thanks
If you have the code of the tensorflow model, then this will be quite simple. The model would probably look like
def model( cnns ):
viewpool_output = f(cnns)
cnn2_output = cnn2( viewpool_output )
...
You would just need to change the model to
def model( viewpool_output ):
cnn2_output = cnn2( viewpool_output )
...
and instead of passing a "real" view pool output, you just pass whatever image you want. But you haven't given any code, so we can only guess at what it looks like.
I am looking for a proper or best way to get variable importance in a Neural Network created with Keras. The way I currently do it is I just take the weights (not the biases) of the variables in the first layer with the assumption that more important variables will have higher weights in the first layer. Is there another/better way of doing it?
Since everything will be mixed up along the network, the first layer alone can't tell you about the importance of each variable. The following layers can also increase or decrease their importance, and even make one variable affect the importance of another variable. Every single neuron in the first layer itself will give each variable a different importance too, so it's not something that straightforward.
I suggest you do model.predict(inputs) using inputs containing arrays of zeros, making only the variable you want to study be 1 in the input.
That way, you see the result for each variable alone. Even though, this will still not help you with the cases where one variable increases the importance of another variable.
*Edited to include relevant code to implement permutation importance.
I answered a similar question at Feature Importance Chart in neural network using Keras in Python. It does implement what Teque5 mentioned above, namely shuffling the variable among your sample or permutation importance using the ELI5 package.
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
import eli5
from eli5.sklearn import PermutationImportance
def base_model():
model = Sequential()
...
return model
X = ...
y = ...
my_model = KerasRegressor(build_fn=basemodel, **sk_params)
my_model.fit(X,y)
perm = PermutationImportance(my_model, random_state=1).fit(X,y)
eli5.show_weights(perm, feature_names = X.columns.tolist())
It is not that simple. For example, in later stages the variable could be reduced to 0.
I'd have a look at LIME (Local Interpretable Model-Agnostic Explanations). The basic idea is to set some inputs to zero, pass it through the model and see if the result is similar. If yes, then that variable might not be that important. But there is more about it and if you want to know it, then you should read the paper.
See marcotcr/lime on GitHub.
This is a relatively old post with relatively old answers, so I would like to offer another suggestion of using SHAP to determine feature importance for your Keras models. SHAP also allows you to process Keras models using layers requiring 3d input like LSTM and GRU while eli5 cannot.
To avoid double-posting, I would like to offer my answer to a similar question on Stackoverflow on using SHAP.