Tensorflow Adam Optimizer state not updating ( get_config ) - tensorflow

I am using optimizer.get_config() to get the final state of my adam optimizer (as in https://stackoverflow.com/a/60077159/607528) however .get_config() is returning the initial state. I assume this means one of the following
.get_config() is supposed to return the initial state
my optimizer is not updating because I've set something up wrong
my optimizer is not updating tf's adam is broken (highly unlikely)
my optimizer is updating but is being reset somewhere before I call .get_config()
something else?
Of course I originally noticed the issue in a proper project with training and validation sets etc, but here is a really simple snippet that seems to reproduce the issue:
import tensorflow as tf
import numpy as np
x=np.random.rand(100)
y=(x*3).round()
model = tf.keras.models.Sequential([
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x, y, epochs=500)
model.evaluate(x, y)
model.optimizer.get_config()

If you want to restore your training - you should save optimizer weights, not config:
weight_values = optimizer.get_weights()
with open(self.output_path+'optimizer.pkl', 'wb') as f:
pickle.dump(weight_values, f)
And then load them:
model.fit(dummy_x, dummy_y, epochs=500) # build optimizer by fitting model with dummy input - e.g. random tensors with simpliest shape
with open(self.path_to_saved_model+'optimizer.pkl', 'rb') as f:
weight_values = pickle.load(f)
optimizer.set_weights(weight_values)

Related

How to visualize graph without training the model using Tensorboard?

I'm trying to visualize the model in Tensorboard without training.
I checked this and that, but this still doesn't work even for the simplest model.
import tensorflow as tf
import tensorflow.keras as keras
# Both tf.__version__ tensorboard.__version__ are 2.5.0
s_model = keras.models.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(32, activation='relu'),
keras.layers.Dropout(0.2),
keras.layers.Dense(10, activation='softmax')
])
logdir = '.../logs'
_callbacks = keras.callbacks.TensorBoard(log_dir=logdir)
_callbacks.set_model(s_model) # This is exactly suggested in the link
When I did the above, I get the error message:
Graph visualization failed.
Error: Malformed GraphDef. This can sometimes be caused by a bad
network connection or difficulty reconciling mulitple GraphDefs; for
the latter case, please refer to
https://github.com/tensorflow/tensorboard/issues/1929.
I don't think this is a reconciliation problem because it is not a custom function, and if I compile the model, train, then I can get the graph visualization I wanted.
s_model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
(train_images, train_labels), _ = keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0
logdir = '.../logs'
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
s_model.fit(
train_images,
train_labels,
batch_size=64,
epochs=5,
callbacks=[tensorboard_callback])
This gives the wanted graph visualization. But is there any other way to get graph visualization in Tensorboard without training?
Of course, I'm also aware that workaround, i.e. train with the tf.random.normal() for a while, would do the trick but I'm looking for the neat way like _callbacks.set_model(s_model)...

Different results when using Manual KFold-Cross validation vs. KerasClassifier-KFold Cross Validation

I've been struggling to understand why two similar Kfold-cross validations result in two different averages.
When I use a manual KFold approach (with Tensorflow and Keras)
cvscores = []
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=3)
for train, test in kfold.split(X, y):
model = create_baseline()
model.fit(X[train], y[train], epochs=50, batch_size=32, verbose=0)
scores = model.evaluate(X[test], y[test], verbose=0)
#print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
cvscores.append(scores[1] * 100)
print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))
I get
65.89% (+/- 3.77%)
When I use the KerasClassifier wrapper from scikit
estimator = KerasClassifier(build_fn=create_baseline, epochs=50, batch_size=32, verbose=0)
kfold = StratifiedKFold(n_splits=10,shuffle=True, random_state=3)
results = cross_val_score(estimator, X, y, cv=kfold, scoring='accuracy')
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
I get
63.82% (5.37%)
Additionally, when using KerasClassifier the following warning appears
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/wrappers/scikit_learn.py:241: Sequential.predict_classes (from tensorflow.python.keras.engine.sequential) is deprecated and will be removed after 2021-01-01.
Instructions for updating:
Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype("int32")`, if your model does binary classification (e.g. if it uses a `sigmoid` last-layer activation).
Do the results differ because KerasClassifier uses predict_classes() while the manual Tensorflow/Keras approach uses just predict()? If so, which approach is more reasonable?
My model looks like this
def create_baseline():
model = tf.keras.models.Sequential()
model.add(Dense(8, activation='relu', input_shape=(12,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
The two CV-results do not look too different, they are both within each others standard deviation.
You fixed the seed for the StratifiedKFold class, that's good. However there is additional randomness you should take control of and that comes from the weight initialization. Make sure you initialize your model for each CV-run with different weights, but use the same 10 initializations for both cross-validations, manual and automatic. You can pass an initializer to each layer, they have a seed argument as well. In general you should fix all possible seeds (np.random.seed(3), tf.set_random_seed(3)).
What happens if you run cross_val_score() or your manual version twice? Do you get the same results / numbers?

Tensorflow Fit exits with code 1 without any error message

I'm new to tensorflow. What I'm trying to do is to train a simple neural network to solve the Newton 2 problems, to guess the force value of given mass and acceleration values. The input layer consists of two neurons which are mass and acceleration values. The output layer is the force.
The program just gives a warning, prints some data which I guess the outputs and then exits with code 1. I cannot try anything to solve this problem. Because as I said before I'm new to tensorflow and there is no error message.
Here is the code:
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential
import numpy as np
import pickle
X = pickle.load(open("Newton2_X.pickle", "rb"))
y = pickle.load(open("Newton2_y.pickle", "rb"))
model = Sequential()
# model.add(Flatten())
model.add(Dense(2, activation="relu"))
model.add(Dense(128, activation="relu"))
model.add(Dense(1, activation="softmax"))
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(X, y, epochs=3, validation_split=0.1, batch_size=100)
Here are the pickle files:
https://drive.google.com/drive/folders/1FkKmY4px8oQJkbHYb_Z4y4Lnb1EazkvP?usp=sharing
After this part of the code I've some additional lines to make the network to guess a new value and some print lines. These lines are not executed. In fact, I've found that the 'problem' must be in model.fit(...) part. Because no lines after that line are executed.
Here is the full warning msg that I got from the program:
WARNING: Logging before flag parsing goes to stderr.
W0816 07:02:05.292823 17652 deprecation.py:506] From C:\Users\SABA\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
6, 0.2142802901764338, 0.26114980919201514, 0.2451221454091551, 0.19920049739052853, ...
A couple of things to tweak.
Firstly, I don't think the data is the shape that you think it is. You have:
X.shape # (45000, 2, 2, 1)
y is a flat list with 90,000 elements.
Secondly, you are predicting a number (so a regression) but you were trying to use 'sparse_categorical_crossentropy' as a loss function which is for classification problems.
I can get your code to run by simply slicing the data down to the shape we need but obviously it won't train as I haven't paired up the correct Xs and ys. You'll need to sort this out properly in the data
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential
import numpy as np
import pickle
### TODO - sort this out!
X = pickle.load(open("Newton2_X.pickle", "rb"))[:,0,:,0]
y = np.array(pickle.load(open("Newton2_y.pickle", "rb")))[:45000]
####
model = Sequential()
# model.add(Flatten())
model.add(Dense(2, activation="relu"))
model.add(Dense(128, activation="relu"))
model.add(Dense(1, activation="softmax"))
model.compile(optimizer='adam',
loss='mse')
model.fit(X, y, epochs=3, validation_split=0.1, batch_size=100)

Getting bad performance with a simple TensorFlow model

I'm trying to experiment with a simple TensorFlow model built with keras, but I can't figure out why I'm getting such poor predictions. Here's the model:
x_train = np.asarray([[.5], [1.0], [.4], [5], [25]])
y_train = np.asarray([.25, .5, .2, 2.5, 12.5])
opt = keras.optimizers.Adam(lr=0.01)
model = Sequential()
model.add(Dense(1, activation="relu", input_shape=(x_train.shape[1:])))
model.add(Dense(9, activation="relu"))
model.add(Dense(1, activation="relu"))
model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mean_squared_error'])
model.fit(x_train, y_train, shuffle=True, epochs=10)
print(model.predict(np.asarray([[5]])))
As you can see, it should learn to divide the input by two. However the loss is 32.5705, and over a few epochs, it refuses to change whatsoever (even if I do something crazy like 100 epochs, it's always that loss). Is there anything you can see that I'm doing horribly wrong here? The prediction for any value it seems is 0..
It also seems to be randomly switching between performing as expected, and the weird behavior described above. I re-ran it and got a loss of 0.0019 after 200 epochs, but if I re-run it with all the same parameters a second later the loss stays at 30 like before. What's going on here?
Some reasons that I can think of,
training set is too small
learning rate is high
last layer should just be a linear layer
for some runs the ReLU units are dying (see dead ReLU problem) and your network weights don't change after that so you see the same loss value.
In this case maybe a tanh activation will provide better conditioning for optimization
I made a few changes to your code based on what I commented, and I get decent results.
import keras
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
x_train = np.random.random((50000, 1))#np.asarray([[.5], [1.0], [.4], [5], [25]])
y_train = x_train /2. #TODO: add small amount of noise to y #np.asarray([.25, .5, .2, 2.5, 12.5])
opt = keras.optimizers.Adam(lr=0.0005, clipvalue=0.5)
model = Sequential()
model.add(Dense(1, activation="tanh", input_shape=x_train.shape[1:]))
model.add(Dense(9, activation="tanh"))
model.add(Dense(1, activation=None))
model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mean_squared_error'])
model.fit(x_train, y_train, shuffle=True, epochs=10)
print(model.predict(np.asarray([.4322])))
Output:
[[0.21410337]]

Sequentialmodels without `input_shape` passed to the 1st layer cannot reload optimizer state

WARNING:tensorflow:Sequential models without an input_shape passed to the first layer cannot reload their optimizer state. As a result, your model is starting with a freshly initialized optimizer.
while trying to load a saved model i encountered this warning from tensorflow
import tensorflow.keras as keras
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3)
model.save('epic_num_reader.model')
new_model = tf.keras.models.load_model('epic_num_reader.model')
predictions = new_model.predict(x_test)
Had the same problem after upgrading to TF 1.14, I fixed it changing the definition of the first layer from this:
model.add(tf.keras.layers.Flatten())
to this
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
where 28 is the size of the input map to be flattened (mnist pixels in our case)
As the warning suggest, your first layer need the argument input_shape. In your case this would be the layer Flatten.
In the keras Documentation there is an extra section about the sequential API. See here for further information.
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
for the first layer after tf 1.14 it is require to use input type which is the dimensions for the particular image.
Or you might get warning while retrieving model to not get proper working for your optimizer