Here is an example of using batch normalization over a 1-D input vector. Batch normalization is performed over 100 training examples, xTr. Later on, I want to test on just one example, xTe.
import tensorflow as tf
import numpy as np
from tensorflow.contrib.layers import layers
if __name__ == "__main__":
    bn = layers.batch_norm
    nFeats = 3
    nObs = 100
    xTr = np.random.rand(nObs,nFeats) # Train
    xTe = np.random.rand(1,nFeats) # Test
    bnTrain = tf.placeholder(tf.bool)
    X = tf.placeholder(tf.float32,[None,nFeats])
    Y = bn(X,nFeats,is_training=bnTrain) # want to be able to change is_training via a feed_dict.
    init_op = tf.initialize_all_variables()
    with tf.Session() as sess:
        sess.run(init_op)
        yTr_ = Y.eval(feed_dict={X:xTr,bnTrain:True})
        yTe_ = Y.eval(feed_dict={X:xTe,bnTrain:False})
But I can't pass a tf.Tensor to a function expecting a normal Python bool. What is the best way of going about this, so that I can change the bool during a session?
The current implementation of the tf.contrib.layers.batch_norm() function is designed to accept a tf.Tensor as the is_training argument (although this fact doesn't appear to be documented), and looking at the revision history, it was added in the TensorFlow 0.10 release. If you are using an older version, please try upgrading to the latest release (currently 0.12), and your existing code should work. Among other improvements, it contains a fused implementation of batch normalization that should make a significant performance improvement.
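To see why the is_training switch matters for the single test example in the question, here is a minimal NumPy sketch. It is not the TensorFlow implementation: the function name `normalize` and the `eps` value are illustrative assumptions, and the exact training-set statistics stand in for the moving averages a real layer would keep. Normalizing one row with its own batch statistics collapses it to zeros, whereas normalizing it with statistics gathered from the training data does not.

```python
import numpy as np


def normalize(x, mean, var, eps=1e-3):
    """Standardize x feature-wise with the given mean and variance."""
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.RandomState(0)
x_train = rng.rand(100, 3)  # 100 training examples, 3 features
x_test = rng.rand(1, 3)     # a single test example

# "Training mode": statistics come from the batch being normalized.
y_test_batch_stats = normalize(x_test, x_test.mean(axis=0), x_test.var(axis=0))

# "Inference mode": statistics come from the training data instead.
train_mean, train_var = x_train.mean(axis=0), x_train.var(axis=0)
y_test_train_stats = normalize(x_test, train_mean, train_var)

print(y_test_batch_stats)  # all zeros: a single row equals its own mean
print(y_test_train_stats)  # generally non-zero
```

This is why the test-time call in the question needs is_training=False: with a batch of one, batch statistics carry no information.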
The MirroredStrategy IndexError: pop from empty list is by now infamous, and there are numerous possible causes for it, such as those reported in the following questions:
MirroredStrategy IndexError caused by K.clear_session()
MirroredStrategy IndexError within AutoKeras
MirroredStrategy IndexError when training from checkpoint
There are more, but none of them applies to my use case.
In my use case, I'm using Keras Sequence objects to generate the training inputs, as I'm working on large datasets (would not fit in RAM) with a single known positive class and unknown negatives.
Following tutorials such as the one available on the Keras Documentation and TensorFlow documentation my code looks like the following:
my_training_sequence = MySequenceObject()
if tf.config.list_physical_devices('GPU'):
    strategy = tf.distribute.MirroredStrategy(devices)
else:
    # Use the Default Strategy
    strategy = tf.distribute.get_strategy()
with strategy.scope():
    model = CreateMyKerasModel()
    # While in the TensorFlow documentation the compilation step
    # is shown OUTSIDE the scope, in the Keras one it happens
    # within the scope.
    # I have found out that it is NECESSARY to place it inside the scope
    # as the Keras Metrics need to be in the same strategy scope as the model
    # to work properly.
    model.compile(...)
# Then, OUTSIDE of the scope, run the fit
# which causes the IndexError
model.fit(my_training_sequence)
Any ideas on how to deal with this?
After much pain, I realized that in the Keras Documentation they make use of TensorFlow Dataset objects.
Normal inputs, such as vectors, are converted to Datasets within the fit process and do not cause problems for this reason, but Keras currently does not support the automatic conversion of Keras Sequences into Datasets under the hood. While I do not know why this is, fortunately it is relatively easy to write a method that converts a Sequence into a Dataset.
Unfortunately, it depends on the version of TensorFlow you are using: in recent versions you want to use TensorSpec objects, while in older ones the combination of TensorFlow data types and TensorShape will do.
In the following example, I will show a high-level approach to writing a Keras Sequence class that can be converted to a Dataset. Afterwards, I will link to all the Keras Sequences I have already implemented in this fashion as examples for posterity (or for myself, once I forget some of the details of this devilish thing).
import tensorflow as tf
import numpy as np
from packaging import version
from validate_version_code import validate_version_code
def tensorflow_version_is_higher_or_equal_than(tensorflow_version: str) -> bool:
    """Return whether the TensorFlow version is higher than or equal to the provided one.

    Parameters
    ----------------------
    tensorflow_version: str,
        The version of TensorFlow to check against.

    Raises
    ----------------------
    ValueError,
        If the provided version code is not a valid one.

    Returns
    ----------------------
    Boolean representing if installed TensorFlow version is higher than or equal to given one.
    """
    if not validate_version_code(tensorflow_version):
        raise ValueError(
            (
                "The provided TensorFlow version code `{}` "
                "is not a valid version code."
            ).format(tensorflow_version)
        )
    return version.parse(tf.__version__) >= version.parse(tensorflow_version)
class ExampleSequence:
    """Keras Sequence convertible into a TensorFlow Dataset."""

    def __init__(
        self,
        # The parameter without a default must come first,
        # otherwise Python raises a SyntaxError.
        batches_per_epoch: int,
        batch_size: int = 32,
        # Your other parameters go here
    ):
        """
        Parameters
        --------------------------------
        batches_per_epoch: int
            The number of batches within an epoch
        batch_size: int = 32
            Size for the batches to generate,
            if the size is expected to be CONSTANT
            otherwise use None if some batches have different size
        """
        self._batch_size = batch_size
        self._batches_per_epoch = batches_per_epoch
        # Initialize the index of the batch for the Dataset calls
        self._current_index = 0
        # Your other parameters go here

    def __call__(self):
        """Return next batch using an infinite generator model."""
        self._current_index = (self._current_index + 1) % self._batches_per_epoch
        return self[self._current_index]
    def into_dataset(self) -> tf.data.Dataset:
        """Return dataset generated out of the current sequence instance.

        Implementation details
        ---------------------------------
        This method handles the conversion of this Keras Sequence into
        a TensorFlow dataset, also handling the proper dispatching according
        to what version of TensorFlow is installed in this system.

        Returns
        ----------------------------------
        Dataset to be used for the training of a model
        """
        ##################################################################
        # Handling dataset creation when TensorFlow is a modern version. #
        ##################################################################
        if tensorflow_version_is_higher_or_equal_than("2.5.0"):
            return tf.data.Dataset.from_generator(
                self,
                output_signature=(
                    (
                        tf.TensorSpec(
                            shape=(self._batch_size, 10),
                            dtype=tf.uint32
                        ),  # trailing comma: a 1-tuple, matching (X, ) below
                    ),
                    tf.TensorSpec(
                        shape=(self._batch_size,),
                        dtype=tf.bool
                    )
                )
            )
        # Older TensorFlow versions: data types and shapes are given separately.
        return tf.data.Dataset.from_generator(
            self,
            output_types=(
                (tf.uint32, ),
                tf.bool
            ),
            output_shapes=(
                (tf.TensorShape([self._batch_size, 10]),),
                tf.TensorShape([self._batch_size, ]),
            )
        )
    def __getitem__(self, idx: int):
        """Return batch corresponding to given index.

        Parameters
        ---------------
        idx: int,
            Index corresponding to batch to be returned.

        Returns
        ---------------
        Return Tuple containing X and Y numpy arrays corresponding to given batch index.
        """
        # Dummy data: np.random.randint takes `size` (not `shape`) and needs
        # explicit bounds; the upper bound 100 is arbitrary.
        X = np.random.randint(0, 100, size=(self._batch_size, 10), dtype=np.uint32)
        y = np.random.randint(0, 2, size=(self._batch_size, ), dtype=bool)
        # Please do observe that the return type
        # has multiple layers of tuple wrapping, and they are ALL needed!
        # It is weird, but it is the only way this thing worked.
        return (((X, ), y,),)
And then, when you run the fit, you can use:
model.fit(my_training_sequence.into_dataset())
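The extra tuple layer in `__getitem__` is consistent with how `tf.data.Dataset.from_generator` consumes its argument: it calls the callable and then iterates over whatever comes back. The pure-Python sketch below imitates only that call-then-iterate step, without TensorFlow. `FakeSequence` and `consume` are hypothetical stand-ins, and strings replace the arrays.

```python
class FakeSequence:
    """Hypothetical stand-in mirroring the return shape used above."""

    def __getitem__(self, idx):
        X, y = "X_batch_%d" % idx, "y_batch_%d" % idx
        # Outer 1-tuple: the iterable; inner pair: one dataset element.
        return (((X,), y),)

    def __call__(self):
        return self[0]


def consume(generator):
    """Imitate the call-then-iterate step: collect every yielded element."""
    return [element for element in generator()]


elements = consume(FakeSequence())
print(elements)  # [(('X_batch_0',), 'y_batch_0')]
```

Without the outer tuple, iterating would yield `(X,)` and `y` as two separate elements, which does not match a single (features, labels) structure.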
I am trying to find out exactly how the BatchNormalization layer behaves in TensorFlow. I came up with the following piece of code, which to the best of my knowledge should be a perfectly valid Keras model; however, the mean and variance of BatchNormalization don't appear to be updated.
From docs https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization
in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
I expect the model to return a different value with each subsequent predict call.
What I see, however, are the exact same values returned 10 times.
Can anyone explain to me why the BatchNormalization layer does not update its internal values?
import tensorflow as tf
import numpy as np
if __name__ == '__main__':
    np.random.seed(1)
    x = np.random.randn(3, 5) * 5 + 0.3
    bn = tf.keras.layers.BatchNormalization(trainable=False, epsilon=1e-9)
    z = input = tf.keras.layers.Input([5])
    z = bn(z)
    model = tf.keras.Model(inputs=input, outputs=z)
    for i in range(10):
        print(x)
        print(model.predict(x))
        print()
I use TensorFlow 2.1.0
Okay, I found the mistake in my assumptions. The moving average is being updated during training not during inference as I thought. This makes perfect sense, as updating the moving averages during inference would likely result in an unstable production model (for example a long sequence of highly pathological input samples [e.g. such that their generating distribution differs drastically from the one on which the network was trained] could potentially bias the network and result in worse performance on valid input samples).
The trainable parameter is useful when you're fine-tuning a pretrained model and want to freeze some of the layers of the network even during training. It is not needed for inference, because when you call model.predict(x) (or even model(x) or model(x, training=False)), the layer automatically uses the moving averages instead of the batch averages.
The code below demonstrates this clearly:
import tensorflow as tf
import numpy as np
if __name__ == '__main__':
    np.random.seed(1)
    x = np.random.randn(10, 5) * 5 + 0.3
    z = input = tf.keras.layers.Input([5])
    z = tf.keras.layers.BatchNormalization(trainable=True, epsilon=1e-9, momentum=0.99)(z)
    model = tf.keras.Model(inputs=input, outputs=z)
    # a dummy loss function
    model.compile(loss=lambda x, y: (x - y) ** 2)
    # a dummy fit just to update the batchnorm moving averages
    model.fit(x, x, batch_size=3, epochs=10)
    # first predict uses the moving averages from training
    pred = model(x).numpy()
    print(pred.mean(axis=0))
    print(pred.var(axis=0))
    print()
    # outputs the same thing as previous predict
    pred = model(x).numpy()
    print(pred.mean(axis=0))
    print(pred.var(axis=0))
    print()
    # here calling the model with training=True results in update of moving averages
    # furthermore, it uses the batch mean and variance as in training,
    # so the result is very different
    pred = model(x, training=True).numpy()
    print(pred.mean(axis=0))
    print(pred.var(axis=0))
    print()
    # here we see again that the moving averages are used but they differ slightly after
    # the previous call, as expected
    pred = model(x).numpy()
    print(pred.mean(axis=0))
    print(pred.var(axis=0))
    print()
In the end, I found that the documentation (https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization) mentions this:
When performing inference using a model containing batch normalization, it is generally (though not always) desirable to use accumulated statistics rather than mini-batch statistics. This is accomplished by passing training=False when calling the model, or using model.predict.
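As a rough illustration of that bookkeeping, the pure-Python sketch below tracks a single feature's moving mean with the update rule given in the Keras documentation, moving = moving * momentum + batch * (1 - momentum). The class name `TinyBatchNormStats` is made up for illustration; this is not the Keras implementation, and variance, gamma and beta are left out.

```python
class TinyBatchNormStats:
    """Hypothetical tracker of one feature's moving mean."""

    def __init__(self, momentum=0.99):
        self.momentum = momentum
        self.moving_mean = 0.0  # Keras also starts the moving mean at zero

    def __call__(self, batch, training=False):
        batch_mean = sum(batch) / len(batch)
        if training:
            # Training: update the moving mean and center with the batch mean.
            self.moving_mean = (self.moving_mean * self.momentum
                                + batch_mean * (1.0 - self.momentum))
            return [v - batch_mean for v in batch]
        # Inference: the moving mean is read but never modified.
        return [v - self.moving_mean for v in batch]


stats = TinyBatchNormStats()
batch = [1.0, 2.0, 3.0]

stats(batch)                     # inference call
after_inference = stats.moving_mean
stats(batch, training=True)      # training call
after_training = stats.moving_mean
print(after_inference, after_training)
```

Only the training call moves the stored statistic, which matches the behaviour observed with model.predict above.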
Hopefully this will help someone with a similar misunderstanding in the future.
I'm working at a slightly lower-level of Keras than the Model fit API. I would like to be able to set the state of a newly constructed optimizer to the state of it from previous training.
The get_weights and set_weights methods seem promising; they just return and receive numpy arrays or standard scalar data for the state of the optimizer. However, the problem is you cannot set_weights if the weights have not yet been created, and as far as I can tell, the only public way they get created is on the first call to apply_gradients.
For example, the following fails because opt2 will not have its weights created.
import tensorflow as tf
import numpy as np
opt1 = tf.keras.optimizers.Adam()
opt2 = tf.keras.optimizers.Adam()
layer = tf.keras.layers.Dense(1)
# dummy data
x = np.array([[-1, 1], [1, 1]])
y = np.array([[-1], [1]])
# do one optimization step
with tf.GradientTape() as tape:
    loss = (layer(x) - y)**2
grads = tape.gradient(loss, layer.trainable_weights)
opt1.apply_gradients(zip(grads, layer.trainable_weights))
# copy state to optimizer 2
opt2.set_weights(opt1.get_weights()) # this fails!
Let's assume I do have on hand the relevant model weights on which the optimizer operates. What is the right way to restore the state? Based on the implementation of the apply_gradients method, it seems like this is the path:
_ = opt2.iterations # must be called to make this weight appear
opt2._create_hypers()
opt2._create_slots(layer.trainable_weights)
# now we can safely set weights
opt2.set_weights(opt1.get_weights())
But that feels really hacky to me and prone to fail if implementation details change at a future point. Are there better approaches that I'm missing?
Following the upgrade to Keras 2.0.9, I have been using the multi_gpu_model utility but I can't save my models or best weights using
model.save('path')
The error I get is
TypeError: can't pickle module objects
I suspect there is some problem gaining access to the model object. Is there a workaround for this issue?
To be honest, the easiest approach to this is to actually examine the multi gpu parallel model using
parallel_model.summary()
(The parallel model is simply the model after applying the multi_gpu_model function.) The summary clearly highlights the actual model (I think in the penultimate layer; I am not at my computer right now). Then you can use the name of this layer to retrieve the model and save it.
model = parallel_model.get_layer('sequential_1')
Often it's called sequential_1, but if you are using a published architecture, it may be 'googlenet' or 'alexnet'. You will see the name of the layer in the summary.
Then it's simple to just save:
model.save('path')
Maxim's approach works, but it's overkill, I think.
Note: you will need to compile both the model and the parallel model.
Workaround
Here's a patched version that doesn't fail while saving:
from keras.layers import Lambda, concatenate
from keras import Model
import tensorflow as tf
def multi_gpu_model(model, gpus):
    if isinstance(gpus, (list, tuple)):
        num_gpus = len(gpus)
        target_gpu_ids = gpus
    else:
        num_gpus = gpus
        target_gpu_ids = range(num_gpus)

    def get_slice(data, i, parts):
        shape = tf.shape(data)
        batch_size = shape[:1]
        input_shape = shape[1:]
        step = batch_size // parts
        if i == num_gpus - 1:
            size = batch_size - step * i
        else:
            size = step
        size = tf.concat([size, input_shape], axis=0)
        stride = tf.concat([step, input_shape * 0], axis=0)
        start = stride * i
        return tf.slice(data, start, size)

    all_outputs = []
    for i in range(len(model.outputs)):
        all_outputs.append([])
    # Place a copy of the model on each GPU,
    # each getting a slice of the inputs.
    for i, gpu_id in enumerate(target_gpu_ids):
        with tf.device('/gpu:%d' % gpu_id):
            with tf.name_scope('replica_%d' % gpu_id):
                inputs = []
                # Retrieve a slice of the input.
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_i = Lambda(get_slice,
                                     output_shape=input_shape,
                                     arguments={'i': i,
                                                'parts': num_gpus})(x)
                    inputs.append(slice_i)
                # Apply model on slice
                # (creating a model replica on the target device).
                outputs = model(inputs)
                if not isinstance(outputs, list):
                    outputs = [outputs]
                # Save the outputs for merging back together later.
                for o in range(len(outputs)):
                    all_outputs[o].append(outputs[o])
    # Merge outputs on CPU.
    with tf.device('/cpu:0'):
        merged = []
        for name, outputs in zip(model.output_names, all_outputs):
            merged.append(concatenate(outputs,
                                      axis=0, name=name))
        return Model(model.inputs, merged)
You can use this multi_gpu_model function, until the bug is fixed in keras. Also, when loading the model, it's important to provide the tensorflow module object:
model = load_model('multi_gpu_model.h5', {'tf': tf})
How it works
The problem is with import tensorflow line in the middle of multi_gpu_model:
def multi_gpu_model(model, gpus):
    ...
    import tensorflow as tf
    ...
This creates a closure for the get_slice lambda function, which includes the number of gpus (that's ok) and tensorflow module (not ok). Model save tries to serialize all layers, including the ones that call get_slice and fails exactly because tf is in the closure.
The solution is to move import out of multi_gpu_model, so that tf becomes a global object, though still needed for get_slice to work. This fixes the problem of saving, but in loading one has to provide tf explicitly.
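The difference between the two import placements can be reproduced with the standard library alone. In the sketch below, `math` stands in for the tensorflow module and `pickle` stands in for whichever serialization step Keras performs during save (both substitutions are assumptions made for illustration); the function names are hypothetical.

```python
import math
import pickle


def make_local_import():
    import math as m  # imported inside the enclosing function

    def area(r):
        return m.pi * r * r  # `m` is captured in the closure

    return area


def area_global(r):
    return math.pi * r * r  # `math` is looked up in module globals


local_area = make_local_import()
closure_values = [cell.cell_contents for cell in local_area.__closure__]
print(closure_values)           # contains the math module object
print(area_global.__closure__)  # None: nothing captured

try:
    pickle.dumps(closure_values[0])
    module_pickled = True
except TypeError as err:
    module_pickled = False
    print("TypeError:", err)
```

The function that relies on a module-level import carries no module in its closure, so there is nothing unpicklable attached to it.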
This needs a small workaround: load the multi_gpu_model weights into the regular model and save the regular model.
e.g.
#1, instantiate your base model on a cpu
with tf.device("/cpu:0"):
    model = create_model()
#2, put your model to multiple gpus, say 2
multi_model = multi_gpu_model(model, 2)
#3, compile both models
model.compile(loss=your_loss, optimizer=your_optimizer(lr))
multi_model.compile(loss=your_loss, optimizer=your_optimizer(lr))
#4, train the multi gpu model
# multi_model.fit() or multi_model.fit_generator()
#5, save weights
model.set_weights(multi_model.get_weights())
model.save(filepath=filepath)
Reference: https://github.com/fchollet/keras/issues/8123
I'm trying to write my own cost function in TensorFlow, but apparently I cannot 'slice' the tensor object?
import tensorflow as tf
import numpy as np
# Establish variables
x = tf.placeholder("float", [None, 3])
W = tf.Variable(tf.zeros([3,6]))
b = tf.Variable(tf.zeros([6]))
# Establish model
y = tf.nn.softmax(tf.matmul(x,W) + b)
# Truth
y_ = tf.placeholder("float", [None,6])
def angle(v1, v2):
    return np.arccos(np.sum(v1*v2,axis=1))
def normVec(y):
    return np.cross(y[:,[0,2,4]],y[:,[1,3,5]])
angle_distance = -tf.reduce_sum(angle(normVec(y_),normVec(y)))
# This is the example code they give for cross entropy
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
I get the following error:
TypeError: Bad slice index [0, 2, 4] of type <type 'list'>
At present, TensorFlow can't gather on axes other than the first; the feature has been requested.
But for what you want to do in this specific situation, you can transpose, then gather 0,2,4, and then transpose back. It won't be crazy fast, but it works:
tf.transpose(tf.gather(tf.transpose(y), [0,2,4]))
This is a useful workaround for some of the limitations in the current implementation of gather.
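The same transpose, gather, transpose pattern can be checked in NumPy, where `np.take` along axis 0 plays the role of tf.gather. This is only an analogy on a small made-up array, not TensorFlow code.

```python
import numpy as np

y = np.arange(12).reshape(2, 6)  # 2 rows, 6 columns
cols = [0, 2, 4]

# Gather along the first axis of the transpose, then transpose back.
via_transpose = np.take(y.T, cols, axis=0).T

# Direct column selection, for comparison.
direct = y[:, cols]

print(via_transpose)
```

Both routes select columns 0, 2 and 4 of every row.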
It is also correct that you can't use a NumPy slice on a TensorFlow node (you can run it and slice the output), and that you need to initialize those variables before you run. You're mixing tf and np in a way that doesn't work.
x = tf.Something(...)
is a tensorflow graph object. Numpy has no idea how to cope with such objects.
foo = sess.run(x)
(where sess is a tf.Session) is back to an object Python can handle.
You typically want to keep your loss calculation in pure tensorflow, so do the cross and other functions in tf. You'll probably have to do the arccos the long way, as tf doesn't have a function for it.
I just realized that the following failed:
cross_entropy = -tf.reduce_sum(y_*np.log(y))
You can't use NumPy functions on tf objects, and the indexing may be different too.
I think you can use "Wraps Python function" method in tensorflow. Here's the link to the documentation.
And as for the people who answered "Why don't you just use tensorflow's built in function to construct it?" - sometimes the cost function people are looking for cannot be expressed in tf's functions or extremely difficult.
This is because you have not initialized your variables, so they do not hold any tensor values yet (you can read more in my answer here).
Just do something like this:
def normVec(y):
    print(y)
    return np.cross(y[:,[0,2,4]],y[:,[1,3,5]])
t1 = normVec(y_)
# and comment everything after it.
You will see that you do not have tensor values yet, only Tensor("Placeholder_1:0", shape=TensorShape([Dimension(None), Dimension(6)]), dtype=float32).
Try initializing your variables
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
and evaluate your variable with sess.run(y). P.S. You have not fed your placeholders yet.