I'm trying to get the batch size inside the call() method of a TF2 model, but every method I know of returns None or a Tensor instead of a dimension tuple.
Here is a short example:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Model
class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()

    def call(self, x):
        print(len(x))
        print(x.shape)
        print(tf.size(x))
        print(np.shape(x))
        print(x.get_shape())
        print(x.get_shape().as_list())
        print(tf.rank(x))
        print(tf.shape(x))
        print(tf.shape(x)[0])
        print(tf.shape(x)[1])
        return tf.random.uniform((2, 10))
m = MyModel()
m.compile(optimizer="Adam", loss="sparse_categorical_crossentropy", metrics=['accuracy'])
m.fit(np.array([[1,2,3,4], [5,6,7,8]]), np.array([0, 1]), epochs=1)
The output is:
Tensor("my_model_26/strided_slice:0", shape=(), dtype=int32)
(None, 4)
Tensor("my_model_26/Size:0", shape=(), dtype=int32)
(None, 4)
(None, 4)
[None, 4]
Tensor("my_model_26/Rank:0", shape=(), dtype=int32)
Tensor("my_model_26/Shape_2:0", shape=(2,), dtype=int32)
Tensor("my_model_26/strided_slice_1:0", shape=(), dtype=int32)
Tensor("my_model_26/strided_slice_2:0", shape=(), dtype=int32)
1/1 [==============================] - 0s 1ms/step - loss: 3.1796 - accuracy: 0.0000e+00
In this example I fed a (2, 4) NumPy array as input and a (2,) array as the target. But as you can see, I cannot get the batch size inside call().
I need it because in my real model I have to iterate over tensors along a batch dimension that is dynamic: for example, if the dataset size is 10 and the batch size is 3, the last batch will have size 1. So I have to know the batch size at runtime.
Can anyone help me?
TensorFlow 2.3.3
CUDA 10.2
Python 3.6.9
It's because you're using TensorFlow (that's unavoidable, since Keras is now part of TensorFlow), and by using TensorFlow you need to be aware of how the dynamic graph is "compiled" into a static graph.
In short, your call method is (under the hood) decorated with the @tf.function decorator.
This decorator:
Traces the Python function execution
Converts the Python operations into TensorFlow operations (e.g. if a > b becomes tf.cond(tf.greater(a, b), something, something_else))
Creates a tf.Graph (the static graph)
Executes the static graph just created.
All your print calls are executed during the first step (the Python tracing); that's why, even if you train your model, you see the output only once.
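To make the tracing behaviour concrete, here is a minimal standalone sketch (not from the original answer): print fires only while the function is being traced, while tf.print fires on every execution of the compiled graph.
import tensorflow as tf

@tf.function
def f(x):
    print("tracing")       # executed only during tracing
    tf.print("executing")  # executed every time the graph runs
    return x * 2

f(tf.constant(1))  # prints "tracing" and "executing"
f(tf.constant(2))  # prints only "executing" (same input signature, no retrace)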
That said, to get the runtime (dynamic) shape of a tensor you must use tf.shape(x); the batch size is just batch_size = tf.shape(x)[0].
Please note that if you want to see the shape values during graph execution you can't use print; you must use tf.print.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Model
class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()

    def call(self, x):
        shape = tf.shape(x)
        batch_size = shape[0]
        tf.print(shape, batch_size)
        return tf.random.uniform((2, 10))
m = MyModel()
m.compile(
    optimizer="Adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)
m.fit(np.array([[1, 2, 3, 4], [5, 6, 7, 8]]), np.array([0, 1]), epochs=1)
More information about static and dynamic shapes: https://pgaleone.eu/tensorflow/2018/07/28/understanding-tensorflow-tensors-shape-static-dynamic/
More info about the tf.function behavior: https://pgaleone.eu/tensorflow/tf.function/2019/03/21/dissecting-tf-function-part-1/
Note: I wrote these articles.
If you want to see the actual data and shapes, you can set run_eagerly=True, but it is not a good long-term solution, since it makes training slow. Set it like this:
m.compile(optimizer="Adam", loss="sparse_categorical_crossentropy",
          metrics=['accuracy'], run_eagerly=True)
Then the output will be:
(2, 4)
tf.Tensor(8, shape=(), dtype=int32)
(2, 4)
(2, 4)
[2, 4]
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor([2 4], shape=(2,), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(4, shape=(), dtype=int32)
Related
I want to create a simple toy model in Keras. The model should take an input, add 1 to every element, and produce an output.
I found an example using Keras, but it requires two inputs:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# create model
input1 = layers.Input(shape=(2,))
input2 = layers.Input(shape=(2,))
added = layers.Add()([input1, input2])
model = keras.models.Model(inputs=[input1, input2], outputs=added)
# run inference
input_shape = (2,)
x1 = tf.ones(input_shape)
x2 = tf.ones(input_shape)
y = model([x1, x2])
However, I need the model to only have a single input and simply increase every input value by 1, for example.
You can replace the second input of your toy model with a call to tf.ones_like:
input1 = layers.Input(shape=())
added = layers.Add()([input1, tf.ones_like(input1)])
model = keras.models.Model(inputs=input1, outputs=added)
tf.ones_like creates a tensor full of ones of the shape of the tensor passed as an argument. As this op depends only on the shape of the input tensor, you can technically create your network without a specified input shape, and it will accept any shape as input:
>>> model(3)
<tf.Tensor: shape=(), dtype=float32, numpy=4.0>
>>> model(tf.ones((1,2,3)))
<tf.Tensor: shape=(1, 2, 3), dtype=float32, numpy=
array([[[2., 2., 2.],
[2., 2., 2.]]], dtype=float32)>
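As an aside (not part of the original answer), a Lambda layer is another common way to express "add 1 to every element" with a single input; a minimal sketch:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Lambda applies an arbitrary function to its input tensor.
inp = layers.Input(shape=(2,))
out = layers.Lambda(lambda t: t + 1.0)(inp)
model = keras.models.Model(inputs=inp, outputs=out)

model(tf.ones((1, 2)))  # -> [[2., 2.]]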
I am reading about tf.Variable in the TensorFlow r2.0 docs:
import tensorflow as tf
# Create a variable.
w = tf.constant([1, 2, 3, 4], tf.float32, shape=[2, 2])
# Use the variable in the graph like any Tensor.
y = tf.matmul(w, tf.constant([7, 8, 9, 10], tf.float32, shape=[2, 2]))
v = tf.Variable(w)
# The overloaded operators are available too.
z = tf.sigmoid(w + y)
tf.shape(z)
# Assign a new value to the variable with `assign()` or a related method.
v.assign(w + 1)
v.assign_add(tf.constant([1.0, 21]))
ValueError: Shapes must be equal rank, but are 2 and 1 for
'AssignAddVariableOp_4' (op: 'AssignAddVariableOp') with input shapes:
[], 2.
Also, how come the following returns False?
tf.shape(v) == tf.shape(tf.constant([1.0, 21], tf.float32))
My other question: since we are in TF2, we should not use tf.Session() anymore, correct? It seems we should never call session.run(), yet the API documentation keeps doing it via tf.compat.v1, etc. So why are they using it in the TF2 docs?
Any help would be appreciated.
CS
As the error says, assign_add on v expects a value of the same shape as v, which is [2, 2]. If you pass a value with any shape other than the shape of the variable you are updating, you will get this error.
Below is the modified code with the expected shape for the operation:
import tensorflow as tf
# Create a variable.
w = tf.constant([1, 2, 3, 4], tf.float32, shape=[2, 2])
# Use the variable in the graph like any Tensor.
y = tf.matmul(w, tf.constant([7, 8, 9, 10], tf.float32, shape=[2, 2]))
v = tf.Variable(w)
# The overloaded operators are available too.
z = tf.sigmoid(w + y)
tf.shape(z)
# Assign a new value to the variable with `assign()` or a related method.
v.assign(w + 1)
print(v)
v.assign_add(tf.constant([1, 2, 3, 4], tf.float32, shape=[2, 2]))
Output for v:
<tf.Variable 'UnreadVariable' shape=(2, 2) dtype=float32, numpy=
array([[3., 5.],
[7., 9.]], dtype=float32)>
Now the following tensor comparison returns True element-wise. Note that == on eager tensors is an element-wise comparison that yields a boolean tensor, not a single Python bool:
tf.shape(v) == tf.shape(tf.constant([1.0, 21],tf.float32))
<tf.Tensor: shape=(2,), dtype=bool, numpy=array([ True, True])>
Coming to your tf.Session() question: in TensorFlow 2.0 eager execution is enabled by default, but if you need to, you can still disable eager execution and use tf.Session, like below.
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
hello = tf.constant('Hello, TensorFlow!')
sess = tf.compat.v1.Session()
print(sess.run(hello))
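For comparison, the idiomatic TF2 version of the same program needs no session at all, since ops run eagerly; a minimal sketch:
import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
print(hello.numpy())  # b'Hello, TensorFlow!' -- the tensor's value, no sess.run needed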
I want to build a loss in a "Pythonic" way using TF2's eager execution, but even in eager mode Keras passes non-eager tensors to it.
Code:
def conditional_loss(self, y_true, y_pred):
    print(y_true)
    return 0

def define_model(self):
    self.model = keras.Sequential([
        keras.layers.Dense(units=768),
        keras.layers.BatchNormalization(),
        keras.layers.ReLU(),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(units=128),
        keras.layers.BatchNormalization(),
        keras.layers.ReLU(),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(units=5, activation='softmax')
    ])
    self.model.compile(optimizer='adam',
                       loss=self.conditional_loss,
                       metrics=[self.conditional_loss,
                                keras.metrics.sparse_categorical_accuracy]
                       )
    self.model.fit(
        self.train_dataset,
        epochs=10,
        validation_data=self.test_dataset,
        callbacks=[tensorboard_callback, model_callback],
    )
If I print y_true in conditional_loss, TF prints a non-eager (symbolic) tensor:
Tensor("metrics/conditional_loss/Cast:0", shape=(None, 1), dtype=float32)
If I build my own keras.Model() I can pass the argument dynamic=True to enable eager execution (reference). Is there a way to do it with keras.Sequential()?
To do that you have to call model.compile() with the argument run_eagerly=True. Following the question's example:
self.model.compile(optimizer='adam',
                   loss=self.conditional_loss,
                   metrics=[self.conditional_loss,
                            keras.metrics.sparse_categorical_accuracy],
                   run_eagerly=True
                   )
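As a side note (assuming a reasonably recent TF2 release), run_eagerly is also exposed as a settable attribute on the model, which is handy for toggling it temporarily while debugging:
# Toggle eager execution of the train/test steps after compiling:
self.model.run_eagerly = True   # debug with ordinary Python prints
# ... inspect y_true, y_pred, etc. ...
self.model.run_eagerly = False  # back to compiled-graph speed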
In TensorFlow 2.0 I am trying to create a keras.layers.Layer that outputs the Kullback-Leibler (KL) divergence between two tensorflow_probability distributions. I would like to calculate the gradient of the output (i.e. the KL divergence) with respect to the mean of one of the distributions.
In all my attempts so far, the resulting gradients are 0, unfortunately.
I implemented the minimal example shown below. I was wondering if the problem might have to do with the eager execution mode of TF2, as I know of a similar approach that worked in TF1, where eager execution is disabled by default.
This is the minimal example I tried:
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Layer,Input
# 1 Define Layer
class test_layer(Layer):
    def __init__(self, **kwargs):
        super(test_layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.mean_W = self.add_weight('mean_W', trainable=True)
        self.kernel_dist = tfp.distributions.MultivariateNormalDiag(
            loc=self.mean_W,
            scale_diag=(1.,)
        )
        super(test_layer, self).build(input_shape)

    def call(self, x):
        return tfp.distributions.kl_divergence(
            self.kernel_dist,
            tfp.distributions.MultivariateNormalDiag(
                loc=self.mean_W * 0.,
                scale_diag=(1.,)
            )
        )
# 2 Create model
x = Input(shape=(3,))
fx = test_layer()(x)
test_model = Model(name='test_random', inputs=[x], outputs=[fx])
# 3 Calculate gradient
print('\n\n\nCalculating gradients: ')
# example data, only used as a dummy
x_data = np.random.rand(99,3).astype(np.float32)
for x_now in np.split(x_data, 3):
    # print(x_now.shape)
    with tf.GradientTape() as tape:
        fx_now = test_model(x_now)
    grads = tape.gradient(
        fx_now,
        test_model.trainable_variables,
    )
    print('\nKL-Divergence: ', fx_now, '\nGradient: ', grads, '\n')
print(test_model.summary())
The output of the code above is
Calculating gradients:
KL-Divergence: tf.Tensor(0.0029436834, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=237, shape=(), dtype=float32, numpy=0.0>]
KL-Divergence: tf.Tensor(0.0029436834, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=358, shape=(), dtype=float32, numpy=0.0>]
KL-Divergence: tf.Tensor(0.0029436834, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=479, shape=(), dtype=float32, numpy=0.0>]
Model: "test_random"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 3)] 0
_________________________________________________________________
test_layer_3 (test_layer) () 1
=================================================================
Total params: 1
Trainable params: 1
Non-trainable params: 0
_________________________________________________________________
None
The KL divergence is calculated correctly, but the resulting gradient is 0. What would be the correct way to obtain the gradients?
We are working our way through distributions & bijectors, making them friendly to closing over variables in the constructor. (We have not yet done the MVNs.) In the meantime, you could use tfd.Independent(tfd.Normal(loc=self.mean_W, scale=1), reinterpreted_batch_ndims=1), which I think will work inside your build method because we've already adapted Normal.
Also: have you seen the tfp.layers package? In particular, https://www.tensorflow.org/probability/api_docs/python/tfp/layers/KLDivergenceAddLoss might be interesting to you.
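For concreteness, a minimal sketch of that substitution, assuming mean_W is created with an explicit shape=(1,) so that Independent has a batch dimension to reinterpret:
import tensorflow_probability as tfp
tfd = tfp.distributions

# In build(): give the weight an explicit 1-D shape...
self.mean_W = self.add_weight('mean_W', shape=(1,), trainable=True)
# ...and close over the variable with Normal, which already supports this:
self.kernel_dist = tfd.Independent(
    tfd.Normal(loc=self.mean_W, scale=1.),
    reinterpreted_batch_ndims=1,
)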
If anybody is interested, I found out how to solve this: the line
self.kernel_dist = tfp.distributions.MultivariateNormalDiag(
    loc=self.mean_W,
    scale_diag=(1.,)
)
should not be inside the build() method of the layer class, but rather inside the call() method. The reason is that when the distribution is constructed in build(), the variable mean_W is read (converted to a tensor) outside of any GradientTape, so the tape never records the dependence on it; constructing the distribution inside call() makes the read happen while the tape is watching. Here is the modified example:
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Layer,Input
# 1 Define Layer
class test_layer(Layer):
    def __init__(self, **kwargs):
        super(test_layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.mean_W = self.add_weight('mean_W', trainable=True)
        super(test_layer, self).build(input_shape)

    def call(self, x):
        self.kernel_dist = tfp.distributions.MultivariateNormalDiag(
            loc=self.mean_W,
            scale_diag=(1.,)
        )
        return tfp.distributions.kl_divergence(
            self.kernel_dist,
            tfp.distributions.MultivariateNormalDiag(
                loc=self.mean_W * 0.,
                scale_diag=(1.,)
            )
        )
# 2 Create model
x = Input(shape=(3,))
fx = test_layer()(x)
test_model = Model(name='test_random', inputs=[x], outputs=[fx])
# 3 Calculate gradient
print('\n\n\nCalculating gradients: ')
# example data, only used as a dummy
x_data = np.random.rand(99,3).astype(np.float32)
for x_now in np.split(x_data, 3):
    # print(x_now.shape)
    with tf.GradientTape() as tape:
        fx_now = test_model(x_now)
    grads = tape.gradient(
        fx_now,
        test_model.trainable_variables,
    )
    print('\nKL-Divergence: ', fx_now, '\nGradient: ', grads, '\n')
print(test_model.summary())
The output now is
Calculating gradients:
KL-Divergence: tf.Tensor(0.024875917, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=742, shape=(), dtype=float32, numpy=0.22305119>]
KL-Divergence: tf.Tensor(0.024875917, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=901, shape=(), dtype=float32, numpy=0.22305119>]
KL-Divergence: tf.Tensor(0.024875917, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=1060, shape=(), dtype=float32, numpy=0.22305119>]
Model: "test_random"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 3)] 0
_________________________________________________________________
test_layer_1 (test_layer) () 1
=================================================================
Total params: 1
Trainable params: 1
Non-trainable params: 0
_________________________________________________________________
None
as expected.
Is this something that changed from TensorFlow 1 to TensorFlow 2?
It seems that the Keras trainable attribute is ignored by TensorFlow, which makes it very inconvenient to use Keras as a syntactic shortcut in TensorFlow.
For example:
import keras
import tensorflow as tf
import numpy as np
import keras.backend as K
Conv2 = keras.layers.Conv2D(filters=16, kernel_size=3, padding='same')
Conv2.trainable = False  # This layer has been set to not trainable.
A = keras.layers.Input(batch_shape=(1, 16, 16, 3))
B = Conv2(A)
x = np.random.randn(1, 16, 16, 3)
y = np.random.randn(1, 16, 16, 16)
True_y = tf.placeholder(shape=(1, 16, 16, 16), dtype=tf.float32)
loss = tf.reduce_sum((B - True_y) ** 2)
opt_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
print(tf.trainable_variables())
# [<tf.Variable 'conv2d_1/kernel:0' shape=(3, 3, 3, 16) dtype=float32_ref>, <tf.Variable 'conv2d_1/bias:0' shape=(16,) dtype=float32_ref>]
sess = K.get_session()
for _ in range(10):
    out = sess.run([opt_op, loss], feed_dict={A: x, True_y: y})
    print(out[1])
OutPut:
5173.94
4968.7754
4785.889
4624.289
4482.1
4357.5757
4249.1504
4155.329
4074.634
4005.6482
The loss keeps decreasing, which means the weights are still being trained even though the layer was marked as not trainable.
I read the blog post "Keras as a simplified interface to TensorFlow", but it mentions nothing about this trainable problem.
Any suggestion is appreciated.
Your conclusion is basically correct. Keras is a wrapper around TensorFlow, but not all Keras functionality transfers directly into raw TensorFlow, so you need to be careful when you mix the two.
Specifically, in this case, if you want to call the minimize function yourself, you need to specify which variables to train using the var_list argument of minimize; a raw tf.train optimizer does not consult the Keras trainable flag. A sketch follows.
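For illustration, here is a minimal sketch based on the question's code (TF1-style API); Conv2, loss, and the other names come from the question, and the id-based filtering is just one way to exclude the frozen layer:
# Collect every trainable variable that does NOT belong to the frozen layer.
frozen_ids = set(id(w) for w in Conv2.weights)
train_vars = [v for v in tf.trainable_variables() if id(v) not in frozen_ids]

# Note: in the toy example above every variable belongs to Conv2, so
# train_vars would be empty; in a real model other layers contribute variables.
opt_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(
    loss, var_list=train_vars)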