How can i create a model in Keras and train it using Tensorflow? - tensorflow

Is it possible to create a model with Keras and without using compile and fit functions in Keras, use Tensorflow to train the model?

Sure. From Keras documentation:
Useful attributes of Model
model.layers is a flattened list of the layers comprising the model graph.
model.inputs is the list of input tensors.
model.outputs is the list of output tensors.
If you use Tensorflow backend, inputs and outputs are Tensorflow tensors, so you can use them without using Keras.

You can use keras to define a complicated graph:
import tensorflow as tf
sess = tf.Session()
from keras import backend as K
K.set_session(sess)
from keras.layers import Dense
from keras.objectives import categorical_crossentropy
img = Input(shape=(784,))
labels = Input(shape=(10,)) #one-hot vector
x = Dense(128, activation='relu')(img)
x = Dense(128, activation='relu')(x)
preds = Dense(10, activation='softmax')(x)
Then use tensorflow to config complicated optimization and training procedure:
loss = tf.reduce_mean(categorical_crossentropy(labels, preds))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
init_op = tf.global_variables_initializer()
sess.run(init_op)
# Run training loop
with sess.as_default():
for i in range(100):
batch = mnist_data.train.next_batch(50)
train_step.run(feed_dict={img: batch[0],
labels: batch[1]})
Ref: https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html

Related

Feeding tensorflow keras architecture with Sparse matrix of type scipy.sparse._csr.csr_matrix

Short Version:
I am trying to feed my data in the form of sparse matrix (of the type scipy.sparse._csr.csr_matrix') into a Tensorflow Keras Neural Network model. I highly appreciate any guidance. todense() and toarray() are not options for me. Also feeding in mini batches is not preferred.
Long version (including my efforts):
The problem is about a deep learning model with text, categorical and numerical features. My TfidfVectorizer creates a huge matrix which cannot be fed into a model as dense format.
text_cols = ['ca_name']
categorical_cols = ['cua_name','ca_category_modified']
numerical_cols = ['vidim1', 'vidim2', 'vidim3', 'vim', 'vid']
title_transformer = TfidfVectorizer()
numerical_transformer = MinMaxScaler()
categorical_transformer = OneHotEncoder(handle_unknown='ignore')
preprocessor = ColumnTransformer(
transformers=[
('title', title_transformer, text_cols[0]),
('num', numerical_transformer, numerical_cols),
('cat', categorical_transformer, categorical_cols)
])
# df['dur_linreg] is my numerical target
X_train, X_test, y_train, y_test = train_test_split(df[text_cols+categorical_cols+numerical_cols], df['dur_linreg'], test_size=0.2, random_state=42)
# fit_transform the preprocessor on X_train, only transform X_test
X_train_transformed = preprocessor.fit_transform(X_train)
X_test_transformed = preprocessor.transform(X_test)
I can build and compile a model as following:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(64, activation='relu', input_shape=(X_train_transformed.shape[1],)))
modeladd(tf.keras.layers.Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
But cannot fit it:
history = model.fit(X_train_transformed, y_train, epochs=20, batch_size=32, validation_data=(X_test_transformed, y_test))
InvalidArgumentError: Graph execution error: TypeError: 'SparseTensor' object is not subscriptable
Obviously because I am feeding the model with a sparse scipy.sparse._csr.csr_matrix matrix.
The size of my matrix and my resources restrict me to transform it to
dense format:
X_train_transformed.todense()
MemoryError: Unable to allocate 205. GiB for an array with shape (275189, 100074) and data type float64
2) (obviously) array:
X_train_transformed.toarray()
MemoryError: Unable to allocate 205. GiB for an array with shape (275189, 100074) and data type float64
According to a post "https://stackoverflow.com/questions/41538692/using-sparse-matrices-with-keras-and-tensorflow" I there are two approaches
" Keep it as a scipy sparse matrix, then, when giving Keras a minibatch, make it dense
Keep it sparse all the way through, and use Tensorflow Sparse Tensors"
The second approach is preferred for me as well. Therefore, I tried the following as well:
However, again I could only build and compile the model without a problem:
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Model
input_layer = Input(shape=(X_train_transformed.shape[1],), sparse=True)
dense1 = Dense(64, activation='relu')(input_layer)
dropout1 = Dropout(0.2)(dense1)
dense2 = Dense(64, activation='relu')(dropout1)
dropout2 = Dropout(0.2)(dense2)
output_layer = Dense(1)(dropout2)
model = Model(input_layer, output_layer)
model.compile(optimizer='adam', loss='mean_squared_error')
But cannot fit it:
history = model.fit(X_train_transformed, y_train, validation_data=(X_test_transformed, y_test), epochs=5, batch_size=32)
InvalidArgumentError: Graph execution error:TypeError: 'SparseTensor' object is not subscriptable
Lastly, in case it is relevant I am using Tensorflow version 2.11.0 installed January 2023.
Many Thanks in advance for your help.

degraded accuracy performance with overfitting when downgrading from tensorflow 2.3.1 to tensorflow 1.14 or 1.15 on multiclass categorization

I made a script in tensorflow 2.x but I had to downconvert it to tensorflow 1.x (tested in 1.14 and 1.15). However, the tf1 version performs very differently (10% accuracy lower on the test set). See also the plot for train and validation performance (diagram is attached below).
Looking at the operations needed for the migration from tf1 to tf2 it seems that only the Adam learning rate may be a problem but I'm defining it explicitly tensorflow migration
I've reproduced the same behavior both locally on GPU and CPU and on colab. The keras used was the one built-in in tensorflow (tf.keras). I've used the following functions (both for train,validation and test), using a sparse categorization (integers):
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
horizontal_flip=horizontal_flip,
#rescale=None, #not needed for resnet50
preprocessing_function=None,
validation_split=None)
train_dataset = train_datagen.flow_from_directory(
directory=train_dir,
target_size=image_size,
class_mode='sparse',
batch_size=batch_size,
shuffle=True)
And the model is a simple resnet50 with a new layer on top:
IMG_SHAPE = img_size+(3,)
inputs = Input(shape=IMG_SHAPE, name='image_input',dtype = tf.uint8)
x = tf.cast(inputs, tf.float32)
# not working in this version of keras. inserted in imageGenerator
x = preprocess_input_resnet50(x)
base_model = tf.keras.applications.ResNet50(
include_top=False,
input_shape = IMG_SHAPE,
pooling=None,
weights='imagenet')
# Freeze the pretrained weights
base_model.trainable = False
x=base_model(x)
# Rebuild top
x = GlobalAveragePooling2D(data_format='channels_last',name="avg_pool")(x)
top_dropout_rate = 0.2
x = Dropout(top_dropout_rate, name="top_dropout")(x)
outputs = Dense(num_classes,activation="softmax", name="pred_out")(x)
model = Model(inputs=inputs, outputs=outputs,name="ResNet50_comp")
optimizer = tf.keras.optimizers.Adam(lr=learning_rate)
model.compile(optimizer=optimizer,
loss="sparse_categorical_crossentropy",
metrics=['accuracy'])
And then I'm calling the fit function:
history = model.fit_generator(train_dataset,
steps_per_epoch=n_train_batches,
validation_data=validation_dataset,
validation_steps=n_val_batches,
epochs=initial_epochs,
verbose=1,
callbacks=[stopping])
I've reproduced the same behavior for example with the following full script (applied to my dataset and changed to adam and removed intermediate final dense layer):
deep learning sandbox
The easiest way to replicate this behavior was to enable or disable the following line on a tf2 environment with the same script and add the following line to it. However, I've tested also on tf1 environments (1.14 and 1.15):
tf.compat.v1.disable_v2_behavior()
Sadly I cannot provide the dataset.
Update 26/11/2020
For full reproducibility I've obtained a similar behaviour by means of the food101 (101 categories) dataset enabling tf1 behaviour with 'tf.compat.v1.disable_v2_behavior()'. The following is the script executed with tensorflow-gpu 2.2.0:
#%% ref https://medium.com/deeplearningsandbox/how-to-use-transfer-learning-and-fine-tuning-in-keras-and-tensorflow-to-build-an-image-recognition-94b0b02444f2
import os
import sys
import glob
import argparse
import matplotlib.pyplot as plt
import tensorflow as tf
# enable and disable this to obtain tf1 behaviour
tf.compat.v1.disable_v2_behavior()
from tensorflow.keras import __version__
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
# since i'm using resnet50 weights from imagenet, i'm using food101 for
# similar but different categorization tasks
# pip install tensorflow-datasets if tensorflow_dataset not found
import tensorflow_datasets as tfds
(train_ds,validation_ds),info= tfds.load('food101', split=['train','validation'], shuffle_files=True, with_info=True)
assert isinstance(train_ds, tf.data.Dataset)
print(train_ds)
#%%
IM_WIDTH, IM_HEIGHT = 224, 224
NB_EPOCHS = 10
BAT_SIZE = 32
def get_nb_files(directory):
"""Get number of files by searching directory recursively"""
if not os.path.exists(directory):
return 0
cnt = 0
for r, dirs, files in os.walk(directory):
for dr in dirs:
cnt += len(glob.glob(os.path.join(r, dr + "/*")))
return cnt
def setup_to_transfer_learn(model, base_model):
"""Freeze all layers and compile the model"""
for layer in base_model.layers:
layer.trainable = False
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
def add_new_last_layer(base_model, nb_classes):
"""Add last layer to the convnet
Args:
base_model: keras model excluding top
nb_classes: # of classes
Returns:
new keras model with last layer
"""
x = base_model.output
x = GlobalAveragePooling2D()(x)
#x = Dense(FC_SIZE, activation='relu')(x) #new FC layer, random init
predictions = Dense(nb_classes, activation='softmax')(x) #new softmax layer
model = Model(inputs=base_model.input, outputs=predictions)
return model
def train(nb_epoch, batch_size):
"""Use transfer learning and fine-tuning to train a network on a new dataset"""
#nb_train_samples = train_ds.cardinality().numpy()
nb_train_samples=info.splits['train'].num_examples
nb_classes = info.features['label'].num_classes
classes_names = info.features['label'].names
#nb_val_samples = validation_ds.cardinality().numpy()
nb_val_samples = info.splits['validation'].num_examples
#nb_epoch = int(args.nb_epoch)
#batch_size = int(args.batch_size)
def preprocess(features):
#print(features['image'], features['label'])
image = tf.image.resize(features['image'], [224,224])
#image = tf.divide(image, 255)
#print(image)
# data augmentation
image=tf.image.random_flip_left_right(image)
image = preprocess_input(image)
label = features['label']
# for categorical crossentropy
#label = tf.one_hot(label,101,axis=-1)
#return image, tf.cast(label, tf.float32)
return image, label
#pre-processing the dataset to fit a specific image size and 2D labelling
train_generator = train_ds.map(preprocess).batch(batch_size).repeat()
validation_generator = validation_ds.map(preprocess).batch(batch_size).repeat()
#train_generator=train_ds
#validation_generator=validation_ds
#fig = tfds.show_examples(validation_generator, info)
# setup model
base_model = ResNet50(weights='imagenet', include_top=False) #include_top=False excludes final FC layer
model = add_new_last_layer(base_model, nb_classes)
# transfer learning
setup_to_transfer_learn(model, base_model)
history = model.fit(
train_generator,
epochs=nb_epoch,
steps_per_epoch=nb_train_samples//BAT_SIZE,
validation_data=validation_generator,
validation_steps=nb_val_samples//BAT_SIZE)
#class_weight='auto')
#execute
history = train(nb_epoch=NB_EPOCHS, batch_size=BAT_SIZE)
And the performance on food101 dataset:
update 27/11/2020
It's possible to see the discrepancy also in the way smaller oxford_flowers102 dataset:
(train_ds,validation_ds,test_ds),info= tfds.load('oxford_flowers102', split=['train','validation','test'], shuffle_files=True, with_info=True)
Nb: the above plot shows confidences given by running the same training multiple times and evaluatind mean and std to check for the effects on random weights initialization and data augmentation.
Moreover I've tried some hyperparameter tuning on tf2 resulting in the following picture:
changing optimizer (adam and rmsprop)
not applying horizontal flipping aumgentation
deactivating keras resnet50 preprocess_input
Thanks in advance for every suggestion. Here are the accuracy and validation performance on tf1 and tf2 on my dataset:
Update 14/12/2020
I'm sharing the colab for reproducibility on oxford_flowers at the clic of a button:
colab script
I came across something similar, when doing the opposite migration (from TF1+Keras to TF2).
Running this code below:
# using TF2
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50
fe = ResNet50(include_top=False, pooling="avg")
out = fe.predict(np.ones((1,224,224,3))).flatten()
sum(out)
>>> 212.3205274187726
# using TF1+Keras
import numpy as np
from keras.applications.resnet50 import ResNet50
fe = ResNet50(include_top=False, pooling="avg")
out = fe.predict(np.ones((1,224,224,3))).flatten()
sum(out)
>>> 187.23898954353717
you can see the same model from the same library on different versions does not return the same value (using sum as a quick check-up). I found the answer to this mysterious behavior in this other SO answer: ResNet model in keras and tf.keras give different output for the same image
Another recommendation I'd give you is, try using pooling from inside applications.resnet50.ResNet50 class, instead of the additional layer in your function, for simplicity, and to remove possible problem-generators :)

How to get evaluated gradients from keras model using tensorflow 2?

I'm trying to obtain the gradients from a keras model. The backend function keras.backend.gradients creates a symbolic function which needs to be evaluated on some specific input. The following code does work for this problem but it makes use of the old tensorflow sessions and in particular of feed_dict.
import numpy as np
import keras
from keras import backend as K
import tensorflow as tf
model = keras.Sequential()
model.add(keras.layers.Dense(16, activation='relu', input_shape = (49, )))
model.add(keras.layers.Dense(11, activation='softmax'))
model.compile(optimizer='rmsprop', loss='mse')
trainingExample = np.random.random((1, 49))
gradients = K.gradients(model.output, model.trainable_weights)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
evaluated_gradients = sess.run(gradients,\
feed_dict={model.input:trainingExample})
sess.close()
How can I rewrite this in tensorflow 2 style, i.e. without the sessions? There is an alternative method described here. However I don't understand why it should be necessary to give some explicit output to evaluate the gradients and how to make the solution work without these outputs.
In tensorflow-2 you can get gradients very easily using gradient tf.GradientTape().
I am citing the official tutorial code here -
#tf.function
def train_step(images, labels):
with tf.GradientTape() as tape:
predictions = model(images)
loss = loss_object(labels, predictions)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
train_loss(loss)
train_accuracy(labels, predictions)
you can find the complete tutorial in tensorflow official website - https://www.tensorflow.org/tutorials/quickstart/advanced

Tensorflow dense layers worse than keras sequential

I try to train an agent on the inverse-pendulum (similar to cart-pole) problem, which is a benchmark of reinforcement learning. I use neural-fitted-Q-iteration algorithm which uses a multi-layer neural network to evaluate the Q function.
I use Keras.Sequential and tf.layers.dense to build the neural network repectively, and leave all other things to be the same. However, Keras gives me a good results and tensorflow does not. In fact, tensorflow doesn't work at all with its loss being increasing and the agent learns nothing from the training.
Here I present the code for Keras as follows
def build_model():
model = Sequential()
model.add(Dense(5, input_dim=3))
model.add(Activation('sigmoid'))
model.add(Dense(5))
model.add(Activation('sigmoid'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
adam = Adam(lr=1E-3)
model.compile(loss='mean_squared_error', optimizer=adam)
return model
and the tensorflow version is
class NFQ_fit(object):
"""
neural network approximator for NFQ iteration
"""
def __init__(self, sess, N_feature, learning_rate=1E-3, batch_size=100):
self.sess = sess
self.N_feature = N_feature
self.learning_rate = learning_rate
self.batch_size = batch_size
# DNN structure
self.inputs = tf.placeholder(tf.float32, [None, N_feature], 'inputs')
self.labels = tf.placeholder(tf.float32, [None, 1], 'labels')
self.l1 = tf.layers.dense(inputs=self.inputs,
units=5,
activation=tf.sigmoid,
use_bias=True,
kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
bias_initializer=tf.constant_initializer(0.0),
kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
name='hidden-layer-1')
self.l2 = tf.layers.dense(inputs=self.l1,
units=5,
activation=tf.sigmoid,
use_bias=True,
kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
bias_initializer=tf.constant_initializer(0.0),
kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
name='hidden-layer-2')
self.outputs = tf.layers.dense(inputs=self.l2,
units=1,
activation=tf.sigmoid,
use_bias=True,
kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
bias_initializer=tf.constant_initializer(0.0),
kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
name='outputs')
# optimization
# self.mean_loss = tf.losses.mean_squared_error(self.labels, self.outputs)
self.mean_loss = tf.reduce_mean(tf.square(self.labels-self.outputs))
self.regularization_loss = tf.losses.get_regularization_loss()
self.loss = self.mean_loss # + self.regularization_loss
self.train_op = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss)
The two models are the same. Both of them has two hidden layers with the same dimension. I expect that the problems may come from the kernel initialization but I don't know how to fix it.
Using Keras is great. If you want better TensorFlow integration check out tf.keras. There's no particular reason to use tf.layers if the Keras (or tf.keras) defaults work better.
In this case glorot_uniform looks like the default initializer. This is also the global TensorFlow default, so consider removing the kernel_initializer argument instead of the explicit truncated normal initialization in your question (or passing Glorot explicitly).

Calling a Keras model on a TensorFlow tensor but keep weights

In Keras as a simplified interface to TensorFlow: tutorial they describe how one can call a Keras model on a TensorFlow tensor.
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=784))
model.add(Dense(10, activation='softmax'))
# this works!
x = tf.placeholder(tf.float32, shape=(None, 784))
y = model(x)
They also say:
Note: by calling a Keras model, your are reusing both its architecture and its weights. When you are calling a model on a tensor, you are creating new TF ops on top of the input tensor, and these ops are reusing the TF Variable instances already present in the model.
I interpret this as that the weights of the model will be the same in y as in model. However, for me it seems like the weights in the resulting Tensorflow node are reinitialized. A minimal example can be seen below:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
# Create model with weight initialized to 1
model = Sequential()
model.add(Dense(1, input_dim=1, kernel_initializer='ones',
bias_initializer='zeros'))
model.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
# Save the weights
model.save_weights('file')
# Create another identical model except with weight initialized to 0
model2 = Sequential()
model2.add(Dense(1, input_dim=1, kernel_initializer='zeros',
bias_initializer='zeros'))
model2.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
# Load the weight from the first model
model2.load_weights('file')
# Call model with Tensorflow tensor
v = tf.Variable([[1, ], ], dtype=tf.float32)
node = model2(v)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
print(sess.run(node), model2.predict(np.array([[1, ], ])))
# Prints (array([[ 0.]], dtype=float32), array([[ 1.]], dtype=float32))
Why I want to do this:
I want to use a trained network in another minimization scheme were the network "punish" places in the search space that are not allowed. So if you have ideas not involving this specific approach, that is also very appreciated.
Finally found the answer. There are two problems in the example from the question.
1:
The first and most obvious was that I called the tf.global_variables_intializer() function which will re-initialize all variables in the session. Instead I should have called the tf.variables_initializer(var_list) where var_list is a list of variables to initialize.
2:
The second problem was that Keras did not use the same session as the native Tensorflow objects. This meant that to be able to run the tensorflow object model2(v) with my session sess it needed to be reinitialized. Again Keras as a simplified interface to tensorflow: Tutorial was able to help
We should start by creating a TensorFlow session and registering it with Keras. This means that Keras will use the session we registered to initialize all variables that it creates internally.
import tensorflow as tf
sess = tf.Session()
from keras import backend as K
K.set_session(sess)
If we apply these changes to the example provided in my question we get the following code that does exactly what is expected from it.
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense
sess = tf.Session()
# Register session with Keras
K.set_session(sess)
model = Sequential()
model.add(Dense(1, input_dim=1, kernel_initializer='ones',
bias_initializer='zeros'))
model.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
model.save_weights('test')
model2 = Sequential()
model2.add(Dense(1, input_dim=1, kernel_initializer='zeros',
bias_initializer='zeros'))
model2.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
model2.load_weights('test')
v = tf.Variable([[1, ], ], dtype=tf.float32)
node = model2(v)
init = tf.variables_initializer([v, ])
sess.run(init)
print(sess.run(node), model2.predict(np.array([[1, ], ])))
# prints: (array([[ 1.]], dtype=float32), array([[ 1.]], dtype=float32))
Conclusion:
The lesson is that when mixing Tensorflow and Keras, make sure everything uses the same session.
Thanks for asking this question, and answering it, it helped me! In addition to setting the same tf session in the Keras backend, it is also important to note that if you want to load a Keras model from a file, you need to run a global variable initializer op before you load the model.
sess = tf.Session()
# make sure keras has the same session as this code
tf.keras.backend.set_session(sess)
# Do this BEFORE loading a keras model
init_op = tf.global_variables_initializer()
sess.run(init_op)
model = models.load_model('path/to/your/model.h5')