TPU keras regression very slow compared to GPU/CPU - tensorflow

I'm doing a regression on a column in a dataframe. When I use a CPU, each epoch is ~95 seconds; when using a GPU it's ~45 seconds; but when using a TPU it's over 8 minutes per epoch.
I basically initialized the TPU and wrapped my model definition and compilation in a TPU distribution strategy.
I *think* the problem is in my dataset. I've seen tutorials where the data is put into tensors (for my GPU/CPU runs I was passing the dataframe, X_train and y_train in my code below). I tried both the dataframe and tensors, and both are an order of magnitude worse than the CPU. I'm sure this is a user error; I just can't see my mistake.
Here's my code:
# Set up the TPU
import os
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
# This is the TPU initialization code that has to be at the beginning.
tf.tpu.experimental.initialize_tpu_system(resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))
tpu_strategy = tf.distribute.TPUStrategy(resolver)
def KerasRegression(FullDF, variableToPredict):
    # Keep only rows where the target variable is not NaN for training
    df_train1 = FullDF[FullDF[variableToPredict].notna()].copy()
    X_train = df_train1.drop(variableToPredict, axis=1)
    y_train = df_train1[variableToPredict].copy()
    x_train_shape = X_train.shape[1]
    dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train)).batch(batch_size=100).prefetch(buffer_size=5000)
    activationLayer = 'relu'
    with tpu_strategy.scope():
        model = Sequential()
        model.add(Dense(x_train_shape, activation=activationLayer, input_dim=x_train_shape))
        model.add(Dense(x_train_shape, activation=activationLayer))
        model.add(Dense(1, activation='linear'))
        optimizer = tf.keras.optimizers.Adam()
        model.compile(loss='mse', optimizer=optimizer, metrics=['mse'],
                      experimental_steps_per_execution=50)
    model.fit(dataset, epochs=100)
    # model.fit(X_train, y_train, epochs=100)
    return model
Also, if it helps, the shape of my testing data is:
(590543, 209)
Any feedback is welcome!
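For reference, a minimal sketch of the two input paths described above (my illustration, not code from the question), assuming X_train is a pandas DataFrame and y_train a pandas Series; the explicit float32 cast is an assumption, since from_tensor_slices would otherwise infer float64 from pandas data:
import numpy as np
import tensorflow as tf

# Path 1: pass the DataFrame/Series straight to fit()
# model.fit(X_train, y_train, epochs=100)

# Path 2: convert to explicit float32 arrays and build a tf.data pipeline
features = np.asarray(X_train, dtype=np.float32)  # assumed cast; pandas columns often default to float64
targets = np.asarray(y_train, dtype=np.float32)
dataset = (tf.data.Dataset.from_tensor_slices((features, targets))
           .batch(100)
           .prefetch(buffer_size=5000))
# model.fit(dataset, epochs=100)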

Related

Feeding tensorflow keras architecture with Sparse matrix of type scipy.sparse._csr.csr_matrix

Short Version:
I am trying to feed my data, in the form of a sparse matrix (of type scipy.sparse._csr.csr_matrix), into a TensorFlow Keras neural network model. I would highly appreciate any guidance. todense() and toarray() are not options for me, and feeding in mini-batches is not preferred.
Long version (including my efforts):
The problem concerns a deep learning model with text, categorical, and numerical features. My TfidfVectorizer creates a huge matrix which cannot be fed into the model in dense format.
text_cols = ['ca_name']
categorical_cols = ['cua_name','ca_category_modified']
numerical_cols = ['vidim1', 'vidim2', 'vidim3', 'vim', 'vid']
title_transformer = TfidfVectorizer()
numerical_transformer = MinMaxScaler()
categorical_transformer = OneHotEncoder(handle_unknown='ignore')
preprocessor = ColumnTransformer(
    transformers=[
        ('title', title_transformer, text_cols[0]),
        ('num', numerical_transformer, numerical_cols),
        ('cat', categorical_transformer, categorical_cols)
    ])
# df['dur_linreg'] is my numerical target
X_train, X_test, y_train, y_test = train_test_split(df[text_cols+categorical_cols+numerical_cols], df['dur_linreg'], test_size=0.2, random_state=42)
# fit_transform the preprocessor on X_train, only transform X_test
X_train_transformed = preprocessor.fit_transform(X_train)
X_test_transformed = preprocessor.transform(X_test)
I can build and compile a model as follows:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(64, activation='relu', input_shape=(X_train_transformed.shape[1],)))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
But I cannot fit it:
history = model.fit(X_train_transformed, y_train, epochs=20, batch_size=32, validation_data=(X_test_transformed, y_test))
InvalidArgumentError: Graph execution error: TypeError: 'SparseTensor' object is not subscriptable
Obviously this is because I am feeding the model a sparse scipy.sparse._csr.csr_matrix. The size of my matrix and my resources prevent me from converting it:
1) to dense format:
X_train_transformed.todense()
MemoryError: Unable to allocate 205. GiB for an array with shape (275189, 100074) and data type float64
2) (obviously) to an array:
X_train_transformed.toarray()
MemoryError: Unable to allocate 205. GiB for an array with shape (275189, 100074) and data type float64
According to the post https://stackoverflow.com/questions/41538692/using-sparse-matrices-with-keras-and-tensorflow, there are two approaches:
"Keep it as a scipy sparse matrix, then, when giving Keras a minibatch, make it dense
Keep it sparse all the way through, and use Tensorflow Sparse Tensors"
The second approach is preferred for me. Therefore, I tried the following:
Again, I could build and compile the model without a problem:
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Model
input_layer = Input(shape=(X_train_transformed.shape[1],), sparse=True)
dense1 = Dense(64, activation='relu')(input_layer)
dropout1 = Dropout(0.2)(dense1)
dense2 = Dense(64, activation='relu')(dropout1)
dropout2 = Dropout(0.2)(dense2)
output_layer = Dense(1)(dropout2)
model = Model(input_layer, output_layer)
model.compile(optimizer='adam', loss='mean_squared_error')
But I cannot fit it:
history = model.fit(X_train_transformed, y_train, validation_data=(X_test_transformed, y_test), epochs=5, batch_size=32)
InvalidArgumentError: Graph execution error:TypeError: 'SparseTensor' object is not subscriptable
Lastly, in case it is relevant, I am using TensorFlow version 2.11.0, installed in January 2023.
Many thanks in advance for your help.
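For reference, a minimal sketch of the second approach (converting the scipy CSR matrix to a tf.sparse.SparseTensor before calling fit) might look like the following; this is an illustration rather than code from the question, and the csr_to_sparse_tensor helper is a hypothetical name:
import numpy as np
import tensorflow as tf

def csr_to_sparse_tensor(csr):
    # Hypothetical helper: scipy CSR -> tf.sparse.SparseTensor
    coo = csr.tocoo()
    indices = np.stack([coo.row, coo.col], axis=1).astype(np.int64)
    sparse = tf.sparse.SparseTensor(indices, coo.data.astype(np.float32), coo.shape)
    # reorder() puts the indices into the canonical row-major order TF expects
    return tf.sparse.reorder(sparse)

X_train_sparse = csr_to_sparse_tensor(X_train_transformed)
X_test_sparse = csr_to_sparse_tensor(X_test_transformed)

# Untested against the exact error above; whether fit() accepts this directly depends on the Keras version
# history = model.fit(X_train_sparse, np.asarray(y_train), epochs=5, batch_size=32,
#                     validation_data=(X_test_sparse, np.asarray(y_test)))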

Getting constant accuracies for training and validation sets despite their losses are changing during CNN training?

As the title describes, during the training of my CNN model the accuracies of the training and validation sets stay constant even though their losses are changing. I have included the details of the model and its training setup below. What may cause this issue?
Here is the data that was used by training (X_train & y_train), validation, and test sets (X_test and y_test):
df = pd.read_csv(CSV_PATH, sep=',', header=None)
print(f'Shape of all data: {df.shape}')
y = df.iloc[:, -1].values
X = df.iloc[:, :-1].values
encoder = LabelEncoder()
encoder.fit(y)
encoded_Y = encoder.transform(y)
dummy_y = to_categorical(encoded_Y)
X_train, X_test, y_train, y_test = train_test_split(X, dummy_y, test_size=0.3, random_state=RANDOM_STATE)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
Here are the shapes of training and test sets:
Shape of X_train: (1322, 10800, 1)
Shape of Y_train: (1322, 3)
Shape of X_test: (567, 10800, 1)
Shape of y_test: (567, 3)
Here is my CNN model:
# Model hyper-parameters
activation_fn = 'relu'
n_lr = 1e-4
weight_decay = 1e-4
batch_size = 64
num_epochs = 200*10*10
num_classes = 3
n_dropout = 0.6
n_momentum = 0.5
n_kernel = 5
n_reg = 1e-5
# the sequential model
model = Sequential()
model.add(Conv1D(128, n_kernel, input_shape=(10800, 1)))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(Dropout(n_dropout))
model.add(Conv1D(256, n_kernel))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(Dropout(n_dropout))
model.add(GlobalAveragePooling1D()) # have tried model.add(Flatten()) as well
model.add(Dense(256, activation=activation_fn))
model.add(Dropout(n_dropout))
model.add(Dense(64, activation=activation_fn))
model.add(Dropout(n_dropout))
model.add(Dense(num_classes, activation='softmax'))
adam = Adam(lr=n_lr, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=weight_decay)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])
Here is how I have evaluated the model:
Y_pred = model.predict(X_test, verbose=0)
y_pred = np.argmax(Y_pred, axis=1)
y_test_int = np.argmax(y_test, axis=1)
And my model always predicts the same one of the three classes during evaluation, as you can see from the classification result below (via the classification_result(y_test_int, y_pred) function):
              precision    recall  f1-score   support

      normal      0.743     1.000     0.852       421
         apb      0.000     0.000     0.000        45
         pvc      0.000     0.000     0.000       101
The model was trained using Keras' EarlyStopping callback; thus, training continued for 4,173 epochs. Here are the losses obtained during training for the training and validation sets:
[plot of training and validation loss]
Here are the accuracies obtained during training for the training and validation sets:
[plot of training and validation accuracy]
The model was implemented using Keras and hosted on Google Colab.
Although such issues are difficult to resolve without the data, a couple of general rules apply.
The very first thing to do when a model does not seem to learn anything, as here (despite the mild drop in the loss), is to remove all dropout.
In fact, dropout is not supposed to be used by default; its nominal function is to guard against overfitting. But of course, before you start worrying about overfitting, you must first have some success with fitting, which is clearly not happening here. The fact that you use a rather aggressive dropout rate of n_dropout = 0.6 does not help, either.
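To make that concrete, here is a sketch of the same architecture with every Dropout layer removed (an illustration, not code from the original answer); all other hyper-parameters are kept as in the question:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, BatchNormalization, Activation,
                                     MaxPooling1D, GlobalAveragePooling1D, Dense)

# Hyper-parameters taken from the question
activation_fn = 'relu'
n_kernel = 5
num_classes = 3

# Same CNN as above, with every Dropout layer removed
model = Sequential()
model.add(Conv1D(128, n_kernel, input_shape=(10800, 1)))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(Conv1D(256, n_kernel))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(GlobalAveragePooling1D())
model.add(Dense(256, activation=activation_fn))
model.add(Dense(64, activation=activation_fn))
model.add(Dense(num_classes, activation='softmax'))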

Why is tensorflow having a worse accuracy than keras in direct comparison?

I made a direct comparison between TensorFlow and Keras with the same parameters and the same dataset (MNIST).
The strange thing is that Keras achieves 96% accuracy in 10 epochs, while TensorFlow achieves only about 70% in 10 epochs. I have run this code many times in the same instance and this inconsistency always occurs.
Even with 50 epochs, the TensorFlow model only reaches about 90%.
Code:
import keras
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# One hot encoding
from keras.utils import np_utils
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
# Changing the shape of input images and normalizing
x_train = x_train.reshape((60000, 784))
x_test = x_test.reshape((10000, 784))
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
# Creating the neural network
model = Sequential()
model.add(Dense(30, input_dim=784, kernel_initializer='normal', activation='relu'))
model.add(Dense(30, kernel_initializer='normal', activation='relu'))
model.add(Dense(10, kernel_initializer='normal', activation='softmax'))
# Optimizer
optimizer = keras.optimizers.Adam()
# Loss function
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
# Training
model.fit(x_train, y_train, epochs=10, batch_size=200, validation_data=(x_test, y_test), verbose=1)
# Checking the final accuracy
accuracy_final = model.evaluate(x_test, y_test, verbose=0)
print('Model Accuracy: ', accuracy_final)
TensorFlow code: (x_train, x_test, y_train, y_test are the same as the input for the Keras code above)
import tensorflow as tf
# Epochs parameters
epochs = 10
batch_size = 200
# Neural network parameters
n_input = 784
n_hidden_1 = 30
n_hidden_2 = 30
n_classes = 10
# Placeholders x, y
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
# Creating the first layer
w1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
b1 = tf.Variable(tf.random_normal([n_hidden_1]))
layer_1 = tf.nn.relu(tf.add(tf.matmul(x,w1),b1))
# Creating the second layer
w2 = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
b2 = tf.Variable(tf.random_normal([n_hidden_2]))
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1,w2),b2))
# Creating the output layer
w_out = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
bias_out = tf.Variable(tf.random_normal([n_classes]))
output = tf.matmul(layer_2, w_out) + bias_out
# Loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = output, labels = y))
# Optimizer
optimizer = tf.train.AdamOptimizer().minimize(cost)
# Making predictions
predictions = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
# Accuracy
accuracy = tf.reduce_mean(tf.cast(predictions, tf.float32))
# Variables that will be used in the training cycle
train_size = x_train.shape[0]
total_batches = train_size / batch_size
# Initializing the variables
init = tf.global_variables_initializer()
# Opening the session
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(epochs):
        # Loop through all batch iterations
        for i in range(0, train_size, batch_size):
            batch_x = x_train[i:i + batch_size]
            batch_y = y_train[i:i + batch_size]
            # Fit training
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        # Running accuracy (with test data) on each epoch
        acc_val = sess.run(accuracy, feed_dict={x: x_test, y: y_test})
        # Showing results after each epoch
        print("Epoch: ", "{}".format(epoch + 1))
        print("Accuracy_val = ", "{:.3f}".format(acc_val))
    print("Training Completed!")
    # Checking the final accuracy
    checking = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    accuracy_final = tf.reduce_mean(tf.cast(checking, tf.float32))
    print("Model Accuracy:", accuracy_final.eval({x: x_test, y: y_test}))
I'm running everything in the same instance. Can anyone explain this inconsistency?
I think it's the initialization that's the culprit. For example, one real difference is that you initialize the biases in TF with random_normal, which isn't best practice; in fact, Keras defaults to initializing biases to zero, which is best practice. You don't override this, since you set only kernel_initializer, not bias_initializer, in your Keras code.
Furthermore, things are worse for the weight initializers. You are using RandomNormal for Keras, defined like so:
keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=None)
But in TF you use tf.random.normal:
tf.random.normal(shape, mean=0.0, stddev=1.0, dtype=tf.dtypes.float32, seed=None, name=None)
I can tell you that using a standard deviation of 0.05 is reasonable for initialization, but using 1.0 is not.
I suspect that if you changed these parameters, things would look better. But if they don't, I'd suggest dumping the TensorFlow graph for both models and just checking by hand to see the differences. The graphs are small enough in this case to double-check.
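As a sketch of that change (an illustration, not part of the original answer), the TF variables could be created to match the Keras defaults, i.e. a small-stddev normal for the weights and zeros for the biases:
# Weights: normal with stddev=0.05, matching the Keras RandomNormal initializer used above
w1 = tf.Variable(tf.random_normal([n_input, n_hidden_1], stddev=0.05))
b1 = tf.Variable(tf.zeros([n_hidden_1]))          # biases: zeros, as Keras does by default
w2 = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=0.05))
b2 = tf.Variable(tf.zeros([n_hidden_2]))
w_out = tf.Variable(tf.random_normal([n_hidden_2, n_classes], stddev=0.05))
bias_out = tf.Variable(tf.zeros([n_classes]))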
To some extent this highlights the difference in philosophy between Keras and TF. Keras tries hard to set good defaults for NN training that correspond to what is known to work. But TensorFlow is completely agnostic - you have to know those practices and explicitly code them in. The standard deviation thing is a stellar example: of course it should be 1 by default in a mathematical function, but 0.05 is a good value if you know it will be used to initialize an NN layer.
Answer originally provided by Dmitriy Genzel on Quora.

AlreadyExistsError while training a network on colab

I'm trying to train an LSTM network on Google Colab. However, this error occurs:
AlreadyExistsError: Resource __per_step_116/training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
[[{{node training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var}}]]
I don't know where can be the issue. This is the model of the network:
sl_model = keras.models.Sequential()
sl_model.add(keras.layers.Embedding(max_index+1, hidden_size, mask_zero=True))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(hidden_size,
    activation='tanh', dropout=0.2, recurrent_dropout=0.2, return_sequences=True)))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(hidden_size,
    activation='tanh', dropout=0.2, recurrent_dropout=0.2, return_sequences=False)))
sl_model.add(keras.layers.Dense(max_length, activation='softsign'))
optimizer = keras.optimizers.Adam()
sl_model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['acc'])
batch_size = 128
epochs = 3
cbk = keras.callbacks.TensorBoard("logging/keras_model")
print("\nStarting training...")
sl_model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
             shuffle=True, validation_data=(x_dev, y_dev), callbacks=[cbk])
Thank you so much!
You need to restart your runtime -- this happens when you have defined multiple graphs in a single Jupyter (Colaboratory) runtime.
Calling tf.reset_default_graph() may also help, but depending on whether you are using eager execution and how you've defined your sessions, this may or may not work.
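For example, a minimal sketch of clearing the stale graph state before rebuilding the model (assuming TF 1.x graph mode and the standalone keras package used in the question; this is an illustration, not part of the original answer):
import tensorflow as tf
import keras.backend as K

# Drop the existing Keras session and its graph, then start from a fresh default graph
K.clear_session()
tf.reset_default_graph()

# ... rebuild sl_model and call fit() again ...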

Tensorflow dense layers worse than keras sequential

I am trying to train an agent on the inverse-pendulum problem (similar to cart-pole), which is a benchmark in reinforcement learning. I use the neural fitted Q-iteration algorithm, which uses a multi-layer neural network to evaluate the Q function.
I use Keras.Sequential and tf.layers.dense respectively to build the neural network, and leave everything else the same. However, Keras gives me good results and TensorFlow does not. In fact, TensorFlow doesn't work at all: its loss keeps increasing and the agent learns nothing from the training.
Here is the Keras code:
def build_model():
    model = Sequential()
    model.add(Dense(5, input_dim=3))
    model.add(Activation('sigmoid'))
    model.add(Dense(5))
    model.add(Activation('sigmoid'))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    adam = Adam(lr=1E-3)
    model.compile(loss='mean_squared_error', optimizer=adam)
    return model
and the TensorFlow version is:
class NFQ_fit(object):
    """
    neural network approximator for NFQ iteration
    """
    def __init__(self, sess, N_feature, learning_rate=1E-3, batch_size=100):
        self.sess = sess
        self.N_feature = N_feature
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        # DNN structure
        self.inputs = tf.placeholder(tf.float32, [None, N_feature], 'inputs')
        self.labels = tf.placeholder(tf.float32, [None, 1], 'labels')
        self.l1 = tf.layers.dense(inputs=self.inputs,
                                  units=5,
                                  activation=tf.sigmoid,
                                  use_bias=True,
                                  kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
                                  bias_initializer=tf.constant_initializer(0.0),
                                  kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
                                  name='hidden-layer-1')
        self.l2 = tf.layers.dense(inputs=self.l1,
                                  units=5,
                                  activation=tf.sigmoid,
                                  use_bias=True,
                                  kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
                                  bias_initializer=tf.constant_initializer(0.0),
                                  kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
                                  name='hidden-layer-2')
        self.outputs = tf.layers.dense(inputs=self.l2,
                                       units=1,
                                       activation=tf.sigmoid,
                                       use_bias=True,
                                       kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
                                       bias_initializer=tf.constant_initializer(0.0),
                                       kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
                                       name='outputs')
        # optimization
        # self.mean_loss = tf.losses.mean_squared_error(self.labels, self.outputs)
        self.mean_loss = tf.reduce_mean(tf.square(self.labels - self.outputs))
        self.regularization_loss = tf.losses.get_regularization_loss()
        self.loss = self.mean_loss  # + self.regularization_loss
        self.train_op = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss)
The two models are the same: both have two hidden layers of the same dimension. I suspect the problem may come from the kernel initialization, but I don't know how to fix it.
Using Keras is great. If you want better TensorFlow integration, check out tf.keras. There's no particular reason to use tf.layers if the Keras (or tf.keras) defaults work better.
In this case, glorot_uniform is the default Keras initializer. This is also the global TensorFlow default, so consider removing the explicit truncated-normal kernel_initializer from the code in your question (or passing Glorot explicitly).
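As a sketch of that suggestion (an illustration, not part of the original answer), the first hidden layer could pass the Glorot initializer explicitly, or simply drop the kernel_initializer argument, since glorot_uniform is also the TensorFlow default; the input size of 3 is taken from the Keras build_model above:
import tensorflow as tf  # TF 1.x API, as in the question

inputs = tf.placeholder(tf.float32, [None, 3], 'inputs')

# Same layer as in the question, but with the Glorot (Xavier) uniform initializer,
# which is what Keras' Dense layer uses by default. Deleting the kernel_initializer
# argument entirely has the same effect, since this is also the TF default.
l1 = tf.layers.dense(inputs=inputs,
                     units=5,
                     activation=tf.sigmoid,
                     kernel_initializer=tf.glorot_uniform_initializer(),
                     bias_initializer=tf.constant_initializer(0.0),
                     kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
                     name='hidden-layer-1')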