Seed for dropout in Tensorflow LSTM - Difference in model(X) and model.predict(X) - tensorflow

The outputs of an LSTM layer in TensorFlow differ between model(X) and model.predict(X) when dropout is used.
Let's call the output of model(X) the Fwd Pass and the output of model.predict(X) the Prediction.
For a regular Dropout layer we can specify the seed, but the LSTM layer has no such argument. I'm guessing this is what causes the difference between the Fwd Pass and the Prediction.
In the following code sample, the outputs differ when dropout=0.4 but match exactly when dropout=0.0. This makes me believe that every evaluation uses a different operation-level seed.
Is there a way to set that seed? I've already set the global seed for TensorFlow.
Or is there something else going on that I am not aware of?
PS: I want to use dropout during inference, so that is by design.
Code
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.initializers import GlorotUniform
SEED = 200
HIDDEN_UNITS = 4
N_OUTPUTS = 1
N_INPUTS = 4
BATCH_SIZE = 4
N_SAMPLES = 4
np.random.seed(SEED)
tf.random.set_seed(SEED)
# Simple LSTM Model
def my_model():
    inputs = x = keras.Input(shape=(N_INPUTS, 1))
    initializer = GlorotUniform(seed=SEED)
    x = layers.LSTM(HIDDEN_UNITS,
                    kernel_initializer=initializer,
                    recurrent_dropout=0.0,
                    dropout=0.4,
                    # return_sequences=True,
                    use_bias=False)(x, training=True)
    output = x
    model = keras.Model(inputs=inputs, outputs=[output])
    return model
# Create Sample Data
# Target Function
def f_x(x):
    y = x[:, 0] + x[:, 1] ** 2 + np.sin(x[:, 2]) + np.sin(x[:, 3] ** 3)
    y = y[:, np.newaxis]
    return y
# Generate random inputs
d = np.linspace(0.1, 1, N_SAMPLES)
X = np.transpose(np.vstack([d*0.25, d*0.5, d*0.75, d]))
X = X[:, :, np.newaxis]
Y = f_x(X)
# PRINT FWD PASS
model = my_model()
n_out = model(X).numpy()
print('FWD PASS:')
print(n_out, '\n')
# PRINT PREDICT OUTPUT
print('PREDICT:')
out = model.predict(X)
print(out)
Output (dropout=0.4) - outputs do not match
FWD PASS:
[[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]
[ 0.0526864 -0.13284351 0.02326298 -0.30357683]
[ 0.06297918 -0.14084947 0.02214929 -0.44425806]]
PREDICT:
[[ 0.00975818 -0.029404 0.00678372 -0.03232396]
[ 0.0347842 -0.0974849 0.01938616 -0.15696262]
[ 0. 0. 0. 0. ]
[ 0.06297918 -0.14084947 0.02214929 -0.44425806]]
Output (dropout=0.0) - no dropout, outputs match
FWD PASS:
[[ 0.00593475 -0.01799661 0.00424165 -0.01876264]
[ 0.02226446 -0.06519517 0.01399653 -0.08595844]
[ 0.03620889 -0.10084937 0.01987283 -0.1663805 ]
[ 0.0475584 -0.12453148 0.02269932 -0.2541136 ]]
PREDICT:
[[ 0.00593475 -0.01799661 0.00424165 -0.01876264]
[ 0.02226446 -0.06519517 0.01399653 -0.08595844]
[ 0.03620889 -0.10084937 0.01987283 -0.1663805 ]
[ 0.0475584 -0.12453148 0.02269932 -0.2541136 ]]
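One workaround that may be worth trying (a sketch, not a documented guarantee): re-set the global seed immediately before each evaluation so that both calls draw their dropout masks from the same point in the random stream. This assumes the per-call difference comes only from operation-level seeds derived from the global seed; whether the masks end up identical may still depend on how model.predict traces its graph.
# Hedged sketch: reset the global seed right before each call
tf.random.set_seed(SEED)
fwd = model(X).numpy()

tf.random.set_seed(SEED)
pred = model.predict(X)

# Check whether the two evaluations now use the same dropout mask
print(np.allclose(fwd, pred))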

Related

How are input tensors with different shapes fed to neural network?

I am following this tutorial on Policy Gradient using Keras, and can't quite figure out the question below.
In the case below, how exactly are input tensors with different shapes fed to the model?
The layers are neither concatenated nor added.
input1.shape = (4, 4)
input2.shape = (4,)
Does the "input" layer, which has 4 neurons, accept input1 + input2 as a 4-d vector?
The code excerpt (modified to make it simpler):
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras import backend as K
import numpy as np
input = tf.keras.Input(shape=(4, ))
advantages = tf.keras.Input(shape=[1])
dense1 = layers.Dense(32, activation='relu')(input)
dense2 = layers.Dense(32, activation='relu')(dense1)
output = layers.Dense(2, activation='softmax')(dense2)
model = tf.keras.Model(inputs=[input, advantages], outputs=[output])
# *********************************
input1 = np.array(
[[ 4.52281174e-02, 4.31672811e-02, -4.57789579e-02, 4.35560472e-02],
[ 4.60914630e-02, -1.51269339e-01, -4.49078369e-02, 3.21451106e-01],
[ 4.30660763e-02, 4.44624011e-02, -3.84788148e-02, 1.49510297e-02],
[ 4.39553243e-02, -1.50087194e-01, -3.81797942e-02, 2.95249428e-01]]
)
input2 = np.array(
[ 1.60063125, 1.47153674, 1.34113826, 1.20942261]
)
label = np.array(
[[1, 0],
[0, 1],
[1, 0],
[0, 1]]
)
model.compile(optimizer=optimizers.Adam(lr=0.0005), loss="binary_crossentropy")
model.train_on_batch([input1, input2], label)
In cases where you want to figure out what type of graph you have just built, it is helpful to use the model.summary() or tf.keras.utils.plot_model() methods for debugging:
tf.keras.utils.plot_model(model, to_file="test.png", show_shapes=True, show_layer_names=True, show_dtype=True)
This will show you that your input_2 is indeed not used. Since you haven't connected it to the main graph with any operations, it has no weights associated with it (the graph runs, but there is nothing to update on the right-hand side of the plot).
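For context, tutorials of this kind usually consume the second input inside a custom loss rather than in the forward graph, which is why it shows up as a disconnected input. A minimal sketch of that pattern (my own illustration, not the tutorial's exact code; it relies on graph-mode Keras losses closing over a symbolic input, which may not work unmodified in recent eager TensorFlow versions):
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import backend as K

state = tf.keras.Input(shape=(4,))
advantages = tf.keras.Input(shape=(1,))
dense1 = layers.Dense(32, activation='relu')(state)
probs = layers.Dense(2, activation='softmax')(dense1)

# `advantages` never touches the forward graph; it only weights the
# log-likelihood inside the loss closure.
def pg_loss(y_true, y_pred):
    y_pred = K.clip(y_pred, 1e-8, 1 - 1e-8)
    log_lik = y_true * K.log(y_pred)
    return K.sum(-log_lik * advantages)

policy = tf.keras.Model(inputs=[state, advantages], outputs=probs)
policy.compile(optimizer='adam', loss=pg_loss)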

scaling back data in customized keras training loss function

I define a customized loss function for my LSTM model (RMSE function) to be as follows:
def RMSE(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))
Everything is good so far, but the issue is that I scale my input data to the range [-1, 1], so the reported loss is on that scale. I want the model to report the training loss on the scale of my original data, for example by applying scaler.inverse_transform to y_true and y_pred somehow, but I've had no luck doing it, since they are tensors and scaler.inverse_transform requires a numpy array.
Any idea how to rescale the data and report the loss values on the original scale?
scaler.inverse_transform essentially uses the scaler.min_ and scaler.scale_ parameters of sklearn.preprocessing.MinMaxScaler to convert the data back. An example:
from sklearn.preprocessing import MinMaxScaler
import numpy as np
data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]])
scaler = MinMaxScaler()
data_trans = scaler.fit_transform(data)
print('transform:\n',data_trans)
data_inverse = (data_trans - scaler.min_)/scaler.scale_
print('inverse transform:\n',data_inverse)
# print
transform:
[[0. 0. ]
[0.25 0.25]
[0.5 0.5 ]
[1. 1. ]]
inverse transform:
[[-1. 2. ]
[-0.5 6. ]
[ 0. 10. ]
[ 1. 18. ]]
So you just need to use them in your RMSE function to achieve your goal.
def RMSE_inverse(y_true, y_pred):
    y_true = (y_true - K.constant(scaler.min_)) / K.constant(scaler.scale_)
    y_pred = (y_pred - K.constant(scaler.min_)) / K.constant(scaler.scale_)
    return K.sqrt(K.mean(K.square(y_pred - y_true)))
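A hedged usage sketch with toy data (my own stand-ins, not the asker's setup): the model trains on scaled targets while RMSE_inverse, which closes over the fitted scaler, reports the loss on the original scale.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
from tensorflow.keras import backend as K

# Toy stand-ins so the sketch is self-contained
X_train = np.random.rand(64, 10)
y_train = 100.0 * np.random.rand(64, 1)         # targets on the original scale

scaler = MinMaxScaler(feature_range=(-1, 1))
y_train_scaled = scaler.fit_transform(y_train)   # what the model actually fits on

model = keras.Sequential([
    keras.layers.Dense(8, activation='relu', input_shape=(10,)),
    keras.layers.Dense(1),
])
# RMSE_inverse is the loss defined above; it uses scaler.min_ and scaler.scale_
model.compile(optimizer='adam', loss=RMSE_inverse)
model.fit(X_train, y_train_scaled, epochs=2, verbose=0)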

Why does sharing layers in keras make building the graph extremely slow (tensorflow backend)

I am building a graph where the input is split into a list of tensors of length 30. I then use a shared RNN layer on each element of the list.
It takes about one minute until the model is compiled. Does it have to be like this (and why), or is there anything I am doing wrong?
Code:
import keras
from keras import backend as K
from keras.layers import TimeDistributed

shared_lstm = keras.layers.LSTM(4, return_sequences=True)
shared_dense = TimeDistributed(keras.layers.Dense(1, activation='sigmoid'))
inp_train = keras.layers.Input([None, se.action_space, 3])
# Split each possible measured label into a list:
inputs_train = [ keras.layers.Lambda(lambda x: x[:, :, i, :])(inp_train) for i in range(se.action_space) ]
# Apply the shared weights on each tensor:
lstm_out_train = [shared_lstm(x) for x in inputs_train]
dense_out_train = [(shared_dense(x)) for x in lstm_out_train]
# Merge the tensors again:
out_train = keras.layers.Lambda(lambda x: K.stack(x, axis=2))(dense_out_train)
# "Pick" the unique element along where the inp_train tensor is == 1.0 (along axis=2, in the next time step, of the first dimension of axis=3)
# (please disregard this line if it seems too complex)
shift_and_pick_layer = keras.layers.Lambda(lambda x: K.sum(x[0][:, :-1, :, 0] * x[1][:, 1:, :, 0], axis=2))
out_train = shift_and_pick_layer([out_train, inp_train])
m_train = keras.models.Model(inp_train, out_train)
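For comparison, a sketch of an alternative construction (my own, using tf.keras and a made-up N_ACTIONS in place of se.action_space): folding the action axis into the batch axis lets the shared LSTM and Dense graphs be built once instead of 30 times, which is likely where most of the build time goes. Whether it reproduces your exact setup and how much time it saves would need to be measured.
import tensorflow as tf
from tensorflow import keras

N_ACTIONS = 30  # stand-in for se.action_space

inp = keras.layers.Input([None, N_ACTIONS, 3])

# (batch, time, actions, 3) -> (batch * actions, time, 3):
# fold the action axis into the batch axis so the shared layers are applied once
fold = keras.layers.Lambda(
    lambda x: tf.reshape(tf.transpose(x, [0, 2, 1, 3]),
                         (-1, tf.shape(x)[1], 3)))(inp)

lstm_out = keras.layers.LSTM(4, return_sequences=True)(fold)
dense_out = keras.layers.TimeDistributed(
    keras.layers.Dense(1, activation='sigmoid'))(lstm_out)

# (batch * actions, time, 1) -> (batch, time, actions, 1): unfold again
unfold = keras.layers.Lambda(
    lambda x: tf.transpose(
        tf.reshape(x, (-1, N_ACTIONS, tf.shape(x)[1], 1)),
        [0, 2, 1, 3]))(dense_out)

model = keras.models.Model(inp, unfold)
model.summary()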

Tensorflow: Input pipeline with sparse data for the SVM estimator

Introduction:
I am trying to train the tensorflow svm estimator tensorflow.contrib.learn.python.learn.estimators.svm with sparse data. Sample usage with sparse data is in the GitHub repo at tensorflow/contrib/learn/python/learn/estimators/svm_test.py#L167 (I am not allowed to post more links, so here is the relative path).
The svm estimator expects the parameters example_id_column and feature_columns, where the feature columns should be derived from the class FeatureColumn, such as tf.contrib.layers.feature_column.sparse_column_with_hash_bucket. See the GitHub repo at tensorflow/contrib/learn/python/learn/estimators/svm.py#L85 and the documentation on tensorflow.org at python/contrib.layers#Feature_columns.
Question:
How do I have to set up my input pipeline to format sparse data in such a way that I can use one of the tf.contrib.layers feature_columns as input for the svm estimator?
What would a dense input function with many features look like?
Background
The data that I use is the a1a dataset from the LIBSVM website. The dataset has 123 features (which would correspond to 123 feature_columns if the data were dense). I wrote a user op to read the data, like tf.decode_csv() but for the LIBSVM format. The op returns the labels as a dense tensor and the features as a sparse tensor. My input pipeline:
NUM_FEATURES = 123
batch_size = 200
# my op to parse the libsvm data
decode_libsvm_module = tf.load_op_library('./libsvm.so')
def input_pipeline(filename_queue, batch_size):
    with tf.name_scope('input'):
        reader = tf.TextLineReader(name="TextLineReader_")
        _, libsvm_row = reader.read(filename_queue, name="libsvm_row_")
        min_after_dequeue = 1000
        capacity = min_after_dequeue + 3 * batch_size
        batch = tf.train.shuffle_batch([libsvm_row], batch_size=batch_size,
                                       capacity=capacity,
                                       min_after_dequeue=min_after_dequeue,
                                       name="text_line_batch_")
        labels, sp_indices, sp_values, sp_shape = \
            decode_libsvm_module.decode_libsvm(records=batch,
                                               num_features=123,
                                               OUT_TYPE=tf.int64,
                                               name="Libsvm_decoded_")
        # Return the features as sparse tensor and the labels as dense
        return tf.SparseTensor(sp_indices, sp_values, sp_shape), labels
Here is an example batch with batch_size = 5.
def input_fn(dataset_name):
    maybe_download()
    filename_queue_train = tf.train.string_input_producer([dataset_name],
                                                          name="queue_t_")
    features, labels = input_pipeline(filename_queue_train, batch_size)
    return {
        'example_id': tf.as_string(tf.range(1, 123, 1, dtype=tf.int64)),
        'features': features
    }, labels
This is what I tried so far:
with tf.Session().as_default() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    feature_column = tf.contrib.layers.sparse_column_with_hash_bucket(
        'features', hash_bucket_size=1000, dtype=tf.int64)
    svm_classifier = svm.SVM(feature_columns=[feature_column],
                             example_id_column='example_id',
                             l1_regularization=0.0,
                             l2_regularization=1.0)
    svm_classifier.fit(input_fn=lambda: input_fn(TRAIN),
                       steps=30)
    accuracy = svm_classifier.evaluate(
        input_fn=lambda: input_fn(features, labels),
        steps=1)['accuracy']
    print(accuracy)
    coord.request_stop()
    coord.join(threads)
    sess.close()
Here's an example, with made-up data, that works for me in TensorFlow 1.1.0-rc2. I think my comment was misleading; you're best off converting your ~100 binary features to real-valued features (tf.sparse_tensor_to_dense) and using a real_valued_column, since sparse_column_with_integerized_feature hides most of the useful information from the SVM Estimator.
import tensorflow as tf
batch_size = 10
num_features = 123
num_examples = 100
def input_fn():
    example_ids = tf.random_uniform(
        [batch_size], maxval=num_examples, dtype=tf.int64)
    # Construct a SparseTensor with features
    dense_features = (example_ids[:, None]
                      + tf.range(num_features, dtype=tf.int64)[None, :]) % 2
    non_zeros = tf.where(tf.not_equal(dense_features, 0))
    sparse_features = tf.SparseTensor(
        indices=non_zeros,
        values=tf.gather_nd(dense_features, non_zeros),
        dense_shape=[batch_size, num_features])
    features = {
        'some_sparse_features': tf.sparse_tensor_to_dense(sparse_features),
        'example_id': tf.as_string(example_ids)}
    labels = tf.equal(dense_features[:, 0], 1)
    return features, labels

svm = tf.contrib.learn.SVM(
    example_id_column='example_id',
    feature_columns=[
        tf.contrib.layers.real_valued_column(
            'some_sparse_features')],
    l2_regularization=0.1, l1_regularization=0.5)
svm.fit(input_fn=input_fn, steps=1000)

positive_example = lambda: {
    'some_sparse_features': tf.sparse_tensor_to_dense(
        tf.SparseTensor([[0, 0]], [1], [1, num_features])),
    'example_id': ['a']}
print(svm.evaluate(input_fn=input_fn, steps=20))
print(next(svm.predict(input_fn=positive_example)))

negative_example = lambda: {
    'some_sparse_features': tf.sparse_tensor_to_dense(
        tf.SparseTensor([[0, 0]], [0], [1, num_features])),
    'example_id': ['b']}
print(next(svm.predict(input_fn=negative_example)))
Prints:
{'accuracy': 1.0, 'global_step': 1000, 'loss': 1.0645389e-06}
{'logits': array([ 0.01612902], dtype=float32), 'classes': 1}
{'logits': array([ 0.], dtype=float32), 'classes': 0}
Since TensorFlow 1.5.0 there is a built-in function to read LIBSVM data; refer to my answer here:
https://stackoverflow.com/a/56354308/3885491

Batch Normalization - Tensorflow

I have looked at a few BN examples but am still a bit confused. I am currently using this function, which calls the function documented here:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard4/tf.contrib.layers.batch_norm.md
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm
import tensorflow as tf
def bn(x, is_training, name):
    bn_train = batch_norm(x, decay=0.9, center=True, scale=True,
                          updates_collections=None,
                          is_training=True,
                          reuse=None,
                          trainable=True,
                          scope=name)
    bn_inference = batch_norm(x, decay=1.00, center=True, scale=True,
                              updates_collections=None,
                              is_training=False,
                              reuse=True,
                              trainable=False,
                              scope=name)
    z = tf.cond(is_training, lambda: bn_train, lambda: bn_inference)
    return z
The following part is a toy run where I just check that the function reuses, for two features, the means and variances calculated in the training step. When I run this part of the code in test mode, i.e. is_training=False, the running means/variances calculated in the training step keep changing, which can be seen when we print out the BN variables obtained by calling bnParams.
if __name__ == "__main__":
    print("Example")
    import os
    import numpy as np
    import scipy.stats as stats
    np.set_printoptions(suppress=True, linewidth=200, precision=3)
    np.random.seed(1006)
    import pdb

    path = "batchNorm/"
    if not os.path.exists(path):
        os.mkdir(path)
    savePath = path + "bn.model"

    nFeats = 2
    X = tf.placeholder(tf.float32, [None, nFeats])
    is_training = tf.placeholder(tf.bool, name="is_training")
    Y = bn(X, is_training=is_training, name="bn")
    mvn = stats.multivariate_normal([0, 100])
    bs = 4
    load = 0
    train = 1
    saver = tf.train.Saver()

    def bnCheck(batch, mu, std):
        # Hand-computed batch normalization for comparison
        return (batch - mu) / (std + 0.001)

    with tf.Session() as sess:
        if load == 1:
            saver.restore(sess, savePath)
        else:
            tf.global_variables_initializer().run()

        #### TRAINING #####
        if train == 1:
            for i in xrange(100):
                x = mvn.rvs(bs)
                y = Y.eval(feed_dict={X: x, is_training.name: True})

        def bnParams():
            beta, gamma, mean, var = [v.eval() for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="bn")]
            return beta, gamma, mean, var

        beta, gamma, mean, var = bnParams()

        #### TESTING #####
        for i in xrange(10):
            x = mvn.rvs(1).reshape(1, -1)
            check = bnCheck(x, mean, np.sqrt(var))
            y = Y.eval(feed_dict={X: x, is_training.name: False})
            print("x = {0}, y = {1}, check = {2}".format(x, y, check))
            beta, gamma, mean, var = bnParams()
            print("BN Params: Beta {0} Gamma {1} mean {2} var{3} \n".format(beta, gamma, mean, var))

        saver.save(sess, savePath)
The first three iterations of the test loop look as follows:
x = [[ -1.782 100.941]], y = [[-1.843 1.388]], check = [[-1.842 1.387]]
BN Params: Beta [ 0. 0.] Gamma [ 1. 1.] mean [ -0.2 99.93] var[ 0.818 0.589]
x = [[ -1.245 101.126]], y = [[-1.156 1.557]], check = [[-1.155 1.557]]
BN Params: Beta [ 0. 0.] Gamma [ 1. 1.] mean [ -0.304 100.05 ] var[ 0.736 0.53 ]
x = [[ -0.107 99.349]], y = [[ 0.23 -0.961]], check = [[ 0.23 -0.96]]
BN Params: Beta [ 0. 0.] Gamma [ 1. 1.] mean [ -0.285 99.98 ] var[ 0.662 0.477]
I am not doing backpropagation, so beta and gamma won't change. However, my running means/variances are changing. Where am I going wrong?
EDIT:
It would be good to know why these arguments do or do not need to change between test and train:
updates_collections, reuse, trainable
Your bn function is wrong. Use this instead:
def bn(x, is_training, name):
    return batch_norm(x, decay=0.9, center=True, scale=True,
                      updates_collections=None,
                      is_training=is_training,
                      reuse=None,
                      trainable=True,
                      scope=name)
is_training is a boolean 0-D tensor that signals whether to update the running mean etc. By feeding a different value for the is_training tensor you signal whether you are in the training or the test phase.
EDIT:
Many operations in tensorflow accept tensors rather than constant True/False arguments.
When you use slim.batch_norm, be sure to use slim.learning.create_train_op instead of tf.train.GradientDescentOptimizer(lr).minimize(loss) or another optimizer. Try it to see if it works!
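A minimal TF1-era sketch of that advice (my own example, assuming tf.contrib.slim is available): create_train_op attaches the batch-norm moving-average update ops to the training step, which a bare optimizer.minimize(loss) would not do unless updates_collections=None is used.
import tensorflow as tf
slim = tf.contrib.slim

x = tf.placeholder(tf.float32, [None, 2])
y = tf.placeholder(tf.float32, [None, 1])
is_training = tf.placeholder(tf.bool)

# A tiny net with batch norm handled by slim
h = slim.fully_connected(x, 8, activation_fn=tf.nn.relu,
                         normalizer_fn=slim.batch_norm,
                         normalizer_params={'is_training': is_training,
                                            'decay': 0.9})
pred = slim.fully_connected(h, 1, activation_fn=None)
loss = tf.losses.mean_squared_error(y, pred)

optimizer = tf.train.GradientDescentOptimizer(0.01)
# create_train_op makes the UPDATE_OPS (running mean/variance updates)
# run before each training step
train_op = slim.learning.create_train_op(loss, optimizer)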