Floating point operations results are different in Android, TensorFlow and Pytorch - tensorflow

I am trying to compare the floating point numbers on Android, Tensorflow and Pytorch. What I have observed is I am getting same result for Tensorflow and Android but different on Pytorch as Android and Tensorflow are performing round-down operation. Please see the following result:
TensorFlow
import tensorflow as tf
a=tf.convert_to_tensor(np.array([0.9764764, 0.79078835, 0.93181187]), dtype=tf.float32)
session = tf.Session()
result = session.run(a*a*a*a)
print(result)
PyTorch
import torch as th
th.set_printoptions(precision=8)
a=th.from_numpy(np.array([0.9764764, 0.79078835, 0.93181187])).type(th.FloatTensor)
result = a*a*a*a
print(result)
Android:
for (index in 0 until a.size) {
var res = a[index] * a[index] * a[index] * a[index]
result.add(res)
}
print("r=$result")
The result is as follows:
Android: [0.9091739, 0.3910579, 0.7538986]
TensorFlow: [0.9091739, 0.3910579, 0.7538986]
PyTorch: [0.90917391, 0.39105791, 0.75389862]
You can see that PyTorch value is different. I know that this effect is minimal in this example but when we are performing training, and we are running for 1000 rounds with different batches and epochs, this different can be accumulated and can show un-desirable results. Can anyone point out how can we fix to have same number on three platforms.
Thanks.

You are not using the same level of precision when printing, hence why you get different results. Internally, those results are identical, it's just an artifact that you see due do the default of python to print only 7 digits after the comma.
If we set the same level of precision in numpy as the one you set in PyTorch, we get:
import numpy as np
import tensorflow as tf
# setting the print precision of numpy to 8 like in your pytorch example
np.set_printoptions(precision=8, floatmode="fixed")
a=tf.convert_to_tensor(np.array([0.9764764, 0.79078835, 0.93181187]), dtype=tf.float32)
session = tf.Session()
result = session.run(a*a*a*a)
print(result)
Results in:
[0.90917391 0.39105791 0.75389862]
Exactly the same as in PyTorch.

Related

Learning a Categorical Variable with TensorFlow Probability

I would like to use TFP to write a neural network where the output are the probabilities of a categorical variable with 3 classes, and train it using the negative log-likelihood.
As I'm moving my first steps with TF and TFP, I started with a toy model where the input layer has only 1 unit receiving a null input, and the output layer has 3 units with softmax activation function. The idea is that the biases should learn (up to an additive constant) the log of the probabilities.
Here below is my code, true_p are the true parameters I use to generate the data and I would like to learn, while learned_p is what I get from the NN.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from functions import nll
from tensorflow.keras.optimizers import SGD
import tensorflow.keras.layers as layers
import tensorflow_probability as tfp
tfd = tfp.distributions
# params
true_p = np.array([0.1, 0.7, 0.2])
n_train = 1000
# training data
x_train = np.array(np.zeros(n_train)).reshape((n_train,))
y_train = np.array(np.random.choice(len(true_p), size=n_train, p=true_p)).reshape((n_train,))
# model
input_layer = layers.Input(shape=(1,))
p_layer = layers.Dense(len(true_p), activation=tf.nn.softmax)(input_layer)
p_y = tfp.layers.DistributionLambda(tfd.Categorical)(p_layer)
model_p = keras.models.Model(inputs=input_layer, outputs=p_y)
model_p.compile(SGD(), loss=nll)
# training
hist_p = model_p.fit(x=x_train, y=y_train, batch_size=100, epochs=3000, verbose=0)
# check result
learned_p = np.round(model_p.layers[1].call(tf.constant([0], shape=(1, 1))).numpy(), 3)
learned_p
With this setup, I get the result:
>>> learned_p
array([[0.005, 0.989, 0.006]], dtype=float32)
I over-estimate the second category, and can't really distinguish between the first and the third one. What's worst, if I plot the probabilities at the end of each epoch, it looks like they are converging monotonically to the vector [0,1,0], which doesn't make sense (it seems to me the gradient should push in the opposite direction once I start to over-estimate).
I really can't figure out what's going on here, but have the feeling I'm doing something plain wrong. Any idea? Thank you for your help!
For the record, I also tried using other optimizers like Adam or Adagrad playing with the hyper-params, but with no luck.
I'm using Python 3.7.9, TensorFlow 2.3.1 and TensorFlow probability 0.11.1
I believe the default argument to Categorical is not the vector of probabilities, but the vector of logits (values you'd take softmax of to get probabilities). This is to help maintain precision in internal Categorical computations like log_prob. I think you can simply eliminate the softmax activation function and it should work. Please update if it doesn't!
EDIT: alternatively you can replace the tfd.Categorical with
lambda p: tfd.Categorical(probs=p)
but you'll lose the aforementioned precision gains. Just wanted to clarify that passing probs is an option, just not the default.

Keras model.evaluate and model.predict give different results

When I build a model using Lambda layer to calculate the sum along an axis, model.evaluate() gives different result in comparison to manually calculating from model.predict(). I have checked the output data types and shapes and it did not help. The problem only occurs when I use tensorflow as Keras backend. It works fine when I use CNTK as backend.
Here is a minimal code that can reproduce this behavior.
from keras.layers import Input,Lambda
import keras.backend as K
from keras.models import Model
import numpy as np
inp=Input((2,))
out=Lambda(lambda x:K.sum(x,axis=-1),output_shape=(1,))(inp)
model=Model(input=inp,output=out)
model.compile(loss="mse",optimizer="sgd")
xs=np.random.random((3,2))
ys=np.sum(xs,axis=1)
print(np.mean((model.predict(xs)-ys)**2)) # This is zero
print(model.evaluate(xs,ys)) # This is not zero
I establish a network with no parameters and should simply calculate the sum for each x, and ys are constructed so that it should be the same with the output from the model. model.predict() gives the same results as ys, but model.evaluate() gives something non-zero when I use tensorflow as backend.
Any idea about why this should happen? Thanks!

How to print full (not truncated) tensor in tensorflow?

Whenever I try printing I always get truncated results
import tensorflow as tf
import numpy as np
np.set_printoptions(threshold=np.nan)
tensor = tf.constant(np.ones(999))
tensor = tf.Print(tensor, [tensor])
sess = tf.Session()
sess.run(tensor)
As you can see I've followed a guide I found on Print full value of tensor into console or write to file in tensorflow
But the output is simply
...\core\kernels\logging_ops.cc:79] [1 1 1...]
I want to see the full tensor, thanks.
This is solved easily by checking the Tensorflow API for tf.Print. Pass summarize=n where n is the number of elements you want displayed.
You can do it as follows in TensorFlow 2.x:
import tensorflow as tf
tensor = tf.constant(np.ones(999))
tf.print(tensor, summarize=-1)
From TensorFlow docs -> summarize: The first and last summarize elements within each dimension are recursively printed per Tensor. If set to -1, it will print all elements of every tensor.
https://www.tensorflow.org/api_docs/python/tf/print
To print all tensors without truncation in TensorFlow 2.x:
import numpy as np
import sys
np.set_printoptions(threshold=sys.maxsize)

How to give multiple input at each time step of a sequential data to a recurrent neural network using tensorflow?

Suppose i am having a data set with: number of observations = 1000, each observation is a sequence of fixed length = 10(lets say), and each point in the sequence having 2 features(numerical). how we can input such data to an rnn in tensorflow ?
Any small suggestions also accepted. Thanks
According to your description, Your dataset is 1000x10x2
which looks something like this:
import numpy as np
data=np.random.randint(0,10,[1000,10,2])
Now as you said your sequence is fixed size so you don't need padding , now you have to just decide batch_size and then iterations
suppose batch size is 5:
batch_size=5
iterations=int(len(train_dataset)//batch_size)
Now feed your input to tensorflow lstm cell , your model would be something like this:
Here is example without batch size,
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn
data=np.random.randint(0,10,[1000,10,2])
input_x=tf.placeholder(tf.float32,[1000,10,2])
with tf.variable_scope('encoder') as scope:
cell=rnn.LSTMCell(150)
model=tf.nn.dynamic_rnn(cell,inputs=input_x,dtype=tf.float32)
output_,(fs,fc)=model
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(model, feed_dict={input_x: data})
print(output)
if you want to use batch then you have to either reshape data for LSTM or you have to use embedding, because LSTM takes rank 3

How to make contrib.learn (SKFLOW) regression predict multiple values (TensorFlow 0.12.1)

This is very similar to the question skflow regression predict multiple values. However, later versions of TensorFlow seem to rendered the answer from this question obsolete.
I would like to be able to have multiple output neurons in a TensorFlow Learn regression neural network (DNNRegressor). I upgraded the code from the referenced question to account for breaking changes in TensorFlow, but still get an error.
import numpy as np
import tensorflow.contrib.learn as skflow
import tensorflow as tf
from sklearn.metrics import mean_squared_error
# Create random dataset.
rng = np.random.RandomState(1)
X = np.sort(200 * rng.rand(100, 1) - 100, axis=0)
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T
# Fit regression DNN model.
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=X.shape[0])]
regressor = skflow.DNNRegressor(hidden_units=[5, 5],feature_columns=feature_columns)
regressor.fit(X, y)
score = mean_squared_error(regressor.predict(X), y)
print("Mean Squared Error: {0:f}".format(score))
But this results in:
ValueError: Shapes (?, 1) and (?, 2) are incompatible
I don't seen any release notes about breaking changes that indicate that the method for multiple outputs have changed. Is there another way to do this?
As mentioned in the tf.contrib.learn.DNNRegressor docs, you may use the label_dimension parameter, which is exactly what you are looking for.
Your code line with this param will do what you want:
regressor = skflow.DNNRegressor(hidden_units=[5, 5],
feature_columns=feature_columns,
label_dimension=2)
The standard predict() returns an generator object. To get an array, you have to add as_iterable=False:
score = metrics.mean_squared_error(regressor.predict(X, as_iterable=False), y)