I'm brand new to Tensorflow, but I'm trying to figure out why these results end in ...001, ...002, etc.
I'm following the tutorial here: https://www.tensorflow.org/get_started/get_started
"""This is a Tensorflow learning script."""
import tensorflow as tf
sess = tf.Session()
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W*x + b
sess.run(tf.global_variables_initializer()) #This is the same as the above 2 lines
print(sess.run(linear_model, {x: [1, 2, 3, 4]}))
It looks like a simple math function where if I was using 2 as an input, it would be (0.3 * 2) + -0.3 = 0.3.
[ 0. 0.30000001 0.60000002 0.90000004]
I would expect:
[ 0. 0.3 0.6 0.9]

That's probably a floating point error, because you introduced your variables as a tf.float32 dtype. You could use tf.round (https://www.tensorflow.org/api_docs/python/tf/round) but it doesn't seem to have round-to-the-nearest decimal place capability yet. For that, check out the response in: tf.round() to a specified precision.

The issue is that a floating point variable (like tf.float32) simply cannot store exactly 0.3, because it is stored in binary. It's like trying to store exactly 1/3 in decimal: it'd be 0.33..., but you'd have to go out to infinity to get the exact number (which isn't possible in our mortal realm!).
See the Python docs for a more in-depth review of the subject.
Tensorflow doesn't have a way to deal with decimal numbers yet (as far as I know)! But once the numbers are returned to python you could round & then convert to a Decimal.
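A minimal sketch of that round-then-convert step, using only the standard library. It assumes the values have already been returned to Python. `as_float32` and `to_decimals` are hypothetical helper names, and `struct` is used here only to mimic 32-bit storage without needing TensorFlow.

```python
import struct
from decimal import Decimal


def as_float32(value):
    """Round-trip a Python float through 32-bit storage (mimics tf.float32)."""
    return struct.unpack("f", struct.pack("f", value))[0]


def to_decimals(values, places=1):
    """Round each float in Python, then convert it to an exact Decimal."""
    return [Decimal(str(round(v, places))) for v in values]


# 0.3 has no exact binary representation, so float32 stores a nearby value.
stored = as_float32(0.3)
print(stored)  # slightly above 0.3

# Values like those returned by sess.run(...), rounded and made exact.
results = [as_float32(v) for v in (0.0, 0.3, 0.6, 0.9)]
cleaned = to_decimals(results)
print(cleaned)
```

The number of decimal places is a choice you have to make yourself; one place is enough for this example, but it would hide real differences in data with more precision.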


Why does pytorch matmul get different results when executed on cpu and gpu?

I am trying to figure out the rounding difference between numpy/pytorch, gpu/cpu, float16/float32 numbers and what I'm finding confuses me.
The basic version is:
a = torch.rand(3, 4, dtype=torch.float32)
b = torch.rand(4, 5, dtype=torch.float32)
print(a.numpy()@b.numpy() - a@b)
The result is all zeros as expected, however
print((a.cuda()@b.cuda()).cpu() - a@b)
gets non-zero results. Why is Pytorch float32 matmul executed differently on gpu and cpu?
An even more confusing experiment involves float16, as follows:
a = torch.rand(3, 4, dtype=torch.float16)
b = torch.rand(4, 5, dtype=torch.float16)
print(a.numpy()@b.numpy() - a@b)
print((a.cuda()@b.cuda()).cpu() - a@b)
These two results are both non-zero. Why are float16 numbers handled differently by numpy and torch? I know the CPU can only do float32 operations and numpy converts float16 to float32 before computing; however, the torch calculation is also executed on the CPU.
And guess what, print((a.cuda()@b.cuda()).cpu() - a.numpy()@b.numpy()) gets an all-zero result! This is pure fantasy for me...
The environment is as follows:
python: 3.8.5
torch: 1.7.0
numpy: 1.21.2
cuda: 11.1
gpu: GeForce RTX 3090
On the advice of some of the commenters, I add the following equal test
(a.numpy()@b.numpy() - (a@b).numpy()).any()
((a.cuda()@b.cuda()).cpu() - a@b).numpy().any()
(a.numpy()@b.numpy() - (a@b).numpy()).any()
((a.cuda()@b.cuda()).cpu() - a@b).numpy().any()
((a.cuda()@b.cuda()).cpu().numpy() - a.numpy()@b.numpy()).any()
respectively directly following the above five print functions, and the results are:
And for the last one, I've tried several times and I think I can rule out luck.
The differences are mostly numerical, as mentioned by @talonmies. The CPU and GPU and their respective BLAS libraries are implemented differently and use different operations and orders of operation, hence the numerical difference.
One possible cause is sequential operation vs. reduction (https://discuss.pytorch.org/t/why-different-results-when-multiplying-in-cpu-than-in-gpu/1356/3), e.g. (((a+b)+c)+d) will have different numerical properties as compared with ((a+b)+(c+d)).
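A small sketch of that ordering effect in plain Python floats (float64), with values picked by hand so the rounding is easy to follow. The function names are hypothetical, and this illustrates only the grouping issue, not what any particular BLAS library actually does.

```python
def sequential_sum(a, b, c, d):
    """Left-to-right accumulation: (((a + b) + c) + d)."""
    return ((a + b) + c) + d


def pairwise_sum(a, b, c, d):
    """Tree-style reduction: ((a + b) + (c + d))."""
    return (a + b) + (c + d)


# Near 1e16 the gap between adjacent doubles is 2, so adding 1.0 is lost
# whenever it is combined directly with +/-1e16.
values = (1.0, 1e16, -1e16, 1.0)
seq = sequential_sum(*values)
pair = pairwise_sum(*values)
print(seq, pair)  # 1.0 0.0
```

Both groupings are mathematically identical, yet the sequential order keeps the final 1.0 while the pairwise order loses both small terms.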
This question also talks about fused operations (multiply-add) which can cause numerical differences.
I did a little bit of testing and found that the GPU's output in float16 mode can be matched if we promote the datatype to float32 before the computation and demote it afterward. This could be caused by internal intermediate casting or by the better numerical stability of fused operations (torch.backends.cudnn.enabled does not matter). This does not explain the float32 case, though.
import torch
def test(L, M, N):
    # test (L*M) @ (M*N)
    for _ in range(5000):
        a = torch.rand(L, M, dtype=torch.float16)
        b = torch.rand(M, N, dtype=torch.float16)
        cpu_result = a@b
        gpu_result = (a.cuda()@b.cuda()).cpu()
        if (cpu_result-gpu_result).any():
            print(f'({L}x{M}) @ ({M}x{N}) failed')
            return  # stop at the first mismatch
    print(f'({L}x{M}) @ ({M}x{N}) passed')
test(1, 1, 1)
test(1, 2, 1)
test(4, 1, 4)
test(4, 4, 4)
def test2():
    for _ in range(5000):
        a = torch.rand(1, 2, dtype=torch.float16)
        b = torch.rand(2, 1, dtype=torch.float16)
        cpu_result = a@b
        gpu_result = (a.cuda()@b.cuda()).cpu()
        # dot product computed entirely in float16
        half_result = a[0,0]*b[0,0] + a[0,1]*b[1,0]
        # dot product computed in float32, then cast back to float16
        convert_result = (a[0,0].float()*b[0,0].float() + a[0,1].float()*b[1,0].float()).half()
        if ((cpu_result-half_result).any()):
            print('CPU != half')
            return
        if (gpu_result-convert_result).any():
            print('GPU != convert')
            return
    print('All passed')
test2()
(1x1) @ (1x1) passed
(1x2) @ (2x1) failed
(4x1) @ (1x4) passed
(4x4) @ (4x4) failed
All passed
You can tell that when the inner dimension is 1, it passes the check (no multiply-add/reduction needed).

How does GEKKO optimization with bounded variables work?

I am using GEKKO to estimate the parameters of a differential equation and I have bounded one of the variables between 0 and 1. However, when I solve the ODE, I get values outside of the bounds for this variable, so I was wondering if somebody knew how GEKKO finds the solution, as this might help me resolve the issue.
Here is the code I use to fit the data. This gives me a solution x and u where x is between 0 and 1.
However, afterwards, I try to solve the ODE using scipy.integrate.solve_ivp, with the initial value of u that I got, and the solution I get for u is not between these bounds. Since the solution should be unique, I am wondering what process GEKKO follows to find it (does it project the values onto the bounds, or how does it deal with this?). Any comment is much appreciated.
Here is an MVCE. If you run it you can see that with GEKKO, I get a solution between the bounds and then, when I solve the ODE with solve_ivp, I don't get the same solution. Can you explain why this happens and how can I deal with it? I want to use solve_ivp to predict the next values.
from scipy.integrate import solve_ivp
from gekko import GEKKO
import matplotlib.pyplot as plt
time=[0.0, 0.11784511784511785, 0.18855218855218855,\
m = GEKKO(remote=False)
m.time= [0.0, 0.11784511784511785, 0.18855218855218855,\
x_data= [0.0003777630481280617, 0.002024573836061331,\
0.0008954383363035536, 0.005331749410182463]
x = m.CV(value=x_data, lb=0); x.FSTATUS = 1 # fit to measurement
x.SPLO = 0
sigma = m.FV(value=0.5, lb= 0, ub=1); sigma.STATUS=1
d = m.Param(0.05)
k = m.Param(0.001)
b = m.Param(0.5)
r = m.FV(value=0.5, lb= 0); r.STATUS=1
m_param = m.Param(1)
u = m.Var(value=0.1, lb=0, ub=1)
a = m.Param(0.999)
Kmax= m.Param(100000)
m_param/(k+b*u)-d), u.dt() == \
m.options.IMODE = 5 # dynamic estimation
m.options.NODES = 5 # collocation nodes
m.options.EV_TYPE = 1 # linear error (2 for squared)
m.solve(disp=False, debug=False) # display solver output
def model_case_3(t, z, r, k, b, Kmax, sigma):
x, u= z
dxdt = x*(r*(1-a*u**2)*(1-x/(Kmax*(1-a*u**2)))-m/(k+b*u)-0.05)
dudt = sigma*((-2*a*(b**2)*r*(u**3)+4*a*b*k*r*(u**2)\
return [dxdt, dudt]
sol = solve_ivp(fun=model_case_3, t_span=[0.0, 0.2356902356902357],\
y0=[0.0003777630481280617, u.value[0]],\
t_eval=[0.0, 0.11784511784511785, 0.18855218855218855,\
0.2356902356902357], \
args=(r.value[0], 0.001, 0.5,1000000 , sigma.value[0]))
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,3), constrained_layout=True)
ax1.plot(time, x.value, time, sol['y'][0])
ax2.plot(time, u.value, time, sol['y'][1])
It is not an issue with the version of Gekko, as I have Gekko 0.2.8, so I am wondering if it has anything to do with the initialization of variables. I ran the example I posted in Spyder (I was using Google Colab before) and got the correct solution, but when I ran the rest of the cases I again got negative values for u (solving with solve_ivp), which is quite strange.
You can add a bound to a variable when it is created by setting lb (lower bound) and ub (upper bound).
z = m.Var(lb=0,ub=10)
After you create the variable, the bounds are adjusted with .LOWER and .UPPER.
z.LOWER = 1
z.UPPER = 9
Here is an example problem that shows the use of bounds where x is constrained to be greater than 0.5.
from gekko import GEKKO
t_data = [0, 0.1, 0.2, 0.4, 0.8, 1]
x_data = [2.0, 1.6, 1.2, 0.7, 0.3, 0.15]
m = GEKKO(remote=False)
m.time = t_data
x = m.CV(value=x_data,lb=0.5,ub=3); x.FSTATUS = 1 # fit to measurement
k = m.FV(); k.STATUS = 1 # adjustable parameter
m.Equation(x.dt()== -k * x) # differential equation
m.options.IMODE = 5 # dynamic estimation
m.options.NODES = 5 # collocation nodes
m.solve(disp=False) # solve (disp=True displays solver output)
k = k.value[0]; print(k)
A plot of the results shows that the bounds are enforced but the model prediction does not fit the data because of the lower bound constraint (x>=0.5).
import numpy as np
import matplotlib.pyplot as plt # plot solution
plt.plot(m.time, x.value, label='Predicted (k='+str(np.round(k,2))+')')
# plot exact solution
t = np.linspace(0,1); xe = 2*np.exp(-k*t)
plt.plot(t,xe,'k:',label='Exact Solution')
plt.xlabel('Time'), plt.ylabel('Value')
Without the restrictive lower bound, the solver optimizes to best fit the points.
x = m.CV(value=x_data,lb=0.0,ub=3)
Response 1 to Question Edit
The only way that a variable (such as u) is outside of the bounds is if the solver did not report a successful solution. To report a successful solution, the solver must satisfy the Karush Kuhn Tucker conditions for optimality. I recommend that you check that it satisfied all of the equations by checking that m.options.APPSTATUS==1 after the m.solve() command. If you can include an MVCE (https://stackoverflow.com/help/minimal-reproducible-example) that has sample data so the script can run, we can help you check it.
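One way to sketch that check, together with a bounds check on the returned values. The helper name `check_solution` and the stand-in objects are hypothetical and exist only so the snippet runs without GEKKO. With a real model you would pass `m` and something like `u.value` after `m.solve()`.

```python
from types import SimpleNamespace


def check_solution(model, values, lb=None, ub=None, tol=1e-8):
    """Return a list of problems; an empty list means the solution looks usable."""
    problems = []
    # APPSTATUS == 1 is the success flag mentioned above.
    if model.options.APPSTATUS != 1:
        problems.append("solver did not report a successful solution")
    for i, v in enumerate(values):
        if lb is not None and v < lb - tol:
            problems.append(f"value {i} = {v} is below the lower bound {lb}")
        if ub is not None and v > ub + tol:
            problems.append(f"value {i} = {v} is above the upper bound {ub}")
    return problems


# Stand-ins for a solved and an unsolved model (assumed shape: model.options.APPSTATUS).
solved = SimpleNamespace(options=SimpleNamespace(APPSTATUS=1))
failed = SimpleNamespace(options=SimpleNamespace(APPSTATUS=0))

print(check_solution(solved, [0.1, 0.5, 0.9], lb=0, ub=1))
print(check_solution(failed, [0.1, -0.2], lb=0, ub=1))
```

The small tolerance is an assumption: solvers typically satisfy bounds only to within a numerical tolerance, so an exact comparison could raise false alarms.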
Response 2 to Question Edit
Thanks for including a minimal reproducible example. Here are the results that I get with Gekko 0.2.8. If you are using an earlier version, I recommend that you upgrade with pip install gekko --upgrade.
The solver reports a successful solution.
EXIT: Optimal Solution Found.
The solution was found.
The final value of the objective function is 0.03164650667928192
Solver : IPOPT (v3.12)
Solution time : 0.23339999999999997 sec
Objective : 0.0316473666078486
Successful solution
The constraints x>=0 and 0<=u<=1 are satisfied. Could it just be an issue with an older version of Gekko?

Tensorflow graph execution ignores equality condition in eager execution mode

I stumbled on some weird TensorFlow behaviour. After adding tf.print everywhere, I traced it to the code below, but I don't know why it happens unless there is a threading race condition or graph construction omitted the code segment. I don't see why either of those should happen.
import sys
import tensorflow as tf
# Ragged tensor may have empty rows. So, for tensor arithmetic operation,
# we need to create zero-padded tensors to replace them.
# This implementation only keeps the first entry of each row.
# So, the output tensor is a normal tensor.
def pad_empty_ragged_tensor(ragtensor):
    tf.print("Ragged tensor padding empty tensor...", output_stream=sys.stdout)
    batch_size = ragtensor.shape[0]
    n_rows = ragtensor.row_lengths()
    tf.print("row_lengths(): ", n_rows, output_stream=sys.stdout)
    new_tensor = []
    for i in range(batch_size):
        tf.print("n_rows[i]: ", n_rows[i], output_stream=sys.stdout)
        if tf.equal(n_rows[i], 0): # Tried n_rows[i] == 0 too
            tf.print("Create zero padded tensor...", output_stream=sys.stdout)
            num_zeros = ragtensor.shape[-1]
            tensor = tf.tile([[0]], [1, num_zeros])
            tensor = tf.cast(tensor, dtype=ragtensor.dtype)
        else:
            tf.print("Take first entry from the row", output_stream=sys.stdout)
            tensor = ragtensor[i,0:1]
        new_tensor.append(tensor)
    tensor = tf.stack(new_tensor, axis=0) # [batch, 1, [y, x, h, w]]
    tensor.set_shape([batch_size, 1, ragtensor.shape[-1]])
    tf.print("The padded tensor shape: ", tensor.shape, output_stream=sys.stdout)
    return tensor
Here is a segment of the print trace:
row_lengths(): [1 1 0 ... 1 1 1]
n_rows[i]: 1
Take first entry from the row
n_rows[i]: 1
Take first entry from the row
n_rows[i]: 0
Take first entry from the row
n_rows[i]: 1
Take first entry from the row
As shown, the if tf.equal(n_rows[i], 0): block (I tried n_rows[i] == 0 too) was never entered. Execution falls into the 'else' branch every time, even when the equality condition is met. Could anyone give me a hint about what went wrong?
BTW, debugging the TensorFlow runtime is difficult too. Breakpoints in VS Code are not hit once graph execution runs, and tfdbg does not work with eager execution either. A suggestion on this would be very helpful to me too.
My dev env:
OS: Ubuntu18.04
Python: 3.6
Tensorflow-gpu: 1.14
GPU: RTX2070
Cuda: 10.1
cudnn: 7.6
IDE: VS code
Tensorflow mode: Eager execution
Thanks in advance

Unable to obtain moments using tensorflow

I want to calculate the moments of a vector x = np.random.normal(0,1,[1,500]). When I do mean, std = tf.nn.moments(x,axes=[0]), it throws this error:
File "/tmp/venv/local/lib/python2.7/site-packages/tensorflow/python/ops/nn.py", line 830, in moments
y = math_ops.cast(x, dtypes.float32) if x.dtype == dtypes.float16 else x
TypeError: data type not understood
I am using tensorflow==0.11.0. What is the correct syntax?
As shown in the documentation for tf.nn.moments, the input x must be a Tensor.
You should use something like the following:
import numpy as np
import tensorflow as tf
x = tf.placeholder("float", [None,500])
mean, std = tf.nn.moments(x, axes=[0])
sess = tf.Session()
sample_mean, sample_std = sess.run([mean, std],
                                   feed_dict={x: np.random.normal(0,1,[1,500])})
Note: This particular calculation does not make much sense, since there is only one data value. You may want to either increase the shape to something like [32, 500], or more likely change the axes from [0] to [1].
Regardless, the calculation will complete without errors. The second value returned by tf.nn.moments is the variance (named std above), and it will be equal to 0 because the moments are calculated along an axis with only one element.
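To make the axis choice concrete, here is a small pure-Python sketch of the same mean and population variance computed along either axis of a one-row array. The `moments` helper is hypothetical and only mimics the behaviour described above on nested lists; it is not TensorFlow's implementation.

```python
from statistics import mean, pvariance


def moments(rows, axis):
    """Mean and population variance of a 2-D list along the given axis."""
    if axis == 0:
        # Reduce over rows: one result per column.
        columns = list(zip(*rows))
        return [mean(c) for c in columns], [pvariance(c) for c in columns]
    # Reduce over columns: one result per row.
    return [mean(r) for r in rows], [pvariance(r) for r in rows]


x = [[1.0, 2.0, 3.0, 4.0]]  # shape [1, 4], a small stand-in for [1, 500]

mean0, var0 = moments(x, axis=0)  # each column holds a single value
mean1, var1 = moments(x, axis=1)  # the row holds all four values
print(var0)  # [0.0, 0.0, 0.0, 0.0]
print(mean1, var1)  # [2.5] [1.25]
```

Along axis 0 every column contains one sample, so the variance is zero everywhere; along axis 1 the statistics describe the whole vector.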

pybrain LSTM sequence to predict sequential data

I have written some simple code using pybrain to predict simple sequential data.
For example, a sequence of 0,1,2,3,4 is supposed to produce an output of 5 from the network. The dataset specifies the remaining sequence.
Below is my implementation:
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.datasets import SequentialDataSet
from pybrain.structure import SigmoidLayer, LinearLayer
from pybrain.structure import LSTMLayer
import itertools
import numpy as np
net = buildNetwork(INPUTS, HIDDEN, OUTPUTS, hiddenclass=LSTMLayer, outclass=LinearLayer, recurrent=True, bias=True)
ds = SequentialDataSet(INPUTS, OUTPUTS)
trainer = BackpropTrainer(net, ds)
for _ in range(1000):
    print trainer.train()
print x
The output on my screen keeps showing [0.99999999 0.99999999 0.9999999 0.99999999] every single time. What am I missing? Is the training not sufficient? I ask because trainer.train() shows an output of 86.625..
The pybrain SigmoidLayer implements the sigmoid squashing function, which you can see here: sigmoid squashing function code
The relevant part is this:
def sigmoid(x):
    """ Logistic sigmoid function. """
    return 1. / (1. + safeExp(-x))
So, no matter what the value of x, it will only ever return values between 0 and 1. For this reason, and for others, it is a good idea to scale your input and output values to between 0 and 1. For example, divide all your inputs by the maximum value (assuming the minimum is no lower than 0), and the same for your outputs. Then do the reverse with the result (e.g. multiply by 25 if you were dividing by 25 at the beginning).
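As a rough sketch of that scaling step in plain Python: the maximum of 25 is just the example figure from above, and `scale` and `unscale` are hypothetical helper names rather than anything provided by pybrain.

```python
def scale(values, max_value):
    """Map values from [0, max_value] into [0, 1]."""
    return [v / max_value for v in values]


def unscale(values, max_value):
    """Reverse of scale(): map network outputs back to the original units."""
    return [v * max_value for v in values]


MAX_VALUE = 25.0  # assumed largest value in the data

inputs = scale([0, 1, 2, 3, 4], MAX_VALUE)
prediction = unscale([0.2], MAX_VALUE)  # pretend the network returned 0.2
print(inputs, prediction)
```

The same maximum must be used for scaling and unscaling, otherwise the predictions come back in the wrong units.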
Also, I'm no expert on pybrain, but I wonder if you need OUTPUTS = 4? It looks like you have only one output in your data, so I'm wondering if you could just use OUTPUTS = 1.
You may also try scaling the inputs and outputs to a particular part of the sigmoid curve (e.g. between 0.1 and 0.9) to make pybrain's job easier, but that makes the scaling before and after a little more complex.
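A sketch of what that slightly more complex scaling could look like, assuming a simple linear mapping. The names `to_band` and `from_band` and the 0 to 25 source range are illustrative assumptions.

```python
def to_band(value, src_min, src_max, lo=0.1, hi=0.9):
    """Linearly map value from [src_min, src_max] into [lo, hi]."""
    fraction = (value - src_min) / (src_max - src_min)
    return lo + fraction * (hi - lo)


def from_band(scaled, src_min, src_max, lo=0.1, hi=0.9):
    """Inverse of to_band(): recover the value in original units."""
    fraction = (scaled - lo) / (hi - lo)
    return src_min + fraction * (src_max - src_min)


scaled = [to_band(v, 0, 25) for v in (0, 5, 25)]
restored = [from_band(s, 0, 25) for s in scaled]
print(scaled, restored)
```

Targets mapped this way stay away from the flat ends of the sigmoid, at the cost of having to remember four numbers (source range and band) instead of one.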