Why is the gradient of tf.sign() not equal to 0? - tensorflow

I expected the gradient for tf.sign() in TensorFlow to be equal to 0 or None. However, when I examined the gradients, I found that they were equal to very small numbers (e.g. 1.86264515e-09). Why is that?
(If you are curious as to why I even want to know this, it is because I want to implement the "straight-through estimator" described here, and before overriding the gradient for tf.sign(), I wanted to check that the default behavior was in fact what I was expecting.)
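For context, here is a minimal, self-contained sketch (my own code, assuming TF 1.x graph mode like the snippet below) that first inspects the registered gradient of tf.sign in isolation and then defines a straight-through version with tf.custom_gradient:
import tensorflow as tf

w = tf.Variable([0.5])
default_grad = tf.gradients(tf.sign(w), w)   # the gradient TensorFlow registers for the Sign op

@tf.custom_gradient
def sign_st(x):
    def grad(dy):
        return dy                            # straight-through: pass the incoming gradient unchanged
    return tf.sign(x), grad

st_grad = tf.gradients(sign_st(w), w)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(default_grad), sess.run(st_grad))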
EDIT: Here is some code which reproduces the behaviour. The model is just the linear regression model from the introduction to TensorFlow, except that I use y = sign(W) * x + b instead of y = W * x + b.
import tensorflow as tf
import numpy as np
def gvdebug(g, v):
    g2 = tf.zeros_like(g, dtype=tf.float32)
    v2 = tf.zeros_like(v, dtype=tf.float32)
    g2 = g
    v2 = v
    return g2, v2
# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3
# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but TensorFlow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = tf.sign(W) * x_data + b
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
grads_and_vars = optimizer.compute_gradients(loss)
gv2 = [gvdebug(gv[0], gv[1]) for gv in grads_and_vars]
apply_grads = optimizer.apply_gradients(gv2)
# Before starting, initialize the variables. We will 'run' this first.
init = tf.initialize_all_variables()
# Launch the graph.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.01)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
sess.run(init)
# Fit the line.
for step in range(201):
    sess.run(apply_grads)
    if (step % 20 == 0) or ((step-1) % 20 == 0):
        print("")
        print(sess.run(gv2[0][1]))  # the variable
        print(sess.run(gv2[0][0]))  # the gradient
        print("")
        print(step, sess.run(W), sess.run(b))
# Learns best fit is W: [0.1], b: [0.3]

Related

Learning a simple pattern with RNN

I am trying to make an RNN in TensorFlow capture a basic pattern in a simple hourly time series. I am trying to solve a bigger problem involving count time series of customer demand.
The simple time series is as follows:
Every 24 hours (1 day) there will be a small integer, either 1 or 2, drawn from a random uniform distribution.
In between these 24 hours the values will be zero.
Every 168 hours (7 days) there will be a high integer (5, 6, 7, 8 or 9) drawn from a random uniform distribution.
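As a point of reference, here is a minimal sketch (my own code, not the question's data_generator) of the series described above: a value of 1 or 2 every 24 hours, a value of 5-9 every 168 hours, and zeros everywhere else:
import numpy as np

def simple_series(n_hours=1576, seed=0):
    rng = np.random.default_rng(seed)
    dem = np.zeros(n_hours)
    dem[::24] = rng.integers(1, 3, size=len(dem[::24]))      # 1 or 2 every day
    dem[::168] = rng.integers(5, 10, size=len(dem[::168]))   # 5..9 every week (overwrites the daily value)
    return dem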
I tried following the code at https://r2rt.com/recurrent-neural-networks-in-tensorflow-i.html using dynamic_rnn.
Is my test data correct? How can I feed the batches of output from previous time steps as input to the next time step? I have 5 hyperparameters to play with:
batch_size = 8
num_steps = 192
state_size = 5
learning_rate = 0.00001
num_epochs = 1
However, each time I train with the same hyperparameters I get different results. The training error is very small each time, but the results seem quite random (local minima, probably?). In the plots, orange is the actual series and blue is the predicted one.
Can my test batch start at any point in the sequence? Does the RNN learn the number of zeros in between non-zero values? If the test batch starts with a small non-zero number, then the RNN should know that it should output 23 zero-value steps after it and then, after 167 steps, output a high non-zero value. If I start my test sequence at 0, then it should wait 23 more zero-value steps before outputting a small non-zero value and, after 167 steps, output a high non-zero value.
Or does it learn another pattern? I am not sure whether my method of testing is correct.
Is it better to just pass a single time-step integer value and let the network generate the remaining time steps by feeding the current time step's output back as input to the next time step?
Currently, I just take a random sequence X generated by the same method used for training and check whether my output Y is X shifted by one time step. Could you please explain?
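For illustration, here is a rough sketch (not the question's code) of the feed-the-output-back idea just described. It assumes a trained Keras-style model, like the SimpleRNN + Dense model later in the question, that predicts the next value from a window of shape (1, t, 1):
import numpy as np

def rollout(model, seed_window, n_future):
    # Start from a seed window and repeatedly append the model's own prediction.
    window = np.asarray(seed_window, dtype=np.float32).reshape(1, -1, 1)
    preds = []
    for _ in range(n_future):
        nxt = float(model.predict(window)[0, -1])   # predicted next value
        preds.append(nxt)
        window = np.concatenate([window, np.full((1, 1, 1), nxt, dtype=np.float32)], axis=1)
    return preds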
My code is given below. You can just copy and paste it and it should run. Basically, I just generate the data, build the model, train the network and test it.
from data_generator import gen_data
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
import numpy as np
import time
import matplotlib.pyplot as plt
num_classes = 11
batch_size = 8
num_steps = 192
state_size = 5
learning_rate = 0.00001
num_epochs=1
dem = gen_data(len=1576)
def gen_batch(dem, batch_size, num_steps):
    raw_x = dem[:-1]
    raw_y = dem[1:]
    data_length = len(raw_x)
    num_of_win = data_length - num_steps - 1  # 1382 windows
    batch_partition_length = num_of_win // batch_size  # 172 batches
    data_x = []
    data_y = []
    j = 0
    for i in range(batch_partition_length):
        windows_x = []
        windows_y = []
        k = 0
        while (k < batch_size):
            windows_x.append(raw_x[j:num_steps + j])
            windows_y.append(raw_y[j:num_steps + j])
            j += 1
            k += 1
        data_x.append(np.array(windows_x))  # each batch is stacked horizontally.
        data_y.append(np.array(windows_y))
    for windows_x, windows_y in zip(data_x, data_y):
        x = windows_x
        y = windows_y
        z = x.shape
        z = y.shape
        yield (x, y)
def gen_epoch(num_epochs, batch_size, num_steps):
    for n in range(num_epochs):
        yield gen_batch(dem, batch_size, num_steps)
def reset_graph():
    # if 'sess' in globals() and sess:
    #     sess.close()
    tf.compat.v1.reset_default_graph()
def build_RNN_model(batch_size, num_classes, state_size, num_steps, learning_rate):
    reset_graph()
    x = tf.compat.v1.placeholder(dtype=tf.int32, shape=(batch_size, num_steps))
    y = tf.compat.v1.placeholder(dtype=tf.int32, shape=(batch_size, num_steps))
    init_state = tf.zeros([batch_size, state_size])
    # with tf.compat.v1.variable_scope('rnn_cell'):
    #     W = tf.compat.v1.get_variable('inp_state_w', shape=(num_classes+state_size, state_size), initializer=tf.compat.v1.initializers.glorot_uniform(10))
    #     b = tf.compat.v1.get_variable('inp_state_b', shape=(state_size), initializer=tf.compat.v1.initializers.constant(0.0))
    # def rnn_cell(rnn_input, state):
    #     with tf.compat.v1.variable_scope('rnn_cell', reuse=True):
    #         W = tf.compat.v1.get_variable('inp_state_w', shape=(num_classes+state_size, state_size), initializer=tf.compat.v1.initializers.glorot_uniform(10))
    #         b = tf.compat.v1.get_variable('inp_state_b', shape=(state_size), initializer=tf.compat.v1.initializers.constant(0.0))
    #     return tf.tanh(tf.matmul(tf.concat([rnn_input, state], axis=1), W) + b)
    # cell = tf.compat.v1.nn.rnn_cell.BasicRNNCell(state_size, reuse=True, name='rnn_cell')
    rnn_inputs = tf.one_hot(x, num_classes)
    cell = tf.compat.v1.nn.rnn_cell.BasicRNNCell(state_size)
    rnn_outputs, final_state = tf.compat.v1.nn.dynamic_rnn(cell, rnn_inputs, initial_state=init_state)
    with tf.compat.v1.variable_scope('output'):
        W = tf.compat.v1.get_variable('out_state_w', shape=(state_size, num_classes), initializer=tf.compat.v1.initializers.glorot_uniform(10))
        b = tf.compat.v1.get_variable('out_state_b', shape=(num_classes), initializer=tf.compat.v1.initializers.constant(0.0))
    logits = tf.reshape(tf.compat.v1.matmul(tf.reshape(rnn_outputs, [-1, state_size]), W) + b, [batch_size, num_steps, num_classes])
    predictions = tf.compat.v1.nn.softmax(logits)
    tru_labels = y
    losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    total_loss = tf.reduce_mean(losses)
    train_step = tf.compat.v1.train.AdagradOptimizer(learning_rate).minimize(total_loss)
    return dict(
        x=x,
        y=y,
        final_state=final_state,
        total_loss=total_loss,
        train_step=train_step,
        init_state=init_state,
        predictions=predictions,
        tru_labels=tru_labels,
        saver=tf.compat.v1.train.Saver()
    )
def train_network(g, num_epochs, batch_size, num_steps, dem, save=' '):
    tf.compat.v1.set_random_seed(2345)
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.initialize_all_variables())
        training_losses = []
        for idx, epoch in enumerate(gen_epoch(num_epochs, batch_size, num_steps)):
            training_loss = 0
            steps = 0  # number of batches
            training_state = None
            for X, Y in epoch:
                steps += 1
                feed_dict = {g['x']: X, g['y']: Y}
                if training_state is not None:
                    feed_dict[g['init_state']] = training_state
                training_loss_, training_state, train_step = \
                    sess.run([g['total_loss'], g['final_state'], g['train_step']], feed_dict)
                training_loss += training_loss_
            print("Average training loss for Epoch", idx, ":", training_loss/steps)
            print('steps', steps)
            training_losses.append(training_loss/steps)
        if isinstance(save, str):
            g['saver'].save(sess, save)
        e = gen_batch(dem, batch_size, num_steps)
        e = gen_batch(dem, batch_size, num_steps)
        for X, Y in e:
            tru_labels, predictions = \
                sess.run([g['tru_labels'], g['predictions']], feed_dict={g['x']: X, g['y']: Y, g['init_state']: training_state})
        pred = np.argmax(predictions, axis=2)
        print(pred.shape)
        pred = pred[0]
        print('predictions', pred)
        tru_labels = tru_labels[0]
        print('tru_labels', tru_labels)
        plt.plot(pred)
        plt.plot(tru_labels)
        plt.show()
    return training_loss
g = build_RNN_model(batch_size, num_classes,state_size,num_steps,learning_rate)
t = time.time()
train_network(g, num_epochs,batch_size,num_steps, dem,save='saver' )
print("It took", time.time() - t, "seconds to train for 3 epochs.")
I have written some Keras code with a single RNN cell and a dense layer to capture the following two patterns, which are similar to the two patterns above. However, the distribution of magnitudes of high and low vehicles, which are drawn from the categorical distribution below, is not being represented in the test output.
Categorical Random Variable, x = {0,1,2} and p(x) = {0.6,0.3,0.1}
low vehicles = 1 + x , every 4 hours
high vehicles = 6 + x , every 8 hours
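For reference, a one-line way (my own sketch, not the question's generator) to draw the categorical variable x with p = {0.6, 0.3, 0.1} and turn it into the two patterns above:
import numpy as np

x = np.random.choice([0, 1, 2], size=10, p=[0.6, 0.3, 0.1])
low_vehicles = 1 + x    # values in {1, 2, 3}, placed every 4 hours
high_vehicles = 6 + x   # values in {6, 7, 8}, placed every 8 hours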
I managed to get results like the following with this code:
from copyreg import pickle
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import tensorflow.keras as keras
import sys
#### for reproducible results
from numpy.random import seed
seed(1)
import tensorflow
tensorflow.random.set_seed(2)
n_steps = 12
batch_size = 32
lay1_state_size = 64
lay2_state_size = 0
dense_state_size = 1
num_epochs = 25
horizon = 24
loss_function_type = 'sparse_categorical_crossentropy or mse or rmse'
num_layers = 1
optimizer_type = 'Adam'
metrics = 'rmse'
# spikes at regrular interval
dem = np.load('const_dem_2_freq_stoch.npy')
dem_len = len(dem)
def gen_batch(dem, batch_size, n_steps):
    n = n_steps + 1
    raw_x = dem[:-1]
    data_length = len(raw_x)
    num_of_win = data_length - n - 1  # 1382 windows
    batch_partition_length = num_of_win // batch_size  # 172 batches
    #print('batch_partition_length', batch_partition_length)
    data_x = []
    j = 0
    for i in range(batch_partition_length):
        windows_x = []
        k = 0
        while (k < batch_size):
            windows_x.append(raw_x[j:n + j])
            j += 1
            k += 1
        data_x.append(np.array(windows_x))  # each batch is stacked horizontally.
    data_x = np.array(data_x)
    data_x = np.reshape(data_x, (-1, n))  # 224 x 13
    #print(data_x.shape)
    return data_x, batch_partition_length
data_x,batch_partition_length = gen_batch(dem, batch_size, n_steps)
data_x = np.expand_dims(data_x,axis=-1)
tr = int(0.7*dem_len)
val = int(0.2*dem_len)
x_train, y_train = data_x[:tr,:n_steps], data_x[:tr,-1]
x_valid, y_valid = data_x[tr:tr+val,:n_steps], data_x[tr:tr+val,-1]
print('\n\n')
print('tr+val',tr+val)
print('\n\n')
x_test, y_test = data_x[tr+val:,:n_steps], data_x[tr+val:,-1]
#model
model = keras.models.Sequential([keras.layers.SimpleRNN(lay1_state_size,input_shape=[None,1]), keras.layers.Dense(dense_state_size)])
# model = keras.models.Sequential([keras.layers.SimpleRNN(lay1_state_size,return_sequences=True,input_shape=[None,1]),keras.layers.SimpleRNN(lay2_state_size),
# keras.layers.Dense(dense_state_size)])
model.compile(optimizer='Adam',loss=keras.losses.mean_absolute_error ,metrics=[tf.keras.metrics.RootMeanSquaredError()] )
model.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs,validation_data=(x_valid,y_valid))
print('\n')
print('Model Evaluation on test set:\n')
model.evaluate(x_test, y_test,batch_size=batch_size)
print('\n')
#model.summary()
y_tru = np.array([])
for step_ahead in range(horizon):
    # tru label
    y = np.append(data_x[step_ahead+1:, n_steps], np.array([[0]*(step_ahead+1)]))
    y_tru = np.append(y_tru, y)
    # prediction
    y_pred_one = model.predict(data_x[:, step_ahead:])[:, np.newaxis, :]
    data_x = np.concatenate([data_x, y_pred_one], axis=1)
y_tru = np.reshape(y_tru,(batch_partition_length*batch_size,horizon),order='F')
y_pred_horizon = data_x[:,n_steps+1:]
y_pred_horizon = np.squeeze(y_pred_horizon)
print('print(y_pred_horizon.shape)',y_pred_horizon.shape)
print(' RNN prediction on all data MSE',np.mean(keras.losses.mean_squared_error(y_tru,y_pred_horizon )) )
print(' RNN prediction on all data MAE',np.mean(keras.losses.mean_absolute_error(y_tru,y_pred_horizon )) )
print('\n')
for i in range(10):
    plt.figure(i)
    plt.plot(y_tru[i])
    plt.plot(np.squeeze(y_pred_horizon[i]))
plt.show()
The data generation code is given below
from copyreg import pickle
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import tensorflow.keras as keras
dem_len = 1240
def categorical(p):
    return (p.cumsum(-1) >= np.random.uniform(size=p.shape[:-1])[..., None]).argmax(-1)
p = np.array([0.6, 0.3, 0.1])
def dem_hr(hr, lo_veh, hi_veh, len):
    dem_hrs = np.array([])
    for i in range(10000):
        #d = np.random.randint(lo_veh, hi_veh)
        d = lo_veh + categorical(p)
        z = np.array([0]*(hr-1))
        dem_hrs = np.append(dem_hrs, d)
        dem_hrs = np.append(dem_hrs, z)
    dem_hrs = dem_hrs[:len]
    return dem_hrs
def gen_data(len):
    dzero = np.zeros(len)
    # for hr, lo_veh, hi_veh in zip([4, 8], [1, 6], [3, 9]):
    #     d = dem_hr(hr, lo_veh, hi_veh, len)
    #     dem = dem + d
    #     dem = np.array(dem, dtype=np.float32)
    d4 = dem_hr(4, 1, 3, len)
    d8 = dem_hr(8, 6, 9, len)
    dall = dzero + d8
    dsub = dall - d4
    dem = np.where(dsub >= 0, d8, d4)
    # plt.plot(dem)
    # plt.plot(d4)
    # plt.plot(d8)
    # plt.show()
    return dem
dem = gen_data(len=dem_len)
np.save('const_dem_2_freq_stoch_cat',dem)
plt.plot(dem)
plt.show()
I think increasing the number of steps may help to capture the distribution of magnitudes at different periods. Does increasing the number of layers also help to capture the magnitude distribution?

How can I use version 2.0 of TensorFlow in Python 3.7.9 and use the placeholder function?

I was following a tutorial by this guy: https://www.youtube.com/watch?v=PwAGxqrXSCs&list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v&index=47 , and now I have run into an error.
The program tells me: "AttributeError: module 'tensorflow' has no attribute 'placeholder'". I know that in version 2.0 and higher (of TensorFlow) the placeholder function was removed, however I am unable to figure out another way of making my script work as I am just starting out with deep learning! Help would be greatly appreciated!
Here is my code (I am using PyCharm as an IDE, I don't know if that's a source of the error):
import tensorflow as tf
mnist = tf.keras.datasets.mnist
#Hidden Layers
n_nodes_hl1 = 500 # Can be different from others (they do no HAVE to be the same number)
n_nodes_hl2 = 500
n_nodes_hl3 = 500
n_classes = 10
batch_size = 100
#Input Data
#Height x width
x = tf.placeholder('float',[None, 784])
y = tf.placeholder('float')
def neural_network_model(data):
    #(Input_data * weights) + biases
    #Why need biases?
    #If Input_Data = 0 and weights = (ANYTHING) and there was no bias, no neuron would ever fire!
    #Because of course (0 * (ANYTHING)) = 0
    hidden_1_layer = {"weights": tf.Variable(tf.random.normal([784, n_nodes_hl1])),
                      "biases": tf.Variable(tf.random.normal([n_nodes_hl1]))}
    hidden_2_layer = {"weights": tf.Variable(tf.random.normal([n_nodes_hl1, n_nodes_hl2])),
                      "biases": tf.Variable(tf.random.normal([n_nodes_hl2]))}
    hidden_3_layer = {"weights": tf.Variable(tf.random.normal([n_nodes_hl2, n_nodes_hl3])),
                      "biases": tf.Variable(tf.random.normal([n_nodes_hl3]))}
    output_layer = {"weights": tf.Variable(tf.random.normal([n_nodes_hl3, n_classes])),
                    "biases": tf.Variable(tf.random.normal([n_classes]))}
    # (Input_data * weights) + biases
    #This is the Sum thingy l1 (layer 1)
    l1 = tf.add(tf.matmul(data, hidden_1_layer["weights"]), hidden_1_layer["biases"])
    #Threshold function (will neuron fire?) [relu] = rectified linear
    l1 = tf.nn.relu(l1)
    #This is the Sum thingy for l2
    l2 = tf.add(tf.matmul(l1, hidden_2_layer["weights"]), hidden_2_layer["biases"])
    l2 = tf.nn.relu(l2)
    #This is the Sum thingy for l3
    l3 = tf.add(tf.matmul(l2, hidden_3_layer["weights"]), hidden_3_layer["biases"])
    l3 = tf.nn.relu(l3)
    #This is the Sum thingy
    output = tf.matmul(l3, output_layer["weights"]) + output_layer["biases"]
    return output
def train_neural_network(x):
    prediction = neural_network_model(x)
    #Calculates the difference between prediction and known label
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    #Now we know what the cost is, LET'S MINIMIZE IT!
    # learning_rate = 0.001
    optimizer = tf.train.AdamOptimizer().minimize(cost)
    #How many epochs do we want? (How often does it run) (Cycles feed forward + backprop)
    hm_epochs = 100
    with tf.compat.v1.Session() as sess:
        sess.run(tf.global_variables_initializer())
        #This trains:
        for epoch in range(hm_epochs):
            epoch_loss = 0
            for _ in range(int(mnist.train.num_examples/batch_size)):
                epoch_x, epoch_y = mnist.train.next_batch(batch_size)
                _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
                epoch_loss += c
            print("Epoch", epoch, "Completed out of", hm_epochs, "loss:", epoch_loss)
        #This checks if the machine does stuff right
        #Checks if the maximum number is identical to the machine's prediction
        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, "float"))
        print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
train_neural_network(x)
In TensorFlow 2 you can use tf.compat.v1.placeholder(), but it is not compatible with eager execution or tf.function.
If you want to explicitly set up your inputs, also see the Keras functional API on how to use tf.keras.Input to replace tf.compat.v1.placeholder.
Example:
x = tf.keras.Input(shape=(32,))
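If you do want to keep the placeholder-based code for now, a minimal sketch of the compat.v1 route mentioned above (assuming TensorFlow 2.x is installed), with eager execution disabled, would look roughly like this:
import tensorflow as tf

tf.compat.v1.disable_eager_execution()            # placeholders only work in graph mode

x = tf.compat.v1.placeholder(tf.float32, [None, 784])
y = x * 2.0
with tf.compat.v1.Session() as sess:
    print(sess.run(y, feed_dict={x: [[1.0] * 784]}))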

How to create compilable tf.keras model with multiple tensor inputs?

This is with tf 2.1.0
The following works up until you try to call a compiled model. Is there something to do to make the .compile and .fit methods work for multiple tensor inputs?
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.keras as keras
tf.keras.backend.set_floatx('float64')
m = 250 # samples
n_x = 1 # dim of x
n_tau = 11
x = (2 * np.random.rand(m, n_x).astype(np.float64) - 1) * 2
i = np.argsort(x[:, 0])
x = x[i] # to make plotting nicer
A = np.random.randn(n_x, 1)
y = x ** 2 + 0.3 * x + 0.4 * np.random.randn(m, 1).astype(np.float64)
y = y.dot(A) # y is 1d
y = y[:, :, None]
tau = np.linspace(1.0 / n_tau, 1 - 1.0 / n_tau, n_tau).astype(np.float64)
tau = tau[None, :, None]
def loss(tau_y, u):
    tau = tau_y[0]
    y = tau_y[1]
    u = y - u
    res = u ** 2 * (tau - tf.where(u <= np.float64(0.0), np.float64(1.0), np.float64(0.0)))
    return tf.reduce_sum(tf.reduce_mean(res, axis=[1, 2]), axis=0)
tf.keras.backend.set_floatx('float64')
class My(tf.keras.models.Model):
    def __init__(self):
        super().__init__()
        self._my_layer = tf.keras.layers.Dense(1, dtype=tf.float64)

    def call(self, inputs):
        tau = inputs[0]
        y = inputs[1]
        tf.print(tau.shape, y.shape)
        return self._my_layer(tau)
model = My()
u = model((tau, y)) # calling model works
l = loss((tau, y), model((tau, y))) # call loss works
opt = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=opt, loss=loss)
# this fails with the error below
model.fit((tau, y), (tau, y))
# ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), for inputs ['output_1'] but instead got the following list of 2 arrays: [array([[[0.09090909],
# [0.17272727],
# [0.25454545],
# [0.33636364],
# [0.41818182],
# [0.5 ],
# [0.58181818],
# [0.66363636],
# [0.74545455],
# ...

Can neural networks handle redundant inputs?

I have a fully connected neural network with the following number of neurons in each layer [4, 20, 20, 20, ..., 1]. I am using TensorFlow and the 4 real-valued inputs correspond to a particular point in space and time, i.e. (x, y, z, t), and the 1 real-valued output corresponds to the temperature at that point. The loss function is just the mean square error between my predicted temperature and the actual temperature at that point in (x, y, z, t). I have a set of training data points with the following structure for their inputs:
(x,y,z,t):
(0.11,0.12,1.00,0.41)
(0.34,0.43,1.00,0.92)
(0.01,0.25,1.00,0.65)
...
(0.71,0.32,1.00,0.49)
(0.31,0.22,1.00,0.01)
(0.21,0.13,1.00,0.71)
Namely, what you will notice is that the training data all have the same redundant value in z, but x, y, and t are generally not redundant. Yet what I find is my neural network cannot train on this data due to the redundancy. In particular, every time I start training the neural network, it appears to fail and the loss function becomes nan. But, if I change the structure of the neural network such that the number of neurons in each layer is [3, 20, 20, 20, ..., 1], i.e. now data points only correspond to an input of (x, y, t), everything works perfectly and training is all right. But is there any way to overcome this problem? (Note: it occurs whether any of the variables are identical, e.g. either x, y, or t could be redundant and cause this error.) I have also attempted different activation functions (e.g. ReLU) and varying the number of layers and neurons in the network, but these changes do not resolve the problem.
My question: is there any way to still train the neural network while keeping the redundant z as an input? It just so happens the particular training data set I am considering at the moment has all z redundant, but in general, I will have data coming from different z in the future. Therefore, a way to ensure the neural network can robustly handle inputs at the present moment is sought.
A minimal working example is included below. When running this example, the loss output is nan, but if you simply uncomment the alternative (commented-out) definition of x_z so that there is variation in x_z, then there is no longer any problem. But this is not a solution, since the goal is to use the original x_z with all constant values.
import numpy as np
import tensorflow as tf
end_it = 10000 #number of iterations
frac_train = 1.0 #randomly sampled fraction of data to create training set
frac_sample_train = 0.1 #randomly sampled fraction of data from training set to train in batches
layers = [4, 20, 20, 20, 20, 20, 20, 20, 20, 1]
len_data = 10000
x_x = np.array([np.linspace(0.,1.,len_data)])
x_y = np.array([np.linspace(0.,1.,len_data)])
x_z = np.array([np.ones(len_data)*1.0])
#x_z = np.array([np.linspace(0.,1.,len_data)])
x_t = np.array([np.linspace(0.,1.,len_data)])
y_true = np.array([np.linspace(-1.,1.,len_data)])
N_train = int(frac_train*len_data)
idx = np.random.choice(len_data, N_train, replace=False)
x_train = x_x.T[idx,:]
y_train = x_y.T[idx,:]
z_train = x_z.T[idx,:]
t_train = x_t.T[idx,:]
v1_train = y_true.T[idx,:]
sample_batch_size = int(frac_sample_train*N_train)
np.random.seed(1234)
tf.set_random_seed(1234)
import logging
logging.getLogger('tensorflow').setLevel(logging.ERROR)
tf.logging.set_verbosity(tf.logging.ERROR)
class NeuralNet:
    def __init__(self, x, y, z, t, v1, layers):
        X = np.concatenate([x, y, z, t], 1)
        self.lb = X.min(0)
        self.ub = X.max(0)
        self.X = X
        self.x = X[:, 0:1]
        self.y = X[:, 1:2]
        self.z = X[:, 2:3]
        self.t = X[:, 3:4]
        self.v1 = v1
        self.layers = layers
        self.weights, self.biases = self.initialize_NN(layers)
        self.sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=False,
                                                     log_device_placement=False))
        self.x_tf = tf.placeholder(tf.float32, shape=[None, self.x.shape[1]])
        self.y_tf = tf.placeholder(tf.float32, shape=[None, self.y.shape[1]])
        self.z_tf = tf.placeholder(tf.float32, shape=[None, self.z.shape[1]])
        self.t_tf = tf.placeholder(tf.float32, shape=[None, self.t.shape[1]])
        self.v1_tf = tf.placeholder(tf.float32, shape=[None, self.v1.shape[1]])
        self.v1_pred = self.net(self.x_tf, self.y_tf, self.z_tf, self.t_tf)
        self.loss = tf.reduce_mean(tf.square(self.v1_tf - self.v1_pred))
        self.optimizer = tf.contrib.opt.ScipyOptimizerInterface(self.loss,
                                                                method='L-BFGS-B',
                                                                options={'maxiter': 50,
                                                                         'maxfun': 50000,
                                                                         'maxcor': 50,
                                                                         'maxls': 50,
                                                                         'ftol': 1.0 * np.finfo(float).eps})
        init = tf.global_variables_initializer()
        self.sess.run(init)

    def initialize_NN(self, layers):
        weights = []
        biases = []
        num_layers = len(layers)
        for l in range(0, num_layers-1):
            W = self.xavier_init(size=[layers[l], layers[l+1]])
            b = tf.Variable(tf.zeros([1, layers[l+1]], dtype=tf.float32), dtype=tf.float32)
            weights.append(W)
            biases.append(b)
        return weights, biases

    def xavier_init(self, size):
        in_dim = size[0]
        out_dim = size[1]
        xavier_stddev = np.sqrt(2/(in_dim + out_dim))
        return tf.Variable(tf.truncated_normal([in_dim, out_dim], stddev=xavier_stddev), dtype=tf.float32)

    def neural_net(self, X, weights, biases):
        num_layers = len(weights) + 1
        H = 2.0*(X - self.lb)/(self.ub - self.lb) - 1.0
        for l in range(0, num_layers-2):
            W = weights[l]
            b = biases[l]
            H = tf.tanh(tf.add(tf.matmul(H, W), b))
        W = weights[-1]
        b = biases[-1]
        Y = tf.add(tf.matmul(H, W), b)
        return Y

    def net(self, x, y, z, t):
        v1_out = self.neural_net(tf.concat([x, y, z, t], 1), self.weights, self.biases)
        v1 = v1_out[:, 0:1]
        return v1

    def callback(self, loss):
        global Nfeval
        print(str(Nfeval) + ' - Loss in loop: %.3e' % (loss))
        Nfeval += 1

    def fetch_minibatch(self, x_in, y_in, z_in, t_in, den_in, N_train_sample):
        idx_batch = np.random.choice(len(x_in), N_train_sample, replace=False)
        x_batch = x_in[idx_batch, :]
        y_batch = y_in[idx_batch, :]
        z_batch = z_in[idx_batch, :]
        t_batch = t_in[idx_batch, :]
        v1_batch = den_in[idx_batch, :]
        return x_batch, y_batch, z_batch, t_batch, v1_batch

    def train(self, end_it):
        it = 0
        while it < end_it:
            x_res_batch, y_res_batch, z_res_batch, t_res_batch, v1_res_batch = self.fetch_minibatch(self.x, self.y, self.z, self.t, self.v1, sample_batch_size)  # Fetch residual mini-batch
            tf_dict = {self.x_tf: x_res_batch, self.y_tf: y_res_batch, self.z_tf: z_res_batch, self.t_tf: t_res_batch,
                       self.v1_tf: v1_res_batch}
            self.optimizer.minimize(self.sess,
                                    feed_dict=tf_dict,
                                    fetches=[self.loss],
                                    loss_callback=self.callback)

    def predict(self, x_star, y_star, z_star, t_star):
        tf_dict = {self.x_tf: x_star, self.y_tf: y_star, self.z_tf: z_star, self.t_tf: t_star}
        v1_star = self.sess.run(self.v1_pred, tf_dict)
        return v1_star
model = NeuralNet(x_train, y_train, z_train, t_train, v1_train, layers)
Nfeval = 1
model.train(end_it)
I think your problem is in this line:
H = 2.0*(X - self.lb)/(self.ub - self.lb) - 1.0
In the third column of X, corresponding to the z variable, both self.lb and self.ub have the same value, equal to the constant in the example (in this case 1), so it is actually computing:
2.0*(1.0 - 1.0)/(1.0 - 1.0) - 1.0 = 2.0*0.0/0.0 - 1.0
Which is nan. You can work around the issue in a few different ways; a simple option is:
# Avoids dividing by zero
X_d = tf.math.maximum(self.ub - self.lb, 1e-6)
H = 2.0*(X - self.lb)/X_d - 1.0
This is an interesting situation. A quick check with an online regression tool shows that even simple regression is unable to fit the data points when one of the inputs is constant throughout the dataset. Looking at the algebraic solution of a two-variable linear regression problem, the solution involves division by the standard deviation, which, being zero for a constant column, is a problem.
As far as solving through backprop is concerned (as is the case in your neural network), I strongly suspect that the derivative of the loss with respect to the input (these expressions) is the culprit, and that the algorithm is not able to update the weights W using W := W - α.dZ, so they end up remaining constant.
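To make the zero-variance point concrete, here is a tiny illustration (mine, not the answer's) of why a constant feature breaks any expression that divides by its spread or standard deviation:
import numpy as np

z = np.ones(100)                     # the constant z column from the question
print(z.max() - z.min())             # 0.0, so (ub - lb) is zero for this column
print((z - z.mean()) / z.std())      # 0/0 -> nan (with a runtime warning)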

Can I use TensorBoard also with jupyter notebooks

I am experimenting with (learning) TensorBoard and using the following code, which I got from the internet (a simple regression example):
import tensorflow as tf
import numpy as np
#sess = tf.InteractiveSession() #define a session
# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype("float32")
y_data = x_data * 0.1 + 0.3
# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but Tensorflow will
# figure that out for us.)
with tf.name_scope("calculatematmul") as scope:
    W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
    b = tf.Variable(tf.zeros([1]))
    y = W * x_data + b
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
# Before starting, initialize the variables. We will 'run' this first.
init = tf.initialize_all_variables()
# Launch the graph.
sess = tf.Session()
sess.run(init)
#### ----> ADD THIS LINE <---- ####
writer = tf.train.SummaryWriter('mnist_logs', sess.graph_def)
# Fit the line.
for step in xrange(201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))
The code runs fine when I create a python file and run the file with
python test.py
It also runs fine in the Jupyter notebook.
However, while TensorBoard gets its information from running the python file (that is to say, it creates the xyz....home file), the interactive version does not create any info usable for TensorBoard.
Can somebody explain to me why, please?
Thanks
Peter
Be sure that you use the full path when starting tensorboard.
tensorboard --logdir='./somedirectory/mnist_logs'