I have 2 batches of sequences of 13x13 grids of 5 embedding vectors of 100 numbers each. I want some embedding vectors to be very close to each other and others very far apart under some norm. How can I compute the L2 norm (or another norm) over all possible combinations of 2 embedding vectors from each batch? In the code below I tried to implement a cosine norm, but the loss becomes inf after some time. The 'tr' variable marks the embedding vectors that should be close to each other.
import numpy as np
import tensorflow as tf
tf.reset_default_graph()
if True:
    a = tf.placeholder(tf.float32,[2,2,13,13,5,100])
    b = tf.placeholder(tf.float32,[2,2,13,13,5,1])
    a11 = tf.reshape(a, [2, -1, 100])
    a1 = tf.layers.Dense(100)(a11)
def triple_loss(pr, tr, batch_size=2, alpha=0.1, cos_norm=False, number_norm=False):
    '''face2vec loss
    pr: [b,h,w,boxes,embed]
    tr: [b,h,w,boxes,1]
    returns: []'''
    b,l,h,w,bo,_ = tr.get_shape().as_list()
    em = pr.get_shape().as_list()[-1]
    tr_r = tf.reshape(tr,[-1, l*h*w*bo, 1])
    tr_tiled = tf.tile(tr_r,[1, 1, l*h*w*bo])
    pr_reshaped = tf.reshape(pr,[-1, l*h*w*bo, em])
    embed_prod = tf.matmul(pr_reshaped, pr_reshaped, transpose_b=True)
    if cos_norm:
        tr_norm = tf.reduce_sum(tf.sqrt(tr_tiled*tr_tiled),-1)
        tr_norm_tiled = tf.tile(tf.reshape(tr_norm,[-1, l*h*w*bo, 1]),[1, 1, l*h*w*bo])
        scale = tf.matmul(tr_norm_tiled, tr_norm_tiled, transpose_b=True)
        embed_prod = embed_prod/(scale+0.000001)
    if number_norm:
        return tf.reduce_mean(tf.reduce_mean(embed_prod*tr_tiled,[-1,-2]) /tf.reduce_sum(tr_r,[-1,-2])\
            - tf.reduce_mean(embed_prod*(1.0 - tr_tiled),[-1,-2])/tf.reduce_sum((1.0 - tr_r),[-1,-2]))\
            +alpha
    loss = tf.reduce_mean( tf.reduce_mean(embed_prod*tr_tiled,[-1,-2]) \
        - tf.reduce_mean(embed_prod*(1.0 - tr_tiled),[-1,-2]))+alpha
    return loss
loss = triple_loss(a1, b, cos_norm=True)
optimizer = tf.train.GradientDescentOptimizer(1e-3)
train_op = optimizer.minimize(loss)
sess= tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
aa = np.zeros((2,2,13,13,5,100))
aa[...,:2,:2]=1
aa[...,0,1:3,1:2]=1
bb = np.zeros((2,2,13,13,5,1))
bb[:,:,:3,:3,:2,0] = 1.0
for i in range(10):
    l = sess.run([loss,train_op],{a:aa, b:bb})
    print(l[-2])
The output is:
-320486.0
-2.02932e+12
-1.27284e+19
-8.06542e+25
-inf
nan
nan
With just the plain product between embedding vectors, the network converges!
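To compute (squared) L2 distances for each pair of vectors, stack the vectors as rows of a matrix X and multiply it by its transpose. The product X X^T holds all pairwise dot products, and the squared distances follow from ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i.x_j, where the squared norms are the diagonal of that product.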
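As a minimal NumPy sketch of the matrix trick (the function name pairwise_sq_dists and the toy data are made up for illustration, they are not from the question): the product of the matrix with its transpose gives the dot products, and the squared distances are assembled from them.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared L2 distance between every pair of rows of X (shape [n, d])."""
    gram = X @ X.T                      # [n, n] dot products x_i . x_j
    sq_norms = np.diag(gram)            # [n] squared norms ||x_i||^2
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    return np.maximum(d2, 0.0)          # clip tiny negatives caused by round-off

# Three toy 2-d "embeddings" (illustrative values only).
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = pairwise_sq_dists(X)
print(D)
```

The same expression should carry over to batched tensors by replacing the matrix product with tf.matmul(x, x, transpose_b=True), as the question already does for embed_prod.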
The loss you compute can be negative. The optimizer is achieving its goal of minimizing the loss very well - it reaches negative infinity. You need to make sure your loss is bounded from below.
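One common way to get a loss that is bounded from below is to use distances together with a margin, so that pushing vectors apart stops paying off once they are far enough. The sketch below is a contrastive-style loss swapped in purely as an illustration; it is not the loss from the question, and the names contrastive_loss, same and margin are assumptions.

```python
import numpy as np

def contrastive_loss(d2, same, margin=1.0):
    """d2: [n, n] squared distances; same: [n, n], 1.0 where a pair should be close."""
    d = np.sqrt(d2 + 1e-12)                                   # eps keeps sqrt well-behaved at 0
    pull = same * d2                                          # close pairs: penalise any distance
    push = (1.0 - same) * np.maximum(margin - d, 0.0) ** 2    # far pairs: penalise only inside the margin
    return float(np.mean(pull + push))                        # every term is >= 0

# Two vectors that should be far apart but are only 0.5 away from each other.
d2 = np.array([[0.0, 0.25], [0.25, 0.0]])
same = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = contrastive_loss(d2, same, margin=1.0)

# Once the pair is further apart than the margin, the loss stops decreasing at 0.
loss_far = contrastive_loss(np.array([[0.0, 100.0], [100.0, 0.0]]), same, margin=1.0)
```

Because both terms are non-negative, the optimizer cannot drive this quantity towards negative infinity the way it does with an unbounded difference of products.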
I am new to deep learning with PyTorch. I am more experienced with TensorFlow, so I am not new to deep learning itself.
Currently, I am working on a simple ANN classification problem. There are only 2 classes, so quite naturally I am using a Softmax BCELoss combination.
The dataset is like this:
shape of X_train (891, 7)
Shape of Y_train (891,)
Shape of x_test (418, 7)
I transformed the X_train and others to torch tensors as train_data and so on. The next step is:
train_ds = TensorDataset(train_data, train_label)
# Define data loader
batch_size = 32
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
I made the model class like:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(7, 32)
        self.bc1 = nn.BatchNorm1d(32)
        self.fc2 = nn.Linear(32, 64)
        self.bc2 = nn.BatchNorm1d(64)
        self.fc3 = nn.Linear(64, 128)
        self.bc3 = nn.BatchNorm1d(128)
        self.fc4 = nn.Linear(128, 32)
        self.bc4 = nn.BatchNorm1d(32)
        self.fc5 = nn.Linear(32, 10)
        self.bc5 = nn.BatchNorm1d(10)
        self.fc6 = nn.Linear(10, 1)
        self.bc6 = nn.BatchNorm1d(1)
        self.drop = nn.Dropout2d(p=0.5)
    def forward(self, x):
        torch.nn.init.xavier_uniform(self.fc1.weight)
        x = self.fc1(x)
        x = self.bc1(x)
        x = F.relu(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.bc2(x)
        x = F.relu(x)
        #x = self.drop(x)
        x = self.fc3(x)
        x = self.bc3(x)
        x = F.relu(x)
        x = self.drop(x)
        x = self.fc4(x)
        x = self.bc4(x)
        x = F.relu(x)
        #x = self.drop(x)
        x = self.fc5(x)
        x = self.bc5(x)
        x = F.relu(x)
        x = self.drop(x)
        x = self.fc6(x)
        x = self.bc6(x)
        x = torch.sigmoid(x)
        return x
model = Net()
The loss function and the optimizer are defined:
loss = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.00001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
At last, the task is to run the forward in epochs:
num_epochs = 1000
# Repeat for given number of epochs
for epoch in range(num_epochs):
# Train with batches of data
for xb,yb in train_dl:
pred = model(xb)
yb = torch.unsqueeze(yb, 1)
#print(pred, yb)
print('grad', model.fc1.weight.grad)
l = loss(pred, yb)
#print('loss',l)
# 3. Compute gradients
l.backward()
# 4. Update parameters using gradients
optimizer.step()
# 5. Reset the gradients to zero
optimizer.zero_grad()
# Print the progress
if (epoch+1) % 10 == 0:
print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, l.item()))
I can see in the output that after each iteration over all the batches the printed weight gradients are non-zero; after this, zero_grad is applied.
However, the model is pretty bad. I get an F1 score of around 50% only! And the model is bad when I call it to predict the train_dl itself!!!
I am wondering what the reason is. The grad of weights not zero but not updating properly? The optimizer not optimizing the weights? Or what else?
Can someone please have a look?
I already tried different loss functions and optimizers. I tried with smaller datasets, bigger batches, different hyperparameters.
Thanks! :)
First of all, you don't use softmax activation for BCE loss unless you have 2 output nodes, which is not the case here. In PyTorch, BCELoss doesn't apply any activation function before calculating the loss, unlike CrossEntropyLoss (CCE), which has a built-in softmax. So, if you want to use BCE, you have to apply a sigmoid (or any function f: R -> [0, 1]) at the output layer.
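To make the sigmoid-plus-BCE pairing concrete, here is a small NumPy sketch (not PyTorch code; the helper names sigmoid, bce and bce_with_logits are illustrative). It computes the binary cross-entropy on sigmoid outputs and, for comparison, the same loss directly from the raw scores in a numerically stable form, which is the idea behind PyTorch's nn.BCEWithLogitsLoss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(prob, target, eps=1e-12):
    """Binary cross-entropy on probabilities in [0, 1] (the kind of input BCELoss expects)."""
    prob = np.clip(prob, eps, 1.0 - eps)          # avoid log(0)
    return float(np.mean(-(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))))

def bce_with_logits(logit, target):
    """The same loss computed from raw scores: max(x, 0) - x*t + log(1 + exp(-|x|))."""
    return float(np.mean(np.maximum(logit, 0.0) - logit * target + np.log1p(np.exp(-np.abs(logit)))))

logits = np.array([2.0, -1.0, 0.5])      # raw outputs of a single output node
targets = np.array([1.0, 0.0, 1.0])      # binary labels as floats
loss_a = bce(sigmoid(logits), targets)
loss_b = bce_with_logits(logits, targets)
```

Both routes give the same value here; the second one avoids taking the log of a probability that has already saturated to 0 or 1.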
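Moreover, you should ideally call optimizer.zero_grad() for each batch if you want to do SGD (which is the default). PyTorch accumulates gradients across backward() calls, so if you don't reset them, each step is driven by the sum of the gradients of all batches seen since the last reset rather than by the current batch alone, which is quite slow and gets stuck in local minima easily.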
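The effect of skipping the reset can be seen in a toy sketch. This is plain Python, not PyTorch: the variable grad_buffer only mimics how backward() adds to a parameter's .grad, and the function f(w) = w**2 and the learning rate are arbitrary choices for illustration.

```python
def run(zero_each_step):
    """Minimise f(w) = w**2 with plain gradient descent, mimicking a .grad buffer."""
    w, grad_buffer, lr = 1.0, 0.0, 0.1
    history = []
    for _ in range(5):
        grad_buffer += 2.0 * w        # backward(): adds the new gradient to the buffer
        w -= lr * grad_buffer         # step(): uses whatever is in the buffer
        if zero_each_step:
            grad_buffer = 0.0         # zero_grad(): clear before the next batch
        history.append(w)
    return history

with_reset = run(zero_each_step=True)
without_reset = run(zero_each_step=False)
print(with_reset)
print(without_reset)
```

With the reset, w shrinks steadily towards the minimum at 0. Without it, stale gradients keep being re-applied and w overshoots past 0.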
I have a fully connected neural network with the following number of neurons in each layer [4, 20, 20, 20, ..., 1]. I am using TensorFlow and the 4 real-valued inputs correspond to a particular point in space and time, i.e. (x, y, z, t), and the 1 real-valued output corresponds to the temperature at that point. The loss function is just the mean square error between my predicted temperature and the actual temperature at that point in (x, y, z, t). I have a set of training data points with the following structure for their inputs:
(x,y,z,t):
(0.11,0.12,1.00,0.41)
(0.34,0.43,1.00,0.92)
(0.01,0.25,1.00,0.65)
...
(0.71,0.32,1.00,0.49)
(0.31,0.22,1.00,0.01)
(0.21,0.13,1.00,0.71)
Namely, what you will notice is that the training data all have the same redundant value in z, but x, y, and t are generally not redundant. Yet what I find is my neural network cannot train on this data due to the redundancy. In particular, every time I start training the neural network, it appears to fail and the loss function becomes nan. But, if I change the structure of the neural network such that the number of neurons in each layer is [3, 20, 20, 20, ..., 1], i.e. now data points only correspond to an input of (x, y, t), everything works perfectly and training is all right. But is there any way to overcome this problem? (Note: it occurs whenever any one of the variables is constant, e.g. either x, y, or t could be redundant and cause this error.) I have also attempted different activation functions (e.g. ReLU) and varying the number of layers and neurons in the network, but these changes do not resolve the problem.
My question: is there any way to still train the neural network while keeping the redundant z as an input? It just so happens the particular training data set I am considering at the moment has all z redundant, but in general, I will have data coming from different z in the future. Therefore, a way to ensure the neural network can robustly handle inputs at the present moment is sought.
A minimal working example is included below. When running this example, the loss output is nan, but if you simply uncomment the second x_z definition (the commented-out np.linspace line) so that there is variation in x_z, then there is no longer any problem. But this is not a solution, since the goal is to use the original x_z with all constant values.
import numpy as np
import tensorflow as tf
end_it = 10000 #number of iterations
frac_train = 1.0 #randomly sampled fraction of data to create training set
frac_sample_train = 0.1 #randomly sampled fraction of data from training set to train in batches
layers = [4, 20, 20, 20, 20, 20, 20, 20, 20, 1]
len_data = 10000
x_x = np.array([np.linspace(0.,1.,len_data)])
x_y = np.array([np.linspace(0.,1.,len_data)])
x_z = np.array([np.ones(len_data)*1.0])
#x_z = np.array([np.linspace(0.,1.,len_data)])
x_t = np.array([np.linspace(0.,1.,len_data)])
y_true = np.array([np.linspace(-1.,1.,len_data)])
N_train = int(frac_train*len_data)
idx = np.random.choice(len_data, N_train, replace=False)
x_train = x_x.T[idx,:]
y_train = x_y.T[idx,:]
z_train = x_z.T[idx,:]
t_train = x_t.T[idx,:]
v1_train = y_true.T[idx,:]
sample_batch_size = int(frac_sample_train*N_train)
np.random.seed(1234)
tf.set_random_seed(1234)
import logging
logging.getLogger('tensorflow').setLevel(logging.ERROR)
tf.logging.set_verbosity(tf.logging.ERROR)
class NeuralNet:
    def __init__(self, x, y, z, t, v1, layers):
        X = np.concatenate([x, y, z, t], 1)
        self.lb = X.min(0)
        self.ub = X.max(0)
        self.X = X
        self.x = X[:,0:1]
        self.y = X[:,1:2]
        self.z = X[:,2:3]
        self.t = X[:,3:4]
        self.v1 = v1
        self.layers = layers
        self.weights, self.biases = self.initialize_NN(layers)
        self.sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=False,
            log_device_placement=False))
        self.x_tf = tf.placeholder(tf.float32, shape=[None, self.x.shape[1]])
        self.y_tf = tf.placeholder(tf.float32, shape=[None, self.y.shape[1]])
        self.z_tf = tf.placeholder(tf.float32, shape=[None, self.z.shape[1]])
        self.t_tf = tf.placeholder(tf.float32, shape=[None, self.t.shape[1]])
        self.v1_tf = tf.placeholder(tf.float32, shape=[None, self.v1.shape[1]])
        self.v1_pred = self.net(self.x_tf, self.y_tf, self.z_tf, self.t_tf)
        self.loss = tf.reduce_mean(tf.square(self.v1_tf - self.v1_pred))
        self.optimizer = tf.contrib.opt.ScipyOptimizerInterface(self.loss,
            method = 'L-BFGS-B',
            options = {'maxiter': 50,
                'maxfun': 50000,
                'maxcor': 50,
                'maxls': 50,
                'ftol' : 1.0 * np.finfo(float).eps})
        init = tf.global_variables_initializer()
        self.sess.run(init)
    def initialize_NN(self, layers):
        weights = []
        biases = []
        num_layers = len(layers)
        for l in range(0,num_layers-1):
            W = self.xavier_init(size=[layers[l], layers[l+1]])
            b = tf.Variable(tf.zeros([1,layers[l+1]], dtype=tf.float32), dtype=tf.float32)
            weights.append(W)
            biases.append(b)
        return weights, biases
    def xavier_init(self, size):
        in_dim = size[0]
        out_dim = size[1]
        xavier_stddev = np.sqrt(2/(in_dim + out_dim))
        return tf.Variable(tf.truncated_normal([in_dim, out_dim], stddev=xavier_stddev), dtype=tf.float32)
    def neural_net(self, X, weights, biases):
        num_layers = len(weights) + 1
        H = 2.0*(X - self.lb)/(self.ub - self.lb) - 1.0
        for l in range(0,num_layers-2):
            W = weights[l]
            b = biases[l]
            H = tf.tanh(tf.add(tf.matmul(H, W), b))
        W = weights[-1]
        b = biases[-1]
        Y = tf.add(tf.matmul(H, W), b)
        return Y
    def net(self, x, y, z, t):
        v1_out = self.neural_net(tf.concat([x,y,z,t], 1), self.weights, self.biases)
        v1 = v1_out[:,0:1]
        return v1
    def callback(self, loss):
        global Nfeval
        print(str(Nfeval)+' - Loss in loop: %.3e' % (loss))
        Nfeval += 1
    def fetch_minibatch(self, x_in, y_in, z_in, t_in, den_in, N_train_sample):
        idx_batch = np.random.choice(len(x_in), N_train_sample, replace=False)
        x_batch = x_in[idx_batch,:]
        y_batch = y_in[idx_batch,:]
        z_batch = z_in[idx_batch,:]
        t_batch = t_in[idx_batch,:]
        v1_batch = den_in[idx_batch,:]
        return x_batch, y_batch, z_batch, t_batch, v1_batch
    def train(self, end_it):
        it = 0
        while it < end_it:
            x_res_batch, y_res_batch, z_res_batch, t_res_batch, v1_res_batch = self.fetch_minibatch(self.x, self.y, self.z, self.t, self.v1, sample_batch_size) # Fetch residual mini-batch
            tf_dict = {self.x_tf: x_res_batch, self.y_tf: y_res_batch, self.z_tf: z_res_batch, self.t_tf: t_res_batch,
                self.v1_tf: v1_res_batch}
            self.optimizer.minimize(self.sess,
                feed_dict = tf_dict,
                fetches = [self.loss],
                loss_callback = self.callback)
            it += 1 # advance the counter so the loop stops after end_it rounds
    def predict(self, x_star, y_star, z_star, t_star):
        tf_dict = {self.x_tf: x_star, self.y_tf: y_star, self.z_tf: z_star, self.t_tf: t_star}
        v1_star = self.sess.run(self.v1_pred, tf_dict)
        return v1_star
model = NeuralNet(x_train, y_train, z_train, t_train, v1_train, layers)
Nfeval = 1
model.train(end_it)
I think your problem is in this line:
H = 2.0*(X - self.lb)/(self.ub - self.lb) - 1.0
In the third column of X, corresponding to the z variable, both self.lb and self.ub have the same value, equal to the constant in the example (1 in this case), so it is actually computing:
2.0*(1.0 - 1.0)/(1.0 - 1.0) - 1.0 = 2.0*0.0/0.0 - 1.0
which is nan. You can work around the issue in a few different ways; a simple option is:
# Avoids dividing by zero
X_d = tf.math.maximum(self.ub - self.lb, 1e-6)
H = 2.0*(X - self.lb)/X_d - 1.0
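The same guard can be tried outside the graph. The following is a NumPy sketch under the assumption that the bounds are plain arrays, as self.lb and self.ub are in the question; the function name scale_to_unit_range and the eps value are illustrative.

```python
import numpy as np

def scale_to_unit_range(X, eps=1e-6):
    """Map each column of X to [-1, 1]; constant columns map to -1 instead of nan."""
    lb = X.min(axis=0)
    ub = X.max(axis=0)
    span = np.maximum(ub - lb, eps)     # avoids 0/0 when ub == lb
    return 2.0 * (X - lb) / span - 1.0

# First column varies, second column is constant (like z in the question).
X = np.array([[0.11, 1.0], [0.34, 1.0], [0.71, 1.0]])
H = scale_to_unit_range(X)
print(H)
```

With the guard in place the constant column becomes a constant -1 input, which the first layer's bias can absorb, so it no longer poisons the loss with nan.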
This is an interesting situation. A quick check with an online regression tool shows that even simple regression is unable to fit the data points when one of the inputs is constant throughout the dataset. The algebraic solution of a two-variable linear regression problem involves division by the standard deviation, which is zero for a constant input, and that is a problem.
As far as solving through backprop is concerned (as is the case in your neural network), I strongly suspect that the derivative of the loss with respect to the input (these expressions) is the culprit, and that the algorithm is not able to update the weights W using W := W - α.dZ, so they end up remaining constant.
For the same input and labels:
the output of torch.nn.CTCLoss is 5.74,
the output of tf.nn.ctc_loss is 129.69,
but math.log of the tf.nn.ctc_loss output is 4.86.
So what is the difference between torch.nn.CTCLoss and tf.nn.ctc_loss?
tf: 1.13.1
pytorch: 1.1.0
I have tried the following:
1- log_softmax the input, then send it to torch.nn.CTCLoss,
2- tf.nn.log_softmax the input, then send it to tf.nn.ctc_loss,
3- send the input directly to tf.nn.ctc_loss,
4- send the input directly to tf.nn.ctc_loss, then take math.log of its output.
In cases 2, 3, and 4, the result differs from that of torch.nn.CTCLoss.
from torch import nn
import torch
import tensorflow as tf
import math
time_step = 50 # Input sequence length
vocab_size = 20 # Number of classes
batch_size = 16 # Batch size
target_sequence_length = 30 # Target sequence length
def dense_to_sparse(dense_tensor, sequence_length):
    indices = tf.where(tf.sequence_mask(sequence_length))
    values = tf.gather_nd(dense_tensor, indices)
    shape = tf.shape(dense_tensor, out_type=tf.int64)
    return tf.SparseTensor(indices, values, shape)
def compute_loss(x, y, x_len):
    ctclosses = tf.nn.ctc_loss(
        y,
        tf.cast(x, dtype=tf.float32),
        x_len,
        preprocess_collapse_repeated=False,
        ctc_merge_repeated=False,
        ignore_longer_outputs_than_inputs=False
    )
    ctclosses = tf.reduce_mean(ctclosses)
    with tf.Session() as sess:
        ctclosses = sess.run(ctclosses)
    print(f"tf ctc loss: {ctclosses}")
    print(f"tf log(ctc loss): {math.log(ctclosses)}")
minimum_target_length = 10
ctc_loss = nn.CTCLoss(blank=vocab_size - 1)
x = torch.randn(time_step, batch_size, vocab_size) # [size] = T,N,C
y = torch.randint(0, vocab_size - 2, (batch_size, target_sequence_length), dtype=torch.long) # low, high, [size]
x_lengths = torch.full((batch_size,), time_step, dtype=torch.long) # Length of inputs
y_lengths = torch.randint(minimum_target_length, target_sequence_length, (batch_size,),
dtype=torch.long) # Length of targets can be variable (even if target sequences are constant length)
loss = ctc_loss(x.log_softmax(2).detach(), y, x_lengths, y_lengths)
print(f"torch ctc loss: {loss}")
x = x.numpy()
y = y.numpy()
x_lengths = x_lengths.numpy()
y_lengths = y_lengths.numpy()
x = tf.cast(x, dtype=tf.float32)
y = tf.cast(dense_to_sparse(y, y_lengths), dtype=tf.int32)
compute_loss(x, y, x_lengths)
I expect the output of tf.nn.ctc_loss to be the same as the output of torch.nn.CTCLoss, but they differ. How can I make them the same?
The automatic mean reduction of the CTCLoss of pytorch is not the same as computing all the individual losses, and then doing the mean (as you are doing in the Tensorflow implementation). Indeed from the doc of CTCLoss (pytorch):
``'mean'``: the output losses will be divided by the target lengths and
then the mean over the batch is taken.
To obtain the same value:
1- Change the reduction method to sum:
ctc_loss = nn.CTCLoss(reduction='sum')
2- Divide the loss computed by the batch_size:
loss = ctc_loss(x.log_softmax(2).detach(), y, x_lengths, y_lengths)
loss = (loss.item())/batch_size
3- Change the parameter ctc_merge_repeated of Tensorflow to True (I am assuming it is the case in the pytorch CTC as well)
ctclosses = tf.nn.ctc_loss(
    y,
    tf.cast(x, dtype=tf.float32),
    x_len,
    preprocess_collapse_repeated=False,
    ctc_merge_repeated=True,
    ignore_longer_outputs_than_inputs=False
)
You will now get very close results between the PyTorch loss and the TensorFlow loss (without taking the log of the value). The small remaining difference probably comes from slight differences between the implementations.
In my last three runs, I got the following values:
pytorch loss : 113.33 vs tf loss = 113.52
pytorch loss : 116.30 vs tf loss = 115.57
pytorch loss : 115.67 vs tf loss = 114.54
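The gap between the two reductions is easy to see with plain arithmetic. The per-sequence losses and target lengths below are made-up numbers chosen for illustration, not values from either library.

```python
# Hypothetical per-sequence CTC losses and target lengths for a batch of 4.
losses = [120.0, 90.0, 150.0, 100.0]
target_lengths = [24, 15, 30, 20]
batch_size = len(losses)

# reduction='mean' as quoted from the PyTorch docs: divide each loss by its
# target length first, then average over the batch.
torch_style_mean = sum(loss_i / n_i for loss_i, n_i in zip(losses, target_lengths)) / batch_size

# tf.reduce_mean over the per-sequence losses, which is the same as
# reduction='sum' followed by a division by batch_size.
plain_batch_mean = sum(losses) / batch_size

print(torch_style_mean, plain_batch_mean)
```

With target lengths between 10 and 30, as in the question, the length-normalised mean ends up smaller by roughly that factor, which matches the order of magnitude of the 5.74 versus 129.69 gap reported there.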
The following code has the irritating trait of making every row of "out" the same. I am trying to classify k time series in Xtrain as [1,0,0,0], [0,1,0,0], [0,0,1,0], or [0,0,0,1], according to the way they were generated (by one of four random algorithms). Anyone know why? Thanks!
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import copy
n = 100
m = 10
k = 1000
hidden_layers = 50
learning_rate = .01
training_epochs = 10000
Xtrain = []
Xtest = []
Ytrain = []
Ytest = []
# ... fill variables with data ..
x = tf.placeholder(tf.float64,shape = (k,1,n,1))
y = tf.placeholder(tf.float64,shape = (k,1,4))
conv1_weights = 0.1*tf.Variable(tf.truncated_normal([1,m,1,hidden_layers],dtype = tf.float64))
conv1_biases = tf.Variable(tf.zeros([hidden_layers],tf.float64))
conv = tf.nn.conv2d(x,conv1_weights,strides = [1,1,1,1],padding = 'VALID')
sigmoid1 = tf.nn.sigmoid(conv + conv1_biases)
s = sigmoid1.get_shape()
sigmoid1_reshape = tf.reshape(sigmoid1,(s[0],s[1]*s[2]*s[3]))
sigmoid2 = tf.nn.sigmoid(tf.layers.dense(sigmoid1_reshape,hidden_layers))
sigmoid3 = tf.nn.sigmoid(tf.layers.dense(sigmoid2,4))
penalty = tf.reduce_sum((sigmoid3 - y)**2)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(penalty)
model = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(model)
    for i in range(0,training_epochs):
        sess.run(train_op,{x: Xtrain,y: Ytrain})
    out = sigmoid3.eval(feed_dict = {x: Xtest})
Likely because your loss function is mean squared error. If you're doing classification, you should be using cross-entropy loss.
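For reference, this is roughly what a softmax cross-entropy computes, written as a NumPy sketch (the function name softmax_cross_entropy and the toy inputs are illustrative). It takes raw scores, so the final sigmoid would be dropped; in TF 1.x the built-in counterpart is tf.nn.softmax_cross_entropy_with_logits_v2.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot):
    """Mean cross-entropy between softmax(logits) and one-hot labels, both [batch, classes]."""
    shifted = logits - logits.max(axis=1, keepdims=True)            # stabilise exp()
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(one_hot * log_probs).sum(axis=1).mean())

# A network that outputs the same score for all four classes is maximally unsure.
logits = np.zeros((2, 4))
labels = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
loss = softmax_cross_entropy(logits, labels)

# Confident, correct scores drive the loss towards 0.
good_logits = np.array([[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 10.0, 0.0]])
good_loss = softmax_cross_entropy(good_logits, labels)
```

For uniform scores the loss equals log(4), the entropy of a uniform guess over four classes.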
Your loss is penalty = tf.reduce_sum((sigmoid3 - y)**2), that is, the elementwise squared difference between a batch of predictions and a batch of labels, summed over all elements.
Your network output (sigmoid3) is a tensor with shape [?, 4] and y (I guess) is a tensor with shape [?, 4] too.
The squared difference has thus shape [?, 4].
This means that the tf.reduce_sum is computing in order:
The sum over the second dimension of the squared difference, producing a tensor with shape [?]
The sum over the first dimension (the batch size, here indicated with ?) producing a scalar value (shape ()) that's your loss value.
Probably you don't want this behavior (the sum over the batch dimension) and you're looking for the mean squared error over the batch:
penalty = tf.reduce_mean(tf.squared_difference(sigmoid3, y))
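A small NumPy sketch of the two reductions follows, with made-up predictions. It also checks the shape of the difference before reducing, which may matter here: the question declares y with shape (k,1,4) while the network output has shape (k,4), and under standard broadcasting rules those two shapes combine to (k,k,4). Whether that happens in the original run depends on the data actually fed, so treat it as something to verify rather than a diagnosis.

```python
import numpy as np

k = 3
pred = np.full((k, 4), 0.25)                        # stands in for sigmoid3, shape [k, 4]
y_flat = np.tile([1.0, 0.0, 0.0, 0.0], (k, 1))      # labels with shape [k, 4]
y_extra = y_flat[:, None, :]                        # labels with shape [k, 1, 4], as in the placeholder

diff_ok = (pred - y_flat) ** 2                      # [k, 4]: one row per example
diff_bcast = (pred - y_extra) ** 2                  # [k, k, 4]: every prediction against every label

sum_loss = diff_ok.sum()                            # grows with the batch size
mean_loss = diff_ok.mean()                          # independent of the batch size

print(diff_ok.shape, diff_bcast.shape, sum_loss, mean_loss)
```

If the difference does come out as [k, k, 4], each prediction is being pulled towards all labels at once, and reshaping the labels to [k, 4] before the subtraction would remove that.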
I am new to RNN prediction with TensorFlow.
I am trying to use an RNN with BasicLSTMCell to predict the next value of a sequence, such as
1,2,3,4,5 ->6
3,4,5,6,7 ->8
35,36,37,38,39 ->40
My code reports no errors, but the outputs for every batch seem to be the same, and the cost does not seem to decrease during training.
When I divide all the training data by 100
0.01,0.02,0.03,0.04,0.05 ->0.06
0.03,0.04,0.05,0.06,0.07 ->0.08
0.35,0.36,0.37,0.38,0.39 ->0.40
the result is pretty good: the correlation between predictions and real values is very high (0.9998).
I suspect the problem has to do with integers versus floats, but I cannot explain the reason. Can anyone help? Many thanks!!
Here is the code
library(tensorflow)
start=sample(1:1000, 100000, T)
start1= start+1
start2=start1+1
start3= start2+1
start4=start3+1
start5= start4+1
start6=start5+1
label=start6+1
data=data.frame(start, start1, start2, start3, start4, start5, start6, label)
data=as.matrix(data)
n = nrow(data)
trainIndex = sample(1:n, size = round(0.7*n), replace=FALSE)
train = data[trainIndex ,]
test = data[-trainIndex ,]
train_data= train[,1:7]
train_label= train[,8]
means=apply(train_data, 2, mean)
sds= apply(train_data, 2, sd)
train_data=(train_data-means)/sds
test_data=test[,1:7]
test_data=(test_data-means)/sds
test_label=test[,8]
batch_size = 50L
n_inputs = 1L # one value per time step
n_steps = 7L # time steps
n_hidden_units = 10L # neurons in hidden layer
n_outputs = 1L # single regression output
x = tf$placeholder(tf$float32, shape(NULL, n_steps, n_inputs))
y = tf$placeholder(tf$float32, shape(NULL, 1L))
weights_in= tf$Variable(tf$random_normal(shape(n_inputs, n_hidden_units)))
weights_out= tf$Variable(tf$random_normal(shape(n_hidden_units, 1L)))
biases_in=tf$Variable(tf$constant(0.1, shape= shape(n_hidden_units )))
biases_out = tf$Variable(tf$constant(0.1, shape=shape(1L)))
RNN=function(X, weights_in, weights_out, biases_in, biases_out)
{
  X = tf$reshape(X, shape=shape(-1, n_inputs))
  X_in = tf$sigmoid (tf$matmul(X, weights_in) + biases_in)
  X_in = tf$reshape(X_in, shape=shape(-1, n_steps, n_hidden_units))
  lstm_cell = tf$contrib$rnn$BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=T)
  init_state = lstm_cell$zero_state(batch_size, dtype=tf$float32)
  outputs_final_state = tf$nn$dynamic_rnn(lstm_cell, X_in, initial_state=init_state, time_major=F)
  outputs= tf$unstack(tf$transpose(outputs_final_state[[1]], shape(1,0,2)))
  results = tf$matmul(outputs[[length(outputs)]], weights_out) + biases_out
  return(results)
}
pred = RNN(x, weights_in, weights_out, biases_in, biases_out)
cost = tf$losses$mean_squared_error(pred, y)
train_op = tf$contrib$layers$optimize_loss(loss=cost, global_step=tf$contrib$framework$get_global_step(), learning_rate=0.05, optimizer="SGD")
init <- tf$global_variables_initializer()
sess <- tf$Session()
sess$run(init)
step = 0
while (step < 1000)
{
  train_data2= train_data[(step*batch_size+1) : (step*batch_size+batch_size) , ]
  train_label2=train_label[(step*batch_size+1):(step*batch_size+batch_size)]
  batch_xs <- sess$run(tf$reshape(train_data2, shape(batch_size, n_steps, n_inputs))) # Reshape
  batch_ys= matrix(train_label2, ncol=1)
  sess$run(train_op, feed_dict = dict(x = batch_xs, y= batch_ys))
  mycost <- sess$run(cost, feed_dict = dict(x = batch_xs, y= batch_ys))
  print (mycost)
  test_data2= test_data[(0*batch_size+1) : (0*batch_size+batch_size) , ]
  test_label2=test_label[(0*batch_size+1):(0*batch_size+batch_size)]
  batch_xs <- sess$run(tf$reshape(test_data2, shape(batch_size, n_steps, n_inputs))) # Reshape
  batch_ys= matrix(test_label2, ncol=1)
  step=step+1
}
First, it's quite useful to always normalize your network inputs (there are different approaches: dividing by a maximum value, subtracting the mean and dividing by the std, and many more). This will help your optimizer a lot.
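The two normalization approaches mentioned can be sketched in NumPy as follows (Python is used here for consistency with the other examples rather than R; the function name fit_standardizer and the toy rows are illustrative). The statistics are computed on the training set and would be reused unchanged for the test set.

```python
import numpy as np

def fit_standardizer(train):
    """Per-column mean and std computed on the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)     # leave constant columns unscaled
    return mean, std

train = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [35.0, 36.0, 37.0]])
mean, std = fit_standardizer(train)
train_scaled = (train - mean) / std          # broadcasts column-wise
scaled_by_max = train / np.abs(train).max()  # alternative: divide by the largest magnitude

print(train_scaled)
print(scaled_by_max)
```

In a regression setup like this one, the targets usually need the same kind of scaling as the inputs, which is what dividing everything by 100 did in the question.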
Second, and actually most important in your case, after the RNN output you are applying the sigmoid function. If you check the plot of the sigmoid function, you will see that it squashes all inputs into the range (0,1). So no matter how big your inputs are, your output will always be at most 1. Thus you should not use any activation function at the output layer in regression problems.
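A quick numeric check of that bound, in plain Python (the target values are taken from the example sequences in the question; everything else is illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

targets = [6.0, 8.0, 40.0]

# Even a large pre-activation gives an output no greater than 1,
# so the squared error against these targets cannot get close to zero.
best_outputs = [sigmoid(10.0) for _ in targets]
errors = [(t - o) ** 2 for t, o in zip(targets, best_outputs)]

# After dividing by 100 the targets fall inside (0, 1) and become reachable.
scaled_targets = [t / 100.0 for t in targets]

print(best_outputs, errors, scaled_targets)
```

This is consistent with the observation in the question that the same model fits well once the data is divided by 100.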
Hope it helps.