ctc_loss error "No valid path found." - tensorflow

Training a model with tf.nn.ctc_loss produces an error every time the train op is run:
tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
Unlike in previous questions about this function, this is not due to divergence. I am using a low learning rate, and the error occurs even on the first train op.
The model is a CNN -> LSTM -> CTC. Here is the model creation code:
# Build Graph
self.videoInput = tf.placeholder(shape=(None, self.maxVidLen, 50, 100, 3), dtype=tf.float32)
self.videoLengths = tf.placeholder(shape=(None), dtype=tf.int32)
self.keep_prob = tf.placeholder(dtype=tf.float32)
self.targets = tf.sparse_placeholder(tf.int32)
self.targetLengths = tf.placeholder(shape=(None), dtype=tf.int32)
conv1 = tf.layers.conv3d(self.videoInput ...)
pool1 = tf.layers.max_pooling3d(conv1 ...)
conv2 = ...
pool2 = ...
conv3 = ...
pool3 = ...
cnn_out = tf.reshape(pool3, shape=(-1, self.maxVidLen, 4*7*96))
fw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
bw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
outputs, _ = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, cnn_out, sequence_length=self.videoLengths, dtype=tf.float32)
outputs = tf.concat(outputs, 2)
outputs = tf.reshape(outputs, [-1, self.hidden_size * 2])
w = tf.Variable(tf.random_normal((self.hidden_size * 2, len(self.char2index) + 1), stddev=0.2))
b = tf.Variable(tf.zeros(len(self.char2index) + 1))
out = tf.matmul(outputs, w) + b
out = tf.reshape(out, [-1, self.maxVidLen, len(self.char2index) + 1])
out = tf.transpose(out, [1, 0, 2])
cost = tf.reduce_mean(tf.nn.ctc_loss(self.targets, out, self.targetLengths))
self.train_op = tf.train.AdamOptimizer(0.0001).minimize(cost)
And here is the feed dict creation code:
indices = []
values = []
shape = [len(vids) * 2, self.maxLabelLen]
vidInput = np.zeros((len(vids) * 2, self.maxVidLen, 50, 100, 3), dtype=np.float32)
# Actual video, then left-right flip
for j in range(len(vids) * 2):
    # k is the video index
    k = j if j < len(vids) else j - len(vids)
    # convert video and label to input format
    vidInput[j, 0:len(vids[k])] = vids[k] if k == j else vids[k][:, :, ::-1, :]
    indices.extend([j, i] for i in range(len(labelList[k])))
    values.extend(self.char2index[c] for c in labelList[k])
fd[self.targets] = (indices, values, shape)
fd[self.videoInput] = vidInput
# Collect video lengths and label lengths
vidLengths = [len(j) for j in vids] + [len(j) for j in vids]
labelLens = [len(l) for l in labelList] + [len(l) for l in labelList]
fd[self.videoLengths] = vidLengths
fd[self.targetLengths] = labelLens

It turns out that ctc_loss requires the label lengths to be shorter than the input lengths. If the label lengths are too long, the loss calculator cannot unroll completely and therefore cannot compute the loss.
For example, the label BIFI requires an input length of at least 4, while the label BIIF requires an input length of at least 5, because a blank must be inserted between the repeated symbols.
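To make the constraint concrete, here is a minimal sketch (a hypothetical helper for illustration, not part of tf.nn.ctc_loss) of the minimum input length a label needs: one frame per symbol, plus one blank frame between each adjacent pair of repeated symbols.
def min_ctc_input_len(label):
    # one frame per symbol, plus one blank between each adjacent repeated pair
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

print(min_ctc_input_len("BIFI"))  # 4
print(min_ctc_input_len("BIIF"))  # 5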

I had the same issue, but I soon realized it was just because I was using glob and my label came from the filename, so the full path made the label exceed the input length.
You can fix this issue by using:
os.path.join(*(filename.split(os.path.sep)[noOfDir:]))
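For illustration (assuming, hypothetically, noOfDir = 2 and a path like data/train/BIFI.mp4): filename.split(os.path.sep)[2:] leaves ['BIFI.mp4'], so the joined result no longer carries the leading directories that were inflating the label length.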

For me the problem was fixed by setting preprocess_collapse_repeated=True.
FWIW: my target sequence lengths were already shorter than the input lengths, and the RNN outputs are those of a softmax.
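For reference, preprocess_collapse_repeated is an argument of tf.nn.ctc_loss itself in TF 1.x, so the call looks like this (schematically, with generic names in place of those from the question):
cost = tf.reduce_mean(tf.nn.ctc_loss(labels, logits, sequence_lengths, preprocess_collapse_repeated=True))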

Another possible reason, which I found in my case, is that the input data range is not normalized to 0~1. Because of that, the LSTM activation functions saturate at the beginning of training, which somehow causes the "No valid path found" log.
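A minimal sketch of that normalization (assuming the frames are 8-bit RGB, as the vidInput array in the question suggests):
vidInput = vidInput.astype(np.float32) / 255.0  # scale pixel values into [0, 1]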

Related

How to display the convolution filters used on a CNN with Tensorflow?

I would like to produce figures similar to this one:
To do that, with Tensorflow I load my model and then, using the code below, I select the variable holding the filters of one layer:
# search for the name of the specific layer with the filters I want to display
for v in tf.trainable_variables():
    print(v.name)
# store the filters into a variable
var = [v for v in tf.trainable_variables() if v.name == "model/center/kernel:0"][0]
By calling var.eval() I am able to store var into a numpy array.
This numpy array has the shape (3, 3, 512, 512), which corresponds to the kernel size (3x3) and the number of filters (512).
My problem is the following: how can I extract 1 filter from this (3, 3, 512, 512) array to display it? Once I understand how to do that, I will figure out how to display all 512 filters.
Since you are using Tensorflow, you might be using tf.keras.Sequential to build the CNN model, in which case model.summary() gives the names of all the layers, along with their shapes.
Once you have the layer name, you can visualize the convolutional filters of that layer as shown in the code below:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import backend as K

#-------------------------------------------------
#Utility function for displaying filters as images
#-------------------------------------------------
def deprocess_image(x):
    # normalize: center on 0 with std 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1
    # shift to 0.5 and clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)
    # convert to an 8-bit RGB array
    x *= 255
    x = np.clip(x, 0, 255).astype('uint8')
    return x

#---------------------------------------------------------------------------------------------------
#Utility function for generating patterns for given layer starting from empty input image and then
#applying Stochastic Gradient Ascent for maximizing the response of particular filter in given layer
#---------------------------------------------------------------------------------------------------
def generate_pattern(layer_name, filter_index, size=150):
    layer_output = model.get_layer(layer_name).output
    loss = K.mean(layer_output[:, :, :, filter_index])
    grads = K.gradients(loss, model.input)[0]
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
    iterate = K.function([model.input], [loss, grads])
    # start from a gray image with some noise
    input_img_data = np.random.random((1, size, size, 3)) * 20 + 128.
    # run gradient ascent for 80 steps
    step = 1.
    for i in range(80):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step
    img = input_img_data[0]
    return deprocess_image(img)

#------------------------------------------------------------------------------------------
#Generating convolution layer filters for intermediate layers using above utility functions
#------------------------------------------------------------------------------------------
layer_name = 'conv2d_4'
size = 299
margin = 5
results = np.zeros((8 * size + 7 * margin, 8 * size + 7 * margin, 3))
for i in range(8):
    for j in range(8):
        filter_img = generate_pattern(layer_name, i + (j * 8), size=size)
        horizontal_start = i * size + i * margin
        horizontal_end = horizontal_start + size
        vertical_start = j * size + j * margin
        vertical_end = vertical_start + size
        results[horizontal_start: horizontal_end, vertical_start: vertical_end, :] = filter_img
plt.figure(figsize=(20, 20))
plt.imshow(results.astype('uint8'))  # plt.savefig takes a filename, not an array
plt.savefig('conv_filters.png')
The above code visualizes only 64 filters of a layer; you can change that accordingly.
For more information, you can refer to this article.
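As a narrower alternative for the original question (a sketch assuming var_np is the (3, 3, 512, 512) numpy array obtained from var.eval() above), you can also display the raw 3x3 kernels themselves instead of gradient-ascent patterns:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(8, 8, figsize=(10, 10))
for idx, ax in enumerate(axes.flat):
    kernel = var_np[:, :, 0, idx]  # the 3x3 kernel for input channel 0, output filter idx
    ax.imshow(kernel, cmap='gray')
    ax.axis('off')
plt.savefig('raw_kernels.png')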

How do I make the bounding boxes in yolo v3 tighter (closer to the objects)?

I'm following this repo on creating a Yolo v3 model from scratch in PyTorch. The only problem is that the bounding boxes are not as tight (close to the objects) in most images I tried. I compared them to the tutorial on creating a Yolo v3 model using TensorFlow. The TensorFlow model produces excellent bounding boxes that are as tight as possible to the objects.
I tried to understand how the calculations differ between the two, but I'm finding myself getting stuck on the differences between torch and tf.
I believe the code for the bounding boxes in the tf tutorial comes from here:
def yolo_layer(inputs, n_classes, anchors, img_size, data_format):
    """Creates Yolo final detection layer.
    Detects boxes with respect to anchors.
    Args:
        inputs: Tensor input.
        n_classes: Number of labels.
        anchors: A list of anchor sizes.
        img_size: The input size of the model.
        data_format: The input format.
    Returns:
        Tensor output.
    """
    n_anchors = len(anchors)
    inputs = tf.layers.conv2d(inputs, filters=n_anchors * (5 + n_classes),
                              kernel_size=1, strides=1, use_bias=True,
                              data_format=data_format)
    shape = inputs.get_shape().as_list()
    grid_shape = shape[2:4] if data_format == 'channels_first' else shape[1:3]
    if data_format == 'channels_first':
        inputs = tf.transpose(inputs, [0, 2, 3, 1])
    inputs = tf.reshape(inputs, [-1, n_anchors * grid_shape[0] * grid_shape[1],
                                 5 + n_classes])
    strides = (img_size[0] // grid_shape[0], img_size[1] // grid_shape[1])
    box_centers, box_shapes, confidence, classes = \
        tf.split(inputs, [2, 2, 1, n_classes], axis=-1)
    x = tf.range(grid_shape[0], dtype=tf.float32)
    y = tf.range(grid_shape[1], dtype=tf.float32)
    x_offset, y_offset = tf.meshgrid(x, y)
    x_offset = tf.reshape(x_offset, (-1, 1))
    y_offset = tf.reshape(y_offset, (-1, 1))
    x_y_offset = tf.concat([x_offset, y_offset], axis=-1)
    x_y_offset = tf.tile(x_y_offset, [1, n_anchors])
    x_y_offset = tf.reshape(x_y_offset, [1, -1, 2])
    box_centers = tf.nn.sigmoid(box_centers)
    box_centers = (box_centers + x_y_offset) * strides
    anchors = tf.tile(anchors, [grid_shape[0] * grid_shape[1], 1])
    box_shapes = tf.exp(box_shapes) * tf.to_float(anchors)
    confidence = tf.nn.sigmoid(confidence)
    classes = tf.nn.sigmoid(classes)
    inputs = tf.concat([box_centers, box_shapes,
                        confidence, classes], axis=-1)
    return inputs
The code for the bounding boxes in the PyTorch model comes from here, along with its explanation:
def bbox_iou(box1, box2):
    """
    Returns the IoU of two bounding boxes
    """
    #Get the coordinates of bounding boxes
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]
    #get the coordinates of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)
    #Intersection area
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)
    #Union Area
    b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)
    iou = inter_area / (b1_area + b2_area - inter_area)
    return iou

def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):
    batch_size = prediction.size(0)
    stride = inp_dim // prediction.size(2)
    grid_size = inp_dim // stride
    bbox_attrs = 5 + num_classes
    num_anchors = len(anchors)
    prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
    prediction = prediction.transpose(1,2).contiguous()
    prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)
    anchors = [(a[0]/stride, a[1]/stride) for a in anchors]
    #Sigmoid the centre_X, centre_Y, and object confidence
    prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
    prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
    prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])
    #Add the center offsets
    grid = np.arange(grid_size)
    a,b = np.meshgrid(grid, grid)
    x_offset = torch.FloatTensor(a).view(-1,1)
    y_offset = torch.FloatTensor(b).view(-1,1)
    if CUDA:
        x_offset = x_offset.cuda()
        y_offset = y_offset.cuda()
    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1,num_anchors).view(-1,2).unsqueeze(0)
    prediction[:,:,:2] += x_y_offset
    #log space transform height and the width
    anchors = torch.FloatTensor(anchors)
    if CUDA:
        anchors = anchors.cuda()
    anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
    prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors
    prediction[:,:,5: 5 + num_classes] = torch.sigmoid((prediction[:,:, 5 : 5 + num_classes]))
    prediction[:,:,:4] *= stride
    return prediction

Pairwise distance between a set of Matrices in Keras/Tensorflow

I want to calculate the pairwise distance between a set of Tensors (e.g. 4 Tensors). Each matrix is a 2D Tensor. I don't know how to do this in vectorized form. I wrote the following pseudo-code to describe what I need:
E.shape => [4,30,30]
sum = 0
for i in range(4):
    for j in range(4):
        res = calculate_distance(E[i], E[j]) # E[i] is one of the 30*30 Tensors
        sum = sum + reduce_sum(res)
Here is my last try:
x_ = tf.expand_dims(E, 0)
y_ = tf.expand_dims(E, 1)
s = x_ - y_
P = tf.reduce_sum(tf.norm(s, axis=[-2, -1]))
This code works, but I don't know how to do this in a batch. For instance, when E.shape is [BATCH_SIZE, 4, 30, 30], my code doesn't work and an Out Of Memory error occurs. How can I do this efficiently?
Edit: After a day, I found a solution. It's not perfect, but it works:
res = tf.map_fn(lambda x: tf.map_fn(lambda y: tf.map_fn(lambda z: tf.norm(z - x), x), x), E)
res = tf.reduce_mean(tf.square(res))
Your solution with expand_dims should be okay if your batch size is not too large. However, given that your original pseudo code loops over range(4), you should probably expand axes 1 and 2, instead of 0 and 1.
You can check the shape of the tensors to ensure that you're specifying the correct axes. For example,
import numpy as np
import tensorflow as tf
from keras import backend as K

batch_size = 8
E_np = np.random.rand(batch_size, 4, 30, 30)
E = K.variable(E_np) # shape=(8, 4, 30, 30)
x_ = K.expand_dims(E, 1)
y_ = K.expand_dims(E, 2)
s = x_ - y_ # shape=(8, 4, 4, 30, 30)
distances = tf.norm(s, axis=[-2, -1]) # shape=(8, 4, 4)
P = K.sum(distances, axis=[-2, -1]) # shape=(8,)
Now P will be the sum of pairwise distances between the 4 matrices for each of the 8 samples.
You can also verify that the values in P is the same as what would be computed in your pseudo code:
answer = []
for batch_idx in range(batch_size):
    s = 0
    for i in range(4):
        for j in range(4):
            a = E_np[batch_idx, i]
            b = E_np[batch_idx, j]
            s += np.sqrt(np.trace(np.dot(a - b, (a - b).T)))
    answer.append(s)
print(answer)
[149.45960605637578, 147.2815068236368, 144.97487402393705, 146.04866735065312, 144.25537059201062, 148.9300986019226, 146.61229889228133, 149.34259789169045]
print(K.eval(P).tolist())
[149.4595947265625, 147.281494140625, 144.97488403320312, 146.04867553710938, 144.25537109375, 148.9300994873047, 146.6123046875, 149.34259033203125]
Tensorflow allows computing matrix norms via the tf.norm function; for 2D matrices it supports ord='fro' (the Frobenius norm), 'euclidean', 1, 2 and np.inf. The code below uses ord=1.
The following solution isn't vectorized and assumes that the first dimension in E is known statically:
E = tf.random_normal(shape=[5, 3, 3], dtype=tf.float32)
F = tf.split(E, E.shape[0])
total = tf.reduce_sum([tf.norm(tensor=(lhs-rhs), ord=1, axis=(-2, -1)) for lhs in F for rhs in F])
Update:
An optimized vectorized version of the same code:
E = tf.random_normal(shape=[1024, 4, 30, 30], dtype=tf.float32)
lhs = tf.expand_dims(E, axis=1)
rhs = tf.expand_dims(E, axis=2)
total = tf.reduce_sum(tf.norm(tensor=(lhs - rhs), ord=1, axis=(-2, -1)))
Memory concerns: upon evaluating this code, tf.contrib.memory_stats.MaxBytesInUse() reports a peak memory consumption of 73729792 bytes ≈ 74 MB, which indicates relatively moderate overhead (the raw lhs - rhs tensor alone is 59 MB). Your OOM is most likely caused by the duplication of the BATCH_SIZE dimension when you compute s = x_ - y_, because your batch size is much larger than the number of matrices (1024 vs 4).
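For reference, the broadcast lhs - rhs tensor above has shape (1024, 4, 4, 30, 30); at 4 bytes per float32 element that is 1024 * 4 * 4 * 30 * 30 * 4 = 58,982,400 bytes ≈ 59 MB, which is where the figure above comes from.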

Why does the cost in my implementation of a deep neural network increase after a few iterations?

I am a beginner in machine learning and neural networks. Recently, after watching Andrew Ng's lectures on deep learning, I tried to implement a binary classifier using deep neural networks on my own.
However, the cost function is expected to decrease after each iteration.
In my program, it decreases slightly in the beginning, but rapidly increases later. I tried changing the learning rate and the number of iterations, but to no avail. I am very confused.
Here is my code
1. Neural network classifier class
class NeuralNetwork:
    def __init__(self, X, Y, dimensions, alpha=1.2, iter=3000):
        self.X = X
        self.Y = Y
        self.dimensions = dimensions # Including input layer and output layer. Let example be dimensions=4
        self.alpha = alpha # Learning rate
        self.iter = iter # Number of iterations
        self.length = len(self.dimensions)-1
        self.params = {} # To store parameters W and b for each layer
        self.cache = {} # To store cache Z and A for each layer
        self.grads = {} # To store dA, dZ, dW, db
        self.cost = 1 # Initial value does not matter

    def initialize(self):
        np.random.seed(3)
        # If dimensions is 4, then layer 0 and 3 are input and output layers
        # So we only need to initialize w1, w2 and w3
        # There is no need of w0 for input layer
        for l in range(1, len(self.dimensions)):
            self.params['W'+str(l)] = np.random.randn(self.dimensions[l], self.dimensions[l-1])*0.01
            self.params['b'+str(l)] = np.zeros((self.dimensions[l], 1))

    def forward_propagation(self):
        self.cache['A0'] = self.X
        # For last layer, ie, the output layer 3, we need to activate using sigmoid
        # For layer 1 and 2, we need to use relu
        for l in range(1, len(self.dimensions)-1):
            self.cache['Z'+str(l)] = np.dot(self.params['W'+str(l)], self.cache['A'+str(l-1)]) + self.params['b'+str(l)]
            self.cache['A'+str(l)] = relu(self.cache['Z'+str(l)])
        l = len(self.dimensions)-1
        self.cache['Z'+str(l)] = np.dot(self.params['W'+str(l)], self.cache['A'+str(l-1)]) + self.params['b'+str(l)]
        self.cache['A'+str(l)] = sigmoid(self.cache['Z'+str(l)])

    def compute_cost(self):
        m = self.Y.shape[1]
        A = self.cache['A'+str(len(self.dimensions)-1)]
        self.cost = -1/m*np.sum(np.multiply(self.Y, np.log(A)) + np.multiply(1-self.Y, np.log(1-A)))
        self.cost = np.squeeze(self.cost)

    def backward_propagation(self):
        A = self.cache['A' + str(len(self.dimensions) - 1)]
        m = self.X.shape[1]
        self.grads['dA'+str(len(self.dimensions)-1)] = -(np.divide(self.Y, A) - np.divide(1-self.Y, 1-A))
        # Sigmoid derivative for final layer
        l = len(self.dimensions)-1
        self.grads['dZ' + str(l)] = self.grads['dA' + str(l)] * sigmoid_prime(self.cache['Z' + str(l)])
        self.grads['dW' + str(l)] = 1 / m * np.dot(self.grads['dZ' + str(l)], self.cache['A' + str(l - 1)].T)
        self.grads['db' + str(l)] = 1 / m * np.sum(self.grads['dZ' + str(l)], axis=1, keepdims=True)
        self.grads['dA' + str(l - 1)] = np.dot(self.params['W' + str(l)].T, self.grads['dZ' + str(l)])
        # Relu derivative for previous layers
        for l in range(len(self.dimensions)-2, 0, -1):
            self.grads['dZ'+str(l)] = self.grads['dA'+str(l)] * relu_prime(self.cache['Z'+str(l)])
            self.grads['dW'+str(l)] = 1/m*np.dot(self.grads['dZ'+str(l)], self.cache['A'+str(l-1)].T)
            self.grads['db'+str(l)] = 1/m*np.sum(self.grads['dZ'+str(l)], axis=1, keepdims=True)
            self.grads['dA'+str(l-1)] = np.dot(self.params['W'+str(l)].T, self.grads['dZ'+str(l)])

    def update_parameters(self):
        for l in range(1, len(self.dimensions)):
            self.params['W'+str(l)] = self.params['W'+str(l)] - self.alpha*self.grads['dW'+str(l)]
            self.params['b'+str(l)] = self.params['b'+str(l)] - self.alpha*self.grads['db'+str(l)]

    def train(self):
        np.random.seed(1)
        self.initialize()
        for i in range(self.iter):
            #print(self.params)
            self.forward_propagation()
            self.compute_cost()
            self.backward_propagation()
            self.update_parameters()
            if i % 100 == 0:
                print('Cost after {} iterations is {}'.format(i, self.cost))
2. Testing code for odd or even number classifier
import numpy as np
from main import NeuralNetwork
X = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
Y = np.array([[1, 0, 1, 0, 1, 0, 1, 0, 1, 0]])
clf = NeuralNetwork(X, Y, [1, 1, 1], alpha=0.003, iter=7000)
clf.train()
3. Helper Code
import math
import numpy as np

def sigmoid_scalar(x):
    return 1/(1+math.exp(-x))

def sigmoid_prime_scalar(x):
    return sigmoid_scalar(x)*(1-sigmoid_scalar(x))

def relu_scalar(x):
    if x > 0:
        return x
    else:
        return 0

def relu_prime_scalar(x):
    if x > 0:
        return 1
    else:
        return 0

sigmoid = np.vectorize(sigmoid_scalar)
sigmoid_prime = np.vectorize(sigmoid_prime_scalar)
relu = np.vectorize(relu_scalar)
relu_prime = np.vectorize(relu_prime_scalar)
Output
I believe your cross-entropy derivative is wrong. Instead of this:
# WRONG!
self.grads['dA'+str(len(self.dimensions)-1)] = -(np.divide(self.Y, A) - np.divide(1-self.Y, A))
... do this:
# CORRECT
self.grads['dA'+str(len(self.dimensions)-1)] = np.divide(A - self.Y, (1 - A) * A)
See these lecture notes for the details. I think you meant the formula (5), but forgot 1-A. Anyway, use formula (6).
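For reference, a short derivation of that gradient from the cost defined in compute_cost above, where per example L = -(Y*log(A) + (1-Y)*log(1-A)):
dL/dA = -Y/A + (1-Y)/(1-A) = (-Y*(1-A) + (1-Y)*A) / (A*(1-A)) = (A - Y) / (A*(1-A)),
which is exactly np.divide(A - self.Y, (1 - A) * A).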

How to use `sparse_softmax_cross_entropy_with_logits` without getting an Incompatible Shapes error

I would like to use sparse_softmax_cross_entropy_with_logits
with the Julia TensorFlow wrapper.
The operation is defined in the code here.
Basically, as I understand it, the first argument should be the logits that would normally be fed to softmax to turn them into category probabilities (~1-hot output).
And the second should be the correct labels, as label ids.
I have adjusted the example code from the TensorFlow.jl readme
See below:
using Distributions
using TensorFlow
# Generate some synthetic data
x = randn(100, 50)
w = randn(50, 10)
y_prob = exp(x*w)
y_prob ./= sum(y_prob,2)
function draw(probs)
    y = zeros(size(probs))
    for i in 1:size(probs, 1)
        idx = rand(Categorical(probs[i, :]))
        y[i, idx] = 1
    end
    return y
end
y = draw(y_prob)
# Build the model
sess = Session(Graph())
X = placeholder(Float64)
Y_obs = placeholder(Float64)
Y_obs_lbl = indmax(Y_obs, 2)
variable_scope("logisitic_model", initializer=Normal(0, .001)) do
global W = get_variable("weights", [50, 10], Float64)
global B = get_variable("bias", [10], Float64)
end
L = X*W + B
Y=nn.softmax(L)
#costs = log(Y).*Y_obs #Dense (Original) way
costs = nn.sparse_softmax_cross_entropy_with_logits(L, Y_obs_lbl+1) #sparse way
Loss = -reduce_sum(costs)
optimizer = train.AdamOptimizer()
minimize_op = train.minimize(optimizer, Loss)
saver = train.Saver()
# Run training
run(sess, initialize_all_variables())
cur_loss, _ = run(sess, [Loss, minimize_op], Dict(X=>x, Y_obs=>y))
When I run it however, I get an error:
Tensorflow error: Status: Incompatible shapes: [1,100] vs. [100,10]
[[Node: gradients/SparseSoftmaxCrossEntropyWithLogits_10_grad/mul = Mul[T=DT_DOUBLE, _class=[], _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/SparseSoftmaxCrossEntropyWithLogits_10_grad/ExpandDims, SparseSoftmaxCrossEntropyWithLogits_10:1)]]
in check_status(::TensorFlow.Status) at /home/ubuntu/.julia/v0.5/TensorFlow/src/core.jl:101
in run(::TensorFlow.Session, ::Array{TensorFlow.Port,1}, ::Array{Any,1}, ::Array{TensorFlow.Port,1}, ::Array{Ptr{Void},1}) at /home/ubuntu/.julia/v0.5/TensorFlow/src/run.jl:96
in run(::TensorFlow.Session, ::Array{TensorFlow.Tensor,1}, ::Dict{TensorFlow.Tensor,Array{Float64,2}}) at /home/ubuntu/.julia/v0.5/TensorFlow/src/run.jl:143
This only happens when I try to train it.
If I don't include the optimise/minimize op, then it works fine.
So I am doing something that screws up the gradient math.