Permutation Invariant Loss in TensorFlow

I am working on a permutation-invariant loss in TensorFlow 2.8.
The loss takes two flattened matrices of length N x 5, reshapes them to (N, 5), and then builds all N! possible permutations of the N rows.
A loss is then calculated for every permutation, and the minimum over these losses is used (the best match).
However, I get the following error message:
File "C:\Users\meist\anaconda3\envs\tf-2-8\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\meist\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\framework\func_graph.py", line 1147, in autograph_handler
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
File "C:\Users\meist\anaconda3\envs\tf-2-8\lib\site-packages\keras\engine\training.py", line 1021, in train_function *
return step_function(self, iterator)
File "C:\Users\meist\anaconda3\envs\tf-2-8\lib\site-packages\keras\engine\training.py", line 1010, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\meist\anaconda3\envs\tf-2-8\lib\site-packages\keras\engine\training.py", line 1000, in run_step **
outputs = model.train_step(data)
File "C:\Users\meist\anaconda3\envs\tf-2-8\lib\site-packages\keras\engine\training.py", line 863, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "C:\Users\meist\anaconda3\envs\tf-2-8\lib\site-packages\keras\optimizer_v2\optimizer_v2.py", line 532, in minimize
return self.apply_gradients(grads_and_vars, name=name)
File "C:\Users\meist\anaconda3\envs\tf-2-8\lib\site-packages\keras\optimizer_v2\optimizer_v2.py", line 633, in apply_gradients
grads_and_vars = optimizer_utils.filter_empty_gradients(grads_and_vars)
File "C:\Users\meist\anaconda3\envs\tf-2-8\lib\site-packages\keras\optimizer_v2\utils.py", line 73, in filter_empty_gradients
raise ValueError(f"No gradients provided for any variable: {variable}. "
ValueError: No gradients provided for any variable
Apparently there are no gradients. However, when I simply call the loss with a y_train and a y_pred, I do get a loss value. Here is the loss code:
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.losses import Loss
import tensorflow as tf
from itertools import permutations
import numpy as np
import keras.backend as K
class PermInvLoss(Loss):
    '''
    This loss is supposed to return the minimum loss, based on the best matching of y_true and y_pred.
    y_true is of dim [batchsize, Nmix x 5], and will be reshaped to [batchsize, Nmix, 5] in the call.
    Nmix is the number of vectors that can be permuted. The elements within each vector are fixed.
    The 5 elements are [class_value,class_value,reg_value,reg_value,reg_value].
    The two class values will be evaluated with CategoricalCrossentropy.
    The three regression values will be evaluated with MSE.
    '''
    def __init__(self,Nmix = 3):
        super(PermInvLoss, self).__init__() # is this correct?
        self.name = 'perm_inv_loss'
        self.cce = CategoricalCrossentropy()
        self.shape = (-1,Nmix,5) # for transforming y_true, and y_pred
        variants = np.math.factorial(Nmix) # number of possible permutations
        permutation_idx = list(permutations(np.arange(Nmix))) # list of permutations
        perm = tf.constant(permutation_idx)
        self.perm_mat = tf.constant(np.eye(Nmix)[permutation_idx],dtype = tf.float32) # permutation matrices for y_pred
        eye = tf.eye(Nmix,dtype=tf.int32) # identity matrix
        self.rep_mat = tf.broadcast_to(eye[tf.newaxis,...],(variants,Nmix,Nmix)) # repetition matrix for y_true
    def MSE(self,y_true,y_pred,axis=(-2,-1)):
        # simple MSE implementation with axis
        mse = K.mean(K.square(K.abs(y_true-y_pred)),axis=axis)
        return mse
    def call(self, y_true, y_pred):
        # reshape to [batchsize, Nmix, 5]
        y_true = K.reshape(y_true,self.shape)
        y_pred = K.reshape(y_pred ,self.shape)
        # now y_pred is permuted in one extra dimension (variants)
        y_perm = tf.linalg.matmul(tf.cast(self.perm_mat,tf.float32),y_pred[:,tf.newaxis,...]) # [batchsize x variants x Nmix x 5]
        # same for y_true, but with the repetition matrix
        y_true = tf.linalg.matmul(tf.cast(self.rep_mat,tf.float32),y_pred[:,tf.newaxis,...])
        # print(y_perm.shape) # [batchsize x variants x Nmix x 5]
        # print(y_true.shape) # [batchsize x variants x Nmix x 5]
        # now we have on the second dimension all possible permutations of y_pred and can evaluate them against y_true of the same shape
        # CategoricalCrossentropy for the first two values (classification)
        cce = CategoricalCrossentropy(reduction='none',axis=(-1))
        CE = K.sum(cce(y_true[...,:2], y_perm[...,:2]),axis=-1) # [batchsize x variants]
        # MSE for other values (regression)
        mse = self.MSE(y_true[...,2:], y_perm[...,2:]) # [batchsize x variants]
        loss = K.min(CE+mse,axis=-1) # calculates minimum loss over the variants [batchsize]
        return loss
Is the class wrong, or is there really no gradient?

I found the mistake. The line
y_true = tf.linalg.matmul(tf.cast(self.rep_mat,tf.float32),y_pred[:,tf.newaxis,...])
should be
y_true = tf.linalg.matmul(tf.cast(self.rep_mat,tf.float32),y_true[:,tf.newaxis,...])
so that the repeated target is built from y_true instead of y_pred.
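The idea behind the loss can also be checked outside of Keras. The following is a minimal NumPy sketch, not the class above: it uses MSE only (no cross-entropy term), and the name perm_invariant_mse is made up for illustration. It builds every row ordering of y_pred, compares each one against y_true, and keeps the smallest error per batch element.

```python
from itertools import permutations
import numpy as np

def perm_invariant_mse(y_true, y_pred):
    """Minimum MSE over all row permutations of y_pred.

    y_true, y_pred: arrays of shape [batch, n_mix, features].
    Returns an array of shape [batch].
    """
    n_mix = y_true.shape[1]
    # All n_mix! orderings of the row indices: [variants, n_mix].
    perms = np.array(list(permutations(range(n_mix))))
    # Gather every ordering of y_pred: [batch, variants, n_mix, features].
    y_perm = y_pred[:, perms, :]
    # Compare each ordering with y_true (not y_pred), broadcast over variants.
    err = (y_true[:, np.newaxis, :, :] - y_perm) ** 2
    per_variant = err.mean(axis=(-2, -1))   # [batch, variants]
    return per_variant.min(axis=-1)         # [batch]

rng = np.random.default_rng(0)
y_true = rng.normal(size=(4, 3, 5))
y_pred = rng.normal(size=(4, 3, 5))

loss = perm_invariant_mse(y_true, y_pred)
# Shuffling the rows of y_pred should leave the loss unchanged.
loss_shuffled = perm_invariant_mse(y_true, y_pred[:, [2, 0, 1], :])
# A prediction that is only a reordering of the target should cost nothing.
loss_reordered = perm_invariant_mse(y_true, y_true[:, [1, 2, 0], :])
```

If the loss is invariant, loss and loss_shuffled agree up to rounding, and loss_reordered is zero.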

Related

Unable to take 'log' in TensorFlow

I am working on CapsNet and adapting code from here. The simulation runs on Google Colab with tensorflow == 2.4.0. I am getting the following error:
AttributeError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:805 train_function *
return step_function(self, iterator)
/content/drive/My Drive/Cervical GAN/Segmentation/Cheng-Lin-Li/SegCaps-master-aashish/utils/custom_losses.py:102 dice_loss *
return 1-dice_soft(y_true, y_pred, from_logits=False)
/content/drive/My Drive/Cervical GAN/Segmentation/Cheng-Lin-Li/SegCaps-master-aashish/utils/custom_losses.py:41 dice_soft *
y_pred = tf.log(y_pred / (1 - y_pred))
AttributeError: module 'tensorflow' has no attribute 'log'
The following is custom_losses.py:
'''
Capsules for Object Segmentation (SegCaps)
Original Paper: https://arxiv.org/abs/1804.04241
Code written by: Rodney LaLonde
If you use significant portions of this code or the ideas from our paper, please cite it :)
If you have any questions, please email me at lalonde@knights.ucf.edu.
This file contains the definitions of custom loss functions not present in the default Keras.
=====
This program includes all custom loss functions UNet, tiramisu, Capsule Nets (capsbasic) or SegCaps(segcapsr1 or segcapsr3).
@author: Cheng-Lin Li a.k.a. Clark
@copyright: 2018 Cheng-Lin Li@Insight AI. All rights reserved.
@license: Licensed under the Apache License v2.0. http://www.apache.org/licenses/
@contact: clark.cl.li@gmail.com
Enhancement:
1. Revise default loss_type to jaccard on dice_soft function.
2. add bce_dice_loss for future usage.
'''
import tensorflow as tf
from keras import backend as K
from keras.losses import binary_crossentropy
def dice_soft(y_true, y_pred, loss_type='jaccard', axis=[1,2,3], smooth=1e-5, from_logits=False):
"""Soft dice (Sørensen or Jaccard) coefficient for comparing the similarity
of two batch of data, usually be used for binary image segmentation
i.e. labels are binary. The coefficient between 0 to 1, 1 means totally match.
Parameters
-----------
y_pred : tensor
A distribution with shape: [batch_size, ....], (any dimensions).
y_true : tensor
A distribution with shape: [batch_size, ....], (any dimensions).
loss_type : string
``jaccard`` or ``sorensen``, default is ``jaccard``.
axis : list of integer
All dimensions are reduced, default ``[1,2,3]``.
smooth : float
This small value will be added to the numerator and denominator.
If both y_pred and y_true are empty, it makes sure dice is 1.
If either y_pred or y_true are empty (all pixels are background), dice = ```smooth/(small_value + smooth)``,
then if smooth is very small, dice close to 0 (even the image values lower than the threshold),
so in this case, higher smooth can have a higher dice.
Examples
---------
>>> outputs = tl.act.pixel_wise_softmax(network.outputs)
>>> dice_loss = 1 - tl.cost.dice_coe(outputs, y_)
References
-----------
- `Wiki-Dice <https://en.wikipedia.org/wiki/Sørensen–Dice_coefficient>`_
"""
if not from_logits:
# transform back to logits
_epsilon = tf.convert_to_tensor(1e-7, y_pred.dtype.base_dtype)
y_pred = tf.clip_by_value(y_pred, _epsilon, 1 - _epsilon)
y_pred = tf.log(y_pred / (1 - y_pred))
inse = tf.reduce_sum(y_pred * y_true, axis=axis)
if loss_type == 'jaccard':
l = tf.reduce_sum(y_pred * y_pred, axis=axis)
r = tf.reduce_sum(y_true * y_true, axis=axis)
elif loss_type == 'sorensen':
l = tf.reduce_sum(y_pred, axis=axis)
r = tf.reduce_sum(y_true, axis=axis)
else:
raise Exception("Unknow loss_type")
## old axis=[0,1,2,3]
# dice = 2 * (inse) / (l + r)
# epsilon = 1e-5
# dice = tf.clip_by_value(dice, 0, 1.0-epsilon) # if all empty, dice = 1
## new haodong
dice = (2. * inse + smooth) / (l + r + smooth)
##
dice = tf.reduce_mean(dice)
return dice
def dice_hard(y_true, y_pred, threshold=0.5, axis=[1,2,3], smooth=1e-5):
"""Non-differentiable Sørensen–Dice coefficient for comparing the similarity
of two batch of data, usually be used for binary image segmentation i.e. labels are binary.
The coefficient between 0 to 1, 1 if totally match.
Parameters
-----------
y_pred : tensor
A distribution with shape: [batch_size, ....], (any dimensions).
y_true : tensor
A distribution with shape: [batch_size, ....], (any dimensions).
threshold : float
The threshold value to be true.
axis : list of integer
All dimensions are reduced, default ``[1,2,3]``.
smooth : float
This small value will be added to the numerator and denominator, see ``dice_coe``.
References
-----------
- `Wiki-Dice <https://en.wikipedia.org/wiki/Sørensen–Dice_coefficient>`_
"""
y_pred = tf.cast(y_pred > threshold, dtype=tf.float32)
y_true = tf.cast(y_true > threshold, dtype=tf.float32)
inse = tf.reduce_sum(tf.multiply(y_pred, y_true), axis=axis)
l = tf.reduce_sum(y_pred, axis=axis)
r = tf.reduce_sum(y_true, axis=axis)
## old axis=[0,1,2,3]
# hard_dice = 2 * (inse) / (l + r)
# epsilon = 1e-5
# hard_dice = tf.clip_by_value(hard_dice, 0, 1.0-epsilon)
## new haodong
hard_dice = (2. * inse + smooth) / (l + r + smooth)
##
hard_dice = tf.reduce_mean(hard_dice)
return hard_dice
def dice_loss(y_true, y_pred, from_logits=False):
return 1-dice_soft(y_true, y_pred, from_logits=False)
def bce_dice_loss(y_true, y_pred):
return binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)
def weighted_binary_crossentropy_loss(pos_weight):
# pos_weight: A coefficient to use on the positive examples.
def weighted_binary_crossentropy(target, output, from_logits=False):
"""Binary crossentropy between an output tensor and a target tensor.
# Arguments
target: A tensor with the same shape as `output`.
output: A tensor.
from_logits: Whether `output` is expected to be a logits tensor.
By default, we consider that `output`
encodes a probability distribution.
# Returns
A tensor.
"""
# Note: tf.nn.sigmoid_cross_entropy_with_logits
# expects logits, Keras expects probabilities.
if not from_logits:
# transform back to logits
_epsilon = tf.convert_to_tensor(1e-7, output.dtype.base_dtype)
output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
output = tf.log(output / (1 - output))
return tf.nn.weighted_cross_entropy_with_logits(targets=target,
logits=output,
pos_weight=pos_weight)
return weighted_binary_crossentropy
def margin_loss(margin=0.4, downweight=0.5, pos_weight=1.0):
'''
Args:
margin: scalar, the margin after subtracting 0.5 from raw_logits.
downweight: scalar, the factor for negative cost.
'''
def _margin_loss(labels, raw_logits):
"""Penalizes deviations from margin for each logit.
Each wrong logit costs its distance to margin. For negative logits margin is
0.1 and for positives it is 0.9. First subtract 0.5 from all logits. Now
margin is 0.4 from each side.
Args:
labels: tensor, one hot encoding of ground truth.
raw_logits: tensor, model predictions in range [0, 1]
Returns:
A tensor with cost for each data point of shape [batch_size].
"""
logits = raw_logits - 0.5
positive_cost = pos_weight * labels * tf.cast(tf.less(logits, margin),
tf.float32) * tf.pow(logits - margin, 2)
negative_cost = (1 - labels) * tf.cast(
tf.greater(logits, -margin), tf.float32) * tf.pow(logits + margin, 2)
return 0.5 * positive_cost + downweight * 0.5 * negative_cost
return _margin_loss
The error above occurs when using the dice loss; with the BCE loss there is no error. I have tried tf.math.log instead of tf.log, but then I get the following error:
TypeError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:805 train_function *
return step_function(self, iterator)
/content/drive/MyDrive/Cervical GAN/Segmentation/Cheng-Lin-Li/SegCaps-master-aashish/utils/custom_losses.py:102 dice_loss *
return 1-dice_soft(y_true, y_pred, from_logits=False)
/content/drive/MyDrive/Cervical GAN/Segmentation/Cheng-Lin-Li/SegCaps-master-aashish/utils/custom_losses.py:43 dice_soft *
inse = tf.reduce_sum(y_pred * y_true, axis=axis)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:1180 binary_op_wrapper
raise e
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:1164 binary_op_wrapper
return func(x, y, name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:1496 _mul_dispatch
return multiply(x, y, name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
return target(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:518 multiply
return gen_math_ops.mul(x, y, name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py:6078 mul
"Mul", x=x, y=y, name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:558 _apply_op_helper
inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Mul' Op has type uint8 that does not match type float32 of argument 'x'.
The error
TypeError: Input 'y' of 'Mul' Op has type uint8 that does not match type float32 of argument 'x'.
means that in x * y the type of y (uint8) does not match the type of x (float32). TensorFlow does not promote types implicitly here, so one operand has to be cast explicitly.
The problem arises in this line in dice_soft:
inse = tf.reduce_sum(y_pred * y_true, axis=axis)
So one solution is to use tf.cast to cast y_true to the same type as y_pred (tf.float32) before this line.
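To make the cast concrete, here is a small NumPy stand-in for the Jaccard branch of dice_soft. It is only a sketch: the name dice_soft_np is made up, the logit transform from the original function is left out, and NumPy would promote uint8 silently, so the explicit cast is there to mirror what TensorFlow needs. In TensorFlow the corresponding call is tf.cast(y_true, y_pred.dtype).

```python
import numpy as np

def dice_soft_np(y_true, y_pred, smooth=1e-5):
    """Soft dice (Jaccard variant) on [batch, H, W, C] arrays."""
    # Masks loaded from image files are often uint8; cast them to the
    # prediction dtype before any arithmetic.
    y_true = y_true.astype(y_pred.dtype)
    axis = (1, 2, 3)
    inse = np.sum(y_pred * y_true, axis=axis)
    l = np.sum(y_pred * y_pred, axis=axis)
    r = np.sum(y_true * y_true, axis=axis)
    dice = (2.0 * inse + smooth) / (l + r + smooth)
    return dice.mean()

# A uint8 mask of shape [1, 2, 2, 1] and a prediction that matches it exactly.
mask = np.array([[[[1], [0]], [[1], [1]]]], dtype=np.uint8)
pred = mask.astype(np.float32)
score = dice_soft_np(mask, pred)

# Empty mask and empty prediction: the smooth term keeps the score at 1.
empty_score = dice_soft_np(np.zeros_like(mask), np.zeros_like(pred))
```

A perfect match and the all-empty case should both give a score of 1, which agrees with the description of smooth in the docstring.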

Tensorflow multi-label with NCE or sampled softmax

Are there any code examples for using Tensorflow's sampled_softmax_loss or nce_loss functions with multi-label problems? That is, where num_true is more than one?
What follows is my attempt to create a wrapper for nce_loss() and sampled_softmax_loss() based on Jeff Chao's work (https://github.com/joelthchao/keras). In the following code, if you change num_true to 1, both samplers work. But with num_true > 1, the two samplers throw slightly different exceptions involving tensor shapes.
The main program is a simple autoencoder that replicates the class of problem I'm trying to solve: multi-label testing with a huge number of output classes, with a Zipfian distribution. Comments and stack trace at the end.
import tensorflow as tf
import numpy as np
import keras.layers as layers
from keras.models import Model
from keras import backend as K
from keras import initializers,regularizers,constraints
from keras.models import Model
from keras.layers import Dense
from keras.engine.base_layer import InputSpec
from keras.engine.topology import Layer
from keras.engine.input_layer import Input
from tensorflow.keras.optimizers import Nadam, Adam
np.random.seed(10)
import random
def nce_loss_function(weights, biases, labels, inputs, num_sampled, num_classes, num_true):
    if K.learning_phase() == 1:
        loss = tf.nn.nce_loss(weights, biases, labels, inputs, num_sampled, num_classes, num_true,
                              partition_strategy="div")
    else:
        logits = tf.matmul(inputs, tf.transpose(weights))
        logits = tf.nn.bias_add(logits, biases)
        labels_one_hot = tf.one_hot(labels, num_classes)
        loss = tf.nn.sigmoid_cross_entropy_with_logits(
            labels=labels_one_hot[:][0][:],
            logits=logits)
        loss = tf.reduce_sum(loss, axis=1)
    return loss
def sampled_softmax_loss_function(weights, biases, labels, inputs, num_sampled, num_classes, num_true):
    if K.learning_phase() == 1:
        return tf.nn.sampled_softmax_loss(weights, biases, labels, inputs, num_sampled, num_classes, num_true,
                                          partition_strategy="div")
    else:
        logits = tf.matmul(inputs, tf.transpose(weights))
        logits = tf.nn.bias_add(logits, biases)
        labels_one_hot = tf.one_hot(labels, num_classes)
        loss = tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=labels_one_hot,
            logits=logits)
    return loss
class Sampling(Layer):
"""Regular densely-connected NN layer with various sampling Loss.
`Sampling` implements the operation:
`output = dot(input, kernel) + bias`
`kernel` is a weights matrix created by the layer, and `bias` is a bias vector
created by the layer. Also, it adds a sampling Loss to the model.
See [reference](http://proceedings.mlr.press/v9/gutmann10a/gutmann10a.pdf).
# Example
```python
inputs = Input(shape=(4,))
target = Input(shape=(1,)) # sparse format, e.g. [1, 3, 2, 6, ...]
net = Dense(8)(inputs)
net = Sampling(units=128, num_sampled=32)([net, target])
model = Model(inputs=[inputs, target], outputs=net)
model.compile(optimizer='adam', loss=None)
x = np.random.rand(1000, 4)
y = np.random.randint(128, size=1000)
model.fit([x, y], None)
```
# Arguments
units: Positive integer, dimensionality of the output space (num classes).
num_sampled: Positive integer, number of classes to sample in Sampling Loss.
type: 'sampled_softmax', 'nce'
num_true: Max # of positive classes, pad to this for variable inputs
kernel_initializer: Initializer for the `kernel` weights matrix
(see [initializers](../initializers.md)).
bias_initializer: Initializer for the bias vector
(see [initializers](../initializers.md)).
kernel_regularizer: Regularizer function applied to
the `kernel` weights matrix
(see [regularizer](../regularizers.md)).
bias_regularizer: Regularizer function applied to the bias vector
(see [regularizer](../regularizers.md)).
activity_regularizer: Regularizer function applied to
the output of the layer (its "activation").
(see [regularizer](../regularizers.md)).
kernel_constraint: Constraint function applied to
the `kernel` weights matrix
(see [constraints](../constraints.md)).
bias_constraint: Constraint function applied to the bias vector
(see [constraints](../constraints.md)).
# Input shape
Two tensors. First one is 2D tensor with shape: `(batch_size, input_dim)`.
Second one is 1D tensor with length `batch_size`
# Output shape
2D tensor with shape: `(batch_size, units)`.
For instance, for a 2D input with shape `(batch_size, input_dim)`,
the output would have shape `(batch_size, units)`.
"""
def __init__(self,
units,
num_sampled,
type='sampled_softmax',
num_true=1,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs):
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super(Sampling, self).__init__(**kwargs)
self.units = units
self.num_sampled = num_sampled
if self.num_sampled > self.units:
raise Exception('num_sample: {} cannot be greater than units: {}'.format(
num_sampled, units))
self.type = type
if not (self.type == 'nce' or self.type == 'sampled_softmax'):
raise Exception('type {} is not a valid sampling loss type'.format(type))
self.num_true = num_true
self.kernel_initializer = initializers.get(kernel_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_regularizer = regularizers.get(kernel_regularizer)
self.bias_regularizer = regularizers.get(bias_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.bias_constraint = constraints.get(bias_constraint)
self.input_spec = [InputSpec(min_ndim=2), InputSpec(min_ndim=1)]
self.supports_masking = True
def build(self, input_shape):
assert len(input_shape) == 2
input_dim = input_shape[0][-1]
self.kernel = self.add_weight(shape=(input_dim, self.units),
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
self.bias = self.add_weight(shape=(self.units,),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
self.input_spec[0] = InputSpec(min_ndim=2, axes={-1: input_dim})
self.built = True
def call(self, inputs):
pred, target = inputs
output = K.dot(pred, self.kernel)
output = K.bias_add(output, self.bias, data_format='channels_last')
# TODO : check train or test mode
if self.type == 'nce':
nce_loss = nce_loss_function(
K.transpose(self.kernel), self.bias, target, pred, self.num_sampled, self.units, self.num_true)
self.add_loss(K.mean(nce_loss))
else:
sampled_softmax_loss = sampled_softmax_loss_function(
K.transpose(self.kernel), self.bias, target, pred, self.num_sampled, self.units, self.num_true)
self.add_loss(K.mean(sampled_softmax_loss))
return output
def compute_output_shape(self, input_shape):
assert input_shape and len(input_shape) == 2
assert input_shape[0][-1]
output_shape = list(input_shape[0])
output_shape[-1] = self.units
return tuple(output_shape)
def get_config(self):
config = {
'units': self.units,
'num_sampled': self.num_sampled,
'kernel_initializer': initializers.serialize(self.kernel_initializer),
'bias_initializer': initializers.serialize(self.bias_initializer),
'kernel_regularizer': regularizers.serialize(self.kernel_regularizer),
'bias_regularizer': regularizers.serialize(self.bias_regularizer),
'activity_regularizer': regularizers.serialize(self.activity_regularizer),
'kernel_constraint': constraints.serialize(self.kernel_constraint),
'bias_constraint': constraints.serialize(self.bias_constraint)
}
base_config = super(Sampling, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
def fill_zipf(length, num_classes, num_true=1):
    data_onehot = np.zeros((length, num_classes), dtype='float32')
    data_labels = np.zeros((length, num_true), dtype='int32')
    # all indexes outside of num_classes scattered in existing space
    rand = np.random.zipf(1.3, length * num_true) % num_classes
    for i in range(length):
        for j in range(num_true):
            # one draw per (sample, label) pair; rand[i] would repeat the same label num_true times
            k = rand[i * num_true + j]
            data_onehot[i][k] = 1.0
            data_labels[i][j] = k
    return data_onehot, data_labels
# number of test samples
num_train = 32*500
num_test = 32*500
num_valid = 100
num_epochs = 5
num_hidden = 10
# number of classes
num_classes = 2000
# number of samples for NCE
num_sampled = 24
# number of labels
num_true = 1
# type of negative sampler
sampler_type='sampled_softmax'
inputs = Input(shape=(num_classes,))
target = Input(shape=(num_true,), dtype=tf.int32) # sparse format, e.g. [1, 3, 2, 6, ...]
net = Dense(num_classes)(inputs)
net = Dense(num_hidden, activation='relu')(net)
net = Sampling(units=num_classes, num_sampled=num_sampled, type=sampler_type)([net, target])
model = Model(inputs=[inputs, target], outputs=net)
model.compile(optimizer='adam', loss=None, metrics=['binary_crossentropy'])
model.summary()
train_input, train_output = fill_zipf(num_train, num_classes, num_true)
valid_input, valid_output = fill_zipf(num_valid, num_classes, num_true)
history = model.fit([train_input, train_output], None,
validation_data=([valid_input, valid_output], None),
epochs=num_epochs, verbose=2)
test_input, test_output = fill_zipf(num_test, num_classes, num_true)
predicts = model.predict([test_input, test_output], batch_size=32)
count = 0
for test in range(num_test):
    pred = predicts[test]
    imax = np.argmax(pred)
    if imax == test_output[test]:
        count += 1
print("Found {0} out of {1}".format(count/num_true, num_test))
This test works for the single-label case, both 'nce' and 'sampled_softmax'. But, when I set num_true to greater than one, both NCE and Sampled Softmax throw a tensor mismatch exception.
num_true=3
width=2000
sampler_type='sampled_softmax'
With these parameters, for Sampled Softmax, the code throws this exception trace:
File "postable_sampling_tests.py", line 220, in <module>
epochs=num_epochs, verbose=2)
File "/opt/ds/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/opt/ds/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/opt/ds/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/opt/ds/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/opt/ds/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1399, in __call__
run_metadata_ptr)
File "/opt/ds/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 526, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must be broadcastable: logits_size=[32,2000] labels_size=[96,2000]
[[{{node sampling_1/softmax_cross_entropy_with_logits}} = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _class=["loc:#train...s_grad/mul"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](sampling_1/BiasAdd_1, sampling_1/softmax_cross_entropy_with_logits/Reshape_1)]]
32 is the batch_size. Clearly, something is num_true * batch_size but I don't know how to fix this.
If we change the sampler to NCE:
num_true=3
width=2000
sampler_type='nce'
The final two lines of the exception stack:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [32,2000] vs. [3,2000]
[[{{node sampling_1/logistic_loss/mul}} = Mul[T=DT_FLOAT, _class=["loc:#training/Adam/gradients/sampling_1/logistic_loss/mul_grad/Reshape"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](sampling_1/BiasAdd_1, sampling_1/strided_slice_2)]]
In this case, the labels have not been multiplied by batch_size.
What am I doing wrong? How can I get this wrapper system working for multi-label cases?
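One possible reading of both shape errors is that they come from the one-hot step in the non-sampled branch, not from the samplers themselves. The NumPy sketch below is only an illustration (np.eye indexing stands in for tf.one_hot, and multi_hot and target_dist are made-up names). It reproduces the shapes from the two messages and shows one way to collapse the num_true axis into a single row per example.

```python
import numpy as np

batch_size, num_true, num_classes = 32, 3, 2000
rng = np.random.default_rng(0)
labels = rng.integers(0, num_classes, size=(batch_size, num_true))

# One-hot encoding each label id adds a trailing axis, as tf.one_hot does.
labels_one_hot = np.eye(num_classes, dtype=np.float32)[labels]
print(labels_one_hot.shape)            # (32, 3, 2000)

# Chained slicing is not multi-axis slicing: x[:][0][:] is just x[0].
print(labels_one_hot[:][0][:].shape)   # (3, 2000)

# Collapsing the num_true axis gives one multi-hot row per example,
# which lines up with logits of shape [batch_size, num_classes].
multi_hot = labels_one_hot.max(axis=1)
print(multi_hot.shape)                 # (32, 2000)

# For a softmax cross-entropy target, the row can be normalised to sum to 1.
target_dist = multi_hot / multi_hot.sum(axis=1, keepdims=True)
```

The [3, 2000] slice matches the NCE message, and 32 * 3 = 96 rows of 2000 is consistent with the labels_size=[96,2000] in the sampled softmax message. Note also that the layer in the script is constructed without num_true, so it keeps its default of 1.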
You can also use sampled softmax with multiple labels; you just have to take the mean of the sampled softmax loss over the samples:
embeddings = tf.get_variable( 'embeddings',
initializer= tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
softmax_weights = tf.get_variable( 'softmax_weights',
initializer= tf.truncated_normal([vocabulary_size, embedding_size],
stddev=1.0 / math.sqrt(embedding_size)))
softmax_biases = tf.get_variable('softmax_biases',
initializer= tf.zeros([vocabulary_size]), trainable=False )
embed = tf.nn.embedding_lookup(embeddings, train_dataset) #train data set is
embed_reshaped = tf.reshape( embed, [batch_size*num_inputs, embedding_size] )
segments= np.arange(batch_size).repeat(num_inputs)
averaged_embeds = tf.segment_mean(embed_reshaped, segments, name=None)
loss = tf.reduce_mean(
tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, inputs=averaged_embeds,
labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size))
optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss) #Original learning rate was 1.0
Source: https://github.com/Santosh-Gupta/Research2Vec/blob/master/Research2VecTraining2.ipynb
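The reshape and tf.segment_mean steps in that snippet can be hard to picture, so here is a NumPy sketch of the same averaging. It assumes embed has shape [batch_size, num_inputs, embedding_size], which the snippet does not state, and the sizes are made up. As far as I can tell, the snippet averages several input embeddings per example and leaves num_true at its default of 1.

```python
import numpy as np

batch_size, num_inputs, embedding_size = 4, 3, 8
rng = np.random.default_rng(0)
# Assumed shape of the looked-up embeddings: [batch, num_inputs, embedding].
embed = rng.normal(size=(batch_size, num_inputs, embedding_size))

# Flatten to one row per (example, input) pair, as in the snippet above.
embed_reshaped = embed.reshape(batch_size * num_inputs, embedding_size)
# Segment id i marks the rows that belong to example i: [0, 0, 0, 1, 1, 1, ...].
segments = np.arange(batch_size).repeat(num_inputs)

# Segment mean: average the rows that share a segment id.
averaged = np.stack([embed_reshaped[segments == i].mean(axis=0)
                     for i in range(batch_size)])
```

With equally sized segments this is the same as embed.mean(axis=1); the segment form is only needed when the number of inputs varies per example.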

Tensorflow won't matmul inputs and weights. "Dimensions must be equal"

I've been working on a simple TensorFlow neural network. My input placeholder is
x = tf.placeholder(tf.float32, shape=[None, 52000, 3]).
My weight matrix is initialized to all zeros as
W = tf.Variable(tf.zeros([52000, 10])).
I tried different combinations with and without the 3 for color channels, but I guess I'm just not understanding the dimensionality because I got the error:
Traceback (most recent call last):
File "C:\Users\Everybody\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 686, in _call_cpp_shape_fn_impl
input_tensors_as_shapes, status)
File "C:\Users\Everybody\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [?,52000,3], [52000,10].
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "rating.py", line 65, in <module>
y = tf.matmul(x, W) + b # "fake" outputs to train/test
File "C:\Users\Everybody\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1891, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "C:\Users\Everybody\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 2436, in _mat_mul
name=name)
File "C:\Users\Everybody\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\Everybody\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 2958, in create_op
set_shapes_for_outputs(ret)
File "C:\Users\Everybody\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 2209, in set_shapes_for_outputs
shapes = shape_func(op)
File "C:\Users\Everybody\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 2159, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "C:\Users\Everybody\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 627, in call_cpp_shape_fn
require_shape_fn)
File "C:\Users\Everybody\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 691, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [?,52000,3], [52000,10].
At first, I thought my next_batch() function was the culprit, because I had to write my own: I loaded my images "manually" using scipy.misc.imread(). The definition of next_batch() reads:
q = 0
def next_batch(batch_size):
    global q  # q is module-level state; without this, the assignment below makes q local
    x = images[q:q + batch_size]
    y = one_hots[q:q + batch_size]
    q = (q + batch_size) % len(images)
    return x, y
However, after looking through, I don't see what's wrong with this, so I imagine that I'm just confused about dimensionality. It is supposed to be a "flattened" 200x260 color image. It just occurred to me now that maybe I have to flatten the color channels as well? I will place my full code below if curious. I'm a bit new to Tensorflow, so thanks, all. (Yes, it is not a CNN yet, I decided to start simple just to make sure I'm importing my dataset right. And, I know it is tiny, I'm starting my dataset small too.)
############# IMPORT DEPENDENCIES ####################################
import tensorflow as tf
sess = tf.InteractiveSession() #start session
import scipy.misc
import numpy as np
######################################################################
#SET UP DATA #########################################################
images = []
one_hots = []
########### IMAGES ##################################################
#put all the images in a list
for i in range(60):
    images.append(scipy.misc.imread('./shoes/%s.jpg' % str(i+1)))
    print("One image appended...\n")
#normalize them, "divide" by 255
for image in images:
    print("One image normalized...\n")
    for i in range(260):
        for j in range(200):
            for c in range(3):
                image[i][j][c]/=255
for image in images:
    tf.reshape(image, [52000, 3])
########################################################################
################# ONE-HOT VECTORS ######################################
f = open('rateVectors.txt')
lines = f.readlines()
for i in range(0, 600, 10):
    fillerlist = []
    for j in range(10):
        fillerlist.append(float(lines[i+j][:-1]))
    one_hots.append(fillerlist)
    print("One one-hot vector added...\n")
########################################################################3
#set placeholders and such for input, output, weights, biases
x = tf.placeholder(tf.float32, shape=[None, 52000, 3])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
W = tf.Variable(tf.zeros([52000, 10])) # These are our weights and biases
b = tf.Variable(tf.zeros([10])) # initialized as zeroes.
#########################################################################
sess.run(tf.global_variables_initializer()) #initialize variables in the session
y = tf.matmul(x, W) + b # "fake" outputs to train/test
##################### DEFINING OUR MODEL ####################################
#our loss function
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
#defining our training as gradient descent
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
###################### TRAINING #############################################
#################### OUR CUSTOM BATCH FUNCTION ##############################
q = 0
def next_batch(batch_size):
    x = images[q:q + batch_size]
    y = one_hots[q:q + batch_size]
    q = (q + batch_size) % len(images)
    return x, y
#train
for i in range(6):
    batch = next_batch(10)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})
    print("Batch Number: " + i + "\n")
print("Done training...\n")
################ RESULTS #################################################
#calculating accuracy
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
#print accuracy
print(accuracy.eval(feed_dict={x: images, y_: one_hots}))
Your placeholder should have the shape [None, 200, 260, 3], where None is the batch size, 200 and 260 are the image dimensions, and 3 is the number of channels.
Your weights should have the shape [filter_height, filter_width, num_channels, num_filters].
Your bias should have the shape [num_filters].
The labels should have the shape [None, num_classes], where None is the batch size and num_classes is the number of classes your images can belong to.
These shapes, which are the ones a convolutional layer expects, are just there to make sure the math works.
I took these codes from here
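Since the model in the question is a single fully connected layer rather than a convolution, another way to satisfy MatMul's rank-2 requirement is to flatten all three image axes into one, so that each image becomes a vector of 260*200*3 = 156000 values and W has 156000 rows. Below is a minimal NumPy sketch of that idea with tiny stand-in dimensions; the constant names and the random data are illustrative and not from the question. It also declares the batch cursor q as global inside next_batch, because assigning to q in the function without that declaration would raise UnboundLocalError the first time the function is called.

```python
import numpy as np

# Tiny stand-in sizes so the sketch runs instantly; the question uses 260 x 200 x 3.
HEIGHT, WIDTH, CHANNELS, NUM_CLASSES = 4, 5, 3, 10
NUM_IMAGES = 6

rng = np.random.default_rng(0)
images = rng.random((NUM_IMAGES, HEIGHT, WIDTH, CHANNELS)).astype(np.float32)
one_hots = np.eye(NUM_CLASSES, dtype=np.float32)[rng.integers(0, NUM_CLASSES, NUM_IMAGES)]

# Flatten every axis except the batch axis: (N, H, W, C) -> (N, H*W*C).
flat_images = images.reshape(NUM_IMAGES, -1)

# The weight matrix needs one row per flattened feature.
W = np.zeros((HEIGHT * WIDTH * CHANNELS, NUM_CLASSES), dtype=np.float32)
b = np.zeros(NUM_CLASSES, dtype=np.float32)

q = 0  # position of the next batch


def next_batch(batch_size):
    """Return the next slice of images and labels, wrapping around at the end."""
    global q  # without this, the assignment to q below raises UnboundLocalError
    x = flat_images[q:q + batch_size]
    y = one_hots[q:q + batch_size]
    q = (q + batch_size) % len(flat_images)
    return x, y


batch_x, batch_y = next_batch(2)
logits = batch_x @ W + b  # (2, 60) @ (60, 10) -> (2, 10)
print(logits.shape, batch_y.shape)
```

In the question's TensorFlow code this would correspond to a placeholder of shape [None, 156000] and W = tf.Variable(tf.zeros([156000, 10])).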

Word2Vec Tutorial: Tensorflow TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int32 of argument 'x'

Version of Tensorflow: 1.2.1
Version of Python: 3.5
Operating System: Windows 10
Another poster has asked about this same problem on StackOverflow here, and he appears to be using code from the same Udacity Word2Vec tutorial. So, maybe I'm dense, but the code of this example is so busy and complex that I can't tell what fixed his problem.
The error occurs when I call tf.reduce_mean:
loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases, embed,
                               train_labels, num_sampled, vocabulary_size))
Right before the call to tf.reduce_mean the key variables have the following data types.
train_dataset.dtype
>> tf.int32
train_labels.dtype
>> tf.int32
valid_dataset.dtype
>> tf.int32
embeddings.dtype
>> tf.float32_ref
softmax_weights.dtype
>> tf.float32_ref
softmax_biases.dtype
>> tf.float32_ref
embed.dtype
>> tf.float32
I tried every permutation of data type in the definitions of the variables train_dataset, train_labels and valid_dataset: making them all int64, all float32, all float64, and combinations of integer and floating point. Nothing worked. I didn't try altering the data types of softmax_weights and softmax_biases, because I'm afraid that might foul up the optimization algorithm. Don't these need to be floats to support the calculus that is done during backpropagation? (Tensorflow is often a very opaque black box with documentation that verges on completely useless, so I can suspect things but never know for sure.)
Program Flow at Time of Error:
After the call to reduce_mean program control transfers to sampled_softmax_loss() in file nn_impl.py which in turn calls _compute_sampled_logits():
logits, labels = _compute_sampled_logits(
    weights=weights,
    biases=biases,
    labels=labels,
    inputs=inputs,
    num_sampled=num_sampled,
    num_classes=num_classes,
    num_true=num_true,
    sampled_values=sampled_values,
    subtract_log_q=True,
    remove_accidental_hits=remove_accidental_hits,
    partition_strategy=partition_strategy,
    name=name)
At this point I check the data types of the passed-in parameters and get the following:
weights.dtype
>> tf.float32_ref
biases.dtype
>> tf.float32_ref
labels.dtype
>> tf.float32
inputs.dtype
>> tf.int32
On the very next step an exception occurs, and I am thrown into the StreamWrapper class in file ansitowin32.py. Running to the end, I get the following Traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
C:\Anaconda3\envs\aind-dog\lib\site-packages\tensorflow\python\framework\op_def_library.py in apply_op(self, op_type_name, name, **keywords)
489 as_ref=input_arg.is_ref,
--> 490 preferred_dtype=default_dtype)
491 except TypeError as err:
C:\Anaconda3\envs\aind-dog\lib\site-packages\tensorflow\python\framework\ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
740 if ret is None:
--> 741 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
742
C:\Anaconda3\envs\aind-dog\lib\site-packages\tensorflow\python\framework\ops.py in _TensorTensorConversionFunction(t, dtype, name, as_ref)
613 "Tensor conversion requested dtype %s for Tensor with dtype %s: %r"
--> 614 % (dtype.name, t.dtype.name, str(t)))
615 return t
ValueError: Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("sampled_softmax_loss/Reshape_1:0", shape=(?, 1, ?), dtype=float32, device=/device:CPU:0)'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-7-66d378b94a16> in <module>()
34 loss = tf.reduce_mean(
35 tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases, embed,
---> 36 train_labels, num_sampled, vocabulary_size))
37
38 # Optimizer.
C:\Anaconda3\envs\aind-dog\lib\site-packages\tensorflow\python\ops\nn_impl.py in sampled_softmax_loss(weights, biases, labels, inputs, num_sampled, num_classes, num_true, sampled_values, remove_accidental_hits, partition_strategy, name)
1266 remove_accidental_hits=remove_accidental_hits,
1267 partition_strategy=partition_strategy,
-> 1268 name=name)
1269 sampled_losses = nn_ops.softmax_cross_entropy_with_logits(labels=labels,
1270 logits=logits)
C:\Anaconda3\envs\aind-dog\lib\site-packages\tensorflow\python\ops\nn_impl.py in _compute_sampled_logits(weights, biases, labels, inputs, num_sampled, num_classes, num_true, sampled_values, subtract_log_q, remove_accidental_hits, partition_strategy, name)
1005 row_wise_dots = math_ops.multiply(
1006 array_ops.expand_dims(inputs, 1),
-> 1007 array_ops.reshape(true_w, new_true_w_shape))
1008 # We want the row-wise dot plus biases which yields a
1009 # [batch_size, num_true] tensor of true_logits.
C:\Anaconda3\envs\aind-dog\lib\site-packages\tensorflow\python\ops\math_ops.py in multiply(x, y, name)
284
285 def multiply(x, y, name=None):
--> 286 return gen_math_ops._mul(x, y, name)
287
288
C:\Anaconda3\envs\aind-dog\lib\site-packages\tensorflow\python\ops\gen_math_ops.py in _mul(x, y, name)
1375 A `Tensor`. Has the same type as `x`.
1376 """
-> 1377 result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
1378 return result
1379
C:\Anaconda3\envs\aind-dog\lib\site-packages\tensorflow\python\framework\op_def_library.py in apply_op(self, op_type_name, name, **keywords)
524 "%s type %s of argument '%s'." %
525 (prefix, dtypes.as_dtype(attrs[input_arg.type_attr]).name,
--> 526 inferred_from[input_arg.type_attr]))
527
528 types = [values.dtype]
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int32 of argument 'x'.
Here's the complete program:
# These are all the modules we'll be using later.
# Make sure you can import them before proceeding further.
# %matplotlib inline
from __future__ import print_function
import collections
import math
import numpy as np
import os
import random
import tensorflow as tf
import zipfile
from matplotlib import pylab
from six.moves import range
from six.moves.urllib.request import urlretrieve
from sklearn.manifold import TSNE
print("Working directory = %s\n" % os.getcwd())
def read_data(filename):
    """Extract the first file enclosed in a zip file as a list of words"""
    with zipfile.ZipFile(filename) as f:
        data = tf.compat.as_str(f.read(f.namelist()[0])).split()
    return data
filename = 'text8.zip'
words = read_data(filename)
print('Data size %d' % len(words))
vocabulary_size = 50000
def build_dataset(words):
    count = [['UNK', -1]]
    count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
    dictionary = dict()
    # Loop through the keys of the count collection dictionary
    # (apparently, zeroing out counts)
    for word, _ in count:
        dictionary[word] = len(dictionary)
    data = list()
    unk_count = 0 # count of unknown words
    for word in words:
        if word in dictionary:
            index = dictionary[word]
        else:
            index = 0 # dictionary['UNK']
            unk_count = unk_count + 1
        data.append(index)
    count[0][1] = unk_count
    reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
    return data, count, dictionary, reverse_dictionary
data, count, dictionary, reverse_dictionary = build_dataset(words)
print('Most common words (+UNK)', count[:5])
print('Sample data', data[:10])
del words # Hint to reduce memory.
data_index = 0
def generate_batch(batch_size, num_skips, skip_window):
    global data_index
    assert batch_size % num_skips == 0
    assert num_skips <= 2 * skip_window
    batch = np.ndarray(shape=(batch_size), dtype=np.int32)
    labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
    span = 2 * skip_window + 1 # [ skip_window target skip_window ]
    buffer = collections.deque(maxlen=span)
    for _ in range(span):
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    for i in range(batch_size // num_skips):
        target = skip_window # target label at the center of the buffer
        targets_to_avoid = [ skip_window ]
        for j in range(num_skips):
            while target in targets_to_avoid:
                target = random.randint(0, span - 1)
            targets_to_avoid.append(target)
            batch[i * num_skips + j] = buffer[skip_window]
            labels[i * num_skips + j, 0] = buffer[target]
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    return batch, labels
print('data:', [reverse_dictionary[di] for di in data[:8]])
for num_skips, skip_window in [(2, 1), (4, 2)]:
    data_index = 0
    batch, labels = generate_batch(batch_size=8, num_skips=num_skips, skip_window=skip_window)
    print('\nwith num_skips = %d and skip_window = %d:' % (num_skips, skip_window))
    print(' batch:', [reverse_dictionary[bi] for bi in batch])
    print(' labels:', [reverse_dictionary[li] for li in labels.reshape(8)])
batch_size = 128
embedding_size = 128 # Dimension of the embedding vector.
skip_window = 1 # How many words to consider left and right.
num_skips = 2 # How many times to reuse an input to generate a label.
# We pick a random validation set to sample nearest neighbors. here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent.
valid_size = 16 # Random set of words to evaluate similarity on.
valid_window = 100 # Only pick dev samples in the head of the distribution.
valid_examples = np.array(random.sample(range(valid_window), valid_size))
num_sampled = 64 # Number of negative examples to sample.
graph = tf.Graph()
with graph.as_default(), tf.device('/cpu:0'):
    # Input data.
    train_dataset = tf.placeholder(tf.int32, shape=[batch_size])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
    # Variables.
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    softmax_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))
    # Model.
    # Look up embeddings for inputs.
    embed = tf.nn.embedding_lookup(embeddings, train_dataset)
    # Compute the softmax loss, using a sample of the negative labels each time.
    loss = tf.reduce_mean(
        tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases, embed,
                                   train_labels, num_sampled, vocabulary_size))
    # Optimizer.
    # Note: The optimizer will optimize the softmax_weights AND the embeddings.
    # This is because the embeddings are defined as a variable quantity and the
    # optimizer's `minimize` method will by default modify all variable quantities
    # that contribute to the tensor it is passed.
    # See docs on `tf.train.Optimizer.minimize()` for more details.
    optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)
    # Compute the similarity between minibatch examples and all embeddings.
    # We use the cosine distance:
    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(
        normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))
I had the same issue, and it looks like two of the parameters passed to the loss function are swapped around.
If you look at the TensorFlow documentation for 'sampled_softmax_loss' (https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss):
sampled_softmax_loss(
    weights,
    biases,
    labels,
    inputs,
    num_sampled,
    num_classes,
    num_true=1,
    sampled_values=None,
    remove_accidental_hits=True,
    partition_strategy='mod',
    name='sampled_softmax_loss'
)
The third expected parameter is 'labels' and the fourth 'inputs'. In the supplied code, these two parameters seem to have been switched around. I'm a bit puzzled how this is possible. Maybe this used to be different in an older version of TF. Anyway, swapping those two parameters around will solve the problem.
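One way to make a call like this robust against changes in positional order is to pass the arguments by keyword. The sketch below uses a hypothetical stand-in function (sampled_softmax_loss_stub is not part of TensorFlow; it only mirrors the parameter order of the signature above) to show where each value lands. The strings stand in for the tensors from the question.

```python
def sampled_softmax_loss_stub(weights, biases, labels, inputs,
                              num_sampled, num_classes):
    """Hypothetical stand-in: reports which value was bound to which slot."""
    return {"labels": labels, "inputs": inputs}


# Positional call in the order used by the question's code:
# the embeddings land in `labels` and the labels land in `inputs`.
positional = sampled_softmax_loss_stub(
    "softmax_weights", "softmax_biases", "embed", "train_labels", 64, 50000)

# Keyword call: the order of the arguments no longer matters.
by_keyword = sampled_softmax_loss_stub(
    weights="softmax_weights", biases="softmax_biases",
    inputs="embed", labels="train_labels",
    num_sampled=64, num_classes=50000)

print(positional)  # {'labels': 'embed', 'inputs': 'train_labels'}
print(by_keyword)  # {'labels': 'train_labels', 'inputs': 'embed'}
```

With the real function, the equivalent keyword call would be tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, labels=train_labels, inputs=embed, num_sampled=num_sampled, num_classes=vocabulary_size).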

Tensorflow tf.reshape() seems to behave differently to numpy.reshape()

I'm trying to train a LSTM network and it trains successfully in one way, but throws an error in the other way. In the first example I reshape the input array X using numpy reshape and in the other way I reshape it using tensorflow reshape.
Works fine:
import numpy as np
import tensorflow as tf
import tensorflow.contrib.learn as learn
# Parameters
learning_rate = 0.1
training_steps = 3000
batch_size = 128
# Network Parameters
n_input = 4
n_steps = 10
n_hidden = 128
n_classes = 6
X = np.ones([1770,4])
y = np.ones([177])
# NUMPY RESHAPE OUTSIDE RNN_MODEL
X = np.reshape(X, (-1, n_steps, n_input))
def rnn_model(X, y):
    # TENSORFLOW RESHAPE INSIDE RNN_MODEL
    #X = tf.reshape(X, [-1, n_steps, n_input]) # (batch_size, n_steps, n_input)
    # # permute n_steps and batch_size
    X = tf.transpose(X, [1, 0, 2])
    # # Reshape to prepare input to hidden activation
    X = tf.reshape(X, [-1, n_input]) # (n_steps*batch_size, n_input)
    # # Split data because rnn cell needs a list of inputs for the RNN inner loop
    X = tf.split(0, n_steps, X) # n_steps * (batch_size, n_input)
    # Define an LSTM cell with tensorflow
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    # Get lstm cell output
    _, encoding = tf.nn.rnn(lstm_cell, X, dtype=tf.float32)
    return learn.models.logistic_regression(encoding, y)
classifier = learn.TensorFlowEstimator(model_fn=rnn_model, n_classes=n_classes,
                                       batch_size=batch_size,
                                       steps=training_steps,
                                       learning_rate=learning_rate)
classifier.fit(X,y)
Does not work:
import numpy as np
import tensorflow as tf
import tensorflow.contrib.learn as learn
# Parameters
learning_rate = 0.1
training_steps = 3000
batch_size = 128
# Network Parameters
n_input = 4
n_steps = 10
n_hidden = 128
n_classes = 6
X = np.ones([1770,4])
y = np.ones([177])
# NUMPY RESHAPE OUTSIDE RNN_MODEL
#X = np.reshape(X, (-1, n_steps, n_input))
def rnn_model(X, y):
    # TENSORFLOW RESHAPE INSIDE RNN_MODEL
    X = tf.reshape(X, [-1, n_steps, n_input]) # (batch_size, n_steps, n_input)
    # # permute n_steps and batch_size
    X = tf.transpose(X, [1, 0, 2])
    # # Reshape to prepare input to hidden activation
    X = tf.reshape(X, [-1, n_input]) # (n_steps*batch_size, n_input)
    # # Split data because rnn cell needs a list of inputs for the RNN inner loop
    X = tf.split(0, n_steps, X) # n_steps * (batch_size, n_input)
    # Define an LSTM cell with tensorflow
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    # Get lstm cell output
    _, encoding = tf.nn.rnn(lstm_cell, X, dtype=tf.float32)
    return learn.models.logistic_regression(encoding, y)
classifier = learn.TensorFlowEstimator(model_fn=rnn_model, n_classes=n_classes,
                                       batch_size=batch_size,
                                       steps=training_steps,
                                       learning_rate=learning_rate)
classifier.fit(X,y)
The latter throws the following error:
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell.BasicLSTMCell object at 0x7f1c67c6f750>: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.
Traceback (most recent call last):
File "/home/blabla/test.py", line 47, in <module>
classifier.fit(X,y)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/base.py", line 160, in fit
monitors=monitors)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 484, in _train_model
monitors=monitors)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/graph_actions.py", line 328, in train
reraise(*excinfo)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/graph_actions.py", line 254, in train
feed_dict = feed_fn() if feed_fn is not None else None
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/io/data_feeder.py", line 366, in _feed_dict_fn
out.itemset((i, self.y[sample]), 1.0)
IndexError: index 974 is out of bounds for axis 0 with size 177
A couple of suggestions:
* Pass an input_fn to fit instead of X and y.
* Use learn.Estimator instead of learn.TensorFlowEstimator.
Since your data is small, the following should work. Otherwise you need to batch your data.
```
def _my_inputs():
    return tf.constant(np.ones([1770,4])), tf.constant(np.ones([177]))
```
I was able to get this working with a couple small changes:
# Parameters
learning_rate = 0.1
training_steps = 10
batch_size = 8
# Network Parameters
n_input = 4
n_steps = 10
n_hidden = 128
n_classes = 6
X = np.ones([177, 10, 4]) # <---- Use shape [batch_size, n_steps, n_input] here.
y = np.ones([177])
def rnn_model(X, y):
    X = tf.transpose(X, [1, 0, 2]) #|
    X = tf.unpack(X) #| These two lines do the same thing as your code, just a bit simpler ;)
    # Define a LSTM cell with tensorflow
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    # Get lstm cell output
    outputs, _ = tf.nn.rnn(lstm_cell, X, dtype=tf.float64) # <---- I think you want to use the first return value here.
    return tf.contrib.learn.models.logistic_regression(outputs[-1], y) # <----uses just the last output for classification, as is typical with RNNs.
classifier = tf.contrib.learn.TensorFlowEstimator(model_fn=rnn_model,
                                                  n_classes=n_classes,
                                                  batch_size=batch_size,
                                                  steps=training_steps,
                                                  learning_rate=learning_rate)
classifier.fit(X,y)
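As a check on the comment in the code above that transposing and unpacking does the same thing as the transpose, reshape and split sequence from the question, here is a NumPy sketch of both routes. It assumes that NumPy's transpose, reshape and split follow the same row-major semantics as the TensorFlow ops, and it uses list(X_t) as a stand-in for tf.unpack. The small batch size is arbitrary.

```python
import numpy as np

batch, n_steps, n_input = 3, 10, 4
X = np.arange(batch * n_steps * n_input, dtype=np.float64).reshape(batch, n_steps, n_input)

# Shared first step: move the time axis to the front -> (n_steps, batch, n_input).
X_t = np.transpose(X, (1, 0, 2))

# Route 1 (question): flatten to (n_steps*batch, n_input), then split into n_steps pieces.
route_split = np.split(X_t.reshape(-1, n_input), n_steps, axis=0)

# Route 2 (answer): unpack along the first axis.
route_unpack = list(X_t)

same = all(np.array_equal(a, b) for a, b in zip(route_split, route_unpack))
print(len(route_split), route_split[0].shape, same)  # 10 (3, 4) True
```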
I think the central problem you were having was that X has to be shape [batch,...] when passed to fit(...). When you used numpy to reshape it outside the rnn_model() function, X had this shape so training worked.
I can't speak for the quality of the model this solution will produce, but at least it runs!
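The shape requirement described above is also consistent with the IndexError in the question: when X is left as [1770, 4], its first axis has 1770 entries while y has only 177, so a sample index drawn from X's range (974 in the traceback) can fall outside y. Here is a minimal NumPy sketch of the mismatch, using the shapes from the question; the names X_flat and X_seq are only illustrative.

```python
import numpy as np

n_steps, n_input = 10, 4

X_flat = np.ones([1770, 4])  # what the failing script passes to fit()
y = np.ones([177])

# Reshaping outside the model groups every 10 rows into one sequence.
X_seq = np.reshape(X_flat, (-1, n_steps, n_input))

print(X_flat.shape[0], y.shape[0])  # 1770 177 -> ten times more rows than labels
print(X_seq.shape[0], y.shape[0])   # 177 177  -> one label per sequence

# Index 974 is a valid row of X_flat but not a valid position in y.
sample = 974
print(sample < X_flat.shape[0], sample < y.shape[0])  # True False
```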