TensorFlow dataset with vectors of differing shapes

I am trying to create a dataset from vectors which can have differing lengths (the data column). I am currently using the following code:
import tensorflow as tf

data = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10]]
shapes = [[3, 2], [2, 2]]
classes = [0, 1]

dataset = tf.data.Dataset.from_tensor_slices(
    {"data": tf.constant(data),
     "shape": tf.constant(shapes),
     "class": tf.constant(classes)})
iterator = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    x = sess.run(iterator)
    print(x)
However, I get this error:
Traceback (most recent call last):
  File "test2.py", line 7, in <module>
    {"data": tf.constant(data),
  File "/Users/[username]/Documents/University/Project/Application/env/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 214, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/Users/[username]/Documents/University/Project/Application/env/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 442, in make_tensor_proto
    _GetDenseDimensions(values)))
ValueError: Argument must be a dense tensor: [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10]] - got shape [2], but wanted [2, 6].
What is the correct way to set up a dataset that can accept vectors of different lengths? This question addresses the issue when reading from a file; here, however, I am defining the data explicitly.

You either pad the tensors yourself or you use sparse tensors.
I usually use sparse tensors: when you convert a sparse tensor to dense, you can specify what the size should be and have the padding done for you.
The usual cases for such tensors are input strings, bags of words, or sequences. The embedding operations handle strings and bags of words; sequences are usually handled with RNN-related operations (check out tf.nn.static_rnn, for example).
In general you want the tensors to eventually have the same length within a batch, because the matrix operations need matrix operands.
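For instance, here is a minimal TF 1.x sketch of the sparse-tensor approach for the data above; the indices/values construction is just one way to build the SparseTensor, and tf.sparse_tensor_to_dense does the padding:
import tensorflow as tf

# Store the ragged rows in a single SparseTensor.
data = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10]]
indices = [[row, col] for row, vec in enumerate(data) for col in range(len(vec))]
values = [v for vec in data for v in vec]
sparse = tf.SparseTensor(indices=indices, values=values, dense_shape=[2, 6])

# Densify to an explicit size; short rows are padded with the default value.
dense = tf.sparse_tensor_to_dense(sparse, default_value=0)

with tf.Session() as sess:
    print(sess.run(dense))
    # [[ 1  2  3  4  5  6]
    #  [ 7  8  9 10  0  0]]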

Related

Why does the Keras model.fit() method not accept tensors as the feature or label argument, while it accepts NumPy arrays?

The last time I was training a DNN model, I noticed that when I try to train it with a tensor (dtype = float64) it always gives an error, but when I train it with a NumPy array with the same specs (shape, values, dtype) it shows no error. Why is that?
Code
To use tensors for the features and labels, replace the numpy.arrays in the second script with:
celsius_q = tf.Variable([-40, -10, 0, 8, 15, 22, 38], tf.float64)
fahrenheit_a = tf.Variable([-40, 14, 32, 46, 59, 72, 100], tf.float64)
When using tensors for the features and labels, it shows this error:
Error: ValueError: Failed to find data adapter that can handle input:
<class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable'>,
<class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable'>
Use tf.constant to create input tensors in TensorFlow.
A tf.Variable can be changed later, so this type of tensor is not suitable as model input. Please refer to this answer: https://stackoverflow.com/a/44746203/20388268
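For example, here is a minimal sketch; the single-Dense-layer model is an assumption standing in for the original script's model:
import tensorflow as tf

# Hypothetical stand-in for the question's model.
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='adam', loss='mean_squared_error')

# Constants instead of Variables: the Keras data adapter accepts these.
celsius_q = tf.constant([-40, -10, 0, 8, 15, 22, 38], dtype=tf.float64)
fahrenheit_a = tf.constant([-40, 14, 32, 46, 59, 72, 100], dtype=tf.float64)

model.fit(celsius_q, fahrenheit_a, epochs=10)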

1-dimensional convolution error using tensorflow

I am studying 1d convolution using tensorflow.
Code:
import numpy as np
import tensorflow as tf

##### raw data: input length is 24, and feature_len is 6
batch = np.ceil((np.random.rand(24, 6)*10))-5
##### filter for convolution: filter width is 3, filter input dim is 6, output dim is 18
eye_filter = tf.constant(np.eye(3*6).reshape(3, 6, 18))
##### the error happens here
conv = tf.nn.conv1d(input=batch, filters=eye_filter, stride=1, padding='SAME')
Error Message:
InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs)
   1606   try:
-> 1607     c_op = c_api.TF_FinishOperation(op_desc)
   1608   except errors.InvalidArgumentError as e:

InvalidArgumentError: Shape must be rank 4 but is rank 3 for 'conv1d_1' (op: 'Conv2D') with input shapes: [24,1,6], [1,3,6,18].

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
10 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs)
   1608   except errors.InvalidArgumentError as e:
   1609     # Convert to ValueError for backwards compatibility.
-> 1610     raise ValueError(str(e))
   1611
   1612   return c_op

ValueError: Shape must be rank 4 but is rank 3 for 'conv1d_1' (op: 'Conv2D') with input shapes: [24,1,6], [1,3,6,18].
Why is the filter rank 4, when I reshaped it to rank 3?
Why is the op name Conv2D, when I called conv1d?
How can I see the convolution result of the above two tensors (raw data and filter)?
It's expecting your input tensor to be "rank 4", meaning it has 4 dimensions, but you've technically given it a 2-D array.
Technically, conv1d uses Conv2D under the hood, as you noticed, according to this API documentation:
conv1d api doc
Your input data has a length of 24 and 6 channels for the features.
The TF convolution functions can operate on an array of inputs.
This means your data also has to have an index for which element out of the batch of inputs you want to select. I'm guessing from your example that you want to pass it just one input. To fix this, you need to reshape your tensor to have this extra dimension, with length 1.
Really, conv1d only needs your input to be rank 3, but it transparently inserts a new dimension of length 1 so the data is 2-D (imagine a monitor with resolution 1920x1: technically 2-D, but only 1 pixel high), and then passes that to conv2d.
Instead of keeping the data as a NumPy array, convert it with tf.convert_to_tensor and then reshape it to [Nth item (length 1)][width (length 24)][channel (length 6)].
Here's how I would rewrite your code:
import numpy as np
import tensorflow as tf

##### raw data: input length is 24, and feature_len is 6
batch = np.ceil((np.random.rand(24, 6)*10))-5
# Convert to a tensor and add the batch dimension: [1, 24, 6].
# The dtype must match the filter's; np.eye produces float64.
batch = tf.convert_to_tensor(batch, dtype=tf.float64)
batch = tf.reshape(batch, shape=[1, 24, 6])
##### filter for convolution: filter width is 3, filter input dim is 6, output dim is 18
eye_filter = tf.constant(np.eye(3*6).reshape(3, 6, 18))
##### this line no longer errors
# I added the optional data_format parameter
conv = tf.nn.conv1d(input=batch, data_format='NWC', filters=eye_filter, stride=1, padding='SAME')
# With stride 1 and 'SAME' padding, conv has shape [1, 24, 18].
I chose that specific shape ordering because the conv1d API doc says the data_format parameter defaults to "NWC", i.e. Nth_item, Width, Channels. (conv2d's analogous default is "NHWC".) I would make sure you understand how that works, so that in the future you don't get weird results from an array that's shaped in a way you didn't expect.
If you want to see the tensor output, you either need to build a graph and run it in a session, or you can turn on eager execution.
sess = tf.Session()
print(sess.run(conv))
sess.close()
eager execution
Generally, you would use a session for speed with large computations, and use eager execution for debugging, learning, or verifying data is getting imported correctly.
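For instance, a minimal eager-mode sketch of the same computation (TF 1.x; the enable call must come before any other TF ops in the program):
import numpy as np
import tensorflow as tf

tf.enable_eager_execution()  # must be called before any other TF operations

batch = np.ceil((np.random.rand(24, 6)*10))-5
batch = tf.reshape(tf.convert_to_tensor(batch, dtype=tf.float64), [1, 24, 6])
eye_filter = tf.constant(np.eye(3*6).reshape(3, 6, 18))
conv = tf.nn.conv1d(input=batch, filters=eye_filter, stride=1, padding='SAME')

print(conv)  # eager tensors print their values directly, no Session needed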

TensorFlow in_top_k evaluation input arguments

I am following the tutorial in this link and trying to change the evaluation method for the model (at the bottom). I would like to get a top-5 evaluation and I'm trying to use to following code:
topFiver = tf.nn.in_top_k(y, y_, 5, name=None)
However, this yields the following error:
File "AlexNet.py", line 111, in <module>
topFiver = tf.nn.in_top_k(pred, y, 5, name=None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 346, in in_top_k
targets=targets, k=k, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 486, in apply_op
_Attr(op_def, input_arg.type_attr))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 59, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: DataType float32 for attr 'T' not in list of allowed values: int32, int64
As far as I can tell, the problem is that tf.nn.in_top_k() only works for tf.int32 or tf.int64 data, but my data is in tf.float32 format. Is there any workaround for this?
The targets argument to tf.nn.in_top_k(predictions, targets, k) must be a vector of class IDs (i.e. indices of columns in the predictions matrix). This means that it only works for single-class classification problems.
If your problem is a single-class problem, then I assume that your y_ tensor is a one-hot encoding of the true labels for your examples (for example, because you also pass them to an op like tf.nn.softmax_cross_entropy_with_logits()). In that case, you have two options:
If the labels were originally stored as integer labels, pass them directly to tf.nn.in_top_k() without converting them to one-hot. (Also, consider using tf.nn.sparse_softmax_cross_entropy_with_logits() as your loss function, because it may be more efficient.)
If the labels were originally stored in the one-hot format, you can convert them to integers using tf.argmax():
labels = tf.argmax(y_, 1)
topFiver = tf.nn.in_top_k(y, labels, 5)
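Putting it together, a minimal sketch; the logits and label values here are hypothetical:
import tensorflow as tf

# Hypothetical logits for a batch of 8 examples over 1000 classes,
# and hypothetical integer class IDs for the true labels.
y = tf.random_normal([8, 1000])
labels = tf.constant([3, 1, 4, 1, 5, 9, 2, 6])

topFiver = tf.nn.in_top_k(y, labels, 5)  # boolean vector of shape [8]

with tf.Session() as sess:
    print(sess.run(topFiver))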

How to use tf.train.batch with enqueue_many=true

I'm looking for an example of using tf.train.batch with enqueue_many=True.
In my case, I have an image tensor of shape [299,299,3] and when I call a function get_distortions(image) it will return a new tensor of shape [10,299,299,3] (in this example, it will apply 10 distortions to the image and return them all as a new tensor). I'd then like to enqueue all these by calling tf.train.batch.
I tried this:
example_batch = tf.train.batch(tf.unpack(distortions), 5, enqueue_many=True)
But when I sess.run(example_batch) I get back a list of length 10 (I was expecting a batch of size 5).
Also, how would I include the label to tf.train.batch in this case? The label is the same for all 10 distortions.
Don't unpack distortions. The semantics of enqueue_many are that you feed it a tensor whose first dimension is the batching dimension, so a [10, 299, 299, 3] tensor with enqueue_many will result in ten separate items, each of shape [299, 299, 3], being enqueued -- which is what you want.
Documentation for tf.train.batch tells you:
If enqueue_many is True, tensors is assumed to represent a batch of
examples, where the first dimension is indexed by example, and all
members of tensors should have the same size in the first dimension.
If an input tensor has shape [*, x, y, z], the output will have shape
[batch_size, x, y, z]. The capacity argument controls how long the
prefetching is allowed to grow the queues.
Which is exactly what happens in your case: [10, 299, 299, 3], where 10 is the batch size. So you do not need to do any unpacking and tf.train.batch(distortions, 5, enqueue_many=True) will do the job.
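As for the label question, a minimal sketch: assuming the [10, 299, 299, 3] distortions tensor and a scalar label (the zero image and the label value 7 below are just stand-ins), replicate the label along the batching dimension before enqueueing:
import tensorflow as tf

# Stand-ins for the question's tensors.
distortions = tf.zeros([10, 299, 299, 3])  # hypothetical get_distortions(image) output
label = tf.constant(7)                     # hypothetical label, shared by all 10 copies

# Replicate the scalar label so both tensors have the same first dimension.
labels = tf.fill([tf.shape(distortions)[0]], label)

# enqueue_many=True slices both tensors along dimension 0;
# batches of 5 (image, label) pairs come out the other end.
image_batch, label_batch = tf.train.batch(
    [distortions, labels], batch_size=5, enqueue_many=True)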

tensorflow MNIST fully_connected_feed.py fails: range() takes at least 2 arguments (1 given)

I'm having trouble running the example in one of the TensorFlow tutorials. The tutorial says that to run it I just need to type python fully_connected_feed.py. When I do this, it gets through fetching the input data, but then fails, like so:
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Traceback (most recent call last):
  File "fully_connected_feed.py", line 225, in <module>
    tf.app.run()
  File "/Users/me/anaconda/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 11, in run
    sys.exit(main(sys.argv))
  File "fully_connected_feed.py", line 221, in main
    run_training()
  File "fully_connected_feed.py", line 141, in run_training
    loss = mnist.loss(logits, labels_placeholder)
  File "/Users/me/tftmp/mnist.py", line 96, in loss
    indices = tf.expand_dims(tf.range(batch_size), 1)
TypeError: range() takes at least 2 arguments (1 given)
I think this error is caused by some problem with session setup and/or tensor evaluation. This is the function in mnist.py causing the problem:
def loss(logits, labels):
  """Calculates the loss from the logits and the labels.

  Args:
    logits: Logits tensor, float - [batch_size, NUM_CLASSES].
    labels: Labels tensor, int32 - [batch_size].

  Returns:
    loss: Loss tensor of type float.
  """
  # Convert from sparse integer labels in the range [0, NUM_CLASSES)
  # to 1-hot dense float vectors (that is, we will have batch_size vectors,
  # each with NUM_CLASSES values, all of which are 0.0 except there will
  # be a 1.0 in the entry corresponding to the label).
  batch_size = tf.size(labels)
  labels = tf.expand_dims(labels, 1)
  indices = tf.expand_dims(tf.range(batch_size), 1)
  concated = tf.concat(1, [indices, labels])
  onehot_labels = tf.sparse_to_dense(
      concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)
  cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, onehot_labels,
                                                          name='xentropy')
  loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
  return loss
If I put all the code in the loss function inside a with tf.Session(): block, it gets past this error. However, I get other errors later about uninitialised variables, so I'm guessing something major is going wrong with session setup or initialisation. Being new to TensorFlow, I'm a little at a loss. Any ideas?
[NB: I haven't edited the code at all, just downloaded it from the tensorflow tutorials and tried to run it as instructed, with python fully_connected_feed.py]
This issue arises because in the latest version of the TensorFlow source on GitHub, tf.range() has been updated to be more permissive with its arguments (previously it required two arguments; now it has the same semantics as Python's range() built-in function), and the fully_connected_feed.py example has been updated to exploit this.
However, if you try to run this version against the binary distribution of TensorFlow, you will get this error because the change to tf.range() has not been incorporated into the binary package.
The easiest solution is to download the old version of mnist.py. Alternatively, you could build from source to use the latest version of the tutorial.
You can also get the right result by fixing the mnist code like this:
indices = tf.expand_dims(tf.range(0, batch_size), 1)
TypeError: range() takes at least 2 arguments (1 given)
That's the error.
Looking at the tensorflow docs for range, we can see that range has a function signature of start, limit, delta=1, name='range'. This means that at least two arguments are required for invocation; your example provides only one.
An example can be found in the docs:
# 'start' is 3
# 'limit' is 18
# 'delta' is 3
tf.range(start, limit, delta) ==> [3, 6, 9, 12, 15]
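For reference, a runnable version of that docs snippet (TF 1.x):
import tensorflow as tf

# tf.range requires both start and limit in this version.
r = tf.range(3, 18, 3)

with tf.Session() as sess:
    print(sess.run(r))  # [ 3  6  9 12 15]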