While reading a TensorFlow segmentation implementation, I am trying to figure out what the following code is meant to do.
A tensor x is defined as follows: self.x = tf.placeholder("float", shape=[None, None, None, n_label]).
Later, a function uses a transformed tensor x1, which is defined as x1 = tf.reshape(self.x, [-1, n_label]).
My understanding is that tf.reshape(self.x, [-1, n_label]) should reshape the x tensor into a 1-D vector.
But I am confused about x being defined with shape=[None, None, None, n_label] and x1 being transformed this way. What should x1 really look like, and why do this?
None means we don't want to specify the dimension when building the graph, and would rather have it determined at runtime. For instance, this is useful when you want to use different minibatch sizes for training and for inference.
Reshape with -1 for some dimension means 'infer this dimension so that the total size of the tensor is preserved'. For example, tf.reshape(x, [-1, 2]) for x of shape [3, 4, 2] would produce a new tensor of shape [12, 2]. So x1 is a 2-D tensor with n_label columns, not a 1-D vector.
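As a quick check of the -1 rule, here is a minimal sketch that uses NumPy as a stand-in for TensorFlow (an assumption made only so that it runs without a graph or session; np.reshape treats -1 the same way). The array sizes and n_label = 2 are made up for illustration.

```python
import numpy as np

# The example from the answer: shape [3, 4, 2] reshaped with [-1, 2].
x = np.zeros((3, 4, 2))
x1 = x.reshape(-1, 2)
print(x1.shape)  # (12, 2)

# A segmentation-style batch (hypothetical sizes): [batch, height, width, n_label].
n_label = 2
batch = np.zeros((5, 8, 8, n_label))
flat = batch.reshape(-1, n_label)
print(flat.shape)  # (320, 2): one row per pixel, one column per label
```

Flattening to one row per pixel is a common way to prepare per-pixel scores for a loss computed over the label dimension, which is probably why the segmentation code does it.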
I'm trying to build a model which takes a list of sparse tensors as input. (The list length is equal to the batch size.)
The reason I use sparse tensors is that I have to pass an adjacency matrix to my GNN model, and it is very sparse (~99%).
I'm familiar with PyTorch, where it is very easy to feed a sparse tensor into the network.
However, I found that I have to use tf.data.Dataset or keras.utils.Sequence to build a dataset in TensorFlow.
But those methods throw an error when I use a list of sparse tensors as input.
For example, the code below raises a TypeError:
import tensorflow as tf
tf.data.Dataset.from_tensor_slices(sparse_lists)
TypeError: Neither a SparseTensor nor SparseTensorValue:
[<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2e25b5c0>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c22ada0>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c22a400>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed240>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed390>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed470>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed5c0>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed710>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed828>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed940>].
I know that it will work if I concatenate all the sparse tensors in the list into one huge tensor.
However, that is not an option for me because I have to index into the sparse tensors later.
(If I concatenate 2D sparse tensors into a 3D sparse tensor, I cannot use indexing like below.)
Some3DSparseTensor[:10]
It would also take more time, because I would have to slice the 3D tensor for matrix multiplication with other dense networks.
Furthermore, I know that it would work if I built a sparse tensor from indices and values for every batch, but that would take too much time per batch.
As a result, because of the indexing and time issues, I want tf.data.Dataset to generate batches from a list of sparse tensors.
Can anybody help me? :)
Long story short:
What I have: a list of sparse tensors (e.g. a list of length 1000000)
What I need to do: batch the list of sparse tensors (e.g. a list of length 1024, not a sparse concat)
If the SparseTensors have the same dense_shape, you can create a single SparseTensor instead of a list and pass it to from_tensor_slices.
For example, the following code produces separate SparseTensors from a large SparseTensor s by splitting it along the first dimension:
s = tf.sparse.SparseTensor(
indices=tf.constant([[0, 0, 0], [1, 0, 0], [1, 0, 1], [2, 1, 1]], dtype=tf.int64),
values=tf.range(4, dtype=tf.float32),
dense_shape=(3, 2, 2))
d = tf.data.Dataset.from_tensor_slices(s)
for t in d:
    print(t)
>>> SparseTensor(indices=tf.Tensor([[0 0]], shape=(1, 2), dtype=int64), values=tf.Tensor([0.], shape=(1,), dtype=float32), dense_shape=tf.Tensor([2 2], shape=(2,), dtype=int64))
SparseTensor(indices=tf.Tensor(
[[0 0]
[0 1]], shape=(2, 2), dtype=int64), values=tf.Tensor([1. 2.], shape=(2,), dtype=float32), dense_shape=tf.Tensor([2 2], shape=(2,), dtype=int64))
SparseTensor(indices=tf.Tensor([[1 1]], shape=(1, 2), dtype=int64), values=tf.Tensor([3.], shape=(1,), dtype=float32), dense_shape=tf.Tensor([2 2], shape=(2,), dtype=int64))
To use from_tensor_slices in this way, you need a function to convert the list sparse_lists to a large SparseTensor s (reported below).
To recap, you can do
import tensorflow as tf
def sparse_list_to_sparse_tensor(sparse_lists):
    n = len(sparse_lists)
    # All elements are assumed to share the dense_shape of the first one
    shape = sparse_lists[0].dense_shape
    out_shape = (n, *shape)
    out_values = tf.concat([s.values for s in sparse_lists], axis=0)
    out_indices = []
    for i, s in enumerate(sparse_lists):
        # Prepend the list position i as a new leading index column
        element_idx = tf.cast(tf.fill((s.indices.shape[0], 1), i), dtype=tf.int64)
        out_indices.append(tf.concat([element_idx, s.indices], axis=1))
    out_indices = tf.concat(out_indices, axis=0)
    return tf.sparse.SparseTensor(out_indices, out_values, out_shape)

tf.data.Dataset.from_tensor_slices(sparse_list_to_sparse_tensor(sparse_lists))
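The index bookkeeping in that conversion can be sketched on its own. The snippet below is a NumPy-only illustration, not the TensorFlow code itself; the helper name stack_coo and the sample data are hypothetical, chosen so that the result reproduces the indices and values of the SparseTensor s shown earlier.

```python
import numpy as np

def stack_coo(indices_list, values_list):
    """Prefix each element's COO indices with its position in the list."""
    out_indices = []
    for i, idx in enumerate(indices_list):
        idx = np.asarray(idx, dtype=np.int64)
        # Column of i's, one per nonzero entry of element i
        prefix = np.full((idx.shape[0], 1), i, dtype=np.int64)
        out_indices.append(np.concatenate([prefix, idx], axis=1))
    return (np.concatenate(out_indices, axis=0),
            np.concatenate(values_list, axis=0))

# Three 2x2 sparse matrices given as (row, col) indices plus values.
indices_list = [[[0, 0]], [[0, 0], [0, 1]], [[1, 1]]]
values_list = [[0.0], [1.0, 2.0], [3.0]]

indices, values = stack_coo(indices_list, values_list)
print(indices.tolist())  # [[0, 0, 0], [1, 0, 0], [1, 0, 1], [2, 1, 1]]
print(values.tolist())   # [0.0, 1.0, 2.0, 3.0]
```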
An alternative solution uses from_tensor_slices on every sparse tensor (after the addition of a dummy batch dimension) to create many datasets with a single element that can be concatenated in a single dataset.
dataset = None
for sparse_tensor in sparse_lists:
    # Add a dummy leading dimension so that slicing yields the original tensor
    batched_sparse_tensor = tf.sparse.expand_dims(sparse_tensor, axis=0)
    element_dataset = tf.data.Dataset.from_tensor_slices(batched_sparse_tensor)
    if dataset is None:
        dataset = element_dataset
    else:
        dataset = dataset.concatenate(element_dataset)
Notice that using this solution the sparse tensors can have different dense_shapes.
In TF v1.x, as shown below, x is an input with shape [None, 784] used to train my example model.
It appears as [?, 784] in TensorBoard.
For some reason, I have to reshape x to [1, 784] to predict; that is, x needs to have shape [1, 784] instead of [?, 784] for prediction after training the model.
Any suggestions?
with tf.name_scope('Input_Layer'):
    x = tf.placeholder("float", shape=[None, 784], name="x")
    x_image = tf.reshape(x, [-1, 28, 28, 1])
...
The "?" in TensorFlow indicates that this dimension is not fixed, so it can vary from call to call. The predict function expects a tensor of shape [n_examples, 784], where n_examples is the number of examples.
In your case, since you only need to predict one example, you need to reshape the input data to [1, 784], i.e. n_examples=1. The placeholder itself can keep its [None, 784] shape.
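A minimal sketch of that reshape, done on the input data with NumPy before feeding it. The all-zeros example is a placeholder for a real flattened 28x28 image, and the variable names are made up for illustration.

```python
import numpy as np

# One flattened 28x28 image (placeholder data), shape (784,)
example = np.zeros(784, dtype=np.float32)

# Add a leading batch dimension of size 1
batch_of_one = example.reshape(1, 784)
same_thing = example[np.newaxis, :]

print(batch_of_one.shape)  # (1, 784)
print(same_thing.shape)    # (1, 784)
```

An array of this shape is compatible with a placeholder declared as [None, 784], so it could be passed through feed_dict without changing the graph.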
I was trying to train a sequence-to-sequence LSTM model with a dataset with three labels: [1, 0] for detection of class 1, [0, 1] for detection of class 2, and [0, 0] for detection of nothing. After getting the outputs from the LSTM network, I applied a fully connected layer to each cell's output the following way:
outputs, state = tf.nn.dynamic_rnn(cell, input)
# Shape of outputs is [batch_size, n_time_steps, n_hidden]
# As matmul works only on matrices, reshape to get the
# time dimension into the batch dimension
outputs = tf.reshape(outputs, [-1, n_hidden])
# Shape is [batch_size * n_time_steps, n_hidden]
w = tf.Variable(tf.truncated_normal(shape=[n_hidden, 2], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[2]))
logit = tf.add(tf.matmul(outputs, w), b, name='logit')
# Reshape back to [batch_size, n_time_steps, 2]
logit = tf.reshape(logit, [batch_size, -1, 2])
On the output, I apply tf.nn.sigmoid_cross_entropy_with_logits and reduce the mean. The model seems to work just fine achieving high accuracy and recall, except for the fact that in almost all the cases it outputs either [0, 0], or [1, 1]. The two logit outputs from the fully connected layer always have very similar values (but not the same). This effectively puts a hard-cap on precision of 50%, which the model converges to (but not a fraction of a percent above).
Now, my intuition would tell me that something must be wrong with the training step and both fully connected outputs are trained on the same data, but curiously enough when I replace my own implementation with the prepackaged one from tf.contrib:
outputs, state = tf.nn.dynamic_rnn(cell, input)
logit = tf.contrib.layers.fully_connected(outputs, 2, activation_fn=None)
without changing a single other thing, the model starts training properly. Now, the obvious solution would be to just use that implementation, but why doesn't the first one work?
I'm trying to unstack a Tensor because I need a sequence as input for the RNN. I am using variable sequence lengths which prevents me from correctly using tf.unstack.
def MapToSequences(x):
    # x.get_shape().as_list() = [64, 1, None, 512]
    x = tf.squeeze(x)
    # tf.shape(x) = [None, None, None], at runtime would be [64, seqlen, 512]
    x = tf.transpose(x, perm=[1, 0, 2])
    # [seqlen, 64, 512]
    # Here I'd like to unstack with seqlen as num
    x = tf.unstack(x)  # Cannot infer num from shape (?, ?, ?)
    return x
I tried using tf.shape(x) to infer the seqlen and use it as num, but I get: Expected int for argument 'num' not <tf.Tensor 'strided_slice:0' shape=() dtype=int32>
I believe this may be answered elsewhere, but here's an answer. You cannot use tf.unstack with non-inferrable dimensions.
This is because of how TensorFlow is designed, with computation graphs defining transformations of Tensors. Each operation adds a node, and each Tensor is an edge between nodes. When you tf.unstack a Tensor you generate multiple new Tensors (edges). If the number of new Tensors created by a tf.unstack operation is undefined, then the computation graph has an undefined number of edges, which is not allowed.
Operations that don't add multiple new edges to the graph are allowed to have input Tensors with dimensions that are only known at runtime (this covers most operations).
To get around this, there are two choices that are useful for batched operations, i.e. when you are trying to tf.unstack a Tensor with dimensions (batch_size, ...) and batch_size is only known at runtime.
Choice 1
I would use the batch_shape argument to keras.layers.Input.
The weight Tensors produced will always be interchangeable with those of another model generated with a different batch_size.
Unless you need access to the computation graph with that non-inferrable dimension, there is no reason why you should not take this route.
Choice 2
A second option, in the case when you know a maximal batch_size, is to use tf.dynamic_partition.
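tensor = tf.placeholder(tf.float32, shape=(None, 10))
# One partition index per row actually present in the batch;
# partitions must match the leading dimension of tensor at runtime
partitions = tf.range(tf.shape(tensor)[0])
num_partitions = max_batch_size
partitioned = tf.dynamic_partition(tensor, partitions, num_partitions, name='dynamic_unstack')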
When you actually feed a batch of size batch_size, it will produce unstacked Tensors for the first batch_size indices, and empty Tensors for the rest.
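To make that behaviour concrete, here is a small NumPy emulation of the partitioning semantics. It is a sketch rather than the TensorFlow op: the helper dynamic_partition_np is hypothetical, and it assumes one partition index per row that is actually present in the batch.

```python
import numpy as np

def dynamic_partition_np(data, partitions, num_partitions):
    """Emulate the semantics: output i collects the rows whose partition is i."""
    data = np.asarray(data)
    partitions = np.asarray(partitions)
    return [data[partitions == i] for i in range(num_partitions)]

max_batch_size = 4
batch = np.arange(20, dtype=np.float32).reshape(2, 10)  # actual batch size 2
partitions = np.arange(batch.shape[0])                  # row i goes to output i

parts = dynamic_partition_np(batch, partitions, max_batch_size)
print([p.shape for p in parts])  # [(1, 10), (1, 10), (0, 10), (0, 10)]
```

Note that each non-empty piece keeps a leading dimension of size 1, so it has shape (1, 10) rather than the (10,) that tf.unstack would give; a squeeze or reshape may be needed afterwards.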
I have used the model described here on the 0.6.0 branch. The code can be found here. I have done some minor changes to the linked code.
In my code I create two models, one for training and one for validation, much as is done in the TensorFlow tutorial.
with tf.variable_scope("model", reuse=None, initializer=initializer):
    m = PTBModel_User(is_training=True, config=config, name='Training model')
with tf.variable_scope("model", reuse=True, initializer=initializer):
    mtest = PTBModel_User(is_training=False, config=config_valid, name='Validation model')
The first model, the one for training, seems to be created just fine, but the second, used for validation, does not. The output gets a None dimension! The line I'm referring to is line 134 in the linked code:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
I've added these lines right after the reshape of the output:
output_shape = output.get_shape()
print("Model num_steps:", num_steps)
print("Model batch_size:", batch_size)
print("Output dims", output_shape[0], output_shape[1])
and that gives me this:
Model num_steps: 400
Model batch_size: 1
Output dims Dimension(None) Dimension(650)
This problem only happens with the 'validation model', not with the 'training model'. For the 'training model' I get expected output:
Model num_steps: 400
Model batch_size: 2
Output dims Dimension(800) Dimension(650)
(Note that with the 'validation model' I use a batch_size=1 instead of batch_size=2 that I use for the training model)
From what I understand, using -1 as input to the reshape function, will figure the output shape out automagically! But then why do I get None? Nothing in my config fed to the model has a None value.
Thank you for all the help and tips!
TL;DR: A dimension being None simply means that shape inference could not determine an exact shape for the output tensor, at graph-building time. When you run the graph, the tensor will have the appropriate run-time shape.
If you're not interested in how shape inference works, you can stop reading now.
Shape inference applies local rules, based on a "shape function" that takes the shapes of the inputs to an operation and computes (possibly incomplete) shapes for the outputs of an operation. To figure out why tf.reshape() gives an incomplete shape, we have to look at its inputs, and work backwards:
The shape argument to tf.reshape() includes a [-1], which means "figure the output shape automagically" based on the shape of the tensor input.
The tensor input is the output of tf.concat() on the same line.
The inputs to tf.concat() are computed by a tf.mul() in BasicLSTMCell.__call__(). The tf.mul() op multiplies the result of a tf.tanh() and a tf.sigmoid() op.
The tf.tanh() op produces an output of size [?, hidden_size], and the tf.sigmoid() op produces an output of size [batch_size, hidden_size].
The tf.mul() op performs NumPy-style broadcasting. A dimension will only be broadcast if it has size 1. Consider three cases where we compute tf.mul(x, y):
If x has shape [1, 10], and y has shape [5, 10], then broadcasting will happen, and the output shape will be [5, 10].
If x has shape [1, 10], and y has shape [1, 10], then there will be no broadcasting, and the output shape will be [1, 10].
However, if x has shape [1, 10], and y has shape [?, 10], there is insufficient static information to tell whether broadcasting will happen (even though we happen to know that case 2 applies at runtime).
Therefore, when batch_size is 1, the tf.mul() op produces an output with the shape [?, hidden_size]; but when batch_size is greater than 1, the output shape is [batch_size, hidden_size].
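The three cases can be reproduced with a toy model of the static rule. This is a simplified pure-Python sketch of the reasoning above, not TensorFlow's actual shape function; the function names are hypothetical, None stands for an unknown ("?") dimension, and both shapes are assumed to have the same rank.

```python
def broadcast_static_dim(a, b):
    """Statically combine two dimension sizes; None means unknown."""
    if a is not None and b is not None:
        if a == b or b == 1:
            return a
        if a == 1:
            return b
        raise ValueError("incompatible dimensions: %r and %r" % (a, b))
    known = a if a is not None else b
    if known is None or known == 1:
        # The unknown side could be any size, so the result is unknown.
        return None
    # The unknown side must be 1 or equal to known; either way the result is known.
    return known

def broadcast_static_shape(x_shape, y_shape):
    return [broadcast_static_dim(a, b) for a, b in zip(x_shape, y_shape)]

print(broadcast_static_shape([1, 10], [5, 10]))     # [5, 10]
print(broadcast_static_shape([1, 10], [1, 10]))     # [1, 10]
print(broadcast_static_shape([1, 10], [None, 10]))  # [None, 10]
print(broadcast_static_shape([2, 10], [None, 10]))  # [2, 10]
```

The last two lines mirror the validation model (batch_size of 1, result unknown) and the training model (batch_size of 2, result known).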
Where shape inference breaks down, it can be appropriate to use the Tensor.set_shape() method to add information. This would potentially be useful in the BasicLSTMCell implementation, where we know more than it is possible to infer about the shapes of the outputs.