How to add elements of tensor as scalar summaries in Tensorflow? - tensorflow

I have tensor of 10 elements. How can I add each element as scalar summary, preferably displayed on the same graph in Tensorboard?

You can access the elements as if the tensor were a NumPy array: tensor[i, j], where i and j are the indices of the element (tensor[i] when the tensor is a vector).
Then add them to the summary:
for i in range(10):
    tf.summary.scalar("tensor" + str(i), tensor[i], collections=["tensor"])
Merge them: merged_summary = tf.summary.merge_all(key="tensor")
Run it: merged = sess.run(merged_summary, feed_dict={...}) and write it to the file writer: writer.add_summary(merged, epoch).
To plot them on the same graph, I only know one way, and it breaks the merging step above: use a different file writer for each value in the tensor. Nevertheless, the following links could be useful:
https://www.quora.com/How-do-you-plot-training-and-validation-loss-on-the-same-graph-using-TensorFlow%E2%80%99s-TensorBoard
https://github.com/tensorflow/tensorflow/issues/7089
https://github.com/tensorflow/tensorboard/issues/300
https://github.com/tensorflow/tensorboard/pull/664
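The multi-writer idea mentioned above can be sketched as follows. This is a minimal sketch assuming TensorFlow 2.x with eager execution (the answer above uses the 1.x session API); the log directory layout, the tag name "tensor" and the dummy values are made up for illustration. Each element gets its own writer in its own subdirectory but logs under the same tag, so TensorBoard should overlay the runs on one chart.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical log directory; point TensorBoard at this path to view it.
logdir = tempfile.mkdtemp()

num_elements = 10
# One writer per tensor element, each in its own subdirectory ("run").
writers = [
    tf.summary.create_file_writer(os.path.join(logdir, "element_%d" % i))
    for i in range(num_elements)
]

for step in range(5):
    # Dummy 10-element tensor standing in for the real one.
    tensor = tf.range(num_elements, dtype=tf.float32) * float(step)
    for i, writer in enumerate(writers):
        with writer.as_default():
            # Same tag for every element, so the curves share one chart.
            tf.summary.scalar("tensor", tensor[i], step=step)

for writer in writers:
    writer.flush()
```

The trade-off is the one the answer points out: with one writer per element there is no single merged summary to run any more.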

Related

Python numpy: (IndexError: too many indices for array) How to choose specific index to my matrix?

I'm trying to build a model from an array with 572 rows and 8 columns loaded with NumPy. I define the training and test sets by selecting row ranges into new arrays:
train_x = x_vals[11:34, 46:98, 110:268, 280:342, 354:408, 420:428, 440:478, 490:538, 550:571]
test_x = x_vals[0:10, 35:45, 99:109, 269:279, 343:353, 409:419, 429:439, 479:489, 539:549]
train_y = y_vals[11:34, 46:98, 110:268, 280:342, 354:408, 420:428, 440:478, 490:538, 550:571]
test_y = y_vals[0:10, 35:45, 99:109, 269:279, 343:353, 409:419, 429:439, 479:489, 539:549]
I'm trying to test my model with 99 samples and calibrate with 473. Although the Spyder environment accepts the declarations of the lines above, at the time of running the program it appears:
train_x = x_vals[11:34, 46:98, 110:268, 280:342, 354:408, 420:428, 440:478, 490:538, 550:571]
IndexError: too many indices for array
What is missing in the declaration of the sets above?
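NumPy reads comma-separated slices inside the brackets as one slice per axis, so nine slices ask for nine dimensions on a 2-D array, which is what raises the IndexError. One way to pick several row ranges at once is to build a single index array with np.r_. The sketch below assumes the ranges in the question are meant to be inclusive (that is what gives the stated 99 test and 473 training rows) and uses placeholder data in place of x_vals.

```python
import numpy as np

# Placeholder data with the shape described in the question.
x_vals = np.arange(572 * 8).reshape(572, 8)

# np.r_ concatenates the slices into one 1-D array of row indices.
# Python slices exclude the stop value, so each stop is one past the
# last row that should be included.
test_idx = np.r_[0:11, 35:46, 99:110, 269:280, 343:354,
                 409:420, 429:440, 479:490, 539:550]

# Every row that is not a test row goes to the training set.
is_test = np.zeros(len(x_vals), dtype=bool)
is_test[test_idx] = True

test_x = x_vals[test_idx]
train_x = x_vals[~is_test]

print(test_x.shape, train_x.shape)
```

The same index array and mask can be reused for y_vals so that rows stay aligned.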

Stacking list of lists vertically using np.vstack is throwing an error

I am following this piece of code http://queirozf.com/entries/scikit-learn-pipeline-examples in order to develop a Multilabel OnevsRest classifier for text. I would like to compute the hamming_score and thus would need to binarize my test labels as well. I thus have:
X_train, X_test, labels_train, labels_test = train_test_split(meetings, labels, test_size=0.4)
Here, labels_train and labels_test are lists of lists:
[['dog', 'cat'], ['cat'], ['people'], ['nice', 'people']]
Now I need to binarize all my labels, I am therefore doing this...
all_labels = np.vstack([labels_train, labels_test])
mlb = MultiLabelBinarizer().fit(all_labels)
as directed in the link, but that throws
ValueError: all the input array dimensions except for the concatenation axis must match exactly
I used np.column_stack as directed here
numpy array concatenate: "ValueError: all the input arrays must have same number of dimensions"
but that throws the same error.
How can the dimensions be the same? If I am splitting into train and test, I am bound to get different shapes, right? Please help, thank you.
MultiLabelBinarizer works on lists of lists directly, so you don't need to stack them with NumPy. Just concatenate the lists and pass the result in.
all_labels = labels_train + labels_test
mlb = MultiLabelBinarizer().fit(all_labels)
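As a small illustration of the answer above, the sketch below fits the binarizer on the concatenated Python lists and then transforms each split separately. The label lists are toy stand-ins for labels_train and labels_test.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-ins for the real splits; rows may have different lengths.
labels_train = [['dog', 'cat'], ['cat'], ['people']]
labels_test = [['nice', 'people']]

# Plain list concatenation, no NumPy stacking needed.
all_labels = labels_train + labels_test
mlb = MultiLabelBinarizer().fit(all_labels)

# Both splits share the same columns because the binarizer saw all labels.
y_train = mlb.transform(labels_train)
y_test = mlb.transform(labels_test)

print(mlb.classes_)
print(y_test)
```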

Feeding .npy (numpy files) into tensorflow data pipeline

Tensorflow seems to lack a reader for ".npy" files.
How can I read my data files into the new tensorflow.data.Dataset pipeline?
My data doesn't fit in memory.
Each object is saved in a separate ".npy" file. Each file contains 2 different ndarrays as features and a scalar as their label.
It is actually possible to read NPY files directly with TensorFlow instead of converting them to TFRecords. The key pieces are tf.data.FixedLengthRecordDataset and tf.io.decode_raw, along with a look at the documentation of the NPY format. For simplicity, let's suppose you are given a float32 NPY file containing an array with shape (N, K), and that you know the number of features K beforehand, as well as the fact that it is a float32 array. An NPY file is just a binary file with a small header followed by the raw array data (object arrays are different, but we're considering numbers now). In short, you can find the size of this header with a function like this:
def npy_header_offset(npy_path):
    with open(str(npy_path), 'rb') as f:
        if f.read(6) != b'\x93NUMPY':
            raise ValueError('Invalid NPY file.')
        version_major, version_minor = f.read(2)
        if version_major == 1:
            header_len_size = 2
        elif version_major == 2:
            header_len_size = 4
        else:
            raise ValueError('Unknown NPY file version {}.{}.'.format(version_major, version_minor))
        header_len = sum(b << (8 * i) for i, b in enumerate(f.read(header_len_size)))
        header = f.read(header_len)
        if not header.endswith(b'\n'):
            raise ValueError('Invalid NPY file.')
        return f.tell()
With this you can create a dataset like this:
import tensorflow as tf
npy_file = 'my_file.npy'
num_features = ...
dtype = tf.float32
header_offset = npy_header_offset(npy_file)
dataset = tf.data.FixedLengthRecordDataset([npy_file], num_features * dtype.size, header_bytes=header_offset)
Each element of this dataset contains a long string of bytes representing a single example. You can now decode it to obtain an actual array:
dataset = dataset.map(lambda s: tf.io.decode_raw(s, dtype))
The elements will have indeterminate shape, though, because TensorFlow does not keep track of the length of the strings. You can just enforce the shape since you know the number of features:
dataset = dataset.map(lambda s: tf.reshape(tf.io.decode_raw(s, dtype), (num_features,)))
Similarly, you can choose to perform this step after batching, or combine it in whatever way you feel like.
The limitation is that you have to know the number of features in advance. It is possible to extract it from the NumPy header, though it is a bit of a pain, and in any case very hard to do from within TensorFlow, so the file names would need to be known in advance. Another limitation is that, as it stands, the solution requires you to use either a single file per dataset or files that all have the same header size, although if you know that all the arrays have the same size that should actually be the case.
Admittedly, if one considers this kind of approach it may just be better to have a pure binary file without headers, and either hard code the number of features or read them from a different source...
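Extracting the number of features from the header, as mentioned above, can be done outside TensorFlow with NumPy's own header readers. The following is a minimal sketch, not part of the original answer: read_npy_header is a hypothetical helper name, only NPY versions 1.x and 2.x are handled, and the demo file is a throwaway array written just for the example.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def read_npy_header(npy_path):
    """Return (shape, dtype, data_offset) read from an NPY file header."""
    with open(npy_path, "rb") as f:
        major, minor = npy_format.read_magic(f)
        if major == 1:
            shape, fortran_order, dtype = npy_format.read_array_header_1_0(f)
        elif major == 2:
            shape, fortran_order, dtype = npy_format.read_array_header_2_0(f)
        else:
            raise ValueError("Unsupported NPY version {}.{}".format(major, minor))
        # Once the header is parsed, the file position marks the start
        # of the raw array data.
        return shape, dtype, f.tell()


# Demo with a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "example.npy")
original = np.arange(12, dtype=np.float32).reshape(4, 3)
np.save(path, original)

shape, dtype, offset = read_npy_header(path)
num_features = shape[1]

# Read the raw data back without np.load to check that the offset is right.
raw = np.fromfile(path, dtype=dtype, offset=offset).reshape(shape)
```

The offset and num_features obtained this way could then be passed as header_bytes and used for the record size in the FixedLengthRecordDataset call shown earlier.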
You can do it with tf.py_func, see the example here.
The parse function would simply decode the filename from bytes to string and call np.load.
Update: something like this:
def read_npy_file(item):
    data = np.load(item.decode())
    return data.astype(np.float32)
file_list = ['/foo/bar.npy', '/foo/baz.npy']
dataset = tf.data.Dataset.from_tensor_slices(file_list)
dataset = dataset.map(
    lambda item: tuple(tf.py_func(read_npy_file, [item], [tf.float32,])))
Does your data fit into memory? If so, you can follow the instructions from the Consuming NumPy Arrays section of the docs:
Consuming NumPy arrays
If all of your input data fit in memory, the simplest way to create a Dataset from them is to convert them to tf.Tensor objects and use Dataset.from_tensor_slices().
# Load the training data into two NumPy arrays, for example using `np.load()`.
with np.load("/var/data/training_data.npy") as data:
    features = data["features"]
    labels = data["labels"]
# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
In the case that the file doesn't fit into memory, it seems like the only recommended approach is to first convert the npy data into a TFRecord format, and then use the TFRecord data set format, which can be streamed without fully loading into memory.
Here is a post with some instructions.
FWIW, it seems crazy to me that TFRecord cannot be instantiated with a directory name or file name(s) of npy files directly, but it appears to be a limitation of plain Tensorflow.
If you can split the single large npy file into smaller files that each roughly represent one batch for training, then you could write a custom data generator in Keras that would yield only the data needed for the current batch.
In general, if your dataset cannot fit in memory, storing it as one single large npy file makes it very hard to work with, and preferably you should reformat the data first, either as TFRecord or as multiple npy files, and then use other methods.
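The per-batch file idea can be sketched with a plain Python generator. This is only an illustration: the file names, array shapes and the one-file-per-batch layout are assumptions, and a real Keras pipeline would wrap this in whatever loader it expects.

```python
import os
import tempfile

import numpy as np


def npy_batches(file_paths):
    """Yield one array per file so only one batch is in memory at a time."""
    for path in file_paths:
        yield np.load(path)


# Demo: split a (placeholder) array into batch-sized files on disk.
tmp_dir = tempfile.mkdtemp()
big_array = np.arange(40, dtype=np.float32).reshape(10, 4)
batch_size = 4
paths = []
for start in range(0, len(big_array), batch_size):
    path = os.path.join(tmp_dir, "batch_%03d.npy" % (start // batch_size))
    np.save(path, big_array[start:start + batch_size])
    paths.append(path)

# Iterating the generator loads the files one by one.
batch_shapes = [batch.shape for batch in npy_batches(paths)]
print(batch_shapes)
```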
Problem setup
I had a folder with images that were being fed into an InceptionV3 model for extraction of features. This seemed to be a huge bottleneck for the entire process. As a workaround, I extracted features from each image and then stored them on disk in a .npy format.
Now I had two folders, one for the images and one for the corresponding .npy files. There was an evident problem with the loading of .npy files in the tf.data.Dataset pipeline.
Workaround
I came across TensorFlow's official tutorial on show attend and tell which had a great workaround for the problem this thread (and I) were having.
Load numpy files
First off we need to create a mapping function that accepts the .npy file name and returns the numpy array.
# Load the numpy files
def map_func(feature_path):
    feature = np.load(feature_path)
    return feature
Use the tf.numpy_function
With tf.numpy_function we can wrap any Python function and use it as a TensorFlow op. The function must accept NumPy objects (which is exactly what we want).
We create a tf.data.Dataset with the list of all the .npy filenames.
dataset = tf.data.Dataset.from_tensor_slices(feature_paths)
We then use the map function of the tf.data.Dataset API to do the rest of our task.
# Use map to load the numpy files in parallel
dataset = dataset.map(lambda item: tf.numpy_function(
    map_func, [item], tf.float16),
    num_parallel_calls=tf.data.AUTOTUNE)
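An end-to-end version of the steps above might look like the sketch below. It assumes TensorFlow 2.4 or later with eager execution; the temporary files, the (4, 2) feature shape and the helper names load_npy and tf_load are invented for the example. Two details are added on top of the walkthrough: the path is decoded from bytes before loading, and the static shape is set by hand because tf.numpy_function does not infer it.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Write a few throwaway feature files so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
feature_paths = []
for i in range(3):
    path = os.path.join(tmp_dir, "feat_%d.npy" % i)
    np.save(path, np.full((4, 2), i, dtype=np.float32))
    feature_paths.append(path)


def load_npy(path):
    # String tensors arrive in the wrapped function as bytes.
    return np.load(path.decode("utf-8"))


def tf_load(path):
    feature = tf.numpy_function(load_npy, [path], tf.float32)
    # The op returns a tensor of unknown shape; restore it manually.
    feature.set_shape((4, 2))
    return feature


dataset = tf.data.Dataset.from_tensor_slices(feature_paths)
dataset = dataset.map(tf_load, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.batch(3)

batch = next(iter(dataset))
print(batch.shape)
```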

How to visualize a tensor summary in tensorboard

I'm trying to visualize a tensor summary in tensorboard. However I can't see the tensor summary at all in the board. Here is my code:
out = tf.strided_slice(logits, begin=[self.args.uttWindowSize-1, 0], end=[-self.args.uttWindowSize+1, self.args.numClasses],
strides=[1, 1], name='softmax_truncated')
tf.summary.tensor_summary('softmax_input', out)
where out is a multi-dimensional tensor. I guess there must be something wrong with my code. Probably I used the tensor_summary function incorrectly.
You create a summary op, but you never run it or write the resulting summary (see the documentation).
To actually create a summary you need to do the following:
# Create a summary operation
summary_op = tf.summary.tensor_summary('softmax_input', out)
# Create the summary
summary_str = sess.run(summary_op)
# Create a summary writer
writer = tf.summary.FileWriter(...)
# Write the summary
writer.add_summary(summary_str)
Explicitly writing a summary (last two lines) is only necessary if you don't have a higher level helper like a Supervisor. Otherwise you invoke
sv.summary_computed(sess, summary_str)
and the Supervisor will handle it.
More info, also see:
How to manually create a tf.Summary()
Here is a workaround that hopefully achieves what you want.
If you wish to view the tensor values, you can convert them using as_string, then use summary.text. The values will appear in the TensorBoard text tab.
I have not tried this with 3D tensors, but feel free to slice according to your needs.
The code snippet below also inserts a print op so that the values show up in the console output as well.
predictions = tf.argmax(reshaped_logits, 1)
txtPredictions = tf.Print(tf.as_string(predictions),[tf.as_string(predictions)], message='predictions', name='txtPredictions')
txtPredictions_op = tf.summary.text('predictions', txtPredictions)
Not sure whether this is kinda obvious, but you could use something like
def make_tensor_summary(tensor, name='defaultTensorName'):
    for i in range(tensor.get_shape()[0]):
        for j in range(tensor.get_shape()[1]):
            tf.summary.scalar(name + str(i) + '_' + str(j), tensor[i, j])
in case you know it is a 'matrix-shaped' Tensor in advance.

How to expand a Tensorflow Variable

Is there any way to make a Tensorflow Variable larger? Like, let's say I wanted to add a neuron to a layer of a neural network in the middle of training. How would I go about doing that? An answer to this question told me how to change the shape of the variable, to expand it to fit another row of weights, but I don't know how to initialize those new weights.
I figure another way of going about this might involve combining variables, as in initializing the weights first in a second variable and then adding that in as a new row or column of the first variable, but I can't find anything that lets me do that either.
There are various ways you could accomplish this.
1) The second answer in that post (https://stackoverflow.com/a/33662680/5548115) explains how you can change the shape of a variable by calling 'assign' with validate_shape=False. For example, you could do something like
# Assume var is [m, n]
# Add the new 'data' of shape [1, n] with new values
new_neuron = tf.constant(...)
# If concatenating to add a row, concat on the first dimension.
# If new_neuron was [m, 1], you would concat on the second dimension.
new_variable_data = tf.concat([var, new_neuron], 0) # [m+1, n]
resize_var = tf.assign(var, new_variable_data, validate_shape=False)
Then when you run resize_var, the data pointed to by 'var' will now have the updated data.
2) You could also create a large initial variable, and call tf.slice on different regions of the variable as training progresses, since you can dynamically change the 'begin' and 'size' attributes of slice.
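Option 1 can also be sketched for TensorFlow 2.x, where assign has no validate_shape argument. The version below is an assumption-laden illustration rather than the answer's own code: it relies on creating the variable with an unspecified shape so that a value of a different shape can be assigned later, and the random initial values for the new row are just one possible choice.

```python
import tensorflow as tf

# Leave the variable's shape unspecified so that values with a
# different shape can be assigned to it later.
var = tf.Variable(tf.zeros([5, 3]), shape=tf.TensorShape(None))

# Hypothetical initial weights for the new row (the new "neuron").
new_neuron = tf.random.normal([1, 3], stddev=0.1)

# Concatenate along the first dimension and store the result back.
var.assign(tf.concat([var, new_neuron], axis=0))

new_shape = tuple(var.numpy().shape)
print(new_shape)
```

Any optimizer state tied to the old shape would still have to be handled separately.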
You can simply use tf.concat to expand a Tensorflow Variable; see the API docs for details.
v1 = tf.Variable(tf.zeros([5,3]),dtype=tf.float32)
v2 = tf.Variable(tf.zeros([1,3]),dtype=tf.float32)
v3 = tf.concat([v1, v2], 0)
Figured it out. It's kind of a roundabout process, but it's the only one I can tell that actually functions. You need to first unpack the variables, then append the new variable to the end, then pack them back together.
If you're expanding along the first dimension, it's rather short: only 7 lines of actual code.
#the first variable is 5x3
v1 = tf.Variable(tf.zeros([5, 3], dtype=tf.float32), name="1")
#the second variable is 1x3
v2 = tf.Variable(tf.zeros([1, 3], dtype=tf.float32), name="2")
#unpack the first variable into a list of size 3 tensors
#there should be 5 tensors in the list
change_shape = tf.unpack(v1)
#unpack the second variable into a list of size 3 tensors
#there should be 1 tensor in this list
change_shape_2 = tf.unpack(v2)
#for each tensor in the second list, append it to the first list
for i in range(len(change_shape_2)):
    change_shape.append(change_shape_2[i])
#repack the list of tensors into a single tensor
#the shape of this resultant tensor should be [6, 3]
final = tf.pack(change_shape)
If you want to expand along the second dimension, it gets somewhat longer.
#First variable, 5x3
v3 = tf.Variable(tf.zeros([5, 3], dtype=tf.float32))
#second variable, 5x1
v4 = tf.Variable(tf.zeros([5, 1], dtype=tf.float32))
#unpack tensors into lists of size 3 tensors and size 1 tensors, respectively
#both lists will hold 5 tensors
change = tf.unpack(v3)
change2 = tf.unpack(v4)
#for each tensor in the first list, unpack it into its own list
#this should make a 2d array of size 1 tensors, array will be 5x3
changestep2 = []
for i in range(len(change)):
    changestep2.append(tf.unpack(change[i]))
#do the same thing for the second tensor
#2d array of size 1 tensors, array will be 5x1
change2step2 = []
for i in range(len(change2)):
    change2step2.append(tf.unpack(change2[i]))
    #for each tensor in the array, append it onto the corresponding array in the first list
    for j in range(len(change2step2[i])):
        changestep2[i].append(change2step2[i][j])
    #pack the lists in the array back into tensors
    changestep2[i] = tf.pack(changestep2[i])
#pack the list of tensors into a single tensor
#the shape of this resultant tensor should be [5, 4]
final2 = tf.pack(changestep2)
I don't know if there's a more efficient way of doing this, but this works, as far as it goes. Changing further dimensions would require more layers of lists, as necessary.
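For comparison, here is a minimal sketch of the same two expansions using a single tf.concat call per case. It assumes TensorFlow 1.0 or later, where tf.concat takes the list of tensors first and the axis second (and where tf.pack and tf.unpack were renamed tf.stack and tf.unstack); the variable names mirror the ones above. Note that the results are plain tensors, not variables.

```python
import tensorflow as tf

# Expanding along the first dimension: 5x3 plus 1x3 gives 6x3.
v1 = tf.Variable(tf.zeros([5, 3], dtype=tf.float32))
v2 = tf.Variable(tf.zeros([1, 3], dtype=tf.float32))
final = tf.concat([v1, v2], axis=0)

# Expanding along the second dimension: 5x3 plus 5x1 gives 5x4.
v3 = tf.Variable(tf.zeros([5, 3], dtype=tf.float32))
v4 = tf.Variable(tf.zeros([5, 1], dtype=tf.float32))
final2 = tf.concat([v3, v4], axis=1)

print(final.shape, final2.shape)
```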