What is the structure of the data and labels in tensorflow.examples.tutorials.mnist input_data - tensorflow

I'm trying to learn how to feed data into conv nets properly in TensorFlow, and the majority of example code uses from tensorflow.examples.tutorials.mnist import input_data.
It's simple when you can use this to access the MNIST data, but it isn't much help when trying to work out the equivalent way to structure and feed non-MNIST data into similar models.
What is the structure of the data being imported through the MNIST examples, so that I can take example CNN walkthrough code and manipulate my data to mirror the structure of the MNIST data?

The format of the MNIST data obtained from that example code depends on exactly how you initialize the DataSet class. Calling DataSet.next_batch(batch_size) returns two NumPy arrays, representing batch_size images and labels respectively. They have the following formats.
If the DataSet was initialized with reshape=True (the default), the images array is a batch_size by 784 matrix, in which each row contains the pixels of one MNIST image. The default type is tf.float32, and the values are pixel intensities between 0.0 and 1.0.
If the DataSet was initialized with reshape=False, the images array is a batch_size by 28 by 28 by 1 four-dimensional tensor. The 28s correspond to the height and width of each image in pixels; the 1 corresponds to the number of channels in the images, which are grayscale and so have only a single channel.
If the DataSet was initialized with one_hot=False (the default), the labels array is a vector of length batch_size, in which each value is the label (an integer from 0 to 9) representing the digit in the respective image.
If the DataSet was initialized with one_hot=True, the labels array is a batch_size by 10 matrix, in which each row is all zeros, except for a 1 in the column that corresponds to the label of the respective image.
Note that if you are interested in convolutional networks, initializing the DataSet with reshape=False is probably what you want, since that will retain spatial information about the images that will be used by the convolutional operators.
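For concreteness, here is a minimal sketch (using the now-deprecated tensorflow.examples.tutorials.mnist module from TF 1.x) that prints the shapes described above; the batch size of 50 is arbitrary:

    from tensorflow.examples.tutorials.mnist import input_data

    # reshape=False keeps each image as 28x28x1; one_hot=True gives 10-column labels
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, reshape=False)
    images, labels = mnist.train.next_batch(50)

    print(images.shape)  # (50, 28, 28, 1)
    print(labels.shape)  # (50, 10)

With the defaults (reshape=True, one_hot=False), the same calls would print (50, 784) and (50,) instead.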

Related

How to format data for 1d CNN?

I have a dataset that I need to use with a 1d CNN, however I am not sure how to structure the data dimensions so that it can be used with the 1d CNN.
The data has 5 output classes; the input data is where I'm unsure how to proceed. Each output has a matrix of data associated with it that is 16 x 8000. In other words, for every output I have, there is an associated matrix of numbers that must be fed in together to reach that output. I have multiple of these 16 x 1800 matrices for different samples, from which I am trying to make a prediction.
I was wondering how I can create a data frame for this that can be passed into a 1-d CNN?
More specifically, what would the input_shape parameter be set to in the model?
Right now, I am thinking that my output will be (# samples, 5). My input would be along the lines of (# samples, (16 x 1800)), but I don't know how this would be implemented in Keras.
Any help would be appreciated.
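One common convention (this sketch is not from the original thread, and the layer sizes are illustrative) is to treat each 16 x 1800 matrix as 1800 timesteps with 16 channels, so the Keras input_shape is (1800, 16):

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    n_samples = 100                                 # hypothetical sample count
    x = np.random.rand(n_samples, 1800, 16)         # (samples, steps, channels)
    y = keras.utils.to_categorical(np.random.randint(0, 5, n_samples), 5)

    model = keras.Sequential([
        layers.Conv1D(32, kernel_size=5, activation="relu",
                      input_shape=(1800, 16)),      # convolve along the 1800 axis
        layers.GlobalMaxPooling1D(),
        layers.Dense(5, activation="softmax"),      # output is (samples, 5)
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    model.fit(x, y, epochs=1)

If the convolution should instead slide along the 16 axis, the same model works with x transposed and input_shape=(16, 1800).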

Keras Conv3D Layer with Discrete Values

I'm trying to build a model that will learn features of a 3D space. Unlike image processing, the values of the 3D matrix are not continuous; they represent some discrete value of what "material" can be found at that specific coordinate (grass with value 1 or stairs with value 2 for example).
Is it possible to train a model to learn the features of the space without interpolating in-between values? For example, I don't want the neural net to deduce 1.5 to be some kind of grass stairs.
You'll want to use one-hot encoding, which represents categorical values as arrays of zeroes with a single value set to one. This means that grass (id = 1) would be [0, 1, 0, 0, ...] and stairs (id = 2) would be [0, 0, 1, 0, ...]. To perform one-hot encoding, look into keras' to_categorical function.
Further reading:
one-hot encoding tutorial
one-hot preprocessing using to_categorical
one-hot on the fly using an embedding layer
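A minimal usage sketch of to_categorical (the ids and num_classes are illustrative):

    import numpy as np
    from tensorflow.keras.utils import to_categorical

    ids = np.array([1, 2, 0])                 # e.g. grass=1, stairs=2, empty=0
    print(to_categorical(ids, num_classes=3))
    # [[0. 1. 0.]
    #  [0. 0. 1.]
    #  [1. 0. 0.]]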
As with any categorical model, this should be "one-hot" data.
The "channels" dimension of your data should have a size of n_materials.
A value of 0 means that material is absent at that coordinate.
A value of 1 means that material is present.
So, your input shape will be something like (samples, spatial1, spatial2, spatial3, materials). If your data is currently shaped as (samples, s1, s2, s3) and has the materials as integers as you described, you can use to_categorical to transform the integers to "one-hot", as in the sketch below.
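A minimal sketch of that transformation (the grid size, material count, and layer sizes are assumptions for illustration):

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers
    from tensorflow.keras.utils import to_categorical

    n_materials = 4                              # e.g. empty, grass, stairs, ...
    voxels = np.random.randint(0, n_materials, size=(10, 16, 16, 16))
    x = to_categorical(voxels, n_materials)      # (10, 16, 16, 16, 4)

    model = keras.Sequential([
        layers.Conv3D(8, kernel_size=3, activation="relu",
                      input_shape=(16, 16, 16, n_materials)),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])

Because each material now lives in its own channel, the network never has to treat 1.5 as something between grass and stairs.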
Although I am not sure if this is exactly what you are asking for, I would imagine that after the bottleneck of the convolutional network one would typically use a flatten layer, with the output then going to a dense layer. The output layer, if using a sigmoid activation, will give you probabilities for each of the classes, which have to be one-hot encoded, as others have suggested.
If you want the output of the network itself to be in discrete values, I suppose you could use some sort of step-wise activation function in the output layer. However, you have to take care that your loss remains differentiable throughout the network (which is why such activation functions are not available in Keras). This might be of interest: https://github.com/keras-team/keras/issues/7370

How to modify the tensorflow loss function to suit multi labels on the same image

TensorFlow is fairly new to me, and the way I had the loss calculated on the MNIST dataset was using the softmax_cross_entropy_with_logits function.
This function worked on that dataset because each image has a single label.
What I'm trying to do is to train a CNN on the MS COCO dataset, which has multiple labels on the same image, with 80 classes in total.
Is there a function that makes that possible?
My label input is currently a modified one-hot representation: for each image I have a list of 80 elements, with 0 for categories not in the image and 1 for categories present in the image.
I.e. an image with a human and a dog would have the list [0,1,0,0,1], assuming I have 5 classes with dogs and humans at indexes 1 and 4.
For a multi-label classification problem, you can use the sigmoid cross-entropy function available in TensorFlow (tf.nn.sigmoid_cross_entropy_with_logits). It takes the one-hot-style label input described above along with the final logits layer as its inputs, as sketched below.
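A minimal TF 1.x-style sketch (the placeholders are stand-ins for the real label tensor and the network's final layer):

    import tensorflow as tf

    labels = tf.placeholder(tf.float32, [None, 80])   # 0/1 vector per image
    logits = tf.placeholder(tf.float32, [None, 80])   # stand-in for the net's output

    # one independent sigmoid cross-entropy term per class, then average
    per_class = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
    loss = tf.reduce_mean(per_class)

Unlike softmax_cross_entropy_with_logits, the sigmoid version treats each of the 80 classes independently, so several of them can be 1 for the same image.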

Transpose tensorboard embedding projections

My model is trying to predict scores for 163 items using a variety of inputs. It uses Keras on the TensorFlow backend.
Following the approach in Keras - Save image embedding of the mnist data set to capture layer weights, I am capturing embedding data for the final layer, which is Dense(163). Since the final dense layer receives 128 inputs, the weight matrix is 128 x 163. In the TensorBoard Projector, I can see it visualizes 128 points very well.
However, when I try to map these to my real-world items using metadata, I have 163 item names, but the TensorBoard Projector is visualizing the 128 x 163 weight matrix along dimension 0, i.e. 128 points. Is there any way to make it visualize points along dimension 1 (163 points) in the TensorBoard Projector?
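One possible approach (this sketch and its model are illustrative assumptions, not from the thread) is to transpose the kernel before exporting it, since the projector plots one point per row:

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # hypothetical stand-in for the real model's final layers
    model = keras.Sequential([
        layers.Dense(128, activation="relu", input_shape=(32,)),
        layers.Dense(163),                      # final layer: kernel is (128, 163)
    ])

    kernel = model.layers[-1].get_weights()[0]  # shape (128, 163)
    item_points = kernel.T                      # shape (163, 128): one row per item
    np.savetxt("item_embeddings.tsv", item_points, delimiter="\t")

The resulting TSV, loaded into the standalone projector together with the 163-row metadata file, gives 163 points, one per item.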

Why do we flatten the data before we feed it into tensorflow?

I'm following the Udacity MNIST tutorial, and MNIST data is originally a 28*28 matrix. However, right before feeding that data in, they flatten each matrix into a 1d array with 784 columns (784 = 28 * 28).
For example,
the original training set shape was (200000, 28, 28):
200000 rows (examples), each a 28*28 matrix.
They converted this into a training set whose shape is (200000, 784).
Can someone explain why they flatten the data out before feeding it to tensorflow?
Because when you're adding a fully connected layer, you always want your data to be a (1- or) 2-dimensional matrix, where each row is the vector representing one example. That way, the fully connected layer is just a matrix multiplication between your input (of shape (batch_size, n_features)) and the weights (of shape (n_features, n_outputs)), plus the bias and the activation function, and you get an output of shape (batch_size, n_outputs). Plus, you really don't need the original shape information in a fully connected layer, so it's OK to lose it.
It would be more complicated and less efficient to get the same result without reshaping first, which is why we always do it before a fully connected layer. For a convolutional layer, by contrast, you'll want to keep the data in its original format (width, height).
That is a convention with fully connected layers. Fully connected layers connect every node in the previous layer with every node in the next layer, so locality is not an issue for this type of layer.
Additionally, by defining the layer like this we can efficiently calculate the next step with the formula f(Wx + b) = y. This would not be as easy with multidimensional input, and reshaping the input is cheap and easy to accomplish.
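A minimal NumPy sketch of the point both answers make (a small batch stands in for the tutorial's 200000 examples):

    import numpy as np

    x = np.random.rand(32, 28, 28).astype(np.float32)   # a batch of 28x28 images
    x_flat = x.reshape(-1, 784)                         # (32, 784)

    W = np.random.rand(784, 10).astype(np.float32)      # (n_features, n_outputs)
    b = np.zeros(10, dtype=np.float32)

    y = x_flat @ W + b           # the dense layer, Wx + b in row-vector form
    print(y.shape)               # (32, 10); the activation f would follow

The whole fully connected layer reduces to one matrix multiplication precisely because each example was flattened into a single row.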