The meaning of question mark (?) in keras layers input size - tensorflow

I'm using python2 with keras, tensorflow.
x = Input((32,), name="input1")
I think x's shape is (32,) but print(x) 's result is that 'shape(?,32)'.
What is means of 'shape(?,32)'?
And '?' means what and 32 means what..?

When you define tour input with Input((32,), name="input1") you are telling Keras that each input will be 1-dimensional with size 32. However you might send more than one input during training/predicting. For example if you send in 10 samples, each with length 32, you will actually send in a tensor with shape (10, 32).
Since the topology of the network is not dependent on the number of samples you send in, the shape may vary and is presented as (?,32) where ? is the number of samples.

Related

understanding tensorflow conv1d trainable variable shape

I'm using Keras with TensorFlow 2 and I have a trained model with the weights corresponding to each layer of my model but the shape of some conv1d layers confused me.
I set the convolutional layers to have 64 filters with a length of 16 but the shape of my weight vector is like (16,64,64) at the end.
can someone explain this to me? I suppose that 16 is the length of every filter and the last 64 is my num_filters, what is the other one, I mean how is that 3-dimensional? it should be (16,64) or something.
and besides, isn't this odd to specify the length of every filter on z-axis ? (of course with assuming the computer science version of representing dimensions (z,x,y instead of x,y,z))
what I get is something like this :
name:conv1d/kernel:0 shape:(16,64,64) dtype:<dtype:'float32'> numpy=...
thank you guys in advance.
to answer my own question, the first 64 corresponds to depth of the data we are facing. for instance, if you want to have 5 filters with 32-element length for a data which has 10 features (in other words the input depth of the conv layer is 10) your variable shape will be : (32,10,5)

How does Keras produce output of different size y, given an input of size x?

I am new to neural network here. I am reading a lot of guides and tutorial where they will start with an lstm layer where the input size differs from the output size
eg. model.add(LSTM(100, input_shape=(20, 1))) ->
before doing ->
model.add(Dense(80, activation='relu')), etc.
presumably, the output layer for the lstm here has size 100, where the input has only 20
for a dense layer I can imagine how that works because there are plenty of graphs depicting that, but how can a lstm produce output layer of very different size from the input?
and also importantly, of what range of value can the output be given the input (let's say of 20) effectively be? would any value make sense?
The output size can be anything. For example, in case of feeding word embeddings of 256 length and output size 1000 length, it somewhat follows the below steps:
Embedding goes into the LSTM (Here, I am ignoring the batch and sequence length; just one word embedding in one time-step)
The Weight Matrix (Waa, Way, Wax etc are initialized) : These matrices shapes depends upon the output size you gave (e.g. 100 above)
All the needed calculations are followed as per LSTM semantics
The output of 1000 vector length is generated

How to use the black/white image as the input to tensorflow

When implementing the reinforcement learning with tensorflow, the inputs are black/white images. Each pixel can be represented as a bit 1/0.
Can I give the data directly to tensorflow, with each bit as a feature? Or I had to expand the bits to bytes before sending to tensorflow? I'm new to tensorflow, so some code example would be nice.
Thanks
You can directly load the Image data as you would normally do, the Image being binary will have no effect other that the input channel width becoming 1 for the input.
Whenever you put an Image through a convnet, each output filter generally learns features for all the channels, so in case of a binary image, there is a separate kernel defined for each input channel / output channel combination (Since Only 1 input channel) in the first layer.
Each channel is defined by it's number of filters and there exists a 2D kernel for each input channel which averages over all filters, so you will have weights/parameters equal to input_channels * number_of_filters * filter_dims, here for the first layer input_channels becomes one.
Since you asked for some sample code.
Let your image be in a tensor X, simply use
X_out = tf.nn.conv2d(X, filters = 6, kernel_size = [height,width])
After that you can apply an activation, this will make your output image have 6 channels. If you face any problem or have some doubts, feel free to comment, for theoretical clarification, check out https://www.coursera.org/learn/convolutional-neural-networks/lecture/nsiuW/one-layer-of-a-convolutional-network
Edit
Since the question was about simple neural net, not conv net, here is the code for that,
X_train is the variable in which image is stored as (n_x,n_x) byte resolution, n_x is used later.
You will need to flatten the input.
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
This first flattens the image horizontally and then transposes it to arrange it vertically.
Then you will create placeholder tensor X as :
X = tf.placeholder(tf.bool,[n_x*n_x,None]) #Your Input tensor should have dimension same as your input layer.
let W, b be weight and bias respectively.
Z1 = tf.add(tf.matmul(W1,X),b1) #Linear Transformation step
A1 = tf.nn.relu(Z1) #Activation Step
And you keep on creating your graph, I think that answers your question, if not let me know.

Tensorflow dimensions /placeholders

I want to run a neural network in tensorflow. I am trying to do email classification, so my training data is an array of count vectorized documents.
Im trying to understand the dimensions for how I should input data into tensorflow. I am creating placeholders like this:
X = tf.placeholder(tf.int64, [None, #features]
Y = tf.placeholder(tf.int64, [None, #labels])
then later, I have to transform the actual y_train to have dimensionality (1, #observations) since I get some dimensionality errors when I run the code.
Should the placeholders and the variables have the same dimensionality? What is the correspondence? I am getting out of memory errors, so am concerned that I have something wrong with the input dimensions.
A little unsure as to what your "#" symbols refer. This if often used to mean "number" in which case what you have written would be incorrect. To be clear you want to define your placeholders for X and Y as
X = tf.placeholder(tf.int64, [None, input_dimensions])
Y = tf.placeholder(tf.int64, [None, 1])
Here the None values accommodate the number of samples in the training data you pass in; if you feed in 10 emails, None will be 10. The input_dimensions means "how long is the vector that represents a single training example". In the case of a grey-scale image this would be equal to the number of pixels, in the case of your e-mail inputs this should be the length of the longest vectorized email.
All of your email inputs will need to be input at the same length, and a common practice for all those shorter than the longest email is to pad the vectors up to the max length with zeros.
When comparing Y to the training labels (y_train) they should both be tensors of the same shape. So as Y has shape (number_of_emails, 1), so should y_train. You can convert from (1, number_of_emails) to (number_of_emails, 1) using
y_train = tf.reshape(y_train, [-1,1])
Finally the out of memory errors are unlikely to be to do with any dimension miss-match, but more likely you are feeding too many emails into the network at once. Each time you feed in some emails as X they must be held in memory. If there are many emails, feeding them all in at once will exhaust the memory resources (particularly if training on a GPU). For this reason it is common practice to batch your inputs into smaller groups fed in sequentially. Tensorflow provides a guide to importing data, as well as specific help on batching.

What are the effects of padding a tensor?

I'm working on a problem using Keras that has been presenting me with issues:
My X data is all of shape (num_samples, 8192, 8), but my Y data is of shape (num_samples, 4), where 4 is a one-hot encoded vector.
Both X and Y data will be run through LSTM layers, but the layers are rejecting the Y data because it doesn't match the shape of the X data.
Is padding the Y data with 0s so that it matches the dimensions of the X data unreasonable? What kind of effects would that have? Is there a better solution?
Edited for clarification:
As requested, here is more information:
My Y data represents the expected output of passing the X data through my model. This is my first time working with LSTMs, so I don't have an architecture in mind, but I'd like to use an architecture that works well with classifying long (8192-length) sequences of words into one of several categories. Additionally, the dataset that I have is of an immense size when fed through an LSTM, so I'm currently using batch-training.
Technologies being used:
Keras (Tensorflow Backend)
TL;DR Is padding one tensor with zeroes in all dimensions to match another tensor's shape a bad idea? What could be a better approach?
First of all, let's make sure your representation is actually what you think it is; the input to an LSTM (or any recurrent layer, for that matter) must be of dimensionality: (timesteps, shape), i.e. if you have 1000 training samples, each consisting of 100 timesteps, with each timestep having 10 values, your input shape will be (100,10,). Therefore I assume from your question that each input sample in your X set has 8192 steps and 8 values per step. Great; a single LSTM layer can iterate over these and produce 4-dimensional representations with absolutely no problem, just like so:
myLongInput = Input(shape=(8192,8,))
myRecurrentFunction = LSTM(4)
myShortOutput = myRecurrentFunction(myLongInput)
myShortOutput.shape
TensorShape([Dimension(None), Dimension(4)])
I assume your problem stems from trying to apply yet another LSTM on top of the first one; the next LSTM expects a tensor that has a time dimension, but your output has none. If that is the case, you'll need to let your first LSTM also output the intermediate representations at each time step, like so:
myNewRecurrentFunction=LSTM(4, return_sequences=True)
myLongOutput = myNewRecurrentFunction(myLongInput)
myLongOutput.shape
TensorShape([Dimension(None), Dimension(None), Dimension(4)])
As you can see the new output is now a 3rd order tensor, with the second dimension now being the (yet unassigned) timesteps. You can repeat this process until your final output, where you usually don't need the intermediate representations but rather only the last one. (Sidenote: make sure to set the activation of your last layer to a softmax if your output is in one-hot format)
On to your original question, zero-padding has very little negative impact on your network. The network will strain itself a bit in the beginning trying to figure out the concept of the additional values you have just thrown at it, but will very soon be able to learn they're meaningless. This comes at a cost of a larger parameter space (therefore more time and memory complexity), but doesn't really affect predictive power most of the time.
I hope that was helpful.