I want to run a neural network in TensorFlow. I am doing email classification, so my training data is an array of count-vectorized documents.
I'm trying to understand the dimensions for how I should feed data into TensorFlow. I am creating placeholders like this:
X = tf.placeholder(tf.int64, [None, #features])
Y = tf.placeholder(tf.int64, [None, #labels])
Then later, I have to transform the actual y_train to have dimensionality (1, #observations), since I get dimensionality errors when I run the code.
Should the placeholders and the variables have the same dimensionality? What is the correspondence? I am getting out-of-memory errors, so I am concerned that I have something wrong with the input dimensions.
I'm a little unsure what your "#" symbols refer to. "#" is often used to mean "number", in which case what you have written would be incorrect. To be clear, you want to define your placeholders for X and Y as
X = tf.placeholder(tf.int64, [None, input_dimensions])
Y = tf.placeholder(tf.int64, [None, 1])
Here the None values accommodate the number of samples in the training data you pass in: if you feed in 10 emails, None will be 10. input_dimensions means "how long is the vector that represents a single training example". In the case of a grey-scale image this would be equal to the number of pixels; in the case of your email inputs, it should be the length of the longest vectorized email.
All of your email inputs will need to be fed in at the same length, and a common practice is to pad every vector shorter than the longest email with zeros up to the max length.
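For example, here is a minimal zero-padding sketch in numpy (the vectorized_emails list is made up for illustration):
import numpy as np

# Hypothetical variable-length count vectors:
vectorized_emails = [[3, 1, 0, 2], [1, 4], [0, 0, 5, 1, 2]]

max_len = max(len(v) for v in vectorized_emails)
x_train = np.zeros((len(vectorized_emails), max_len), dtype=np.int64)
for i, v in enumerate(vectorized_emails):
    x_train[i, :len(v)] = v  # shorter emails keep trailing zeros

print(x_train.shape)  # (3, 5) -> (number_of_emails, input_dimensions)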
When comparing Y to the training labels (y_train), they should both be tensors of the same shape. So, as Y has shape (number_of_emails, 1), y_train should too. You can convert from (1, number_of_emails) to (number_of_emails, 1) using
y_train = tf.reshape(y_train, [-1, 1])  # -1 lets TensorFlow infer number_of_emails
Finally, the out-of-memory errors are unlikely to be due to any dimension mismatch; more likely you are feeding too many emails into the network at once. Each time you feed in some emails as X, they must be held in memory, so if there are many emails, feeding them all in at once will exhaust the memory resources (particularly if training on a GPU). For this reason it is common practice to batch your inputs into smaller groups fed in sequentially. TensorFlow provides a guide to importing data, as well as specific help on batching.
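As a minimal sketch of such batching, assuming x_train and y_train are numpy arrays and that a training op, loss and session already exist (train_op, loss and sess below are illustrative names):
batch_size = 64
num_samples = x_train.shape[0]

for start in range(0, num_samples, batch_size):
    end = start + batch_size  # numpy slicing past the end is safe
    _, batch_loss = sess.run(
        [train_op, loss],
        feed_dict={X: x_train[start:end], Y: y_train[start:end]})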
Related
I'm using Python 2 with Keras (TensorFlow backend).
x = Input((32,), name="input1")
I think x's shape is (32,), but print(x) shows shape=(?, 32).
What is the meaning of shape=(?, 32)?
What do the '?' and the 32 refer to?
When you define your input with Input((32,), name="input1"), you are telling Keras that each input will be 1-dimensional with size 32. However, you might send more than one input during training/predicting. For example, if you send in 10 samples, each of length 32, you will actually send in a tensor with shape (10, 32).
Since the topology of the network does not depend on the number of samples you send in, that dimension may vary, and the shape is presented as (?, 32), where ? is the number of samples.
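As a small illustration (the Dense layer is just a hypothetical stand-in to complete the model):
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

x = Input((32,), name="input1")       # per-sample shape: (32,)
model = Model(x, Dense(1)(x))

samples = np.random.rand(10, 32)      # 10 samples of length 32
print(model.predict(samples).shape)   # (10, 1) -- the ? became 10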
While taking an input for a word-prediction model in RNN TensorFlow,
why do we need a 3-D tensor?
Please look at the code below.
Why do we need that extra 1 here?
x = tf.placeholder("float", [None, n_input, 1])
I guess you are using a simple prediction model, maybe just for demonstration. The basic idea of using an RNN here is that each word in a sentence is affected by the words before or after it (that's what we call context), so we use the sequential inputs of the words in a sentence, representing the words appearing one by one as time passes.
So we need a tensor with shape [batch_size, words_count, word_representation]. In your situation, words_count is n_input and represents the time steps, and the word representation has shape (1,), i.e. each word is a single number.
But in real practice we don't just turn each word into a (1,) tuple; we may use a word embedding to create a meaningful and useful tensor representation of each word. So I guess you have tried a simple demo, or maybe I'm making a mistake.
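To make the two options concrete, here is a small sketch (the vocabulary and embedding sizes are made up for illustration):
import numpy as np
import tensorflow as tf

n_input = 4  # time steps

# Option 1: no embedding -- each word is one number, so reshape to give
# every time step a feature axis of size 1:
word_ids = np.array([[4.0, 17.0, 9.0, 2.0]])   # shape (1, n_input)
x_batch = word_ids.reshape((-1, n_input, 1))   # shape (1, n_input, 1)

# Option 2: word embedding -- each word becomes a dense vector, so the
# last axis is the embedding size instead of 1:
vocab_size, embed_dim = 10000, 128
ids = tf.placeholder(tf.int64, [None, n_input])
embeddings = tf.get_variable("embeddings", [vocab_size, embed_dim])
embedded = tf.nn.embedding_lookup(embeddings, ids)  # (batch, n_input, 128)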
I'm working on a problem using Keras that has been presenting me with issues:
My X data is all of shape (num_samples, 8192, 8), but my Y data is of shape (num_samples, 4), where 4 is a one-hot encoded vector.
Both X and Y data will be run through LSTM layers, but the layers are rejecting the Y data because it doesn't match the shape of the X data.
Is padding the Y data with 0s so that it matches the dimensions of the X data unreasonable? What kind of effects would that have? Is there a better solution?
Edited for clarification:
As requested, here is more information:
My Y data represents the expected output of passing the X data through my model. This is my first time working with LSTMs, so I don't have an architecture in mind, but I'd like to use an architecture that works well with classifying long (8192-length) sequences of words into one of several categories. Additionally, the dataset that I have is of an immense size when fed through an LSTM, so I'm currently using batch-training.
Technologies being used:
Keras (TensorFlow backend)
TL;DR Is padding one tensor with zeroes in all dimensions to match another tensor's shape a bad idea? What could be a better approach?
First of all, let's make sure your representation is actually what you think it is; the input to an LSTM (or any recurrent layer, for that matter) must have per-sample dimensionality (timesteps, features): if you have 1000 training samples, each consisting of 100 timesteps with 10 values per timestep, your per-sample input shape will be (100, 10). Therefore I assume from your question that each input sample in your X set has 8192 steps and 8 values per step. Great; a single LSTM layer can iterate over these and produce 4-dimensional representations with absolutely no problem, just like so:
myLongInput = Input(shape=(8192,8,))
myRecurrentFunction = LSTM(4)
myShortOutput = myRecurrentFunction(myLongInput)
myShortOutput.shape
TensorShape([Dimension(None), Dimension(4)])
I assume your problem stems from trying to apply yet another LSTM on top of the first one; the next LSTM expects a tensor that has a time dimension, but your output has none. If that is the case, you'll need to let your first LSTM also output the intermediate representations at each time step, like so:
myNewRecurrentFunction=LSTM(4, return_sequences=True)
myLongOutput = myNewRecurrentFunction(myLongInput)
myLongOutput.shape
TensorShape([Dimension(None), Dimension(None), Dimension(4)])
As you can see, the new output is now a 3rd-order tensor, with the second dimension being the (yet unassigned) timesteps. You can repeat this process until your final output, where you usually don't need the intermediate representations but rather only the last one. (Side note: make sure to set the activation of your last layer to softmax if your output is in one-hot format.)
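Putting the pieces described above together, a minimal sketch of such a stack might look like this (the layer sizes are illustrative, not a recommendation):
from keras.layers import Input, LSTM, Dense
from keras.models import Model

inp = Input(shape=(8192, 8))
h = LSTM(16, return_sequences=True)(inp)  # keeps the time dimension
h = LSTM(16)(h)                           # last timestep only
out = Dense(4, activation='softmax')(h)   # one-hot targets -> softmax
model = Model(inp, out)
model.compile(optimizer='adam', loss='categorical_crossentropy')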
On to your original question: zero-padding has very little negative impact on your network. The network will strain a bit in the beginning trying to figure out the additional values you have just thrown at it, but will very soon learn that they are meaningless. This comes at the cost of a larger parameter space (and therefore more time and memory complexity), but it doesn't really affect predictive power most of the time.
I hope that was helpful.
Many of the examples I have learnt to code with take scalar input numbers. I want to try vector input, using the example at https://github.com/tencia/stocks_rnn as a starting point.
I tried to change the code to input [x, x^2] instead of x, with the following two changes, but I get an error.
In STOCKLSTM:
self._input_data = tf.placeholder(tf.float32, [2, batch_size, num_steps])
In main/Epoch
cost, state, _ = session.run([m.cost, m.final_state, eval_op],
                             {m.input_data: (x, x**2), m.targets: y, m.initial_state: state})
ERROR:
ValueError: Cannot feed value of shape (2, 30, 10) for Tensor u'model/Placeholder:0', which has shape '(30, 10)'
Any ideas whether this line of thought is correct? I feel severely punished for skipping the tensor classes in grad school :(
Karma
Here the problem is giving the batch size as a fixed value in the placeholder. Make that dimension None instead of a number, i.e. [2, None, num_steps]; that way the placeholder can accept any amount of batch data. When using placeholders, don't pin down every dimension, because they are flexible structures.
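A minimal sketch of that fix (names and sizes are illustrative), including one way to stack x and x**2 into the shape the placeholder expects:
import numpy as np
import tensorflow as tf

num_steps = 10

# Batch axis left as None so any number of samples can be fed:
input_data = tf.placeholder(tf.float32, [2, None, num_steps])

x = np.random.rand(30, num_steps)
feed_value = np.stack([x, x ** 2])  # shape (2, 30, 10) matches [2, None, 10]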
I am trying to pass a list of 2d numpy arrays with different sizes to a convolutional neural network using feed_dict parameter.
x = tf.placeholder(tf.float32, [batch_size, None, None, None])
y = tf.placeholder(tf.float32, [batch_size, 1])
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
optimizer.run(feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5})
and I am getting the following error :
ValueError: setting an array element with a sequence.
I understood that batch[0] has to contain arrays with the same size.
I am trying to find a way to apply the optimization using a variable-sized batch of arrays, but all the suggested solutions ask to resize the arrays, which is not possible in my case because these arrays are not images: they contain DNA fragments of different sizes (any modification to any element of an array would cause a loss of important information).
Does anyone have an idea?
The matrix provided needs to have a consistent size across rows and columns: one row or column cannot be a different size than any other.
Matrix #1        Matrix #2
1 2 3            1 2 3
None             4 5 6
None             7 8 9
No operations will work on Matrix #1, which is essentially what you have. If you want to feed in variable-size matrices (different sizes among matrices, but a consistent size within each matrix's rows and columns), this may solve your problem:
Args:
shape: The shape of the tensor to be fed (optional). If the shape is
not specified, you can feed a tensor of any shape.
Or, if you are looking for a sparse tensor (tf.sparse_placeholder() -- undefined elements are set to zero), this question may help.
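For instance, a minimal sketch of a shape-less placeholder fed one variable-sized array at a time; the trade-off is that differently-sized samples can no longer be stacked into one dense batch, so you would train with batch size 1:
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=None)  # no shape: any shape accepted

with tf.Session() as sess:
    print(sess.run(x, feed_dict={x: np.random.rand(5, 7)}).shape)   # (5, 7)
    print(sess.run(x, feed_dict={x: np.random.rand(12, 3)}).shape)  # (12, 3)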