I've tried searching but I can't find an answer to this... I am working on a neural network that derives its training data from the outputs of an array of other neural networks when they are fed simulated data. There is no "dataset"; each minibatch generates a new set of simulated data. So, for it to run efficiently, I need to aggregate the outputs from the array of other networks inside a tf.function (which will run once per minibatch). But... I cannot for the life of me figure out whether there is a way to loop over an array of non-Tensors -- everything I find is about indexing into tensors (e.g. with tf.gather).
To be clear, the elements of the array are objects that contain tf.keras.models.Sequential models along with further tf.functions that transform their outputs and train those networks, and they all work great on-graph. I just need to be able to hook into them somehow. Here is a very rough, simplified version of what I'm trying to do:
networks_outputs = tf.TensorArray(size=N, dtype=tf.float32, element_shape=blah, infer_shape=True)
for n in tf.range(N):
    input_data_n = tf.concat([input_data[:, n], input_data[:, N:N + K]], axis=1)
    network_output = networks_array[n].some_tf_function(input_data_n)
    networks_outputs = networks_outputs.write(n, network_output)
networks_outputs = networks_outputs.stack()
-- but this gives me the error:
TypeError: list indices must be integers or slices, not Tensor
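For what it's worth, a minimal sketch of one possible workaround, assuming networks_array, N and K are plain Python values available at tracing time: iterate over the Python list itself so the index is never a Tensor (the loop is simply unrolled when the tf.function is traced).

import tensorflow as tf

@tf.function
def aggregate(input_data):
    outputs = []
    # Plain Python loop: n is a Python int, so indexing the Python list
    # networks_array[n] is legal; the body is unrolled at tracing time.
    for n, net in enumerate(networks_array):
        # Keep the n-th column 2-D so it concatenates with the (batch, K) slice
        # (an assumption about the intended shapes).
        input_data_n = tf.concat(
            [input_data[:, n:n + 1], input_data[:, N:N + K]], axis=1)
        outputs.append(net.some_tf_function(input_data_n))
    return tf.stack(outputs)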
Since I am not very experienced, I am struggling with a Siamese twin network.
I have 2 images which run through the same CNN and each generate a distinct feature vector. I would like to train a further network to interpret these two image vectors (each with 32 elements). As an intermediate step I would like to use these vectors as input to a function NCC, which sits as a layer between the CNN and the NN and is defined in the following snippet (i.e. its output should be used by the next NN):
import tensorflow as tf
from tensorflow.keras.layers import Flatten

def NCC(a, b):
    # a, b: feature vectors of length l (here l = 32)
    l = a.shape[1]
    av_a = tf.math.reduce_mean(a)
    av_b = tf.math.reduce_mean(b)
    a = a - av_a
    b = b - av_b
    norm_a = tf.math.sqrt(tf.math.reduce_sum(a * a))
    norm_b = tf.math.sqrt(tf.math.reduce_sum(b * b))
    a = a / norm_a
    b = b / norm_b
    A = tf.reshape(tf.repeat(a, axis=0, repeats=l), (l, l))
    B = tf.reshape(tf.repeat(b, axis=0, repeats=l), (l, l))
    ncc = Flatten()(A * tf.transpose(B))
    return ncc
The output vector (for batch size 1) should have 32x32 = 1024 elements. It seems to work for a batch size of 1. If I increase the batch size I run into trouble, because the input vectors are now tensors with shape=(batch_size, 32). I think this is a very stupid question, but how can I circumvent this issue? (Note that I also want the output tensor to have shape=(batch_size, 1024).)
Thanks in advance
Mike
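One hedged sketch of a batched variant (my own reading of the intended behaviour, not from the thread): normalise each sample's vector separately, then take the per-sample outer product with tf.einsum and flatten it, giving shape (batch_size, 1024) for 32-element vectors.

import tensorflow as tf

def ncc_batched(a, b):
    # a, b: shape (batch_size, 32); output: shape (batch_size, 1024)
    l = a.shape[1]
    a = a - tf.reduce_mean(a, axis=1, keepdims=True)
    b = b - tf.reduce_mean(b, axis=1, keepdims=True)
    a = a / tf.norm(a, axis=1, keepdims=True)
    b = b / tf.norm(b, axis=1, keepdims=True)
    # Per-sample outer product b_i * a_j, matching A * tf.transpose(B) above
    outer = tf.einsum('bi,bj->bij', b, a)
    return tf.reshape(outer, (-1, l * l))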
When using pre-trained word vectors for classification with an LSTM, I wondered how to deal with an embedding lookup table larger than 2 GB in TensorFlow.
To do this, I tried to build the embedding lookup table as in the code below,
data = tf.nn.embedding_lookup(vector_array, input_data)
but got this ValueError:
ValueError: Cannot create a tensor proto whose content is larger than 2GB
The variable vector_array in the code is a NumPy array, and it contains about 14 million unique tokens with a 100-dimensional word vector for each word.
Thank you for your help.
You need to copy it into a tf.Variable. There's a great answer to this question on StackOverflow:
Using a pre-trained word embedding (word2vec or Glove) in TensorFlow
This is how I did it:
embedding_weights = tf.Variable(
    tf.constant(0.0, shape=[embedding_vocab_size, EMBEDDING_DIM]),
    trainable=False, name="embedding_weights")
embedding_placeholder = tf.placeholder(tf.float32, [embedding_vocab_size, EMBEDDING_DIM])
embedding_init = embedding_weights.assign(embedding_placeholder)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(embedding_init, feed_dict={embedding_placeholder: embedding_matrix})
You can then use the embedding_weights variable to perform the lookup (remember to store the word-to-index mapping).
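For example, a hypothetical lookup on top of it (the input_ids placeholder is my own name, not from the answer):

# input_ids holds token indices from your word-to-index mapping
input_ids = tf.placeholder(tf.int32, shape=[None, None], name="input_ids")
embedded = tf.nn.embedding_lookup(embedding_weights, input_ids)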
Update: using the variable is not required, but it allows you to save it for future use so that you don't have to redo the whole thing again (it takes a while on my laptop to load very large embeddings). If that's not important, you can simply use placeholders, as Niklas Schnelle suggested.
For me the accepted answer doesn't seem to work. While there is no error, the results were terrible (compared to a smaller embedding via direct initialization), and I suspect the embeddings were just the constant 0 that the tf.Variable() is initialized with.
Using just a placeholder without an extra variable
self.Wembed = tf.placeholder(
    tf.float32, self.embeddings.shape,
    name='Wembed')
and then feeding the embedding on every session.run() of the graph seems to work however.
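Roughly like this (a sketch with made-up names such as logits and batch_ids, just to show the feed):

# Feed the full embedding matrix on every step, alongside the regular inputs
out = sess.run(logits, feed_dict={self.Wembed: self.embeddings,
                                  input_ids: batch_ids})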
Using feed_dict with large embeddings was too slow for me with TF 1.8, probably due to the issue mentioned by Niklas Schnelle.
I ended up with the following code:
embeddings_ph = tf.placeholder(tf.float32, wordVectors.shape, name='wordEmbeddings_ph')
embeddings_var = tf.Variable(embeddings_ph, trainable=False, name='wordEmbeddings')
embeddings = tf.nn.embedding_lookup(embeddings_var, input_data)
.....
sess.run(tf.global_variables_initializer(), feed_dict={embeddings_ph:wordVectors})
I am relatively new to TensorFlow, but even with a lot of research I was unable to find documentation for the meaning of certain variables.
For my current project, I want to train a DNN with the help of tensorflow, and afterwards I want to extract the weight and bias matrices from it to use it in another application OUTSIDE tensorflow. For the first try, I set up a simple network with a [4, 10, 2] structure, which predicts a binary outcome.
I used 3 real_valued_columns and a single sparse_column_with_keys (wrapped in an embedding_column) as features:
def build_estimator(optimizer=None, activation_fn=tf.sigmoid):
    """Build an estimator."""
    # Sparse base columns
    column_stay_point = tf.contrib.layers.sparse_column_with_keys(
        column_name='stay_point',
        keys=['no', 'yes'])
    # Continuous base columns
    column_heading = tf.contrib.layers.real_valued_column('heading')
    column_velocity = tf.contrib.layers.real_valued_column('velocity')
    column_acceleration = tf.contrib.layers.real_valued_column('acceleration')
    pedestrian_feature_columns = [column_heading,
                                  column_velocity,
                                  column_acceleration,
                                  tf.contrib.layers.embedding_column(
                                      column_stay_point,
                                      dimension=8,
                                      initializer=tf.truncated_normal_initializer)]
    # Create classifier
    estimator = tf.contrib.learn.DNNClassifier(
        hidden_units=[10],
        feature_columns=pedestrian_feature_columns,
        model_dir='./tmp/pedestrian_model',
        n_classes=2,
        optimizer=optimizer,
        activation_fn=activation_fn)
    return estimator
I called this function with default arguments and used estimator.fit(...) to train the DNN. Aside from some warnings concerning the deprecated 'scalar_summary' function, it ran successfully and produced reasonable results. I printed all variables of the model by using the following line:
var = {k: estimator.get_variable_value(k) for k in estimator.get_variable_names()}
I expected to get weight matrices of size 10x4 and 2x10 as well as bias matrices of size 10x1 and 2x1. But I got the following:
'dnn/binary_logistic_head/dnn/learning_rate': 0.05 (actual value, scalar)
'dnn/input_from_feature_columns/stay_point_embedding/weights': 2x8 array
'dnn/hiddenlayer_0/weights/hiddenlayer_0/weights/part_0/Adagrad': 11x10 array
'dnn/input_from_feature_columns/stay_point_embedding/weights/int_embedding/weights/part_0/Adagrad': 2x8 array
'dnn/hiddenlayer_0/weights': 11x10 array
'dnn/logits/biases': 1x1 array
'dnn/logits/weights/nn/dnn/logits/weights/part_0/Adagrad': 10x1 array
'dnn/logits/weights': 10x1 array
'dnn/logits/biases/dnn/dnn/logits/biases/part_0/Adagrad': 1x1 array
'global_step': 5800, (actual value, scalar)
'dnn/hiddenlayer_0/biases': 1x10 array
'dnn/hiddenlayer_0/biases//hiddenlayer_0/biases/part_0/Adagrad': 1x10 array
Is there any documentation of what these cryptic names mean and why the matrices have these weird dimensions? Also, why are there references to the Adagrad optimizer despite my never specifying it?
Any help is highly appreciated!
The number of input nodes in your network is 11, not 4:
8 (embedding_column) + 1 (column_heading) + 1 (column_velocity) + 1 (column_acceleration) = 11
And based on the variable names, the output is a single binary logistic node, so the number of output nodes is one, not 2.
Below are the weights/biases you are interested in.
'dnn/hiddenlayer_0/weights': 11x10 array --> these are the weights from the inputs to the hidden nodes
'dnn/hiddenlayer_0/biases': 1x10 array --> the biases of the hidden nodes
'dnn/logits/weights': 10x1 array --> the weights from the hidden nodes to the output node
'dnn/logits/biases': 1x1 array --> the bias of the output node
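To make that concrete, a small sketch pulling these out of the var dict built above (shapes as printed in the question):

W_hidden = var['dnn/hiddenlayer_0/weights']   # 11x10: inputs -> hidden nodes
b_hidden = var['dnn/hiddenlayer_0/biases']    # 1x10:  biases of the hidden nodes
W_out = var['dnn/logits/weights']             # 10x1:  hidden nodes -> output node
b_out = var['dnn/logits/biases']              # 1x1:   bias of the output node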
why are there references to the Adagrad optimizer despite never specifying it?
Most probably the default optimizer is AdaGrad.
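If you want something else, you can pass it explicitly; a purely illustrative call using the build_estimator function above:

estimator = build_estimator(
    optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.05))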
I am trying to implement a memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weights are different from the classic weight matrices between layers, which are automatically updated by the fit() function! My problem is the following: how can I correctly initialize these weights as Keras tensors and use them in the model? I explain it better with the following simplified example.
My API model is something like:
input = Input(shape=(5,6))
controller = LSTM(20, activation='tanh',stateful=False, return_sequences=True)(input)
write_key = Dense(4,activation='tanh')(controller)
read_key = Dense(4,activation='tanh')(controller)
w_w = Add()([w_u, w_r]) #<---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M,to_write])
cos_sim = Dot()([M,read_key])
w_r = Lambda(lambda x: softmax(x,axis=1))(cos_sim) #<---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u,w_r,w_w]) #<---- UPDATE OF USAGE WEIGHTS
retrieved_memory = Dot()([w_r,M])
controller_output = concatenate([controller,retrieved_memory])
final_output = Dense(6,activation='sigmoid')(controller_output)
You can see that, in order to compute w_w^t, I first have to define w_r^{t-1} and w_u^{t-1}. So, at the beginning I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10,4)))    # MEMORY
w_r = K.variable(numpy.zeros((1,10)))  # READ WEIGHTS
w_u = K.variable(numpy.zeros((1,10)))  # USAGE WEIGHTS
But, analogously to what was said in #2486 (entron), these commands do not return a Keras tensor with all the needed metadata, and so this raises the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
I also thought of using the old M, w_r and w_u as further inputs at each iteration and analogously getting the same variables as outputs, to complete the loop. But this means that I have to use the fit() function to train the model online with just the target as final output (Model 1), and use the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I also have to pass the weight matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a bit messy and too slow.
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.
I'm taking my first steps learning TF and have some trouble training RNNs.
My toy problem goes like this: a two-layer LSTM + dense-layer network is fed raw audio data and should test whether a certain frequency is present in the sound.
So the network should map, one to one, float (audio data sequence) to float (pre-chosen frequency volume).
I've got this to work in Keras and have seen a similar TFLearn solution, but I would like to implement it in bare TensorFlow in a relatively efficient way.
What I've done:
lstm = rnn_cell.BasicLSTMCell(LSTM_SIZE, state_is_tuple=True, forget_bias=1.0)
lstm = rnn_cell.DropoutWrapper(lstm)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * 2, state_is_tuple=True)
# `in` is a reserved word in Python, so the input tensor is called `inputs` here;
# it has shape (BATCH_SIZE, SEQUENCE_LEN, 1)
outputs, states = rnn.dynamic_rnn(stacked_lstm, inputs, dtype=tf.float32)
outputs = tf.transpose(outputs, [1, 0, 2])
last = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)
network = tf.matmul(last, W) + b
# cost function, optimizer etc...
During training I fed this with (BATCH_SIZE, SEQUENCE_LEN, 1) batches, and the loss seemed to converge correctly, but I can't figure out how to predict with the trained network.
My (awful lot of) questions:
How do I make this network return a sequence straight from TensorFlow, without going back to Python for each sample (i.e. feed a sequence and predict a sequence of the same size)?
If I do want to predict one sample at a time and iterate in Python, what is the correct way to do it?
During testing, is dynamic_rnn needed, or is it just used for unrolling for BPTT during training? Why does dynamic_rnn return the tensors for all the back-propagation steps? These are the outputs of each layer of the unrolled network, right?
After some research:

"How do I make this network return a sequence straight from TensorFlow, without going back to Python for each sample (i.e. feed a sequence and predict a sequence of the same size)?"

You can use state_saving_rnn:
class Saver():
    def __init__(self):
        self.d = {}

    def state(self, name):
        if not name in self.d:
            return tf.zeros([1, LSTM_SIZE], tf.float32)
        return self.d[name]

    def save_state(self, name, val):
        self.d[name] = val
        return tf.identity('save_state_name')  # <- important for control_dependencies

outputs, states = rnn.state_saving_rnn(stacked_lstm, inx, Saver(),
                                       ('lstmstate', 'lstmstate2', 'lstmstate3', 'lstmstate4'),
                                       sequence_length=[EVAL_SEQ_LEN])
# 4 states are for two layers of LSTM; each has hidden and CEC variables to restore
network = [tf.matmul(outputs[-1], W) for i in xrange(EVAL_SEQ_LEN)]
One problem is that state_saving_rnn uses rnn() and not dynamic_rnn(), and therefore unrolls EVAL_SEQ_LEN steps at graph-construction time; you might want to re-implement state_saving_rnn with dynamic_rnn if you want to feed long sequences.
"If I do want to predict one sample at a time and iterate in Python, what is the correct way to do it?"
You can use dynamic_rnn and supply initial_state. This is probably just as efficient as state_saving_rnn; look at the state_saving_rnn implementation for reference.
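A rough sketch of that approach (names follow the snippets above; TF 1.x lets you feed the state tensors returned by a previous run, and batch size 1 is assumed):

# Build the graph once, exposing the state so it can be fed back in
initial_state = stacked_lstm.zero_state(1, tf.float32)
outputs, final_state = rnn.dynamic_rnn(stacked_lstm, inputs,
                                       initial_state=initial_state,
                                       dtype=tf.float32)

# At prediction time, thread the state through successive session.run calls
state = sess.run(initial_state)
for sample in samples:  # each sample: one (1, seq_len, 1) chunk
    out, state = sess.run([outputs, final_state],
                          feed_dict={inputs: sample, initial_state: state})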
"During testing, is dynamic_rnn needed, or is it just used for unrolling for BPTT during training? Why does dynamic_rnn return the tensors for all the back-propagation steps? These are the outputs of each layer of the unrolled network, right?"
dynamic_rnn does its unrolling at runtime, similarly to the compile-time rnn(). I guess it returns all the steps so you can branch the graph off at other places, i.e. after fewer time steps. In a network that uses [one time-step input * current state -> one output, new state], like the one described above, it's not needed at test time, but it could be used for training with truncated backpropagation through time.