How to expand a Tensorflow Variable - tensorflow

Is there any way to make a Tensorflow Variable larger? Like, let's say I wanted to add a neuron to a layer of a neural network in the middle of training. How would I go about doing that? An answer to this question told me how to change the shape of the variable to expand it to fit another row of weights, but I don't know how to initialize those new weights.
I figure another way of going about this might involve combining variables, as in initializing the weights first in a second variable and then adding that in as a new row or column of the first variable, but I can't find anything that lets me do that either.

There are various ways you could accomplish this.
1) The second answer in that post (https://stackoverflow.com/a/33662680/5548115) explains how you can change the shape of a variable by calling 'assign' with validate_shape=False. For example, you could do something like
# Assume var is [m, n]
# Add the new 'data' of shape [1, n] with new values
new_neuron = tf.constant(...)
# If concatenating to add a row, concat on the first dimension.
# If new_neuron was [m, 1], you would concat on the second dimension.
new_variable_data = tf.concat(0, [var, new_neuron]) # [m+1, n]
resize_var = tf.assign(var, new_variable_data, validate_shape=False)
Then when you run resize_var, the data pointed to by 'var' will now have the updated data.
2) You could also create a large initial variable and call tf.slice on different regions of the variable as training progresses, since you can dynamically change the 'begin' and 'size' arguments of the slice.
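For example, a rough sketch of that second approach (max_rows, n and active_rows below are my own placeholder names, not from the answer):
# over-allocate the weight matrix, then slice out only the currently active rows
big_var = tf.Variable(tf.random_normal([max_rows, n]))
active_rows = tf.placeholder(tf.int32, shape=[])
slice_size = tf.stack([active_rows, tf.shape(big_var)[1]])
active_weights = tf.slice(big_var, [0, 0], slice_size)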

You can simply use tf.concat to expand a Tensorflow Variable; see the api_docs for details.
v1 = tf.Variable(tf.zeros([5, 3]), dtype=tf.float32)
v2 = tf.Variable(tf.zeros([1, 3]), dtype=tf.float32)
v3 = tf.concat(0, [v1, v2])
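Note that in TensorFlow 1.0 and later the argument order of tf.concat changed, so the concat calls above would be written as:
v3 = tf.concat([v1, v2], axis=0) # shape [6, 3]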

Figured it out. It's kind of a roundabout process, but it's the only one I can tell actually works. You need to first unpack the variables, then append the new variable to the end, then pack them back together.
If you're expanding along the first dimension, it's rather short: only 7 lines of actual code.
#the first variable is 5x3
v1 = tf.Variable(tf.zeros([5, 3], dtype=tf.float32), name="1")
#the second variable is 1x3
v2 = tf.Variable(tf.zeros([1, 3], dtype=tf.float32), name="2")
#unpack the first variable into a list of size 3 tensors
#there should be 5 tensors in the list
change_shape = tf.unpack(v1)
#unpack the second variable into a list of size 3 tensors
#there should be 1 tensor in this list
change_shape_2 = tf.unpack(v2)
#for each tensor in the second list, append it to the first list
for i in range(len(change_shape_2)):
    change_shape.append(change_shape_2[i])
#repack the list of tensors into a single tensor
#the shape of this resultant tensor should be [6, 3]
final = tf.pack(change_shape)
If you want to expand along the second dimension, it gets somewhat longer.
#First variable, 5x3
v3 = tf.Variable(tf.zeros([5, 3], dtype=tf.float32))
#second variable, 5x1
v4 = tf.Variable(tf.zeros([5, 1], dtype=tf.float32))
#unpack tensors into lists of size 3 tensors and size 1 tensors, respectively
#both lists will hold 5 tensors
change = tf.unpack(v3)
change2 = tf.unpack(v4)
#for each tensor in the first list, unpack it into its own list
#this should make a 2d array of size 1 tensors, array will be 5x3
changestep2 = []
for i in range(len(change)):
    changestep2.append(tf.unpack(change[i]))
#do the same thing for the second tensor
#2d array of size 1 tensors, array will be 5x1
change2step2 = []
for i in range(len(change2)):
    change2step2.append(tf.unpack(change2[i]))
    #for each tensor in the array, append it onto the corresponding array in the first list
    for j in range(len(change2step2[i])):
        changestep2[i].append(change2step2[i][j])
    #pack the lists in the array back into tensors
    changestep2[i] = tf.pack(changestep2[i])
#pack the list of tensors into a single tensor
#the shape of this resultant tensor should be [5, 4]
final2 = tf.pack(changestep2)
I don't know if there's a more efficient way of doing this, but it works as far as it goes. Expanding along further dimensions would require more layers of lists as necessary.
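As an aside, tf.unpack and tf.pack were renamed tf.unstack and tf.stack in TensorFlow 1.0, so the first example would look like this with the newer names (a minimal sketch):
change_shape = tf.unstack(v1) + tf.unstack(v2)
final = tf.stack(change_shape) # shape [6, 3]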

Related

using gather on argmax is different than taking max

I'm trying to learn to train a double-DQN algorithm on TensorFlow and it doesn't work. To make sure everything is fine, I wanted to test something: that using tf.gather on the argmax is exactly the same as taking the max. Let's say I have a network called target_network.
First, let's take the max:
next_qvalues_target1 = target_network.get_symbolic_qvalues(next_obs_ph) #returns tensor of qvalues
next_state_values_target1 = tf.reduce_max(next_qvalues_target1, axis=1)
Now let's try it a different way, using argmax and gather:
next_qvalues_target2 = target_network.get_symbolic_qvalues(next_obs_ph) #returns same tensor of qvalues
chosen_action = tf.argmax(next_qvalues_target2, axis=1)
next_state_values_target2 = tf.gather(next_qvalues_target2, chosen_action)
diff = tf.reduce_sum(next_state_values_target1) - tf.reduce_sum(next_state_values_target2)
next_state_values_target2 and next_state_values_target1 are supposed to be completely identical, so running the session should output diff = 0, but it does not.
What am I missing?
Thanks.
Found out what went wrong. chosen_action is of shape (n, 1), so I thought that using gather on a tensor that's (n, 4) would give me a result of shape (n, 1). Turns out this isn't true. I needed to turn chosen_action into a tensor of shape (n, 2): instead of [action1, action2, action3, ...] I needed [[0, action1], [1, action2], [2, action3], ...], and to use gather_nd rather than gather to take specific elements from next_qvalues_target2, because gather takes complete rows.
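A rough sketch of that fix (the index construction below is my own illustration, not code from the question; note the row indices start at 0):
# build pairs [[0, action_0], [1, action_1], ...] and pick one Q-value per row
batch_size = tf.shape(next_qvalues_target2)[0]
row_indices = tf.cast(tf.range(batch_size), tf.int64)
gather_indices = tf.stack([row_indices, chosen_action], axis=1)
next_state_values_target2 = tf.gather_nd(next_qvalues_target2, gather_indices)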

Word2Vec + LSTM on API Sequence

I am trying to apply word2vec and LSTM to a dataset of files' API trace logs, which include API function calls and their parameters, for a binary classification.
The data looks like:
File_ID, Label, API Trace log
1, M, kernel32 LoadLibraryA kernel32.dll
kernel32 GetProcAddress MZ\x90 ExitProcess
...
2, V, kernel32 GetModuleHandleA RPCRT4.dll
kernel32 GetCurrentThreadId d\x8B\x0D0 POINTER POINTER
...
Each API trace includes: module name, API function name, and parameters (separated by blank spaces).
Taking the first API trace of file 1 as an example: kernel32 is the module name, LoadLibraryA is the function name, and kernel32.dll is the parameter. API traces are separated by \n, so each line represents one API call in sequence.
First I trained a word2vec model on the line sentences of all the API trace logs. There are about 5k API function calls, e.g. LoadLibraryA, GetProcAddress. However, because parameter values can vary, the model becomes quite big (a vocabulary of 300,000) after including those parameters.
After that, I trained an LSTM using word2vec's embedding_weights; the model structure looks like:
model = Sequential()
model.add(Embedding(output_dim=vocab_dim, input_dim=n_symbols, \
mask_zero=False, weights=[embedding_weights], \
trainable=False))
model.add(LSTM(dense_dim,kernel_initializer='he_normal', dropout=0.15,
recurrent_dropout=0.15, implementation=2))
model.add(Dropout(0.3))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=batch_size, callbacks=[early_stopping, parallel_check_cb])
The way I get embedding_weights is to create a matrix: for each word in the word2vec model's vocabulary, map the word's index in the model to its vector.
def create_embedding_weights(model, max_index=0):
    # dimensionality of your word vectors
    num_features = len(model[model.vocab.keys()[0]])
    n_symbols = len(model.vocab) + 1  # adding 1 to account for 0th index (for masking)
    # Only word2vec feature set
    embedding_weights = np.zeros((max(n_symbols + 1, max_index + 1), num_features))
    for word, value in model.vocab.items():
        embedding_weights[value.index, :] = model[word]
    return embedding_weights
For the training data, what I did is, for each word in an API call, convert the actual word to its index in the word2vec model so that it's consistent with the indices in embedding_weights above, e.g. kernel32 -> 0, LoadLibraryA -> 1, kernel32.dll -> 2, GetProcAddress -> 4, MZ\x90 -> 5, ExitProcess -> 6.
So the training data for file 1 looks like [0, 1, 2, 3, 4, 5, 6]. Note that I didn't split the lines for each API trace, so the model may not know where each API trace starts and ends. And the training accuracy of the model is pretty bad - accuracy is 50% :(
My question is: when preparing the training and validation datasets, should I also split the lines when mapping the actual words to their indices? Then the above training data would change to the following, with each API trace on its own line, maybe padding the missing values with -1, which doesn't exist in word2vec's indices.
[[0, 1, 2, -1]
[3, 4, 5, 6]]
Meanwhile, I am using a very simple structure for training while the word2vec model is quite big, so any suggestions on the structure would also be appreciated.
I would at least split the trace lines in three:
Module (make a dictionary and an embedding)
Function (make a dictionary and an embedding)
Parameters (make a dictionary and an embedding - see details later)
Since this is a very specific application, I believe it would be best to keep the embeddings trainable (the whole point of the embeddings is to create meaningful vectors, and the meanings depend a lot on the model that is going to use them. Question: how did you create the word2vec model? From what data does it learn?).
This model would have more inputs. All of them as integers from zero to max dictionary index. Consider using mask_zero=True and padding all files to maxFileLines.
moduleInput = Input((maxFileLines,))
functionInput = Input((maxFileLines,))
For the parameters, I'd probably make a subsequence as if the list of parameters were a sentence. (Again, mask_zero=True, and pad up to maxNumberOfParameters)
parametersInput = Input((maxFileLines, maxNumberOfParameters))
Function and module embeddings:
moduleEmb = Embedding(.....mask_zero=True,)(moduleInput)
functionEmb = Embedding(.....mask_zero=True)(functionInput)
Now, for the parameters, I thought of creating a sequence of sequences (maybe this is too much). For that, I first transfer the lines dimension to the batch dimension and work with only length = maxNumberOfParameters:
paramEmb = Lambda(lambda x: K.reshape(x,(-1,maxNumberOfParameters)))(parametersInput)
paramEmb = Embedding(....,mask_zero=True)(paramEmb)
paramEmb = Lambda(lambda x: K.reshape(x,(-1,maxFileLines,embeddingSize)))(paramEmb)
Now we concatenate all of them in the last dimension and we're ready to get into the LSTMs:
joinedEmbeddings = Concatenate()([moduleEmb,functionEmb,paramEmb])
out = LSTM(...)(joinedEmbeddings)
out = ......
model = Model([moduleInput,functionInput,parametersInput], out)
How to prepare the inputs
With this model, you need three separate inputs. One for the module, one for the function and one for the parameters.
These inputs will contain only indices (no vectors), and they don't need a previous word2vec model: the Embedding layers themselves act as word2vec-style transformers.
So, get the file lines and split. First we split by commas, then we split the API calls by spaces:
import numpy as np
#read the file
loadedFile = open(fileName,'r')
allLines = [l.strip() for l in loadedFile.readlines()]
loadedFile.close()
#split by commas
splitLines = []
for l in allLines[1:]: #use 1 here only if you have headers in the file
    splitLines.append(l.split(','))
splitLines = np.array(splitLines)
#get the split values and separate ids, targets and calls
ids = splitLines[:,0]
targets = splitLines[:,1]
calls = splitLines[:,2]
#split the calls by space, adding dummy parameters (spaces) to the max length
splitCalls = []
for c in calls:
    splitC = c.strip().split(' ')
    #pad the parameters (space for dummy params)
    for i in range(len(splitC),maxParams+2):
        splitC.append(' ')
    splitCalls.append(splitC)
splitCalls = np.array(splitCalls)
modules = splitCalls[:,0]
functions = splitCalls[:,1]
parameters = splitCalls[:,2:] #notice the parameters have an extra dimension
Now lets make the indices:
modIndices, modCounts = np.unique(modules,return_counts=True)
funcIndices, funcCounts = np.unique(functions,return_counts=True)
#for the parameters, let's flatten the array first (because we have 2 dimensions)
flatParams = parameters.reshape((parameters.shape[0]*parameters.shape[1],))
paramIndices, paramCounts = np.unique(flatParams,return_counts=True)
These will create a list of unique words and their counts. Here you can customize which words you're going to group into an "another word" class (maybe based on the counts: if a count is too small, make that word "another word").
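A tiny sketch of that idea (the threshold of 5 and the 'OTHER_PARAM' token are arbitrary choices of mine):
#replace parameters seen fewer than 5 times with a single catch-all token
rareParams = paramIndices[paramCounts < 5]
parameters = np.where(np.isin(parameters, rareParams), 'OTHER_PARAM', parameters)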
Let's then make the dictionaries:
def createDic(uniqueWords):
    dic = {}
    for i,word in enumerate(uniqueWords):
        dic[word] = i + 1 # +1 because we want to reserve the zeros for padding
    return dic
Just take care with the parameters, because we used a dummy space there:
moduleDic = createDic(modIndices)
funcDic = createDic(funcIndices)
paramDic = createDic(paramIndices[1:]) #make sure the space got the first position here
paramDic[' '] = 0
Well, now we just replace the original values:
moduleData = [moduleDic[word] for word in modules]
funcData = [funcDic[word] for word in functions]
paramData = [[paramDic[word] for word in paramLine] for paramLine in parameters]
Pad them:
for i in range(len(moduleData),maxFileLines):
    moduleData.append(0)
    funcData.append(0)
    paramData.append([0] * maxParams)
Do this for every file, and store in a list of files:
moduleTrainData = []
functionTrainData = []
paramTrainData = []
#for each file, do the above and:
moduleTrainData.append(moduleData)
functionTrainData.append(funcData)
paramTrainData.append(paramData)
moduleTrainData = np.asarray(moduleTrainData)
functionTrainData = np.asarray(functionTrainData)
paramTrainData = np.asarray(paramTrainData)
That's all for the inputs.
model.fit([moduleTrainData,functionTrainData,paramTrainData],outputLabels,...)

How can I efficiently replace the last row of a rank-2 tensor with zeros?

Let us say that I have a rank-2 tensor (a matrix). I want to fill the last row of this pre-existing matrix with zeros. I would not like TensorFlow to copy the whole matrix to a new place, because it is huge. Is it possible to do this?
The answer is based on David Parks' suggestion to look into this thread:
How to do slice assignment in Tensorflow
Using this answer I have arrived at the exact solution to my problem:
a = tf.Variable(tf.ones([10, 36, 36]))
value = tf.zeros([36, 36])
d = tf.scatter_update(a, 9, value)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print a.eval(session=sess)
    sess.run(d)
    print a.eval(session=sess)
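On newer TensorFlow versions the same in-place update can also be written with slice assignment on the variable (a sketch, same shapes as above):
d = a[9].assign(value) # strided-slice assign; still updates 'a' without copying the whole matrix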

What does tf.gather_nd intuitively do?

Can you intuitively explain or give more examples about tf.gather_nd for indexing and slicing into high-dimensional tensors in Tensorflow?
I read the API docs, but they are quite concise and I find it hard to follow the function's concept.
Ok, so think about it like this:
You provide a list of index values to index into the provided tensor and get those slices. The first dimension of the indices you provide corresponds to each lookup you will perform. Let's pretend the tensor is just a list of lists.
[[0]] means you want to get one specific slice (list) at index 0 of the provided tensor. Just like this:
[tensor[0]]
[[0], [1]] means you want to get two specific slices, at indices 0 and 1, like this:
[tensor[0], tensor[1]]
Now what if the tensor has more than one dimension? We do the same thing:
[[0, 0]] means you want to get one slice at index [0,0] of the 0-th list. Like this:
[tensor[0][0]]
[[0, 1], [2, 3]] means you want to return two slices at the indices and dimensions provided. Like this:
[tensor[0][1], tensor[2][3]]
I hope that makes sense. I tried using Python indexing to help explain how it would look in Python to do this to a list of lists.
You provide a tensor and indices representing locations in that tensor. It returns the elements of the tensor corresponding to the indices you provide.
EDIT: An example
import tensorflow as tf
sess = tf.Session()
x = [[1,2,3],[4,5,6]]
y = tf.gather_nd(x, [[1,1],[1,2]])
print(sess.run(y))
[5, 6]
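For completeness, a small sketch of my own showing that an index shorter than the tensor's rank returns a whole slice (row) rather than a single element:
z = tf.gather_nd(x, [[1]])
print(sess.run(z))
[[4 5 6]]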

assign certain entries of Tensor, like set_subtensor of Theano

Can I just assign values to certain entries in a tensor? I ran into this problem when computing the cross-correlation matrix of an NxP feature matrix feats, where N is the number of observations and P is the dimension. Some columns are constant, so the standard deviation is zero, and I don't want to divide by the std for those constant columns. Here is what I did:
fmean, fvar = tf.nn.moments(feats, axes = [0], keep_dims = False)
fstd = tf.sqrt(fvar)
feats = feats - fmean
sel = (fstd != 0)
feats[:, sel] = feats[:, sel]/ fstd[sel]
corr = tf.matmul(tf.transpose(feats), feats)
However, I got this error: TypeError: 'Tensor' object does not support item assignment. Is there any workaround for this issue?
You can make your feats a tf.Variable and use tf.scatter_update to update locations selectively.
It's a bit awkward in that scatter_update needs a list of linear indices to update, so you'd need to convert your [:, sel] implicit 2D specification into an explicit list of 1D indices. There's an example of constructing 1D indices from 2D indices here.
There's some work on simplifying this kind of use case in issue #206.
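If all the original snippet needs is to avoid dividing by the zero-std columns (rather than general item assignment), an alternative sketch using tf.where would be:
#divide only where fstd is nonzero; leave the constant columns unscaled
safe_std = tf.where(tf.equal(fstd, 0), tf.ones_like(fstd), fstd)
feats = (feats - fmean) / safe_std
corr = tf.matmul(tf.transpose(feats), feats)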