Accessing learned weights of a DNN in CNTK - cntk

How can one access the learned weights of a DNN saved as follows:
lstm_network_output.save(model_path)

The weights/parameters of a network can be accessed by calling ‘lstm_network_output.parameters’, which returns a list of ‘Parameter’ variable objects. The value of a Parameter can be obtained as a numpy array through the ‘value’ property of the Parameter object. The value of the Parameter can be updated by ‘.value = ’.

If you used name= properties in creating your model, you can also identify layers by name. For example:
model = Sequential([Embedding(300, name='embed'), Recurrence(LSTM(500)), Dense(10)])
E = model.embed.E # accesses the embedding matrix of the embed layer
To know that the parameter is .E, please consult the docstring of the respective function (e.g. help(Embedding)). (In Dense and Convolution, the parameters would be .W and .b.)
The pattern above is for named layers, which are created using as_block(). You can also name intermediate variables, and access them in the same way. E.g.:
W = Parameter((13,42), init=0, name='W')
x = Input(13)
y = times(x, W, name='times1')
W_recovered = y.times1.W
# e.g. check the shape to see that they are the same
W_recovered.shape # --> (13, 42)
W.shape # --> (13, 42)
Technically, this will search all parameters that feed y. In a more complex network, you may end up with multiple parameters of the same name, in which case an error is thrown because of the ambiguity. You must then work with the .parameters tuple mentioned in Anna's response.

This python code worked for me to visualize some weights:
import numpy as np
import cntk as C
dnnFile = C.cntk_py.Function.load(r'Models\ConvNet_MNIST_5.dnn') # load model from MS example (raw string keeps the backslash literal)
layer8 = dnnFile.parameters()[8].value()
filter_num = 0
sliced = layer8.asarray()[ filter_num ][ 0 ] # shows filter works on input image
print(sliced)

TensorFlow Federated - Loading and preprocessing data on a remote client

Part of the simulation program that I am working on allows clients to load local data from their device without the server being able to access that data.
Following the idea from this post, I have the following code configured to assign the client a path to load the data from. Although the data is in svmlight format, loading it line-by-line can still allow it to be preprocessed afterwards.
client_paths = {
'client_0': '<path_here>',
'client_1': '<path_here>',
}
def create_tf_dataset_for_client_fn(id):
    path = client_paths.get(id)
    data = tf.data.TextLineDataset(path)
    return data  # the function must return the dataset it builds
path_source = tff.simulation.datasets.ClientData.from_clients_and_fn(client_paths.keys(), create_tf_dataset_for_client_fn)
The code above allows a path to be loaded at runtime on the remote client's side with the following line of code.
data = path_source.create_tf_dataset_for_client('client_0')
Here, the data variable can be iterated over, and its contents can be displayed on the remote client device by calling tf.print(). However, I need to preprocess this data into an appropriate format before continuing. I am presently attempting to convert it from a string Tensor in svmlight format into a SparseTensor.
The issue is that, although the preprocessing method works in a standalone scenario (i.e. when defined as a function and tested on a manually defined Tensor of the same format), it fails when the code is executed inside the client update @tf.function in the tff algorithm. Below is the error raised when executing the notebook cell that contains a @tff.tf_computation function, which calls a @tf.function that does the preprocessing and retrieves the data.
ValueError: Shape must be rank 1 but is rank 0 for '{{node Reshape_2}} = Reshape[T=DT_INT64, Tshape=DT_INT32](StringToNumber_1, Reshape_2/shape)' with input shapes: [?,?], [].
Since the issue occurs when executing the client's @tff.tf_computation update function, which calls the @tf.function with the preprocessing code, I am wondering how I can get the function to preprocess the data without errors. I assume that if the functions run properly when defined, they will also work when called remotely.
Any ideas on how to address this issue? Thank you for your help!
For reference, the preprocessing function uses tf computations to manipulate the data. Although not optimal yet, below is the code presently being used. This is inspired by this link on string_split examples. I have also tried putting the code directly into the client's @tf.function after loading the TextLineDataset, but this fails too.
def decode_libsvm(line):
    # Split the line into columns, delimiting by a blank space
    cols = tf.strings.split([line], ' ')
    # Retrieve the labels from the first column as an integer
    labels = tf.strings.to_number(cols.values[0], out_type=tf.int32)
    # Split all column pairs
    splits = tf.strings.split(cols.values[1:], ':')
    # Convert splits into a sparse matrix to retrieve all needed properties
    splits = splits.to_sparse()
    # Reshape the tensor for further processing
    id_vals = tf.reshape(splits.values, splits.dense_shape)
    # Retrieve the indices and values within two separate tensors
    feat_ids, feat_vals = tf.split(id_vals, num_or_size_splits=2, axis=1)
    # Convert the indices into int64 numbers
    feat_ids = tf.strings.to_number(feat_ids, out_type=tf.int64)
    # To reload within a SparseTensor, add a dimension to feat_ids with a default value of 0
    feat_ids = tf.reshape(feat_ids, -1)
    feat_ids = tf.expand_dims(feat_ids, 1)
    feat_ids = tf.pad(feat_ids, [[0,0], [0,1]], constant_values=0)
    # Extract and flatten the values
    feat_vals = tf.strings.to_number(feat_vals, out_type=tf.float32)
    feat_vals = tf.reshape(feat_vals, -1)
    # Configure a SparseTensor to contain the indices and values
    sparse_output = tf.SparseTensor(indices=feat_ids, values=feat_vals, dense_shape=[1, <shape>])
    return {"x": sparse_output, "y": labels}
Update (Fix)
Following the advice in Jakub's comment, the issue was fixed by enclosing the shape and axis arguments of the reshape and expand_dims calls in [] where needed. The code now runs within tff without issue.
def decode_libsvm(line):
    # Split the line into columns, delimiting by a blank space
    cols = tf.strings.split([line], ' ')
    # Retrieve the labels from the first column as an integer
    labels = tf.strings.to_number(cols.values[0], out_type=tf.int32)
    # Split all column pairs
    splits = tf.strings.split(cols.values[1:], ':')
    # Convert splits into a sparse matrix to retrieve all needed properties
    splits = splits.to_sparse()
    # Reshape the tensor for further processing
    id_vals = tf.reshape(splits.values, splits.dense_shape)
    # Retrieve the indices and values within two separate tensors
    feat_ids, feat_vals = tf.split(id_vals, num_or_size_splits=2, axis=1)
    # Convert the indices into int64 numbers
    feat_ids = tf.strings.to_number(feat_ids, out_type=tf.int64)
    # To reload within a SparseTensor, add a dimension to feat_ids with a default value of 0
    feat_ids = tf.reshape(feat_ids, [-1])
    feat_ids = tf.expand_dims(feat_ids, [1])
    feat_ids = tf.pad(feat_ids, [[0,0], [0,1]], constant_values=0)
    # Extract and flatten the values
    feat_vals = tf.strings.to_number(feat_vals, out_type=tf.float32)
    feat_vals = tf.reshape(feat_vals, [-1])
    # Configure a SparseTensor to contain the indices and values
    sparse_output = tf.SparseTensor(indices=feat_ids, values=feat_vals, dense_shape=[1, <shape>])
    return {"x": sparse_output, "y": labels}
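When debugging a parser like this, it can help to have a plain-Python reference to compare the TensorFlow output against. The sketch below is not part of the original code: the function name parse_libsvm_line is made up for illustration, and it assumes a single-row layout where each feature id is the column index, i.e. indices of the form [0, feature_id]. Compare that against the padding order used in the tf.pad call above before relying on it.

```python
def parse_libsvm_line(line):
    """Parse one svmlight/libsvm line into (label, indices, values).

    Assumption: indices use a [row, column] layout with a single row 0,
    so every entry is [0, feature_id].
    """
    cols = line.strip().split(" ")
    label = int(cols[0])  # first column is the label
    indices, values = [], []
    for pair in cols[1:]:
        feat_id, feat_val = pair.split(":")
        indices.append([0, int(feat_id)])  # row 0, column = feature id
        values.append(float(feat_val))
    return label, indices, values


label, indices, values = parse_libsvm_line("1 3:0.5 7:2.0")
print(label, indices, values)
```

Feeding the same line to both versions and comparing label, indices and values is a quick way to check the tf.function outside of tff.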

Testing on some basic example in trying to better understand about .padded_batch in TensorFlow

I have a very simple dataset to test my understanding of how padded_batch is used.
The text file is saved in .txt format:
test = "I use tensorflow for this data\n
I will be testing\n
The current tensorflow data
Please note that I am using TensorFlow 2.0, so I do not need tf.Session to initialize my variables.
dataset = tf.data.TextLineDataset("test.txt")
dataset = dataset.map(lambda string: tf.string_split([string]).values)
dataset = dataset.padded_batch(2)
for x in dataset:
    print(x.numpy())
Error that I received:
TypeError: padded_batch() missing 1 required positional argument: 'padded_shapes'
Expected output:
[[b'I' b'use' b'tensorflow' b'for' b'this' b'data']
[b'I' b'will' b'be' b'testing' b'unknown' b'unknown']]
[[b'The' b'current' b'tensorflow' b'data' b'unknown' b'unknown']]
How should I configure padded_shapes and padding_values? I want all tensors to have the same length by inserting "unknown" for each empty element, as in the expected output above.
Note that padded_batch expects padded_shapes, the shape each element should be padded to, and, since you want the padding to be "unknown", a padding_values argument as well. Below is the code snippet you want to use.
dataset = tf.data.TextLineDataset("test.txt")
# tf.strings.split is the TF 2.x replacement for tf.string_split; on a scalar string it returns a 1-D tensor
dataset = dataset.map(lambda string: tf.strings.split(string))
dataset = dataset.padded_batch(3, padded_shapes=[None], padding_values="unknown")
for x in dataset:
    print(x.numpy())
# [[b'I' b'use' b'tensorflow' b'for' b'this' b'data']
# [b'I' b'will' b'be' b'testing' b'unknown' b'unknown']
# [b'The' b'current' b'tensorflow' b'data' b'unknown' b'unknown']]
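One detail worth spelling out: with padded_shapes=[None], each batch is padded to the longest element in that batch, not to the longest element in the whole dataset. The plain-Python sketch below imitates that behaviour without TensorFlow (the helper name padded_batches is made up for illustration). It shows that with a batch size of 2 the last batch would not be padded to length 6.

```python
def padded_batches(rows, batch_size, pad_value):
    """Group rows into batches and pad each batch to its own longest row."""
    batches = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        width = max(len(row) for row in batch)  # longest row in THIS batch
        batches.append([row + [pad_value] * (width - len(row)) for row in batch])
    return batches


lines = [
    "I use tensorflow for this data",
    "I will be testing",
    "The current tensorflow data",
]
rows = [line.split() for line in lines]
for batch in padded_batches(rows, batch_size=2, pad_value="unknown"):
    print(batch)
```

If every batch should have the same width regardless of its contents, passing a fixed length such as padded_shapes=[6] instead of [None] should do it, provided no line has more than 6 tokens.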

Tensorflow/Keras, How to convert tf.feature_column into input tensors?

I have the following code to average embeddings for list of item-ids.
(The embedding is trained on review_meta_id_input, and used as a lookup for priors_input and for getting the average embedding.)
review_meta_id_input = tf.keras.layers.Input(shape=(1,), dtype='int32', name='review_meta_id')
priors_input = tf.keras.layers.Input(shape=(None,), dtype='int32', name='priors') # array of ids
item_embedding_layer = tf.keras.layers.Embedding(
    input_dim=100, # max number
    output_dim=self.item_embedding_size,
    name='item')
review_meta_id_embedding = item_embedding_layer(review_meta_id_input)
selected = tf.nn.embedding_lookup(review_meta_id_embedding, priors_input)
non_zero_count = tf.cast(tf.math.count_nonzero(priors_input, axis=1), tf.float32)
embedding_sum = tf.reduce_sum(selected, axis=1)
item_average = tf.math.divide(embedding_sum, non_zero_count)
I also have some feature columns, such as:
(I just thought feature_column looked cool, but there is not much documentation on it.)
kid_youngest_month = feature_column.numeric_column("kid_youngest_month")
kid_age_youngest_buckets = feature_column.bucketized_column(kid_youngest_month, boundaries=[12, 24, 36, 72, 96])
I'd like to define [review_meta_id_input, priors_input, (tensors from feature_columns)] as the input to a Keras Model,
something like:
inputs = [review_meta_id_input, priors_input] + feature_layer
model = tf.keras.models.Model(inputs=inputs, outputs=o)
In order to get tensors from feature columns, the closest lead I have now is
fc_to_tensor = {fc: input_layer(features, [fc]) for fc in feature_columns}
from https://github.com/tensorflow/tensorflow/issues/17170
However I'm not sure what the features are in the code.
There's no clear example on https://www.tensorflow.org/api_docs/python/tf/feature_column/input_layer either.
How should I construct the features variable for fc_to_tensor ?
Or is there a way to use keras.layers.Input and feature_column at the same time?
Or is there an alternative to tf.feature_column for doing the bucketing above? Then I'll just drop feature_column for now.
The behavior you want can be achieved through the following steps.
This works in TF 2.0.0-beta1, but may be changed or even simplified in future releases.
Please check out the issue "Unable to use FeatureColumn with Keras Functional API #27416" in the TensorFlow GitHub repository. There you will find a more general example and useful comments about tf.feature_column and the Keras Functional API.
Meanwhile, based on the code in your question, the input tensor for the feature_column can be obtained like this:
# These are the feature columns you have defined
kid_youngest_month = feature_column.numeric_column("kid_youngest_month")
kid_age_youngest_buckets = feature_column.bucketized_column(kid_youngest_month, boundaries=[12, 24, 36, 72, 96])
# Then define the layer
feature_layer = tf.keras.layers.DenseFeatures(kid_age_youngest_buckets)
# The inputs for the DenseFeatures layer should be defined for each original feature column as a dictionary, where
# keys - names of feature columns
# values - tf.keras.Input with shape=(1,), name='name_of_feature_column', dtype - actual type of the original column
feature_layer_inputs = {}
feature_layer_inputs['kid_youngest_month'] = tf.keras.Input(shape=(1,), name='kid_youngest_month', dtype=tf.int8)
# Then you can collect the inputs of the other layers and feature_layer_inputs into one flat list
inputs = [review_meta_id_input, priors_input] + list(feature_layer_inputs.values())
# Then define the outputs of this DenseFeatures layer
feature_layer_outputs = feature_layer(feature_layer_inputs)
# And pass them into another layer like any other tensor
x = tf.keras.layers.Dense(256, activation='relu')(feature_layer_outputs)
# Or maybe concatenate them with outputs from your other layers
combined = tf.keras.layers.concatenate([x, feature_layer_outputs])
# And you will probably finish with a last output layer, maybe like this for classification
o = tf.keras.layers.Dense(classes_number, activation='softmax', name='sequential_output')(combined)
# So you pass to the model:
model_combined = tf.keras.models.Model(inputs=inputs, outputs=o)
Also note that in the model's fit() method you should specify which data is used for each input.
One way, if you use tf.data.Dataset, is to make sure the feature names in the Dataset match the keys in the feature_layer_inputs dictionary.
The other way is to use explicit notation, with keys that match the name= of each Input and output layer:
model.fit({'review_meta_id': review_meta_id_data, 'priors': priors_data, 'kid_youngest_month': kid_youngest_month_data},
    {'sequential_output': target_data},  # target_data: your label array (name assumed here)
    ...
)
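On the question of an alternative to tf.feature_column for the bucketing: the bucket index can also be computed by hand and fed to the model as an ordinary input. The sketch below is plain Python rather than TensorFlow, and the helper names are made up for illustration. It assumes the same convention as bucketized_column, where boundaries [12, 24, ...] give the buckets (-inf, 12), [12, 24), ..., [96, +inf), so a value equal to a boundary falls into the bucket that starts at it.

```python
from bisect import bisect_right

BOUNDARIES = [12, 24, 36, 72, 96]  # same boundaries as in the question


def bucket_index(value, boundaries=BOUNDARIES):
    """Return the index of the bucket that value falls into."""
    return bisect_right(boundaries, value)


def one_hot_bucket(value, boundaries=BOUNDARIES):
    """One-hot encode the bucket; there are len(boundaries) + 1 buckets."""
    vec = [0.0] * (len(boundaries) + 1)
    vec[bucket_index(value, boundaries)] = 1.0
    return vec


for months in (5, 12, 30, 96):
    print(months, bucket_index(months), one_hot_bucket(months))
```

The resulting one-hot vector (or the integer index passed through an Embedding) can then be given to the model through a regular tf.keras.Input, at the cost of doing the preprocessing outside the graph.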

Word2Vec + LSTM on API Sequence

I am trying to apply word2Vec and LSTM on a dataset that contains files' API trace log including API function calls and their parameters for a binary classification.
The data looks like:
File_ID, Label, API Trace log
1, M, kernel32 LoadLibraryA kernel32.dll
kernel32 GetProcAddress MZ\x90 ExitProcess
...
2, V, kernel32 GetModuleHandleA RPCRT4.dll
kernel32 GetCurrentThreadId d\x8B\x0D0 POINTER POINTER
...
Each API trace includes the module name, the API function name and the parameters, separated by blank spaces.
Take the first API trace of file 1 as an example: kernel32 is the module name, LoadLibraryA is the function name, and kernel32.dll is the parameter. Each API trace is separated by \n, so each line represents one API call in sequence.
First I trained a word2vec model on the line sentences of all API trace logs. There are about 5k API function calls, e.g. LoadLibraryA, GetProcAddress. However, because parameter values can vary, the model becomes quite big (a vocabulary of 300,000) once those parameters are included.
After that, I trained an LSTM using word2vec's embedding_weights. The model structure looks like:
model = Sequential()
model.add(Embedding(output_dim=vocab_dim, input_dim=n_symbols, \
                    mask_zero=False, weights=[embedding_weights], \
                    trainable=False))
model.add(LSTM(dense_dim,kernel_initializer='he_normal', dropout=0.15,
               recurrent_dropout=0.15, implementation=2))
model.add(Dropout(0.3))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=batch_size, callbacks=[early_stopping, parallel_check_cb])
The way I get embedding_weights is to create a matrix that, for each word in the word2vec model's vocabulary, maps the index of the word in the model to its vector:
def create_embedding_weights(model, max_index=0):
    # dimensionality of your word vectors
    num_features = len(model[model.vocab.keys()[0]])
    n_symbols = len(model.vocab) + 1 # adding 1 to account for 0th index (for masking)
    # Only word2vec feature set
    embedding_weights = np.zeros((max(n_symbols + 1, max_index + 1), num_features))
    for word, value in model.vocab.items():
        embedding_weights[value.index, :] = model[word]
    return embedding_weights
For the training data, I converted each word in an API call to its index in the word2vec model, so that it is consistent with the index in embedding_weights above, e.g. kernel32 -> 0, LoadLibraryA -> 1, kernel32.dll -> 2, GetProcAddress -> 4, MZ\x90 -> 5, ExitProcess -> 6.
So the training data for file 1 looks like [0, 1, 2, 3, 4, 5, 6]. Note that I didn't split the data by line for each API trace. As a result, the model may not know where an API trace starts and ends. The training accuracy of the model is pretty bad: 50% :(
My question is: when preparing the training and validation datasets, should I also split by line when mapping the actual words to their indices? The training data above would then change to the following, with each API trace on its own line, and missing values maybe padded with -1, which doesn't exist in word2vec's indices.
[[0, 1, 2, -1]
[3, 4, 5, 6]]
Also, I am using a very simple structure for training while the word2vec model is quite big, so any suggestions on the structure would be appreciated.
I would at least split the trace lines in three:
Module (make a dictionary and an embedding)
Function (make a dictionary and an embedding)
Parameters (make a dictionary and an embedding - see details later)
Since this is a very specific application, I believe it would be best to keep the embeddings trainable (the whole point of the embeddings is to create meaningful vectors, and the meanings depend a lot on the model that is going to use them. Question: how did you create the word2vec model? From what data does it learn?).
This model would have more inputs. All of them as integers from zero to max dictionary index. Consider using mask_zero=True and padding all files to maxFileLines.
moduleInput = Input((maxFileLines,))  # shape must be a tuple
functionInput = Input((maxFileLines,))
For the parameters, I'd probably make a subsequence, as if the list of parameters were a sentence. (Again, mask_zero=True, and pad up to maxNumberOfParameters.)
parametersInput = Input((maxFileLines, maxNumberOfParameters))
Function and module embeddings:
moduleEmb = Embedding(.....mask_zero=True,)(moduleInput)
functionEmb = Embedding(.....mask_zero=True)(functionInput)
Now, for the parameters, I thought of creating a sequence of sequences (maybe this is too much). For that, I first move the lines dimension into the batch dimension and work with only length = maxNumberOfParameters:
paramEmb = Lambda(lambda x: K.reshape(x,(-1,maxNumberOfParameters)))(parametersInput)
paramEmb = Embedding(....,mask_zero=True)(paramEmb)
paramEmb = Lambda(lambda x: K.reshape(x,(-1,maxFileLines,embeddingSize)))(paramEmb)
Now we concatenate all of them in the last dimension and we're ready to get into the LSTMs:
joinedEmbeddings = Concatenate()([moduleEmb,functionEmb,paramEmb])
out = LSTM(...)(joinedEmbeddings)
out = ......
model = Model([moduleInput,functionInput,parametersInput], out)
How to prepare the inputs
With this model, you need three separate inputs. One for the module, one for the function and one for the parameters.
These inputs will contain only indices (no vectors). And they don't need a previous word2vec model. Embeddings are word2vec transformers.
So, get the file lines and split. First we split by commas, then we split the API calls by spaces:
import numpy as np
#read the file
loadedFile = open(fileName,'r')
allLines = [l.strip() for l in loadedFile.readlines()]
loadedFile.close()
#split by commas
splitLines = []
for l in allLines[1:]: #use 1 here only if you have headers in the file
    splitLines.append(l.split(','))
splitLines = np.array(splitLines)
#get the split values and separate ids, targets and calls
ids = splitLines[:,0]
targets = splitLines[:,1]
calls = splitLines[:,2]
#split the calls by space, adding dummy parameters (spaces) to the max length
splitCalls = []
for c in calls:
    splitC = c.strip().split(' ')
    #pad the parameters (space for dummy params)
    for i in range(len(splitC),maxParams+2):
        splitC.append(' ')
    splitCalls.append(splitC)
splitCalls = np.array(splitCalls)
modules = splitCalls[:,0]
functions = splitCalls[:,1]
parameters = splitCalls[:,2:] #notice the parameters have an extra dimension
Now let's make the indices:
modIndices, modCounts = np.unique(modules,return_counts=True)
funcIndices, funcCounts = np.unique(functions,return_counts=True)
#for the parameters, let's flatten the array first (because we have 2 dimensions)
flatParams = parameters.reshape((parameters.shape[0]*parameters.shape[1],))
paramIndices, paramCounts = np.unique(flatParams,return_counts=True)
These will create a list of unique words and get their counts. Here you can customize which words you're going to group in "another word" class. (Maybe based on the counts, if the count is too little, make it an "another word").
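The idea of folding rare words into a single "another word" class can be sketched as follows. This is a plain-Python illustration rather than the numpy code used in this answer, and the names build_vocab, encode, min_count and the "<other>" token are made up for the example. It assumes index 0 is reserved for the padding token and index 1 for all rare or unseen words.

```python
from collections import Counter


def build_vocab(tokens, min_count=2, pad_token=" "):
    """Map frequent tokens to indices; rare tokens share the '<other>' index."""
    counts = Counter(t for t in tokens if t != pad_token)
    vocab = {pad_token: 0, "<other>": 1}  # 0 = padding, 1 = rare/unknown
    for word in sorted(counts):
        if counts[word] >= min_count:
            vocab[word] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Replace each token by its index, falling back to '<other>'."""
    return [vocab.get(t, vocab["<other>"]) for t in tokens]


params = ["kernel32.dll", "MZ\\x90", "kernel32.dll", " ", "RPCRT4.dll", "kernel32.dll"]
vocab = build_vocab(params, min_count=2)
print(vocab)
print(encode(params, vocab))
```

With a 300,000-word parameter vocabulary, a threshold like this is one way to keep the embedding matrix small; the right min_count depends on the data and is worth tuning.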
Let's then make the dictionaries:
def createDic(uniqueWords):
    dic = {}
    for i,word in enumerate(uniqueWords):
        dic[word] = i + 1 # +1 because we want to reserve the zeros for padding
    return dic
Just take care with the parameters, because we used a dummy space there:
moduleDic = createDic(modIndices)
funcDic = createDic(funcIndices)
paramDic = createDic(paramIndices[1:]) #make sure the space got the first position here
paramDic[' '] = 0
Well, now we just replace the original values:
moduleData = [moduleDic[word] for word in modules]
funcData = [funcDic[word] for word in functions]
paramData = [[paramDic[word] for word in paramLine] for paramLine in parameters]
Pad them:
for i in range(len(moduleData),maxFileLines):
    moduleData.append(0)
    funcData.append(0)
    paramData.append([0] * maxParams)
Do this for every file, and store in a list of files:
moduleTrainData = []
functionTrainData = []
paramTrainData = []
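for fileName in fileNames:  # fileNames: your list of trace files (name assumed here)
    # ... build moduleData, funcData and paramData for this file as shown above ...
    moduleTrainData.append(moduleData)
    functionTrainData.append(funcData)
    paramTrainData.append(paramData)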
moduleTrainData = np.asarray(moduleTrainData)
functionTrainData = np.asarray(functionTrainData)
paramTrainData = np.asarray(paramTrainData)
That's all for the inputs.
model.fit([moduleTrainData,functionTrainData,paramTrainData],outputLabels,...)

How to "append" Op at the beginning of a TensorFlow graph?

I have a GraphDef proto file which I am importing using tf.import_graph_def. Ops can be added at the end of the graph like this:
final_tensor = tf.import_graph_def(graph_def, name='', return_elements=['final_tensor'])
new_tensor = some_op(final_tensor)
But I want to add Ops at the beginning of the graph, so essentially the first Op in the graph_def needs to take the output of my Op as input, how do I do it?
Finally found a way to do this. I am sure the function Yaroslav mentioned in the comments does something similar internally.
new_input = graph_def.node.add()
new_input.op = 'new_op_name' # eg: 'Const', 'Placeholder', 'Add' etc
new_input.name = 'some_new_name'
# set any attributes you want for new_input here
old_input.input[0] = 'some_new_name' # must match with the name above
For details about how to set the attributes, see this file.
The script @Priyatham gives in the link is a good example of how to add a node to a tf graph_def. name, op, input and attr are the 4 required elements. name and op can be assigned directly, whereas input should be set with extend and attr with the CopyFrom method, like:
from tensorflow.core.framework import attr_value_pb2, types_pb2
new_node = graph_def.node.add()
new_node.op = "Cast"
new_node.name = "To_Float"
new_node.input.extend(["To_Float"])
new_node.attr["DstT"].CopyFrom(attr_value_pb2.AttrValue(type=types_pb2.DT_FLOAT))
new_node.attr["SrcT"].CopyFrom(attr_value_pb2.AttrValue(type=types_pb2.DT_FLOAT))
new_node.attr["Truncate"].CopyFrom(attr_value_pb2.AttrValue(b=True))
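The rewiring step itself (the new node takes over the old input, and the existing node reads from the new node) can be illustrated without TensorFlow. The sketch below uses plain dictionaries as stand-ins for NodeDef messages, so it is not the GraphDef API; the helper name insert_before and the node names images, weights and conv1 are hypothetical.

```python
def insert_before(nodes, new_node, target_name, input_slot=0):
    """Insert new_node so that it feeds input_slot of the node named target_name.

    The new node inherits the input the target used to read, and the target
    is rewired to read from the new node instead.
    """
    target = next(n for n in nodes if n["name"] == target_name)
    new_node = dict(new_node, input=[target["input"][input_slot]])
    target["input"][input_slot] = new_node["name"]  # must match the new node's name
    nodes.append(new_node)
    return nodes


# Dict stand-ins for NodeDef: each has a name, an op and a list of input names
graph = [
    {"name": "images", "op": "Placeholder", "input": []},
    {"name": "conv1", "op": "Conv2D", "input": ["images", "weights"]},
]
insert_before(graph, {"name": "To_Float", "op": "Cast"}, "conv1")
for node in graph:
    print(node)
```

With a real graph_def the same two assignments would be made on the NodeDef protos: extend the new node's input with the upstream node's name, then overwrite the matching entry in the existing node's input.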