How to implement the tensor product of two layers in Keras/Tf - tensorflow

I'm trying to set up a DNN for classification and at one point I want to take the tensor product of a vector with itself. I'm using the Keras functional API at the moment but it isn't immediately clear that there is a layer that does this already.
I've been attempting to use a Lambda layer and numpy in order to try this, but it's not working.
Doing a bit of googling reveals
tf.linalg.LinearOperatorKronecker, which does not seem to work either.
Here's what I've tried:
I have a layer called part_layer whose output is a single vector (rank one tensor).
keras.layers.Lambda(lambda x_array: np.outer(x_array, x_array),) ( part_layer) )
Ideally I would want this to to take a vector of the form [1,2] and give me [[1,2],[2,4]].
But the error I'm getting suggests that the np.outer function is not recognizing its arguments:
AttributeError: 'numpy.ndarray' object has no attribute '_keras_history
Any ideas on what to try next, or if there is a simple function to use?

You can use two operations:
If you want to consider the batch size you can use the Dot function
Otherwise, you can use the the dot function
In both case the code should look like this:
dot_lambda = lambda x_array: tf.keras.layers.dot(x_array, x_array)
# dot_lambda = lambda x_array: tf.keras.layers.Dot(x_array, x_array)
keras.layers.Lambda(dot_lamda)( part_layer)
Hope this help.

Use tf.tensordot(x_array, x_array, axes=0) to achieve what you want. For example, the expression print(tf.tensordot([1,2], [1,2], axes=0)) gives the desired result: [[1,2],[2,4]].
Keras/Tensorflow needs to keep an history of operations applied to tensors to perform the optimization. Numpy has no notion of history, so using it in the middle of a layer is not allowed. tf.tensordot performs the same operation, but keeps the history.

Related

TensorFlow: how find out that tensor already created

I'm trying to create big tensor with shape 42000x28x28, data comes as a ArrayBuffer, let's assume that I create tensor something like that:
tf.tensor4d(new Uint8Array(xsTrainBuffer), [42000 * 0.9, 28, 28, 1]).div<Tensor<Rank.R4>>(255);
The question is - how find out that tensor is already prepared and I may use it in model training, since this function doesn't return any promise, but as for me it looks a little bit weird consider this operation as synchronous one.
I believe a better approach would be the tf.data.array() function. This will create a DataSet for you that is ideal for working with large amounts of data.
While the function itself is synchronous, it does a lazy loading of the data - so immediately after you call the function, if you try to log the data, it will show empty. However, use the forEachAsync() function to log the data and you will see that it reads it correctly.
Check out this iris-fitDataset example where they use the tf.data.array() function.
Note that Dataset's have their own model functions such as fitDataset() instead of fit() so you don't have to convert the Dataset into anything else.

Customized aggregation algorithm for gradient updates in tensorflow federated

I have been trying to implement this paper . Basically what I want to do is sum the per client loss and compare the same with previous epoch. Then for each constituent layer of the model compare the KL divergence between the weights of the server and the client model to get the layer specific parameter updates and then doing a softmax and to decide whether an adaptive update or a normal FedAvg approach is needed.
The algorithm is as follows-
FedMed
I tried to make use of the code here to build a custom federated avg process. I got the basic understanding that there are some tf.computations and some tff.computations which are involved. I get that I need to make changes in the orchestration logic in the run_one_round function and basically manipulate the client outputs to do adaptive averaging instead of the vanilla federated averaging. The client_update tf.computation function basically returns all the values that I need i.e the weights_delta (can be used for client based model weights), model_output(which can be used to calculate the loss).
But I am not sure where exactly I should make the changes.
#tff.federated_computation(federated_server_state_type,
federated_dataset_type)
def run_one_round(server_state, federated_dataset):
server_message = tff.federated_map(server_message_fn, server_state)
server_message_at_client = tff.federated_broadcast(server_message)
client_outputs = tff.federated_map(
client_update_fn, (federated_dataset, server_message_at_client))
weight_denom = client_outputs.client_weight
# todo
# instead of using tff.federated_mean I wish to do a adaptive aggregation based on the client_outputs.weights_delta and server_state model
round_model_delta = tff.federated_mean(
client_outputs.weights_delta, weight=weight_denom)
#client_outputs.weights_delta has all the client model weights.
#client_outputs.client_weight has the number of examples per client.
#client_outputs.model_output has the output of the model per client example.
I want to make use of the server model weights using server_state object.
I want to calculate the KL divergence between the weights of server model and each client's model per layer. Then use a relative weight to aggregate the client weights instead of vanilla federated averaging.
Instead of using tff.federated_mean I wish to use a different strategy basically an adaptive one based on the algorithm above.
So I needed some suggestions on how to go about implementing this.
Basically what I want to do is :
1)Sum all the values of client losses.
2)Calculate the KL divergence per layerbasis of all the clients with server and then determine whether to use adaptive optimization or FedAvg.
Also is there a way to manipulate this value as a python value which will be helpful for debugging purposes( I tried to use tf.print but that was not helpful either). Thanks!
Simplest option: compute weights for mean on clients
If I read the algorithm above correctly, we need only compute some weights for a mean on-the-fly. tff.federated_mean accepts an optional CLIENTS-placed weight argument, so probably the simplest option here is to compute the desired weights on the clients and pass them in to the mean.
This would look something like (assuming the appropriate definitions of the variables used below, which we will comment on):
#tff.federated_computation(...)
def round_function(...):
...
# We assume there is a tff.Computation training_fn that performs training,
# and we're calling it here on the correct arguments
trained_clients = tff.federated_map(training_fn, clients_placed_arguments)
# Next we assume there is a variable in-scope server_model,
# representing the 'current global model'.
global_model_at_clients = tff.federated_broadcast(server_model)
# Here we assume a function compute_kl_divergence, which takes
# two structures of tensors and computes the KL divergence
# (as a scalar) between them. The two arguments here are clients-placed,
# so the result will be as well.
kl_div_at_clients = tff.federated_map(compute_kl_divergence,
(global_model_at_clients, trained_clients))
# Perhaps we wish to not use raw KL divergence as the weight, but rather
# some function thereof; if so, we map a postprocessing function to
# the computed divergences. The result will still be clients-placed.
mean_weight = tff.federated_map(postprocess_divergence, kl_div_at_clients)
# Now we simply use the computed weights in the mean.
return tff.federated_mean(trained_clients, weight=mean_weight)
More flexible tool: tff.federated_reduce
TFF generally encourages algorithm developers to implement whatever they can 'in the aggregation', and as such exposes some highly customizable primitives like tff.federated_reduce, which allow you to run arbitrary TensorFlow "in the stream" between clients and server. If the above reading of the desired algorithm is incorrect and something more involved is needed, or you wish to flexibly experiment with totally different notions of aggregation (something TFF encourages and is designed to support), this may be the option for you.
In TFF's heuristic typing language, tff.federated_reduce has signature:
<{T}#CLIENTS, U, (<U, T> -> U)> -> U#SERVER
Meaning, federated_reduce take a value of type T placed at the clients, a 'zero' in a reduction algebra of type U, and a function accepting a U and a T and producing a U, and applies this function 'in the stream' on the way between clients and server, producing a U placed at the server. The function (<U, T> -> U) will be applied to the partially accumulated value U, and the 'next' element in the stream T (note however that TFF does not guarantee ordering of these values), returning another partially accumulated value U. The 'zero' should represent whatever 'partially accumulated' means over the empty set in your application; this will be the starting point of the reduction.
Application to this problem
The components
Your reduction function needs access to two pieces of data: the global model state and the result of training on a given client. This maps quite nicely to the type T. In this application, we will have something like:
T = <server_model=server_model_type, trained_model=trained_model_type>
These two types are likely to be the same, but may not necessarily be so.
Your reduction function will accept the partial aggregate, your server model and your client-trained model, returning a new partial aggregate. Here we will start assuming the same reading of the algorithm as above, that of a weighted mean with particular weights. Generally, the easiest way to compute a mean is to keep two accumulators, one for numerator and one for denominator. This will affect the choice of zero and reduction function below.
Your zero should contain a structure of tensors with value 0 mapping to the weights of your model--this will be the numerator. This would be generated for you if you had an aggregation like tff.federated_sum (as TFF knows what the zero should be), but for this case you'll have to get your hands on such a tensor yourself. This shouldn't be too hard with tf.nest.map_structure and tf.zeros_like.
For the denominator, we will assume we just need a scalar. TFF and TF are much more flexible than this--you could keep a per-layer or per-parameter denominator if desired--but for simplicity we will assume that we just want to divide by a single float in the end.
Therefore our type U will be something like:
U = <numerator=server_model_type, denominator=tf.float32>
Finally we come to our reduction function. It will be more or less a different composition of the same pieces above; we will make slightly tighter assumptions about them here (in particular, that all the local functions are tff.tf_computations--a technical assumption, arguably a bug on TFF). Our reduction function will be along the lines (assuming appropriate type aliases):
#tff.tf_computation(U, T)
def reduction(partial_accumulate, next_element):
kl_div = compute_kl_divergence(
next_element.server_model, next_element.trained_model)
weight = postprocess_divergence(kl_div)
new_numerator = partial_accumulate.numerator + weight * next_element.trained_model
new_denominator = partial_accumulate.denominator + weight
return collections.OrderedDict(
numerator=new_numerator, denominator=new_denominator)
Putting them together
The basic outline of a round will be similar to the above; but we have put more computation 'in the stream', and consequently there wil be less on the clients. We assume here the same variable definitions.
#tff.federated_computation(...)
def round_function(...):
...
trained_clients = tff.federated_map(training_fn, clients_placed_arguments)
global_model_at_clients = tff.federated_broadcast(server_model)
# This zip I believe is not necessary, but it helps my mental model.
reduction_arg = tff.federated_zip(
collections.OrderedDict(server_model=global_model_at_clients,
trained_model=trained_clients))
# We assume a zero as specified above
return tff.federated_reduce(reduction_arg,
zero,
reduction)

Where can I find exactly how Tensorflow does matrix multiplication?

For example, I want to do a matrix multiplication, and in doing so, I use the tf.matmul operation inside the tensorflow. And, i want to optimize matrix mulptiplication in tf. However, I cannot reach where the matrix Multiplication is made exactly in tf_matmul. Is there any people who can help me to do this ?
We need to do some code tracing to figure out what is being called and what is happening
1) tensorflow.python.ops.math_ops called via tf.matmul
2) tf.matmul returns either a sparse_matmul (which calls gen_math_ops.sparse_matmul) or gen_math_ops.batch_mat_mul
3) The gen_math_ops script is automatically generated but the underlying code is math_ops.cc
All the best!

What exactly qualifies as a 'Tensor' in TensorFlow?

I am new to TensorFlow and just went through the eager execution tutorial and came across the tf.decode_csv function. Not knowing about it, I read the documentation. https://www.tensorflow.org/api_docs/python/tf/decode_csv
I don't really understand it.
The documentation says 'records: A Tensor of type string.'
So, my question is: What qualifies as a 'Tensor'?
I tried the following code:
dec_res = tf.decode_csv('0.1,0.2,0.3', [[0.0], [0.0], [0.0]])
print(dec_res, type(dec_res))
l = [[1,2,3],[4,5,6],[7,8,9]]
r = tf.reshape(l, [9,-1])
print(l, type(l))
print(r, type(r))
So the list dec_res contains tf.tensor objects. That seems reasonable to me. But is an ordinary string also a 'Tensor' according to the documentation?
Then I tried something else with the tf.reshape function. In the documentation https://www.tensorflow.org/api_docs/python/tf/reshape it says that 'tensor: A Tensor.' So, l is supposed to be a tensor. But it is not of type tf.tensor but simply a python list. This is confusing.
Then the documentation says
Returns:
A Tensor. Has the same type as tensor.
But the type of l is list where the type of r is tensorflow.python.framework.ops.Tensor. So the types are not the same.
Then I thought that TensorFlow is very generous with things being a tensor. So I tried:
class car(object):
def __init__(self, color):
self.color = color
red_car = car('red')
#test_reshape = tf.reshape(red_car, [1, -1])
print(red_car.color) # to check, that red_car exists.
Now, the line in comments results in an error.
So, can anyone help me to find out, what qualifies as a 'Tensor'?
P.S.: I tried to read the source code of tf.reshape as given in the documentation
Defined in tensorflow/python/ops/gen_array_ops.py.
But this file does not exist in the Github repo. Does anyone know how to read it?
https://www.tensorflow.org/programmers_guide/tensors
TensorFlow, as the name indicates, is a framework to define and run
computations involving tensors. A tensor is a generalization of
vectors and matrices to potentially higher dimensions. Internally,
TensorFlow represents tensors as n-dimensional arrays of base
datatypes.
What you are observing commes from the fact that tensorflow operations (like reshape) can be built from various python types using the function tf.convert_to_tensor:
https://www.tensorflow.org/api_docs/python/tf/convert_to_tensor
All standard Python op constructors apply this function to each of
their Tensor-valued inputs, which allows those ops to accept numpy
arrays, Python lists, and scalars in addition to Tensor objects

taking the gradient in Tensorflow, tf.gradient

I am using this function of tensorflow to get my function jacobian. Came across two problems:
The tensorflow documentation is contradicted to itself in the following two paragraph if I am not mistaken:
gradients() adds ops to the graph to output the partial derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.
Blockquote
Blockquote
Returns:
A list of sum(dy/dx) for each x in xs.
Blockquote
According to my test, it is, in fact, return a vector of len(ys) which is the sum(dy/dx) for each x in xs.
I do not understand why they designed it in a way that the return is the sum of the columns(or row, depending on how you define your Jacobian).
How can I really get the Jacobian?
4.In the loss, I need the partial derivative of my function with respect to input (x), but when I am optimizing with respect to the network weights, I define x as a placeholder whose value is fed later, and weights are variable, in this case, can I still define the symbolic derivative of function with respect to input (x)? and put it in the loss? ( which later when we optimize with respect to weights will bring second order derivative of the function.)
I think you are right and there is a typo there, it was probably meant to be "of length len(ys)".
For efficiency. I can't explain exactly the reasoning, but this seems to be a pretty fundamental characteristic of how TensorFlow handles automatic differentiation. See issue #675.
There is no straightforward way to get the Jacobian matrix in TensorFlow. Take a look at this answer and again issue #675. Basically, you need one call to tf.gradients per column/row.
Yes, of course. You can compute whatever gradients you want, there is no real difference between a placeholder and any other operation really. There are a few operations that do not have a gradient because it is not well defined or not implemented (in which case it will generally return 0), but that's all.