Can TensorFlow support spiking neurons?

I looked around for tutorials/articles/examples on using spiking neurons (e.g. of the SRM/Spike Response Model type) in TensorFlow, but I could not find anything.
Is it possible to simulate these models in TensorFlow at all?
Can TensorFlow simulate models which explicitly depend on time?
Are there any plug-ins/extensions/data files which can add this capability?
Is the GPU supported?

I was also interested in this problem and have done exactly what Pietro mentioned, i.e. taken a Matlab implementation of a simplified Hodgkin-Huxley model and converted it to TensorFlow.
Have a look at https://github.com/jotia1/spiking-net-tensorflow for the code, and at https://joshuaarnold.com.au/simulating-spiking-nets-in-tensorflow/ for a blog post with some of my thoughts on the whole process (note: the blog link is now broken).
Interested in hearing your thoughts on it.

Yes, TensorFlow can implement spiking neuron models; it is a general-purpose computation framework.
Is there an implementation available? I don't think so, but I have a friend who is interested in this project.
The GPU is supported for many/most of the TensorFlow operations; you'll have to check the docs to see which ones are not supported.

As pointed out by Steven, TensorFlow is a computation framework and as such allows implementing any algorithm.
The main difference between TensorFlow and other computation frameworks like Matlab or numpy/scipy is that it relies on computation graphs: you do not perform the operations directly, but instead build a graph of operations that is later evaluated inside a session.
I was also interested in spiking neurons and TensorFlow and found this question. Like jotia, I implemented the same Matlab exercise in TensorFlow (link to my blog post).
Here are, for instance, two operations defining the membrane potential and recovery increments, assuming you provide u, v and i:
import numpy as np
import tensorflow as tf

n = 10
SPIKING_THRESHOLD = 35.0
# A and B were undefined in the original snippet; 0.02 and 0.2 are the
# usual Izhikevich values for regular spiking neurons
A = 0.02
B = 0.2

v = tf.placeholder(tf.float32, shape=[n])
u = tf.placeholder(tf.float32, shape=[n])
i = tf.placeholder(tf.float32)

# Evaluate which neurons have reached the spiking threshold
has_fired_op = tf.greater_equal(v, tf.constant(SPIKING_THRESHOLD, shape=v.shape))

# Evaluate membrane potential increment for the considered time interval
# dv = 0 if the neuron fired, dv = 0.04*v*v + 5*v + 140 + I - u otherwise
# (the scalar i is added with tf.add because tf.add_n does not broadcast)
dv_op = tf.where(has_fired_op,
                 tf.zeros(v.shape),
                 tf.subtract(tf.add(tf.add_n([tf.multiply(tf.square(v), 0.04),
                                              tf.multiply(v, 5.0),
                                              tf.constant(140.0, shape=v.shape)]),
                                    i),
                             u))

# Evaluate membrane recovery decrement for the considered time interval
# du = 0 if the neuron fired, du = A*(B*v - u) otherwise
du_op = tf.where(has_fired_op,
                 tf.zeros(v.shape),
                 tf.multiply(A, tf.subtract(tf.multiply(B, v), u)))
And you evaluate them like this:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {u: np.full(n, -13.0), v: np.full(n, -65.0), i: 7.0}
    dv, du = sess.run([dv_op, du_op], feed_dict=feed)
Note that this is just an example to illustrate how Tensorflow works, not an actual simulation of spiking neurons: usually you also want to evaluate u and v based on synaptic input (in that case, the placeholders would be the synapse inputs).
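To make this concrete, here is a minimal sketch of how these increments could be driven in a time-stepped loop, feeding the updated state back in at each step. The step size dt and the post-spike reset values C and D are my assumptions (typical Izhikevich values), not part of the answer above:
dt = 0.5            # assumed integration step
C, D = -65.0, 8.0   # assumed post-spike reset values (hypothetical)

v_state = np.full(n, -65.0)
u_state = np.full(n, -13.0)

with tf.Session() as sess:
    for step in range(1000):
        fired, dv, du = sess.run([has_fired_op, dv_op, du_op],
                                 feed_dict={v: v_state, u: u_state, i: 7.0})
        # Euler integration, resetting the neurons that fired
        v_state = np.where(fired, C, v_state + dt * dv)
        u_state = np.where(fired, u_state + D, u_state + dt * du)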

Related

TensorFlow / Keras Model as function of its weights (and inputs) / Making a tf model pure functional, non-stateful

As the title suggests, I'm looking for a way to copy an arbitrary TensorFlow (/Keras) model, such that I can run the same computational graph from the model but have the weights (or different weights, or a tensor copy of them) as part of the function's input; something like the following, where an implementation of (or an idea of how to implement) smart_copy_function is what is missing:
model = some_model()  # a TF/Keras model
model_from_weights = smart_copy_function(model)
x = tf.some_model_input  # imagine some random input to the model here, e.g. an MNIST image
y_model = model(x)  # normal way to call the model
y_model_from_weights = model_from_weights(model.trainable_weights, x)  # same call on the copied model
y_model == y_model_from_weights  # should be True
I believe this should be doable in a reasonably easy way, as the respective computational graph already exists in TF anyway.
'This sounds stupid, why would you do this?': I want to build an analog for TensorFlow of the PyTorch meta-learning framework higher, since gradients through calls of the TF optimizer's apply_gradients and through variable assign are not supported. To get gradients through parameter updates, the above seems to be the way to go. Such gradients through parameter updates are in turn very important for meta-learning research, with papers like 'Model-Agnostic Meta-Learning' and 'Teaching with Commentaries' being somewhat famous examples / use cases.
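One limited sketch of the idea (my own illustration, not a general smart_copy_function): for a plain Sequential stack of Dense layers, the forward pass can be re-written as a pure function of a weight list. Everything below is hypothetical, assumes TF 2.x eager mode, and ignores all other layer types:
import numpy as np
import tensorflow as tf

def model_as_function(model):
    # Sketch: pure-functional forward pass for a Sequential of Dense layers
    def f(weights, x):
        out = x
        for k, layer in enumerate(model.layers):
            kernel, bias = weights[2 * k], weights[2 * k + 1]
            out = layer.activation(tf.matmul(out, kernel) + bias)
        return out
    return f

model = tf.keras.Sequential([tf.keras.layers.Dense(32, activation='relu'),
                             tf.keras.layers.Dense(10)])
model.build((None, 784))
f = model_as_function(model)

x = tf.random.normal((1, 784))
# with the model's own weights, the functional copy matches the model call
assert np.allclose(model(x).numpy(), f(model.trainable_weights, x).numpy())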

How is get_updates() of optimizers.SGD used in Keras during training?

I am not familiar with the inner workings of Keras and have difficulty understanding how Keras uses the get_updates() function of optimizers.SGD during training.
I searched quite a while on the internet, but only got few details. Specifically, my understanding is that the parameter/weight update rule of SGD is defined in the get_updates() function. But it appears that get_updates() isn't literally called in every iteration during training; otherwise 'moments' wouldn't carry from one iteration to the next to implement momentum correctly, since it is reset in every call, cf. optimizers.py:
shapes = [K.get_variable_shape(p) for p in params]
moments = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + moments
for p, g, m in zip(params, grads, moments):
    v = self.momentum * m - lr * g  # velocity
    self.updates.append(K.update(m, v))
As pointed out in https://github.com/keras-team/keras/issues/7502, get_updates() only defines 'a symbolic computation graph'. I'm not sure what that means. Can someone give a more detailed explanation of how it works?
For example, how is the 'v' computed in one iteration passed on to 'moments' in the next iteration to implement momentum? I'd also appreciate it if someone could point me to a tutorial about how this works.
Thanks a lot! (BTW, I'm using tensorflow, if it matters.)
get_updates() defines the graph operations that apply the gradient updates to the weights.
When the graph is evaluated for training, it will look roughly like this:
forward passes compute a prediction value
the loss computes a cost
backward passes compute gradients
the weights are updated using the gradients
Applying the updates is a graph computation itself; i.e. the snippet of code that you quote defines how to perform the operation by specifying which tensors are involved and what math operations occur. The math operations themselves are not occurring at that point.
moments is a list of tensors defined in the code above. The code creates a graph operation that updates each element of moments.
Every iteration of the graph will run this update operation.
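To see how state carries across iterations, here is a minimal TF 1.x-style sketch (the variable and names are mine, purely illustrative) where a variable plays the role of one moments entry; the update op is defined once, symbolically, and every session.run executes it against the variable's current value:
import tensorflow as tf

m = tf.Variable(0.0)            # plays the role of one `moments` entry
g = tf.placeholder(tf.float32)  # plays the role of a gradient
momentum, lr = 0.9, 0.1

v = momentum * m - lr * g       # symbolic: no math happens here
update_m = tf.assign(m, v)      # graph op that writes v back into m

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for grad in [1.0, 1.0, 1.0]:
        print(sess.run(update_m, feed_dict={g: grad}))
# prints -0.1, -0.19, -0.271: the state accumulates across iterations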
The following link tries to explain the concept of the computational graph in TensorFlow:
https://www.tensorflow.org/guide/graphs
Keras uses the same underlying ideas but abstracts the user from having to deal with the low-level details. Defining a model in the traditional TensorFlow 1.0 API requires a much higher level of detail.

BatchNormalization in Keras

How do I update the moving mean and moving variance in keras BatchNormalization?
I found this in the tensorflow documentation, but I don't know where to put train_op or how to use it with Keras models:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
No posts I found say what to do with train_op or whether it can be used with model.compile.
You do not need to manually update the moving mean and variance if you are using the BatchNormalization layer. Keras takes care of updating these parameters during training and keeps them fixed during testing (when using the model.predict and model.evaluate functions, same as with model.fit_generator and friends).
Keras also keeps track of the learning phase, so different codepaths run during training and during validation/testing.
If you just need to update the weights of an existing model with some new values, you can do the following:
w = model.get_layer('batchnorm_layer_name').get_weights()
# Weight order: [gamma, beta, moving_mean, moving_std]
for j in range(len(w[2])):
    w[2][j] = new_run_mean_value1  # your new moving mean for unit j
    w[3][j] = new_run_std_value2   # your new moving std for unit j
model.get_layer('batchnorm_layer_name').set_weights(w)
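In case the layer name is unknown, a small helper (my own sketch; batchnorm_layer_name above is a placeholder) to list the BatchNormalization layers of a model and their weight shapes:
from tensorflow.keras.layers import BatchNormalization

for layer in model.layers:
    if isinstance(layer, BatchNormalization):
        print(layer.name, [w.shape for w in layer.get_weights()])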
There are two interpretations of the question. The first assumes that the goal is to use the high-level training API; that case was answered by Matias Valdenegro.
The second, as discussed in the comments, is whether it is possible to use batch normalization with a standard tensorflow optimizer, as discussed in the "Keras as a simplified interface to TensorFlow" tutorial, section "Collecting trainable weights and state updates". As mentioned there, the update ops are accessible via layer.updates, not via tf.GraphKeys.UPDATE_OPS. In fact, if you have a Keras model in tensorflow, you can optimize it with a standard tensorflow optimizer and batch normalization like this:
update_ops = model.updates
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
and then use a tensorflow session to fetch the train_op. To distinguish the training and evaluation modes of the batch normalization layer, you need to feed the learning-phase state of the Keras engine (see "Different behaviors during training and testing" on the same tutorial page as given above). For example:
...
# train
lo, _ = tf_sess.run(fetches=[loss, train_op],
                    feed_dict={tf_batch_data: bd,
                               tf_batch_labels: bl,
                               tensorflow.keras.backend.learning_phase(): 1})
...
# eval
lo = tf_sess.run(fetches=[loss],
                 feed_dict={tf_batch_data: bd,
                            tf_batch_labels: bl,
                            tensorflow.keras.backend.learning_phase(): 0})
I tried this in tensorflow 1.12 and it works with models containing batch normalization. Given my existing tensorflow code, and in light of the approaching tensorflow 2.0, I was tempted to use this approach myself. However, since it is not mentioned in the tensorflow documentation, I am not sure it will be supported in the long term, so I finally decided against it and invested a little more effort in changing the code to use the high-level API.

Tensorflow slow inference speed in a loop

I am working on a reinforcement learning implementation using Tensorflow. After profiling the training procedure, I found something really weird:
The following code is in a training loop:
state_batch, \
action_batch, \
reward_batch, \
next_state_batch, \
is_episode_finished_batch = self.data_manager.get_next_batch()

state_batch = np.divide(state_batch, 10.0)
next_state_batch = np.divide(next_state_batch, 10.0)

# Calculate y for the td_error of the critic
y_batch = []
next_action_batch = self.actor_network.target_evaluate(
    next_state_batch, action_batch)
q_value_batch = self.critic_network.target_evaluate(
    next_state_batch, next_action_batch)
for i in range(0, self.batch_size):
    if is_episode_finished_batch[i]:
        y_batch.append([reward_batch[i]])
    else:
        y_batch.append(reward_batch[i] + GAMMA * q_value_batch[i])

# Now that we have the y batch, train the critic
self.critic_network.train(y_batch, state_batch, action_batch)

# Then get the action gradient batch and adapt the gradient with the gradient inverting method
action_batch_for_gradients = self.actor_network.evaluate(
    state_batch, action_batch)
q_gradient_batch = self.critic_network.get_action_gradient(
    state_batch, action_batch_for_gradients)
q_gradient_batch = self.grad_inv.invert(
    q_gradient_batch, action_batch_for_gradients)

# Now we can train the actor
self.actor_network.train(q_gradient_batch, state_batch, action_batch)
actor_network and critic_network are two classes that implement the actor and the critic of an actor-critic algorithm. Each has its own network and operations, but all of them live in the same graph and run within the same session. Each member function (like evaluate, train, ...) contains a session.run and feeds the data it needs by passing parameters.
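Each of these wrappers looks roughly like this (a simplified, hypothetical sketch; names shortened):
class ActorNetwork:
    def evaluate(self, state_batch, action_batch):
        # every call is a separate session.run against the shared graph
        return self.sess.run(self.action_output_op,
                             feed_dict={self.state_input: state_batch,
                                        self.action_input: action_batch})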
I observed that action_batch_for_gradients runs extremely slowly, taking 0.x seconds for one inference, much slower even than self.critic_network.train. action_batch_for_gradients is simply an inference operation in the actor network to get actions. I then duplicated this line and found that only the first action_batch_for_gradients call, right after self.critic_network.train, is slow; the second one runs at the normal speed of a forward operation. I think it has something to do with switching within a graph, between training one network and running a forward pass in another, but I can't tell how to avoid it.
I found some discussions on Stack Overflow about reusing the same graph in the loop, instead of building a new one each time, to speed up TensorFlow. But I already build the graph beforehand and only run different parts of it in the training loop, so I don't know how I am misusing TensorFlow in this training loop. I am using Tensorflow 1.6.
I would appreciate your help!

Ways to implement multi-GPU BN layers with synchronizing means and vars

I'd like to know the possible ways to implement batch normalization layers with synchronizing batch statistics when training with multi-GPU.
Caffe: Maybe there are some variants of caffe that could do this, like link. But for the BN layer, my understanding is that it still synchronizes only the outputs of layers, not the means and vars. Maybe MPI can synchronize means and vars, but I think MPI is a little difficult to implement.
Torch: I've seen some comments here and here, which show that running_mean and running_var can be synchronized, but I think the batch mean and batch var cannot be synchronized, or are difficult to.
Tensorflow: Normally, it is the same as caffe and torch. The implementation of BN refers to this. I know tensorflow can place an operation on any device specified by tf.device(). But the computation of the means and vars is in the middle of the BN layer, so if I gather the means and vars on the cpu, my code will be like this:
cpu_gather = []
label_batches = []
for i in range(num_gpu):
    with tf.device('/gpu:%d' % i):
        with tf.variable_scope('block1', reuse=i > 0):
            image_batch, label_batch = cifar_input.build_input(
                'cifar10', train_data_path, batch_size, 'train')
            label_batches.append(label_batch)
            x = _conv('weights', image_batch, 3, 3, 16, _stride_arr(1))
            cpu_gather.append(x)
with tf.device('/cpu:0'):
    x1 = tf.concat(cpu_gather, 0)
    mean, variance = tf.nn.moments(x1, [0, 1, 2], name='moments')
for i in range(num_gpu):
    with tf.device('/gpu:%d' % i):
        with tf.variable_scope('block2', reuse=i > 0):
            shape = cpu_gather[i].get_shape().as_list()
            assert len(shape) in [2, 4]
            n_out = shape[-1]
            beta, gamma, moving_mean, moving_var = get_bn_variables(n_out, True, True)
            x = tf.nn.batch_normalization(
                cpu_gather[i], mean, variance, beta, gamma, 0.00001)
            x = _relu(x)
That is just for one BN layer. To gather the statistics on the cpu, I have to break up the code. If I have more than 100 BN layers, that will be cumbersome.
I am not an expert in those libraries, so there may be some misunderstandings; feel free to point out my errors.
I do not care much about training speed. I am doing image segmentation, which consumes a lot of GPU memory, and BN needs a reasonable batch size (e.g. larger than 16) for stable statistics, so using multi-GPU is inevitable. In my opinion, tensorflow might be the best choice, but I can't resolve the code-breaking problem. Solutions with other libraries are welcome too.
I'm not sure if I fully understand your question, but provided you set up your variable scope properly, the tf.GraphKeys.UPDATE_OPS collection should automatically have the update ops of batch_norm for each of your towers. If all of the update_ops are applied synchronously, they will be implicitly averaged by the parameter server; all you have to do is make sure the updates are applied before you average and apply gradients (if I understand your intentions correctly).
Because of variable scope each set of update ops will update the same variables, so to synchronize the update ops all you need to do is gate your gradient calculation on the complete set of update ops. You should also encapsulate all of your batch norm layers in a single name_scope to avoid grabbing any extraneous ops in UPDATE_OPS. Code skeleton below:
update_ops = []
for i, device in enumerate(devices):
    with tf.variable_scope('foo', reuse=bool(i > 0)):
        with tf.name_scope('tower_%d' % i) as name_scope:
            with tf.device(device):
                # Put as many batch_norm layers as you want here
                update_ops.extend(tf.get_collection(tf.GraphKeys.UPDATE_OPS,
                                                    name_scope))
# make gradient calculation ops here
with tf.device(averaging_device):
    with tf.control_dependencies(update_ops):
        # average and apply gradients
        ...
If you want to try this on some existing code, try just deleting the if i == 0 line here: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10_estimator/cifar10_main.py#L115
You're going to see some slowdown (we usually use only one tower to compute the batch norm statistics for this reason), but it should do what you want.
Since TF 2.2, a specialized Keras layer, SyncBatchNormalization, is available:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/SyncBatchNormalization
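Usage is a drop-in replacement for the regular BatchNormalization layer; a minimal sketch (TF >= 2.2, the layer placement is just an example of mine):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', input_shape=(32, 32, 3)),
    # synchronizes batch statistics across replicas under tf.distribute
    tf.keras.layers.experimental.SyncBatchNormalization(),
    tf.keras.layers.ReLU(),
])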
I've figured out a way to implement sync batch norm in pure tensorflow and pure python.
The code makes it possible to train PSPNet on Cityscapes and get comparable performance.