Converting a fully connected neural network with a variable number of hidden layers from TensorFlow to PyTorch

I recently started learning PyTorch and I am trying to convert part of a large script, including an MLP with a variable number of hidden layers, from TensorFlow to PyTorch.
import tensorflow as tf

### Base neural network
def init_mlp(layer_sizes, std=.01, bias_init=0.):
    params = {'w':[], 'b':[]}
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        params['w'].append(tf.Variable(tf.random_normal([n_in, n_out], stddev=std)))
        params['b'].append(tf.Variable(tf.mul(bias_init, tf.ones([n_out,]))))
    return params

def mlp(X, params):
    h = [X]
    for w,b in zip(params['w'][:-1], params['b'][:-1]):
        h.append( tf.nn.relu( tf.matmul(h[-1], w) + b ) )
        #h.append( tf.nn.tanh( tf.matmul(h[-1], w) + b ) )
    return tf.matmul(h[-1], params['w'][-1]) + params['b'][-1]

def compute_nll(x, x_recon_linear):
    return tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(x_recon_linear, x), reduction_indices=1, keep_dims=True)

def gauss_cross_entropy(mean_post, std_post, mean_prior, std_prior):
    d = (mean_post - mean_prior)
    d = tf.mul(d,d)
    return tf.reduce_sum(-tf.div(d + tf.mul(std_post,std_post),(2.*std_prior*std_prior)) - tf.log(std_prior*2.506628), reduction_indices=1, keep_dims=True)
How could I write similar weight and bias variables in PyTorch and attach them to each hidden layer?
How could I convert the gauss_cross_entropy and compute_nll functions as well (i.e. find the equivalent syntax)?
Here is my attempt in PyTorch. Are the two versions equivalent?
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as func
from torch.distributions import Normal, Categorical, Independent
from copy import

device = "cpu"
if torch.cuda.is_available():
    device = "cuda:0"
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)

def init_mlp(layer_sizes, std=.01, bias_init=0.):
    params = {'w':[], 'b':[]}
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        params['w'].append(torch.tensor(Normal([n_in, n_out], torch.tensor([std])) ,requires_grad=True))
        params['b'].append(torch.tensor(torch.mul(bias_init, torch.ones([n_out,])),requires_grad=True))
    return params

def mlp(X, params):
    h = [X]
    for w,b in zip(params['w'][:-1], params['b'][:-1]):
        h.append( torch.nn.ReLU( tf.matmul(h[-1], w) + b ) )
    return torch.matmul(h[-1], params['w'][-1]) + params['b'][-1]

def compute_nll(x, x_recon_linear):
    return torch.sum(func.binary_cross_entropy_with_logits(x_recon_linear, x), reduction_indices=1, keep_dims=True)

def gauss_cross_entropy(mu_post, sigma_post, mu_prior, sigma_prior):
    d = (mu_post - mu_prior)
    d = torch.mul(d,d)
    return torch.sum(-torch.div(d + torch.mul(sigma_post,sigma_post),(2.*sigma_prior*sigma_prior)) - torch.log(sigma_prior*2.506628), reduction_indices=1, keep_dims=True)
What is the PyTorch substitute for tf.placeholder? For instance, here:
class VAE(object):
    def __init__(self, hyperParams):
        self.X = tf.placeholder("float", [None, hyperParams['input_d']])
        self.prior = hyperParams['prior']
        self.K = hyperParams['K']
        self.encoder_params = self.init_encoder(hyperParams)
        self.decoder_params = self.init_decoder(hyperParams)
Also, how should I change tf.shape in this line: tf.random_normal(tf.shape(self.sigma[-1]))?

How could I write similar weight and bias variables and attach them to each hidden layer in PyTorch?
An easier way to define those is to create a list containing the params as (weight, bias) tuples:
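import torch
import torch.nn as nn

def init_mlp(layer_sizes, std=.01, bias_init=0.):
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = nn.init.normal_(torch.empty(n_in, n_out), std=std).requires_grad_(True)
        b = torch.empty(n_out).fill_(bias_init).requires_grad_(True)
        params.append((W, b))
    return params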
Above, I define the parameters as 'empty' tensors (allocated with uninitialized data) using torch.empty, then fill them with in-place functions: nn.init.normal_ for the weights (there are many other initializers available) and torch.Tensor.fill_ for the biases. fill_(bias_init) gives the same result as tf.mul(bias_init, tf.ones([n_out,])) in your TensorFlow sample.
For the inference code, you don't actually need to store the intermediate layer results:
def mlp(x, params):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        # ReLU on every layer except the last, which returns logits
        if i < len(params) - 1:
            x = torch.relu(x)
    return x
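Putting the two pieces together, here is a minimal sketch of one training step. The toy layer sizes [4, 8, 8, 3] and the random data are hypothetical stand-ins for a real model and dataset. It shows that the (W, b) tuples work with a torch.optim optimizer once they are flattened into a single list.

```python
import torch
import torch.nn as nn


def init_mlp(layer_sizes, std=.01, bias_init=0.):
    # One (W, b) tuple per layer: W ~ N(0, std^2), b filled with bias_init
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = nn.init.normal_(torch.empty(n_in, n_out), std=std).requires_grad_(True)
        b = torch.empty(n_out).fill_(bias_init).requires_grad_(True)
        params.append((W, b))
    return params


def mlp(x, params):
    # ReLU on every layer except the last, which returns raw logits
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = torch.relu(x)
    return x


torch.manual_seed(0)
layer_sizes = [4, 8, 8, 3]                  # hypothetical sizes for the sketch
params = init_mlp(layer_sizes)
# The optimizer only needs a flat list of the leaf tensors
optimizer = torch.optim.SGD([p for layer in params for p in layer], lr=0.1)

x = torch.randn(32, layer_sizes[0])         # random batch standing in for real data
y = torch.randint(0, 2, (32, layer_sizes[-1])).float()
logits = mlp(x, params)
loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(logits.shape)
```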
How could I convert the gauss_cross_entropy and compute_nll functions as well (i.e. find the equivalent syntax)?
You can use PyTorch functions and mathematical operators to define your logic. For compute_nll you were using the built-in binary_cross_entropy_with_logits, which does not need a summation after it: by default (reduction='mean') the losses of all batch elements are averaged.
import torch.nn.functional as F

def compute_loss(y_pred, y_true):
    return F.binary_cross_entropy_with_logits(y_pred, y_true)
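If you want to keep the per-example sums of your TensorFlow version instead of a single averaged loss, here is a sketch of both functions, assuming all arguments are float tensors of shape (batch, features). reduction='none' keeps the element-wise losses, and torch.sum(..., dim=1, keepdim=True) plays the role of tf.reduce_sum(..., reduction_indices=1, keep_dims=True) (torch.sum does not accept reduction_indices or keep_dims). The constant 2.506628 in your code is sqrt(2π), so gauss_cross_entropy returns E_q[log p], the negative cross-entropy between the posterior q and the prior p.

```python
import math

import torch
import torch.nn.functional as F


def compute_nll(x, x_recon_linear):
    # Element-wise BCE on logits, summed over features -> shape (batch, 1)
    bce = F.binary_cross_entropy_with_logits(x_recon_linear, x, reduction='none')
    return torch.sum(bce, dim=1, keepdim=True)


def gauss_cross_entropy(mu_post, sigma_post, mu_prior, sigma_prior):
    # E_q[log p] for diagonal Gaussians q = N(mu_post, sigma_post^2),
    # p = N(mu_prior, sigma_prior^2), summed over dimensions -> (batch, 1)
    d = (mu_post - mu_prior) ** 2
    term = (-(d + sigma_post ** 2) / (2. * sigma_prior ** 2)
            - torch.log(sigma_prior * math.sqrt(2 * math.pi)))
    return torch.sum(term, dim=1, keepdim=True)
```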
What is the PyTorch substitute for tf.placeholder?
There are no placeholders in PyTorch. Operations run eagerly, so you compute your outputs explicitly by applying PyTorch operators to actual input tensors; autograd records those operations, so you can then backpropagate through them and get the gradient for each parameter.
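As an illustration, the input that tf.placeholder("float", [None, hyperParams['input_d']]) stood for simply becomes an argument of forward() in an nn.Module, and the None batch dimension is whatever batch you pass in. The Encoder class, its layer sizes and the choice to output mu and sigma are hypothetical, chosen only to show the pattern:

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    # Hypothetical minimal encoder: X is passed to forward() instead of
    # being declared up front as a placeholder.
    def __init__(self, input_d, hidden_d, latent_d):
        super().__init__()
        self.hidden = nn.Linear(input_d, hidden_d)
        self.mu = nn.Linear(hidden_d, latent_d)
        self.log_sigma = nn.Linear(hidden_d, latent_d)

    def forward(self, X):
        h = torch.relu(self.hidden(X))
        # exp keeps sigma strictly positive
        return self.mu(h), torch.exp(self.log_sigma(h))


enc = Encoder(input_d=6, hidden_d=16, latent_d=2)
X = torch.rand(5, 6)          # any batch size works
mu, sigma = enc(X)
```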
How should I change tf.shape in this line: tf.random_normal(tf.shape(self.sigma[-1]))
tf.shape returns the shape of a tensor. In PyTorch you use the torch.Tensor.shape attribute or the torch.Tensor.size() method, i.e. self.sigma[-1].shape or self.sigma[-1].size().
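For example, assuming sigma is a tensor playing the role of self.sigma[-1], both calls below draw standard-normal noise of the same shape; torch.randn_like additionally matches its dtype and device. The reparameterised sample z is an assumption about how the noise is used in your VAE.

```python
import torch

sigma = torch.rand(5, 2) + 0.1      # stands in for self.sigma[-1]
mu = torch.zeros(5, 2)              # hypothetical mean of the same shape

eps = torch.randn(sigma.shape)      # like tf.random_normal(tf.shape(sigma))
eps_alt = torch.randn_like(sigma)   # same shape, dtype and device as sigma
z = mu + sigma * eps                # reparameterised sample
```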


How to compute batch-wise Jacobians using vmap in JAX?

I want to solve a 2D differential equation using a neural network and the JAX library. The neural network function I am using approximates u = f(x, y) and looks something like this:
def f(params, inputs_x, inputs_y):
    inputs = jnp.concatenate((inputs_x, inputs_y), axis=1)
    for w, b in params:
        outputs = jnp.dot(inputs, w) + b
        inputs = jnn.swish(outputs)
    return outputs
params is a PyTree containing the weight matrices and bias vectors. For the 2D problem, take layer sizes of [2, 5, 1]. A batch of 10 (x, y) samples is passed to the function, so inputs_x and inputs_y both have shape (10, 1), and the output I want should also have shape (10, 1). The real problem comes when I try to compute du/dx, du/dy, d²u/dx² or d²u/dy². I am writing something like this:
u = lambda x,y: f(params, x, y)
u_x = lambda x,y: vmap(jacfwd(u,argnums=0), in_axes=(0,0))(x,y)
u_xx = lambda x,y: vmap(jacfwd(u_x,argnums=0), in_axes=(0,0))(x,y)
I am getting errors.
When I solve a 1D differential equation, everything works fine. In that case, the neural network function looks like this:
def f(params, inputs):
    for w, b in params:
        outputs = jnp.dot(inputs, w) + b
        inputs = jnn.swish(outputs)
    return outputs
u = lambda x: f(params, x)
u_x = lambda x: vmap(jacfwd(u, argnums=0))(x)
The layer sizes are [1, 5, 1]; I pass a batch of 10 inputs to the neural network function, compute the gradients using vmap, and everything works fine!
As soon as I move to a 2D problem with two input neurons, the layer sizes become [2, 5, 1], I pass a batch of 10 inputs for both x and y together, and vmap no longer works. I want to compute du/dx, du/dy, d²u/dx² and d²u/dy² from the neural network and expect all four results to have shape (10, 1), but I get an error.
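It looks like your function is not compatible with vmap because it expects an explicit batch dimension: inside vmap each call sees a single sample of shape (1,), which has no axis 1 to concatenate along. You can fix this by concatenating along axis=-1 rather than axis=1, which works for both a single sample and a batch. Your function calls could then look something like the following: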
from functools import partial
import jax
import jax.numpy as jnp
from jax import nn as jnn

def f(params, inputs_x, inputs_y):
    inputs = jnp.concatenate((inputs_x, inputs_y), axis=-1)
    for w, b in params:
        outputs = jnp.dot(inputs, w) + b
        inputs = jnn.swish(outputs)
    return outputs

# Some example inputs and parameters
inputs_x = jnp.ones((10, 1))
inputs_y = jnp.ones((10, 1))
params = [
    (jnp.ones((2, 5)), 1),
    (jnp.ones((5, 1)), 1)
]
u = partial(f, params)

# u: (10,1)->(10,1)
print(u(inputs_x, inputs_y).shape)
# (10, 1)

# u: (1)->(1) batched to (10,1)->(10,1)
print(jax.vmap(u)(inputs_x, inputs_y).shape)
# (10, 1)

# ∇u (w.r.t. inputs_x; pass argnums=1 for inputs_y): (1) -> (1,1) batched to (10,1)->(10,1,1)
print(jax.vmap(jax.jacobian(u))(inputs_x, inputs_y).shape)
# (10, 1, 1)

# ∇²u: (1) -> (1,1,1) batched to (10,1)->(10,1,1,1)
print(jax.vmap(jax.hessian(u))(inputs_x, inputs_y).shape)
# (10, 1, 1, 1)
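If you need du/dx, du/dy, d²u/dx² and d²u/dy² directly as (10, 1) arrays, one alternative is to write u as a scalar function of scalar x and y, differentiate it with jax.grad (argnums selects the variable), and vmap the result over the batch. This sketch assumes swish hidden layers and a linear output layer, as in f above; the helper batched and the random parameters are hypothetical.

```python
import jax
import jax.numpy as jnp
from jax import nn as jnn


def u(params, x, y):
    # Scalar network u(x, y): x and y are 0-d arrays for a single sample
    h = jnp.stack([x, y])                   # shape (2,)
    for w, b in params[:-1]:
        h = jnn.swish(h @ w + b)
    w, b = params[-1]
    return (h @ w + b)[0]                   # shape (), so jax.grad applies


# First and second partial derivatives (argnums 1 -> x, 2 -> y)
u_x = jax.grad(u, argnums=1)
u_y = jax.grad(u, argnums=2)
u_xx = jax.grad(u_x, argnums=1)
u_yy = jax.grad(u_y, argnums=2)


def batched(fn, params, xs, ys):
    # xs, ys: shape (N, 1) -> result: shape (N, 1); params are not batched
    out = jax.vmap(fn, in_axes=(None, 0, 0))(params, xs[:, 0], ys[:, 0])
    return out[:, None]


key1, key2 = jax.random.split(jax.random.PRNGKey(0))
params = [
    (jax.random.normal(key1, (2, 5)), jnp.zeros(5)),
    (jax.random.normal(key2, (5, 1)), jnp.zeros(1)),
]
inputs_x = jnp.linspace(0., 1., 10).reshape(10, 1)
inputs_y = jnp.linspace(1., 2., 10).reshape(10, 1)

for fn in (u_x, u_y, u_xx, u_yy):
    print(batched(fn, params, inputs_x, inputs_y).shape)
```

Using jax.grad on a scalar output avoids the trailing singleton axes that jax.jacobian and jax.hessian add, so no reshaping is needed beyond restoring the (N, 1) column.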

Get logits of a trained Keras model

I am building a deconvolution network and would like to add a layer that is the inverse of a softmax. I wrote a basic Python function that returns the inverse of a softmax for a given matrix, wrapped it in a TensorFlow Lambda layer, and added it to my model.
I get no error, but when I call predict the output is all zeros. When I don't add this layer, the output is non-zero, which suggests that my inv_softmax function is the problem.
Can you explain how to proceed?
I defined my function like this:
def inv_softmax(x):
    S = np.zeros((1,1,10))  # (1,1,10) is the shape of the data my layer will receive
    try:
        for j in range(np.max(np.shape(x))):
            for i in range(np.max(np.shape(x))):
                S[0,0,i] = np.log(x[0,0,i]) + C
    except ValueError:
        print("ValueError in inv_softmax")
    S = tf.convert_to_tensor(S,dtype=tf.float32)
    return S
I add it to my network like this:
x = ...
x = layers.Lambda(lambda x : inv_softmax(x),name='inv_softmax',output_shape=[1,1,10])(x)
x = ...
If you need more of my code or other information, please ask.
Try this:
import math
import tensorflow as tf

def inv_softmax(x, C):
    return tf.math.log(x) + C

inputs = tf.keras.layers.Input(shape=(1, 10))
x = tf.keras.layers.Lambda(lambda x: inv_softmax(x, math.log(10.)), name='inv_softmax')(inputs)
model = tf.keras.Model(inputs=inputs, outputs=x)

a = tf.zeros([1, 1, 10])
a = tf.nn.softmax(a)   # uniform: every entry is 0.1
a = model(a)           # log(0.1) + log(10) = 0, recovering the original zeros
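Why the constant C is needed: softmax is invariant to adding the same constant to every logit, so log(softmax(z)) only recovers z up to that constant; in fact log(softmax(z)) = z - logsumexp(z). In the example above the logits are ten zeros, so C = logsumexp(0, ..., 0) = log(10). A layer that only sees softmax outputs cannot know logsumexp of the original logits, so in general it can only recover them up to a shift. The NumPy sketch below checks the identity on arbitrary logits; in TensorFlow the corresponding pieces are tf.math.log and tf.reduce_logsumexp(z, axis=-1, keepdims=True).

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def logsumexp(z):
    m = z.max(axis=-1, keepdims=True)
    return m + np.log(np.exp(z - m).sum(axis=-1, keepdims=True))


z = np.array([[[0.5, -1.0, 2.0, 0.0]]])     # arbitrary logits, shape (1, 1, 4)
p = softmax(z)

# A constant shift of the logits does not change the softmax ...
print(np.allclose(softmax(z + 3.0), p))      # True
# ... so log(p) alone is off by a constant
print(np.allclose(np.log(p), z))             # False

# Adding back C = logsumexp(z) recovers the original logits
C = logsumexp(z)
print(np.allclose(np.log(p) + C, z))         # True
```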
Thanks, it works!
I used:
import keras.backend as K

def inv_softmax(x, C):
    return K.log(x) + K.log(C)

Keras custom layer on ragged tensor to reduce dimensionality

I'm trying to write a custom layer that handles variable-length vectors and reduces them all to vectors of the same length.
The possible lengths are known in advance: they vary because I have several different data types, each encoded with a different number of features.
In a sense, it is similar to an Embedding layer, but for numerical values.
I've tried using padding, but the results were bad, so I'm trying this approach instead.
For example, say I have 3 data types, encoded as vectors of length 3, 4 and 6.
arr = [
    # example one (data type 1 [len()==3], data type 3 [len()==6]) - force values as floats
    # example two (data type 2 [len()==4], data type 3 [len()==6]) - force values as floats
]
I tried implementing a custom layer like this:
class DimensionReducer(tf.keras.layers.Layer):
    def __init__(self, output_dim, expected_lengths):
        super(DimensionReducer, self).__init__()
        self._supports_ragged_inputs = True
        self.output_dim = output_dim
        for l in expected_lengths:
            setattr(self, f'w_{l}', self.add_weight(shape=(l, self.output_dim), initializer='random_normal', trainable=True))
            setattr(self, f'b_{l}', self.add_weight(shape=(self.output_dim,), initializer='random_normal', trainable=True))

    def call(self, inputs):
        # batch
        if len(inputs.shape) == 3:
            result = []
            for i, x in enumerate(inputs):
                _result = []
                for v in x:
                    l = len(v)
                    w = getattr(self, f'w_{l}')
                    b = getattr(self, f'b_{l}')
                    out = tf.matmul([v], w) + b
                    _result.append(out)
                result.append(tf.concat(_result, 0))
            r = tf.stack(result)
            print("batch output:", r.shape)
            return r
Which seems to be working when called directly:
dim = DimensionReducer(3, [3,4,6])
But when I try to incorporate it into a model, it fails:
import tensorflow as tf
val_ragged = tf.ragged.constant(arr)
inputs_ragged = tf.keras.layers.Input(shape=(None,None), ragged=True)
outputs_ragged = DimensionReducer(3, [3,4,6])(inputs_ragged)
model_ragged = tf.keras.Model(inputs=inputs_ragged, outputs=outputs_ragged)
# this one with RaggedTensor doesn't
AttributeError: 'DimensionReducer' object has no attribute 'w_Tensor("dimension_reducer_98/strided_slice:0", shape=(), dtype=int32)'
I'm not sure how to implement such a layer, or what I'm doing wrong.
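The error message suggests that when the layer is traced inside a Keras model, len(v) is a symbolic tensor rather than a Python int, so f'w_{l}' builds an attribute name like w_Tensor(...). One way around this is to avoid Python-level lookups altogether. The sketch below assumes every inner vector has one of the lengths in expected_lengths: it zero-pads each vector to the longest length, keeps one zero-padded weight matrix per data type in a single variable, and picks the right one with tf.gather based on each vector's length. Unlike padding the data for one shared Dense layer, this is equivalent to a separate matmul per data type, because the padded zeros never touch the extra weight rows. It is an untested sketch assuming a TF 2.x tf.keras version whose layers accept RaggedTensor inputs, not a confirmed fix.

```python
import tensorflow as tf


class DimensionReducer(tf.keras.layers.Layer):
    # Sketch: maps every inner vector of a (batch, None, None) RaggedTensor
    # to output_dim values, with a separate weight matrix per vector length.
    def __init__(self, output_dim, expected_lengths, **kwargs):
        super().__init__(**kwargs)
        self._supports_ragged_inputs = True        # kept from the question
        self.output_dim = output_dim
        self.expected_lengths = tuple(expected_lengths)
        self.max_len = max(self.expected_lengths)
        n_types = len(self.expected_lengths)
        # All per-type weight matrices stacked and zero-padded to max_len rows
        self.w = self.add_weight(name="w", shape=(n_types, self.max_len, output_dim),
                                 initializer="random_normal", trainable=True)
        self.b = self.add_weight(name="b", shape=(n_types, output_dim),
                                 initializer="random_normal", trainable=True)

    def call(self, inputs):
        vecs = inputs.values                        # (total_vectors, None), ragged
        lengths = vecs.row_lengths()                # length of every inner vector
        dense = vecs.to_tensor(default_value=0.)    # zero-pad to longest in the batch
        dense = tf.pad(dense, [[0, 0], [0, self.max_len - tf.shape(dense)[1]]])
        # Data-type index of every vector, looked up from its length
        known = tf.constant(self.expected_lengths, dtype=lengths.dtype)
        matches = tf.cast(tf.equal(lengths[:, None], known[None, :]), tf.int32)
        type_idx = tf.argmax(matches, axis=1)
        w = tf.gather(self.w, type_idx)             # (total_vectors, max_len, output_dim)
        b = tf.gather(self.b, type_idx)             # (total_vectors, output_dim)
        out = tf.einsum("nl,nlo->no", dense, w) + b  # padded zeros contribute nothing
        # Regroup the vectors by example, as in the input
        return tf.RaggedTensor.from_row_splits(out, inputs.row_splits)


rt = tf.ragged.constant([[[1., 2., 3.], [1., 2., 3., 4., 5., 6.]],
                         [[1., 2., 3., 4.], [1., 2., 3., 4., 5., 6.]]])
layer = DimensionReducer(3, [3, 4, 6])
out = layer(rt)
print(out.shape)
```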

How to calculate entropy of float numbers over a tensor in Python Keras

I have been struggling with this and could not get it to work. I hope someone can help me with this.
I want to calculate the entropy of each row of the tensor. Because my data are floats, not integers, I think I need to use bin_histogram.
For example, a sample of my data is tensor = [[0.2, -0.1, 1], [2.09, -1.4, 0.9]].
For information, my model is seq2seq and written in Keras with the TensorFlow backend.
This is my code so far; I need to correct rev_entropy:
class entropy_measure(Layer):

    def __init__(self, beta,batch, **kwargs):
        self.beta = beta
        self.batch = batch
        self.uses_learning_phase = True
        self.supports_masking = True
        super(entropy_measure, self).__init__(**kwargs)

    def call(self, x):
        return K.in_train_phase(self.rev_entropy(x, self.beta,self.batch), x)

    def get_config(self):
        config = {'beta': self.beta}
        base_config = super(entropy_measure, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def rev_entropy(self, x, beta,batch):
        for i in x:
            i = pd.Series(i)
            p_data = i.value_counts()  # counts occurrence of each value
            entropy = entropy(p_data)  # get entropy from counts
            rev = 1/(1+entropy)
            return rev
        new_f_w_t = x * (rev.reshape(rev.shape[0], 1))*beta
        return new_f_w_t
Any input is much appreciated:)
It looks like several of your questions come together on this issue, so I'll answer them all here.
According to your code, you calculate entropy with scipy.stats.entropy, which is documented as follows:
scipy.stats.entropy(pk, qk=None, base=None)
Calculate the entropy of a distribution for given probability values.
If only probabilities pk are given, the entropy is calculated as S = -sum(pk * log(pk), axis=0).
TensorFlow does not provide a direct API to calculate the entropy of each row of a tensor, so we need to implement the formula above ourselves.
import pandas as pd
import tensorflow as tf
from scipy.stats import entropy

a = [1.1,2.2,3.3,4.4,2.2,3.3]
res = entropy(pd.Series(a).value_counts())

_, _, count = tf.unique_with_counts(tf.constant(a))
# [1 2 2 1]
prob = count / tf.reduce_sum(count)
# [0.16666667 0.33333333 0.33333333 0.16666667]
tf_res = -tf.reduce_sum(prob * tf.log(prob))

with tf.Session() as sess:
    print('scipy version: \n', res)
    print('tensorflow version: \n', sess.run(tf_res))
Then, in your custom layer, define a per-row function based on the code above and loop over the rows with tf.map_fn:
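def rev_entropy(self, x, beta, batch):
    def row_entropy(row):
        _, _, count = tf.unique_with_counts(row)
        count = tf.cast(count, tf.float32)  # keep float32 to match dtype in tf.map_fn
        prob = count / tf.reduce_sum(count)
        return -tf.reduce_sum(prob * tf.log(prob))

    value_ranges = [-10.0, 100.0]
    nbins = 50
    # replace every float by the index of its histogram bin
    new_f_w_t = tf.histogram_fixed_width_bins(x, value_ranges, nbins)
    rev = tf.map_fn(row_entropy, new_f_w_t, dtype=tf.float32)
    # one scale factor per row, broadcast over that row's features
    new_f_w_t = x * tf.expand_dims(1 / (1 + rev), 1) * beta
    return new_f_w_t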
Note that no gradient will propagate backwards through the entropy term, since it is computed from counts of binned values, which are not differentiable. Maybe you need to rethink your hidden layer structure.
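To sanity-check what the layer computes, here is a NumPy reference for the per-row histogram entropy. The binning rule (equal-width bins over value_range, with out-of-range values clipped into the first or last bin) is my reading of how tf.histogram_fixed_width_bins behaves, so treat it as an assumption. The sample rows and the smaller value range are made up for the check.

```python
import numpy as np


def row_histogram_entropy(x, value_range=(-10.0, 100.0), nbins=50):
    """Entropy (natural log) of each row after binning its float values."""
    x = np.asarray(x, dtype=np.float64)
    lo, hi = value_range
    # Equal-width bin index, clipped into [0, nbins - 1]
    bins = np.floor((x - lo) / (hi - lo) * nbins).astype(int)
    bins = np.clip(bins, 0, nbins - 1)
    ent = np.empty(x.shape[0])
    for r, row in enumerate(bins):
        _, counts = np.unique(row, return_counts=True)   # occurrences per bin
        p = counts / counts.sum()
        ent[r] = -np.sum(p * np.log(p))
    return ent


sample = [[0.5, 0.6, 0.7, 0.8],     # all in one bin
          [0.5, 1.5, 2.5, 3.5],     # four different bins
          [0.5, 0.6, 5.5, 5.6]]     # two bins, two values each
ent = row_histogram_entropy(sample, value_range=(0.0, 10.0), nbins=10)
print(ent)   # expected: 0, ln 4, ln 2
```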

How can I update a tensor (weight values) when using two separate networks?

I've been trying to build a blackjack AI using RL. Now I'm trying to use two separate networks (one for the update and one for the target), which is one way of doing DQN. I searched the web, found an approach and tried it, but it failed.
This error occurred:
TypeError: Using a tf.Tensor as a Python bool is not allowed. Use if t is not None: instead of if t: to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
import gym
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
def one_hot(x):
b = s[x[0] * 20 + x[1] * 2 + x[2]]
return b.reshape(1, 600)
def boolstr_to_floatstr(v):
if v == True:
return 1
elif v == False:
return 0
X=tf.placeholder(tf.float32, shape=[1,state_number], name='input_data')
W1=tf.Variable(tf.random_uniform([state_number,128],0,0.01))#network for update
Qpred=tf.matmul(layer2,W3) # Qprediction
W4=tf.Variable(tf.random_uniform([state_number,128],0,0.01))#network for target
target=tf.matmul(layer4,W6) # target
dis=0.99 #discount factor
rList=[] #record the reward
with tf.Session() as sess:
for i in range(num_episodes): # 10,000 episodes
s = env.reset()
rALL = 0
done = False
e=1./((i/100)+1) # constant for choosing exploit vs. explore
while not done:
s = np.asarray(s)
s[2] = boolstr_to_floatstr(s[2])
if np.random.rand(1)<e: # try something new (explore)
a=np.argmax(Qs) # just pick the action with the highest value I know of
s1,reward,done,_=env.step(a) #
if done:
Qs[0,a]=reward+dis*np.max(Qs1) #optimal Q,feed_dict={X:one_hot(s),Y:Qs})
if i%10==0: # update the target to Qprediction,update2,update3)
if reward==1:
rALL += reward
print('success rate: '+ str(sum(rList)/num_episodes))
print("Final Q-table values")
Finally, I need to print the success rate; before DQN it was around 38%. If anything in my code is wrong with respect to the DQN algorithm, please tell me.
If you want to share the weights between different networks, simply create the layers with the same names inside a scope opened with tf.variable_scope(name_or_scope, reuse=tf.AUTO_REUSE); the weights will then be shared between the networks automatically.
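With shared weights the two networks are always identical. In the usual DQN setup the target network is instead a separate copy that is refreshed periodically, which is what the if i%10==0 block in the question appears to attempt. Here is a minimal graph-mode sketch of that copy via tf.compat.v1, assuming a hypothetical state_number and a single weight matrix per network; W1 and W4 mirror the question's names, while the scope names "main" and "target" are made up.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()     # graph mode, like the question's tf.Session code

state_number = 6                  # hypothetical input size for this sketch

with tf1.variable_scope("main"):
    W1 = tf1.get_variable("W1", initializer=tf1.random_uniform([state_number, 128], 0, 0.01))
with tf1.variable_scope("target"):
    W4 = tf1.get_variable("W4", initializer=tf1.zeros([state_number, 128]))

# Ops that overwrite each target variable with the matching main variable
main_vars = tf1.get_collection(tf1.GraphKeys.TRAINABLE_VARIABLES, scope="main")
target_vars = tf1.get_collection(tf1.GraphKeys.TRAINABLE_VARIABLES, scope="target")
copy_ops = [t.assign(m) for m, t in zip(main_vars, target_vars)]

with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    sess.run(copy_ops)            # e.g. run this every 10 episodes
    w1_val, w4_val = sess.run([W1, W4])
```

Keeping the two networks in separate scopes and copying explicitly means the target stays fixed between copies, which is the point of a target network in DQN.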