How does BatchNormalization work on an example? - tensorflow

I am trying to understand batchnorm.
My humble example
layer1 = tf.keras.layers.BatchNormalization(scale=False, center=False)
x = np.array([[3.,4.]])
out = layer1(x)
print(out)
Prints
tf.Tensor([[2.99850112 3.9980015 ]], shape=(1, 2), dtype=float64)
My attempt to reproduce it
e=0.001
m = np.sum(x)/2
b = np.sum((x - m)**2)/2
x_=(x-m)/np.sqrt(b+e)
print(x_)
It prints
[[-0.99800598 0.99800598]]
What am I doing wrong?

Two problems here.
First, batch norm has two "modes": training, where normalization uses the current batch statistics, and inference, where normalization uses "population statistics" collected from batches during training. By default, Keras layers/models run in inference mode, and you need to pass training=True in their call to change this (there are other ways, but that is the simplest one).
layer1 = tf.keras.layers.BatchNormalization(scale=False, center=False)
x = np.array([[3.,4.]], dtype=np.float32)
out = layer1(x, training=True)
print(out)
This prints tf.Tensor([[0. 0.]], shape=(1, 2), dtype=float32). Still not right!
Second, batch norm normalizes over the batch axis, separately for each feature. However, the way you specify the input (as a 1x2 array) is a single input (batch size 1) with two features. Batch norm then normalizes each feature to mean 0 over a batch of one, so every output is 0 (the standard deviation of a single value is 0). Instead, you want two inputs with a single feature:
layer1 = tf.keras.layers.BatchNormalization(scale=False, center=False)
x = np.array([[3.],[4.]], dtype=np.float32)
out = layer1(x, training=True)
print(out)
This prints
tf.Tensor(
[[-0.99800634]
[ 0.99800587]], shape=(2, 1), dtype=float32)
Alternatively, specify the "feature axis":
layer1 = tf.keras.layers.BatchNormalization(axis=0, scale=False, center=False)
x = np.array([[3.,4.]], dtype=np.float32)
out = layer1(x, training=True)
print(out)
Note that the input shape is "wrong", but we told batchnorm that axis 0 is the feature axis (it defaults to -1, the last axis). This will also give the desired result:
tf.Tensor([[-0.99800634 0.99800587]], shape=(1, 2), dtype=float32)
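For completeness, the original NumPy attempt can be made to match by normalizing over the batch axis, per feature (a minimal sketch, assuming the Keras default epsilon of 1e-3):

import numpy as np

x = np.array([[3.], [4.]])   # a batch of 2 samples with 1 feature
eps = 1e-3                   # BatchNormalization's default epsilon
m = x.mean(axis=0)           # per-feature mean over the batch axis
v = x.var(axis=0)            # per-feature (biased) variance over the batch axis
print((x - m) / np.sqrt(v + eps))   # approximately [[-0.998], [0.998]]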

Related

I'm getting error (Inputs to a layer should be tensors) when using tf.data.Dataset and the Window creation function

The problem I'm stuck with is an error in the fit method when trying to train a neural network based on a dataset generated using the tf.data.Dataset.window window creation function.
My training dataset is too big to fit in memory, so I have to train on data that is formed into windows. For this reason, loading of the dataset is organized through the tf.data.experimental.CsvDataset function.
Each row of the dataset is a sequence of numeric values, where the first 7 values contain labels and the next 100 values contain features. Only one value is used to form the labels; the remaining 6 are omitted and serve only for additional experiments with training quality.
import tensorflow as tf
from tensorflow import keras
XLength = 107
types = [tf.constant(0, dtype=tf.float32)]
ds = tf.data.experimental.CsvDataset(train_file_list, types*XLength, header=False, field_delim = ";", compression_type="GZIP")
The pack_row function extracts the 3rd value from each row as the label and the 100 feature values:
def pack_row(*row):
    label = row[3]
    features = tf.stack(row[PLength:XLength], 1)
    return features, label
Next, we create a dataset in which the rows are split into features and labels, and apply the window creation function:
window_ds_train = ds.batch(1000).map(pack_row, num_parallel_calls=4).unbatch().window(10, shift=1, drop_remainder=True)
The features dataset looks like this:
for x in window_ds_train.take(1):
    for n in x[0]:
        print(n)
tf.Tensor(
[1.1039783 1.1163003 1.1081576 1.1117266 1.1180297 1.2345679 1.3053098
1.3443557 1.3639535 1.26 1.2604042 1.1780168 1.1761158 1.2451861
1.4478064 1.4914197 1.35623 1.4864376 1.4237918 1.4029851 1.434866
1.1298449 1.0216535 1.0060976 1.0190678 1.0550661 0.99117 0.8632287
0.7545455 0.7396314 0.7372093 0.7226107 0.7727273 0.766129 1.0083683
1.5096774 1.4933333 1.2517985 1.537037 1.6262627 1.5851064 1.2197802
1.1764706 1.6491228 4.631579 5.25 4.7 4.3333335 4.
3.5714285 0.28 0.25 0.2307692 0.212766 0.1904762 0.2159091
0.606383 0.85 0.8198198 0.6308725 0.6149068 0.6506024 0.7988506
0.6696429 0.6623932 0.9917012 1.3052632 1.2941177 1.383871 1.3564669
1.3520249 1.3253012 1.1584415 1.0089086 0.9478079 0.981289 0.9939394
0.9788054 0.8850772 0.6969292 0.7127659 0.7023498 0.6727494 0.7373381
0.6705021 0.6907001 0.8030928 0.8502564 0.8488844 0.7933962 0.7936508
0.7331628 0.7438507 0.7661017 0.81 0.8944306 0.8995017 0.9023987
0.8958163 0.9058149], shape=(100,), dtype=float32)
tf.Tensor(
[1.0480686 1.0768552 1.0823635 1.0807899 1.0946314 1.1049724 1.0976744
1.1112158 1.1066037 1.0180608 1.0143541 1.0478215 1.1168385 1.1465721
1.1544029 1.1672772 1.0481482 1.0198511 0.9598997 1.0053476 1.1888889
0.9557377 0.8722689 0.9482759 0.948718 0.9485149 0.9144603 0.7938144
0.6960168 0.6963124 0.7188209 0.7328605 0.6848341 0.686747 0.589242
0.5806451 0.5614035 0.4371859 0.483965 0.4721408 0.7163461 0.8951613
0.8403361 0.8703704 1.1428572 0.9264706 0.7460318 0.65 0.5925926
0.9615384 1.04 1.6875 1.5384616 1.3404255 1.0793651 0.875
1.1489362 1.19 1.1171172 1.3959732 2.1180124 2.066265 2.2873564
1.78125 1.7222222 1.6970954 1.4561404 1.4602076 1.3645161 1.3911672
1.4361371 1.436747 1.2597402 1.0935411 1.0542798 1.054054 1.0545454
1.1464355 1.0463122 0.8411215 0.9946808 1.0417755 0.9805353 0.9540636
0.8566946 0.8662487 0.872165 0.8953846 0.9543611 0.9858491 0.9822596
0.9036658 0.8999152 0.9110169 0.905 0.9135495 0.9252492 0.9239041
0.9286301 0.954136 ], shape=(100,), dtype=float32)
I had to omit part of the output because the dataset is too large; the window has the shape (10, 100).
The labels look like this:
for x in window_ds_train.take(1):
    for n in x[1]:
        print(n)
tf.Tensor(-0.21, shape=(), dtype=float32)
tf.Tensor(-0.22, shape=(), dtype=float32)
tf.Tensor(-0.22, shape=(), dtype=float32)
tf.Tensor(-0.22, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
Next, I would like to apply a flat_map transformation to the dataset, but when I try to execute:
flatten = window_ds_train.flat_map(lambda x:x.batch(10))
of course, I get an error: TypeError: <lambda>() takes 1 positional argument but 2 were given, since both features and labels are packed inside each window, so the lambda receives two arguments while it accepts only one.
The model I'm trying to train looks like this:
inputs = keras.Input(shape=(100))
x = keras.layers.Dense(204, activation='relu')(inputs)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(400, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(400, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(204, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(102, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
outputs = keras.layers.Dense(10)(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(), loss = 'mse', metrics="mae")
If, under such circumstances, training is carried out:
model.fit(window_ds_train, epochs=1, verbose=1)
then I get an error: TypeError: Inputs to a layer should be tensors. Got: <_VariantDataset shapes: (100,), types: tf.float32>
Accordingly, I understand that the incoming data must be a tensor, whereas here it is of type _VariantDataset, which is not accepted.
To work around this problem, I attempted to split the dataset into features and labels and process them in separate flat_map streams. To do this, I introduced two additional functions: the first returns features, and the second returns labels:
def label_row(*row):
    label = row[3]
    return label

def features_row(*row):
    features = tf.stack(row[PLength:XLength], 1)
    return features
Next, we form windowed datasets for features and labels, and flatten each of them separately:
feature_flatten = feature_window_ds_train.flat_map(lambda x:x.batch(10))
label_flatten = label_window_ds_train.flat_map(lambda x:x.batch(10))
When trying to train a model:
history = model.fit(feature_flatten, label_flatten, epochs=1, verbose=1)
I get the error: y argument is not supported when using dataset as input
Clearly, the model expects a dataset in which each element contains both x and y; here I pass x separately from y, which is not supported.
If someone has ideas on how to train a model that will accept Dataset.Window as input, I would be very grateful for clarifications.
Let's first create a dataset compatible with your model
N = 50
c = 1
ds = tf.data.Dataset.from_tensor_slices(
    (
        tf.random.normal(shape=(N, c, 100)),
        tf.random.normal(shape=(N, c))
    )
)
Then we can simply
model.fit(ds, epochs=1)
But notice that the return type of window is not the same as the initial dataset: ds is a dataset of tuples, while dsw is a tuple of _VariantDatasets.
print(ds)
# <TensorSliceDataset shapes: ((1, 100), (1,)), types: (tf.float32, tf.float32)>
for dsw in ds.window(30):
    print(dsw)
# (<_VariantDataset shapes: (1, 100), types: tf.float32>, <_VariantDataset shapes: (1,), types: tf.float32>)
# (<_VariantDataset shapes: (1, 100), types: tf.float32>, <_VariantDataset shapes: (1,), types: tf.float32>)
What you can do to get a window of the dataset with the same type is to combine skip and take
def simple_window(ds, size):
    for start in range(0, ds.cardinality(), size):
        yield ds.skip(start).take(size)
Then you can train with different windows
for dsw in simple_window(ds, 30):
    model.fit(dsw, epochs=1)
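As an aside (a sketch of my own, not from the original answer): if you want to keep the windowed (features, labels) structure from the question, the flat_map lambda has to accept both window components and zip them back into a single dataset; the two separately windowed datasets from the question could likewise be recombined with tf.data.Dataset.zip:

# Hypothetical sketch based on the question's window_ds_train: each element of a
# window is a pair of _VariantDatasets, so the lambda must take two arguments.
flatten = window_ds_train.flat_map(
    lambda features, labels: tf.data.Dataset.zip((features.batch(10), labels.batch(10))))

# Or, combine the separately windowed feature/label datasets from the question:
# train_ds = tf.data.Dataset.zip((feature_flatten, label_flatten))
# model.fit(train_ds, epochs=1)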

How to compute gradients with tf.scatter_sub?

When implementing lambda-opt (an algorithm published at KDD '19) in TensorFlow, I came across a problem computing gradients with tf.scatter_sub.
θ refers to an embedding matrix for docid.
The formulation is
θ(t+1)=θ(t) - α*(grad+2*λ*θ),
delta = theta_grad_no_reg.values * lr + 2 * lr * cur_scale * cur_theta
next_theta_tensor = tf.scatter_sub(theta,theta_grad_no_reg.indices,delta)
then I use θ(t+1) for some computation. Finally, I want to compute gradients with respect to λ, not θ.
But the gradient is None.
I wrote a demo like this:
import tensorflow as tf
w = tf.constant([[1.0], [2.0], [3.0]], dtype=tf.float32)
y = tf.constant([5.0], dtype=tf.float32)
# θ
emb_matrix = tf.get_variable("embedding_name", shape=(10, 3),
                             initializer=tf.random_normal_initializer(), dtype=tf.float32)
# get one line emb
cur_emb=tf.nn.embedding_lookup(emb_matrix,[0])
# The λ matrix
doc_lambda = tf.get_variable(name='docid_lambda', shape=(10, 3),
                             initializer=tf.random_normal_initializer(), dtype=tf.float32)
# get one line λ
cur_lambda=tf.nn.embedding_lookup(doc_lambda, [0])
# θ(t+1) Tensor("ScatterSub:0", shape=(10, 3), dtype=float32_ref)
next_emb_matrix=tf.scatter_sub(emb_matrix, [0], (cur_emb *cur_lambda))
# do some compute with θ(t+1) Tensor ,not Variable
next_cur_emb=tf.nn.embedding_lookup(next_emb_matrix,[0])
y_ = tf.matmul(next_cur_emb, w)
loss = tf.reduce_mean((y - y_) ** 2)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
grad_var_list=optimizer.compute_gradients(loss)
print(grad_var_list)
# [(None, <tf.Variable 'embedding_name:0' shape=(10, 3) dtype=float32_ref>), (None, <tf.Variable 'docid_lambda:0' shape=(10, 3) dtype=float32_ref>)]
The gradient is None, too. It seems that the tf.scatter_sub op doesn't provide a gradient?
Thanks for your help!
If you are interested in this algorithm you can look it up, but it's not important for this question.
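As a side note (a TF 2.x sketch, not from the original post): the stateful tf.scatter_sub op has no registered gradient, which is consistent with the None result above, but the out-of-place tf.tensor_scatter_nd_sub does, so the same computation written against it yields a non-None gradient for λ:

import tensorflow as tf

w = tf.constant([[1.0], [2.0], [3.0]])
y = tf.constant([5.0])
emb_matrix = tf.Variable(tf.random.normal((10, 3)))   # θ
doc_lambda = tf.Variable(tf.random.normal((10, 3)))   # λ

with tf.GradientTape() as tape:
    cur_emb = tf.gather(emb_matrix, [0])
    cur_lambda = tf.gather(doc_lambda, [0])
    # out-of-place scatter: returns a new tensor instead of mutating the variable
    next_emb_matrix = tf.tensor_scatter_nd_sub(
        tf.identity(emb_matrix), [[0]], cur_emb * cur_lambda)
    y_ = tf.matmul(tf.gather(next_emb_matrix, [0]), w)
    loss = tf.reduce_mean((y - y_) ** 2)

print(tape.gradient(loss, doc_lambda))  # not None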

Connect custom input pipeline to tf model

I am currently trying to get a simple TensorFlow model to train on data provided by a custom input pipeline. It should work as efficiently as possible. Although I've read lots of tutorials, I can't get it to work.
THE DATA
I have my training data split over several CSV files. File 'a.csv' has 20 samples and 'b.csv' has 30. They have the same structure with the same header:
feature1; feature2; feature3; feature4
0.1; 0.2; 0.3; 0.4
...
(No labels, as it is for an autoencoder.)
THE CODE
I have written an input pipeline and would like to feed the data from it to the model. My code looks like this:
import tensorflow as tf
def input_pipeline(filenames, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.flat_map(
        lambda filename: (
            tf.data.TextLineDataset(filename)
            .skip(1)
            .shuffle(10)
            .map(lambda csv_row: tf.decode_csv(
                csv_row,
                record_defaults=[[-1.0]]*4,
                field_delim=';'))
            .batch(batch_size)
        )
    )
    return dataset.make_initializable_iterator()

iterator = input_pipeline(['/home/sku/data/a.csv',
                           '/home/sku/data/b.csv'],
                          batch_size=5)
next_element = iterator.get_next()
# Build the autoencoder
x = tf.placeholder(tf.float32, shape=[None, 4], name='in')
z = tf.contrib.layers.fully_connected(x, 2, activation_fn=tf.nn.relu)
x_hat = tf.contrib.layers.fully_connected(z, 4)
# loss function with epsilon for numeric stability
epsilon = 1e-10
loss = -tf.reduce_sum(
    x * tf.log(epsilon + x_hat) + (1 - x) * tf.log(epsilon + 1 - x_hat))
train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(iterator.initializer)
    sess.run(tf.global_variables_initializer())
    for i in range(50):
        batch = sess.run(next_element)
        sess.run(train_op, feed_dict={x: batch, x_hat: batch})
THE PROBLEM
When trying to feed the data to the model, I get an error:
ValueError: Cannot feed value of shape (4, 5) for Tensor 'in:0', which has shape '(?, 4)'
When printing out the shapes of the batched data, I get this for example:
(array([ 4.1, 5.9, 5.5, 6.7, 10. ], dtype=float32), array([0.4, 7.7, 0. , 3.4, 8.7], dtype=float32), array([3.5, 4.9, 8.3, 7.2, 6.4], dtype=float32), array([-1. , -1. , 9.6, -1. , -1. ], dtype=float32))
It makes sense, but where and how do I have to reshape this? Also, this additional info dtype only appears with batching.
I also considered that I did the feeding wrong. Do I need input_fn or something like that? I remember that feeding dicts is way too slow. If somebody could give me an efficient way to prepare and feed the data, I would be really grateful.
Regards,
I've figured out a solution that requires a second mapping function. You have to add the following line to the input function:
def input_pipeline(filenames, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.flat_map(
        lambda filename: (
            tf.data.TextLineDataset(filename)
            .skip(1)
            .shuffle(10)
            .map(lambda csv_row: tf.decode_csv(
                csv_row,
                record_defaults=[[-1.0]]*4,
                field_delim=';'))
            .map(lambda *inputs: tf.stack(inputs))  # <-- mapping required
            .batch(batch_size)
        )
    )
    return dataset.make_initializable_iterator()
This converts the tuple of per-column arrays into a single matrix that can be fed to the network.
However, I'm still not sure if feeding it via feed_dict is the most efficient way. I'd still appreciate support here!
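One way to avoid feed_dict altogether (a sketch under the same TF 1.x setup, not from the original answer) is to build the model directly on the iterator's output tensor, so batches flow from the pipeline into the graph without a Python round-trip per step:

iterator = input_pipeline(['/home/sku/data/a.csv',
                           '/home/sku/data/b.csv'], batch_size=5)
x = iterator.get_next()  # already a (batch, 4) tensor in the graph
z = tf.contrib.layers.fully_connected(x, 2, activation_fn=tf.nn.relu)
x_hat = tf.contrib.layers.fully_connected(z, 4)
loss = tf.reduce_sum(tf.square(x - x_hat))  # simple squared-error reconstruction loss for the sketch
train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(iterator.initializer)
    sess.run(tf.global_variables_initializer())
    for _ in range(50):
        sess.run(train_op)  # data comes straight from the pipeline, no feed_dict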

Why batch_normalization in tensorflow does not give expected results?

I would like to see the output of batch_normalization layer in a small example, but apparently I am doing something wrong so I get the same output as the input.
import tensorflow as tf
import numpy as np
import keras.backend as K

K.set_image_data_format('channels_last')
X = tf.placeholder(tf.float32, shape=(None, 2, 2, 3))  # samples are 2x2 images with 3 channels
outp = tf.layers.batch_normalization(inputs=X, axis=3)
x = np.random.rand(4, 2, 2, 3)  # sample set: 4 images
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    K.set_session(sess)
    a = sess.run(outp, feed_dict={X: x, K.learning_phase(): 0})
    print(a - x)  # print the difference between input and normalized output
The input and output of the above code are almost identical. Can anyone point out the problem to me?
Remember that batch_normalization behaves differently at train and test time. Here, you have never "trained" your batch normalization, so the moving average it has learned is random but close to 0, and the moving variance factor is close to 1, so the output is almost the same as the input. If you use K.learning_phase(): 1 you'll already see some differences (because it will normalize using the batch's mean and standard deviation); if you first train on a lot of examples and then test on some other ones, you'll also see the normalization occurring, because the learned mean and standard deviation will not be 0 and 1.
To see the effects of batch norm better, I'd also suggest multiplying your input by a big number (say 100), so that you have a clear difference between unnormalized and normalized vectors; that will help you see what's going on.
EDIT: In your code as is, it seems that the update of the moving mean and moving variance is never done. You need to make sure the update ops are run, as indicated in batch_normalization's doc. The following lines should make it work:
outp = tf.layers.batch_normalization(inputs=X, axis=3, training=is_training, center=False, scale=False)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    outp = tf.identity(outp)
Below is my full working code (I got rid of Keras because I don't know it well, but you should be able to re-add it).
import tensorflow as tf
import numpy as np

X = tf.placeholder(tf.float32, shape=(None, 2, 2, 3))  # samples are 2x2 images with 3 channels
is_training = tf.placeholder(tf.bool, shape=())        # toggles train/inference behaviour
outp = tf.layers.batch_normalization(inputs=X, axis=3, training=is_training, center=False, scale=False)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    outp = tf.identity(outp)

x = np.random.rand(4, 2, 2, 3) * 100  # sample set: 4 images
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    initial = sess.run(outp, feed_dict={X: x, is_training: False})
    for i in range(10000):
        a = sess.run(outp, feed_dict={X: x, is_training: True})
        if (i % 1000 == 0):
            print("Step %i: " % i, a - x)  # print the difference between input and normalized output
    final = sess.run(outp, feed_dict={X: x, is_training: False})
    print("initial: ", initial)
    print("final: ", final)
    assert not np.array_equal(initial, final)

How to have a variable number of hidden layers in Tensorflow?

Suppose that we want to try out different numbers of hidden layers and their sizes. How can we do this in TensorFlow?
Consider the following example to make it clear:
# Create a Neural Network Layer
def fc_layer(input, size_in, size_out):
    w = tf.Variable(tf.truncated_normal([None, size_in, size_out]), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[size_out]))
    act = tf.matmul(input, w) + b
    return act
n_hiddenlayers = 3  # number of hidden layers
hidden_layer = tf.placeholder(tf.float32, [n_hiddenlayers, None, None])
# considering 4 as size of inputs and outputs of all layers
sizeInpOut = 4
for i in range(n_hiddenlayers):
    hidden_layer(i,:,:) = tf.nn.sigmoid(fc_layer(X, sizeInpOut, sizeInpOut))
It results in an error about hidden_layer(i,:,:) = ...
In other words, I need a tensor of tensors.
I did this just by using a list to hold the different layers, as follows; it seemed to work fine.
# inputs
x_size = 2          # first layer nodes
y_size = 1          # final layer nodes
h_size = [3, 4, 3]  # variable length list of hidden layer nodes

# set up input and output
X = tf.placeholder(tf.float32, [None, x_size])
y_true = tf.placeholder(tf.float32, [None, y_size])

# set up parameters
W = []
b = []
layer = []

# first layer
W.append(tf.Variable(tf.random_normal([x_size, h_size[0]], stddev=0.1)))
b.append(tf.Variable(tf.zeros([h_size[0]])))

# add hidden layers (variable number)
for i in range(1, len(h_size)):
    W.append(tf.Variable(tf.random_normal([h_size[i-1], h_size[i]], stddev=0.1)))
    b.append(tf.Variable(tf.zeros([h_size[i]])))

# add final layer
W.append(tf.Variable(tf.random_normal([h_size[-1], y_size], stddev=0.1)))
b.append(tf.Variable(tf.zeros([y_size])))

# define model
layer.append(tf.nn.relu(tf.matmul(X, W[0]) + b[0]))
for i in range(1, len(h_size)):
    layer.append(tf.nn.relu(tf.matmul(layer[i-1], W[i]) + b[i]))

if self.type_in == "classification":
    y_pred = tf.nn.sigmoid(tf.matmul(layer[-1], W[-1]) + b[-1])
    loss = tf.reduce_mean(-1. * ((y_true * tf.log(y_pred)) + ((1.-y_true) * tf.log(1.-y_pred))))
    correct_prediction = tf.equal(tf.round(y_pred), tf.round(y_true))
    metric = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    metric_name = "accuracy"
Not a direct answer, but you could consider using tensorflow-slim. It's one of the many APIs distributed as part of tensorflow. It is lightweight and compatible with defining all the variables by hand as you are doing. If you look at the webpage I linked, slim.repeat and slim.stack allow you to create multiple layers of different widths in one line. To make things more complicated: I think part of slim is now the module called layers in tensorflow.
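For reference, a minimal sketch of what slim.stack looks like (assuming tf.contrib.slim is available in your TensorFlow version; not tested against the question's setup):

import tensorflow as tf
slim = tf.contrib.slim

x = tf.placeholder(tf.float32, [None, 10])
# one fully connected layer per list entry: hidden sizes 7, 6 and 5 in a single line
net = slim.stack(x, slim.fully_connected, [7, 6, 5], scope='fc')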
But maybe you just want to play directly with tf variables to understand how it works and not use a higher level API until later.
In the code you posted, since you want to create three layers, you should call fc_layer three times, but you only call it once. By the way, this implies that w and b will be created three separate times, as different variables with different internal tf names, and that is what you want.
You should have some for-loop or while-loop which iterates three times. Note that the output tensor at the end of the loop will become the input tensor in the next iteration. The initial input is the true input and the very last output is the true output.
Another issue with your code is that the non-linearity (the sigmoid) should be at the end of fc_layer. You want a non-linear operation between all layers.
EDIT: some code of what would usually be done:
import tensorflow as tf

input_size = 10
output_size = 4
layer_sizes = [7, 6, 5]

def fc_layer(input, size, layer_name):
    in_size = input.shape.as_list()[1]
    w = tf.Variable(tf.truncated_normal([in_size, size]),
                    name="W" + layer_name)
    b = tf.Variable(tf.constant(0.1, shape=[size]),
                    name="b" + layer_name)
    act = tf.nn.sigmoid(tf.matmul(input, w) + b)
    return act

input = tf.placeholder(tf.float32, [None, input_size])
# output will be the intermediate activations successively and in the end the
# final activations (output).
output = input
for i, size in enumerate(layer_sizes + [output_size]):
    output = fc_layer(output, size, layer_name=str(i + 1))

print("final output var: " + str(output))
print("All vars in the tensorflow graph:")
for var in tf.global_variables():
    print(var)
With output:
final output: Tensor("Sigmoid_3:0", shape=(?, 4), dtype=float32)
<tf.Variable 'W1:0' shape=(10, 7) dtype=float32_ref>
<tf.Variable 'b1:0' shape=(7,) dtype=float32_ref>
<tf.Variable 'W2:0' shape=(7, 6) dtype=float32_ref>
<tf.Variable 'b2:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'W3:0' shape=(6, 5) dtype=float32_ref>
<tf.Variable 'b3:0' shape=(5,) dtype=float32_ref>
<tf.Variable 'W4:0' shape=(5, 4) dtype=float32_ref>
<tf.Variable 'b4:0' shape=(4,) dtype=float32_ref>
In your code you were using the same name for w, which creates conflicts, since different variables with the same name would be created. I fixed it in my code, but even if you use the same name, TensorFlow is smart enough to rename each variable to a unique name by adding an underscore and a number.
EDIT: here is what I think you wanted to do:
import tensorflow as tf

hidden_size = 4
input_size = hidden_size   # equality required!
output_size = hidden_size  # equality required!
n_hidden = 3

meta_tensor = tf.Variable(tf.truncated_normal([n_hidden, hidden_size, hidden_size]),
                          name="meta")

def fc_layer(input, i_layer):
    w = meta_tensor[i_layer]
    # more verbose: w = tf.slice(meta_tensor, begin=[i_layer, 0, 0], size=[1, hidden_size, hidden_size])[0]
    b = tf.Variable(tf.constant(0.1, shape=[hidden_size]),
                    name="b" + str(i_layer))
    act = tf.nn.sigmoid(tf.matmul(input, w) + b)
    return act

input = tf.placeholder(tf.float32, [None, input_size])
# output will be the intermediate activations successively and in the end the
# final activations (output).
output = input
for i_layer in range(0, n_hidden):
    output = fc_layer(output, i_layer)

print("final output var: " + str(output))
print("All vars in the tensorflow graph:")
for var in tf.global_variables():
    print(var)
With output:
final output var: Tensor("Sigmoid_2:0", shape=(?, 4), dtype=float32)
All vars in the tensorflow graph:
<tf.Variable 'meta:0' shape=(3, 4, 4) dtype=float32_ref>
<tf.Variable 'b0:0' shape=(4,) dtype=float32_ref>
<tf.Variable 'b1:0' shape=(4,) dtype=float32_ref>
<tf.Variable 'b2:0' shape=(4,) dtype=float32_ref>
As I said, this is not standard. While coding it I also realized that it is quite limiting, since all hidden layers must have the same size. A meta-tensor can be used to store many matrices, but those must all have the same dimensions. So you could not do what I did in the example above, where the first hidden layer has size 7, the next size 6 and the final one size 5, before an output of size 4.