TensorFlow add broadcasting

Can someone tell me what's going on here? I'm confused.
In [125]: a
Out[125]: <tf.Tensor 'MatMul_86739:0' shape=(100, 1) dtype=float32>
In [126]: embed
Out[126]: <tf.Tensor 'embedding_lookup_41205:0' shape=(100,) dtype=float32>
In [128]: a+embed
Out[128]: <tf.Tensor 'add_43373:0' shape=(100, 100) dtype=float32>
How can (100, 1) + (100,) be (100, 100)? And why?

The rules for TensorFlow's broadcasting operators are based on NumPy's broadcasting rules.
The basic algorithm for broadcasting works from right to left. Assuming we are adding two tensors x and y (or applying another binary broadcasting operator to them), the following code computes the shape of the result:
result_shape = []

# Loop over the matching dimensions of x and y in reverse.
for x_dim, y_dim in zip(x.shape[::-1], y.shape[::-1]):
    if x_dim == y_dim:
        result_shape.insert(0, x_dim)
    elif x_dim == 1:
        result_shape.insert(0, y_dim)  # x will be broadcast along this dimension.
    elif y_dim == 1:
        result_shape.insert(0, x_dim)  # y will be broadcast along this dimension.
    else:
        raise ValueError("Shapes of x and y are incompatible.")

# If x and y have different ranks, the leading dimensions are inherited
# from the tensor with the higher rank.
if len(x.shape) > len(y.shape):
    num_leading_dims = len(x.shape) - len(y.shape)
    result_shape = list(x.shape[:num_leading_dims]) + result_shape
elif len(y.shape) > len(x.shape):
    num_leading_dims = len(y.shape) - len(x.shape)
    result_shape = list(y.shape[:num_leading_dims]) + result_shape
Now in your example, you have x.shape = (100,) and y.shape = (100, 1):
The first comparison is between x's rightmost 100 and y's rightmost 1, so result_shape = [100].
y.shape is longer (higher rank) than x.shape, so we prepend its leading dimension to the result, giving result_shape = [100, 100].
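A quick NumPy check of the example (TensorFlow follows the same rules, so tf.add behaves identically):

import numpy as np

a = np.zeros((100, 1), dtype=np.float32)
embed = np.zeros((100,), dtype=np.float32)

# (100,) is right-aligned against (100, 1): the trailing 100 broadcasts
# against the 1, and the leading 100 is inherited from a's shape.
print((a + embed).shape)  # (100, 100)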

Related

Output from tf.tensordot(x, y, 2)

I am trying to understand a collaborative-filtering recommender.
One particular part is the call tf.tensordot(x, y, 2) in the following code:
def call(self, inputs):
    user_vector = self.user_embedding(inputs[:, 0])
    user_bias = self.user_bias(inputs[:, 0])
    movie_vector = self.movie_embedding(inputs[:, 1])
    movie_bias = self.movie_bias(inputs[:, 1])
    dot_user_movie = tf.tensordot(user_vector, movie_vector, 2)
    # Add all the components (including bias)
    x = dot_user_movie + user_bias + movie_bias
    # The sigmoid activation forces the rating to between 0 and 1
    return tf.nn.sigmoid(x)
dot_user_movie.shape is () but x.shape is (None, 1). Why? I expect dot_user_movie.shape to be (None, 1) as well.
What I really want to do is to remove the bias terms, but I get an error if I do the following:
x = dot_user_movie
return tf.nn.sigmoid(x)
I tried to mimic the above calculation with explicit values:

emb = layers.Embedding(20, 4)
bias = layers.Embedding(20, 1)
x1 = emb(np.array([0, 2, 1]))
y1 = emb(np.array([3, 5, 1]))
dot_x1_y1 = tf.tensordot(x1, y1, 2)
# <tf.Tensor: shape=(), dtype=float32, numpy=0.0065868925>
bias(np.array([0, 2, 1]))
dot_x1_y1 + bias
## this will cause an error:
## ValueError: Attempt to convert a value
## (<keras.layers.core.embedding.Embedding object at 0x7f7211a59640>)
## with an unsupported type (<class 'keras.layers.core.embedding.Embedding'>)
## to a Tensor.
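For reference, a small sketch with made-up shapes showing why an integer axes=2 collapses everything to a scalar, versus a per-row dot product that keeps the batch dimension:

import numpy as np
import tensorflow as tf

x1 = tf.constant(np.random.rand(3, 4), dtype=tf.float32)
y1 = tf.constant(np.random.rand(3, 4), dtype=tf.float32)

# axes=2 contracts the last two axes of x1 against the first two of y1,
# i.e. it sums x1 * y1 over every element -> shape ().
full = tf.tensordot(x1, y1, 2)

# A per-example dot product instead reduces only over the embedding axis.
per_row = tf.reduce_sum(x1 * y1, axis=1)  # shape (3,)

print(full.shape, per_row.shape)  # () (3,)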

How to make 2 tensors the same length by mean | median imputation of the shorter tensor?

I'm trying to subclass the base Keras layer to create a layer that merges the rank-1 outputs of 2 layers of a skip connection by outputting the dot product of the 2 tensors. The 2 incoming tensors are created by Dense layers parsed by a Neural Architecture Search algorithm that randomly selects the number of Dense units, and hence the lengths of the 2 tensors, which will usually not be equal. I am trying an experiment to see if casting them to the same length by appending to the shorter tensor a mathematically meaningful imputation [e.g. mean | median | hypotenuse | cos | ... etc.] and then merging them by means of the dot product will outperform the Add or Concatenate merging strategies. To make them the same length, my overall strategy is:
Find the shorter tensor.
Pass it to tf.reduce_mean() (aliasing the resulting mean as "rm" for the sake of discussion).
Create a list [rm for _ in range(difference)], where difference is the difference in length between the longer and the shorter tensor. Cast it to a tensor if necessary.
[pad | concatenate] the shorter tensor with the result of the operation above to make it equal in length.
Here is where I am running into a dead wall:
Since tf.reduce_mean returns a symbolic tensor whose shape is not assumed to be a scalar of size 1, the intermediate results end up with a shape of (None,), which the tf.keras.layers.Dot layer refuses to ingest, throwing a ValueError: it cannot see that the two tensors are the same length, though they always will be:
KerasTensor(type_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None), name='tf.math.reduce_mean/Mean:0', description="created by layer 'tf.math.reduce_mean'")
ValueError: A Concatenate layer should be called on a list of at least 1 input. Received: input_shape=[[(None,), (None,)], [(None, 3)]]
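For reference, a minimal sketch of the mean-padding step in eager mode, with made-up tensors and static lengths; keepdims=True is what keeps the mean's shape at (1, 1) rather than (None,):

import tensorflow as tf

shorter = tf.constant([[0., 9., 8.]])         # shape (1, 3)
longer = tf.constant([[1., 2., 3., 4., 5.]])  # shape (1, 5)

diff = longer.shape[1] - shorter.shape[1]
rm = tf.reduce_mean(shorter, axis=1, keepdims=True)            # shape (1, 1)
padded = tf.concat([shorter, tf.repeat(rm, diff, axis=1)], 1)  # shape (1, 5)
dot = tf.reduce_sum(padded * longer, axis=1, keepdims=True)    # shape (1, 1)
print(dot)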
My code (in the package/module):
import tensorflow as tf
import numpy as np


class Linear1dDot(tf.keras.layers.Layer):
    def __init__(self, input_dim=None):
        super(Linear1dDot, self).__init__()

    def __call__(self, inputs):
        max_len = tf.reduce_max(tf.Variable(
            [inp.shape[1] for inp in inputs]))
        print(f"max_len: {max_len}")
        for i in range(len(inputs)):
            inp = inputs[i]
            print(inp.shape)
            inp_length = inp.shape[1]
            if inp_length < max_len:
                print(f"{inp_length} < {max_len}")
                # pad_with = inp.reduce_mean()
                pad_with = tf.reduce_mean(inp, axis=1)
                print(pad_with)
                padding = [pad_with for _ in range(max_len - inp_length)]
                inputs[i] = tf.keras.layers.concatenate([padding, [inp]])
                # inputs[i] = tf.reshape(
                #     tf.pad(inp, padding, mode="constant"), (None, max_len))
        print(inputs)
        return tf.keras.layers.Dot(axes=1)(inputs)
...
# Alternatively, substituting the last few lines with:
pad_with = tf.reduce_mean(inp, axis=1, keepdims=True)
print(pad_with)
padding = tf.keras.layers.concatenate(
    [pad_with for _ in range(max_len - inp_length)])
inputs[i] = tf.keras.layers.concatenate([padding, [inp]])
# inputs[i] = tf.reshape(
#     tf.pad(inp, padding, mode="constant"), (None, max_len))
print(inputs)
return tf.keras.layers.Dot(axes=1)(inputs)
... and countless other permutations of attempts ...
Does anyone know a workaround or have any advice? (other than 'Don't try to do this.')?
In the parent folder of this module's package, a test to simulate a skip connection merging into the current layer:
from linearoneddot.linear_one_d_dot import Linear1dDot
x = tf.constant([1, 2, 3, 4, 5])
y = tf.constant([0, 9, 8])
inp1 = tf.keras.layers.Input(shape=3)
inp2 = tf.keras.layers.Input(shape=5)
xd = tf.keras.layers.Dense(3, "relu")(inp1)
yd = tf.keras.layers.Dense(5, 'elu')(inp2)
combined = Linear1dDot()([xd, yd]) # tf.keras.layers.Dot(axes=1)([xd, yd])
z = tf.keras.layers.Dense(2)(combined)
model = tf.keras.Model(inputs=[inp1, inp2], outputs=z) # outputs=z)
print(model([x, y]))
print(model([np.random.random((3, 3)), np.random.random((3, 5))]))
Does anyone know a workaround that can get the mean of the shorter rank-1 tensor as a scalar, which I can then append / pad onto the shorter tensor to reach the intended length (the same length as the longer tensor)?
Try this; hopefully it will work. Pad the shorter input with ones, concatenate the padding with that input, take the dot product, and then finally subtract the extra contribution that the added ones made to the dot product...
class Linear1dDot(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(Linear1dDot, self).__init__()

    def __call__(self, inputs):
        _input1, _input2 = inputs
        _input1_shape = _input1.shape[1]
        _input2_shape = _input2.shape[1]
        difference = tf.math.abs(_input1_shape - _input2_shape)
        padded_input = tf.ones(shape=(1, difference))
        if _input1_shape > _input2_shape:
            padded_tensor = tf.concat([_input2, padded_input], axis=1)
            scaled_output = tf.keras.layers.Dot(axes=1)([padded_tensor, _input1])
            scaled_output -= tf.reduce_sum(padded_input)
            return scaled_output
        else:
            padded_tensor = tf.concat([_input1, padded_input], axis=1)
            scaled_output = tf.keras.layers.Dot(axes=1)([padded_tensor, _input2])
            scaled_output -= tf.reduce_sum(padded_input)
            return scaled_output
x = tf.constant([[1, 2, 3, 4, 5, 9]])
y = tf.constant([[0, 9, 8]])
inp1 = tf.keras.layers.Input(shape=3)
inp2 = tf.keras.layers.Input(shape=5)
xd = tf.keras.layers.Dense(5, "relu")(x)
yd = tf.keras.layers.Dense(3, 'elu')(y)
combined = Linear1dDot()([xd, yd]) # tf.keras.layers.Dot(axes=1)([xd, yd])
Output:
<tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[4.4694786]], dtype=float32)>

Issue with Keras backend flatten

Why does Keras.backend.flatten not show proper dimension? I have the following:
x is <tf.Tensor 'concat_8:0' shape=(?, 4, 8, 62) dtype=float32>
After:
Keras.backend.flatten(x)
x becomes: <tf.Tensor 'Reshape_22:0' shape=(?,) dtype=float32>
Why is x not of shape=(?, 4*8*62)?
EDIT-1
I get (?, ?) if I use batch_flatten (branch3x3 & branch5x5 below are tensors from previous convolutions):
x = Lambda(lambda v: K.concatenate([v[0], v[1]], axis=3))([branch3x3, branch5x5])
x = Lambda(lambda v: K.batch_flatten(v))(x)
Result of first Lambda is <tf.Tensor 'lambda_144/concat:0' shape=(?, 4, 8, 62) dtype=float32>
Result of second Lambda is <tf.Tensor 'lambda_157/Reshape:0' shape=(?, ?) dtype=float32>
EDIT-2
Tried batch_flatten but get an error downstream when I build the model output (using reshape instead of batch_flatten seems to work). branch3x3 is <tf.Tensor 'conv2d_202/Elu:0' shape=(?, 4, 8, 30) dtype=float32>, and branch5x5 is <tf.Tensor 'conv2d_203/Elu:0' shape=(?, 4, 8, 32) dtype=float32>:
from keras import backend as K
x = Lambda(lambda v: K.concatenate([v[0], v[1]], axis=3))([branch3x3, branch5x5])
x = Lambda(lambda v: K.batch_flatten(v))(x)
y = Conv1D(filters=2, kernel_size=4)(Input(shape=(4, 1)))
y = Lambda(lambda v: K.batch_flatten(v))(y)
z = Lambda(lambda v: K.concatenate([v[0], v[1]], axis=1))([x, y])
output = Dense(32, kernel_initializer=TruncatedNormal(), activation='linear')(z)
cnn = Model(inputs=[m1, m2], outputs=output)
The output statement results in the following error for the kernel_initializer: TypeError: Failed to convert object of type to Tensor. Contents: (None, 32). Consider casting elements to a supported type.
From the docstring of flatten:
def flatten(x):
    """Flatten a tensor.

    # Arguments
        x: A tensor or variable.

    # Returns
        A tensor, reshaped into 1-D
    """
So it turns a tensor with shape (batch_size, 4, 8, 62) into a 1-D tensor with shape (batch_size * 4 * 8 * 62,). That's why your new tensor has a 1-D shape (?,).
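A quick sanity check of this (a minimal sketch using the same backend API, with a made-up batch size of 2):

from keras import backend as K
import numpy as np

x = K.constant(np.zeros((2, 4, 8, 62)))
y = K.flatten(x)
print(K.int_shape(y))  # (3968,) -- all 2 * 4 * 8 * 62 elements in one axis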
If you want to keep the first dimension, use batch_flatten:
def batch_flatten(x):
    """Turn a nD tensor into a 2D tensor with same 0th dimension.

    In other words, it flattens each data samples of a batch.

    # Arguments
        x: A tensor or variable.

    # Returns
        A tensor.
    """
EDIT: You see the shape being (?, ?) because the shape is determined dynamically at runtime. If you feed in a numpy array, you can easily verify that the shape is correct.
input_tensor = Input(shape=(4, 8, 62))
x = Lambda(lambda v: K.batch_flatten(v))(input_tensor)
print(x)
# Tensor("lambda_1/Reshape:0", shape=(?, ?), dtype=float32)

model = Model(input_tensor, x)
out = model.predict(np.random.rand(32, 4, 8, 62))
print(out.shape)
# (32, 1984)
EDIT-2:
From the error message, it seems that TruncatedNormal requires a fixed output shape from the previous layer. So the dynamic shape (None, None) from batch_flatten won't work.
I can think of two options:
Provide manually computed output_shape to the Lambda layers:
x = Lambda(lambda v: K.concatenate([v[0], v[1]], axis=3))([branch3x3, branch5x5])
x_shape = (np.prod(K.int_shape(x)[1:]),)
x = Lambda(lambda v: K.batch_flatten(v), output_shape=x_shape)(x)
input_y = Input(shape=(4, 1))
y = Conv1D(filters=2, kernel_size=4)(input_y)
y_shape = (np.prod(K.int_shape(y)[1:]),)
y = Lambda(lambda v: K.batch_flatten(v), output_shape=y_shape)(y)
z = Lambda(lambda v: K.concatenate([v[0], v[1]], axis=1))([x, y])
output = Dense(32, kernel_initializer=TruncatedNormal(), activation='linear')(z)
cnn = Model(inputs=[m1, m2, input_y], outputs=output)
Use the Flatten layer (which calls batch_flatten and computes the output shape inside of it):
x = Concatenate(axis=3)([branch3x3, branch5x5])
x = Flatten()(x)
input_y = Input(shape=(4, 1))
y = Conv1D(filters=2, kernel_size=4)(input_y)
y = Flatten()(y)
z = Concatenate(axis=1)([x, y])
output = Dense(32, kernel_initializer=TruncatedNormal(), activation='linear')(z)
cnn = Model(inputs=[m1, m2, input_y], outputs=output)
I'd prefer the latter as it makes the code less cluttered. Also,
You can replace the Lambda layer wrapping K.concatenate() with a Concatenate layer.
Remember to move the Input(shape=(4, 1)) out and provide it in your Model(inputs=...) call.

Tensorflow, Cannot feed value of shape ..... for Tensor

I have a problem with linear regression and 3d matrices.
They are all floating point numbers, with labels.
I got started from this code but I changed the matrix:
https://aqibsaeed.github.io/2016-07-07-TensorflowLR/
With 2 dimensions, it is working well but, with 3, I can not get it running.
These are the shapes:

(387, 7, 10)  train_x.shape
(387, 1)      train_y.shape
(43, 7, 10)   test_x.shape
(43, 1)       test_y.shape
n_dim = f.shape[1]
train_x, test_x, train_y, test_y = train_test_split(f, l, test_size=0.1, shuffle=False)
print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
print(test_y.shape)

learning_rate = 0.01
training_epochs = 1000
cost_history = np.empty(shape=[1], dtype=float)

X = tf.placeholder(tf.float32, [None, n_dim])
Y = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.ones([n_dim, 1]))

# init = tf.initialize_all_variables()
init = tf.global_variables_initializer()

y_ = tf.matmul(X, W)
cost = tf.reduce_mean(tf.square(y_ - Y))
training_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

sess = tf.Session()
sess.run(init)
for epoch in range(training_epochs):
    sess.run(training_step, feed_dict={X: train_x, Y: train_y})
    cost_history = np.append(cost_history, sess.run(cost, feed_dict={X: train_x, Y: train_y}))

plt.plot(range(len(cost_history)), cost_history)
plt.axis([0, training_epochs, 0, np.max(cost_history)])
plt.show()

pred_y = sess.run(y_, feed_dict={X: test_x})
mse = tf.reduce_mean(tf.square(pred_y - test_y))
print("MSE: %.4f" % sess.run(mse))

fig, ax = plt.subplots()
ax.scatter(test_y, pred_y)
ax.plot([test_y.min(), test_y.max()], [test_y.min(), test_y.max()], 'k--', lw=3)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()
This is the error:
\session.py", line 1100, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (387, 7, 10) for Tensor 'Placeholder_12:0', which has shape '(?, 7)'
Your error message shows the exact reason it is raised:
the dimensions of the placeholder and train_x don't fit.
train_x has a (387, 7, 10) shape. In the usual convention, you have 387 data points, each with dimension (7, 10).
But X (the placeholder, the bucket you will put train_x in) has a [None, n_dim] shape (I guess n_dim is 7 here).
Using None as the first element is only accepted for the number of data points, not the dimension of your data.
So you need to change [None, n_dim] to [None, 7, 10] in this case.
Edited:
In this case, X is not exactly 3-D data, just a bunch of 2-D data points. Multiplying 2-D data directly by a weight matrix would call for a convolution step, i.e. a CNN. But since your data matrix is quite small, you can simply reshape each (7, 10) matrix into a (7*10,) vector.
Use the tf.reshape function: tf.reshape(X, shape=[-1, 7*10]) will work (with -1 so the batch size stays flexible), and also change W to the right dimension for the multiplication, e.g. tf.Variable(tf.ones([7*10, 1])).
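For instance, a minimal sketch of that reshape approach (keeping the placeholder 3-D and flattening inside the graph):

X = tf.placeholder(tf.float32, [None, 7, 10])
Y = tf.placeholder(tf.float32, [None, 1])

X_flat = tf.reshape(X, [-1, 7 * 10])   # (batch, 70)
W = tf.Variable(tf.ones([7 * 10, 1]))
y_ = tf.matmul(X_flat, W)              # (batch, 1)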

K-means example (tf.expand_dims)

In the example k-means code for TensorFlow, the function tf.expand_dims (which inserts a dimension of 1 into a tensor's shape) is used to build points_expanded and centroids_expanded before tf.reduce_sum is calculated.
Why do the two calls use different indices (0 and 1) as the second parameter?
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt  # needed for the plots below

points_n = 200
clusters_n = 3
iteration_n = 100

points = tf.constant(np.random.uniform(0, 10, (points_n, 2)))
centroids = tf.Variable(tf.slice(tf.random_shuffle(points), [0, 0], [clusters_n, -1]))

points_expanded = tf.expand_dims(points, 0)
centroids_expanded = tf.expand_dims(centroids, 1)

distances = tf.reduce_sum(tf.square(tf.subtract(points_expanded, centroids_expanded)), 2)
assignments = tf.argmin(distances, 0)

means = []
for c in range(clusters_n):
    means.append(tf.reduce_mean(
        tf.gather(points, tf.reshape(tf.where(tf.equal(assignments, c)), [1, -1])),
        reduction_indices=[1]))

new_centroids = tf.concat(means, 0)
update_centroids = tf.assign(centroids, new_centroids)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for step in range(iteration_n):
        [_, centroid_values, points_values, assignment_values] = sess.run(
            [update_centroids, centroids, points, assignments])
    print("centroids" + "\n", centroid_values)

plt.scatter(points_values[:, 0], points_values[:, 1], c=assignment_values, s=50, alpha=0.5)
plt.plot(centroid_values[:, 0], centroid_values[:, 1], 'kx', markersize=15)
plt.show()
This is done to subtract each centroid from each point. First, make sure you understand the notion of broadcasting (https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html), which is linked from tf.subtract (https://www.tensorflow.org/api_docs/python/tf/subtract). Then you just need to draw the shapes of points, points_expanded, centroids, and centroids_expanded and understand which values get "broadcast" where. Once you do that, you will see that broadcasting allows you to compute exactly what you want: subtract each point from each centroid.
As a sanity check, since there are 200 points, 3 centroids, and each is 2D, we should have 200*3*2 differences. This is exactly what we get:
In [53]: points
Out[53]: <tf.Tensor 'Const:0' shape=(200, 2) dtype=float64>
In [54]: points_expanded
Out[54]: <tf.Tensor 'ExpandDims_4:0' shape=(1, 200, 2) dtype=float64>
In [55]: centroids
Out[55]: <tf.Variable 'Variable:0' shape=(3, 2) dtype=float64_ref>
In [56]: centroids_expanded
Out[56]: <tf.Tensor 'ExpandDims_5:0' shape=(3, 1, 2) dtype=float64>
In [57]: tf.subtract(points_expanded, centroids_expanded)
Out[57]: <tf.Tensor 'Sub_5:0' shape=(3, 200, 2) dtype=float64>
If you are having trouble drawing the shapes, you can think of broadcasting points_expanded, of shape (1, 200, 2), up to shape (3, 200, 2) as copying the 200x2 matrix 3 times along the first dimension. The 3x2 matrix in centroids_expanded (of shape (3, 1, 2)) gets copied 200 times along the second dimension.
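If it helps, here is a scaled-down NumPy sketch of the same pattern, with 2 points and 3 centroids in 1-D (all values made up):

import numpy as np

points = np.array([[0.], [10.]])          # shape (2, 1): 2 points in 1-D
centroids = np.array([[1.], [4.], [9.]])  # shape (3, 1): 3 centroids in 1-D

points_expanded = points[np.newaxis, :, :]        # shape (1, 2, 1)
centroids_expanded = centroids[:, np.newaxis, :]  # shape (3, 1, 1)

# Broadcasting stretches both operands to (3, 2, 1): every centroid
# is subtracted from every point.
diff = points_expanded - centroids_expanded
distances = np.sum(diff ** 2, axis=2)  # shape (3, 2)
print(distances)
# [[ 1. 81.]
#  [16. 36.]
#  [81.  1.]]
print(np.argmin(distances, axis=0))  # nearest centroid per point: [0 2]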