I have the following ndarray (a stack of 351 3x3 matrices):
tensor = np.ones((351,3,3))
b = np.ones((351,3))
Applying a function such as:
np.linalg.tensorinv(tensor)
np.linalg.tensorsolve(tensor,b)
Gives me the following error:
"{LinAlgError}Last 2 dimensions of the array must be square"
Why does that error occur? The last two dimensions are square (3x3). It does not even work with tensor.T (which is 3x3x351). Thanks for any help.
The sense in which the tensorinv operation defines square dimensions is somewhat unusual. tensorinv takes a parameter ind, and a tensor is "square" if the product of the dimensions before position ind equals the product of the dimensions from ind through the last one, i.e. prod(tensor.shape[:ind]) == prod(tensor.shape[ind:]). This is useful for defining inverses of tensor operations or solving tensor contraction equations, but based on the shape of your examples, I expect this isn't what you are trying to do.
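To make that definition concrete, here is a small sketch. The helper is_tensor_square is a hypothetical name introduced only for illustration (it is not part of NumPy); it just evaluates the product condition described above, using the default ind=2.

```python
import math
import numpy as np

def is_tensor_square(shape, ind=2):
    """Hypothetical helper: the 'square' test that tensorinv relies on."""
    return math.prod(shape[:ind]) == math.prod(shape[ind:])

# Shape from the question: 351*3 = 1053 on the left, 3 on the right.
print(is_tensor_square((351, 3, 3)))  # False

# A shape that does qualify: 4*6 = 24 on the left, 8*3 = 24 on the right.
a = np.eye(4 * 6).reshape(4, 6, 8, 3)
print(is_tensor_square(a.shape))      # True

# The inverse swaps the two groups of dimensions.
ainv = np.linalg.tensorinv(a, ind=2)
print(ainv.shape)                     # (8, 3, 4, 6)
```

So the (351, 3, 3) array fails the test for the default ind=2, which is why the error mentions non-square dimensions even though each stacked matrix is 3x3.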
You seem to want to solve 351 different linear systems of equations Ax=b. You should be able to do this with just np.linalg.solve(tensor, b) (though not with the examples in your question, as your tensor of all ones would be a stack of singular matrices). Rewriting your example to make tensor smaller and a collection of identity matrices rather than all ones:
>>> temp=np.eye(3)
>>> tensor=np.repeat(temp[np.newaxis,:,:],4,axis=0)
>>> tensor.shape
(4, 3, 3)
>>> b=np.ones((4,3))
>>> b
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
>>> np.linalg.solve(tensor,b)
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
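One caveat, as far as I know: NumPy 2.0 changed how np.linalg.solve interprets b. A b with more than one dimension is now treated as a stack of matrices rather than a stack of vectors, so passing a (N, 3) array directly may no longer work there. A sketch that should behave the same on old and new versions is to add an explicit trailing axis and strip it afterwards. The random, well-conditioned matrices below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# 351 well-conditioned 3x3 matrices: identity plus a small perturbation.
A = np.eye(3) + 0.05 * rng.standard_normal((351, 3, 3))
b = rng.standard_normal((351, 3))

# Treat each row of b as a 3x1 column, solve the whole stack, drop the axis.
x = np.linalg.solve(A, b[..., np.newaxis])[..., 0]
print(x.shape)  # (351, 3)

# Check: A[n] @ x[n] should reproduce b[n] for every n.
reconstructed = np.einsum("nij,nj->ni", A, x)
print(np.allclose(reconstructed, b))
```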
Here is how a similar problem was solved, from issue #43561:
When I was trying to load the sequential model here using tf.keras.models.load_model in TF 2.3.1, an error is thrown at the following location:
~/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py in _should_skip_first_node(layer)
1031 return (isinstance(layer, Functional) and
1032 # Filter out Sequential models without an input shape.
-> 1033 isinstance(layer._layers[0], input_layer_module.InputLayer))
1034
1035
IndexError: list index out of range
The model is believed to have been trained with Keras under TF 1.9; the model definition can be found here, and here's the code for training.
Here you can find the full stack trace and running code under TF 2.3.1: https://colab.research.google.com/drive/1Lfo0O7D0cM8EtR0h6noqCoWqoqf8bzAD?usp=sharing
Then I downgraded to TF 2.2 and 2.1 with the same code as above, and it threw the same error as in #35934 Keras Model Errors on Loading - 'list' object has no attribute 'items'
Then I downgraded to TF 2.0, and the code ran indefinitely. Finally I had to stop it manually:
/opt/conda/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py in IsMapping(o)
2569
2570 """
-> 2571 return _pywrap_tensorflow_internal.IsMapping(o)
2572
2573 def IsMappingView(o):
KeyboardInterrupt:
Here you can find the full stack trace when I stopped the code under TF 2.0: https://colab.research.google.com/drive/1fCR-ci05NuYhQ8M9O2lRVG0F0YzI9Ggo?usp=sharing
Then I tried to use keras instead of tf.keras, with TF 2.3.1 and Keras 2.3.1. First I encountered an error that can be solved in this way: https://github.com/tensorflow/tensorflow/issues/38589#issuecomment-665930503 . Then another error occurred:
~/.local/lib/python3.7/site-packages/tensorflow/python/keras/backend.py in function(inputs, outputs, updates, name, **kwargs)
3931 if updates:
3932 raise ValueError('`updates` argument is not supported during '
-> 3933 'eager execution. You passed: %s' % (updates,))
3934 from tensorflow.python.keras import models # pylint: disable=g-import-not-at-top
3935 from tensorflow.python.keras.utils import tf_utils # pylint: disable=g-import-not-at-top
ValueError: `updates` argument is not supported during eager execution. You passed: [<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=0>, <tf.Variable 'UnreadVariable' shape=(3, 3, 3, 32) dtype=float32, numpy=
array([[[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0.],
......
The gist is here: https://colab.research.google.com/drive/1OovMHVrMBsIwcwn2PUcgbEXHUfPLMdyM?usp=sharing
So this way fails.
Solutions
One way is to use TF 1.15.4 and Keras 2.3.1. This worked fine: inputs, outputs, summary etc. are all parsed correctly, and data can be run through the model: https://colab.research.google.com/drive/1XaRMeiT1SefS6Q10wsa0y9rEercyFlCR?usp=sharing
Another is to modify the TF 2.3.1 source code so that the model can be used in the latest version with tf.keras. You have to redefine _should_skip_first_node in the file tensorflow/python/keras/engine/functional.py:
def _should_skip_first_node(layer):
  """Returns True if the first layer node should not be saved or loaded."""
  # Networks that are constructed with an Input layer/shape start with a
  # pre-existing node linking their input to output. This node is excluded from
  # the network config.
  if layer._layers:
    return (isinstance(layer, Functional) and
            # Filter out Sequential models without an input shape.
            isinstance(layer._layers[0], input_layer_module.InputLayer))
  else:
    return isinstance(layer, Functional)
Afterwards
I have submitted PR #43570 to tensorflow; hopefully this will be fixed in future TF versions.
I try to calculate the KL-Divergence between two OneHotCategorical distributions with the following code:
posterior = tfd.OneHotCategorical(probs=[0., 0., 0., 0., 0., 0., 0., 1.])
prior = tfd.OneHotCategorical(probs=[0., 0., 0., 0., 0., 0., 0., 1.])
and the result:
print(posterior.kl_divergence(prior))
tf.Tensor(nan, shape=(), dtype=float32)
Is this a bug, or is this the intended result? I tested it a little, and the problem comes from the calculation of the KL-divergence itself, where the 0. values cause trouble (logarithm and division). In my opinion, the KL-divergence should be set to 0.0 in this case.
Regards
Tensorflow version: 2.1.0-rc1
Tensorflow probability version: 0.8.0
Can you report it on http://github.com/tensorflow/probability ? We track issues there.
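For reference, the 0. issue described in the question can be reproduced outside TensorFlow. The following is only a NumPy sketch of the arithmetic, not TFP's actual implementation; kl_naive and kl_safe are hypothetical names. The naive formula evaluates 0 * (log 0 - log 0), which is nan, while the usual convention 0 * log 0 = 0 gives the expected 0.0 for identical distributions.

```python
import numpy as np

def kl_naive(p, q):
    """Direct formula sum(p * (log p - log q)); breaks when p has zeros."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.sum(p * (np.log(p) - np.log(q)))

def kl_safe(p, q):
    """Same sum, but skipping terms with p == 0 (convention 0 * log 0 = 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    with np.errstate(divide="ignore"):
        return np.sum(p[mask] * (np.log(p[mask]) - np.log(q[mask])))

probs = [0., 0., 0., 0., 0., 0., 0., 1.]
print(kl_naive(probs, probs))  # nan
print(kl_safe(probs, probs))   # 0.0
```

Note that kl_safe still returns inf when q is 0 somewhere that p is not, which is the mathematically correct value in that case.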
Any non-zero recurrent_dropout yields NaN losses and weights; latter are either 0 or NaN. Happens for stacked, shallow, stateful, return_sequences = any, with & w/o Bidirectional(), activation='relu', loss='binary_crossentropy'. NaNs occur within a few batches.
Any fixes? Help's appreciated.
TROUBLESHOOTING ATTEMPTED:
recurrent_dropout=0.2,0.1,0.01,1e-6
kernel_constraint=maxnorm(0.5,axis=0)
recurrent_constraint=maxnorm(0.5,axis=0)
clipnorm=50 (empirically determined), Nadam optimizer
activation='tanh' - no NaNs, weights stable, tested for up to 10 batches
lr=2e-6,2e-5 - no NaNs, weights stable, tested for up to 10 batches
lr=5e-5 - no NaNs, weights stable, for 3 batches - NaNs on batch 4
batch_shape=(32,48,16) - large loss for 2 batches, NaNs on batch 3
NOTE: batch_shape=(32,672,16), 17 calls to train_on_batch per batch
ENVIRONMENT:
Keras 2.2.4 (TensorFlow backend), Python 3.7, Spyder 3.3.7 via Anaconda
GTX 1070 6GB, i7-7700HQ, 12GB RAM, Win-10.0.17134 x64
CuDNN 10+, latest Nvidia drivers
ADDITIONAL INFO:
Model divergence is spontaneous, occurring at different train updates even with fixed seeds - Numpy, Random, and TensorFlow random seeds. Furthermore, when first diverging, LSTM layer weights are all normal - only going to NaN later.
Below are, in order: (1) inputs to LSTM; (2) LSTM outputs; (3) Dense(1,'sigmoid') outputs -- the three are consecutive, with Dropout(0.5) between each. Preceding (1) are Conv1D layers. Right: LSTM weights. "BEFORE" = 1 train update before; "AFTER" = 1 train update after
BEFORE divergence:
AT divergence:
## LSTM outputs, flattened, stats
(mean,std) = (inf,nan)
(min,max) = (0.00e+00,inf)
(abs_min,abs_max) = (0.00e+00,inf)
AFTER divergence:
## Recurrent Gates Weights:
array([[nan, nan, nan, ..., nan, nan, nan],
[ 0., 0., -0., ..., -0., 0., 0.],
[ 0., -0., -0., ..., -0., 0., 0.],
...,
[nan, nan, nan, ..., nan, nan, nan],
[ 0., 0., -0., ..., -0., 0., -0.],
[ 0., 0., -0., ..., -0., 0., 0.]], dtype=float32)
## Dense Sigmoid Outputs:
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32)
MINIMAL REPRODUCIBLE EXAMPLE:
from keras.layers import Input,Dense,LSTM,Dropout
from keras.models import Model
from keras.optimizers import Nadam
from keras.constraints import MaxNorm as maxnorm
import numpy as np
ipt = Input(batch_shape=(32,672,16))
x = LSTM(512, activation='relu', return_sequences=False,
         recurrent_dropout=0.3,
         kernel_constraint   =maxnorm(0.5, axis=0),
         recurrent_constraint=maxnorm(0.5, axis=0))(ipt)
out = Dense(1, activation='sigmoid')(x)
model = Model(ipt,out)
optimizer = Nadam(lr=4e-4, clipnorm=1)
model.compile(optimizer=optimizer,loss='binary_crossentropy')
for train_update in range(100):
    x = np.random.randn(32,672,16)
    y = np.array([1]*5 + [0]*27)
    np.random.shuffle(y)
    loss = model.train_on_batch(x,y)
    print(train_update+1,loss,np.sum(y))
Observations: the following speed up divergence:
Higher units (LSTM)
Higher # of layers (LSTM)
Higher lr << no divergence when <=1e-4, tested up to 400 trains
Fewer '1' labels << no divergence with y below, even with lr=1e-3; tested up to 400 trains
y = np.random.randint(0,2,32) # makes more '1' labels
UPDATE: not fixed in TF2; reproducible also using from tensorflow.keras imports.
Studying LSTM formulae deeper and digging into the source code, everything's come crystal clear.
Verdict: recurrent_dropout has nothing to do with it; a thing's being looped where none expect it.
Actual culprit: the activation argument, now 'relu', is applied on the recurrent transformations - contrary to virtually every tutorial showing it as the harmless 'tanh'.
I.e., activation is not only for the hidden-to-output transform - source code; it operates directly on computing both recurrent states, cell and hidden:
c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1_c, self.recurrent_kernel_c))
h = o * self.activation(c)
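To see why this matters, here is a scalar sketch of those two update lines. It is not the Keras code: the gates f, i, o are pinned to 1 (the upper limit of a sigmoid gate), and the recurrent weight w and input x are made-up constants, all assumptions chosen to show the worst case. The function name run_cell is hypothetical.

```python
import numpy as np

def run_cell(activation, steps=20, w=1.0, x=1.0):
    """Iterate c = f*c + i*act(x + w*h); h = o*act(c) with gates fixed at 1."""
    f = i = o = 1.0   # assumption: fully open gates
    c = h = 0.0
    for _ in range(steps):
        c = f * c + i * activation(x + w * h)
        h = o * activation(c)
    return h

def relu(z):
    return max(z, 0.0)

h_tanh = run_cell(np.tanh)
h_relu = run_cell(relu)
print(h_tanh)  # stays within [-1, 1], since tanh bounds h
print(h_relu)  # 1048575.0, i.e. 2**20 - 1: h roughly doubles every step
```

With tanh, the hidden state fed back into the next step can never exceed 1 in magnitude; with relu nothing bounds it, so over hundreds of timesteps the loop can overflow to inf.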
Solution(s):
Apply BatchNormalization to LSTM's inputs, especially if previous layer's outputs are unbounded (ReLU, ELU, etc)
If previous layer's activations are tightly bounded (e.g. tanh, sigmoid), apply BN before activations (use activation=None, then BN, then Activation layer)
Use activation='selu'; more stable, but can still diverge
Use lower lr
Apply gradient clipping
Use fewer timesteps
More answers, to some remaining questions:
Why was recurrent_dropout suspected? Unmeticulous testing setup; only now did I focus on forcing divergence without it. It did, however, sometimes accelerate divergence - which may be explained by it zeroing the non-relu contributions that'd otherwise offset multiplicative reinforcement.
Why do nonzero mean inputs accelerate divergence? Additive symmetry; nonzero-mean distributions are asymmetric, with one sign dominating - facilitating large pre-activations, hence large ReLUs.
Why can training be stable for hundreds of iterations with a low lr? Extreme activations induce large gradients via large error; with a low lr, this means weights adjust to prevent such activations - whereas a high lr jumps too far too quickly.
Why do stacked LSTMs diverge faster? In addition to feeding ReLUs to itself, LSTM feeds the next LSTM, which then feeds itself the ReLU'd ReLU's --> fireworks.
UPDATE 1/22/2020: recurrent_dropout may in fact be a contributing factor, as it utilizes inverted dropout, upscaling hidden transformations during training, easing divergent behavior over many timesteps. Git Issue on this here
I am new to Tensorflow and I want to multiply two distributions to get posterior density. How can I do it using tensorflow?
For example:
likelihood = tf.contrib.distributions.MultivariateNormalDiag(loc = [0., 0., 0.], scale_diag= [1., 1., 1.])
prior = tf.contrib.distributions.MultivariateNormalDiag(loc = [0., 0., 0.], scale_diag= [1., 1., 1.])
I tried using tf.multiply(likelihood,prior), but it gives me a datatype error:
Failed to convert object of type to Tensor. Contents: tf.distributions.MultivariateNormalDiag("MultivariateNormalDiag", batch_shape=(), event_shape=(3,), dtype=float32). Consider casting elements to a supported type.
Can anyone please help me with this.
Help much appreciated.
Thanks
A tf.distribution is an object, and hence cannot be used as a Tensor.
You can instead multiply the values returned by the .prob methods (or sum the values returned by .log_prob).
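As a sketch of what that means numerically, here is a NumPy stand-in for the .log_prob of a diagonal multivariate normal. The function diag_normal_log_prob and the evaluation point x are made up for illustration; with the distributions from the question, the corresponding expression would be likelihood.log_prob(x) + prior.log_prob(x). The result is the unnormalized posterior density at x, not a new distribution object.

```python
import numpy as np

def diag_normal_log_prob(x, loc, scale_diag):
    """Log-density of a normal with diagonal covariance (NumPy stand-in)."""
    x, loc, scale_diag = (np.asarray(v, dtype=float) for v in (x, loc, scale_diag))
    z = (x - loc) / scale_diag
    return float(np.sum(-0.5 * z**2 - np.log(scale_diag) - 0.5 * np.log(2 * np.pi)))

x = [0.5, -1.0, 0.2]  # arbitrary point at which to evaluate the densities
log_lik = diag_normal_log_prob(x, [0., 0., 0.], [1., 1., 1.])
log_prior = diag_normal_log_prob(x, [0., 0., 0.], [1., 1., 1.])

# Summing log-densities is the same as multiplying densities.
log_unnorm_post = log_lik + log_prior
unnorm_post = np.exp(log_unnorm_post)
print(log_unnorm_post, unnorm_post)
```

Working in log space is usually preferable, since products of small densities underflow quickly.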