How to construct an equivalent multivariate normal distribution in tensorflow-probability, using TransformedDistribution and tfb.ScaleMatvecLinearOperator?
I'm working through a tutorial on a bijector in tensorflow_probability: tfp.bijectors.ScaleMatvecLinearOperator. An example is provided.
import tensorflow as tf
import tensorflow_probability as tfp
tfd, tfb = tfp.distributions, tfp.bijectors

n = 10000
loc = 0
scale = 0.5
normal = tfd.Normal(loc=loc, scale=scale)
The code above creates a univariate normal distribution.
tril = tf.random.normal((2, 4, 4))
scale_low_tri = tf.linalg.LinearOperatorLowerTriangular(tril)
scale_low_tri.to_dense()
The code above creates a tensor consisting of two lower-triangular matrices:
<tf.Tensor: shape=(2, 4, 4), dtype=float32, numpy=
array([[[-0.56953585, 0. , 0. , 0. ],
[ 1.1368589 , 0.32028311, 0. , 0. ],
[-0.8328388 , -1.9963025 , -0.6005632 , 0. ],
[ 0.596155 , -0.214932 , 1.0988408 , -0.41731614]],
[[ 2.0778096 , 0. , 0. , 0. ],
[-1.1863967 , 2.4897904 , 0. , 0. ],
[ 0.38001925, 1.4962028 , 1.7609248 , 0. ],
[ 2.9253726 , 0.7047957 , 0.050508 , 0.58643174]]],
dtype=float32)>
Then a matrix-vector multiplication bijector is created:
scale_lin_op = tfb.ScaleMatvecLinearOperator(scale_low_tri)
After that, a TransformedDistribution is constructed as follows:
mvn = tfd.TransformedDistribution(normal, scale_lin_op, batch_shape=[2], event_shape=[4])
This worked in older versions of tensorflow_probability. However, the constructor of TransformedDistribution has since changed and no longer accepts the batch_shape and event_shape parameters. I therefore tried the following to achieve the same thing:
mvn2 = tfd.TransformedDistribution(
    distribution=tfd.Sample(
        normal,
        sample_shape=[4]  # base_dist.event_shape == [4]
    ),
    bijector=scale_lin_op,
)  # batch_shape=[2], event_shape=[4]
mvn2
The result seems to have the correct batch_shape and event_shape:
<tfp.distributions.TransformedDistribution 'scale_matvec_linear_operatorSampleNormal' batch_shape=[2] event_shape=[4] dtype=float32>
Then, another distribution for comparison is created:
mvn3 = tfd.MultivariateNormalLinearOperator(loc=loc, scale=scale_low_tri)
mvn3
According to the tutorial, the TransformedDistribution mvn2 should be equivalent to the MultivariateNormalLinearOperator mvn3.
# Check
xn = normal.sample((n, 2, 4)) # sample_shape = (n, 2, 4)
tf.norm(mvn2.log_prob(xn) - mvn3.log_prob(xn)) / tf.norm(mvn2.log_prob(xn))
<tf.Tensor: shape=(), dtype=float32, numpy=0.7498207>
But in my results they are not equivalent. (If they were, the tensor above would be 0.)
What have I done wrong?
I have a list of coefficients of degree-1 polynomials, where row i represents a[i][0]*x + a[i][1]:
import numpy as np
from functools import reduce

a = np.array([[ 1.        , 77.48514702],
              [ 1.        ,  0.        ],
              [ 1.        ,  2.4239275 ],
              [ 1.        ,  1.21848739],
              [ 1.        ,  0.        ],
              [ 1.        ,  1.18181818],
              [ 1.        ,  1.375     ],
              [ 1.        ,  2.        ],
              [ 1.        ,  2.        ],
              [ 1.        ,  2.        ]])
I'm running into issues with the following operation:
np.polydiv(reduce(np.polymul, a), a[0])[0] != reduce(np.polymul, a[1:])
where
In [185]: reduce(np.polymul, a[1:])
Out[185]:
array([ 1. , 12.19923307, 63.08691612, 179.21045388,
301.91486027, 301.5756213 , 165.35814595, 38.39582615,
0. , 0. ])
and
In [186]: np.polydiv(reduce(np.polymul, a), a[0])[0]
Out[186]:
array([ 1.00000000e+00, 1.21992331e+01, 6.30869161e+01, 1.79210454e+02,
3.01914860e+02, 3.01575621e+02, 1.65358169e+02, 3.83940472e+01,
1.37845155e-01, -1.06809521e+01])
First of all, the remainder of np.polydiv(reduce(np.polymul, a), a[0]) is far from 0 (827.61514239, to be exact), and secondly, the last two terms of the quotient should be 0 but are far from it: 1.37845155e-01 and -1.06809521e+01.
I'm wondering what my options are for improving the accuracy.
There is a slightly involved way to keep the product-first-then-divide structure. First, take n sample points and evaluate the product of the polynomials in a at each of them:
xs = np.linspace(0, 1., 10)
ys = np.array([np.prod(list(map(lambda r: np.polyval(r, x), a))) for x in xs])
Then do the division on ys instead of on the coefficients:
ys = ys/np.array([np.polyval(a[0], x) for x in xs])
Finally, recover the coefficients using polynomial interpolation on xs and ys:
from scipy.interpolate import lagrange
lagrange(xs, ys)
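Putting these pieces together, a minimal end-to-end sketch could look like this (assuming the array a from the question; the quotient has degree 9, so the 10 sample points determine it exactly, and a[0] has no root in [0, 1], so the pointwise division is safe):

import numpy as np
from functools import reduce
from scipy.interpolate import lagrange

xs = np.linspace(0, 1., 10)
# Evaluate the product of all the linear factors at the sample points...
ys = np.array([np.prod([np.polyval(r, x) for r in a]) for x in xs])
# ...then divide out the first factor pointwise instead of dividing coefficients.
ys = ys / np.array([np.polyval(a[0], x) for x in xs])
# Recover the quotient's coefficients by interpolation.
quotient = lagrange(xs, ys)
print(quotient.coefficients)
# Compare against the direct product of the remaining factors:
print(reduce(np.polymul, a[1:]))

Keep in mind that Lagrange interpolation is itself numerically delicate as the number of sample points grows; for higher-degree products, Chebyshev-spaced points behave better than equally spaced ones.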
I want more control over TensorFlow dataset generation. For this reason, I want to mirror the behavior of timeseries_dataset_from_array, but with the ability to use consecutive or non-overlapping windows (it is not possible to set sequence_stride=0 in timeseries_dataset_from_array).
# df_with_inputs has shape (x, 19); df_with_labels has shape (x, 1)
ds = tf.data.Dataset.from_tensor_slices((df_with_inputs.values, df_with_labels.values)).window(20, shift=1, stride=1, drop_remainder=True).batch(32)
which should be equivalent to:
ds = tf.keras.preprocessing.timeseries_dataset_from_array(df_with_inputs[df_with_inputs.columns], df_with_labels[df_with_labels.columns], sequence_length=window_size,sequence_stride=1,shuffle=False,batch_size=batch_size)
Both create a BatchDataset with the same number of samples, but the type spec of the dataset built manually is somehow different. The first gives me:
<BatchDataset shapes: (DatasetSpec(TensorSpec(shape=(19,), dtype=tf.float32, name=None), TensorShape([None])), DatasetSpec(TensorSpec(shape=(1,), dtype=tf.float32, name=None), TensorShape([None]))), types: (DatasetSpec(TensorSpec(shape=(19,), dtype=tf.float32, name=None), TensorShape([None])), DatasetSpec(TensorSpec(shape=(1,), dtype=tf.float32, name=None), TensorShape([None])))>
whereas the second gives me:
<BatchDataset shapes: ((None, None, 19), (None, 1)), types: (tf.float64, tf.int32)>
Both contain the same number of elements, in my case 3063. Note that stride and sequence_stride behave differently in the two methods (for the same behavior, you need shift=1). Additionally, when I try to feed the first one to my NN, I receive the following error (whereas the ds from timeseries_dataset_from_array works like a charm):
TypeError: Inputs to a layer should be tensors.
Any idea what I am missing here?
My model:
input_shape = (window_size, num_features)  # (20, 19)
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu',
                           padding="same", input_shape=input_shape),
    [....]])
The equivalent of this:
import tensorflow as tf
tf.random.set_seed(345)
samples = 30
df_with_inputs = tf.random.normal((samples, 2), dtype=tf.float32)
df_with_labels = tf.random.uniform((samples, 1), maxval=2, dtype=tf.int32)
batch_size = 2
window_size = 20
ds1 = tf.keras.preprocessing.timeseries_dataset_from_array(df_with_inputs, df_with_labels, sequence_length=window_size,sequence_stride=1,shuffle=False, batch_size=batch_size)
for x, y in ds1.take(1):
    print(x, y)
tf.Tensor(
[[[-0.01898661 1.2348452 ]
[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]]
[[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]
[-0.14423785 0.95039433]]], shape=(2, 20, 2), dtype=float32) tf.Tensor(
[[1]
[1]], shape=(2, 1), dtype=int32)
Using tf.data.Dataset.from_tensor_slices would be this:
ds2 = tf.data.Dataset.from_tensor_slices((df_with_inputs, df_with_labels)).batch(batch_size)
inputs_only_ds = ds2.map(lambda x, y: x)
inputs_only_ds = (inputs_only_ds
                  .flat_map(tf.data.Dataset.from_tensor_slices)
                  .window(window_size, shift=1, stride=1, drop_remainder=True)
                  .flat_map(lambda x: x.batch(window_size))
                  .batch(batch_size))
ds2 = tf.data.Dataset.zip((inputs_only_ds, ds2.map(lambda x, y: y)))
for x, y in ds2.take(1):
    print(x, y)
tf.Tensor(
[[[-0.01898661 1.2348452 ]
[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]]
[[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]
[-0.14423785 0.95039433]]], shape=(2, 20, 2), dtype=float32) tf.Tensor(
[[1]
[1]], shape=(2, 1), dtype=int32)
Note that flat_map is necessary to flatten the tensor in order to apply sliding windows more easily. The call flat_map(lambda x: x.batch(window_size)) simply creates batches of the flattened tensor after applying the sliding windows.
With the line inputs_only_ds = ds2.map(lambda x, y: x), I extract only the inputs (x), without the labels (y), to run the sliding windows on. Afterwards, in tf.data.Dataset.zip((inputs_only_ds, ds2.map(lambda x, y: y))), I zip the windowed dataset back together with the labels (y), resulting in the final dataset ds2.
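If you instead want the non-overlapping windows mentioned in the question, a sketch of one option (reusing the variables defined above, and assuming "non-overlapping" means hopping by a full window) is to set shift=window_size in .window(...), since shift controls how far the window start moves between consecutive windows:

ds_inputs = tf.data.Dataset.from_tensor_slices(df_with_inputs)
ds_nonoverlap = (ds_inputs
                 # shift=window_size yields windows [0..19], [20..39], ...
                 .window(window_size, shift=window_size, stride=1, drop_remainder=True)
                 .flat_map(lambda w: w.batch(window_size))
                 .batch(batch_size))

The labels then have to be subsampled with the same hop, for example with tf.data.Dataset.from_tensor_slices(df_with_labels).shard(window_size, 0), which keeps every window_size-th label.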
If I understand a neural net correctly, it is just a graph of nodes and edges where each node in a given layer is connected to every node in the following layer. The nodes and the edges have weights, and you do some multiplication of these values to get a prediction.
Given a 2-layer model (with two input nodes, a and b, and one output node, c), this is what I am after:
| source | destination | value |
+--------+-------------+-------+
| a      | c           | 0.01  |
| b      | c           | 0.03  |
But when I call model.weights (albeit on a more complex model), I get a bunch of keyless NumPy arrays with no way to tell which values belong to which nodes.
[<tf.Variable 'dense_1/kernel:0' shape=(8, 12) dtype=float32, numpy=
array([[ 0.31751466, 0.20620143, 0.09791961, -0.08813753, 0.2515421 ,
-0.53187364, -0.15702713, 0.0267031 , -0.48389524, -0.13240823,
0.39453653, -0.39209265],
[ 0.31308496, -0.38468117, -0.03970708, 0.2889997 , 0.03803336,
0.04796927, -0.5140167 , 0.04645742, 0.08511442, -0.09435426,
0.03105392, -0.17520434],
[ 0.05365064, -0.05402106, -0.02931813, 0.13150737, 0.08898667,
0.20198704, 0.28716817, 0.21081768, -0.09572094, 0.14665389,
-0.3083644 , -0.47491354],
[-0.36734372, -0.12509695, -0.16984704, -0.19592582, 0.24023046,
-0.28856498, 0.11084742, 0.12101128, 0.00146453, -0.4996385 ,
-0.23521361, 0.24130017],
[ 0.21538568, -0.08531788, -0.32247233, -0.09213281, -0.39390212,
0.05042276, 0.22282743, -0.11438937, -0.00920196, 0.12748554,
-0.02741051, -0.12594655],
[ 0.3057384 , -0.20449257, 0.16837521, 0.21493798, -0.14034544,
0.45435148, -0.0548106 , 0.07033874, 0.39275315, -0.3332669 ,
-0.10222256, 0.14674312],
[ 0.36575058, 0.07205153, -0.14340317, -0.57348907, 0.7167731 ,
-0.29590985, 0.6351 , -0.6615748 , -0.23423046, -0.1065482 ,
0.7084621 , 0.02146828],
[-0.14760445, -0.4926324 , 0.30986223, 0.4067813 , 0.32313958,
-0.39595246, 0.12813015, -0.3088377 , -0.7285755 , 0.6085407 ,
0.39351743, -0.09248918]], dtype=float32)>,
<tf.Variable 'dense_1/bias:0' shape=(12,) dtype=float32, numpy=
array([-1.1890789 , 0. , -0.43765482, 0.5292001 , -0.94201744,
0.44064137, -0.5898111 , 0.8738893 , -0.62948394, 0.9394948 ,
0.47176355, 0. ], dtype=float32)>,
<tf.Variable 'dense_2/kernel:0' shape=(12, 8) dtype=float32, numpy=
array([[ 0.18743241, -0.04509293, 0.26035592, -0.40080604, -0.2120734 ,
0.0604641 , 0.17452721, -0.25245216],
[-0.4116977 , 0.4476785 , 0.13495606, 0.38070595, -0.16811815,
-0.5323667 , -0.41471216, 0.49056184],
[-0.43843648, -0.01767761, 0.03876654, 0.279591 , -0.64866304,
0.4605058 , 0.50288963, 0.46865177],
[-0.50431 , 0.26749972, -0.4822985 , 0.11643535, 0.34190154,
0.28961414, -0.19484225, 0.32788265],
[-0.4659909 , 0.12863334, -0.17177017, 0.27696657, -0.08261362,
0.1787579 , -0.49217325, -0.419283 ],
[-0.31586087, 0.4421215 , -0.35133213, -0.40784043, 0.3213457 ,
0.08262701, -0.20723267, -0.4305911 ],
[-0.32226318, -0.3479017 , -0.48984393, -0.19052912, 0.27398133,
-0.18631694, -0.42036086, -0.31824118],
[-0.04223084, -0.38938865, -0.33997327, -0.7986885 , -0.12062006,
-0.37880445, 0.06364141, 0.41674942],
[-0.07699671, -1.0260301 , -0.38287994, 0.46872973, -0.32630473,
0.37103057, 0.06274027, -0.25317484],
[-0.11334842, 0.29602957, 0.01759415, 0.07748368, -0.0767558 ,
0.13787462, -0.31502756, 0.17331126],
[-0.5030543 , -0.23578712, -0.38978124, 0.01187875, -0.02882512,
-0.5208091 , -0.4208508 , -0.08294159],
[ 0.04435921, 0.545004 , 0.07590699, 0.21470094, -0.46099266,
-0.25307545, -0.31362575, 0.3284188 ]], dtype=float32)>,
<tf.Variable 'dense_2/bias:0' shape=(8,) dtype=float32, numpy=
array([ 0. , 1.3254918 , -0.18484406, -0.0136466 , 1.2459729 ,
-1.331188 , -0.01439124, 0.9184486 ], dtype=float32)>,
<tf.Variable 'dense_3/kernel:0' shape=(8, 1) dtype=float32, numpy=
array([[-0.27390796],
[-0.40990734],
[-0.12878264],
[-0.43434066],
[-0.04099607],
[ 0.57922167],
[ 0.3830525 ],
[-0.47695825]], dtype=float32)>, <tf.Variable 'dense_3/bias:0' shape=(1,) dtype=float32, numpy=array([-1.3391492], dtype=float32)>]
Is there a JSON/dictionary-like way to get what I am after?
The "sources" and "destinations" of those edges don't have names like "a" and "b", they're just the kth neuron of the nth layer. The weights, then, are just an array. For example, weights[n][i][j] might be the weight of the edge connecting the ith neuron of layer n to the jth neuron of layer n+1. In this paradigm, the weights of your textbook example would look like
[[[0.8, 0.4, 0.3], [0.2, 0.9, 0.5]],
 [[0.3, 0.5, 0.9]]]
When you take into account that each neuron can have a bias as well as incoming weights, and that layers of different sizes would make the 3D array ragged (which is inconvenient), you might find that the most convenient way to store it all is as a structure containing several 2D arrays (each holding the weights for one pair of layers) and several 1D arrays (each holding the biases for one layer), all of different sizes... which is exactly what the dump you provided shows.
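If you want the JSON/dictionary-like view from the question, you can build it yourself from the layer names and the array indices. Here is a minimal sketch, assuming model is your Keras model (the node-naming scheme layer/in_i and layer/out_j is just an invented convention, since Keras neurons have no individual names):

import json

def weights_as_edges(model):
    # Flatten each layer's kernel into source/destination/value records.
    edges = []
    for layer in model.layers:
        params = layer.get_weights()
        if not params:
            continue  # e.g. Dropout layers have no weights
        kernel = params[0]  # for a Dense layer: shape (n_inputs, n_outputs)
        for i in range(kernel.shape[0]):
            for j in range(kernel.shape[1]):
                edges.append({"source": f"{layer.name}/in_{i}",
                              "destination": f"{layer.name}/out_{j}",
                              "value": float(kernel[i, j])})
    return edges

print(json.dumps(weights_as_edges(model), indent=2))

The biases (params[1] for a Dense layer) can be attached to the destination nodes in the same way.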
I made a simple example IPython notebook to calculate a convolution with Theano and with NumPy; however, the results are different. Does anybody know where the mistake is?
import theano
import numpy
from theano.sandbox.cuda import dnn
import theano.tensor as T
Define the input image x0:
x0 = numpy.array([[[[ 7.61323881, 0. , 0. , 0. ,
0. , 0. ],
[ 25.58142853, 0. , 0. , 0. ,
0. , 0. ],
[ 7.51445341, 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 12.74498367, 4.96315479, 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. ]]]], dtype='float32')
x0.shape
# (1, 1, 6, 6)
Define the convolution kernel:
w0 = numpy.array([[[[-0.0015835 , -0.00088091, 0.00226375, 0.00378434, 0.00032208,
-0.00396959],
[-0.000179 , 0.00030951, 0.00113849, 0.00012536, -0.00017198,
-0.00318825],
[-0.00263921, -0.00383847, -0.00225416, -0.00250589, -0.00149073,
-0.00287099],
[-0.00149283, -0.00312137, -0.00431571, -0.00394508, -0.00165113,
-0.0012118 ],
[-0.00167376, -0.00169753, -0.00373235, -0.00337372, -0.00025546,
0.00072154],
[-0.00141197, -0.00099017, -0.00091934, -0.00226817, -0.0024105 ,
-0.00333713]]]], dtype='float32')
w0.shape
# (1, 1, 6, 6)
Calculate the convolution with Theano and cuDNN:
X = T.tensor4('input')
W = T.tensor4('W')
conv_out = dnn.dnn_conv(img=X, kerns=W)
convolution = theano.function([X, W], conv_out)
numpy.array(convolution(x0, w0))
# array([[[[-0.04749081]]]], dtype=float32)
Calculate the convolution with NumPy (note that the result is different):
numpy.sum(x0 * w0)
# -0.097668208
I'm not exactly sure what kind of convolution you are trying to compute, but it seems to me that numpy.sum(x0*w0) might not be the way to do it. Does this help?
import numpy as np
# ... define x0 and w0 like in your example ...
np_convolution = np.fft.irfftn(np.fft.rfftn(x0) * np.fft.rfftn(w0))
The last element of the resulting array, i.e. np_convolution[-1, -1, -1, -1], is -0.047490807560833327, which seems to be the answer you're looking for in your notebook.
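If the discrepancy comes from kernel flipping (a true convolution flips the kernel, whereas numpy.sum(x0 * w0) computes a correlation, i.e. uses the kernel unflipped), you can also check it directly; a sketch using scipy:

import numpy as np
from scipy.signal import convolve2d

# A 'valid' convolution of two equal-sized 2D arrays yields a single value:
# the sum of x0 multiplied by the *flipped* kernel.
print(convolve2d(x0[0, 0], w0[0, 0], mode='valid'))

# The same value, flipping the kernel manually:
print(np.sum(x0 * w0[:, :, ::-1, ::-1]))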