Data standardization, across samples or across features? - numpy

I have data with 4 samples and 5 features, stored as an array, data:
import numpy as np

data = np.array([[1, 1, 1, 1, 0],
                 [0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0],
                 [1, 0, 0, 0, 0]])
print(data)

n_samples, n_features = data.shape  # (4, 5)
When I apply StandardScaler to it as follows, does it standardize the data across features or across samples?
from sklearn.preprocessing import StandardScaler, MinMaxScaler

result = StandardScaler().fit_transform(data)
print(result)
[[ 0.57735027  1.          1.          1.          0.        ]
 [-1.73205081 -1.         -1.         -1.         0.        ]
 [ 0.57735027  1.          1.          1.          0.        ]
 [ 0.57735027 -1.         -1.         -1.         0.        ]]
What's the best practice for data standardization in machine learning: across samples or across features?

In the case of StandardScaler/MinMaxScaler, the data are scaled across features, i.e. each column is standardized independently, and this is the common best practice.
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[1, 1, 1, 1, 0],
                 [0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0],
                 [1, 0, 0, 0, 0]])
result = StandardScaler().fit_transform(data)
result
array([[ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [-1.73205081, -1.        , -1.        , -1.        ,  0.        ],
       [ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [ 0.57735027, -1.        , -1.        , -1.        ,  0.        ]])
You can verify this yourself: the mean and standard deviation are taken along axis 0 (over samples), so each feature is standardized, and the std is clipped to avoid division by zero for constant columns.
(data - data.mean(0)) / data.std(0).clip(1e-5)
array([[ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [-1.73205081, -1.        , -1.        , -1.        ,  0.        ],
       [ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [ 0.57735027, -1.        , -1.        , -1.        ,  0.        ]])
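One practical note: in a real machine-learning pipeline you would fit the scaler on the training data only and reuse the learned per-feature statistics on the test data, so no information leaks from the test set. A minimal sketch (the train/test split here is purely illustrative):

import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., 1., 1., 1., 0.],
                    [0., 0., 0., 0., 0.],
                    [1., 1., 1., 1., 0.]])
X_test = np.array([[1., 0., 0., 0., 0.]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn per-feature mean/std on the training set
X_test_scaled = scaler.transform(X_test)        # apply the same statistics to the test set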

Related

How to construct an equivalent multivariate normal distribution in tensorflow-probability, using TransformedDistribution and tfb.ScaleMatvecLinearOperator?
I'm reading a tutorial on a bijector in tensorflow_probability, tfp.bijectors.ScaleMatvecLinearOperator, in which the following example is provided.
import tensorflow as tf
import tensorflow_probability as tfp
tfd, tfb = tfp.distributions, tfp.bijectors

n = 10000
loc = 0
scale = 0.5
normal = tfd.Normal(loc=loc, scale=scale)
The above code creates a univariate normal distribution.
tril = tf.random.normal((2, 4, 4))
scale_low_tri = tf.linalg.LinearOperatorLowerTriangular(tril)
scale_low_tri.to_dense()
The above code creates a tensor consisting of 2 lower-triangular matrices:
<tf.Tensor: shape=(2, 4, 4), dtype=float32, numpy=
array([[[-0.56953585,  0.        ,  0.        ,  0.        ],
        [ 1.1368589 ,  0.32028311,  0.        ,  0.        ],
        [-0.8328388 , -1.9963025 , -0.6005632 ,  0.        ],
        [ 0.596155  , -0.214932  ,  1.0988408 , -0.41731614]],

       [[ 2.0778096 ,  0.        ,  0.        ,  0.        ],
        [-1.1863967 ,  2.4897904 ,  0.        ,  0.        ],
        [ 0.38001925,  1.4962028 ,  1.7609248 ,  0.        ],
        [ 2.9253726 ,  0.7047957 ,  0.050508  ,  0.58643174]]],
      dtype=float32)>
Then a matrix-vector multiplication bijector is created:
scale_lin_op = tfb.ScaleMatvecLinearOperator(scale_low_tri)
After that, a TransformedDistribution is constructed as follows:
mvn = tfd.TransformedDistribution(normal, scale_lin_op, batch_shape=[2], event_shape=[4])
This worked in older versions of tensorflow_probability. However, the constructor of TransformedDistribution has since changed and no longer accepts the batch_shape and event_shape parameters. Therefore I tried the following to achieve the same thing:
mvn2 = tfd.TransformedDistribution(
    distribution=tfd.Sample(
        normal,
        sample_shape=[4]  # base_dist.event_shape == [4]
    ),
    bijector=scale_lin_op,
)  # batch_shape=[2], event_shape=[4]
mvn2
The result seems to have the correct batch_shape and event_shape:
<tfp.distributions.TransformedDistribution 'scale_matvec_linear_operatorSampleNormal' batch_shape=[2] event_shape=[4] dtype=float32>
Then, another distribution for comparison is created:
mvn3 = tfd.MultivariateNormalLinearOperator(loc=loc, scale=scale_low_tri)
mvn3
According to the tutorial, the TransformedDistribution mvn2 should be equivalent to the MultivariateNormalLinearOperator mvn3.
# Check
xn = normal.sample((n, 2, 4))  # sample_shape = (n, 2, 4)
tf.norm(mvn2.log_prob(xn) - mvn3.log_prob(xn)) / tf.norm(mvn2.log_prob(xn))

<tf.Tensor: shape=(), dtype=float32, numpy=0.7498207>
But in my result they are not equivalent (if they were, the above tensor would be 0).
What have I done wrong?

Addressing polynomial multiplication and division "overflow" issue

I have a list of coefficients of degree-1 polynomials, where row i represents the polynomial a[i][0]*x + a[i][1]:
a = np.array([[ 1.        , 77.48514702],
              [ 1.        ,  0.        ],
              [ 1.        ,  2.4239275 ],
              [ 1.        ,  1.21848739],
              [ 1.        ,  0.        ],
              [ 1.        ,  1.18181818],
              [ 1.        ,  1.375     ],
              [ 1.        ,  2.        ],
              [ 1.        ,  2.        ],
              [ 1.        ,  2.        ]])
I'm running into issues with the following operation (reduce is functools.reduce):
np.polydiv(reduce(np.polymul, a), a[0])[0] != reduce(np.polymul, a[1:])
where
In [185]: reduce(np.polymul, a[1:])
Out[185]:
array([  1.        ,  12.19923307,  63.08691612, 179.21045388,
       301.91486027, 301.5756213 , 165.35814595,  38.39582615,
         0.        ,   0.        ])
and
In [186]: np.polydiv(reduce(np.polymul, a), a[0])[0]
Out[186]:
array([ 1.00000000e+00,  1.21992331e+01,  6.30869161e+01,  1.79210454e+02,
        3.01914860e+02,  3.01575621e+02,  1.65358169e+02,  3.83940472e+01,
        1.37845155e-01, -1.06809521e+01])
First of all, the remainder of np.polydiv(reduce(np.polymul, a), a[0]) should be 0 but is large (827.61514239, to be exact). Secondly, the last two terms of the quotient should be 0 but are far from it: 1.37845155e-01 and -1.06809521e+01.
I'm wondering what are my options to improve the accuracy?
There is a slightly complicated way to keep the product-first-then-divide structure: work with polynomial values at sample points instead of with coefficients. First, choose n sample points and evaluate the product of the polynomials in a at them:
xs = np.linspace(0, 1., 10)
ys = np.array([np.prod(list(map(lambda r: np.polyval(r, x), a))) for x in xs])
Then do the division on the values ys instead of on the coefficients:
ys = ys/np.array([np.polyval(a[0], x) for x in xs])
Finally, recover the coefficients using polynomial interpolation on xs and ys:
from scipy.interpolate import lagrange
lagrange(xs, ys)
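Putting the pieces together, here is a minimal end-to-end sketch of this evaluate-divide-interpolate idea (10 sample points suffice because the quotient has degree 9; Lagrange interpolation can itself become ill-conditioned for much higher degrees):

from functools import reduce

import numpy as np
from scipy.interpolate import lagrange

a = np.array([[1., 77.48514702], [1., 0.        ], [1., 2.4239275 ],
              [1., 1.21848739], [1., 0.        ], [1., 1.18181818],
              [1., 1.375     ], [1., 2.        ], [1., 2.        ],
              [1., 2.        ]])

xs = np.linspace(0., 1., 10)                  # one point per quotient coefficient
prod_vals = np.prod([np.polyval(p, xs) for p in a], axis=0)
quot_vals = prod_vals / np.polyval(a[0], xs)  # divide values, not coefficients
quotient = lagrange(xs, quot_vals)            # interpolate back to coefficients

expected = reduce(np.polymul, a[1:])
print(np.max(np.abs(quotient.coeffs - expected)))  # error should be much smaller now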

How can the order of numpy computations influence the result?

Why do the following two lines of code, which appear to compute the same thing, give different results?
kernel1 = np.diag(np.exp(-scale*eigen_values))
kernel2 = np.exp(-scale*np.diag(eigen_values))
The check
np.all(kernel1 == kernel2)
outputs
False
Look at the values! Then you'll see the problem: when given a 1-d array, numpy.diag creates a 2-d array with zeros in the off-diagonal positions. In kernel1, you do diag last, so the off-diagonal values are 0. In kernel2, you apply exp after diag, and exp(0) is 1, so in kernel2, the off-diagonal terms are all 1. (Remember that numpy.exp is applied element-wise; it is not the matrix exponential.)
In [19]: eigen_values = np.array([1, 0.5, 0.1])

In [20]: scale = 1.0

In [21]: np.diag(np.exp(-scale*eigen_values))
Out[21]:
array([[0.36787944, 0.        , 0.        ],
       [0.        , 0.60653066, 0.        ],
       [0.        , 0.        , 0.90483742]])

In [22]: np.exp(-scale*np.diag(eigen_values))
Out[22]:
array([[0.36787944, 1.        , 1.        ],
       [1.        , 0.60653066, 1.        ],
       [1.        , 1.        , 0.90483742]])
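As an aside (not part of the original answer): if the matrix exponential is what was actually intended, scipy provides it as scipy.linalg.expm. For a diagonal matrix the matrix exponential coincides with exponentiating the diagonal entries element-wise, so kernel1 is in fact the matrix exponential here. A quick sketch:

import numpy as np
from scipy.linalg import expm

eigen_values = np.array([1, 0.5, 0.1])
scale = 1.0

# Matrix exponential of a diagonal matrix: off-diagonal entries stay 0.
kernel_matrix_exp = expm(-scale * np.diag(eigen_values))
kernel_elementwise = np.diag(np.exp(-scale * eigen_values))

print(np.allclose(kernel_matrix_exp, kernel_elementwise))  # True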

Scatter operation for middle dimension of a tensor

I have a 3d tensor where I need to preserve vectors at certain positions in the second dimension, and zero out the remaining vectors. The positions are specified as a 1d array. I'm thinking the best way to do this is to multiply the tensor with a binary mask.
Here's a simple Numpy version:
# A.shape: (b, n, m); indices.shape: (b,)
b, n, m = A.shape
mask = np.zeros(A.shape)
for i in range(b):
    mask[i][indices[i]] = 1
result = A * mask
So for each nxm matrix in A, I need to preserve rows specified by indices, and zero out the rest.
I'm trying to do this in TensorFlow using tf.scatter_nd op, but I can't figure out the correct shape of indices:
shape = tf.constant([3,5,4])
A = tf.random_normal(shape)
indices = tf.constant([2,1,4]) #???
updates = tf.ones((3,4))
mask = tf.scatter_nd(indices, updates, shape)
result = A*mask
Here's one way to do it, creating a mask and using tf.where:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()

shape = tf.constant([3, 5, 4])
A = tf.random_normal(shape)
array_shape = tf.shape(A)
indices = tf.constant([2, 1, 4])
non_zero_indices = tf.stack((tf.range(array_shape[0]), indices), axis=1)
should_keep_row = tf.scatter_nd(non_zero_indices, tf.ones_like(indices),
                                shape=[array_shape[0], array_shape[1]])
print("should_keep_row", should_keep_row)
masked = tf.where(tf.cast(tf.tile(should_keep_row[:, :, None],
                                  [1, 1, array_shape[2]]), tf.bool),
                  A,
                  tf.zeros_like(A))
print("masked", masked)
Prints:
should_keep_row tf.Tensor(
[[0 0 1 0 0]
[0 1 0 0 0]
[0 0 0 0 1]], shape=(3, 5), dtype=int32)
masked tf.Tensor(
[[[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]
[ 0.02036316 -0.07163608 -3.16707373 1.31406844]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]]
[[ 0. 0. 0. 0. ]
[-0.76696759 -0.28313264 0.87965059 -1.28844094]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]]
[[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]
[ 1.03188455 0.44305769 0.71291149 1.59758031]]], shape=(3, 5, 4), dtype=float32)
(The example uses eager execution, but the same ops will work with graph execution inside a Session.)
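An alternative sketch that avoids tf.scatter_nd entirely is to build the binary mask with tf.one_hot and let broadcasting do the rest (same TF version as above; depth is the size of the second dimension):

import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()

A = tf.random_normal([3, 5, 4])
indices = tf.constant([2, 1, 4])

# one_hot yields a (b, n) 0/1 mask; a trailing axis broadcasts it over m.
mask = tf.one_hot(indices, depth=5, dtype=A.dtype)  # shape (3, 5)
result = A * mask[:, :, None]                       # shape (3, 5, 4)
print(result)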

Why are the convolution outputs calculated with theano and numpy not the same?

I made a simple example ipython notebook to calculate convolution with theano and with numpy; however, the results are different. Does anybody know where the mistake is?
import theano
import numpy
from theano.sandbox.cuda import dnn
import theano.tensor as T
Define the input image x0:
x0 = numpy.array([[[[ 7.61323881,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
                    [25.58142853,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
                    [ 7.51445341,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
                    [ 0.        , 12.74498367,  4.96315479,  0.        ,  0.        ,  0.        ],
                    [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
                    [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ]]]],
                 dtype='float32')
x0.shape
# (1, 1, 6, 6)
Define the convolution kernel:
w0 = numpy.array([[[[-0.0015835 , -0.00088091,  0.00226375,  0.00378434,  0.00032208, -0.00396959],
                    [-0.000179  ,  0.00030951,  0.00113849,  0.00012536, -0.00017198, -0.00318825],
                    [-0.00263921, -0.00383847, -0.00225416, -0.00250589, -0.00149073, -0.00287099],
                    [-0.00149283, -0.00312137, -0.00431571, -0.00394508, -0.00165113, -0.0012118 ],
                    [-0.00167376, -0.00169753, -0.00373235, -0.00337372, -0.00025546,  0.00072154],
                    [-0.00141197, -0.00099017, -0.00091934, -0.00226817, -0.0024105 , -0.00333713]]]],
                 dtype='float32')
w0.shape
# (1, 1, 6, 6)
Calculate the convolution with theano and cudnn:
X = T.tensor4('input')
W = T.tensor4('W')
conv_out = dnn.dnn_conv(img=X, kerns=W)
convolution = theano.function([X, W], conv_out)
numpy.array(convolution(x0, w0))
# array([[[[-0.04749081]]]], dtype=float32)
Calculate convolution with numpy (note the result is different):
numpy.sum(x0 * w0)
# -0.097668208
I'm not exactly sure what kind of convolution you are trying to compute, but it seems to me that numpy.sum(x0*w0) might not be the way to do it. Does this help?
import numpy as np
# ... define x0 and w0 like in your example ...
np_convolution = np.fft.irfftn(np.fft.rfftn(x0) * np.fft.rfftn(w0))
The last element of the resulting array, i.e. np_convolution[-1, -1, -1, -1], is -0.047490807560833327, which seems to be the answer you're looking for in your notebook.
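The underlying reason (an observation added here, not spelled out in the original answer): a true convolution flips the kernel before taking the sliding dot product, whereas numpy.sum(x0 * w0) computes the unflipped cross-correlation. For a 6x6 input and a 6x6 kernel in "valid" mode there is a single output value, which equals the element-wise product with the flipped kernel. This can be checked directly:

import numpy as np

# x0 and w0 as defined above (shape (1, 1, 6, 6) each).
# Flip the kernel along both spatial axes, then sum the element-wise product.
flipped = w0[:, :, ::-1, ::-1]
print(np.sum(x0 * flipped))  # ≈ -0.04749081, matching the theano/cudnn result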