Weird numpy matrix values - numpy

When I calculate the determinant of a matrix with np.linalg.det(mat1), or compute its inverse, I get strange output values. For example, it returns 1.11022302e-16 instead of 0.
I tried rounding the determinant, but I couldn't do the same for the matrix elements.

Floating-point computations are not exact, so multiplications and divisions can produce results that are very close to zero but not exactly equal to it.
You can define a delta that decides whether a result is close enough, and then compare the absolute distance between the result and the expected value against it.
Maybe like this:
import math
import numpy as np

def rounded_det(mat, delta=0.0001):
    res = np.linalg.det(mat)
    # snap to the nearest integer if the result is within delta of it
    if abs(math.floor(res) - res) < delta:
        return math.floor(res)
    if abs(math.ceil(res) - res) < delta:
        return math.ceil(res)
    return res
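For the matrix elements themselves (for example the entries of an inverse), a minimal sketch of the same idea is to clean up the whole array at once with np.round or np.isclose instead of going element by element (mat1 below is just a small example matrix):

import numpy as np

mat1 = np.array([[2.0, 1.0], [1.0, 1.0]])
inv = np.linalg.inv(mat1)

# round every element to a fixed number of decimals
inv_rounded = np.round(inv, decimals=10)

# or zero out only the entries that are numerically close to zero
inv_clean = np.where(np.isclose(inv, 0.0, atol=1e-12), 0.0, inv)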


Calculating auto covariance in pandas

Following up on the answer provided by @pltrdy in this thread:
https://stackoverflow.com/a/27164416/14744492
How do you convert pandas.Series.autocorr(), which calculates the lag-N (default = 1) autocorrelation of a Series, into autocovariances?
Sadly, pandas.Series.autocov() is not implemented in pandas.
What .autocorr(k) calculates is the (Pearson) correlation coefficient for lag k. For a series x, that coefficient is:
\rho_k = \frac{Cov(x_{t}, x_{t-k})}{Var(x)}
Then, to get autocovariance, you multiply autocorrelation by the variance:
def autocov_series(x, lag=1):
    # autocovariance = autocorrelation at the given lag times the variance
    return x.autocorr(lag=lag) * x.var()
Note that Series.var uses ddof=1 by default, so the sum of squared deviations is divided by N - 1 (where N == s.size) and you get an unbiased estimate of the population variance.
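A quick sanity check (a hedged sketch; the series below is just synthetic data):

import numpy as np
import pandas as pd

s = pd.Series(np.random.default_rng(0).normal(size=1000))

def autocov_series(x, lag=1):
    return x.autocorr(lag=lag) * x.var()

print(autocov_series(s, lag=1))
print(s.cov(s.shift(1)))  # direct covariance with the lagged series, should be close

The two numbers agree only approximately, because .autocorr() and .var() are computed over slightly different samples of the series.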

Plotting an exponential function given one parameter

I'm fairly new to Python, so bear with me. I have plotted a histogram using some generated data. This data has very many points, and I have stored it in the variable vals. I have then plotted a histogram with these values, though I have limited it so that only values between 104 and 155 are taken into account. This has been done as follows:
bin_heights, bin_edges = np.histogram(vals, range=[104, 155], bins=30)
bin_centres = (bin_edges[:-1] + bin_edges[1:])/2.
plt.errorbar(bin_centres, bin_heights, np.sqrt(bin_heights), fmt=',', capsize=2)
plt.xlabel(r"$m_{\gamma\gamma} (GeV)$")
plt.ylabel("Number of entries")
plt.show()
Giving the above plot:
My next step is to take into account values from vals which are less than 120. I have done this as follows:
background_data = [j for j in vals if j <= 120]  # to avoid taking the signal bump, upper limit of 120 GeV set
I need to plot a curve on the same plot as the histogram, which follows the form B(x) = Ae^(-x/λ)
I then estimated a value of λ using the maximum likelihood estimator formula:
background_data = [j for j in vals if j <= 120]  # to avoid taking the signal bump, upper limit of 120 GeV set
#print(background_data)
N_background = len(background_data)
print(N_background)
sigma_background_data = sum(background_data)
print(sigma_background_data)
lamb = sigma_background_data / N_background  # maximum likelihood estimator for lambda
print('lambda estimate is', lamb)
where lamb = λ. I got a value of roughly lamb = 27.75, which I know is correct. I now need to get an estimate for A.
I have been advised to do this as follows:
Given a value of λ, find A by scaling the PDF to the data such that the area beneath
the scaled PDF has equal area to the data
I'm not quite sure what this means, or how I'd go about trying to do this. PDF means probability density function. I assume an integration will have to take place, so to get the area under the data (vals), I have done this:
data_area= integrate.cumtrapz(background_data, x=None, dx=1.0)
print(data_area)
plt.plot(background_data, data_area)
However, this gives me an error
ValueError: x and y must have same first dimension, but have shapes (981555,) and (981554,)
I'm not sure how to fix it. The end result should be something like:
See the cumtrapz docs:
Returns: ... If initial is None, the shape is such that the axis of integration has one less value than y. If initial is given, the shape is equal to that of y.
So you either need to pass an initial value, like
data_area = integrate.cumtrapz(background_data, x=None, dx=1.0, initial=0.0)
or discard the first value of background_data:
plt.plot(background_data[1:], data_area)
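As for the original goal of estimating A (the advice quoted in the question: scale the PDF so the area beneath it matches the area of the data), here is one hedged sketch. It reuses bin_heights, bin_centres, bin_edges and lamb from the snippets above, assumes the background fit is restricted to the region below 120 GeV, and equates A times the analytic integral of exp(-x/lamb) with the histogram area (counts times bin width):

import numpy as np
import matplotlib.pyplot as plt

bin_width = bin_edges[1] - bin_edges[0]
lo, hi = 104, 120  # background-dominated fit range, an assumption

# area under the histogram in the fit range (counts * bin width)
mask = (bin_centres >= lo) & (bin_centres <= hi)
hist_area = bin_heights[mask].sum() * bin_width

# analytic integral of exp(-x/lamb) from lo to hi
exp_integral = lamb * (np.exp(-lo / lamb) - np.exp(-hi / lamb))

A = hist_area / exp_integral

xs = np.linspace(104, 155, 200)
plt.plot(xs, A * np.exp(-xs / lamb))

This is only a sketch of the area-matching idea, not necessarily the exact normalisation your exercise expects.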

Strange roots using `numpy.roots`

Is there something wrong in the evaluation of the polynomial (1-alpha*z)**9 using numpy? For
alpha=3/sqrt(2) my list of coefficients is given in the array
psi_t0 = [1.0, -19.0919, 162.0, -801.859, 2551.5, -5412.55, 7654.5, -6958.99, 3690.56, -869.874]
According to the numpy documentation, I have to reverse this array (so that the highest-degree coefficient comes first) in order to compute the zeros, i.e.
psi_t0 = psi_t0[::-1]
Thus giving
a = np.roots(psi_t0)
[0.62765842+0.06979364j 0.62765842-0.06979364j 0.52672941+0.14448097j 0.52672941-0.14448097j 0.42775926+0.13031547j 0.42775926-0.13031547j 0.36690056+0.07504044j 0.36690056-0.07504044j 0.34454214+0.j]
which is clearly wrong, since the roots should all be equal to sqrt(2)/3 ≈ 0.471.
As you take the 9th power, you create a very "wide" zero: if you step eps away from the true zero and evaluate, you get something of order O(eps^9). In view of that, numerical inaccuracies of this size are all but expected.
>>> np.set_printoptions(4)
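>>> C = np.array(psi_t0)  # the reversed coefficient list from the question, highest-degree term first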
>>> print(C)
[-8.6987e+02 3.6906e+03 -6.9590e+03 7.6545e+03 -5.4125e+03 2.5515e+03
-8.0186e+02 1.6200e+02 -1.9092e+01 1.0000e+00]
>>> np.roots(C)
array([0.4881+0.0062j, 0.4881-0.0062j, 0.4801+0.0154j, 0.4801-0.0154j,
0.4681+0.0172j, 0.4681-0.0172j, 0.458 +0.011j , 0.458 -0.011j ,
0.4541+0.j ])
>>> np.polyval(C,_)
array([1.4622e-13+6.6475e-15j, 1.4622e-13-6.6475e-15j,
1.2612e-13+1.5363e-14j, 1.2612e-13-1.5363e-14j,
1.0270e-13+1.3600e-14j, 1.0270e-13-1.3600e-14j,
1.1346e-13+9.7179e-15j, 1.1346e-13-9.7179e-15j,
1.0936e-13+0.0000e+00j])
As you can see the roots numpy returns are "good" in that the polynomial evaluates to something pretty close to zero at these points.
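A quick way to confirm the eps^9 behaviour (a minimal sketch, using the alpha from the question):

import numpy as np

alpha = 3 / np.sqrt(2)
true_root = np.sqrt(2) / 3

# (1 - alpha*z)**9 evaluated a small step eps away from the true root
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, (1 - alpha * (true_root + eps)) ** 9)

The values shrink like eps**9, so roots that are off by a few hundredths still drive the polynomial down to around 1e-13, which is exactly what np.polyval shows above.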

tf.round() to a specified precision

tf.round(x) rounds the values of x to integer values.
Is there any way to round to, say, 3 decimal places instead?
You can do it easily like this, as long as you don't risk reaching very large numbers:
def my_tf_round(x, decimals=0):
    multiplier = tf.constant(10**decimals, dtype=x.dtype)
    return tf.round(x * multiplier) / multiplier
Note: the value of x * multiplier should not exceed 2^32, so the method above should not be used to round very large numbers.
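For example (a quick sketch, assuming TensorFlow 2.x with eager execution):

import tensorflow as tf

def my_tf_round(x, decimals=0):
    multiplier = tf.constant(10**decimals, dtype=x.dtype)
    return tf.round(x * multiplier) / multiplier

x = tf.constant([0.78969, 1.23456], dtype=tf.float32)
print(my_tf_round(x, decimals=3))  # approximately [0.79, 1.235]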
gdelab's solution is a good one: multiplying shifts the required number of decimal places to the left of the point (e.g. 0.78969 * 100 gives 78.969), tf.round then turns that into 79, and dividing by 100 again gives 0.79. There is another workaround I would like to share with the community.
You can just use NumPy's round method: grab the NumPy array from the tensor, apply np.round to it, and then convert the result back to a tensor.
import numpy as np
import tensorflow as tf

# Creating a tensor
x = tf.random.normal((3, 3), mean=0, stddev=1)
x = tf.cast(x, tf.float64)
x

# Grabbing the NumPy array from the tensor
x.numpy()

# Use the NumPy round method, then convert the result back to a tensor
value = np.round(x.numpy(), 3)
Result = tf.convert_to_tensor(value, dtype=tf.float64)
Result

Is it possible to optimize this Matlab code for doing vector quantization with centroids from k-means?

I've created a codebook using k-means of size 4000x300 (4000 centroids, each with 300 features). Using the codebook, I then want to label an input vector (for purposes of binning later on). The input vector is of size Nx300, where N is the total number of input instances I receive.
To compute the labels, I calculate the closest centroid for each of the input vectors. To do so, I compare each input vector against all centroids and pick the centroid with the minimum distance. The label is then just the index of that centroid.
My current Matlab code looks like:
function labels = assign_labels(centroids, X)
labels = zeros(size(X, 1), 1);
% for each row of X, calculate the distance from each centroid
for i = 1:size(X, 1)
    % distance of X_i from all j centroids is: sum((X_i - centroid_j).^2)
    % note: we leave off the sqrt as an optimization
    distances = sum(bsxfun(@minus, centroids, X(i, :)) .^ 2, 2);
    [value, label] = min(distances);
    labels(i) = label;
end
However, this code is still fairly slow (for my purposes), and I was hoping there might be a way to optimize the code further.
One obvious issue is that there is a for-loop, which is the bane of good performance in Matlab. I've been trying to come up with a way to get rid of it, but with no luck (I looked into using arrayfun in conjunction with bsxfun, but haven't gotten that to work). Alternatively, if someone knows of any other way to speed this up, I would greatly appreciate it.
Update
After doing some searching, I couldn't find a great solution using Matlab, so I decided to look at what is used in Python's scikits.learn package for 'euclidean_distance' (shortened):
XX = sum(X * X, axis=1)[:, newaxis]
YY = Y.copy()
YY **= 2
YY = sum(YY, axis=1)[newaxis, :]
distances = XX + YY
distances -= 2 * dot(X, Y.T)
distances = maximum(distances, 0)
which uses the binomial form of the euclidean distance ((x-y)^2 -> x^2 + y^2 - 2xy), which from what I've read usually runs faster. My completely untested Matlab translation is:
XX = sum(data .* data, 2);                           % N x 1 squared norms of the data points
YY = sum(center .^ 2, 2);                            % M x 1 squared norms of the centroids
dist = bsxfun(@plus, XX, YY') - 2 * data * center';  % N x M squared distances
[~, labels] = min(dist, [], 2);
Use the following function to calculate your distances. You should see an order of magnitude speed-up.
The two matrices A and B have the columns as the dimensions and the rows as the points.
A is your matrix of centroids; B is your matrix of data points.
function D = getSim(A, B)
Qa = repmat(dot(A, A, 2), 1, size(B, 1));  % squared norms of the centroids, replicated across columns
Qb = repmat(dot(B, B, 2), 1, size(A, 1));  % squared norms of the data points, replicated across columns
D = Qa + Qb' - 2 * A * B';                 % squared distances, one row per centroid, one column per point
You can vectorize it by converting to cells and using cellfun:
[nRows, nCols] = size(X);
XCell = num2cell(X, 2);
dist = reshape(cell2mat(cellfun(@(x) sum(bsxfun(@minus, centroids, x).^2, 2), XCell, 'UniformOutput', false)), size(centroids, 1), nRows);
[~, labels] = min(dist);
Explanation:
We assign each row of X to its own cell in the second line
This piece @(x) sum(bsxfun(@minus, centroids, x).^2, 2) is an anonymous function that is the same as your distances=... line, and using cell2mat, we apply it to each row of X.
The labels are then the indices of the minimum row along each column.
For a true matrix implementation, you may consider trying something along the lines of:
P2 = kron(centroids, ones(size(X,1),1));
Q2 = kron(ones(size(centroids,1),1), X);
distances = reshape(sum((Q2-P2).^2,2), size(X,1), size(centroids,1));
Note
This assumes the data is organized as [x1 y1 ...; x2 y2 ...;...]
You can use a more efficient algorithm for nearest-neighbour search than brute force.
The most popular approach is the k-d tree, with O(log(n)) average query time instead of the O(n) brute-force complexity.
Regarding a Matlab implementation of k-d trees, you can have a look here.
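Since the question already peeked at Python's scikit-learn internals, here is a hedged sketch of the same labelling step with a k-d tree from SciPy (scipy.spatial.cKDTree); the shapes mirror the question, and the random arrays are just stand-ins for the real codebook and data:

import numpy as np
from scipy.spatial import cKDTree

centroids = np.random.rand(4000, 300)  # stand-in for the k-means codebook
X = np.random.rand(10000, 300)         # stand-in for the N x 300 input vectors

tree = cKDTree(centroids)
# for each row of X, distance to and index of the nearest centroid
dist, labels = tree.query(X, k=1)

Be aware that in 300 dimensions a k-d tree can degrade towards brute-force performance, so it is worth benchmarking it against the vectorized distance computations above.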