Does numpy have an inbuilt implementation for modular exponentation of matrices?
(As pointed out by user2357112, I am actually looking for element-wise modular reduction.)
One way to do modular exponentiation on regular numbers is exponentiation by squaring (https://en.wikipedia.org/wiki/Exponentiation_by_squaring), with a modular reduction taken at each step. I am wondering whether there is a similar inbuilt solution for matrix multiplication. I am aware I can write code to emulate this easily, but I am wondering if there is an inbuilt solution.
Modular exponentiation is not currently built into NumPy (GitHub issue). The easiest/laziest way to achieve it is frompyfunc:
import numpy as np

modexp = np.frompyfunc(pow, 3, 1)
print(modexp(np.array([[1, 2], [3, 4]]), 2, 3).astype(int))
prints
[[1 1]
[0 1]]
This is of course slower than native NumPy would be, and we get an array with dtype=object (hence astype(int) is added).
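If, on the other hand, you do want true matrix modular exponentiation (repeated matrix products, reduced mod m at each step, as in the square-and-multiply scheme you describe), that is not built in either. A minimal hand-rolled sketch could look like this (the name matmod_pow is just mine):
import numpy as np

def matmod_pow(A, k, m):
    # Compute the matrix power A**k with every intermediate product reduced mod m.
    result = np.eye(A.shape[0], dtype=A.dtype) % m
    base = A % m
    while k > 0:
        if k & 1:                      # current bit of the exponent is set: multiply it in
            result = result @ base % m
        base = base @ base % m         # square, then reduce
        k >>= 1
    return result

print(matmod_pow(np.array([[1, 2], [3, 4]]), 2, 3))
Reducing at every step keeps the intermediate entries on the order of the modulus, which is the point of the square-and-multiply approach described in the question.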
Related
Consider X and Y to be marginally standard normal with correlation 1.0.
When the correlation is 1.0, the bivariate normal distribution is undefined (it's technically the y = x line), but numpy still prints out values. Why does it do this?
Oh, but the distribution is defined! It just doesn't have a well-defined density function. (At least, not with respect to the Lebesgue measure on the 2D space.) (See Mathematics Stack Exchange's discussion on broader groups of such distributions.) So numpy is doing nothing wrong.
What you're describing is the degenerate case of the bivariate (or more generally, multivariate) normal distribution. This occurs when the covariance matrix is not positive definite. However, the distribution is defined for any positive semi-definite covariance matrix.
As an example, the matrix [[1, 1], [1, 1]] is not positive definite, but it is positive semidefinite.
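A quick eigenvalue check (my own illustration, not essential to the argument) confirms this:
import numpy as np

cov = np.array([[1, 1], [1, 1]])
print(np.linalg.eigvalsh(cov))   # [0. 2.]: no negative eigenvalues (semidefinite), but a zero one (not definite)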
The distribution still has a host of other properties that distributions should have: a support (the y = x line you note, i.e. μ + range(Σ)), moments, and more.
import numpy as np
np.random.multivariate_normal(mean=[0, 0], cov=[[1, 1], [1, 1]])
# array([0.61156886, 0.61156887])
In summary, numpy's behavior isn't broken. It's well-behaved by returning samples from a properly specified distribution.
What I mean by the title is that I sometimes come across code that applies numpy operations (for example sum or average) along a specified axis. For example:
np.sum([[0, 1], [0, 5]], axis=1)
I can grasp this concept, but do we ever actually perform these operations along higher dimensions as well? Or is that not a thing? And if yes, how do you gain intuition for high-dimensional datasets, and how do you make sure you are working along the right dimension/axis?
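For concreteness, here is the kind of higher-dimensional reduction I am asking about (the shapes and their meaning are made up):
import numpy as np

data = np.arange(24).reshape(2, 3, 4)   # e.g. 2 samples, 3 sensors, 4 time steps
print(np.sum(data, axis=0).shape)       # (3, 4): summed over samples
print(np.sum(data, axis=2).shape)       # (2, 3): summed over time steps
print(data.mean(axis=(1, 2)))           # [ 5.5 17.5]: one average per sample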
Numpy's broadcasting rules have bitten me once again and I'm starting to feel there may be a way of thinking about this topic that I'm missing.
I'm often in situations as follows: the first axis of my arrays is reserved for something fixed, like the number of samples. For some arrays, the second axis represents different independent variables of each sample; for others, it is nonexistent, because it feels natural that there be only one quantity attached to each sample. For example, if the array is called price, I'd probably use only one axis, representing the price of each sample. On the other hand, a second axis is sometimes much more natural. For example, I could use a neural network to compute a quantity for each sample, and since neural networks can in general compute arbitrary multi-valued functions, the library I use would in general return a 2D array and make the second axis singleton if I use it to compute a single dependent variable. I've also found this approach of using 2D arrays to be more amenable to future extensions of my code.
Long story short, I need to decide in various places of my codebase whether to store an array as (1000,) or (1000, 1), and changes of requirements occasionally make it necessary to switch from one format to the other.
Usually, these arrays live alongside arrays with up to 4 axes, which further increases the pressure to sometimes introduce a singleton second axis and then have the third axis represent a consistent semantic quality for all arrays that use it.
The problem occurs when I add a (1000,) array to a (1000, 1) array, expecting to get (1000, 1) but getting (1000, 1000) because of implicit broadcasting.
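A minimal reproduction of the kind of surprise I mean (the array contents don't matter, only the shapes):
import numpy as np

price = np.zeros(1000)             # shape (1000,)
predicted = np.zeros((1000, 1))    # shape (1000, 1)
print((price + predicted).shape)   # (1000, 1000), not the (1000, 1) I expected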
I feel like this prevents giving semantic meaning to axes. Of course I could always use at least two axes, but that leads to the question of where to stop: to be fail-safe, continuing this logic, I'd have to always use arrays of at least 6 axes to represent everything.
I'm aware this is maybe not the most technically well-defined question, but does anyone have a modus operandi that helps them avoid these kinds of bugs?
Does anyone know the motivations of the numpy developers to align axes in reverse order for broadcasting? Was computational efficiency or another technical reason behind this, or a model of thinking that I don't understand?
In MATLAB, broadcasting, a johnny-come-lately to this game, expands trailing dimensions. But there the trailing dimensions are outermost, that is, order='F'. And since everything starts as 2d, this expansion only occurs when one array is 3d (or larger).
https://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/ explains this and gives a bit of history. My own history with the language is old enough that the ma_expanded = ma(ones(3,1),:) style of expansion is familiar. Octave added broadcasting before MATLAB.
To avoid ambiguity, broadcasting expansion can only occur in one direction. Expanding in the direction of the outermost dimension seems logical.
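A quick way to see the asymmetry (my own illustration):
import numpy as np

a = np.ones((2, 3))
b = np.arange(3)        # (3,) is treated as (1, 3): aligned on the trailing axis
print((a + b).shape)    # (2, 3)

c = np.arange(2)        # (2,) would have to align on the leading axis instead
# a + c                 # raises ValueError: operands could not be broadcast together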
Compare (3,) expanded to (1,3) versus (3,1) - viewed as nested lists:
In [198]: np.array([1,2,3])
Out[198]: array([1, 2, 3])
In [199]: np.array([[1,2,3]])
Out[199]: array([[1, 2, 3]])
In [200]: (np.array([[1,2,3]]).T).tolist()
Out[200]: [[1], [2], [3]]
I don't know if there are significant implementation advantages. With the striding mechanism, adding a new dimension anywhere is easy: just change the shape and strides, adding a 0 stride for the dimension that needs to be 'replicated'.
In [203]: np.broadcast_arrays(np.array([1,2,3]),np.array([[1],[2],[3]]),np.ones((3,3)))
Out[203]:
[array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]), array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]]), array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])]
In [204]: [x.strides for x in _]
Out[204]: [(0, 8), (8, 0), (24, 8)]
I would like to arithmetically add two large System.Arrays element-wise in IronPython and store the result in the first array, like this:
for i in range(arrA.Length):
    arrA.SetValue(arrA.GetValue(i) + arrB.GetValue(i), i)
However, this seems very slow. Having a C background, I would like to use pointers or iterators, but I do not know how to apply the IronPython idiom in a fast way. I cannot use Python lists, as my objects are strictly of type System.Array. The arrays are 3d, with float elements.
What is the fastest (or at least a fast) way to perform this computation?
Edit:
The number of elements is approx. 256^3.
3d float means that the array can be accessed like this: array.GetValue(indexX, indexY, indexZ). I am not sure how the respective memory is organized in IronPython's System.Array.
Background: I wrote an interface to an IronPython API, which gives access to data in a simulation software tool. I retrieve 3d scalar data and accumulate it to a temporal array in my IronPython script. The accumulation is performed 10,000 times and should be fast, so that the simulation does not take ages.
Is it possible to use the numpy library developed for IronPython?
https://pytools.codeplex.com/wikipage?title=NumPy%20and%20SciPy%20for%20.Net
It appears to be supported, and as far as I know it is as close as you can get in Python to C-style pointer functionality with arrays and such.
Create an array:
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
Multiply all elements by 3.0 (the result is float64, so assign it rather than multiplying in place into the int32 array):
x = x * 3.0
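For the element-wise accumulation in the question, the same idiom applies once both operands are NumPy arrays (this assumes the 256^3 simulation data has already been copied into NumPy arrays; arrA and arrB below are NumPy arrays, not System.Arrays):
import numpy as np

arrA = np.zeros((256, 256, 256), dtype=np.float32)         # accumulator
arrB = np.random.rand(256, 256, 256).astype(np.float32)    # one simulation snapshot
arrA += arrB                                               # element-wise, in place, no Python-level loop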
I am currently trying to implement a Gaussian process in Mathematica and am stuck with the maximization of the log-likelihood. I tried to use FindMaximum on my log-likelihood function, but it does not seem to work on this function.
gpdata = {{-1.5, -1.8}, {-1., -1.2}, {-0.75, -0.4}, {-0.4, 0.1}, {-0.25, 0.5}, {0., 0.8}};

kernelfunction[i_, j_, h0_, h1_] :=
  h0*h0*Exp[-(gpdata[[i, 1]] - gpdata[[j, 1]])^2/(2*h1^2)] + KroneckerDelta[i, j]*0.09;

covariancematrix[h0_, h1_] =
  ParallelTable[kernelfunction[i, j, h0, h1], {i, 1, 6}, {j, 1, 6}];

loglikelihood[h0_, h1_] :=
  -0.5*gpdata[[All, 2]].LinearSolve[covariancematrix[h0, h1], gpdata[[All, 2]], Method -> "Cholesky"] -
    0.5*Log[Det[covariancematrix[h0, h1]]] - 3*Log[2*Pi];

FindMaximum[loglikelihood[a, b], {{a, 1}, {b, 1.1}}, MaxIterations -> 500, Method -> "QuasiNewton"]
In the log-likelihood I would usually have the product of the inverse of the covariance matrix with the gpdata[[All, 2]] vector, but because the covariance matrix is always positive semidefinite I wrote it this way. Also, the evaluation does not terminate if I use
gpdata[[All, 2]].Inverse[
covariancematrix[h0, h1]].gpdata[[All, 2]]
Does anyone have an idea? I am actually working on a far more complicated problem where I have 6 parameters to optimize, but I already have problems with 2.
In my experience, second-order methods fail with hyper-parameter optimization more often than gradient-based methods. I think this is because (most?) second-order methods rely on the function being close to a quadratic near the current estimate.
Using conjugate-gradient or even Powell's (derivative-free) conjugate direction method has proved successful in my experiments. For the two parameter case, I would suggest making a contour plot of the hyper-parameter surface for some intuition.
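Just to illustrate the idea outside Mathematica, here is a minimal sketch in Python/SciPy (my own setup, not part of the question) that minimizes the same negative log-likelihood with Powell's derivative-free method, working in log-parameters so both hyper-parameters stay positive:
import numpy as np
from scipy.optimize import minimize

x = np.array([-1.5, -1.0, -0.75, -0.4, -0.25, 0.0])
y = np.array([-1.8, -1.2, -0.4, 0.1, 0.5, 0.8])

def neg_log_likelihood(log_theta):
    h0, h1 = np.exp(log_theta)                     # keep h0, h1 positive
    # squared-exponential kernel plus the 0.09 noise term from the question
    K = h0**2 * np.exp(-(x[:, None] - x[None, :])**2 / (2 * h1**2)) + 0.09 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(K, y)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))      # log det via the Cholesky factor
    return 0.5 * (y @ alpha) + 0.5 * logdet + 3.0 * np.log(2 * np.pi)

res = minimize(neg_log_likelihood, x0=np.log([1.0, 1.1]), method="Powell")
print(np.exp(res.x))   # fitted (h0, h1)
A contour plot of neg_log_likelihood over a grid of (h0, h1) values is then a cheap way to get the intuition mentioned above.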