numpy matrix subtraction gives a very small number where it should give zero

I have the following code:
import numpy as np
from numpy.linalg import inv

A = np.random.randint(-10, 10, (4, 4))
B = np.random.randint(-10, 10, (4, 4))
A1, A2, A3, A4 = inv(A @ B), inv(A) @ inv(B), inv(B @ A), inv(B) @ inv(A)
print(A1 - A4)
Since mathematically A1 = A4, subtracting them should give back the zero matrix, but I get
[[-1.73472348e-18 1.73472348e-18 3.46944695e-18 3.46944695e-18]
[-8.67361738e-19 0.00000000e+00 4.01154804e-18 3.46944695e-18]
[ 3.46944695e-18 1.73472348e-18 3.46944695e-18 -3.46944695e-18]
[-1.73472348e-18 -1.73472348e-18 -1.73472348e-18 1.73472348e-18]]
It's very close to zero, but not exactly zero. Why is that?
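The differences come from ordinary double-precision round-off: the two expressions are evaluated through different sequences of rounded operations, so they agree only to roughly machine epsilon. A minimal sketch of how such results are usually compared, using a tolerance instead of exact equality:
print(np.allclose(A1, A4))        # True: equal within floating-point tolerance
print(np.max(np.abs(A1 - A4)))    # tiny residual, around 1e-18 in the output above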


SageMath: Mod().sqrt() prefixed with "sqrt" string for a particular combination in SageMath. Is this a bug?

Using SageMath 9.2 on Windows 10
a1 = 9798722381116618056227637476565566484018606253194222755351973203508462742253522311154076194134700145275527578605535821781545038187843569198505993524407287520970070771172279404172004212310432500247465608472105231701909612623072343883942216806934904529600639676698348239426243771486521532222069409611514728756060897629936844695006373653175992634673678639333010508845045985607328371180356262460490393317997708599757357055386370808544031455931154122353547239678217006604692623467390849309525705453042722141078914816760002281629323554959490483823338710209710265138177331357093216148991708169324688688552846634664517554736
n = 27772857409875257529415990911214211975844307184430241451899407838750503024323367895540981606586709985980003435082116995888017731426634845808624796292507989171497629109450825818587383112280639037484593490692935998202437639626747133650990603333094513531505209954273004473567193235535061942991750932725808679249964667090723480397916715320876867803719301313440005075056481203859010490836599717523664197112053206745235908610484907715210436413015546671034478367679465233737115549451849810421017181842615880836253875862101545582922437858358265964489786463923280312860843031914516061327752183283528015684588796400861331354873
a2 = Mod(a1, n).sqrt()
I get the following
sage: print(a2)
sqrt9798722381116618056227637476565566484018606253194222755351973203508462742253522311154076194134700145275527578605535821781545038187843569198505993524407287520970070771172279404172004212310432500247465608472105231701909612623072343883942216806934904529600639676698348239426243771486521532222069409611514728756060897629936844695006373653175992634673678639333010508845045985607328371180356262460490393317997708599757357055386370808544031455931154122353547239678217006604692623467390849309525705453042722141078914816760002281629323554959490483823338710209710265138177331357093216148991708169324688688552846634664517554736
Notice that a2 is prefixed with "sqrt"!
I don't see this with other roots I have calculated with Sage. What does this mean?
Is this a bug, or does it have some other meaning?
It would appear n is prime (at least pseudo-prime):
sage: n.is_prime(proof=False)
True
Assuming it is, let us define the finite field in n elements:
sage: F = GF(n, proof=False)
and view a1 as an element A1 in F:
sage: A1 = F(a1)
Asking whether a1 is a square modulo n
amounts to asking whether A1 is a square in F.
sage: A1.is_square()
False
It is not! So when we compute the square root of A1,
it has to be in a quadratic extension of F.
This is why when we ask Sage to compute this square root,
it gives it as a generator of that extension.
A natural name for this generator is "sqrt(a1)",
which is what Sage uses (hence the "sqrt" prefix followed by the digits of a1).
Probably, when you computed other square roots,
they were square roots of numbers which were
squares modulo n, and therefore the square roots
could be computed in F, i.e. in ZZ / n ZZ,
without requiring a quadratic field extension.
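As a quick sanity check, a minimal sketch (this assumes the extend keyword of sqrt() for integers mod n, which should refuse to leave ZZ / n ZZ when set to False):
sage: Mod(a1, n).is_square()
False
sage: Mod(a1, n).sqrt(extend=False)  # should raise ValueError rather than extend the ring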

Flop count for variable initialization

Consider the following pseudo code:
a <- [0,0,0] (initializing a 3d vector to zeros)
b <- [0,0,0] (initializing a 3d vector to zeros)
c <- a . b (Dot product of two vectors)
In the above pseudo code, what is the flop count (i.e., the number of floating point operations)?
More generally, what I want to know is whether initialization of variables counts towards the total floating point operations or not, when looking at an algorithm's complexity.
In your case, both a and b are all-zero vectors, and I don't think zero vectors are a good way to illustrate a flop count. Note also that the initializations themselves involve no floating-point arithmetic (they are just assignments), so they are conventionally not counted as flops.
Instead, take a vector a with entries a1, a2, a3 and a vector b with entries b1, b2, b3. Their dot product aTb is
aTb = a1*b1 + a2*b2 + a3*b3
Here we have 3 multiplication operations (a1*b1, a2*b2, a3*b3) and 2 addition operations, so 5 operations, or 5 flops, in total.
Generalizing to n-dimensional vectors a and b, there are n multiplications and n-1 additions, for a total of n + (n-1) = 2n-1 operations, or flops.
I hope the example I used above gives you the intuition.
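A small Python sketch of the same count (the function name and the flop bookkeeping are mine, purely for illustration):
def dot_with_flop_count(a, b):
    """Dot product of two equal-length vectors, counting flops along the way."""
    assert len(a) == len(b) and len(a) > 0
    total = a[0] * b[0]           # 1 multiplication
    flops = 1
    for ai, bi in zip(a[1:], b[1:]):
        total += ai * bi          # 1 multiplication + 1 addition per remaining entry
        flops += 2
    return total, flops

print(dot_with_flop_count([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # (32.0, 5), i.e. 2n-1 = 5 flops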

np.where function

I've got a little problem understanding the where function in numpy.
The ‘times’ array contains the discrete epochs at which GPS measurements exist (rounded to the nearest second).
The ‘locations’ array contains the discrete values of the latitude, longitude and altitude of the satellite interpolated from 10 seconds intervals to 1 second intervals at the ‘times’ epochs.
The ‘tracking’ array contains an array for each epoch in ‘times’ (array within an array). The arrays have 5 columns and 32 rows. The 32 rows correspond to the 32 satellites of the GPS constellation. The 0th row corresponds to the 1st satellite, the 31st to the 32nd. The columns contain the following (in order): is the satellite tracked (0), is L1 locked (1), is L2 locked (2), is L1 unexpectedly lost (3), is L2 unexpectedly lost (4).
We need to find all the unexpected losses and put them in an array so we can plot it on a map.
What we tried to do is:
import numpy as np

i = 0
with np.load(r'folderpath\%i.npz' % i) as oneday_data:  # replace folderpath with your directory
    times = oneday_data['times']
    locations = oneday_data['locations']
    tracking = oneday_data['tracking']
A = np.where(tracking[:][:][4] == 1)
This should give us all the positions of the losses. With these indices it is easy to get the right locations. But it keeps returning useless data.
Can someone help us?
I think the problem is your chained slices: tracking[:][:][4] just slices the whole outer axis twice and then picks index 4 along that same first axis, which is not the column you want. Further, having an array of arrays could lead to weird problems (I assume you mean an object array of 2D arrays).
So I think you need to dstack tracking into a 3D array, then do where on that. If the array is already 3D, then you can skip the dstack part. This will get the places where L2 is unexpectedly lost, which is what you did in your example:
tracking3d = np.dstack(tracking)
A0, A2 = np.where(tracking3d[:, 4, :]==1)
A0 is the position of the 1 along axis 0 (satellite), while A2 is the position of the same 1 along axis 2 (time epoch).
If the values of tracking can only be 0 or 1, you can simplify this by just doing np.where(tracking3d[:, 4, :]).
You can also roll the axes back into the configuration you were using (0: time epoch, 1: satellite, 2: tracking status):
tracking3d = np.rollaxis(np.dstack(tracking), 2, 0)
A0, A1 = np.where(tracking3d[:, :, 4]==1)
If you want to find the locations where L1 or L2 are unexpectedly lost, you can do this:
tracking3d = np.rollaxis(np.dstack(tracking), 2, 0)
A0, A1, _ = np.where(tracking3d[:, :, 3:]==1)
In this case it is the same, except there is a dummy variable _ used for the location along the last axis, since you don't care whether it was lost for L1 or L2 (if you do care, you could just do np.where independently for each axis).
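A short follow-up sketch of how you might then pull out the corresponding epochs and positions for plotting (this assumes times and locations are indexed by epoch as described in the question; the variable names are mine):
# Rolled layout: axis 0 = time epoch, axis 1 = satellite, axis 2 = tracking status
tracking3d = np.rollaxis(np.dstack(tracking), 2, 0)
epoch_idx, sat_idx = np.where(tracking3d[:, :, 4] == 1)

loss_times = times[epoch_idx]          # epochs at which L2 was unexpectedly lost
loss_locations = locations[epoch_idx]  # interpolated lat/lon/alt at those epochs
loss_satellites = sat_idx + 1          # rows 0..31 correspond to satellites 1..32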

SAS CONTRAST: "Weighting" for linear combination of parameter estimates in PROC GLM

Background: I have a categorical variable, X, with four levels that I fit as separate dummy variables. Thus, there are three total dummy variables representing x=1, x=2, x=3 (x=0 is baseline).
Problem/issue: I want to test the significance of a linear combination of my model's parameters, something like: 2*B1+2*B2+B3=0.
In Stata, the first issue can be done easily after a model is fit using the following:
test 2*B1 + 2*B2 + B3 = 0
Now, if I want to do this in SAS for PROC GLM using a CONTRAST statement, I know my "weights" (for lack of a better term) must sum to 0. For example, if, in an unrelated example, I wanted to test the following for four continuous variables: C1 + C2 = C3 + C4, my contrast statement would look like:
CONTRAST 'Contrast1' C1 0.5 C2 0.5 C3 -0.5 C4 -0.5
In this case, it's obvious how each variable should be weighted. However, when I want to combine the coefficients I gave in the model above (2*B1 + 2*B2 + B3 = 0) with these weights, it becomes unclear to me how to weight the function in the CONTRAST statement, specifically for a dummy variable-coded categorical variable, as described initially in the problem.
Use PROC REG. Since b1, b2 and b3 enter the MODEL statement as ordinary numeric dummy variables, its TEST statement accepts the linear combination directly, much like Stata's test:
proc reg data=mydata;
    model y = b1 b2 b3;
    test 2*b1 + 2*b2 + b3 = 0;
run;
quit;

Calculating value based on weighted percentage chance

I am writing a program in which I need to get a random value based on weighted chances and am having real difficulty. An example of what I need to do:
a = 50%, b = 30%, c = 10%, d = 10%
In this example, I need to be able to get a random value, a, b, c, or d, with the value coming back being 'a' 50% of the time, 'b' 30% of the time, etc.
Many thanks
Assign each value a range of numbers between 0 and 1 based on the chance it should appear. For example, A should be 0 to .5, since it needs a 50% chance. Then, get a random number between 0 and 1. Whichever value's range the random number falls into is the value you get.
A = [0, .5)
B = [.5, .8)
C = [.8, .9)
D = [.9, 1)
Random number is [0,1)
[ = inclusive, ) = exclusive.
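A minimal Python sketch of this approach, using the a/b/c/d weights from the question (random.choices is just an alternative built into the standard library):
import random

def weighted_value():
    """Return 'a' 50%, 'b' 30%, 'c' 10% and 'd' 10% of the time."""
    r = random.random()   # uniform in [0, 1)
    if r < 0.5:
        return 'a'        # [0, .5)
    elif r < 0.8:
        return 'b'        # [.5, .8)
    elif r < 0.9:
        return 'c'        # [.8, .9)
    else:
        return 'd'        # [.9, 1)

# Equivalent one-liner with the standard library:
# random.choices(['a', 'b', 'c', 'd'], weights=[50, 30, 10, 10])[0]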