Difference between numpy.logical_and and & - numpy

I'm trying to take the logical_and of two or more numpy arrays. I know numpy has the function logical_and(), but I find the simple operator & returns the same results and is potentially easier to use.
For example, consider three numpy arrays a, b, and c. Is
np.logical_and(a, np.logical_and(b,c))
equivalent to
a & b & c?
If they are (more or less) equivalent, what's the advantage of using logical_and()?

@user1121588 answered most of this in a comment, but to answer fully...
"Bitwise and" (&) behaves much the same as logical_and on boolean arrays, but it doesn't convey the intent as well as using logical_and, and raises the possibility of getting misleading answers in non-trivial cases (packed or sparse arrays, maybe).
To use logical_and on multiple arrays, do:
np.logical_and.reduce([a, b, c])
where the argument is a list of as many arrays as you wish to logical_and together. They should all be the same shape.
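A small self-contained example of the reduce form:
import numpy as np
a = np.array([True, True, False])
b = np.array([True, True, True])
c = np.array([True, False, False])
print(np.logical_and.reduce([a, b, c]))  # [ True False False]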

I have been googling for some official confirmation that I can use & instead of logical_and on NumPy bool arrays, and found one in the NumPy v1.15 Manual:
If you know you have boolean arguments, you can get away with using
NumPy’s bitwise operators, but be careful with parentheses, like this:
z = (x > 1) & (x < 2). The absence of NumPy operator forms of
logical_and and logical_or is an unfortunate consequence of Python’s
design.
So one can also use ~ for logical_not and | for logical_or.
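Putting the three operators together on a small float array (a minimal sketch):
import numpy as np
x = np.array([0.5, 1.5, 2.5])
z = (x > 1) & (x < 2)     # logical_and -> [False  True False]
print(~z)                 # logical_not -> [ True False  True]
print((x < 1) | (x > 2))  # logical_or  -> [ True False  True]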

Related

Gdal: how to assign values to pixel based on condition?

I would like to change the values of the pixels of a geotiff raster so that each pixel is 1 if its value is between 50 and 100, and 0 otherwise.
Following this post, this is what I am doing:
gdal_calc.py -A input.tif --outfile=output.tif --calc="1*(50<=A<=100)" --NoDataValue=0
but I got the following error
0.. evaluation of calculation 1*(50<=A<=100) failed
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Such a chained comparison only works if the expression returns a single boolean, but here it returns an array of booleans; hence the suggestion to aggregate the array to a scalar with something like any() or all().
You should be able to write it in a way compatible with Numpy arrays with something like this:
1 * ((50 <= A) & (A <= 100))
Your original expression has an implicit and in it, whereas this uses an explicit &, which on boolean arrays behaves like np.logical_and: an element-wise test of whether both sides are True.
I'm not sure what the multiplication by one adds here; it casts the boolean result to an int32 datatype. Even if you need to write the result as int32, you can probably still leave the casting to GDAL in this case.
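To illustrate that cast in plain numpy, outside gdal_calc (the values here are made up):
import numpy as np
A = np.array([25, 50, 75, 100, 125])
mask = (50 <= A) & (A <= 100)
print(mask)                   # [False  True  True  True False]
print(1 * mask)               # [0 1 1 1 0] -- multiplication promotes bool to int
print(mask.astype(np.int32))  # an explicit cast gives the same values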
A toy example replicating the original error would be:
import numpy as np

# two random boolean arrays of length 5
a = np.random.randint(0, 2, 5, dtype=np.bool_)
b = np.random.randint(0, 2, 5, dtype=np.bool_)
With this data, a and b would fail in the same way, because Python's and tries to evaluate the entire array as a single True/False, whereas a & b returns a new array with the element-wise result.
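Continuing that toy example, a minimal demonstration of the difference:
print(a & b)   # element-wise AND; returns a new boolean array
try:
    a and b    # Python's `and` needs a single truth value for `a`
except ValueError as e:
    print(e)   # "The truth value of an array with more than one element is ambiguous..."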

any reasons for inconsistent numpy arguments of numpy.zeros and numpy.random.randn

I'm implementing a computation using numpy.zeros and numpy.random.randn:
W1 = np.random.randn(n_h, n_x) * .01
b1 = np.zeros((n_h, 1))
I'm not sure why random.randn() can accept two integers while zeros() needs a tuple. Is there a good reason for that?
Cheers, JChen.
Most likely it's just a matter of history. numpy results from the merger of several prior packages and has a long development history. Some quirks get cleaned up; others are left as is.
randn(d0, d1, ..., dn)
zeros(shape, dtype=float, order='C')
randn has this note:
This is a convenience function. If you want an interface that takes a
tuple as the first argument, use numpy.random.standard_normal instead.
standard_normal(size=None)
With * it is easy to pass a tuple to randn:
np.random.randn(*(1,2,3))
np.zeros takes a couple of keyword arguments; randn does not. You could define a Python function with a (*args, **kwargs) signature, but accepting a tuple, especially one with an established meaning like shape, fits better. That's a matter of opinion, though.
np.random.rand and np.random.random_sample are another such pair. Most likely rand and randn are the older versions, and standard_normal and random_sample are newer ones designed to conform to the more common tuple style.
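To see the two calling styles side by side (the shapes here are arbitrary):
import numpy as np
a = np.random.randn(2, 3)              # separate ints
b = np.random.standard_normal((2, 3))  # shape tuple
c = np.random.rand(2, 3)               # separate ints
d = np.random.random_sample((2, 3))    # shape tuple
shape = (2, 3)
e = np.random.randn(*shape)            # * unpacks a tuple into separate ints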

How to use the backslash operator in Julia?

I am currently trying to invert huge matrices of order 1 million by 1 million, and I figured that the backslash operator would be helpful in doing this. Any idea as to how it's implemented? I did not find any concrete examples, so any help is much appreciated.
Any idea as to how it's implemented?
It's a multialgorithm. This shows how to use it:
julia> A = rand(10,10)
10×10 Array{Float64,2}:
0.330453 0.294142 0.682869 0.991427 … 0.533443 0.876566 0.157157
0.666233 0.47974 0.172657 0.427015 0.501511 0.0978822 0.634164
0.829653 0.380123 0.589555 0.480963 0.606704 0.642441 0.159564
0.709197 0.570496 0.484826 0.17325 0.699379 0.0281233 0.66744
0.478663 0.87298 0.488389 0.188844 0.38193 0.641309 0.448757
0.471705 0.804767 0.420039 0.0528729 … 0.658368 0.911007 0.705696
0.679734 0.542958 0.22658 0.977581 0.197043 0.717683 0.21933
0.771544 0.326557 0.863982 0.641557 0.969889 0.382148 0.508773
0.932684 0.531116 0.838293 0.031451 0.242338 0.663352 0.784813
0.283031 0.754613 0.938358 0.0408097 0.609105 0.325545 0.671151
julia> b = rand(10)
10-element Array{Float64,1}:
0.0795157
0.219318
0.965155
0.896807
0.701626
0.741823
0.954437
0.573683
0.493615
0.0821557
julia> A\b
10-element Array{Float64,1}:
1.47909
2.39816
-0.15789
0.144003
-1.10083
-0.273698
-0.775122
0.590762
-0.0266894
-2.36216
You can use @which to see how it's defined:
julia> @which A\b
\(A::AbstractArray{T,2} where T, B::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T) in Base.LinAlg at linalg\generic.jl:805
Which leads us here: https://github.com/JuliaLang/julia/blob/master/base/linalg/generic.jl#L827 (line numbers change slightly between versions). As you can see, it does a few quick function calls to determine what type of matrix it is. For example, istril finds out if it's lower triangular: https://github.com/JuliaLang/julia/blob/master/base/linalg/generic.jl#L987 , etc. Once it determines the matrix type, it specializes the matrix as much as possible so it can be efficient, and then calls \ on that. These specialized matrix types either perform a factorization, which \ then uses for the back-substitution (a nice way to use \ yourself, by the way, to re-use a factorization), or they "directly know" the answer, as for triangular or diagonal matrices.
Can't get more concrete than the source.
Note that \ is slightly different from just inverting. You usually do not want to invert a matrix, let alone a large one; the factorizations are much more numerically stable. However, inv will do an inversion, which is a lot like an LU factorization (lufact in Julia). You may also want to look into pinv for the pseudo-inverse in cases where the matrix is singular or close to singular, but you should really avoid this and instead factorize and solve the system rather than using the inverse.
For very large sparse matrices, you'll want to use iterative solvers. You'll find a lot of implementations in IterativeSolvers.jl

What is the equivalent of numpy.allclose for structured numpy arrays?

Running numpy.allclose(a, b) throws TypeError: invalid type promotion on structured arrays. What would be the correct way of checking whether the contents of two structured arrays are almost equal?
np.allclose does an np.isclose followed by all(). isclose tests abs(x-y) against tolerances, with accommodations for np.nan and np.inf. So it is designed primarily to work with floats, and by extension ints.
The arrays have to work with np.isfinite(a), as well as a - b and np.abs. In short, a.astype(float) should work with your arrays.
None of this works with the compound dtype of a structured array. You could, though, iterate over the fields of the array and compare those with isclose (or allclose). But you will have to ensure that the two arrays have matching dtypes, and use some other test on fields that don't work with isclose (e.g. string fields).
So in the simple case
all([np.allclose(a[name], b[name]) for name in a.dtype.names])
should work.
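A minimal sketch, assuming both arrays share the same all-numeric dtype (the field names and values here are made up):
import numpy as np
dt = np.dtype([('x', float), ('y', float)])
a = np.array([(1.0, 2.0), (3.0, 4.0)], dtype=dt)
b = np.array([(1.0 + 1e-9, 2.0), (3.0, 4.0 - 1e-9)], dtype=dt)
# compare field by field; every field must be something isclose can handle
print(all(np.allclose(a[name], b[name]) for name in a.dtype.names))  # True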
If the fields of the arrays are all the same numeric dtype, you could view the arrays as 2d arrays, and do allclose on those. But usually structured arrays are used when the fields are a mix of string, int and float. And in the most general case, there are compound dtypes within dtypes, requiring some sort of recursive testing.
import numpy.lib.recfunctions as rf
has functions to help with complex structured array operations.
Assuming b is a scalar, you can just iterate over the fields of a:
all(np.allclose(a[field], b) for field in a.dtype.names)
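For example, with a made-up two-field float dtype and the scalar 1.0:
import numpy as np
dt = np.dtype([('x', float), ('y', float)])
a = np.array([(1.0, 1.0), (1.0, 1.0 + 1e-9)], dtype=dt)
print(all(np.allclose(a[field], 1.0) for field in a.dtype.names))  # True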

Numpy/Scipy pinv and pinv2 behave differently

I am working with bidimensional arrays on Numpy for Extreme Learning Machines. One of my arrays, H, is random, and I want to compute its pseudoinverse.
If I use scipy.linalg.pinv2 everything runs smoothly. However, if I use scipy.linalg.pinv, problems sometimes (30-40% of the time) arise.
The reason I am using pinv2 is that I read (here: http://vene.ro/blog/inverses-pseudoinverses-numerical-issues-speed-symmetry.html ) that pinv2 performs better on "tall" and "wide" arrays.
The problem is that, if H has a column j of all 1s, pinv(H) has huge coefficients in row j.
This is in turn a problem because, in such cases, np.dot(pinv(H), Y) contains some nan values (Y is an array of small integers).
Now, I don't know enough linear algebra and numerical computation to tell whether this is a bug or some precision-related property of the two functions. I would like you to answer this question so that, if it's the case, I can file a bug report (honestly, at the moment I would not even know what to write).
I saved the arrays with np.savetxt(fn, a, '%.2e', ';'): please, see https://dl.dropboxusercontent.com/u/48242012/example.tar.gz to find them.
Any help is appreciated. In the provided file, you can see in pinv(H).csv that rows 14, 33, 55, 56 and 99 have huge values, while in pinv2(H) the same rows have more decent values.
In short, the two functions implement two different ways to calculate the pseudoinverse matrix:
scipy.linalg.pinv uses least squares, which may be quite compute intensive and take up a lot of memory.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.pinv.html#scipy.linalg.pinv
scipy.linalg.pinv2 uses SVD (singular value decomposition), which should run with a smaller memory footprint in most cases.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.pinv2.html#scipy.linalg.pinv2
numpy.linalg.pinv also implements this method.
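As a quick sanity check that an SVD-based pseudoinverse behaves as expected, pinv(H) satisfies the Moore-Penrose condition H @ pinv(H) @ H == H (sketch with an arbitrary random matrix):
import numpy as np
H = np.random.rand(5, 3)
H_pinv = np.linalg.pinv(H)
print(np.allclose(H @ H_pinv @ H, H))  # True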
As these are two different evaluation methods, the resulting matrices will not be the same. Each method has its own advantages and disadvantages, and it is not always easy to determine which one should be used without deeply understanding the data and what the pseudoinverse will be used for. I'd simply suggest some trial-and-error and use the one which gives you the best results for your classifier.
Note that in some cases these functions cannot converge to a solution and will then raise a LinAlgError. In that case you may try to use the other pinv implementation, which will greatly reduce the number of errors you receive.
Starting with SciPy 1.7.0, pinv2 is deprecated, and pinv itself now uses an SVD-based solution:
DeprecationWarning: scipy.linalg.pinv2 is deprecated since SciPy 1.7.0, use scipy.linalg.pinv instead
That means numpy.linalg.pinv, scipy.linalg.pinv and scipy.linalg.pinv2 now all compute equivalent solutions. They are also comparably fast, with scipy being slightly faster.
import numpy as np
import scipy.linalg  # `import scipy` alone does not guarantee the linalg submodule is loaded
arr = np.random.rand(1000, 2000)
res1 = np.linalg.pinv(arr)
res2 = scipy.linalg.pinv(arr)
res3 = scipy.linalg.pinv2(arr)
np.testing.assert_array_almost_equal(res1, res2, decimal=10)
np.testing.assert_array_almost_equal(res1, res3, decimal=10)