Destructuring Iterators.product in Julia - numpy

If I have Iterators.product generating pairs (x, y), how do I get only the x-component out of it?
I'm trying to convert from Python/numpy to Julia using this cheatsheet. A simple thing:
xs = np.arange(0,2*np.pi,0.1)
ys = np.arange(0,2*np.pi,0.1)
Xs,Ys = np.meshgrid(xs,ys)
F = np.sin(Xs) * np.cos(Ys)
Now, in Julia, I gathered that I should use Iterators.product:
xs = 0.0 : 0.1 : 2*pi
ys = 0.0 : 0.1 : 2*pi
XYs = Iterators.product(xs,ys)
for (i, (x,y)) in enumerate( XYs ) println("$i $x $y") end
# so far, so good
# Now what?
(Xs,Ys)=XYs ; println(Xs); println(Ys) # Nope
.(Xs,Ys)=XYs ; println(Xs); println(Ys) # syntax: invalid identifier name "."
Xs=XYs[:,0] ; println(Xs) # MethodError: no method matching getindex( ...
XYs_T = transpose(XYs) ; println(XYs_T) # MethodError: no method matching transpose(
# this seems to work, but how do I address an index which is not "first" or "last"?
Xs = first.(XYs)
Ys = last.(XYs)
# I guess I know how to continue
F = sin.(Xs) .* cos.(Ys)
imshow(F) # yes, correct

You can avoid using the routines in Iterators if you use a comprehension:
julia> F = [sin(x) * cos(y) for x in xs, y in ys]
63×63 Array{Float64,2}:
0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0
0.0998334 0.0993347 0.0978434 0.0953745 0.0919527 0.0925933 0.0958571 0.098163 0.0994882
0.198669 0.197677 0.194709 0.189796 0.182987 0.184262 0.190756 0.195345 0.197982
0.29552 0.294044 0.289629 0.282321 0.272192 0.274089 0.28375 0.290576 0.294498
...
Remember that Julia stores arrays in memory column-first (column-major) while Python is row-first, so if you want the arrays to display the same in the REPL, you will need to reverse the order of x and y in the comprehension, as with the 'ij' matrix-indexing option of numpy's meshgrid:
F = [sin(x) * cos(y) for y in ys, x in xs] # BUT reference elements as e = F[y, x] now

You can use comprehensions, but I like broadcasting:
xs = range(0, 2π, step=0.1)
ys = range(0, 2π, step=0.1)
F = sin.(xs) .* cos.(ys') # notice the adjoint '
You should avoid creating the intermediate meshgrid; that's just wasted work and memory. In fact, you should go read the manual section on broadcasting: https://docs.julialang.org/en/v1/manual/functions/#man-vectorized-1
The same thing applies in Python, by the way: there is no need for meshgrid. Just make sure that xs is a row vector and ys a column vector (or vice versa); then you can directly multiply np.sin(xs) * np.cos(ys), and they will broadcast to matrix form automatically.
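For instance, a minimal NumPy sketch of the no-meshgrid approach (reusing xs and ys from the question; the [:, None] reshape is one way to get the column vector):
import numpy as np

xs = np.arange(0, 2 * np.pi, 0.1)           # shape (63,), acts as a row
ys = np.arange(0, 2 * np.pi, 0.1)[:, None]  # shape (63, 1), a column vector
F = np.sin(xs) * np.cos(ys)                 # broadcasts to a (63, 63) matrix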


SVD Inversion, Moore-Penrose, and LSQ give different answers using Numpy

I am solving a matrix equation using different methods. According to my interpretation of the numpy documentation, all three of my tested methods (SVD inversion, Moore-Penrose inversion, and least squares) should give the same answer. However, the SVD inversion gives a very different answer. I cannot find a mathematical reason for this in Numerical Recipes. Is there a Numpy implementation nuance that is causing this?
I am using the following code on Python 3.8.10, Numpy 1.21.4, in a Jupyter notebook.
y = np.array([176, 166, 194])
x = np.array([324, 322, 376])
x = np.stack([x, np.ones_like(x)], axis=1)
# Solve the matrix using singular value decomposition
u, s, vh = np.linalg.svd(x, full_matrices=False)
s = np.where(s < np.finfo(s.dtype).eps, 0, s)
manual_scale, manual_offset = vh @ np.linalg.inv(np.diag(s)) @ u.T @ y
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Solve the matrix using Moore-Penrose Inversion
# Manually
manual_scale, manual_offset = np.linalg.inv(x.T @ x) @ x.T @ y
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Using supplied numpy methods
manual_scale, manual_offset = np.linalg.pinv(x) @ y
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Solve using lstsq
((manual_scale, manual_offset), residuals, rank, s) = np.linalg.lstsq(x, y)
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
The output (edited for clarity) is then
'SVD'
0.6091639943577222
29.167637174498772
array([[226.53677135, 29.77680117],
[225.31844336, 29.77680117],
[258.21329905, 29.77680117]])
'Manual Moore-Penrose'
0.4388335704125341
29.170697012800005
array([[171.35277383, 29.60953058],
[170.47510669, 29.60953058],
[194.17211949, 29.60953058]])
'Moore-Penrose'
0.43883357041251736
29.170697012802187
array([[171.35277383, 29.60953058],
[170.47510669, 29.60953058],
[194.17211949, 29.60953058]])
'LSTSQ'
/tmp/ipykernel_261995/387148285.py:24: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
((manual_scale, manual_offset), residuals, rank, s) = np.linalg.lstsq(x, y)
0.43883357041251814
29.17069701280214
array([[171.35277383, 29.60953058],
[170.47510669, 29.60953058],
[194.17211949, 29.60953058]])
As you can see, the three later methods get the same result, yet the manual SVD calculation is different. What is going on?
You are missing a transpose of vh. The SVD solution should be
manual_scale, manual_offset = vh.T @ np.linalg.inv(np.diag(s)) @ u.T @ y
By the way, you can simplify the inverse of the diagonal factor:
manual_scale, manual_offset = vh.T @ np.diag(1/s) @ u.T @ y
(That assumes there are no zeros in s.)
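If s may contain zeros (for example after the eps thresholding in the question), one sketch of a guarded inverse, reusing u, s, vh, and y from above, is to invert only the nonzero entries; this mirrors how np.linalg.pinv treats small singular values:
# invert only the strictly positive singular values, leave zeroed ones at 0
s_inv = np.divide(1.0, s, out=np.zeros_like(s), where=s > 0)
manual_scale, manual_offset = vh.T @ np.diag(s_inv) @ u.T @ y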
For the next person who needs this, the fixed code is below. Thanks Warren!
y = np.array([176, 166, 194])
x = np.array([324, 322, 376])
x = np.stack([x, np.ones_like(x)], axis=1)
# Solve the matrix using singular value decomposition
u, s, vh = np.linalg.svd(x, full_matrices=False)
s = np.where(s < np.finfo(s.dtype).eps, 0, s)
manual_scale, manual_offset = vh.T @ np.diag(1/s) @ u.T @ y
display('SVD')
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Solve the matrix using Moore-Penrose Inversion
# Manually
manual_scale, manual_offset = np.linalg.inv(x.T @ x) @ x.T @ y
display('Manual Moore-Penrose')
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Using supplied numpy methods
manual_scale, manual_offset = np.linalg.pinv(x) @ y
display('Moore-Penrose')
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Solve using lstsq
display('LSTSQ')
((manual_scale, manual_offset), residuals, rank, s) = np.linalg.lstsq(x, y, rcond=None)  # rcond=None silences the FutureWarning shown above
display(manual_scale, manual_offset, manual_scale * x + manual_offset)

Batched tensor slice: slice B x M x N with B x 1

I have a B x M x N tensor, X, and a B x 1 tensor, Y, which holds the index along dimension 1 of X that I want to keep. What is the shorthand for this slice so that I can avoid a loop?
Essentially I want to do this:
Z = torch.zeros(B, N)
for i in range(B):
    Z[i] = X[i][Y[i]]
The following code does the same thing as the loop above; the difference is that instead of sequentially indexing the arrays Z, X, and Y, we index them in parallel using the array i.
import numpy as np

B, M, N = 13, 7, 19
X = np.random.randint(100, size=[B, M, N])
Y = np.random.randint(M, size=[B, 1])
Z = np.random.randint(100, size=[B, N])
i = np.arange(B)
Y = Y.ravel() # reducing array to rank-1, for easy indexing
Z[i] = X[i, Y[i], :]
This code can be further simplified:
-> Z[i] = X[i,Y[i],:]
-> Z[i] = X[i,Y[i]]
-> Z[i] = X[i,Y]
-> Z = X[i,Y]
PyTorch equivalent code:
import torch

B, M, N = 5, 7, 3
X = torch.randint(100, size=[B, M, N])
Y = torch.randint(M, size=[B, 1])
Z = torch.randint(100, size=[B, N])
i = torch.arange(B)
Y = Y.ravel()
Z = X[i, Y]
The answer provided by @Hammad is short and perfect for the job. Here's an alternative solution if you're interested in using some lesser-known PyTorch built-ins. We will use torch.gather (in NumPy, the analogous function is np.take_along_axis).
The idea behind torch.gather is to construct a new tensor from two tensors with the same number of dimensions: one containing the indices (here, built from Y) and one containing the values (here X).
The operation performed is Z[i][j][k] = X[i][Y[i][j][k]][k].
Since X's shape is (B, M, N) and Y's shape is (B, 1), we need to expand and reshape Y so that its shape becomes (B, 1, N).
This can be achieved with some axis manipulation:
>>> Y.expand(-1, N)[:, None] # expand dim=1 to N, then unsqueeze at dim=1
The actual call to torch.gather will be:
>>> X.gather(dim=1, index=Y.expand(-1, N)[:, None])
You can reduce this to shape (B, N) by appending [:, 0].
This function can be very effective in tricky scenarios...
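For completeness, a self-contained sketch (shapes as in the question's PyTorch snippet) that checks the gather result against the original loop:
import torch

B, M, N = 5, 7, 3
X = torch.randint(100, size=(B, M, N))
Y = torch.randint(M, size=(B, 1))

# index shaped (B, 1, N): out[b, 0, k] = X[b, Y[b, 0], k]
Z = X.gather(dim=1, index=Y.expand(-1, N)[:, None])[:, 0]

# same result as the loop from the question
Z_loop = torch.stack([X[b, Y[b, 0]] for b in range(B)])
assert torch.equal(Z, Z_loop)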

Numpy: multivariate indexing?

I wonder, is it possible to index several dimensions at once, with some broadcasting? Example:
Suppose I have an array A with shape (n, d), and an indexing array, say I, with integer values between 0 and d-1. Set B = A[:, I].
If shape(I) == (k,), for whatever k, then B has shape (n, k) and B[x, y] = A[x, I[y]].
But if shape(I) == (k, p) for whatever (k, p), then I want B to be shaped (n, k, p) with B[x, y, z] = A[x, I[y, z]].
1. How can I get this behavior?
2. Does it have a drawback I did not see?
You can do it exactly as you described it:
import numpy as np
n = 100
d = 20
k = 10
p = 17
A = np.random.random((n, d))
I = np.random.randint(low=0, high=d, size=(k, p))
B = A[:, I]
print(B.shape) # (n, k, p)
# Testing if the new array B is constructed as expected
x = 3
y = 5
z = 7
print(B[x, y, z])
print(A[x, I[y, z]])
print(B[x, y, z] == A[x, I[y, z]])
It's hard to say whether this is a good implementation or not without more context, but in general it is a good idea to use numpy and vectorization if you have speed in mind.
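As a side note, the same operation can be written with np.take, which makes the indexed axis explicit; a small sketch:
import numpy as np

A = np.random.random((4, 5))
I = np.random.randint(0, 5, size=(2, 3))
# A[:, I] and np.take(A, I, axis=1) give the same (4, 2, 3) result
assert np.array_equal(A[:, I], np.take(A, I, axis=1))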

Julia - linear regression - ERROR: DimensionMismatch

I wish to do a linear regression in Julia, but I am getting an error:
DimensionMismatch("column length 3000 for column(s) X, and is incompatible with column length 1000 for column(s) Y")
julia> x=rand(1000,3);
julia> x[:,1]=x[:,1] .+ 1;
julia> y = rand(1000,1) .+ 3;
julia> size(x)
(1000, 3)
julia> size(y)
(1000, 1)
julia> ols = lm(@formula(Y ~ X), DataFrame(X = x, Y = y))
ERROR: DimensionMismatch("column length 3000 for column(s) X, and is incompatible with column length 1000 for column(s) Y")
Stacktrace:
[1] (::getfield(DataFrames, Symbol("##DataFrame#83#86")))(::Bool, ::Type, ::Array{Any,1}, ::DataFrames.Index) at /Users/henry/.julia/packages/DataFrames/CZrca/src/dataframe/dataframe.jl:118
[2] Type at ./none:0 [inlined]
[3] #DataFrame#94(::Base.Iterators.Pairs{Symbol,Array{Float64,2},Tuple{Symbol,Symbol},NamedTuple{(:X, :Y),Tuple{Array{Float64,2},Array{Float64,2}}}}, ::Type) at /Users/henry/.julia/packages/DataFrames/CZrca/src/dataframe/dataframe.jl:174
[4] (::getfield(Core, Symbol("#kw#Type")))(::NamedTuple{(:X, :Y),Tuple{Array{Float64,2},Array{Float64,2}}}, ::Type{DataFrame}) at ./none:0
[5] top-level scope at none:0
Can someone help?
Your problem is that when you write:
DataFrame(X = x, Y = y)
you are trying to set a Matrix as a column of a DataFrame. This is not allowed; you can only add vectors as columns of a data frame. So this will work, for example:
DataFrame([x y], [:x1, :x2, :x3, :y])
So in your original code you can write for example:
lm(@formula(y ~ x1 + x2 + x3), DataFrame([x y], [:x1, :x2, :x3, :y]))

Is there a Julia equivalent to NumPy's ellipsis slicing syntax (...)?

In NumPy, the ellipsis syntax is for
filling in a number of : until the number of slicing specifiers matches the dimension of the array.
(paraphrasing this answer).
How can I do that in Julia?
Not yet, but you can help yourself if you want.
import Base.getindex, Base.setindex!
const .. = Val{:...}
setindex!{T}(A::AbstractArray{T,1}, x, ::Type{Val{:...}}, n) = A[n] = x
setindex!{T}(A::AbstractArray{T,2}, x, ::Type{Val{:...}}, n) = A[:, n] = x
setindex!{T}(A::AbstractArray{T,3}, x, ::Type{Val{:...}}, n) = A[:, :, n] = x
getindex{T}(A::AbstractArray{T,1}, ::Type{Val{:...}}, n) = A[n]
getindex{T}(A::AbstractArray{T,2}, ::Type{Val{:...}}, n) = A[:, n]
getindex{T}(A::AbstractArray{T,3}, ::Type{Val{:...}}, n) = A[:, :, n]
Then you can write
> rand(3,3,3)[.., 1]
3x3 Array{Float64,2}:
0.0750793 0.490528 0.273044
0.470398 0.461376 0.01372
0.311559 0.879684 0.531157
If you want more elaborate slicing, you need to generate/expand the definition or use staged functions.
Edit: Nowadays, see https://github.com/ChrisRackauckas/EllipsisNotation.jl
The way to go is EllipsisNotation.jl, which adds .. to the language.
Example:
julia> using EllipsisNotation
julia> x = rand(1,2,3,4,5);
julia> x[..,3] == x[:,:,:,:,3]
true
julia> x[1,..] == x[1,:,:,:,:]
true
julia> x[1,1,..] == x[1,1,:,:,:]
true
(@mschauer noted this in his answer (edit) already, but the reference is at the very end, and I felt this question deserved a clean, up-to-date answer.)