special vectorial product numpy - numpy

ref question
Let's say I have a vector s and I want to produce the matrix m (see image) using only NumPy functions. How could I do that? I thought of transposing s and finding a special product between s and s^t, but I couldn't find one. Do you have any idea?

This looks like an outer product:
s = np.arange(1, 4) # array([1, 2, 3])
np.multiply.outer(s,s)
output:
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
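As a minimal sketch, the same matrix can be built in a few equivalent ways. The variable names below (m_ufunc, m_outer, m_broadcast) are only illustrative and are not part of the original answer:

```python
import numpy as np

s = np.arange(1, 4)  # array([1, 2, 3])

# Three equivalent spellings of the outer product of s with itself
m_ufunc = np.multiply.outer(s, s)  # ufunc .outer method, as in the answer
m_outer = np.outer(s, s)           # dedicated helper for 1D inputs
m_broadcast = s[:, None] * s       # (3, 1) column times (3,) row via broadcasting

print(m_broadcast)
```

The broadcasting form is the closest to the "s times s^t" idea in the question: s[:, None] plays the role of the column vector.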

Related

Numpy 3D indexing specific column elements

I have a multidimensional (3D) array like the following:
>>>arr=np.array([[[0, 0],
[1, 1]],
[[2, 0],
[3, 1]],
[[3, 0],
[4, 1]]])
For each of the 2D submatrices, I would like to obtain the element of column 0 located in a certain row, where the row is selected by a value associated with that submatrix. That is, I would like to achieve:
>>>arr[:,[0,1,0],0]
array([0,3,3])
However, with the above command what I get is:
>>>arr[:,[0,1,0],0]
array([[0, 1, 0],
[2, 3, 2],
[3, 4, 3]])
Per the documentation, I was able to achieve the goal using the following command:
>>>arr[range(arr.shape[0]),[0,1,0],0]
array([0,3,3])
But I would like to know if there is a better way where I don't need to pass a list with all the indices for the first element of the indexing, like in the first example.
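One possible alternative, offered only as a sketch, is np.take_along_axis, which pairs each submatrix with its own row index without spelling out the first-axis indices. The names rows, col0 and picked are hypothetical, and whether this counts as "better" than the range-based form is a matter of taste:

```python
import numpy as np

arr = np.array([[[0, 0], [1, 1]],
                [[2, 0], [3, 1]],
                [[3, 0], [4, 1]]])
rows = np.array([0, 1, 0])  # one row index per 2D submatrix

col0 = arr[:, :, 0]  # column 0 of every submatrix, shape (3, 2)
# rows[:, None] has shape (3, 1), so one element is picked per submatrix
picked = np.take_along_axis(col0, rows[:, None], axis=1).ravel()

print(picked)  # [0 3 3]
```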

What is the difference between np.array([val1, val2]) and np.array([[val1, val2]])?

What is the difference between np.array([1, 2]) and np.array([[1, 2]])?
Which one of them is a matrix?
I also do not understand the output for shape of the above tensors. The former returns (2,) and the latter returns (1,2).
np.array([1, 2]) builds an array starting from a list, thus giving you a 1D array with the shape (2, ) since it only contains a single list of two elements.
When using the double [ you are actually passing a list of lists, thus this gets you a multidimensional array, or matrix, with the shape (1, 2).
With the latter you are able to build more complex matrices like:
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rendering a 3x3 matrix:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
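A short check of the shapes discussed above may help. This is only an illustrative sketch, and the names v and m are not from the original answer:

```python
import numpy as np

v = np.array([1, 2])    # built from a flat list: 1D
m = np.array([[1, 2]])  # built from a list of lists: 2D, one row

print(v.shape, v.ndim)  # (2,) 1
print(m.shape, m.ndim)  # (1, 2) 2

# Transposing only has a visible effect on the 2D array
print(v.T.shape)  # (2,)
print(m.T.shape)  # (2, 1)
```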

How to return one NumPy array per partition in Dask?

I need to compute many NumPy arrays (that can be up to 4-dimensional), one for each partition of a Dask dataframe, and then add them as arrays. However, I'm struggling to make map_partitions return an array for each partition instead of a single array for all of them.
import dask.dataframe as dd
import numpy as np, pandas as pd
df = pd.DataFrame(range(15), columns=['x'])
ddf = dd.from_pandas(df, npartitions=3)
def func(partition):
    # Here I also tried returning the array in a list and in a tuple
    return np.array([[1, 2], [3, 4]])

# Here I tried all the options available for 'meta'
results = ddf.map_partitions(func).compute()
Then results is:
array([[1, 2],
[3, 4],
[1, 2],
[3, 4],
[1, 2],
[3, 4]])
And if, instead, I do results.sum().compute() I get 30.
What I'd like to get is:
[np.array([[1, 2],[3, 4]]), np.array([[1, 2],[3, 4]]), np.array([[1, 2],[3, 4]])]
So that if I compute the sum, I get:
array([[ 3, 6],
[ 9, 12]])
How can you achieve this result with Dask?
I managed to make it work like this, but I don't know if this is the best way:
from dask import delayed
results = []
for partition in ddf.partitions:
    result = delayed(func)(partition)
    results.append(result)
delayed(sum)(results).compute()
The result of the computation is:
array([[ 3, 6],
[ 9, 12]])
You are right, a dask-array is usually to be viewed as a single logical array, which just happens to be made of pieces. Since you are not using the logical layer, you could have done your work with delayed alone. On the other hand, it seems like the end result you want really is a sum over all the data, so maybe even simpler would be an appropriate reshape and sum(axis=)?
ddf.map_partitions(func).compute_chunk_sizes().reshape(
-1, 2, 2).sum(axis=0).compute()
(compute_chunk_sizes is needed because although your original pandas dataframe had a known size, Dask did not evaluate your function yet to know what sizes it gave back)
However, given your setup, the following would work and be more similar to your original attempt, see .to_delayed()
import dask  # needed for dask.compute; the question only imported dask.dataframe
list_of_delayed = ddf.map_partitions(func).to_delayed().tolist()
tuple_of_np_lists = dask.compute(*list_of_delayed)
(tolist forces evaluating the contained delayed objects)
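The reshape-and-sum idea from this answer can be checked with plain NumPy, without Dask. This is only a sketch: the stacked array below is built by hand to mimic the (6, 2) result shown in the question, and the names block, stacked and total are illustrative:

```python
import numpy as np

block = np.array([[1, 2], [3, 4]])

# Mimic the concatenated result of three partitions: shape (6, 2)
stacked = np.concatenate([block, block, block])

# Split back into one 2x2 block per partition, then add the blocks together
total = stacked.reshape(-1, 2, 2).sum(axis=0)

print(total)
```

The reshape only recovers the per-partition arrays correctly if every partition returns an array of the same known shape, here (2, 2).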

Numpy Advanced Indexing confusion

If a is numpy array of shape (5,3), b is of shape (2,2) and c is of shape (2,2), what is the shape of a[b,c]?
Can anyone explain this to me with an example? I've read the docs, but I still can't understand how it works.
Just for the purpose of expounding the concept of advanced indexing, here is a contrived example:
# input arrays
In [22]: a
Out[22]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
In [23]: b
Out[23]:
array([[0, 1],
[2, 3]])
In [24]: c
Out[24]:
array([[0, 1],
[2, 2]])
# advanced indexing
In [25]: a[b, c]
Out[25]:
array([[ 0, 4],
[ 8, 11]])
By the expression a[b, c], we are using the arrays b and c to selectively pull out elements from the array a.
To interpret the output of a[b, c]:
# b # c # 2D indices
[[0, 1], [[0, 1] ---> (0,0) (1,1)
[2, 3]] [2, 2]] ---> (2,2) (3,2)
The 2D indices would simply be applied to the array a and the corresponding elements would be returned as array in the result of a[b, c]
a[(0,0)] --> 0
a[(1,1)] --> 4
a[(2,2)] --> 8
a[(3,2)] --> 11
The above elements are returned as a 2D array since the arrays b and c are 2D arrays themselves.
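To make the pairing of indices concrete, here is a small sketch that rebuilds a[b, c] with an explicit loop and compares it with the advanced-indexing result. The name expected is hypothetical, and a is assumed to be np.arange(15).reshape(5, 3), which matches the array printed above:

```python
import numpy as np

a = np.arange(15).reshape(5, 3)
b = np.array([[0, 1], [2, 3]])  # row indices
c = np.array([[0, 1], [2, 2]])  # column indices

result = a[b, c]

# Equivalent element-by-element construction
expected = np.empty(b.shape, dtype=a.dtype)
for i in range(b.shape[0]):
    for j in range(b.shape[1]):
        expected[i, j] = a[b[i, j], c[i, j]]

print(result.shape)  # (2, 2)
print(np.array_equal(result, expected))  # True
```

The result takes the (broadcast) shape of the index arrays, not the shape of a.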
Also, please note that advanced indexing always returns a copy.
In [27]: (a[b, c]).flags.owndata
Out[27]: True
However, an assignment operation using advanced indexing will alter the original array (in-place). But, this behaviour is also dependent on two factors:
whether your indexing operation is pure (only advanced indexing) or mixed (a combination of advanced & simple indexing)
in case of mixed indexing, the order in which they are applied.
See: Views and copies confusion with NumPy arrays when combining index operations
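A brief sketch of the copy-versus-assignment distinction for the pure advanced-indexing case follows. It again assumes a is np.arange(15).reshape(5, 3), and the name picked is illustrative:

```python
import numpy as np

a = np.arange(15).reshape(5, 3)
b = np.array([[0, 1], [2, 3]])
c = np.array([[0, 1], [2, 2]])

# Reading: the result is a copy, so changing it leaves a untouched
picked = a[b, c]
picked[0, 0] = 99
print(a[0, 0])  # 0

# Writing: assignment through the same index arrays modifies a in place
a[b, c] = -1
print(a[0, 0], a[1, 1], a[2, 2], a[3, 2])  # -1 -1 -1 -1
```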

Bitwise OR along one axis of a NumPy array

For a given NumPy array, it is easy to perform a "normal" sum along one dimension. For example:
X = np.array([[1, 0, 0], [0, 2, 2], [0, 0, 3]])
X.sum(0)
=array([1, 2, 5])
X.sum(1)
=array([1, 4, 3])
Instead, is there an "efficient" way of computing the bitwise OR along one dimension of an array similarly? Something like the following, except without requiring for-loops or nested function calls.
Example: bitwise OR along the zeroth dimension, as I am currently doing it:
np.bitwise_or(np.bitwise_or(X[:,0],X[:,1]),X[:,2])
=array([1, 2, 3])
What I would like:
X.bitwise_sum(0)
=array([1, 2, 3])
numpy.bitwise_or.reduce(X, axis=whichever_one_you_wanted)
Use the reduce method of the numpy.bitwise_or ufunc.
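A minimal sketch of the reduce call on the array from the question is shown below. The second array Y is a hypothetical addition, included only because both axes happen to give the same result for X:

```python
import numpy as np

X = np.array([[1, 0, 0], [0, 2, 2], [0, 0, 3]])

or_axis0 = np.bitwise_or.reduce(X, axis=0)  # OR the rows together
or_axis1 = np.bitwise_or.reduce(X, axis=1)  # OR the columns together
print(or_axis0)  # [1 2 3]
print(or_axis1)  # [1 2 3]

# Hypothetical second array where the two axes give different answers
Y = np.array([[1, 2], [4, 8]])
print(np.bitwise_or.reduce(Y, axis=0))  # [ 5 10]
print(np.bitwise_or.reduce(Y, axis=1))  # [ 3 12]
```

The same reduce method exists on other binary ufuncs, for example np.bitwise_and.reduce or np.logical_or.reduce.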