Opposite of numpy.delete - numpy

I have a numpy array which looks like: [3,65,7,83,2,4] and I want to keep indices [1,3,5]. Which would give me [65, 83, 4]. Is there a way to do this in Numpy?
This would essentially be the opposite of the numpy.delete function.

Use fancy indexing:
>>> a = np.array([3,65,7,83,2,4])
>>> a[[1, 3, 5]]
array([65, 83, 4])
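A minimal sketch of why this is the complement of numpy.delete, assuming the array from the question. The names keep, drop, by_index, by_take and by_delete are illustrative only. np.take is the function form of the same selection, and deleting the complementary indices yields the same values.

```python
import numpy as np

a = np.array([3, 65, 7, 83, 2, 4])
keep = [1, 3, 5]  # indices to keep (taken from the question)

by_index = a[keep]          # fancy (integer-array) indexing
by_take = np.take(a, keep)  # function form of the same selection

# np.delete removes the listed indices, so deleting every index
# that is NOT in `keep` leaves the same elements behind.
drop = np.setdiff1d(np.arange(a.size), keep)
by_delete = np.delete(a, drop)

print(by_index)
print(by_take)
print(by_delete)
```

Plain indexing is usually the simplest choice; the np.delete route is shown only to make the "opposite" relationship explicit.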

Related

How do I print specific values from a multidimensional array with numpy?

I have a multidimensional np.array like: [[2, 55, 62], [3, 56,63], [4, 57, 64], ...].
I want to print only the rows whose value in the first column is greater than 2, giving output like: [[3, 56, 63], [4, 57, 64], ...]
How can I get it?
All you need to do is to select just the values you want to print.
Short answer:
import numpy as np
a = np.array([[1,2,3],[3,2,1]])
print(a[a>2])
What's going on?
Well, first, a>2 returns a boolean mask telling whether the condition is met at each position of the array. This is a numpy array with exactly the same shape as a, but with dtype=bool.
Then, this mask is used to select only the values where the mask is True, which are exactly those that meet your condition.
Finally, you just print them.
Step by step, you can write as follows:
import numpy as np
a = np.array([[1,2,3],[3,2,1]])
print(a.shape) # output is (2, 3)
mask = a > 2
print(mask.shape) # output is (2, 3)
print(mask.dtype) # output is bool
print(mask) # here you can see True only for those positions where condition is met
print(a[mask])
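Note that a[a>2] returns a flat 1-D array of the matching elements. If the goal is to keep whole rows based on the first column only, as the question describes, one possible sketch is to build the mask from that column alone. The names row_mask and rows are illustrative, and the data is the first three rows given in the question.

```python
import numpy as np

a = np.array([[2, 55, 62], [3, 56, 63], [4, 57, 64]])

# 1-D boolean mask with one entry per row, built from the first column only
row_mask = a[:, 0] > 2

# Indexing the first axis with a 1-D mask keeps entire rows
rows = a[row_mask]
print(rows)
```

Because the mask has one entry per row rather than one per element, the result stays 2-D.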

How to return one NumPy array per partition in Dask?

I need to compute many NumPy arrays (that can be up to 4-dimensional), one for each partition of a Dask dataframe, and then add them as arrays. However, I'm struggling to make map_partitions return an array for each partition instead of a single array for all of them.
import dask.dataframe as dd
import numpy as np, pandas as pd
df = pd.DataFrame(range(15), columns=['x'])
ddf = dd.from_pandas(df, npartitions=3)
def func(partition):
    # Here I also tried returning the array in a list and in a tuple
    return np.array([[1, 2], [3, 4]])
# Here I tried all the options available for 'meta'
results = ddf.map_partitions(func).compute()
Then results is:
array([[1, 2],
[3, 4],
[1, 2],
[3, 4],
[1, 2],
[3, 4]])
And if, instead, I do results.sum().compute() I get 30.
What I'd like to get is:
[np.array([[1, 2],[3, 4]]), np.array([[1, 2],[3, 4]]), np.array([[1, 2],[3, 4]])]
So that if I compute the sum, I get:
array([[ 3, 6],
[ 9, 12]])
How can you achieve this result with Dask?
I managed to make it work like this, but I don't know if this is the best way:
from dask import delayed
results = []
for partition in ddf.partitions:
    result = delayed(func)(partition)
    results.append(result)
delayed(sum)(results).compute()
The result of the computation is:
array([[ 3, 6],
[ 9, 12]])
You are right: a Dask array is usually meant to be viewed as a single logical array that just happens to be made of pieces. Since you are not using the logical layer, you could have done your work with delayed alone. On the other hand, it seems the end result you want really is a sum over all the data, so maybe an appropriate reshape followed by sum(axis=) would be even simpler:
ddf.map_partitions(func).compute_chunk_sizes().reshape(
-1, 2, 2).sum(axis=0).compute()
(compute_chunk_sizes is needed because although your original pandas dataframe had a known size, Dask did not evaluate your function yet to know what sizes it gave back)
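To see what the reshape and sum are doing, here is a NumPy-only sketch that leaves Dask out entirely. It assumes, as in the question, that each partition yields the same 2x2 block and that the blocks end up stacked along the first axis. The names block, stacked, per_partition and total are illustrative.

```python
import numpy as np

block = np.array([[1, 2], [3, 4]])

# Three partitions stacked along axis 0, matching the (6, 2) result in the question
stacked = np.concatenate([block, block, block])

# Regroup the rows into one 2x2 array per partition: shape (3, 2, 2)
per_partition = stacked.reshape(-1, 2, 2)

# Summing over the new leading axis adds the per-partition arrays elementwise
total = per_partition.sum(axis=0)
print(total)
```

This only works when every partition returns an array of the same known shape, since the reshape relies on that shape.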
However, given your setup, the following would work and be more similar to your original attempt; see .to_delayed():
import dask
list_of_delayed = ddf.map_partitions(func).to_delayed().tolist()
tuple_of_np_lists = dask.compute(*list_of_delayed)
(tolist forces evaluating the contained delayed objects)

Bitwise OR along one axis of a NumPy array

For a given NumPy array, it is easy to perform a "normal" sum along one dimension. For example:
X = np.array([[1, 0, 0], [0, 2, 2], [0, 0, 3]])
X.sum(0)
=array([1, 2, 5])
X.sum(1)
=array([1, 4, 3])
Instead, is there an "efficient" way of computing the bitwise OR along one dimension of an array similarly? Something like the following, except without requiring for-loops or nested function calls.
Example: bitwise OR along the zeroth dimension as I am currently doing it:
np.bitwise_or(np.bitwise_or(X[:,0],X[:,1]),X[:,2])
=array([1, 2, 3])
What I would like:
X.bitwise_sum(0)
=array([1, 2, 3])
numpy.bitwise_or.reduce(X, axis=whichever_one_you_wanted)
Use the reduce method of the numpy.bitwise_or ufunc.
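A short sketch of the reduce call on the array from the question. For that particular X both axes happen to give the same answer, so a second array Y (made up here for illustration) is included to show that the axis argument matters.

```python
import numpy as np

X = np.array([[1, 0, 0], [0, 2, 2], [0, 0, 3]])

# OR all rows together, column by column
or_axis0 = np.bitwise_or.reduce(X, axis=0)
# OR all columns together, row by row
or_axis1 = np.bitwise_or.reduce(X, axis=1)

# Illustrative array where the two axes give different results
Y = np.array([[1, 4], [2, 4], [8, 1]])
y_axis0 = np.bitwise_or.reduce(Y, axis=0)  # 1|2|8 and 4|4|1
y_axis1 = np.bitwise_or.reduce(Y, axis=1)  # 1|4, 2|4, 8|1

print(or_axis0, or_axis1)
print(y_axis0, y_axis1)
```

The same reduce method exists on other binary ufuncs, for example np.bitwise_and and np.bitwise_xor.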

I am trying to array index a 4 dimensional numpy array.

I have a 4-dimensional array a, say with shape (40, 40, 4, 1000).
I also have an index array, say b = np.arange(35).
I would like to build an array with something like c = a[b,b,3,999], where the result would have shape (35, 35). I would appreciate any thoughts on the right way to do this. Thank you. Neela.
Since b=np.arange(35) is just the first 35 indices, use slices instead:
c = a[:35,:35,3,999]
If the values in b are not contiguous, then you will need to adjust its shape
c = a[b[:,None], b[None,:], 3, 999]
e.g.
In [754]: a=np.arange(3*4*5).reshape(3,4,5)
In [755]: b=np.array([2,0,1])
In [756]: a[b[:,None],b[None,:],3]
Out[756]:
array([[53, 43, 48],
[13, 3, 8],
[33, 23, 28]])
b[:,None] is a (3,1) array, b[None,:] a (1,3), together they broadcast to (3,3) arrays.
You may need to read up on broadcasting and advanced indexing.
More explicitly this indexing is:
a[[[2],[0],[1]], [[2,0,1]], 3]
np.ix_ is a handy tool for generating indexes like this:
In [795]: I,J = np.ix_(b,b)
In [796]: I
Out[796]:
array([[2],
[0],
[1]])
In [797]: J
Out[797]: array([[2, 0, 1]])
In [798]: a[I,J,3]
Out[798]:
array([[53, 43, 48],
[13, 3, 8],
[33, 23, 28]])
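The same idea carries over to the 4-dimensional case in the question. The sketch below uses a smaller array (shape (6, 6, 4, 10)) and made-up index values so that it runs quickly; the names c, diag and expected are illustrative. It also shows what the naive a[b, b, 3, 9] returns, for contrast.

```python
import numpy as np

a = np.arange(6 * 6 * 4 * 10).reshape(6, 6, 4, 10)
b = np.array([4, 0, 2])  # non-contiguous indices, chosen for illustration

# Open-mesh index arrays of shapes (3, 1) and (1, 3)
I, J = np.ix_(b, b)
c = a[I, J, 3, 9]  # broadcasts to a (3, 3) result

# Passing b twice without reshaping pairs the indices up elementwise,
# which selects only the "diagonal" entries a[b[k], b[k], 3, 9]
diag = a[b, b, 3, 9]

# Reference result built with explicit loops
expected = np.array([[a[i, j, 3, 9] for j in b] for i in b])

print(c.shape, diag.shape)
```

Here c matches the loop-built reference, while diag has shape (3,) and equals the diagonal of c.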

Add single element to array as first entry in numpy

How to achieve this?
I have a numpy array containing:
[1, 2, 3]
I want to create an array containing:
[8, 1, 2, 3]
That is, I want to add an element on as the first element of the array.
Ref: Add single element to array in numpy
The most basic operation is concatenate:
x=np.array([1,2,3])
np.concatenate([[8],x])
# array([8, 1, 2, 3])
np.r_ and np.insert make use of this. Even if they are more convenient to remember, or use in more complex cases, you should be familiar with concatenate.
Use numpy.insert(). The docs are here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.insert.html#numpy.insert
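A minimal sketch of the numpy.insert call for this case, assuming the array from the question. The second argument is the index before which the value is inserted; the names x and y are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])

# Insert the value 8 before index 0; np.insert returns a new array
y = np.insert(x, 0, 8)

print(y)
print(x)  # the original array is left unchanged
```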
You can also use numpy's np.r_, a short-cut for concatenation along the first axis:
>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> b = np.r_[8, a]
>>> b
array([8, 1, 2, 3])