I am trying to replicate how zip works using a simple example, and I want the output to be an array.
I have the following data:
import numpy as np
s = (2, 2)
array = np.zeros(s)
x = np.array([1, 0, 1, 0, 1, 1, 1, 1])
y = np.array([1, 0, 0, 0, 1, 0, 1, 1])
What I want is a 2x2 matrix as output, built like this:
for i, j in zip(x, y):
    array[i][j] += 1
This outputs
[[2 0]
[2 4]]
I tried to obtain the same result without using zip, but I get a (1, 1) tuple instead:
for i in range(len(x)):
    array = x[i], y[i]
This outputs: (1, 1)
for i in range(len(x)):
    array[x[i]][y[i]] += 1
This does the same as
for i, j in zip(x, y):
    array[i][j] += 1
Written out step by step, the transformation is:
for idx in range(len(x)):
    i = x[idx]
    j = y[idx]
    array[i][j] += 1
print(array)
Output:
[[2 0]
[2 4]]
import numpy as np
array = np.zeros((2, 2), dtype=int)
x = np.array([1, 0, 1, 0, 1, 1, 1, 1])
y = np.array([1, 0, 0, 0, 1, 0, 1, 1])
# empirically, what zip does
[(x[i], y[i]) for i in range(sorted([len(x), len(y)])[0])]
# check
print(list(zip(x, y)) == [(x[i], y[i]) for i in range(sorted([len(x), len(y)])[0])])
True
Why use sorted?
zip stops at the end of the shorter input, so we need to iterate over the range of the shorter length; sorted([1, 2])[0] is 1 (the smaller of the two).
So even with y = np.array([1, 0, 0, 0, 1, 0, 1, 1, 1]), this still returns the same pairs as zip(x, y).
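A minimal sketch of the same check with a y that is one element longer than x. It uses min() in place of sorted(...)[0]; that substitution is an assumption of this sketch, not part of the answer above.

```python
import numpy as np

x = np.array([1, 0, 1, 0, 1, 1, 1, 1])
y = np.array([1, 0, 0, 0, 1, 0, 1, 1, 1])  # one element longer than x

# zip stops at the shorter input, so index only up to the smaller length
n = min(len(x), len(y))
pairs = [(x[i], y[i]) for i in range(n)]

print(len(pairs))                # number of pairs equals len(x)
print(pairs == list(zip(x, y)))  # compare against the built-in zip
```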
To do it in a vectorized way, work on array.ravel() instead. For a contiguous array, ravel() returns a view, so the following call modifies array in place:
np.add.at(array.ravel(),
          np.ravel_multi_index(np.array([x, y]), s),
          np.repeat(1, len(x)))
After running it, you can check that array has changed:
>>> array
array([[2, 0],
[2, 4]])
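As an aside, np.add.at also accepts a tuple of index arrays, so the same accumulation can be sketched without ravel(). The plain fancy-indexed += is included only for contrast: it is buffered, so repeated index pairs are applied once. The names counts and buffered are made up for this sketch.

```python
import numpy as np

s = (2, 2)
x = np.array([1, 0, 1, 0, 1, 1, 1, 1])
y = np.array([1, 0, 0, 0, 1, 0, 1, 1])

# Unbuffered in-place add: every (x[i], y[i]) pair contributes, even repeats
counts = np.zeros(s, dtype=int)
np.add.at(counts, (x, y), 1)
print(counts)

# For contrast: buffered fancy-index update applies each distinct pair only once
buffered = np.zeros(s, dtype=int)
buffered[x, y] += 1
print(buffered)
```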
I want to use a 2D array which contains k-index values to quickly fill a 3D array with different mask values above/below each k-index. Only non-zero boundary indices will be used to fill.
Initialize 2D k-index array and extract valid i-j index arrays:
import numpy as np
boundary_indices = np.array([[0, 1, 2], [1, 2, 1], [0, 2, 0]])
ii, jj = np.where(boundary_indices > 0) # determine desired indices
kk = boundary_indices[ii, jj] # align boundary indices with valid indices
Yields:
boundary_indices = array([[0, 1, 2],
[1, 2, 1],
[0, 2, 0]])
ii = array([0, 0, 1, 1, 1, 2])
jj = array([1, 2, 0, 1, 2, 1])
kk = array([1, 2, 1, 2, 1, 2])
Loop through the indices and populate the output array:
output = np.zeros((3, 3, 3), dtype=np.int64)
for i, j, k in zip(ii, jj, kk):
    output[i, j, :k] = 7  # fill region above
    output[i, j, k:] = 8  # fill region below
While this does yield the correct results, it becomes quite slow once the size of the array increases significantly:
output[:, :, 0] = [[0, 7, 7],
[7, 7, 7],
[0, 7, 0]]
output[:, :, 1] = [[0, 8, 7],
[8, 7, 8],
[0, 7, 0]]
output[:, :, 2] = [[0, 8, 8],
[8, 8, 8],
[0, 8, 0]]
Is there a more efficient way to do this?
I tried output[ii, jj, kk] = 8, but that only marks the boundary in the output array, not the regions above/below.
I was hoping that there would be some fancy-indexing magic and that something like this would work:
output[ii, jj, :kk] = 7
output[ii, jj, kk:] = 8
But it raises: TypeError: only integer scalar arrays can be converted to a scalar index
For this kind of operation, Numba or Cython can be used to produce efficient code. Here is an example with Numba:
import numba as nb
import numpy as np
# `parallel=True` can be added here for large arrays
@nb.njit('int64[:,:,::1](int64[:], int64[:], int64[:])')
def compute(ii, jj, kk):
    output = np.zeros((3, 3, 3), dtype=np.int64)
    n = output.shape[2]
    # `for idx in prange(ii.size)` can be used here for large arrays
    for i, j, k in zip(ii, jj, kk):
        # `i, j, k = ii[idx], jj[idx], kk[idx]` can be used here for large arrays
        for l in range(k):  # fill region above
            output[i, j, l] = 7
        for l in range(k, n):  # fill region below
            output[i, j, l] = 8
    return output
# Either kk needs to be converted to an int64-based array with kk.astype(np.int64)
# or boundary_indices needs to be an int64-based array in the first place.
output = compute(ii, jj, kk)
Note that the Numba function can be faster if ii and jj are contiguous. However, they are surprisingly not contiguous when retrieved from np.where. Besides, I assume that kk is a 64-bit array. You can change the signature (the string in the Numba jit decorator) to support 32-bit arrays. Also note that Numba can compile the function lazily, based on the types provided at runtime, but this introduces a significant overhead during the first function call.
This code is significantly faster, especially for large arrays, thanks to Numba's just-in-time compilation. The loop can be parallelized using prange and the parallel=True decorator flag, although the current code should already be pretty good.
Finally, note that you can perform the np.where(boundary_indices > 0) test directly inside the Numba loop, on the fly, to avoid creating possibly expensive temporary arrays.
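For comparison, here is a pure-NumPy sketch of the same fill using broadcasting instead of a loop. It assumes the boundary_indices from the question and a depth of 3 along the last axis; the names depth, levels, valid and above are made up for this sketch. It has not been benchmarked against the Numba version, and it builds temporary arrays of the full output size.

```python
import numpy as np

boundary_indices = np.array([[0, 1, 2], [1, 2, 1], [0, 2, 0]])
depth = 3  # assumed size of the last axis, as in the question

levels = np.arange(depth)                      # positions 0 .. depth-1 along the last axis
valid = boundary_indices > 0                   # cells that get filled at all
above = levels < boundary_indices[:, :, None]  # True where l < k, shape (3, 3, depth)

# 7 above the boundary, 8 at and below it, 0 where the boundary index is zero
output = np.where(valid[:, :, None], np.where(above, 7, 8), 0)

print(output[:, :, 0])
print(output[:, :, 1])
print(output[:, :, 2])
```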
I have an ndarray A of shape (n, a, b).
I want a Boolean ndarray X of shape (a, b) where
X[i,j]=any(A[:, i, j] < 0)
How can I achieve this?
I would use an intermediate matrix and the sum(axis) method:
import numpy as np
np.random.seed(24)
# example matrix filled with either 0 or -1:
A = np.random.randint(2, size=(3, 2, 2)) - 1
# element-wise condition test:
X_elementwise = A < 0
# check whether the condition is fulfilled at least once along axis 0:
X = X_elementwise.sum(axis=0) >= 1
Values for A and X:
A = array([[[-1, 0],
[-1, 0]],
[[ 0, 0],
[ 0, -1]],
[[ 0, 0],
[-1, 0]]])
X = array([[ True, False],
[ True, True]])
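A shorter variant, sketched here on the A printed above: np.any with axis=0 reduces over the first axis directly, without the intermediate sum.

```python
import numpy as np

# The example A from above, shape (n, a, b) = (3, 2, 2)
A = np.array([[[-1, 0], [-1, 0]],
              [[0, 0], [0, -1]],
              [[0, 0], [-1, 0]]])

# X[i, j] is True if any A[:, i, j] is negative
X = np.any(A < 0, axis=0)
print(X)
```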
I want to compute the cumulative count of a given variable, so I expect the following code to work:
import pandas as pd
import numpy as np
df = pd.DataFrame.from_records({'x': [0, 1, 0, 1, 1]})
df2 = pd.DataFrame.from_records({'x': [0, 0, 0, 0, 0]})
result = df.groupby('x').apply(lambda x: pd.Series(np.arange(len(x)), index=x.index)).reset_index(level=0, drop=True).sort_index()
assert (result == [0, 0, 1, 1, 2]).all()
result2 = df2.groupby('x').apply(lambda x: pd.Series(np.arange(len(x)))).reset_index(level=0, drop=True).sort_index()
assert (result2 == [0, 1, 2, 3, 4]).all()
The first assert passes, but the second one does not.
Why?
This seems to be an open issue.
See BUG: inconsistent return format of Dataframe group apply function.
A workaround can be:
assert (result2.values == [0, 1, 2, 3, 4]).all()
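If the goal is only the cumulative count within each group, a sketch that avoids apply altogether is GroupBy.cumcount, which returns a Series aligned with the original index in both cases:

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1, 0, 1, 1]})
df2 = pd.DataFrame({'x': [0, 0, 0, 0, 0]})

# cumcount numbers the rows of each group 0, 1, 2, ... in original row order
result = df.groupby('x').cumcount()
result2 = df2.groupby('x').cumcount()

print(result.tolist())
print(result2.tolist())
```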
I have a numpy array A of shape (s1, ..., sm) with integer entries and a dictionary D with integers as keys and numpy arrays of shape (t,) as values. I would like to evaluate the dictionary on every entry of A to get a new array B of shape (s1, ..., sm, t).
For example
D={1:[0,1],2:[1,0]}
A=np.array([1,2,1])
The output should be
array([[0,1],[1,0],[0,1]])
Motivation: I have an array with indexes of unit vectors as entries and I need to transform it into an array with the vectors as entries.
If you can rename your keys to be 0-indexed, you can index directly into an array of your unit vectors:
>>> units = np.array([D[1], D[2]])
>>> B = units[A - 1] # -1 because 0 indexed: 1 -> 0, 2 -> 1
>>> B
array([[0, 1],
[1, 0],
[0, 1]])
And similarly for any shape (here with entries that are already 0-indexed):
>>> A = np.random.randint(0, 2, (10, 11, 12))
>>> A.shape
(10, 11, 12)
>>> B = units[A]
>>> B.shape
(10, 11, 12, 2)
You can learn more about advanced indexing in the NumPy documentation.
>>> np.asarray([D[key] for key in A])
array([[0, 1],
[1, 0],
[0, 1]])
Here's an approach using np.searchsorted to find, for each entry of A, the position of its key among the dictionary keys, and then index into the dictionary values with those positions (this assumes the keys are stored in sorted order) -
idx = np.searchsorted(list(D.keys()), A)
out = np.asarray(list(D.values()))[idx]
Sample run -
In [45]: A
Out[45]: array([1, 2, 1])
In [46]: D
Out[46]: {1: [0, 1], 2: [1, 0]}
In [47]: idx = np.searchsorted(list(D.keys()), A)
...: out = np.asarray(list(D.values()))[idx]
...:
In [48]: out
Out[48]:
array([[0, 1],
[1, 0],
[0, 1]])
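A sketch of the same idea that does not depend on the dictionary's iteration order: sort the keys explicitly, stack the values in that order, and look up an A of any shape. The 2-D A below is a made-up example, and the approach assumes every entry of A is a key of D.

```python
import numpy as np

D = {2: [1, 0], 1: [0, 1]}            # keys deliberately out of order
A = np.array([[1, 2, 1], [2, 2, 1]])  # hypothetical 2-D input

keys = np.array(sorted(D))               # sorted keys, as searchsorted requires
values = np.array([D[k] for k in keys])  # row r holds the value for keys[r]

idx = np.searchsorted(keys, A)  # position of each entry of A within keys
B = values[idx]                 # shape is A.shape + (t,)

print(B.shape)
print(B[0])
```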
How can I extract triangles from delaunay filter in mayavi?
I want to extract the triangles just like matplotlib does
import numpy as np
import matplotlib.delaunay as triang
from enthought.mayavi import mlab
x = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
z = np.zeros(9)
#matplotlib
centers, edges, triangles_index, neig = triang.delaunay(x,y)
#mayavi
vtk_source = mlab.pipeline.scalar_scatter(x, y, z, figure=False)
delaunay = mlab.pipeline.delaunay2d(vtk_source)
I want to extract the triangles from the mayavi delaunay filter to obtain the variables triangles_index and centers (just like with matplotlib).
The only thing I've found is this:
http://docs.enthought.com/mayavi/mayavi/auto/example_delaunay_graph.html
but it only gives the edges, and they are encoded differently than in matplotlib.
To get the triangles index:
poly = delaunay.outputs[0]
tindex = poly.polys.data.to_array().reshape(-1, 4)[:, 1:]
poly is a PolyData object, poly.polys is a CellArray object that stores the index information.
For detail about CellArray: http://www.vtk.org/doc/nightly/html/classvtkCellArray.html
To get the center of every circumcircle, you need to loop over every triangle and calculate the center:
centers = []
for i in range(poly.number_of_cells):
    cell = poly.get_cell(i)
    points = cell.points.to_array()[:, :-1].tolist()
    center = [0, 0]
    points.append(center)
    cell.circumcircle(*points)
    centers.append(center)
centers = np.array(centers)
cell.circumcircle() is a static function, so you need to pass all the points of the triangle as arguments; the center is returned by modifying the fourth argument in place.
Here is the full code:
import numpy as np
from enthought.mayavi import mlab
x = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
z = np.zeros(9)
vtk_source = mlab.pipeline.scalar_scatter(x, y, z, figure=False)
delaunay = mlab.pipeline.delaunay2d(vtk_source)
poly = delaunay.outputs[0]
tindex = poly.polys.data.to_array().reshape(-1, 4)[:, 1:]
centers = []
for i in range(poly.number_of_cells):
    cell = poly.get_cell(i)
    points = cell.points.to_array()[:, :-1].tolist()
    center = [0, 0]
    points.append(center)
    cell.circumcircle(*points)
    centers.append(center)
centers = np.array(centers)
print(centers)
print(tindex)
The output is:
[[ 1.5 0.5]
[ 1.5 0.5]
[ 0.5 1.5]
[ 0.5 0.5]
[ 0.5 0.5]
[ 0.5 1.5]
[ 1.5 1.5]
[ 1.5 1.5]]
[[5 4 2]
[4 1 2]
[7 6 4]
[4 3 1]
[3 0 1]
[6 3 4]
[8 7 4]
[8 4 5]]
The result may not be the same as matplotlib.delaunay, because the Delaunay triangulation of these grid points is not unique: each unit square can be split along either diagonal.
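If the per-cell loop becomes a bottleneck, the circumcenters can also be computed directly with NumPy from the point coordinates and the triangle index array. This is a sketch using the standard closed-form circumcenter of a 2D triangle, not a mayavi API; tindex is copied from the output above, and the sketch assumes no triangle is degenerate.

```python
import numpy as np

x = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=float)
# Triangle indices copied from the printed tindex above
tindex = np.array([[5, 4, 2], [4, 1, 2], [7, 6, 4], [4, 3, 1],
                   [3, 0, 1], [6, 3, 4], [8, 7, 4], [8, 4, 5]])

# Coordinates of the three corners of every triangle, each of shape (n_triangles,)
ax, bx, cx = x[tindex].T
ay, by, cy = y[tindex].T

# Squared distances of the corners from the origin
a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2

# Closed-form circumcenter; d is zero only for degenerate (collinear) triangles
d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d

centers = np.column_stack([ux, uy])
print(centers)
```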