Construct NumPy matrix row by row - numpy

I'm trying to construct a 2D NumPy array from values in an extant 2D NumPy array using an iterative process. Using ordinary python lists the process I'm describing would look like so:
coords = #data from file contained in a 2D list
d = #integer
edges = []
for i in range(d+1):
for j in range(i+1, d+1):
edge = coords[j] - coords[i]
edges.append(edge)
However, the NumPy array imposes restrictions that do not permit the process shown above. Below I try to do the same thing using NumPy arrays, and it should immediately be clear where the problems are:
coords = np.genfromtxt('Energies.txt', dtype=float, skip_header=1)
d = #integer
#how to initialize?
for i in range(d+1):
for j in range(i+1, d+1):
edge = coords[j] - coords[i]
#how to append?
Because .append does not exist for NumPy arrays I need to rely on concatenate or stack instead. But these functions are designed to join existing arrays, and I don't have anything to concatenate or stack until after the first iteration of my loop. So I suppose I need to change my data flow, but I'm unsure how to go about this.
Any help would be greatly appreciated. Thanks in advance.

that function is numpy.meshgrid [1] , the function does it by default.
[1] https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.meshgrid.html

Related

A more efficient way of creating an NxM array in Python

In Python, I need to create an NxM matrix in which the ij entry has value of i^2 + j^2.
I'm currently constructing it using two for loops, but the array is quite big and the computation time is long and I need to perform it several times. Is there a more efficient way of constructing such matrix using maybe Numpy ?
You can use broadcasting in numpy. You may refer to the official documentation. For example,
import numpy as np
N = 3; M = 4 #whatever values you'd like
a = (np.arange(N)**2).reshape((-1,1)) #make it to column vector
b = np.arange(M)**2
print(a+b) #broadcasting applied
Instead of using np.arange(), you can use np.array([...some array...]) for customizing it.

Numpy iterating over rows

I kind of have the misconception that for loops should be avoided in Numpy for speed reasons, for example
import numpy
a = numpy.array([[2,0,1,3],[0,2,3,1]])
targets = numpy.array([[1,1,1,1,1,1,1]])
output = numpy.zeros((2,1))
for i in range(2):
output[i] = numpy.mean(targets[a[i]])
Is this a good way to get the mean on selected positions of each row? Feels like there might be ways to slice the array first then apply mean directly.
I think you are looking for this:
targets[a].mean(1)
Note that in your example, targets need to be 1-D and not 2-D. Otherwise, your loop throws out of bound index as it interprets the index for row index and not the column index.
numpy actually interprets this for you: targets[a] works "row-wise" and subsequently using np.mean(targets[a], axis=1) as suggested by #hpaulj in the comments does exactly what you want:
import numpy
a = numpy.array([[2,0,1,3],[0,2,3,1]])
targets = numpy.arange(1,6) # To make the results differ
output = numpy.mean(targets[a], axis=1) # the i-th row of targets[a] is targets[a[i]]

How to check the presence of a given numpy array in a larger-shape numpy array?

I guess the title of my question might not be very clear..
I have a small array, say a = ([[0,0,0],[0,0,1],[0,1,1]]). Then I have a bigger array of a higher dimension, say b = ([[[2,2,2],[2,0,1],[2,1,1]],[[0,0,0],[3,3,1],[3,1,1]],[...]]).
I'd like to check if one of the elements of a can be found in b. In this case, I'd find that the first element of a [0,0,0] is indeed in b, and then I'd like to retrieve the corresponding index in b.
I'd like to do that avoiding looping, since from the very little I understood from numpy arrays, they are not meant to be iterated over in a classic way. In other words, I need it to be very fast, because my actual arrays are quite big.
Any idea?
Thanks a lot!
Arnaud.
I don't know of a direct way, but I here's a function that works around the problem:
import numpy as np
def find_indices(val, arr):
# first take a mean at the lowest level of each array,
# then compare these to eliminate the majority of entries
mb = np.mean(arr, axis=2); ma = np.mean(val)
Y = np.argwhere(mb==ma)
indices = []
# Then run a quick loop on the remaining elements to
# eliminate arrays that don't match the order
for i in range(len(Y)):
idx = (Y[i,0],Y[i,1])
if np.array_equal(val, arr[idx]):
indices.append(idx)
return indices
# Sample arrays
a = np.array([[0,0,0],[0,0,1],[0,1,1]])
b = np.array([ [[6,5,4],[0,0,1],[2,3,3]], \
[[2,5,4],[6,5,4],[0,0,0]], \
[[2,0,2],[3,5,4],[5,4,6]], \
[[6,5,4],[0,0,0],[2,5,3]] ])
print(find_indices(a[0], b))
# [(1, 2), (3, 1)]
print(find_indices(a[1], b))
# [(0, 1)]
The idea is to use the mean of each array and compare this with the mean of the input. np.argwhere() is the key here. That way you remove most of the unwanted matches, but I did need to use a loop on the remainder to avoid the unsorted matches (this shouldn't be too memory-consuming). You'll probably want to customise it further, but I hope this helps.

Numpy: Search for an array A with same pattern in a larger array B

I have two 1D numpy array A(small) and B(large)
A=np.array([6,7,8,9,10])
B=np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,10])
I want to check if we have elements of the array A in the same order being detected in the array B.
Get the index value of array B from where the we detect the starting of array A
Index Value returned = 6
Do we have any inbuilt numpy function to perform such an operation?
I have also encountered this problem sometimes.I think the fastest way especially for big numpy arrays would be to convert them to strings and then do it.
Here is the code I use:
b=np.array([6,7,8,9,10])
a=np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,10])
a.tostring().index(b.tostring())//a.itemsize
I found a nice solution.
Given by #EdSmith in Finding Patterns in a Numpy Array
In short this is the process
Short the length of array being searched for.(My example A)
Check through entire length of the array being searched in(My example B), using np.where and np.all
This is not my code but the code that can be found in the about link, Simple and easy. I'll just alter it a bit to fit my example above Hope it helps someone :)
Thanks to #EdSmith
import numpy as np
A=np.array([6,7,8,9,10])
B=np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,10])
N = len(A)
possibles = np.where(B == A[0])[0]
solns = []
for p in possibles:
check = B[p:p+N]
if np.all(check == A):
solns.append(p)
print(solns)
Ouput
[6]
Try this:
import numpy as np
A=np.array([6,7,8,9,10])
B=np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,10])
r = np.ones_like(B)
for x in range(len(A)):r*=np.roll((B==A[x]),-x)
#first index, answer: /6/
print(np.where(r)[0][0])

How to get a subarray in numpy

I have an 3d array and I want to get a sub-array of size (2n+1) centered around an index indx. Using slices I can use
y[slice(indx[0]-n,indx[0]+n+1),slice(indx[1]-n,indx[1]+n+1),slice(indx[2]-n,indx[2]+n+1)]
which will only get uglier if I want a different size for each dimension. Is there a nicer way to do this.
You don't need to use the slice constructor unless you want to store the slice object for later use. Instead, you can simply do:
y[indx[0]-n:indx[0]+n+1, indx[1]-n:indx[1]+n+1, indx[2]-n:indx[2]+n+1]
If you want to do this without specifying each index separately, you can use list comprehensions:
y[[slice(i-n, i+n+1) for i in indx]]
You can create numpy arrays for indexing into different dimensions of the 3D array and then use use ix_ function to create indexing map and thus get the sliced output. The benefit with ix_ is that it allows for broadcasted indexing maps. More info on this could be found here. Then, you can specify different window sizes for each dimension for a generic solution. Here's the implementation with sample input data -
import numpy as np
A = np.random.randint(0,9,(17,18,16)) # Input array
indx = np.array([5,10,8]) # Pivot indices for each dim
N = [4,3,2] # Window sizes
# Arrays of start & stop indices
start = indx - N
stop = indx + N + 1
# Create indexing arrays for each dimension
xc = np.arange(start[0],stop[0])
yc = np.arange(start[1],stop[1])
zc = np.arange(start[2],stop[2])
# Create mesh from multiple arrays for use as indexing map
# and thus get desired sliced output
Aout = A[np.ix_(xc,yc,zc)]
Thus, for the given data with window sizes array, N = [4,3,2], the whos info shows -
In [318]: whos
Variable Type Data/Info
-------------------------------
A ndarray 17x18x16: 4896 elems, type `int32`, 19584 bytes
Aout ndarray 9x7x5: 315 elems, type `int32`, 1260 bytes
The whos info for the output, Aout seems to be coherent with the intended output shape which must be 2N+1.