How to get a subarray in numpy - numpy

I have a 3D array and I want to get a sub-array of size (2n+1) along each axis, centered around an index indx. Using slices I can write
y[slice(indx[0]-n,indx[0]+n+1),slice(indx[1]-n,indx[1]+n+1),slice(indx[2]-n,indx[2]+n+1)]
which will only get uglier if I want a different size for each dimension. Is there a nicer way to do this?

You don't need to use the slice constructor unless you want to store the slice object for later use. Instead, you can simply do:
y[indx[0]-n:indx[0]+n+1, indx[1]-n:indx[1]+n+1, indx[2]-n:indx[2]+n+1]
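If you want to do this without specifying each index separately, you can build a tuple of slices with a generator expression. Recent NumPy versions no longer accept a list of slices as a multidimensional index, so the slices are wrapped in a tuple:
y[tuple(slice(i-n, i+n+1) for i in indx)]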
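For a different half-width per dimension, the same idea can be wrapped in a small helper. This is only a minimal sketch, not part of the answer above: the name centered_window and the clamping of the start index at 0 are assumptions made for illustration (without clamping, a negative start would be read as counting from the end of the axis).

```python
import numpy as np

def centered_window(arr, center, half_widths):
    """Return the sub-array spanning center[k] +/- half_widths[k] on each axis.

    Hypothetical helper: start indices are clamped at 0 so that a window
    overlapping the lower edge is truncated instead of wrapping around.
    """
    slices = tuple(
        slice(max(c - h, 0), c + h + 1)
        for c, h in zip(center, half_widths)
    )
    return arr[slices]

y = np.arange(17 * 18 * 16).reshape(17, 18, 16)
sub = centered_window(y, center=(5, 10, 8), half_widths=(4, 3, 2))
print(sub.shape)   # (9, 7, 5)

edge = centered_window(y, center=(1, 10, 8), half_widths=(4, 3, 2))
print(edge.shape)  # (6, 7, 5): truncated at the lower edge of axis 0
```

Slices that run past the upper end of an axis are truncated by NumPy automatically, so only the lower bound needs explicit handling here.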

You can create NumPy arrays for indexing into each dimension of the 3D array and then use the ix_ function to build an indexing map and thus get the sliced output. The benefit of ix_ is that it allows for broadcasted indexing maps; more information can be found in the numpy.ix_ documentation. You can then specify a different window size for each dimension for a generic solution. Here's the implementation with sample input data -
import numpy as np
A = np.random.randint(0,9,(17,18,16)) # Input array
indx = np.array([5,10,8]) # Pivot indices for each dim
N = [4,3,2] # Window sizes
# Arrays of start & stop indices
start = indx - N
stop = indx + N + 1
# Create indexing arrays for each dimension
xc = np.arange(start[0],stop[0])
yc = np.arange(start[1],stop[1])
zc = np.arange(start[2],stop[2])
# Create mesh from multiple arrays for use as indexing map
# and thus get desired sliced output
Aout = A[np.ix_(xc,yc,zc)]
Thus, for the given data with window sizes array, N = [4,3,2], the whos info shows -
In [318]: whos
Variable Type Data/Info
-------------------------------
A ndarray 17x18x16: 4896 elems, type `int32`, 19584 bytes
Aout ndarray 9x7x5: 315 elems, type `int32`, 1260 bytes
The whos info for the output, Aout, is consistent with the intended output shape, which must be 2N+1 along each dimension (here 9, 7 and 5).
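As a rough comparison with the plain-slicing answer, the sketch below builds the same window both ways. It assumes the sample data from above; the names via_ix and via_slices are made up for illustration. The values agree, but basic slicing returns a view onto A, whereas indexing through np.ix_ (advanced indexing) returns a copy.

```python
import numpy as np

A = np.random.randint(0, 9, (17, 18, 16))  # Input array
indx = np.array([5, 10, 8])                # Pivot indices for each dim
N = np.array([4, 3, 2])                    # Window sizes
start, stop = indx - N, indx + N + 1

# Advanced indexing via np.ix_ (returns a copy)
via_ix = A[np.ix_(*[np.arange(a, b) for a, b in zip(start, stop)])]
# Basic slicing (returns a view onto A)
via_slices = A[tuple(slice(a, b) for a, b in zip(start, stop))]

print(via_ix.shape, via_slices.shape)      # (9, 7, 5) (9, 7, 5)
print(np.array_equal(via_ix, via_slices))  # True
print(np.shares_memory(A, via_ix))         # False
print(np.shares_memory(A, via_slices))     # True
```

If the window is only read, the view is cheaper; if it will be modified independently of A, the copy (or an explicit .copy() on the slice) is the safer choice.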

Related

Visualize 1D numpy array as 2D array with matplotlib

I have a 2D array of all the numbers 1 to 100 split into rows of 10, and a boolean value for each number saying whether it is prime or not. I'm struggling to figure out how to visualize it like a picture I found online (image not included here).
Here is my code, to help show what I have so far.
# exercise
import numpy as np
import matplotlib.pyplot as plt
is_prime = np.ones(100, dtype=bool) # array will be filled with Trues since 1 = True
# For each integer j starting from 2, cross out its higher multiples:
N_max = int(np.sqrt(len(is_prime) - 1))
for j in range(2, N_max + 1):
    is_prime[2*j::j] = False
# split the array up into multiple sub-arrays
split_primes = np.split(is_prime, 10)
# create overlay for numbers
num_overlay = np.arange(100)
split_overlay = np.split(num_overlay, 10)
plt.plot(split_overlay)
Creating 2D array of the numbers
Check out the documentation for numpy's reshape function. Here you can turn your array into a 2D array by doing:
data = is_prime.reshape(10,10)
We can also make an array of the first 100 integers, to use for labeling, in the same way:
integers = np.arange(100).reshape(10,10)
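To illustrate how the two reshaped arrays line up, here is a small self-contained sketch. It reuses the sieve from the question, with one assumption that is not in the original code: 0 and 1 are explicitly marked as non-prime.

```python
import numpy as np

is_prime = np.ones(100, dtype=bool)
is_prime[:2] = False  # assumption: treat 0 and 1 as non-prime
for j in range(2, int(np.sqrt(len(is_prime) - 1)) + 1):
    is_prime[2 * j::j] = False

data = is_prime.reshape(10, 10)            # row r, column c <-> number 10*r + c
integers = np.arange(100).reshape(10, 10)

print(integers[2, 3], data[2, 3])  # 23 True
print(integers[data][:5])          # [ 2  3  5  7 11]
print(data.sum())                  # 25
```

Because both arrays have the same shape, the boolean array can be used directly as a mask on the integer array, which is the same indexing trick the plotting code relies on.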
Plotting the 2D array
When plotting in 2D you need to use one of the 2D functions that matplotlib provides, e.g. imshow, matshow or pcolormesh. You can either call these functions directly on your array, in which case they will use a colormap and each pixel's color will correspond to the value in the associated spot in the array, or you can explicitly build an RGB image, which gives you a bit more control over the color of each box. For this case I think the RGB approach is a bit easier, so the solution below uses it. However, if you want to annotate heatmaps, the matplotlib documentation has a great resource on that. For now we will create an array of RGB values (shape 10 by 10 by 3) and change the colors of only the prime numbers using numpy's indexing abilities.
# create RGB array that we will fill in
rgb = np.ones((10,10,3)) # start with an array of white
rgb[data] = [1,1,0] # color the places where the data is prime yellow
plt.figure(figsize=(10,10))
plt.imshow(rgb)
# add number annotations
integers = np.arange(100).reshape(10,10)
# add annotations based on: https://stackoverflow.com/questions/20998083/show-the-values-in-the-grid-using-matplotlib
for (i, j), z in np.ndenumerate(integers):
    plt.text(j, i, '{:d}'.format(z), ha='center', va='center', color='k', fontsize=15)
# remove axis and tick labels
plt.axis('off')
plt.show()
Resulting in this image:

How to check the presence of a given numpy array in a larger-shape numpy array?

I guess the title of my question might not be very clear.
I have a small array, say a = ([[0,0,0],[0,0,1],[0,1,1]]). Then I have a bigger array of a higher dimension, say b = ([[[2,2,2],[2,0,1],[2,1,1]],[[0,0,0],[3,3,1],[3,1,1]],[...]]).
I'd like to check if one of the elements of a can be found in b. In this case, I'd find that the first element of a [0,0,0] is indeed in b, and then I'd like to retrieve the corresponding index in b.
I'd like to do that avoiding looping, since from the very little I understood from numpy arrays, they are not meant to be iterated over in a classic way. In other words, I need it to be very fast, because my actual arrays are quite big.
Any idea?
Thanks a lot!
Arnaud.
I don't know of a direct way, but here's a function that works around the problem:
import numpy as np
def find_indices(val, arr):
    # First take the mean over the last axis of arr and compare it with
    # the mean of val to eliminate the majority of entries
    mb = np.mean(arr, axis=2)
    ma = np.mean(val)
    Y = np.argwhere(mb == ma)
    indices = []
    # Then run a quick loop over the remaining candidates to
    # eliminate arrays whose elements don't match in order
    for i in range(len(Y)):
        idx = (int(Y[i, 0]), int(Y[i, 1]))
        if np.array_equal(val, arr[idx]):
            indices.append(idx)
    return indices
# Sample arrays
a = np.array([[0,0,0],[0,0,1],[0,1,1]])
b = np.array([ [[6,5,4],[0,0,1],[2,3,3]],
               [[2,5,4],[6,5,4],[0,0,0]],
               [[2,0,2],[3,5,4],[5,4,6]],
               [[6,5,4],[0,0,0],[2,5,3]] ])
print(find_indices(a[0], b))
# [(1, 2), (3, 1)]
print(find_indices(a[1], b))
# [(0, 1)]
The idea is to use the mean of each array and compare this with the mean of the input. np.argwhere() is the key here. That way you remove most of the unwanted matches, but I did need to use a loop on the remainder to avoid the unsorted matches (this shouldn't be too memory-consuming). You'll probably want to customise it further, but I hope this helps.
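For comparison, a fully vectorised variant is sketched below. It is a different technique from the mean-based filter above: it compares every row of b against the target directly and reduces with all() along the last axis. The helper name find_rows is hypothetical, and the broadcasted version assumes the intermediate boolean array fits in memory.

```python
import numpy as np

a = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 1]])
b = np.array([[[6, 5, 4], [0, 0, 1], [2, 3, 3]],
              [[2, 5, 4], [6, 5, 4], [0, 0, 0]],
              [[2, 0, 2], [3, 5, 4], [5, 4, 6]],
              [[6, 5, 4], [0, 0, 0], [2, 5, 3]]])

def find_rows(val, arr):
    # Hypothetical helper: True where every element along the last axis equals val
    mask = (arr == val).all(axis=-1)
    return np.argwhere(mask)

print(find_rows(a[0], b).tolist())  # [[1, 2], [3, 1]]
print(find_rows(a[1], b).tolist())  # [[0, 1]]

# All rows of a at once: b[:, :, None, :] broadcasts against a's rows,
# so hits[i, j, k] is True where b[i, j] equals a[k]
hits = (b[:, :, None, :] == a).all(axis=-1)  # shape (4, 3, 3)
print(np.argwhere(hits).tolist())  # [[0, 1, 1], [1, 2, 0], [3, 1, 0]]
```

In the last result each triple is (i, j, k), meaning row a[k] was found at b[i, j].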

Construct NumPy matrix row by row

I'm trying to construct a 2D NumPy array from values in an extant 2D NumPy array using an iterative process. Using ordinary python lists the process I'm describing would look like so:
coords = #data from file contained in a 2D list
d = #integer
edges = []
for i in range(d+1):
    for j in range(i+1, d+1):
        edge = coords[j] - coords[i]
        edges.append(edge)
However, the NumPy array imposes restrictions that do not permit the process shown above. Below I try to do the same thing using NumPy arrays, and it should immediately be clear where the problems are:
coords = np.genfromtxt('Energies.txt', dtype=float, skip_header=1)
d = #integer
#how to initialize?
for i in range(d+1):
    for j in range(i+1, d+1):
        edge = coords[j] - coords[i]
        #how to append?
Because .append does not exist for NumPy arrays I need to rely on concatenate or stack instead. But these functions are designed to join existing arrays, and I don't have anything to concatenate or stack until after the first iteration of my loop. So I suppose I need to change my data flow, but I'm unsure how to go about this.
Any help would be greatly appreciated. Thanks in advance.
That function is numpy.meshgrid [1]; the function does it by default.
[1] https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.meshgrid.html
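For the loop in the question itself, two common patterns are sketched below; neither is taken from the answer above. The first keeps the loop but collects rows in a Python list and converts once at the end; the second uses np.triu_indices to generate all index pairs with i < j. The stand-in data (random coords, d = 3) is an assumption, since the contents of the original file are not shown.

```python
import numpy as np

# Assumption: stand-in data instead of the file from the question
rng = np.random.default_rng(0)
d = 3
coords = rng.random((d + 1, 2))

# Option 1: keep the loop, collect rows in a list, convert once at the end
rows = []
for i in range(d + 1):
    for j in range(i + 1, d + 1):
        rows.append(coords[j] - coords[i])
edges_loop = np.array(rows)

# Option 2: index pairs (i, j) with i < j, in the same order as the loop
i, j = np.triu_indices(d + 1, k=1)
edges_vec = coords[j] - coords[i]

print(edges_loop.shape)                    # (6, 2)
print(np.allclose(edges_loop, edges_vec))  # True
```

Appending to a list and calling np.array once avoids the repeated reallocation that calling np.concatenate inside the loop would cause.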

jupyter notebook prints output even when suppressed

Why is it printing the bins from the histogram?
Shouldn't the semicolon suppress it?
In [1]
import numpy as np
import matplotlib.pyplot as plt
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all";
In [2]
%matplotlib inline
data = {'first': np.random.rand(100),
        'second': np.random.rand(100)}
fig, axes = plt.subplots(2)
for idx, k in enumerate(data):
    axes[idx].hist(data[k], bins=20);
You've set InteractiveShell.ast_node_interactivity = "all", so interactivity is enabled for all AST nodes, not only the last one. So you get the values of data = {..}
Also, ; works only for the last top-level expression. axes[idx].hist(data[k], bins=20); is not a top-level expression, because it is nested in the for loop; the last top-level node is the for, which is a statement.
Simply add a final no-op statement, and end it with ;
%matplotlib inline
data = {'first': np.random.rand(100),
        'second': np.random.rand(100)};
fig, axes = plt.subplots(2);
for idx, k in enumerate(data):
    axes[idx].hist(data[k], bins=20)
pass; # or None; 0; "foo"; ...
And you won't have any outputs.
Use codetransformer %%ast magic to quickly see the ast of an expression.
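The same check can be done with the standard-library ast module, without installing anything. This is a small sketch that only parses the cell source as text (the names data and axes are never evaluated): it shows that the cell's only top-level node is a For statement and that the hist call is an expression nested inside it.

```python
import ast

# The loop from the question, as source text only
cell = """
for idx, k in enumerate(data):
    axes[idx].hist(data[k], bins=20);
"""

tree = ast.parse(cell)

# Names of the top-level nodes in the cell
top_level = [type(node).__name__ for node in tree.body]
print(top_level)              # ['For']

# The hist call is an expression statement nested in the loop body
nested = tree.body[0].body[0]
print(type(nested).__name__)  # Expr
```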
If you read the documentation, you will see exactly what it returns - a three item tuple described below. You can display it in the notebook by placing a ? at the end of the call to the histogram. It looks like your InteractiveShell is making it display. Normally, yes a semicolon would suppress the output, although inside of a loop it would be unnecessary.
Returns
n : array or list of arrays
    The values of the histogram bins. See normed and weights
    for a description of the possible semantics. If input x is an
    array, then this is an array of length nbins. If input is a
    sequence arrays [data1, data2,..], then this is a list of
    arrays with the values of the histograms for each of the arrays
    in the same order.
bins : array
    The edges of the bins. Length nbins + 1 (nbins left edges and right
    edge of last bin). Always a single array even when multiple data
    sets are passed in.
patches : list or list of lists
    Silent list of individual patches used to create the histogram
    or list of such list if multiple input datasets.

Indexing matrix in numpy using Node id

Is there a way to index a numpy matrix, built via networkx as an adjacency matrix, using node names?
(I built the networkx graph by parsing lines from a .txt file.
Each line represents an edge and is in the form SourceNode:DestNode:EdgeWeight.)
I need the matrix because I'm going to calculate the hitting probabilities of some nodes.
Regardless of how you constructed your graph, you can compute an adjacency matrix of it. The docs state that the order of the rows and columns in this matrix will be "as produced by G.nodes()" if you don't specify it.
For example,
import numpy as np
import networkx as nx
# create your graph
G = nx.DiGraph()
with open("spec.txt") as f:
    for line in f:
        # each line has the form SourceNode:DestNode:EdgeWeight
        src, dest, weight = line.strip().split(':')
        G.add_edge(src, dest, weight=float(weight))
# create adjacency matrix
# - store the node order now, in case the graph is changed.
nodelist = list(G.nodes())
# extract matrix, and convert to a dense array
A = np.asarray(nx.adjacency_matrix(G, nodelist=nodelist).todense(), dtype=float)
# normalise each row by its sum (outgoing edge weights), or whatever
B = A / A.sum(axis=1, keepdims=True)
Let us presume that your nodes are labelled alphabetically, C-G. The node ordering just follows the graph's internal dictionary, and for me the sequence was: ['C', 'E', 'D', 'G', 'F'].
If you want to look up information from the matrix, you could use a lookup like this:
ix = nodelist.index('D') # ix is 2 here
print(A[ix, :])
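If many lookups are needed, a dictionary from node name to row index avoids scanning the list on every call. The sketch below is an illustration under assumptions: it uses a small hand-written matrix and node order rather than a graph read from a file, and the names index_of and weight are hypothetical.

```python
import numpy as np

# Assumption: small hand-written example instead of a graph read from a file
nodelist = ['C', 'E', 'D', 'G', 'F']
A = np.array([[0, 1, 0, 0, 2],
              [0, 0, 3, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 0, 4],
              [2, 0, 0, 0, 0]], dtype=float)

# Map each node name to its row/column position in the matrix
index_of = {node: i for i, node in enumerate(nodelist)}

def weight(src, dest):
    # Hypothetical helper: edge weight from src to dest, looked up by name
    return A[index_of[src], index_of[dest]]

print(index_of['D'])         # 2
print(A[index_of['D'], :])   # [1. 0. 0. 1. 0.]
print(weight('E', 'D'))      # 3.0
```

The mapping is only valid for the node order the matrix was built with, so it should be created from the same nodelist that was passed to the adjacency-matrix call.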