numpy / pandas array comparison with multiple values in other array - pandas

I have an array
a = np.arange(0, 100)
and another array with some cut-off points
b = np.array([5, 8, 15, 35, 76])
I want to create an array such that
c = [0, 0, 0, 0, 1, 1, 1, 2, 2, ..., 4, 4, 5]
Is there an elegant / fast way to do this? Possible in Pandas?

Here's a compact way -
(a[:,None]>=b).sum(1)
Another with cumsum -
p = np.zeros(len(a),dtype=int)
p[b] = 1
out = p.cumsum()
Another with searchsorted -
np.searchsorted(b,a,'right')
Another with repeat -
np.repeat(range(len(b)+1),np.ediff1d(b,to_begin=b[0],to_end=len(a)-b[-1]))
Another with isin and cumsum -
np.isin(a,b).cumsum()
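As a quick sanity check, the five approaches above can be compared on the arrays from the question. This is only a sketch, and the variable names are made up for illustration. The cumsum, repeat and isin variants rely on a being consecutive integers starting at 0, as in the question. The searchsorted variant only needs b to be sorted.

```python
import numpy as np

a = np.arange(0, 100)
b = np.array([5, 8, 15, 35, 76])

# Broadcast comparison: count how many cut-offs each element has reached.
by_broadcast = (a[:, None] >= b).sum(1)

# Mark the cut-off positions, then accumulate.
p = np.zeros(len(a), dtype=int)
p[b] = 1
by_cumsum = p.cumsum()

# Binary search for the insertion point of each element of a in b.
by_searchsorted = np.searchsorted(b, a, 'right')

# Repeat each bin label by the width of its bin.
widths = np.ediff1d(b, to_begin=b[0], to_end=len(a) - b[-1])
by_repeat = np.repeat(np.arange(len(b) + 1), widths)

# Flag the elements that are cut-offs, then accumulate.
by_isin = np.isin(a, b).cumsum()

all_agree = all(
    np.array_equal(by_searchsorted, other)
    for other in (by_broadcast, by_cumsum, by_repeat, by_isin)
)
print(all_agree)
```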

Here is one way, using pd.cut:
pd.cut(a,[-np.inf]+b.tolist()+[np.inf]).codes
Out[383]:
array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5], dtype=int8)
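Note that in the output above the value 5 gets code 0, because pd.cut uses right-closed bins by default, so a value equal to a cut-off falls into the lower bin. A minimal sketch, assuming the same a and b as in the question, of how passing right=False makes the codes line up with the searchsorted result (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

a = np.arange(0, 100)
b = np.array([5, 8, 15, 35, 76])
bins = [-np.inf] + b.tolist() + [np.inf]

# Default: bins are (lo, hi], so a == 5 lands in the first bin.
codes_right_closed = pd.cut(a, bins).codes

# right=False: bins are [lo, hi), so a == 5 starts the second bin.
codes_left_closed = pd.cut(a, bins, right=False).codes

print(codes_right_closed[5], codes_left_closed[5])
```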

Related

Numpy array value change via two index sets

I am trying to achieve the following:
# Before
raw = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Set values to 10
indice_set1 = np.array([0, 2, 4])
indice_set2 = np.array([0, 1])
raw[indice_set1][indice_set2] = 10
# Result
print(raw)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
But the raw values remain exactly the same.
Expecting this:
# After
raw = np.array([10, 1, 10, 3, 4, 5, 6, 7, 8, 9])
raw[indice_set1] uses integer-array (fancy) indexing, which returns a new array (a copy), so the second indexing step assigns into that temporary copy, not into raw.
Instead, index the indexer:
raw[indice_set1[indice_set2]] = 10
Modified raw:
array([10, 1, 10, 3, 4, 5, 6, 7, 8, 9])
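The difference can be checked directly. The following is a small sketch using the arrays from the question; np.shares_memory is used only to illustrate that the intermediate result does not alias raw, and the extra variable names are made up for the example.

```python
import numpy as np

indice_set1 = np.array([0, 2, 4])
indice_set2 = np.array([0, 1])

raw = np.arange(10)

# Integer-array indexing produces a copy that shares no memory with raw.
is_copy = not np.shares_memory(raw, raw[indice_set1])

# Chained indexing: the assignment goes into the temporary copy and is lost.
raw[indice_set1][indice_set2] = 10
after_chained = raw.copy()

# Composing the index arrays first assigns into raw itself.
raw[indice_set1[indice_set2]] = 10
print(raw)
```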

Creating NxN matrix where each cell has the value of the nxn matrix it represents inside the large NxN matrix

Given NxN dimensions, I'm trying to create a function that returns a list of values that represent cells from the NxN matrix. For example:
a_3x3 = [ # 3x3 pixel window
[3,3,3],
[3,1,3],
[3,3,3]
]
a_3x3_lis = [3, 3, 3, 3, 1, 3, 3, 3, 3] # same window flattened
a_5x5 = [ # 5x5 pixel window
[5,5,5,5,5],
[5,3,3,3,5],
[5,3,1,3,5],
[5,3,3,3,5],
[5,5,5,5,5]
]
a_5x5_lis = [5, 5, 5, 5, 5, 5, 3, 3, 3, 5, 5, 3, 1, 3, 5, 5, 3, 3, 3, 5, 5, 5, 5, 5, 5] # same window flattened
I've just created the lists manually so far, but that's no good for large matrices:
near_win_3x3 = [3, 3, 3, 3, 1, 3, 3, 3, 3]
near_win_5x5 = [5, 5, 5, 5, 5, 5, 3, 3, 3, 5, 5, 3, 1, 3, 5, 5, 3, 3, 3, 5, 5, 5, 5, 5, 5]
near_win_7x7 = [7, 7, 7, 7, 7, 7, 7, 7, 5, 5, 5, 5, 5, 7, 7, 5, 3, 3, 3, 5, 7, 7, 5, 3, 1, 3, 5, 7, 7, 5, 3, 3, 3, 5, 7, 7, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 7, 7, 7,]
One way using numpy.minimum:
def reversed_pyramid(n):
    a = np.arange(n)
    m = np.minimum(a, a[::-1])
    return n - np.minimum.outer(m, m) * 2
Output:
# reversed_pyramid(7)
array([[7, 7, 7, 7, 7, 7, 7],
       [7, 5, 5, 5, 5, 5, 7],
       [7, 5, 3, 3, 3, 5, 7],
       [7, 5, 3, 1, 3, 5, 7],
       [7, 5, 3, 3, 3, 5, 7],
       [7, 5, 5, 5, 5, 5, 7],
       [7, 7, 7, 7, 7, 7, 7]])
The values in your array are a function of their Chebyshev distance to the center, i.e. the larger of the row offset and the column offset. To be specific, it's: f(d) = 1 + 2 * d.
def make_window(N):
    return 1 + 2 * abs(np.stack(np.mgrid[:N, :N]) - (N - 1) // 2).max(0)
N=7 produces:
[[7 7 7 7 7 7 7]
 [7 5 5 5 5 5 7]
 [7 5 3 3 3 5 7]
 [7 5 3 1 3 5 7]
 [7 5 3 3 3 5 7]
 [7 5 5 5 5 5 7]
 [7 7 7 7 7 7 7]]
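To get the flat list the question asks for, either result can be flattened with ravel().tolist(). The sketch below repeats reversed_pyramid and restates the second approach with the distance spelled out (the larger of the row and column offsets from the center). The name window_from_distance is made up for this example, and the two functions are only expected to agree for odd sizes, which is what the question uses.

```python
import numpy as np

def reversed_pyramid(n):
    a = np.arange(n)
    m = np.minimum(a, a[::-1])             # distance to the nearest edge
    return n - np.minimum.outer(m, m) * 2

def window_from_distance(n):
    rows, cols = np.indices((n, n))
    center = (n - 1) // 2
    # Ring number: the larger of the row and column offsets from the center.
    d = np.maximum(np.abs(rows - center), np.abs(cols - center))
    return 1 + 2 * d

near_win_3x3 = reversed_pyramid(3).ravel().tolist()
near_win_5x5 = window_from_distance(5).ravel().tolist()
print(near_win_3x3)
```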

Creating empty pandas dataframe with Multi-Index

I'm trying to create an empty pandas.DataFrame with a Multi-Index that I can later fill columnwise with my data. I've looked at other answers (here and here), but they all work with data that does not fill in columnwise, or that is somehow connected in the different columns.
The information I want to be contained in the Multi-Index looks like this:
GCM_list = ['BCC-CSM2-MR', 'CAMS-CSM1-0', 'CESM2', 'CESM2-WACCM', 'CMCC-CM2-SR5', 'EC-Earth3', 'EC-Earth3-Veg', 'FGOALS-f3-L', 'GFDL-ESM4', 'INM-CM4-8', 'INM-CM5-0', 'MPI-ESM1-2-HR', 'MRI-ESM2-0', 'NorESM2-MM', 'TaiESM1']
SSP_list = ['SSP_126', 'SSP_245', 'SSP_370', 'SSP_585']
index_years = [2030, 2040, 2050, 2060, 2070, 2080, 2090, 2100]
And I want it to look somewhat like this (for the three first items in GCM_list):
      BCC-CSM2-MR                         CAMS-CSM1-0                         CESM2
      SSP_126  SSP_245  SSP_370  SSP_585  SSP_126  SSP_245  SSP_370  SSP_585  SSP_126  SSP_245  SSP_370  SSP_585
2030     |        |
2040     |        |
2050     V        V
2060     1        2
2070
2080
2090
2100
The "arrows" in the first two columns should represent how and in what order I want to fill the DataFrame after the Index is created - if that's important for this question.
I've tried building the index like this, but I'm not sure what to make of the result. How should I proceed? Is there a way to build this empty dataframe so that I can fill it column after column?
arrays = [GCM_list, SSP_list]
index = pd.MultiIndex.from_arrays(arrays, names=('GCM', 'SSP'))
>>> index
MultiIndex(levels=[[u'BCC-CSM2-MR', u'CAMS-CSM1-0', u'CESM2', u'CESM2-WACCM', u'CMCC-CM2-SR5', u'EC-Earth3', u'EC-Earth3-Veg', u'FGOALS-f3-L', u'GFDL-ESM4', u'INM-CM4-8', u'INM-CM5-0', u'MPI-ESM1-2-HR', u'MRI-ESM2-0', u'NorESM2-MM', u'TaiESM1'], [u'SSP_126', u'SSP_245', u'SSP_370', u'SSP_585']],
labels=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14], [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]],
names=[u'GCM', u'SSP'])
Use MultiIndex.from_product:
arrays = [GCM_list, SSP_list]
mux = pd.MultiIndex.from_product(arrays, names=('GCM', 'SSP'))
df = pd.DataFrame(columns=mux, index=index_years)
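Once the empty frame exists, each column can be addressed by its (GCM, SSP) tuple and filled one after another. The following is a minimal sketch of that step: the GCM list is shortened to three entries and the data is a placeholder (a constant per column), both assumptions made purely for illustration. Passing dtype=float is also a choice made here so the empty frame starts out as NaN floats.

```python
import numpy as np
import pandas as pd

GCM_list = ['BCC-CSM2-MR', 'CAMS-CSM1-0', 'CESM2']  # shortened for the example
SSP_list = ['SSP_126', 'SSP_245', 'SSP_370', 'SSP_585']
index_years = [2030, 2040, 2050, 2060, 2070, 2080, 2090, 2100]

mux = pd.MultiIndex.from_product([GCM_list, SSP_list], names=('GCM', 'SSP'))
df = pd.DataFrame(columns=mux, index=index_years, dtype=float)
empty_shape = df.shape          # (8, 12), all NaN at this point

# Fill column after column; each column label is a (GCM, SSP) tuple.
for k, (gcm, ssp) in enumerate(df.columns):
    values = np.full(len(index_years), float(k))  # placeholder data
    df[(gcm, ssp)] = values

print(df['BCC-CSM2-MR'])
```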

Select elements of a numpy array based on the elements of a second array

Consider a numpy array A of shape (7,6)
A = array([[0, 1, 2, 3, 5, 8],
[4, 100, 6, 7, 8, 7],
[8, 9, 10, 11, 5, 4],
[12, 13, 14, 15, 1, 2],
[1, 3, 5, 6, 4, 8],
[12, 23, 12, 24, 4, 3],
[1, 3, 5, 7, 89, 0]])
together with a second numpy array r of the same shape which contains the radius of A starting from a central point A(3,2)=0:
r = array([[3, 3, 3, 3, 3, 4],
[2, 2, 2, 2, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 1, 0, 1, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 2, 2, 2, 2, 3],
[3, 3, 3, 3, 3, 4]])
I would like to pick up all the elements of A which are located at the position 1 of r, i.e. [9,10,11,15,4,6,5,13], all the elements of A located at position 2 of r and so on. Is there some numpy function to do that?
Thank you
You can select a section of A with A[r == 1]. To get all the sections as a list, you could do [A[r == i] for i in range(r.max() + 1)]. This will work, but it may be inefficient when r takes many distinct values, because r == i has to be computed over the whole array for every i.
You could also use this trick: first sort A based on r, then simply split the sorted A array at the right places. That looks something like this:
r_flat = r.ravel()
order = r_flat.argsort()
A_sorted = A.ravel()[order]
r_sorted = r_flat[order]
edges = r_sorted.searchsorted(np.arange(r_sorted[-1] + 1), 'right')
sections = []
start = 0
for end in edges:
    sections.append(A_sorted[start:end])
    start = end
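The explicit loop can also be replaced by np.split, which cuts an array at a list of positions. This is a sketch of the same sort-and-split idea on the arrays from the question. Using kind='stable' for the sort is an addition made here so that elements within each section keep their row-major order.

```python
import numpy as np

A = np.array([[0, 1, 2, 3, 5, 8],
              [4, 100, 6, 7, 8, 7],
              [8, 9, 10, 11, 5, 4],
              [12, 13, 14, 15, 1, 2],
              [1, 3, 5, 6, 4, 8],
              [12, 23, 12, 24, 4, 3],
              [1, 3, 5, 7, 89, 0]])
r = np.array([[3, 3, 3, 3, 3, 4],
              [2, 2, 2, 2, 2, 3],
              [2, 1, 1, 1, 2, 3],
              [2, 1, 0, 1, 2, 3],
              [2, 1, 1, 1, 2, 3],
              [2, 2, 2, 2, 2, 3],
              [3, 3, 3, 3, 3, 4]])

r_flat = r.ravel()
order = r_flat.argsort(kind='stable')   # stable: keeps row-major order per radius
A_sorted = A.ravel()[order]
r_sorted = r_flat[order]

# End position of each radius value in the sorted array.
edges = r_sorted.searchsorted(np.arange(r_sorted[-1] + 1), 'right')

# Split at every edge except the last one, which is the end of the array.
sections = np.split(A_sorted, edges[:-1])
print(sections[1])
```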
I get a different answer from the one you were expecting (3 rather than 4, from row index 4), and the order is slightly different (strictly row, then column), but:
>>> A
array([[ 0, 1, 2, 3, 5, 8],
[ 4, 100, 6, 7, 8, 7],
[ 8, 9, 10, 11, 5, 4],
[ 12, 13, 14, 15, 1, 2],
[ 1, 3, 5, 6, 4, 8],
[ 12, 23, 12, 24, 4, 3],
[ 1, 3, 5, 7, 89, 0]])
>>> r
array([[3, 3, 3, 3, 3, 4],
[2, 2, 2, 2, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 1, 0, 1, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 2, 2, 2, 2, 3],
[3, 3, 3, 3, 3, 4]])
>>> A[r==1]
array([ 9, 10, 11, 13, 15, 3, 5, 6])
Alternatively, you can get column then row ordering by transposing both arrays:
>>> A.T[r.T==1]
array([ 9, 13, 3, 10, 5, 11, 15, 6])

Cytoscape.js not returning an accurate node degree on edge addition + removal

I'm building a graph which allows edges to be toggled on/off. I need to be able to add and remove them repeatedly. I have noticed an error in the degree reported for nodes attached to toggled edges. I've included an example.
My code:
allElements = cy.elements();
....
var allEdges = allElements.filter('edge');
var allNodes = allElements.filter('node');
for(var i=0; i<5; i++){
    // DELETE
    var printThis = [];
    allNodes.filter(function(i,ele){
        printThis.push(ele.degree());
    });
    console.log(printThis);
    cy.remove(allEdges);
    cy.add(allEdges);
}
Returns:
[1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 6, 1, 2, 1, 1, 1, 36, 8, 3, 4, 4, 2, 1, 1, 1, 1, 1, 1, 2]
[1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 6, 1, 2, 1, 1, 1, 36, 8, 3, 4, 4, 2, 1, 1, 1, 1, 1, 1, 2]
[2, 2, 2, 2, 2, 6, 2, 2, 2, 2, 2, 12, 2, 4, 2, 2, 2, 72, 16, 6, 8, 8, 4, 2, 2, 2, 2, 2, 2, 4]
[3, 3, 3, 3, 3, 9, 3, 3, 3, 3, 3, 18, 3, 6, 3, 3, 3, 108, 24, 9, 12, 12, 6, 3, 3, 3, 3, 3, 3, 6]
[4, 4, 4, 4, 4, 12, 4, 4, 4, 4, 4, 24, 4, 8, 4, 4, 4, 144, 32, 12, 16, 16, 8, 4, 4, 4, 4, 4, 4, 8]
This shows that, after the first removal, removing edges doesn't decrease the degree of the nodes they're attached to.
How can I have cytoscape return the correct degree?
Thank you for notifying us of the issue. We will get a fix in for 2.0.3 -M
https://github.com/cytoscape/cytoscape.js/issues/360