Replace values by their corresponding indices in another array - numpy

I am a newbie in numpy. I have an array A of values of size [x,] and an array B of size [y,] (y > x). As a result, I want an array C of size [x,] in which each element is the index in B of the corresponding element of A.
Here is an example of inputs and outputs:
>>> A = [10, 20, 30, 10, 40, 50, 10, 50, 20]
>>> B = [10, 20, 30, 40, 50]
>>> C = #Some operations
>>> C
[0, 1, 2, 0, 3, 4, 0, 4, 1]
I couldn't find a way to do this. Please advise. Thank you.

I think you are looking for np.searchsorted, assuming that B is sorted in increasing order and that every value of A occurs in B:
C = np.searchsorted(B,A)
Output:
array([0, 1, 2, 0, 3, 4, 0, 4, 1])
Update for the general situation where B is not sorted: we can use argsort.
# let's swap 40 and 50 in B
# expect the output to have 3 and 4 swapped
B = [10, 20, 30, 50, 40]
BB = np.sort(B)
C = np.argsort(B)[np.searchsorted(BB,A)]
Output:
array([0, 1, 2, 0, 4, 3, 0, 3, 1], dtype=int64)
You can double check:
(np.array(B)[C] == A).all()
# True
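The two steps above (sort B, then map sorted positions back to original positions) can be folded into one helper. This is only a sketch: the name index_in and the -1 marker for values missing from B are assumptions made for illustration, and B is assumed to contain no duplicates. It relies on the sorter argument of np.searchsorted, which lets the search run on an unsorted array given its argsort.

```python
import numpy as np

def index_in(values, reference):
    """For each element of `values`, return its index in `reference`.

    Hypothetical helper: assumes `reference` is non-empty and has no
    duplicates. Values that do not occur in `reference` are marked with -1.
    """
    values = np.asarray(values)
    reference = np.asarray(reference)
    order = np.argsort(reference)                      # positions that sort `reference`
    pos = np.searchsorted(reference, values, sorter=order)
    pos = np.clip(pos, 0, len(reference) - 1)          # searchsorted may return len(reference)
    idx = order[pos]                                   # map sorted positions back to original ones
    return np.where(reference[idx] == values, idx, -1)

A = [10, 20, 30, 10, 40, 50, 10, 50, 20]
B = [10, 20, 30, 50, 40]
C = index_in(A, B)
print(C)
```

The final np.where is what guards against values of A that are absent from B; plain searchsorted would silently return the insertion position for them.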

For general python lists
A = [10, 20, 30, 10, 40, 50, 10, 50, 20]
B = [10, 20, 30, 40, 50]
C = [B.index(e) for e in A if e in B]  # look each value up in B; values not in B are skipped
print(C)
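For longer lists, scanning B once per element gets slow. A possible alternative, sketched here with a hypothetical dictionary named position, is to record each value's first index in B up front and then look values up in constant time. As in the list comprehension above, values of A that are not in B are skipped.

```python
A = [10, 20, 30, 10, 40, 50, 10, 50, 20]
B = [10, 20, 30, 40, 50]

# Map each value of B to the index of its first occurrence.
position = {}
for i, value in enumerate(B):
    position.setdefault(value, i)

# Look up every element of A; elements missing from B are dropped.
C = [position[e] for e in A if e in position]
print(C)
```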

You can try this code
A = np.array([10, 20, 30, 10, 40, 50, 10, 50, 20])
B = np.array([10, 20, 30, 40, 50])
np.argmax(B==A[:,None],axis=1)
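To see why this works, it may help to look at the intermediate boolean matrix. The sketch below uses the same A and B; the names mask, found and C_checked are introduced here for illustration only. Note that np.argmax returns 0 for a row with no True entry, so a value missing from B would look like a match at index 0 unless it is checked separately.

```python
import numpy as np

A = np.array([10, 20, 30, 10, 40, 50, 10, 50, 20])
B = np.array([10, 20, 30, 40, 50])

# A[:, None] has shape (9, 1); comparing with B (shape (5,)) broadcasts to (9, 5).
# mask[i, j] is True when A[i] == B[j].
mask = B == A[:, None]

C = np.argmax(mask, axis=1)          # column of the first True in each row
found = mask.any(axis=1)             # False where A[i] does not occur in B
C_checked = np.where(found, C, -1)   # assumption: use -1 to flag missing values
print(C_checked)
```

The comparison builds a len(A) x len(B) matrix, so memory use grows with the product of the two sizes.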

Related

np.array for variable matrix

import numpy as np
data = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
[2, 7, 8, 9, 10, 11],
[3, 12, 13, 14, 15, 16],
[4, 3, 4, 5, 6, 7, 10, 12]],dtype=object)
target = data[:,0]
It has this error.
IndexError Traceback (most recent call last)
Input In [82], in <cell line: 9>()
data = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
[2, 7, 8, 9, 10, 11],
[3, 12, 13, 14, 15, 16],
[4, 3, 4, 5, 6, 7, 10,12]],dtype=object)
# Define the target data
----> 9 target = data[:,0]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
How can I fix this without changing the elements in the data? When I made all rows the same size, the error message went away, but my data has rows of variable size.
You have a 1-dimensional array of objects (each element is a list), so you can't index along axis=1 because that axis does not exist (data.shape -> (4,)).
Use a list comprehension instead:
out = np.array([a[0] for a in data])
Output: array([10, 2, 3, 4])
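As a quick check of the explanation above, the sketch below builds the same ragged array and inspects its shape before extracting the first element of each row. The names target and rest are illustrative; splitting off the remaining values per row is an assumption about what might be wanted next, not something the question asks for.

```python
import numpy as np

data = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
                 [2, 7, 8, 9, 10, 11],
                 [3, 12, 13, 14, 15, 16],
                 [4, 3, 4, 5, 6, 7, 10, 12]], dtype=object)

print(data.shape)   # one axis only: each element is a Python list
print(data.ndim)

# First element of every row, collected into a regular integer array.
target = np.array([row[0] for row in data])

# Remaining elements stay ragged, so keep them as a list of lists.
rest = [row[1:] for row in data]
print(target)
```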

Creating multiple columns in pandas with lambda function

I'm trying to create a set of new columns with growth rates within my df in a more efficient way than manually creating them one by one.
My df has more than 100 variables, but for simplicity, assume the following:
import pandas as pd

consumption = [5, 10, 15, 20, 25, 30, 35, 40]
wage = [10, 20, 30, 40, 50, 60, 70, 80]
period = [1, 2, 3, 4, 5, 6, 7, 8]
id = [1, 1, 1, 1, 1, 1, 1, 1]
# include consumption so both columns used below exist in df
tup = list(zip(id, period, wage, consumption))
df = pd.DataFrame(tup,
                  columns=['id', 'period', 'wage', 'consumption'])
With two variables I could simply do this:
df['wage_chg']= df.sort_values(by=['id', 'period']).groupby(['id'])['wage'].apply(lambda x: (x/x.shift(4)-1)).fillna(0)
df['consumption_chg']= df.sort_values(by=['id', 'period']).groupby(['id'])['consumption'].apply(lambda x: (x/x.shift(4)-1)).fillna(0)
But maybe by using a for loop or something I could iterate over my column names creating new growth rate columns with the name columnname_chg as in the example above.
Any ideas?
Thanks
You can apply the operation to a DataFrame rather than to a Series in groupby.apply:
cols = ['wage', 'consumption']  # list every column that needs a growth rate
out = df.join(df.sort_values(by=['id', 'period'])
                .groupby(['id'], group_keys=False)[cols]  # keep the original index so join aligns
                .apply(lambda g: (g/g.shift(4)-1)).fillna(0)
                .add_suffix('_chg'))
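An alternative sketch that avoids apply altogether: a grouped shift returns a frame aligned with the original index, so the growth rate can be computed with ordinary column arithmetic. The names lagged and growth are illustrative, and the four-period lag is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1] * 8,
    'period': [1, 2, 3, 4, 5, 6, 7, 8],
    'wage': [10, 20, 30, 40, 50, 60, 70, 80],
    'consumption': [5, 10, 15, 20, 25, 30, 35, 40],
})

cols = ['wage', 'consumption']            # extend with any other numeric columns
df = df.sort_values(['id', 'period'])

# Value of each column four periods earlier, within the same id.
lagged = df.groupby('id')[cols].shift(4)

# Growth rate; the first four rows per id have no lag and become 0.
growth = (df[cols] / lagged - 1).fillna(0).add_suffix('_chg')
out = df.join(growth)
print(out)
```

With more than 100 variables, cols could be built from df.columns by excluding the key columns, for example with df.columns.difference(['id', 'period']).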

How do I combine 2 arraylist into a list of lists in java

I want to combine two ArrayLists into a single list of two-element lists.
newList3 = [-50, 30, -20, 0, 20, -30, 50]
newList4 = [1, 1, 1, 1, 1, 1, 1]
I want to return:
[[-50, 1], [30, 1], [-20, 1], [0, 1], [20, 1], [-30, 1], [50, 1]]
The only result I can get is:
[-50, 1, 30, 1, -20, 1, 0, 1, 20, 1, -30, 1, 50, 1]
I have tried
a = newList3.get(0);
b = newList4.get(0);
newList.add(a);
newList.add(b);
newList.add(newList2);
newList.clear();
a = newList3.get(1);
b = newList4.get(1);
newList.add(a);
newList.add(b);
newList.add(newList);
The operation you are looking for is called a zip operation.
// list1 and list2 stand for newList3 and newList4; needs java.util.* and java.util.stream.*
List<List<Integer>> zipped = IntStream
        .range(0, Math.min(list1.size(), list2.size()))
        .mapToObj(i -> Arrays.asList(list1.get(i), list2.get(i)))
        .collect(Collectors.toList());
Since we iterate over both lists in parallel, we need the index, so IntStream.range generates the indices and mapToObj pairs up the elements found at each index.
The range runs from 0 to the size of the shorter list, so no index goes out of bounds.

How to compare a 2D array against a 1D array column-wise?

I have two numpy arrays. One of them is 2D while the other is 1D.
>>> a = np.arange(0,20).reshape(2,10)
>>> a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
>>> b = np.full( a.shape[1], 10 )
>>> b
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
I want to compare them column-wise:
If an element in a column of a is identical to the element of b for that column, then store the row number(s) of a.
Otherwise, find the element(s) of a in that column closest to b and store the row number(s).
In my example, the output from the comparison should be:
[ 1, [0,1], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1] ]
How do I do this in NumPy?
I was thinking of using np.where( a==b, run a function to get row(s) if same, run another function to get row(s) of diff )? Is this the way?
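One possible reading of the two rules is that they collapse into a single rule: for every column, keep the rows of a whose absolute difference from b is smallest (an exact match is simply a difference of 0). The sketch below implements that reading only; the names diff, closest and rows are illustrative, and because "closest" is an interpretation, the result for the example differs from the list given in the question.

```python
import numpy as np

a = np.arange(0, 20).reshape(2, 10)
b = np.full(a.shape[1], 10)

# b has one entry per column, so it broadcasts across the rows of a.
diff = np.abs(a - b)

# True where a row attains the smallest difference in its column.
closest = diff == diff.min(axis=0)

# The number of matching rows varies per column, so collect a list of lists.
rows = [np.flatnonzero(col).tolist() for col in closest.T]
print(rows)
```

Column 5 is the only tie here (5 and 15 are equally far from 10), which is why it alone reports both rows.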

MultiPoint crossover using Numpy

I am trying to do crossover on a genetic algorithm population using numpy.
I have sliced the population into parent 1 and parent 2.
population = np.random.randint(2, size=(4,8))
p1 = population[::2]
p2 = population[1::2]
But I can't figure out a lambda or numpy command to do a multi-point crossover over the parents.
The idea is to take the ith row of p1 and randomly swap some of its bits with the ith row of p2.
I think you want to select from p1 and p2 at random, cell by cell.
To make it easier to follow, I've changed p1 to hold values from 10 to 15 and p2 to hold values from 20 to 25. Both were generated at random in these ranges.
In [66]: p1
Out[66]:
array([[15, 15, 13, 14, 12, 13, 12, 12],
[14, 11, 11, 10, 12, 12, 10, 12],
[12, 11, 14, 15, 14, 10, 13, 10],
[11, 12, 10, 13, 14, 13, 12, 13]])
In [67]: p2
Out[67]:
array([[23, 25, 24, 21, 24, 20, 24, 25],
[21, 21, 20, 20, 25, 22, 24, 22],
[24, 22, 25, 20, 21, 22, 21, 22],
[22, 20, 21, 22, 25, 23, 22, 21]])
In [68]: sieve=np.random.randint(2, size=(4,8))
In [69]: sieve
Out[69]:
array([[0, 1, 0, 1, 1, 0, 1, 0],
[1, 1, 1, 0, 0, 1, 1, 1],
[0, 1, 1, 0, 0, 1, 1, 0],
[0, 0, 0, 1, 1, 1, 1, 1]])
In [70]: not_sieve=sieve^1 # Complement of sieve
In [71]: pn = p1*sieve + p2*not_sieve
In [72]: pn
Out[72]:
array([[23, 15, 24, 14, 12, 20, 12, 25],
[14, 11, 11, 20, 25, 12, 10, 12],
[24, 11, 14, 20, 21, 10, 13, 22],
[22, 20, 21, 13, 14, 13, 12, 13]])
The numbers in the teens come from p1, where sieve is 1.
The numbers in the twenties come from p2, where sieve is 0.
This could probably be made more efficient, but is this the output you expect?
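The multiply-and-add step can also be written with np.where, which selects cell by cell from two arrays according to a boolean mask. This is a sketch of the same random cell-wise selection, not a different crossover scheme; the seed, the names mask, child1 and child2, and the idea of producing a complementary second child are assumptions added for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)            # seeded only to make the run repeatable

population = rng.integers(0, 2, size=(4, 8))
p1 = population[::2]                      # even rows as first parents
p2 = population[1::2]                     # odd rows as second parents

# True means "take this bit from p1", False means "take it from p2".
mask = rng.integers(0, 2, size=p1.shape).astype(bool)

child1 = np.where(mask, p1, p2)
child2 = np.where(mask, p2, p1)           # complementary child gets the other bits
print(child1)
print(child2)
```

Because each cell is drawn independently, this behaves like uniform crossover; restricting swaps to a few contiguous segments would need a mask built from cut points instead.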