Looping through each item in a numpy array? - numpy

I'm trying to access each item in a numpy 2D array.
With a nested Python list such as [[...], [...], [...]], I'm used to something like this:
for row in data:
    for col in data:
        print(data[row][col])
But now I have data_array = np.array(features).
How can I iterate through it the same way?

Try np.ndenumerate:
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> for (i, j), value in np.ndenumerate(a):
...     print(i, j, value)
...
0 0 1
0 1 2
1 0 3
1 1 4

Make a small 2d array, and a nested list from it:
In [241]: A = np.arange(6).reshape(2, 3)
In [242]: alist = A.tolist()
In [243]: alist
Out[243]: [[0, 1, 2], [3, 4, 5]]
One way of iterating on the list:
In [244]: for row in alist:
     ...:     for item in row:
     ...:         print(item)
     ...:
0
1
2
3
4
5
The same loop works for the array:
In [245]: for row in A:
     ...:     for item in row:
     ...:         print(item)
     ...:
0
1
2
3
4
5
Neither loop is suitable if you want to modify elements, since assigning to item only rebinds the loop variable. But for crude iteration over all elements, both work.
With the array, I can easily treat it as 1D:
In [246]: [i for i in A.flat]
Out[246]: [0, 1, 2, 3, 4, 5]
I could also iterate with nested indices
In [247]: [A[i,j] for i in range(A.shape[0]) for j in range(A.shape[1])]
Out[247]: [0, 1, 2, 3, 4, 5]
In general it is better to work with arrays without iteration. I give these iteration examples to clear up some confusion.
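As a rough illustration of that last point, here is a minimal sketch that multiplies every element by 10 in two ways: with an index-based loop (np.ndenumerate supplies the indices needed to write back into the array) and as a single whole-array expression. The names B and C and the factor 10 are assumptions made up for this example.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

# Loop version: writing back needs the indices, not just the values.
B = A.copy()
for (i, j), value in np.ndenumerate(B):
    B[i, j] = value * 10

# Whole-array version: one expression, no Python-level loop.
C = A * 10

print(np.array_equal(B, C))  # True
```

Both give the same result; the whole-array form is usually shorter and faster on large arrays.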

If you want to access a single item in a 2D numpy array features, you can use features[row_index, column_index]. To iterate through the array by index, loop over the row and column indices rather than over the array itself:
for row in range(data.shape[0]):
    for col in range(data.shape[1]):
        print(data[row, col])

Related

numpy append in a for loop with different sizes

I have a for loop where i changes by 2, and in each iteration I want to save a value in a numpy array at an index that changes by 1.
n = 8 #steps
# random sequence
rand_seq = np.zeros(n-1)
for i in range(0, (n-1)*2, 2):
    curr_state = i+3
I want to get the curr_state values out of the loop and into the rand_seq array (seven values).
Can you help me with that?
Thanks a lot.
A much simpler version (if I understand the question correctly) would be:
np.arange(3, 15+1, 2)
where 3 = start, 15 = stop, 2 = step size.
In general, when using numpy, try to avoid adding elements in a for loop, as this is inefficient. I would suggest checking out the documentation of np.arange(), np.array() and np.zeros(); in my experience, these will solve 90% of array-creation issues.
A straightforward list iteration:
In [313]: alist = []
     ...: for i in range(0,(8-1)*2,2):
     ...:     alist.append(i+3)
     ...:
In [314]: alist
Out[314]: [3, 5, 7, 9, 11, 13, 15]
or cast as a list comprehension:
In [315]: [i+3 for i in range(0,(8-1)*2,2)]
Out[315]: [3, 5, 7, 9, 11, 13, 15]
Or if you make an array with the same range parameters:
In [316]: arr = np.arange(0,(8-1)*2,2)
In [317]: arr
Out[317]: array([ 0,  2,  4,  6,  8, 10, 12])
you can add the 3 with one simple expression:
In [318]: arr + 3
Out[318]: array([ 3,  5,  7,  9, 11, 13, 15])
With lists, iteration and comprehensions are great. With numpy you should try to make an array, such as with arange, and modify that with whole-array methods (not with iterations).
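To tie this back to the question, a minimal sketch: if the loop is kept, enumerate can supply a counter k that advances by 1 while i advances by 2, and the whole-array expression gives the same seven values without a loop. The names k and vectorized are assumptions made up for this example.

```python
import numpy as np

n = 8  # steps
rand_seq = np.zeros(n - 1)

# k counts 0, 1, 2, ... while i counts 0, 2, 4, ...
for k, i in enumerate(range(0, (n - 1) * 2, 2)):
    rand_seq[k] = i + 3

# Same values built in one expression, no loop.
vectorized = 2 * np.arange(n - 1) + 3

print(rand_seq)    # float array holding 3, 5, ..., 15
print(vectorized)  # integer array with the same values
```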

Numpy fancy indexing with 2D array - explanation

I am (re)building up my knowledge of numpy, having used it a little while ago.
I have a question about fancy indexing with multidimensional (in this case 2D) arrays.
Given the following snippet:
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array([[0, 1],   # indices for the first dim of a
...               [1, 2]])
>>> j = np.array([[2, 1],   # indices for the second dim
...               [3, 3]])
>>>
>>> a[i, j]   # i and j must have equal shape
array([[ 2,  5],
       [ 7, 11]])
Could someone explain, in simple English, the logic being applied to give the results produced? Ideally, the explanation would also apply to 3D and higher-rank arrays being used to index an array.
Conceptually (in terms of restrictions placed on "rows" and "columns"), what does it mean to index using a 2D array?
It means you are constructing a 2D array R such that R = A[B, C]. This means that $r_{ij} = a_{b_{ij},\,c_{ij}}$.
So the item located at R[0,0] is the item in A with row index B[0,0] and column index C[0,0]. The item R[0,1] is the item in A with row index B[0,1] and column index C[0,1], etc.
So in this specific case:
>>> b = a[i,j]
>>> b
array([[ 2,  5],
       [ 7, 11]])
b[0,0] = 2 since i[0,0] = 0 and j[0,0] = 2, and thus a[0,2] = 2. b[0,1] = 5 since i[0,1] = 1 and j[0,1] = 1, and thus a[1,1] = 5. b[1,0] = 7 since i[1,0] = 1 and j[1,0] = 3, and thus a[1,3] = 7. b[1,1] = 11 since i[1,1] = 2 and j[1,1] = 3, and thus a[2,3] = 11.
So you can say that i determines the "row indices" and j determines the "column indices". This concept holds in more dimensions as well: the first "indexer" determines the indices along the first axis, the second "indexer" the indices along the second axis, and so on. The result has the same shape as the indexers.
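The same rule can be checked on a 3D array. The sketch below is only an illustration: the array a3 and the indexers i, j, k are made-up values, and the explicit loop spells out what the single indexing expression does.

```python
import numpy as np

a3 = np.arange(24).reshape(2, 3, 4)   # a3[x, y, z] == 12*x + 4*y + z

i = np.array([[0, 1], [1, 0]])  # indices along the first axis
j = np.array([[2, 0], [1, 2]])  # indices along the second axis
k = np.array([[3, 1], [0, 2]])  # indices along the third axis

r = a3[i, j, k]  # one indexer per axis, all of the same shape

# Equivalent explicit loop: r[p, q] = a3[i[p, q], j[p, q], k[p, q]]
expected = np.empty(i.shape, dtype=a3.dtype)
for p in range(i.shape[0]):
    for q in range(i.shape[1]):
        expected[p, q] = a3[i[p, q], j[p, q], k[p, q]]

print(r)                            # [[11 13]
                                    #  [16 10]]
print(np.array_equal(r, expected))  # True
```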

Pandas - Row mask and 2d ndarray assignment

I'm having some problems with pandas. I think I'm not using it properly, and I would like some help doing it right.
I have a mask for the rows of a dataframe; the mask is a simple list of Boolean values.
I would like to assign a 2D array to a new or existing column.
mask = some_row_mask()
my2darray = some_operation(dataframe.loc[mask, column])
dataframe.loc[mask, new_or_exist_column] = my2darray
# Also tried this
dataframe.loc[mask, new_or_exist_column] = [f for f in my2darray]
Example data:
dataframe = pd.DataFrame({'Fun': ['a', 'b', 'a'], 'Data': [10, 20, 30]})
mask = dataframe['Fun']=='a'
my2darray = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0]]
column = 'Data'
new_or_exist_column = 'NewData'
Expected output
  Fun  Data          NewData
0   a    10  [0, 1, 2, 3, 4]
1   b    20              NaN
2   a    30  [4, 3, 2, 1, 0]
dataframe[mask] and my2darray both have exactly the same number of rows, but it always ends with:
ValueError: Must have equal len keys and value when setting with ndarray.
Thanks for your help!
EDIT - In context:
To add some detail: this is for filling in folds step by step. I compute values from a subset of the dataframe and set them, one fold at a time.
Instead of this, as suggested by Parth:
dataframe[new_or_exist_column] = pd.Series(my2darray, index=mask[mask==True].index)
I changed it to this:
dataframe.loc[mask, out] = pd.Series([f for f in features], index=mask[mask==True].index)
Otherwise, all the values that were already set are overwritten with NaN.
I had forgotten to mention this.
Thanks!
Try this:
dataframe[new_or_exist_column] = np.nan
dataframe[new_or_exist_column] = pd.Series(my2darray, index=mask[mask==True].index)
It will give the desired output:
  Fun  Data          NewData
0   a    10  [0, 1, 2, 3, 4]
1   b    20              NaN
2   a    30  [4, 3, 2, 1, 0]
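Why this works can be sketched with the example data from the question: wrapping the nested list in a Series makes each inner list a single value, and column assignment then aligns those values on the index labels. This is a minimal sketch; the name values is an assumption introduced for the example, and dataframe.index[mask] is used here as an equivalent way to get the labels of the selected rows.

```python
import pandas as pd

dataframe = pd.DataFrame({"Fun": ["a", "b", "a"], "Data": [10, 20, 30]})
mask = dataframe["Fun"] == "a"
my2darray = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0]]

# One inner list per selected row, labelled with that row's index (0 and 2).
values = pd.Series(my2darray, index=dataframe.index[mask])

# Column assignment aligns on the index; rows missing from `values` get NaN.
dataframe["NewData"] = values

print(dataframe)
```

Because the whole column is assigned, anything previously stored in it is replaced, which matches the overwriting described in the question's edit.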

update values in dataframe

I have a dataframe in which the second column holds arrays. I have another dataframe with 2 columns, which gives the values that should replace the ones in the first dataframe.
I have already tried the update, explode, map and assign methods.
df = pd.DataFrame({'Account': ['A1','A2','A3']})
# dtype=object is needed because the rows have different lengths
groups = np.array([['g1','g2'],['g3','g4'],['g1','g2','g3']], dtype=object)
df["Group"] = groups.tolist()
key_values = pd.DataFrame({'ID': ['1','2','3','4','5'],'Group': ['g1','g2','g3','g4','g5']})
keys = key_values.set_index('Group')['ID']
ag = df.explode('Group')
Setup
m = key_values.set_index('Group')['ID']
Option 1
explode + map
f = df.explode('Group')
res = f['Group'].map(m).groupby(level=0).agg(list)
0       [1, 2]
1       [3, 4]
2    [1, 2, 3]
Name: Group, dtype: object
Option 2
List comprehension + map
res = [[*map(m.get, el)] for el in df['Group']]
[['1', '2'], ['3', '4'], ['1', '2', '3']]
To assign it back:
df.assign(Group=res)
  Account      Group
0      A1     [1, 2]
1      A2     [3, 4]
2      A3  [1, 2, 3]
First convert the lists to strings and replace the group names inside them. Then you can convert the strings back to lists using ast:
import ast
df['keys']=df.astype(str).replace(to_replace=list(key_values['Group']),value=list(key_values['ID']),regex=True)['Group']
df['keys']=df['keys'].apply(lambda x: ast.literal_eval(x))
print(df)
  Account         Group       keys
0      A1      [g1, g2]     [1, 2]
1      A2      [g3, g4]     [3, 4]
2      A3  [g1, g2, g3]  [1, 2, 3]

What is the difference between a[:,:-1] and a[:,-1]?

How should I understand the difference between a[:,:-1] and a[:,-1]?
a = np.array([[1,2],[3,4],[5,6]])
b = a[:,:-1]
print(b)
The output for this is:
[[1]
 [3]
 [5]]
And for the following code:
b = a[:,-1]
print(b)
The output is:
[2 4 6]
Let's create another numpy array to illustrate.
my_array = np.array([[1,2,3],[4,5,6],[7,8,9]])
This is a 2D array with three rows; you can think of it as an array of arrays.
type(my_array) and type(my_array[0]) both return numpy.ndarray.
When you execute my_array[:,-1], it means: go to every row of my_array and take the last item in that row. The : before the comma means all rows, and -1 means the last element.
So the output of my_array[:,-1] will be
array([3, 6, 9])
that is, the last element of each row of my_array, returned as a 1D array.
Now, when you execute my_array[:,:-1], the output is:
array([[1, 2],
       [4, 5],
       [7, 8]])
that is, all the items in every row of my_array except the last one, returned as a 2D array.
Here : means all rows and :-1 means every column except the last.
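The practical difference shows up in the shapes. A minimal sketch using the array from the question (the variable names last_col, all_but_last and last_col_2d are assumptions for the example): an integer index such as -1 removes that axis, while a slice such as :-1 or -1: keeps it.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

last_col = a[:, -1]       # integer index: the column axis is dropped
all_but_last = a[:, :-1]  # slice: the column axis is kept
last_col_2d = a[:, -1:]   # slice selecting only the last column, still 2D

print(last_col.shape)      # (3,)
print(all_but_last.shape)  # (3, 1)
print(last_col_2d.shape)   # (3, 1)
```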