Is there a Julia analogue to numpy.argmax?

In Python, there is numpy.argmax:
In [7]: a = np.random.rand(5,3)
In [8]: a
Out[8]:
array([[ 0.00108039, 0.16885304, 0.18129883],
       [ 0.42661574, 0.78217538, 0.43942868],
       [ 0.34321459, 0.53835544, 0.72364813],
       [ 0.97914267, 0.40773394, 0.36358753],
       [ 0.59639274, 0.67640815, 0.28126232]])
In [10]: np.argmax(a,axis=1)
Out[10]: array([2, 1, 2, 0, 1])
Is there a Julia analogue to Numpy's argmax? I only found indmax, which accepts only a vector, not a two-dimensional array like np.argmax does.

The fastest implementation will usually be findmax (which allows you to reduce over multiple dimensions at once, if you wish):
julia> a = rand(5, 3)
5×3 Array{Float64,2}:
 0.867952  0.815068   0.324292
 0.44118   0.977383   0.564194
 0.63132   0.0351254  0.444277
 0.597816  0.555836   0.32167
 0.468644  0.336954   0.893425
julia> mxval, mxindx = findmax(a; dims=2)
([0.8679518267243425; 0.9773828942695064; … ; 0.5978162823947759; 0.8934254589671011], CartesianIndex{2}[CartesianIndex(1, 1); CartesianIndex(2, 2); … ; CartesianIndex(4, 1); CartesianIndex(5, 3)])
julia> mxindx
5×1 Array{CartesianIndex{2},2}:
CartesianIndex(1, 1)
CartesianIndex(2, 2)
CartesianIndex(3, 1)
CartesianIndex(4, 1)
CartesianIndex(5, 3)

According to the Numpy documentation, argmax provides the following functionality:
numpy.argmax(a, axis=None, out=None)
Returns the indices of the maximum values along an axis.
I doubt a single Julia function does that, but combining mapslices and argmax is just the ticket:
julia> a = [0.00108039 0.16885304 0.18129883;
            0.42661574 0.78217538 0.43942868;
            0.34321459 0.53835544 0.72364813;
            0.97914267 0.40773394 0.36358753;
            0.59639274 0.67640815 0.28126232] :: Array{Float64,2}
julia> mapslices(argmax, a, dims=2)
5×1 Array{Int64,2}:
3
2
3
1
2
Of course, because Julia's array indexing is 1-based (whereas Numpy's array indexing is 0-based), each element of the resulting Julia array is offset by 1 compared to the corresponding element in the resulting Numpy array. You may or may not want to adjust that.
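For comparison, here is the numpy side of that offset, run on the same matrix (a small Python sketch; it reproduces the output already shown in the question):
import numpy as np

a = np.array([[0.00108039, 0.16885304, 0.18129883],
              [0.42661574, 0.78217538, 0.43942868],
              [0.34321459, 0.53835544, 0.72364813],
              [0.97914267, 0.40773394, 0.36358753],
              [0.59639274, 0.67640815, 0.28126232]])

# 0-based indices: exactly one less than Julia's [3, 2, 3, 1, 2]
print(np.argmax(a, axis=1))  # [2 1 2 0 1]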
If you want the Julia result as a vector rather than a 2D array, you can simply tack [:] onto the end of the mapslices expression:
julia> b = mapslices(argmax, a, dims=2)[:]
5-element Array{Int64,1}:
3
2
3
1
2

To add to jub0bs's answer: in Julia 1.0 and later, argmax mirrors the behavior of np.argmax, with numpy's axis argument replaced by the dims keyword; it returns CartesianIndex values rather than plain indices along the given dimension:
julia> a = [0.00108039 0.16885304 0.18129883;
            0.42661574 0.78217538 0.43942868;
            0.34321459 0.53835544 0.72364813;
            0.97914267 0.40773394 0.36358753;
            0.59639274 0.67640815 0.28126232] :: Array{Float64,2}
julia> argmax(a, dims=2)
5×1 Array{CartesianIndex{2},2}:
CartesianIndex(1, 3)
CartesianIndex(2, 2)
CartesianIndex(3, 3)
CartesianIndex(4, 1)
CartesianIndex(5, 2)

Related

How to convert a vector of vectors into a DataFrame in Julia, without a for loop?

I created a Vector of Vectors named all_arrays in Julia in the following way, for a specific purpose:
using DataFrames
using StatsBase
list_of_numbers = 1:17
all_arrays = [zeros(Float64, (17,)) for i in 1:1000]
round = 1
while round != 1001
    random_array = StatsBase.sample(1:17, length(list_of_numbers))
    random_array = random_array / sum(random_array)
    if (0.0 in random_array) || (random_array in all_arrays)
        continue
    end
    all_arrays[round] = random_array
    round += 1
    println(round)
end
The dimension of all_arrays is:
julia> size(all_arrays)
(1000,)
Then I want to convert all_arrays into a DataFrame with dimensions 1000×17 (note that each vector in all_arrays is a Vector of shape (17,)). I tried this:
df = DataFrames.DataFrame(zeros(1000, 17), :auto)
for idx in 1:length(all_arrays)
    df[idx, :] = all_arrays[idx]
end
But I'm looking for a straightforward way to do this, without a for loop and a prebuilt DataFrame. Is there one?
If you want simple code, use the following (its length is about the same as the faster option below, but I find it conceptually simpler):
DataFrame(mapreduce(permutedims, vcat, all_arrays), :auto)
For such small data as you described this should be efficient enough.
If you want something faster use:
DataFrame([getindex.(all_arrays, i) for i in 1:17], :auto, copycols=false)
Here is a benchmark:
julia> using BenchmarkTools
julia> @btime DataFrame(mapreduce(permutedims, vcat, $all_arrays), :auto);
7.257 ms (3971 allocations: 65.22 MiB)
julia> @btime DataFrame([getindex.($all_arrays, i) for i in 1:17], :auto, copycols=false);
41.000 μs (88 allocations: 140.66 KiB)

Lost dimension in numpy array acquired from a dataframe

I have a dataframe (you may see the image of the dataframe from the provided link).
df.shape, type(df), type(df.iloc[0, 0])
>>>((10292, 5), pandas.core.frame.DataFrame, numpy.ndarray)
Each value in this dataframe is a NumPy array with 55 data points in it (55, ).
df.iloc[0,0]
>>>array([64.75, 65.62, 64.21, 64.62, 63.94, 62.63, 62.24, 62.65, 62.47,
          63.17, 63.46, 63.75, 65.41, 65.35, 65.68, 65.97, 66.6 , 66.45,
          66.11, 65.48, 64.22, 63.54, 62.81, 63.58, 62.46, 61.23, 62.26,
          61.13, 61.68, 61.36, 61.93, 61.48, 61.92, 62.43, 63.37, 62.59,
          63.33, 63.52, 63.23, 62.52, 63.03, 63.61, 63.83, 63.7 , 63.94,
          65.14, 66.  , 66.65, 65.87, 64.93, 65.84, 64.75, 65.5 , 65.7 ,
          66.83])
When I convert the whole dataframe to a NumPy array, NumPy does not recognize the 3rd dimension.
X = np.array(df)
X.shape
>>>(10292, 5)
X.shape, type(X)
>>>((10292, 5), numpy.ndarray)
X[0].shape, type(X[0])
>>>((5,), numpy.ndarray)
X[0, 0].shape, type(X[0, 0])
>>>((55,), numpy.ndarray)
I expect (and desire) to get:
X.shape, X[0, 0, 0]
>>>(10292, 5, 55), 64.75
To access the data, X[0][0][0] or X[0, 0][0] works, but that does not meet my needs; I want to access the data with X[0, 0, 0].
I tried using np.vstack and np.expand_dims, but was unsuccessful. How can I turn the data into shape (10292, 5, 55)?
Thank you,
Evrim
I suppose you have to create a new array with the desired dimensions and traverse your matrix of arrays (X):
result = np.zeros(X.shape + X[0, 0].shape)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        result[i, j] = X[i, j]
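A loop-free alternative (a sketch, assuming every cell of X holds an equal-length 1-D array) is to stack the per-cell arrays with np.stack:
import numpy as np

# stack each row's 55-element arrays into (5, 55), then stack all rows
result = np.stack([np.stack(row) for row in X])
print(result.shape)     # (10292, 5, 55)
print(result[0, 0, 0])  # 64.75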

Simple computation in numpy

I have a numpy array like this: a = [-- -- -- 1.90 2.91 1.91 2.92]
I need to find the percentage of values greater than 2; here it is 50%.
What is an easy way to get this? Also, why does len(a) give 7 (instead of 4)?
Try this:
import numpy as np
import numpy.ma as ma
a = ma.array([0, 1, 2, 1.90, 2.91, 1.91, 2.92])
for i in range(3):
    a[i] = ma.masked
print(a)
print(np.sum(a > 2) / (len(a) - ma.count_masked(a)))
The last line prints 0.5, which is your 50%. It subtracts the number of masked elements (3), which you see as the three "--" in the output you posted, from the total length of your array (7).
Generally speaking, you can simply use
a = np.array([...])
threshold = 2.0
fraction_higher = (a > threshold).sum() / len(a) # in [0, 1]
percentage_higher = fraction_higher * 100
The array contains 7 elements, 3 of them masked. This code emulates the test case, generating a masked array as well:
# generate the test case: a masked array
a = np.ma.array([-1, -1, -1, 1.90, 2.91, 1.91, 2.92], mask=[1, 1, 1, 0, 0, 0, 0])
# check its format
print(a)
[-- -- -- 1.9 2.91 1.91 2.92]
# print the output
print(a[a>2].count() / a.count())
0.5
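A compact masked-aware variant (a sketch) uses numpy.ma's compressed(), which drops the masked entries before computing:
valid = a.compressed()           # plain ndarray of the 4 unmasked values
print((valid > 2).mean() * 100)  # 50.0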

Is there a numpy function like np.fill(), but for arrays as fill value?

I'm trying to build an array of some given shape in which all elements are given by another array. Is there a function in numpy which does that efficiently, similar to np.full(), or any other elegant way, without simply employing for loops?
Example: Let's say I want an array with shape
(dim1,dim2) filled with a given, constant scalar value. Numpy has np.full() for this:
my_array = np.full((dim1,dim2),value)
I'm looking for an analogous way of doing this, but I want the array to be filled with another array of shape (filldim1, filldim2). A brute-force way would be this:
my_array = np.array([])
for i in range(dim1):
    for j in range(dim2):
        my_array = np.append(my_array, fill_array)
my_array = my_array.reshape((dim1, dim2, filldim1, filldim2))
EDIT
I was being stupid, np.full() does take arrays as fill value if the shape is modified accordingly:
my_array = np.full((dim1,dim2,filldim1,filldim2),fill_array)
Thanks for pointing that out, @Arne!
You can use np.tile:
>>> shape = (2, 3)
>>> fill_shape = (4, 5)
>>> fill_arr = np.random.randn(*fill_shape)
>>> arr = np.tile(fill_arr, [*shape, 1, 1])
>>> arr.shape
(2, 3, 4, 5)
>>> np.all(arr[0, 0] == fill_arr)
True
Edit: a better answer, as suggested by @Arne, directly using np.full:
>>> arr = np.full([*shape, *fill_shape], fill_arr)
>>> arr.shape
(2, 3, 4, 5)
>>> np.all(arr[0, 0] == fill_arr)
True
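If a read-only repetition is enough, np.broadcast_to returns the same shape as a view without copying any data (a sketch reusing shape, fill_shape, and fill_arr from above; writing to the view raises an error):
>>> view = np.broadcast_to(fill_arr, (*shape, *fill_shape))
>>> view.shape
(2, 3, 4, 5)
>>> np.all(view[0, 0] == fill_arr)
True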

Looping through each item in a numpy array?

I'm trying to access each item in a numpy 2D array.
I'm used to something like this in Python with a nested list [[...], [...], [...]]:
for row in data:
    for col in row:
        print(col)
but now I have data_array = np.array(features).
How can I iterate through it the same way?
Try np.ndenumerate:
>>> a = np.array([[1, 2], [3, 4]])
>>> for (i,j), value in np.ndenumerate(a):
... print(i, j, value)
...
0 0 1
0 1 2
1 0 3
1 1 4
Make a small 2d array, and a nested list from it:
In [241]: A=np.arange(6).reshape(2,3)
In [242]: alist= A.tolist()
In [243]: alist
Out[243]: [[0, 1, 2], [3, 4, 5]]
One way of iterating on the list:
In [244]: for row in alist:
     ...:     for item in row:
     ...:         print(item)
     ...:
0
1
2
3
4
5
The same works for the array:
In [245]: for row in A:
     ...:     for item in row:
     ...:         print(item)
     ...:
0
1
2
3
4
5
Now neither is good if you want to modify elements. But for crude iteration over all elements this works.
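If in-place modification is what you need, one option (a sketch, using a copy of A so the examples below are unchanged) is np.nditer with a writable flag:
B = A.copy()
with np.nditer(B, op_flags=['readwrite']) as it:
    for item in it:
        item[...] = item * 2  # double each element in place
# B is now [[0, 2, 4], [6, 8, 10]]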
With the array I can easily treat it as 1d:
In [246]: [i for i in A.flat]
Out[246]: [0, 1, 2, 3, 4, 5]
I could also iterate with nested indices:
In [247]: [A[i,j] for i in range(A.shape[0]) for j in range(A.shape[1])]
Out[247]: [0, 1, 2, 3, 4, 5]
In general it is better to work with arrays without iteration. I give these iteration examples to clear up some confusion.
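As a quick illustration of that last point (a sketch): whole-array operations replace the explicit loops used above:
print(A * 2)          # elementwise doubling, no Python-level loop
print(A.sum(axis=1))  # per-row reduction: [3, 12]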
If you want to access an item in a numpy 2D array features, you can use features[row_index, column_index]. If you want to iterate through a numpy array, you can modify your script to:
for row_index in range(data.shape[0]):
    for column_index in range(data.shape[1]):
        print(data[row_index, column_index])