In Python, there is numpy.argmax:
In [7]: a = np.random.rand(5,3)
In [8]: a
Out[8]:
array([[ 0.00108039,  0.16885304,  0.18129883],
       [ 0.42661574,  0.78217538,  0.43942868],
       [ 0.34321459,  0.53835544,  0.72364813],
       [ 0.97914267,  0.40773394,  0.36358753],
       [ 0.59639274,  0.67640815,  0.28126232]])
In [10]: np.argmax(a,axis=1)
Out[10]: array([2, 1, 2, 0, 1])
Is there a Julia analogue to NumPy's argmax? I only found indmax, which accepts only a vector, not a two-dimensional array the way np.argmax does.
The fastest implementation will usually be findmax (which allows you to reduce over multiple dimensions at once, if you wish):
julia> a = rand(5, 3)
5×3 Array{Float64,2}:
 0.867952  0.815068   0.324292
 0.44118   0.977383   0.564194
 0.63132   0.0351254  0.444277
 0.597816  0.555836   0.32167
 0.468644  0.336954   0.893425
julia> mxval, mxindx = findmax(a; dims=2)
([0.8679518267243425; 0.9773828942695064; … ; 0.5978162823947759; 0.8934254589671011], CartesianIndex{2}[CartesianIndex(1, 1); CartesianIndex(2, 2); … ; CartesianIndex(4, 1); CartesianIndex(5, 3)])
julia> mxindx
5×1 Array{CartesianIndex{2},2}:
CartesianIndex(1, 1)
CartesianIndex(2, 2)
CartesianIndex(3, 1)
CartesianIndex(4, 1)
CartesianIndex(5, 3)
According to the NumPy documentation, argmax provides the following functionality:
numpy.argmax(a, axis=None, out=None)
Returns the indices of the maximum values along an axis.
I doubt a single Julia function does that, but combining mapslices and argmax is just the ticket:
julia> a = [ 0.00108039 0.16885304 0.18129883;
0.42661574 0.78217538 0.43942868;
0.34321459 0.53835544 0.72364813;
0.97914267 0.40773394 0.36358753;
0.59639274 0.67640815 0.28126232] :: Array{Float64,2}
julia> mapslices(argmax,a,dims=2)
5×1 Array{Int64,2}:
3
2
3
1
2
Of course, because Julia's array indexing is 1-based (whereas NumPy's is 0-based), each element of the resulting Julia array is offset by 1 from the corresponding element of the NumPy result. You may or may not want to adjust for that.
If you want a vector rather than a 2D array, you can simply tack [:] onto the end of the expression:
julia> b = mapslices(argmax,a,dims=2)[:]
5-element Array{Int64,1}:
3
2
3
1
2
To add to jub0bs's answer: in Julia 1+, argmax mirrors the behavior of np.argmax, replacing the axis keyword with dims and returning CartesianIndexes instead of plain indices along the given dimension:
julia> a = [ 0.00108039 0.16885304 0.18129883;
0.42661574 0.78217538 0.43942868;
0.34321459 0.53835544 0.72364813;
0.97914267 0.40773394 0.36358753;
0.59639274 0.67640815 0.28126232] :: Array{Float64,2}
julia> argmax(a, dims=2)
5×1 Array{CartesianIndex{2},2}:
CartesianIndex(1, 3)
CartesianIndex(2, 2)
CartesianIndex(3, 3)
CartesianIndex(4, 1)
CartesianIndex(5, 2)
I created a Vector of Vectors named all_arrays in Julia for a specific purpose, in this way:
using DataFrames
using StatsBase
list_of_numbers = 1:17
all_arrays = [zeros(Float64, (17,)) for i in 1:1000]
round = 1
while round != 1001
    random_array = StatsBase.sample(1:17, length(list_of_numbers))
    random_array = random_array / sum(random_array)
    if (0.0 in random_array) || (random_array in all_arrays)
        continue
    end
    all_arrays[round] = random_array
    round += 1
    println(round)
end
The dimension of all_arrays is:
julia> size(all_arrays)
(1000,)
Then I want to convert all_arrays into a DataFrame with dimensions 1000×17 (note that each vector in all_arrays has shape (17,)). I tried this:
df = DataFrames.DataFrame(zeros(1000, 17), :auto)
for idx in 1:length(all_arrays)
    df[idx, :] = all_arrays[idx]
end
But I'm looking for a more straightforward way, without the for loop and the preallocated DataFrame. Is there one?
If you want simple code, use the following (it is about as long as the faster option below, but I find it conceptually simpler):
DataFrame(mapreduce(permutedims, vcat, all_arrays), :auto)
For data as small as you describe, this should be efficient enough.
If you want something faster use:
DataFrame([getindex.(all_arrays, i) for i in 1:17], :auto, copycols=false)
Here is a benchmark:
julia> using BenchmarkTools
julia> @btime DataFrame(mapreduce(permutedims, vcat, $all_arrays), :auto);
  7.257 ms (3971 allocations: 65.22 MiB)
julia> @btime DataFrame([getindex.($all_arrays, i) for i in 1:17], :auto, copycols=false);
  41.000 μs (88 allocations: 140.66 KiB)
I have a pandas dataframe:
df.shape, type(df), type(df.iloc[0, 0])
>>>((10292, 5), pandas.core.frame.DataFrame, numpy.ndarray)
Each value in this dataframe is a NumPy array with 55 data points in it, shape (55,).
df.iloc[0,0]
>>>array([64.75, 65.62, 64.21, 64.62, 63.94, 62.63, 62.24, 62.65, 62.47,
          63.17, 63.46, 63.75, 65.41, 65.35, 65.68, 65.97, 66.6 , 66.45,
          66.11, 65.48, 64.22, 63.54, 62.81, 63.58, 62.46, 61.23, 62.26,
          61.13, 61.68, 61.36, 61.93, 61.48, 61.92, 62.43, 63.37, 62.59,
          63.33, 63.52, 63.23, 62.52, 63.03, 63.61, 63.83, 63.7 , 63.94,
          65.14, 66.  , 66.65, 65.87, 64.93, 65.84, 64.75, 65.5 , 65.7 ,
          66.83])
When I convert the whole dataframe to a NumPy array, NumPy does not recognize the third dimension.
X = np.array(df)
X.shape
>>>(10292, 5)
X.shape, type(X)
>>>((10292, 5), numpy.ndarray)
X[0].shape, type(X[0])
>>>((5,), numpy.ndarray)
X[0, 0].shape, type(X[0, 0])
>>>((55,), numpy.ndarray)
I expect (and desire) to get:
X.shape, X[0, 0, 0]
>>>((10292, 5, 55), 64.75)
Accessing the data with X[0][0][0] or X[0, 0][0] works, but it does not meet my needs; I want to access the data with X[0, 0, 0].
I tried np.vstack and np.expand_dims without success. How can I turn the data into shape (10292, 5, 55)?
I suppose you have to create a new array with the desired dimensions and traverse your matrix of arrays (X):
result = np.zeros(X.shape + X[0, 0].shape)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        result[i, j] = X[i, j]
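If you prefer to avoid the explicit loops, a vectorized alternative is to stack the cell arrays and reshape. This is only a sketch of the idea; the small frame below is hypothetical, standing in for the real (10292, 5) one:
import numpy as np
import pandas as pd

# hypothetical small frame with the same structure: object cells, each holding a (55,) array
df = pd.DataFrame([[np.random.rand(55) for _ in range(5)] for _ in range(3)])

# flatten the object cells, stack them into one 2-D block, then restore (rows, cols, 55)
X = np.stack(df.to_numpy().ravel()).reshape(*df.shape, -1)
print(X.shape)  # (3, 5, 55) -- (10292, 5, 55) for the real data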
I have a NumPy array that prints like this: a = [-- -- -- 1.90 2.91 1.91 2.92]
I need to find the percentage of values greater than 2; here it is 50%.
What is an easy way to get this? Also, why does len(a) give 7 instead of 4?
Try this:
import numpy as np
import numpy.ma as ma
a = ma.array([0, 1, 2, 1.90, 2.91, 1.91, 2.92])
for i in range(3):
    a[i] = ma.masked
print(a)
print(np.sum(a > 2) / (len(a) - ma.count_masked(a)))
The last line prints 0.5, which is your 50%. From the total length of your array (7) it subtracts the number of masked elements (3), which you see as the three -- in the output you posted.
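As a side note, the masking loop can be collapsed into a single slice assignment, and a.count() gives the number of unmasked elements directly; a minimal equivalent sketch:
import numpy as np
import numpy.ma as ma
a = ma.array([0, 1, 2, 1.90, 2.91, 1.91, 2.92])
a[:3] = ma.masked                 # mask the first three entries in one step
print(np.sum(a > 2) / a.count())  # 0.5 -- count() ignores masked elements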
Generally speaking, you can simply use
a = np.array([...])
threshold = 2.0
fraction_higher = (a > threshold).sum() / len(a)  # in [0, 1]
percentage_higher = fraction_higher * 100
The array contains 7 elements, 3 of them masked. This code emulates the test case by generating a masked array as well:
# generate the test case: a masked array
a = np.ma.array([-1, -1, -1, 1.90, 2.91, 1.91, 2.92], mask=[1, 1, 1, 0, 0, 0, 0])
# check its format
print(a)
[-- -- -- 1.9 2.91 1.91 2.92]
#print the output
print(a[a>2].count() / a.count())
0.5
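Equivalently, if you prefer to leave the mask behind, compressed() returns just the unmasked data as a plain ndarray, after which ordinary boolean arithmetic applies:
vals = a.compressed()           # array([1.9 , 2.91, 1.91, 2.92])
print(100 * (vals > 2).mean())  # 50.0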
I'm trying to build an array of some given shape in which all elements are given by another array. Is there a function in numpy which does that efficiently, similar to np.full(), or any other elegant way, without simply employing for loops?
Example: Let's say I want an array with shape (dim1, dim2) filled with a given constant scalar value. NumPy has np.full() for this:
my_array = np.full((dim1,dim2),value)
I'm looking for an analogous way of doing this, but I want the array to be filled with another array of shape (filldim1, filldim2). A brute-force way would be this:
my_array = np.array([])
for i in range(dim1):
    for j in range(dim2):
        my_array = np.append(my_array, fill_array)
my_array = my_array.reshape((dim1, dim2, filldim1, filldim2))
EDIT
I was being stupid; np.full() does take an array as the fill value if the shape is extended accordingly:
my_array = np.full((dim1, dim2, filldim1, filldim2), fill_array)
Thanks for pointing that out, @Arne!
You can use np.tile:
>>> shape = (2, 3)
>>> fill_shape = (4, 5)
>>> fill_arr = np.random.randn(*fill_shape)
>>> arr = np.tile(fill_arr, [*shape, 1, 1])
>>> arr.shape
(2, 3, 4, 5)
>>> np.all(arr[0, 0] == fill_arr)
True
Edit: better answer, as suggested by @Arne, directly using np.full:
>>> arr = np.full([*shape, *fill_shape], fill_arr)
>>> arr.shape
(2, 3, 4, 5)
>>> np.all(arr[0, 0] == fill_arr)
True
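If the result is only ever read, never written to, np.broadcast_to yields the same logical array as a read-only view, without copying fill_arr once per (dim1, dim2) cell. A memory-saving sketch, not a replacement when you need an independent, writable copy:
>>> view = np.broadcast_to(fill_arr, (*shape, *fill_shape))
>>> view.shape
(2, 3, 4, 5)
>>> np.all(view[0, 0] == fill_arr)
True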
I'm trying to access each item in a numpy 2D array.
I'm used to something like this with nested Python lists ([[...], [...], [...]]):
for row in range(len(data)):
    for col in range(len(data[row])):
        print(data[row][col])
but now I have data_array = np.array(features).
How can I iterate through it the same way?
Try np.ndenumerate:
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> for (i, j), value in np.ndenumerate(a):
...     print(i, j, value)
...
0 0 1
0 1 2
1 0 3
1 1 4
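Note that np.ndenumerate yields an index tuple whose length matches the array's dimensionality, so the same loop works unchanged on higher-dimensional arrays, for example:
>>> b = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
>>> for idx, value in np.ndenumerate(b):
...     print(idx, value)
...
(0, 0, 0) 1
(0, 0, 1) 2
(0, 1, 0) 3
(0, 1, 1) 4
(1, 0, 0) 5
(1, 0, 1) 6
(1, 1, 0) 7
(1, 1, 1) 8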
Make a small 2d array, and a nested list from it:
In [241]: A=np.arange(6).reshape(2,3)
In [242]: alist= A.tolist()
In [243]: alist
Out[243]: [[0, 1, 2], [3, 4, 5]]
One way of iterating on the list:
In [244]: for row in alist:
     ...:     for item in row:
     ...:         print(item)
     ...:
0
1
2
3
4
5
The same works for the array:
In [245]: for row in A:
     ...:     for item in row:
     ...:         print(item)
     ...:
0
1
2
3
4
5
Now neither is good if you want to modify elements. But for crude iteration over all elements this works.
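If you do need to modify elements during iteration, one option is np.nditer with writeback enabled; a minimal sketch (working on a copy so A is left untouched for the examples below):
import numpy as np
A = np.arange(6).reshape(2, 3)
B = A.copy()                      # modify the copy, not A itself
with np.nditer(B, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = 2 * x            # writes back into B
print(B)                          # [[ 0  2  4]
                                  #  [ 6  8 10]]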
With the array I can easily treat it as 1-D:
In [246]: [i for i in A.flat]
Out[246]: [0, 1, 2, 3, 4, 5]
I could also iterate with nested indices
In [247]: [A[i,j] for i in range(A.shape[0]) for j in range(A.shape[1])]
Out[247]: [0, 1, 2, 3, 4, 5]
In general it is better to work with arrays without iteration; I give these iteration examples to clear up some confusion.
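For example, the elementwise work of the nested loops above collapses to whole-array expressions; a quick sketch with the same A:
print(A * 2)          # doubles every element, no Python-level loop
print(A.sum(axis=1))  # row sums: [ 3 12]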
If you want to access an item in a NumPy 2D array features, you can use features[row_index, column_index]. If you want to iterate through the array, you can modify your script to:
for row in range(data_array.shape[0]):
    for col in range(data_array.shape[1]):
        print(data_array[row, col])