To convert a CartesianIndex, such as CartesianIndex(1,2), to a linear index, I can use the LinearIndices function:
julia> a = rand(2,2)
2×2 Array{Float64,2}:
0.57097 0.0647051
0.767868 0.531104
julia> I = LinearIndices(a)
2×2 LinearIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}}:
1 3
2 4
julia> I[CartesianIndex(1,2)]
3
However, how do I get the linear index 3 for CartesianIndex(1,2) without constructing an instance of the array a, assuming I know the ranges for the CartesianIndex (1:2, 1:2)?
Just use LinearIndices with a tuple of the axes (or even just a tuple of dimension sizes):
julia> LinearIndices((1:2,1:2))
2×2 LinearIndices{2,Tuple{UnitRange{Int64},UnitRange{Int64}}}:
1 3
2 4
julia> LinearIndices((1:2,1:2))[1,2]
3
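As a small sketch of the "tuple of dimension sizes" variant mentioned above, assuming the same 2×2 layout as in the question (the names dims, li and ci are only illustrative), the lookup can be done in both directions without allocating an array:

```julia
# Assumed layout: 2 rows × 2 columns, as in the question.
dims = (2, 2)

# Cartesian -> linear: LinearIndices accepts a tuple of sizes.
li = LinearIndices(dims)
linear = li[CartesianIndex(1, 2)]

# linear -> Cartesian: CartesianIndices is the inverse mapping.
ci = CartesianIndices(dims)
cartesian = ci[linear]

println(linear)
println(cartesian)
```

Here linear is 3 and cartesian is CartesianIndex(1, 2), matching the array-based result.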
What is pandas' transpose equivalent in Julia? thanks
I would like to transpose a data frame, but the transpose function isn't working.
It is permutedims. It turns a data frame on its side, so that rows become columns and the values in the chosen column become the column names.
Note the difference between transpose and permutedims in Julia Base. permutedims only affects the outermost container. transpose is recursive.
Here are the consequences:
julia> x = [randstring(3) for _ in 1:3, _ in 1:3]
3×3 Matrix{String}:
"nTa" "QBM" "3dJ"
"RsL" "mD1" "3jq"
"dFp" "bfB" "k6P"
julia> permutedims(x)
3×3 Matrix{String}:
"nTa" "RsL" "dFp"
"QBM" "mD1" "bfB"
"3dJ" "3jq" "k6P"
julia> transpose(x)
3×3 transpose(::Matrix{String}) with eltype Union{}:
Error showing value of type LinearAlgebra.Transpose{Union{}, Matrix{String}}:
ERROR: MethodError: no method matching transpose(::String)
julia> y = [rand(3) for _ in 1:3, _ in 1:3]
3×3 Matrix{Vector{Float64}}:
[0.446435, 0.653228, 0.0857836] [0.378189, 0.505487, 0.0504642] [0.0570918, 0.462984, 0.800813]
[0.801857, 0.75505, 0.714087] [0.253316, 0.458364, 0.80242] [0.93742, 0.699745, 0.140957]
[0.419783, 0.22946, 0.748267] [0.445365, 0.563222, 0.363561] [0.088825, 0.0869342, 0.311187]
julia> permutedims(y)
3×3 Matrix{Vector{Float64}}:
[0.446435, 0.653228, 0.0857836] [0.801857, 0.75505, 0.714087] [0.419783, 0.22946, 0.748267]
[0.378189, 0.505487, 0.0504642] [0.253316, 0.458364, 0.80242] [0.445365, 0.563222, 0.363561]
[0.0570918, 0.462984, 0.800813] [0.93742, 0.699745, 0.140957] [0.088825, 0.0869342, 0.311187]
julia> transpose(y) # note that inside we have 1x3 objects not vectors
3×3 transpose(::Matrix{Vector{Float64}}) with eltype LinearAlgebra.Transpose{Float64, Vector{Float64}}:
[0.446435 0.653228 0.0857836] [0.801857 0.75505 0.714087] [0.419783 0.22946 0.748267]
[0.378189 0.505487 0.0504642] [0.253316 0.458364 0.80242] [0.445365 0.563222 0.363561]
[0.0570918 0.462984 0.800813] [0.93742 0.699745 0.140957] [0.088825 0.0869342 0.311187]
In DataFrames.jl we decided that this recursive behavior (which makes sense in linear algebra context) is not desirable. You can even read this in the docstring of transpose which states:
This operation is intended for linear algebra usage - for general data manipulation see permutedims, which is non-recursive.
Additionally, in DataFrames.jl permutedims requires you to specify the column whose values will become the column names after the operation (this requirement is DataFrames.jl specific). You also need to be careful because eltype promotion is performed. This issue is not visible for matrices, which have a common eltype for all elements, while in a data frame each column might have a different eltype:
julia> df1 = DataFrame(rowkey=["x", "y"], b=[1.0, 2.0], c=[3, 4], d=[true, false])
2×4 DataFrame
Row │ rowkey b c d
│ String Float64 Int64 Bool
─────┼───────────────────────────────
1 │ x 1.0 3 true
2 │ y 2.0 4 false
julia> df2 = permutedims(df1, :rowkey)
3×3 DataFrame
Row │ rowkey x y
│ String Float64 Float64
─────┼──────────────────────────
1 │ b 1.0 2.0
2 │ c 3.0 4.0
3 │ d 1.0 0.0
julia> permutedims(df2, :rowkey)
2×4 DataFrame
Row │ rowkey b c d
│ String Float64 Float64 Float64
─────┼───────────────────────────────────
1 │ x 1.0 3.0 1.0
2 │ y 2.0 4.0 0.0
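The eltype promotion visible in the round trip above is consistent with Base's usual promotion rules. Here is a minimal sketch in plain Julia, without DataFrames.jl. Whether DataFrames.jl calls promote_type internally is an assumption on my part; the point is only to illustrate why Int64 and Bool values end up as Float64 once they have to share a column:

```julia
# Column eltypes of b, c and d in df1.
common = promote_type(Float64, Int64, Bool)

# Values of row "x" (1.0, 3, true) stored in one vector of the common type.
row_x = common[1.0, 3, true]

println(common)
println(row_x)
```

The original Int64 and Bool types are not recovered when permuting back, which is why the second permutedims call returns three Float64 columns.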
New to Julia. I'm working on a correlation matrix. I've converted it into a dataframe to include feature names. To find which features are highly correlated, I need names of the features and its value.
I get the value using the following:
corr_matrix_df=cor(Matrix(df))
idx_hcorr=findall(x->abs.(x)>0.6, corr_matrix_df)
But I don't know how to get the column names.
If I sort it columnwise, the feature rows will be shuffled incorrectly. Any ideas?
Here is how you can do it:
julia> using DataFrames, Random
julia> Random.seed!(1234)
MersenneTwister(1234)
julia> df = DataFrame(rand(5, 5), :auto)
5×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Float64 Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────
1 │ 0.590845 0.854147 0.648882 0.112486 0.950498
2 │ 0.766797 0.200586 0.0109059 0.276021 0.96467
3 │ 0.566237 0.298614 0.066423 0.651664 0.945775
4 │ 0.460085 0.246837 0.956753 0.0566425 0.789904
5 │ 0.794026 0.579672 0.646691 0.842714 0.82116
julia> using Statistics
julia> cm = cor(Matrix(df))
5×5 Matrix{Float64}:
1.0 0.101686 -0.420953 0.562488 0.2127
0.101686 1.0 0.378276 0.00772785 0.100182
-0.420953 0.378276 1.0 -0.327604 -0.791489
0.562488 0.00772785 -0.327604 1.0 -0.0746962
0.2127 0.100182 -0.791489 -0.0746962 1.0
julia> high = findall(x -> abs(x) > 0.6, cm)
7-element Vector{CartesianIndex{2}}:
CartesianIndex(1, 1)
CartesianIndex(2, 2)
CartesianIndex(3, 3)
CartesianIndex(5, 3)
CartesianIndex(4, 4)
CartesianIndex(3, 5)
CartesianIndex(5, 5)
julia> [[names(df, idx.I[1]); names(df, idx.I[2])] for idx in high]
7-element Vector{Vector{String}}:
["x1", "x1"]
["x2", "x2"]
["x3", "x3"]
["x5", "x3"]
["x4", "x4"]
["x3", "x5"]
["x5", "x5"]
Is this what you wanted? (I added one step after your last step.)
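If you also want the correlation value next to each pair of names, and want to skip the diagonal and the mirrored duplicates, something along these lines might work. This is only a sketch in plain Julia: feature_names and the small matrix cm are made-up stand-ins for names(df) and cor(Matrix(df)):

```julia
# Hypothetical stand-ins for names(df) and cor(Matrix(df)).
feature_names = ["x1", "x2", "x3"]
cm = [ 1.0  0.2 -0.8;
       0.2  1.0  0.1;
      -0.8  0.1  1.0]

# Positions whose absolute correlation exceeds the threshold.
high = findall(x -> abs(x) > 0.6, cm)

# Keep only the upper triangle (row < column), so the diagonal and the
# symmetric duplicates are dropped, and attach the value to each pair.
pairs_with_value = [(feature_names[idx[1]], feature_names[idx[2]], cm[idx])
                    for idx in high if idx[1] < idx[2]]

println(pairs_with_value)
```

With this toy matrix the only remaining entry is ("x1", "x3", -0.8).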
I'm trying to use kproto() function from R package clustMixType to cluster mixed-type data in Julia, but I'm getting error No numeric variables in x! Try using kmodes() from package.... My data should have 3 variables: 2 continuous and 1 categorical. It seems after I used DataFrame() all the variables became categorical. Is there a way to avoid changing the variables type after using DataFrame() so that I have mixed-type data (continuous and categorical) to use kproto()?
using RCall, Distributions, DataFrames
#rlibrary clustMixType
# group 1 variables
x1=rand(Normal(0,3),10)
x2=rand(Normal(1,2),10)
x3=["1","1","2","2","0","1","1","2","2","0"]
g1=hcat(x1,x2,x3)
# group 2 variables
y1=rand(Normal(0,4),10)
y2=rand(Normal(-1,6),10)
y3=["1","1","2","1","1","2","2","0","2","0"]
g2=hcat(y1,y2,y3)
#create the data
df0=vcat(g1,g2)
df1 = DataFrame(df0)
#use R function
R"kproto($df1, 2)"
I don't know anything about the R package and what kind of input it expects, but the issue is probably how you construct the data matrix from which you construct your DataFrame, not the DataFrame constructor itself.
When you concatenate a numerical and a string column, Julia falls back on the element type Any for the resulting matrix:
julia> g1=hcat(x1,x2,x3)
10×3 Matrix{Any}:
0.708309 -4.84767 "1"
0.566883 -0.214217 "1"
...
That means your df0 matrix is:
julia> #create the data
df0=vcat(g1,g2)
20×3 Matrix{Any}:
0.708309 -4.84767 "1"
0.566883 -0.214217 "1"
...
and the DataFrame constructor will just carry this lack of type information through rather than trying to infer column types.
julia> DataFrame(df0)
20×3 DataFrame
Row │ x1 x2 x3
│ Any Any Any
─────┼───────────────────────────
1 │ 0.708309 -4.84767 1
2 │ 0.566883 -0.214217 1
...
A simple way of getting around this is to just not concatenate your columns into a single matrix, but to construct the DataFrame from the columns:
julia> DataFrame([vcat(x1, y1), vcat(x2, y2), vcat(x3, y3)])
20×3 DataFrame
Row │ x1 x2 x3
│ Float64 Float64 String
─────┼───────────────────────────────
1 │ 0.708309 -4.84767 1
2 │ 0.566883 -0.214217 1
...
As you can see, we now have two Float64 numerical columns x1 and x2 in the resulting DataFrame.
As an addition to the nice answer by Nils (the problem indeed arises when the matrix is constructed, not when the DataFrame is created), there is this little trick:
julia> df = DataFrame([1 1.0 "1"; 2 2.0 "2"], [:int, :float, :string])
2×3 DataFrame
Row │ int float string
│ Any Any Any
─────┼────────────────────
1 │ 1 1.0 1
2 │ 2 2.0 2
julia> identity.(df)
2×3 DataFrame
Row │ int float string
│ Int64 Float64 String
─────┼────────────────────────
1 │ 1 1.0 1
2 │ 2 2.0 2
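To see why both approaches help, here is a short sketch using only Base arrays (the variable names are illustrative, not taken from the question). Concatenating columns of different types widens the element type to Any, while broadcasting identity over a single column lets Julia pick the narrowest eltype that fits the values actually stored there:

```julia
# One numeric and one string column, as in the question.
num_col = [0.5, 1.5]
str_col = ["1", "2"]

# hcat has to find one eltype for the whole matrix and falls back to Any.
g = hcat(num_col, str_col)
println(eltype(g))

# Broadcasting identity over each column narrows the eltype again.
col1 = identity.(g[:, 1])
col2 = identity.(g[:, 2])
println(eltype(col1))
println(eltype(col2))
```

The same narrowing is what identity.(df) does column by column in the data frame example above.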
I want to return a logical vector showing which strings in a string array A are also members of a string array B.
In Matlab, this would be
A = ["me","you","us"]
B = ["me","us"]
myLogicalVector = ismember(A,B)
myLogicalVector =
1×3 logical array
1 0 1
How do I achieve this in Julia?
I have tried
myLogicalVector = occursin.(A,B)
myLogicalVector = occursin(A,B)
It seems that occursin only works if the two input string arrays are of the same length or one string is a scalar - I am not sure if I am correct on this one.
You can write:
julia> in(B).(A)
3-element BitArray{1}:
1
0
1
More verbose versions of a similar operation are (note that the type of the resulting array is different in all cases except the first):
julia> in.(A, Ref(B))
3-element BitArray{1}:
1
0
1
julia> [in(a, B) for a in A]
3-element Array{Bool,1}:
1
0
1
julia> map(a -> in(a, B), A)
3-element Array{Bool,1}:
1
0
1
julia> map(a -> a in B, A)
3-element Array{Bool,1}:
1
0
1
julia> [a in B for a in A]
3-element Array{Bool,1}:
1
0
1
If A and B were large and you needed performance then convert B to a Set like this:
in(Set(B)).(A)
(You pay a one-time cost to create the set, but then each lookup will be faster.)
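A short sketch of the Set-based variant on the data from the question, also showing how the resulting mask might be used afterwards (Bset, mask and the last two lines are illustrative additions, not part of the original answer):

```julia
A = ["me", "you", "us"]
B = ["me", "us"]

# Build the lookup set once, then test every element of A against it.
Bset = Set(B)
mask = in(Bset).(A)

# The mask can be used for logical indexing or turned into positions.
members = A[mask]
positions = findall(mask)

println(mask)
println(members)
println(positions)
```

For arrays this small the Set makes no practical difference; it only pays off when B is large.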
I'd like to get the number of rows of a dataframe.
I can achieve that with size(myDataFrame)[1].
Is there a cleaner way?
If you are using DataFrames specifically, then you can use nrow():
julia> df = DataFrame(Any[1:10, 1:10]);
julia> nrow(df)
10
Alternatively, you can specify the dimension argument for size:
julia> size(df, 1)
10
This also works for arrays, so it's a bit more general:
julia> my_array = rand(4, 3)
4×3 Array{Float64,2}:
0.980798 0.873643 0.819478
0.341972 0.34974 0.160342
0.262292 0.387406 0.00741398
0.512669 0.81579 0.329353
julia> size(my_array, 1)
4
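For completeness, a few related Base idioms on a plain matrix (a sketch only; m and the other names are illustrative):

```julia
m = zeros(4, 3)

# Ask for one dimension at a time...
nrows = size(m, 1)
ncols = size(m, 2)

# ...or destructure the whole size tuple.
r, c = size(m)

# axes gives the valid row indices, which is handy for loops.
row_range = axes(m, 1)

println((nrows, ncols))
println(length(row_range))
```

size(x, 1) is probably the most portable choice, since it is defined for arrays and data frames alike.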