Julia: invalid assignment location when creating function to subset dataframe

I am creating a function in Julia. It takes a dataframe (called window) and two strings (A and B) as inputs and subsets it using the variables given:
function calcs(window, A, B):
    fAB=size(window[(window[:ref].==A).&(window[:alt].==B),:])[1]
end
But I get the error:
syntax: invalid assignment location ":fAB"
Stacktrace:
[1] include_string(::String, ::String) at ./loading.jl:522
I have tried running the code outside of a function, having pre-assigned the variables A="T" and B="C", like so:
fAB=size(window[(window[:ref].==A).&(window[:alt].==B),:])[1]
and this runs fine. I am new to Julia but cannot find an answer to this question. Can anyone help?

It seems you come from the Python world. In Julia, the first line of a function definition does not end with a colon. This works fine:
function calcs(window, A, B)
    fAB=size(window[(window[:ref].==A).&(window[:alt].==B),:])[1]
end
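When Julia encounters : at the end of the first line of a function definition, it continues parsing the expression on the following line, producing the symbol :fAB. The assignment then targets that symbol rather than a variable, hence the error about an invalid assignment location.
EDIT: In Julia 0.7 this problem is detected by the parser. This is the result of copy-pasting your original code into the REPL: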
julia> function calcs(window, A, B):
fAB=size(window[(window[:ref].==A).&(window[:alt].==B),:])[1]
ERROR: syntax: space not allowed after ":" used for quoting
julia> end
ERROR: syntax: unexpected "end"
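For reference, here is a minimal runnable sketch of the corrected function. It assumes a recent DataFrames.jl, where a column is accessed as window.ref rather than window[:ref]. The sample data and the use of count in place of size(...)[1] are illustrative choices, not part of the original code.

```julia
using DataFrames

# Count the rows where ref equals A and alt equals B.
# Note: no trailing colon after the argument list.
function calcs(window, A, B)
    return count((window.ref .== A) .& (window.alt .== B))
end

# Hypothetical sample data, for illustration only.
window = DataFrame(ref = ["T", "T", "G"], alt = ["C", "A", "C"])
fAB = calcs(window, "T", "C")
println(fAB)
```

Counting the true entries of the boolean mask avoids building the subsetted data frame just to ask for its row count.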

Pandas iloc boolean indexing

I am reading the pandas docs:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing
There is a warning section that says "Warning: iloc supports two kinds of boolean indexing....." But the text and example give only one valid form: df.iloc[s.values, 1]. The other form is described as an error: "df.iloc[s, 1] would raise ValueError".
So I am confused: where is the second kind of boolean indexing that iloc supports? Or does the documentation count the invalid form as 'supported'?
Well, according to the documentation:
df.iloc[s.values, 1] gives a valid output because s.values returns an ndarray of boolean values, so this is the first form.
df[s].iloc[:, 1] also gives a valid output, so "maybe" it is the second form, but this is not mentioned in the official documentation.
Judging by the example given in the documentation, though, I think the second form refers to the .loc accessor, i.e. df.loc[s, 'B'], because df.loc[s, 'B'] is the example given there.
The documentation also mentions that df.iloc[s, 1] raises an error: s is a Series of boolean values, and iloc, being purely position-based, does not accept a boolean Series as an indexer.

Reading in non-consecutive columns using XLSX.gettable?

Is there a way to read in a selection of non-consecutive columns of Excel data using XLSX.gettable? I’ve read the documentation (the XLSX.jl Tutorial), but it’s not clear whether this is possible. For example,
df = DataFrame(XLSX.gettable(sheet,"A:B")...)
selects the data in columns “A” and “B” of a worksheet called sheet. But what if I want columns A and C, for example? I tried
df = DataFrame(XLSX.gettable(sheet,["A","C"])...)
and similar variations of this, but it throws the following error: MethodError: no method matching gettable(::XLSX.Worksheet, ::Array{String,1}).
Is there a way to make this work with gettable, or is there a similar function which can accomplish this?
I don't think this is possible with the current version of XLSX.jl.
If you look at the definition of gettable, you'll see that it calls
eachtablerow(sheet, cols;...)
which is defined as accepting Union{ColumnRange, AbstractString} as input for the cols argument. The cols argument itself is converted to a ColumnRange object in the eachtablerow function, and ColumnRange is defined as:
struct ColumnRange
    start::Int # column number
    stop::Int # column number
    function ColumnRange(a::Int, b::Int)
        @assert a <= b "Invalid ColumnRange. Start column must be located before end column."
        return new(a, b)
    end
end
So it looks to me like only ranges of consecutive columns are supported.
To get around this, you should be able to broadcast the gettable function over your column ranges and then concatenate the resulting DataFrames:
df = reduce(hcat, DataFrame.(XLSX.gettable.(sheet, ["A:B", "D:E"])))
I found that to get @Nils Gudat's answer to work you need to add the ... operator to give
reduce(hcat, [DataFrame(XLSX.gettable(sheet, x)...) for x in ["A:B", "D:E"]])
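The concatenation step can be sketched without a workbook. In this minimal sketch, read_range is a hypothetical stand-in for DataFrame(XLSX.gettable(sheet, x)...) and returns made-up data; only the reduce(hcat, ...) part mirrors the approach above.

```julia
using DataFrames

# Hypothetical stand-in for DataFrame(XLSX.gettable(sheet, cols)...):
# returns a small made-up table for each column range.
function read_range(cols::AbstractString)
    if cols == "A:B"
        return DataFrame(a = [1, 2], b = ["x", "y"])
    else
        return DataFrame(d = [0.1, 0.2], e = [true, false])
    end
end

# Read each consecutive range separately, then glue the pieces side by side.
pieces = [read_range(x) for x in ["A:B", "D:E"]]
df = reduce(hcat, pieces)
println(names(df))
```

Note that hcat expects the pieces to have the same number of rows and distinct column names.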

How to efficiently append a dataframe column with a vector?

Working with Julia 1.1:
The following minimal code works and does what I want:
using DataFrames

function test()
    df = DataFrame(NbAlternative = Int[], NbMonteCarlo = Int[], Similarity = Float64[])
    append!(df.NbAlternative, ones(Int, 5))
    df
end
This appends a vector to one column of df. Note: in my full code, I append a more complicated Vector{Int} than the one returned by ones.
However, @code_warntype test() returns:
%8 = invoke DataFrames.getindex(%7::DataFrame, :NbAlternative::Symbol)::AbstractArray{T,1} where T
Which, I suppose, means this isn't efficient. I don't understand what this @code_warntype output means. More generally, how can I understand the issues reported by @code_warntype and fix them? This is a recurring point of confusion for me.
EDIT, following @BogumiłKamiński's answer:
How would one then write the following code?
for na in arr_nb_alternative
    @show na
    for mt in arr_nb_montecarlo
        println("...$mt")
        append!(df.NbAlternative, ones(Int, nb_simulations)*na)
        append!(df.NbMonteCarlo, ones(Int, nb_simulations)*mt)
        append!(df.Similarity, compare_smaa(na, nb_criteria, nb_simulations, mt))
    end
end
compare_smaa returns a vector of length nb_simulations.
You should never append to a single column like this, as it leaves the columns with inconsistent lengths and will cause many functions from DataFrames.jl to stop working properly. In fact, such code will soon throw an error; see https://github.com/JuliaData/DataFrames.jl/issues/1844, which is exactly an attempt to patch this hole in the DataFrames.jl design.
What you should do is append a data frame-like object to a DataFrame using the append! function (this guarantees that the result has consistent column lengths), or use push! to add a single row to a DataFrame.
Now, the reason you have a type instability is that a DataFrame can hold vectors of any type (technically, columns are held in a Vector{AbstractVector}), so it is not possible to determine at compile time what the type of the vector under a given name will be.
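One common way to keep this instability out of hot code is a function barrier: fetch the column once, then pass it to a separate function, which Julia compiles for the column's concrete type. A minimal sketch follows. The names sum_positive and process and the sample data are hypothetical, and the sketch does not address the column-length problem described above.

```julia
using DataFrames

# Inner function: compiled separately for each concrete vector type it receives.
function sum_positive(v::AbstractVector)
    total = zero(eltype(v))
    for x in v
        if x > 0
            total += x
        end
    end
    return total
end

# Outer function: the column type is unknown to the compiler here,
# but the dynamic dispatch happens only once, at the call below.
function process(df::DataFrame)
    return sum_positive(df.NbAlternative)
end

df = DataFrame(NbAlternative = [3, -1, 4], Similarity = [0.5, 0.2, 0.9])
println(process(df))
```

Running @code_warntype on sum_positive with a Vector{Int} argument should then show concrete types inside the loop, even though process itself still reports an abstract column type.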
EDIT
What you ask for is a typical scenario that DataFrames.jl supports well, and I do it almost every day (as I do a lot of simulations). As I have indicated, you can use either push! or append!. Use push! to add a single run of a simulation (this is not your case, but I add it as it is also very common):
for na in arr_nb_alternative
    @show na
    for mt in arr_nb_montecarlo
        println("...$mt")
        for i in 1:nb_simulations
            # here you have to make sure that compare_smaa returns a scalar
            # if it is passed 1 in nb_simulations
            push!(df, (na, mt, compare_smaa(na, nb_criteria, 1, mt)))
        end
    end
end
And this is how you can use append!:
for na in arr_nb_alternative
    @show na
    for mt in arr_nb_montecarlo
        println("...$mt")
        # here you have to make sure that compare_smaa returns a vector
        append!(df, (NbAlternative=ones(Int, nb_simulations)*na,
                     NbMonteCarlo=ones(Int, nb_simulations)*mt,
                     Similarity=compare_smaa(na, nb_criteria, nb_simulations, mt)))
    end
end
Note that I append a NamedTuple here. As I have written earlier, you can append a DataFrame or any data frame-like object this way. A "data frame-like object" covers a broad class of things: in general, anything that you can pass to the DataFrame constructor (so, for example, it can also be a Vector of NamedTuples).
Note that append! adds columns to a DataFrame using name matching, so column names must be consistent between the target and the appended object.
This differs from push!, which also allows pushing a row that does not specify column names (in my example above I show that a Tuple can be pushed).
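To try both calls end to end, here is a minimal self-contained sketch. compare_smaa_stub is a hypothetical stand-in for compare_smaa that returns made-up values, and fill(na, n) is used as an equivalent of ones(Int, n)*na.

```julia
using DataFrames

# Hypothetical stand-in for compare_smaa: returns one value per simulation.
compare_smaa_stub(na, nb_simulations, mt) = fill(na / mt, nb_simulations)

df = DataFrame(NbAlternative = Int[], NbMonteCarlo = Int[], Similarity = Float64[])
nb_simulations = 3

for na in [2, 4], mt in [10, 100]
    # Append a whole block of rows; columns are matched by name.
    append!(df, (NbAlternative = fill(na, nb_simulations),
                 NbMonteCarlo = fill(mt, nb_simulations),
                 Similarity = compare_smaa_stub(na, nb_simulations, mt)))
end

# Add a single row from a plain Tuple; values are matched by position.
push!(df, (5, 1000, 0.25))

println(size(df))
```

Because every call adds complete rows, all three columns keep the same length throughout.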

Julia merging dataframes

I am rather new to Julia, so apologies if this is a rather trivial question. However, I have stumbled upon a problem I cannot resolve, so I would appreciate any help or suggestions.
I have a function which returns a set of values in a dataframe when I run it. Something along the lines of:
(values) = function(data)
I run this function in a loop so that each time it runs, I get a new set of values.
for x = 1:5
(values) = function(data[data[:sub].==[x], :])
end
After the function runs, I then want to put the values that come back into a "master" data frame which has exactly the same column headings and compiles the values that come back on each loop iteration.
This seems infuriatingly tricky to do. I tried using append! as described here:
https://discourse.julialang.org/t/adding-a-new-row-to-a-dataframe/1331/3
But this does not work. For example if I run the following commands
(values_1) = function(data[data[:sub].==[1], :])
(values_2) = function(data[data[:sub].==[2], :])
append!(values_1, values_2)
This fails with the following error message:
MethodError: no method matching append!(::Array{Float64,2}, ::Array{Float64,2})
Closest candidates are:
append!(::PyCall.PyObject, ::Any) at /Users/neil/.julia/v0.5/PyCall/src/PyCall.jl:836
append!(::Array{T,1}, ::Any) at collections.jl:21
append!{T}(::PyCall.PyVector{T}, ::Any) at /Users/neil/.julia/v0.5/PyCall/src/conversions.jl:278
in append!(::DataFrames.DataFrame, ::DataFrames.DataFrame) at /Users/neil/.julia/v0.5/DataFrames/src/dataframe/dataframe.jl:791
in include_string(::String, ::String) at ./loading.jl:441
in include_string(::String, ::String) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
Note that the function does correctly return a dataframe each time I run it; it is just the concatenation of values that is causing problems.
I would appreciate any pointers.
You're looking for vcat, e.g. vcat(values_1, values_2). There might be a better way to do what you're doing, but it's hard to give specific advice without an 'MWE', i.e. an example we can paste into a terminal, run and rewrite.
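In the absence of an MWE, here is a minimal sketch of the loop-and-collect pattern, assuming a recent DataFrames.jl. The function summarize and the sample data are hypothetical stand-ins for the asker's function and data.

```julia
using DataFrames

# Hypothetical stand-in for the function that returns a one-row DataFrame.
function summarize(part::AbstractDataFrame)
    return DataFrame(sub = [part.sub[1]], mean_value = [sum(part.value) / nrow(part)])
end

data = DataFrame(sub = [1, 1, 2, 2], value = [1.0, 3.0, 5.0, 7.0])

# One result per subject, then stack them vertically into a "master" frame.
results = [summarize(data[data.sub .== x, :]) for x in 1:2]
master = reduce(vcat, results)
println(master)
```

Collecting the pieces first and concatenating once avoids growing the master data frame inside the loop.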

Fminbox Constrained Optimisation Julia

Either fminbox or the Optim.autodiff function appear to create a vector of type Array{Dual{Float64},1} when I run the code below, since I get the error "fbellmanind has no method matching...Array{Dual{Float64},1}". I've specified the function fbellmanind to accept Array{Any,1} but with no luck. Any ideas?
function fbargsolve(x::Vector)
    fbellmanind(probc,EV,V,Ind,x,V0,VUnemp0,Vnp,Vp,q,obj,assets,EmpState,i)
    fbellmanfirm(probc,poachedwage,minw,x,jfirm1,jfirm0,Ind,i)
    @inbounds for ia in 1:na
        Vnp[ia]=V[ia]
        Indnp[ia]=Ind[ia]
        firmratio[ia]=jfirm1[ia]/jfirmres[ia]
        hhratio[ia]=((Vnp[ia]-VUnemp0[ia])/(Vp[ia]-VUnemp0[ia]))
    end
    Crit_bwr=vnormdiff(firmratio,hhratio,Inf)
    return Crit_bwr
end
f=fbargsolve
df = Optim.autodiff(f, Float64, na)
x0=vec(bargwage0)
l=vec(max(reswage,minw))
u=vec(poachedwage*ones(na))
sol=fminbox(df,x0,l,u)
See this very important paragraph from the Julia documentation:
Julia’s type parameters are invariant....
This means that Array{Dual{Float64},1} is not a subtype of Array{Any,1}, so a method declared for Array{Any,1} does not match it.
There are at least two possible solutions:
1- Change your function declaration. The best option is to use the right data type, Array{Dual{Float64},1}, explicitly; if you prefer a generic approach, use a parametric type:
julia> function fbellmanind{T}(::Array{T,1})
"OK"
end
julia> fbellmanind(["test"])
"OK"
2- Convert your arguments to the declared type:
julia> function fbellmanind(::Array{Any,1})
"OK"
end
julia> fbellmanind(Any["test"])
"OK"
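The same two options can be sketched in Julia 1.x, where the type parameter is written with where rather than f{T}(...). This is a minimal illustration; the function names are hypothetical and stand in for fbellmanind.

```julia
# Type parameters are invariant: a Vector{Float64} is not a Vector{Any}.
println(Vector{Float64} <: Vector{Any})    # false
println(Vector{Float64} <: Vector{<:Any})  # true

# Option 1: a parametric method accepts a vector of any element type.
accepts_any_eltype(x::Array{T,1}) where {T} = "OK"

# Option 2: a method restricted to Array{Any,1}; callers must convert first.
only_vector_any(x::Array{Any,1}) = "OK"

x = [1.0, 2.0]
println(accepts_any_eltype(x))
println(only_vector_any(convert(Array{Any,1}, x)))
# only_vector_any(x) would throw a MethodError, because x is an Array{Float64,1}.
```

Option 1 is usually preferable for numeric code, since converting to Array{Any,1} discards the element type that the compiler could otherwise use.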