I have a vector (a Base.KeySet to be precise) and I would like to iterate over all pairs of items in the vector. The simplest attempt is a nested loop
for y1 in 1:length(col)
for y2 in y1+1:length(col)
# do work with col[y1] and col[y2]
end
end
(Note that I do not care about the order of the indices within each pair, so I skip e.g. (2,1) because that's redundant with (1,2).)
I learnt I can use eachindex(col) to possibly be more efficient (or at least make the code easier to read) instead of 1:length(col) for the outer loop, but it doesn't help me for the inner loop.
Is there a better way than using the manual loop above?
You want the combinations function from Combinatorics.jl
help?> combinations
search: combinations multiset_combinations CoolLexCombinations with_replacement_combinations
combinations(a, n)
Generate all combinations of n elements from an indexable object a. Because the number of
combinations can be very large, this function returns an iterator object. Use
collect(combinations(a, n)) to get an array of all combinations.
──────────────────────────────────────────────────────────────────────────────────────────────
combinations(a)
Generate combinations of the elements of a of all orders. Chaining of order iterators is
eager, but the sequence at each order is lazy.
This method uses getindex, so you have to turn your Base.KeySet into an Array of whatever your key type is before doing the loop.
julia> d = Dict(:a => 1, :b => 2, :c => 3)
Dict{Symbol,Int64} with 3 entries:
:a => 1
:b => 2
:c => 3
julia> for (i, j) in combinations(collect(keys(d)), 2)
#info "Key combination" i=i j=j
end
┌ Info: Key combination
│ i = :a
└ j = :b
┌ Info: Key combination
│ i = :a
└ j = :c
┌ Info: Key combination
│ i = :b
└ j = :c
Provided type of items in the keyset support isless, this is reasonably clean code
# typeof(col) <: Base.KeySet
for y1 in col
for y2 in col
y1 < y2 || continue
# do work with y1 and y2
end
end
Related
I'm trying to create a categorical variable based on ranges of values from other (numerical) column. However, the code don't work when I have missings in the numerical column
Here is a replicable example:
using RDatasets;
using DataFrames;
using Pipe;
using FreqTables;
df = dataset("datasets","iris")
#lowercase columns just for convenience
#pipe df |> rename!(_, [lowercase(k) for k in names(df)]);
#without this line, the code works fine
#pipe df |> allowmissing!(_, :sepallength) |> replace!(_.sepallength, 4.9 => missing);
df[:size] = #. ifelse(df[:sepallength]<=4.7, "small", missing)
df[:size] = #. ifelse((df[:sepallength]>4.7) & (df[:sepallength]<=4.9), "avg", df[:size])
df[:size] = #. ifelse((df[:sepallength]>4.9) & (df[:sepallength]<=5), "large", df[:size])
df[:size] = #. ifelse(df[:sepallength]>5, "huge", df[:size])
println(#pipe df |> freqtable(_, :size))
Output:
TypeError: non-boolean (Missing) used in boolean context
I would like to ignore the missing cases in the numerical variable but I cannot just drop de missings because this will drop other important informations in my dataset. Moreover, if I drop just the missings in sepallength the column df[:size] would have a different length than the original dataframe.
Use the coalesce function like this:
julia> x = [1,2,3,missing,5,6,7]
7-element Array{Union{Missing, Int64},1}:
1
2
3
missing
5
6
7
julia> #. ifelse(coalesce(x < 4.7, false), "small", missing)
7-element Array{Union{Missing, String},1}:
"small"
"small"
"small"
missing
missing
missing
missing
As a side note do not write df[:size] (this syntax has been deprecated for over 2 years now and soon it will error) but rather df.size or df."size" to access the column of the data frame (the df."size" is for cases when your column names contain characters like spaces etc., e.g. df."my fancy column!").
I think Bogumil's approach is correct and probably best for most situations, but one other option that I like to use is to define my own comparison operators that can deal with missings by returning false if a missing is encountered. Using the unicode capabilities of Julia makes this quite pleasant in my opinion:
julia> ==ₘ(x, y) = ismissing(x) | ismissing(y) ? false : x == y;
julia> >=ₘ(x, y) = ismissing(x) | ismissing(y) ? false : x >= y;
julia> <=ₘ(x, y) = ismissing(x) | ismissing(y) ? false : x <= y;
julia> <ₘ(x, y) = ismissing(x) | ismissing(y) ? false : x < y;
julia> >ₘ(x, y) = ismissing(x) | ismissing(y) ? false : x > y;
julia> x = rand([missing; 1:10], 50)
julia> x .> 10
50-element Array{Union{Missing, Bool},1}
...
julia> x .>ₘ 10
50-element BitArray{1}
...
There are of course downsides to defining such an elementary operator in your own code, particularly using Unicode as well, in terms of your code being harder for other people to read (and potentially even to display correctly!), so I probably wouldn't advocate for this as the standard approach, or something to be used in library code. I do think though that for explorative work it makes life easier.
I feel I'm writing functions needlessly for the following operation of setting several derived columns sequentially:
(defn add-cols[d]
(do
(setv (get d "col0") "0")
(setv (get d "col1") (np.where (> 0 (get d "existing-col")) -1 1))
(setv (get d "col2") (* (get d "col1") (get d "existing-col")))
d
))
The above is neither succinct nor easy to follow. I'd appreciate any help with converting this pattern to a macro. I'm a beginner with macros but am thinking of creating something like so :
(pandas-addcols d
`col0 : "0",
`col1 : (np.where ( > 0 `existing-col) -1 1),
`col2 : (* `col1 `existing-col))
Would appreciate any help or guidance on the above. The final form of the macro can obviously be different too. Ultimately the most repetitive bit is the multiple "setv" and "get" calls and maybe there are more elegant a generic ways to remove those calls.
A little syntactic sugar that can help is to use a shorter name for get and remove the need to quote the string literal. Here's a simple version of $ from this library. Also, Hy's setv already lets you provide more than one target–value pair.
(import
[numpy :as np]
[pandas :as pd])
(defmacro $ [obj key]
(import [hy [HyString]])
`(get (. ~obj loc) (, (slice None) ~(HyString key))))
(setv
d (pd.DataFrame (dict :a [-3 1 3] :b [4 5 6]))
($ d col0) 0
($ d col1) (np.where (> 0 ($ d a)) -1 1))
I'm trying to set up a ampl model which clusters given points in a 2-dimensional space according to the model of Saglam et al(2005). For testing purposes I want to generate randomly some datapoints and then calculate the euclidian distance matrix for them (since I need this one). I'm aware that I could only make the distance matrix without the data points but in a later step the data points will be given and then I need to calculate the distances between each the points.
Below you'll find the code I've written so far. While loading the model I keep getting the error message "i is not defined". Since i is a subscript that should run over x1 and x1 is a parameter which is defined over the set D and have one subscript, I cannot figure out why this code should be invalid. As far as I understand, I don't have to define variables if I use them only as subscripts?
reset;
# parameters to define clustered
param m; # numbers of data points
param n; # numbers of clusters
# sets
set D := 1..m; #points to be clustered
set L := 1..n; #clusters
# randomly generate datapoints
param x1 {D} = Uniform(1,m);
param x2 {D} = Uniform(1,m);
param d {D,D} = sqrt((x1[i]-x1[j])^2 + (x2[i]-x2[j])^2);
# variables
var x {D, L} binary;
var D_l {L} >=0;
var D_max >= 0;
#minimization funcion
minimize max_clus_dis: D_max;
# constraints
subject to C1 {i in D, j in D, l in L}: D_l[l] >= d[i,j] * (x[i,l] + x[j,l] - 1);
subject to C2 {i in D}: sum{l in L} x[i,l] = 1;
subject to C3 {l in L}: D_max >= D_l[l];
So far I tried to change the line form param x1 to
param x1 {i in D, j in D} = ...
as well as
param d {x1, x2} = ...
Alas, nothing of this helped. So, any help someone can offer is deeply appreciated. I searched the web but I found nothing useful for my task.
I found eventually what was missing. The line in which I calculated the parameter d should be
param d {i in D, j in D} = sqrt((x1[i]-x1[j])^2 + (x2[i]-x2[j])^2);
Retrospectively it's clear that the subscripts i and j should have been mentioned on the line, I don't know how I could miss that.
I'd like to solve the following two coupled differential equations numerically:
d/dt Phi_i = 1 - 1/N * \sum_{j=1}^N( k_{ij} sin(Phi_i - Phi_j + a)
d/dt k_{ij} = - epsilon * (sin(Phi_i - Phi_j + b) + k_{ij}
with defined starting conditions phi_0 (1-dim array with N entries) and k_0 (2-dim array with NxN entries)
I tried this: Using DifferentialEquations.js, build a matrix of initial starting conditions u0 = hcat(Phi_0, k_0) (2-dim array, Nx(N+1)), and somehow define that the first equation applies to to first column (in my code [:,1]) , and the second equation applies to the other columns (in my code [:,2:N+1]).
using Distributions
using DifferentialEquations
N = 100
phi0 = rand(N)*2*pi
k0 = rand(Uniform(-1,1), N,N)
function dynamics(du, u, p, t)
a = 0.3*pi
b = -0.53*pi
epsi = 0.01
du[:,1] .= 1 .- 1/N .* sum.([u[i,j+1] * sin(u[i,1] - u[j,1] + a) for i in 1:N, j in 1:N], dims=2)
du[:,2:N+1] .= .- epsi .* [sin(u[i,1] - u[j,1] + b) + u[i,j+1] for i in 1:N, j in 1:N]
end
u0 = hcat(phi0, k0)
tspan = (0.0, 200.0)
prob = ODEProblem(dynamics, u0, tspan)
sol = solve(prob)
Running this lines of code result in this error:
LoadError: DimensionMismatch ("cannot broadcast array to have fewer dimensions")in expression starting at line 47 (which is sol = solve(prob))
I'm new to Julia, and I'm not sure if im heading in the right direction with this. Please help me!
First of all, edit the first package, which is Distributions and not Distribution, it took me a while to find the error xD
The main problem is the .= in your first equation. When you do that, you don't just assign new values to an array, you're making a view. I cannot explain you exactly what is a view, but what I can tell you is that, when you this kind of assign, the left and right side must have the same type.
For example:
N = 100
u = rand(N,N+1)
du = rand(N,N+1)
julia> u[:,1] .= du[:,1]
100-element view(::Array{Float64,2}, :, 1) with eltype Float64:
0.2948248997313967
0.2152933893895821
0.09114453738716022
0.35018616658607926
0.7788869975259098
0.2833659299216609
0.9093344091412392
...
The result is a view and not a Vector. With this syntax, left and right sides must have same type, and that does not happen in your example. Note that the types of rand(5) and rand(5,1) are different in Julia: the first is an Array{Float64,1} and the other is Array{Float64,2}. In your code, d[:,1] is an Array{Float64,1} but 1 .- 1/N .* sum.([u[i,j+1] * sin(u[i,1] - u[j,1] + a) for i in 1:N, j in 1:N], dims=2) is an Array{Float64,2}, that's why it doesn't work. You have two choices, change the equal sign for:
du[:,1] = ...
Or:
du[:,1] .= 1 .- 1/N .* sum.([u[i,j+1] * sin(u[i,1] - u[j,1] + a) for i in 1:N, j in 1:N], dims=2)[:,1]
The first choice is just a basic assign, the second choice uses the view way and matches the types of both sides.
I want to solve the following problem in Haskell:
Let n be a natural number and let A = [d_1 , ..., d_r] be a set of positive numbers.
I want to find all the positive solutions of the following equation:
n = Sum d_i^2 x_i.
For example if n= 12 and the set A= [1,2,3]. I would like to solve the following equation over the natural numbers:
x+4y+9z=12.
It's enough to use the following code:
[(x,y,z) | x<-[0..12], y<-[0..12], z<-[0..12], x+4*y+9*z==12]
My problem is if n is not fixed and also the set A are not fixed. I don't know how to "produce" a certain amount of variables indexed by the set A.
Instead of a list-comprehension you can use a recursive call with do-notation for the list-monad.
It's a bit more tricky as you have to handle the edge-cases correctly and I allowed myself to optimize a bit:
solve :: Integer -> [Integer] -> [[Integer]]
solve 0 ds = [replicate (length ds) 0]
solve _ [] = []
solve n (d:ds) = do
let maxN = floor $ fromIntegral n / fromIntegral (d^2)
x <- [0..maxN]
xs <- solve (n - x * d^2) ds
return (x:xs)
it works like this:
It's keeping track of the remaining sum in the first argument
when there this sum is 0 where are obviously done and only have to return 0's (first case) - it will return a list of 0s with the same length as the ds
if the remaining sum is not 0 but there are no d's left we are in trouble as there are no solutions (second case) - note that no solutions is just the empty list
in every other case we have a non-zero n (remaining sum) and some ds left (third case):
now look for the maximum number that you can pick for x (maxN) remember that x * d^2 should be <= n so the upper limit is n / d^2 but we are only interested in integers (so it's floor)
try all from x from 0 to maxN
look for all solutions of the remaining sum when using this x with the remaining ds and pick one of those xs
combine x with xs to give a solution to the current subproblem
The list-monad's bind will handle the rest for you ;)
examples
λ> solve 12 [1,2,3]
[[0,3,0],[3,0,1],[4,2,0],[8,1,0],[12,0,0]]
λ> solve 37 [2,3,4,6]
[[3,1,1,0],[7,1,0,0]]
remark
this will fail when dealing with negative numbers - if you need those you gonna have to introduce some more cases - I'm sure you figure them out (it's really more math than Haskell at this point)
Some hints:
Ultimately you want to write a function with this signature:
solutions :: Int -> [Int] -> [ [Int] ]
Examples:
solutions 4 [1,2] == [ [4,0], [0,1] ]
-- two solutions: 4 = 4*1^2 + 0*2^2, 4 = 0*1^2 + 1*2^2
solutions 22 [2,3] == [ [1,2] ]
-- just one solution: 22 = 1*2^2 + 2*3^2
solutions 10 [2,3] == [ ]
-- no solutions
Step 2. Define solutions recursively based on the structure of the list:
solutions x [a] = ...
-- This will either be [] or a single element list
solutions x (a:as) = ...
-- Hint: you will use `solutions ... as` here