How to change array elements that are greater than 5 to 5, in one line? - indexing

I would like to take an array x and change all numbers greater than 5 to 5. What is the standard way to do this in one line?
Below is some code that does this in several lines. This question on logical indexing is related but appears to concern selection rather than assignment.
Thanks
x = [1 2 6 7]
for i in 1:length(x)
    if x[i] > 5
        x[i] = 5
    end
end
Desired output:
x = [1 2 5 5]

The broadcast operator . works with any function, including relational operators, and it also works with assignment. Hence an intuitive one-liner is:
x[x .> 5] .= 5
This part x .> 5 broadcasts > 5 over x, resulting in a vector of booleans indicating elements greater than 5. This part .= 5 broadcasts the assignment of 5 across all elements indicated by x[x .> 5].
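To make the two steps visible, here is a small sketch that splits the one-liner into the mask and the assignment. The variable name mask is only for illustration.

```julia
x = [1, 2, 6, 7]

# Step 1: broadcasting the comparison yields a Boolean mask (a BitVector).
mask = x .> 5

# Step 2: broadcast-assign 5 into every position selected by the mask.
x[mask] .= 5
```

After this, mask marks the last two positions and x holds 1, 2, 5, 5.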
However, inspired by the significant speed-up in Benoit's very cool answer below (please do check it out) I decided to also add an optimized variant with a speed test. The above approach, while very intuitive looking, is not optimal because it allocates a temporary array of booleans for the indices. A (more) optimal approach that avoids temporary allocation, and as a bonus will work for any predicate (conditional) function is:
function f_cond!(x::Vector{Int}, f::Function, val::Int)
    @inbounds for n in eachindex(x)
        f(x[n]) && (x[n] = val)
    end
    return x
end
So using this function we would write f_cond!(x, a->a>5, 5) which assigns 5 to any element for which the conditional (anonymous) function a->a>5 evaluates to true. Obviously this solution is not a neat one-liner, but check out the following speed tests:
julia> using BenchmarkTools
julia> x1 = rand(1:10, 100);
julia> x2 = copy(x1);
julia> @btime $x1[$x1 .> 5] .= 5;
327.862 ns (8 allocations: 336 bytes)
julia> @btime f_cond!($x2, a->a>5, 5);
15.067 ns (0 allocations: 0 bytes)
This is just ludicrously faster. Also, you can make the function generic by replacing Int with a type parameter T (adding a matching where {T} clause). Given the speed-up, one might wonder if there is a function in Base that already does this. A one-liner is:
map!(a->a>5 ? 5 : a, x, x)
and while this significantly speeds up over the first approach, it falls well short of the second.
Incidentally, I felt certain this must be a duplicate to another StackOverflow question, but 5 minutes searching didn't reveal anything.
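As a rough sketch of the generic variant mentioned above, the element type can become a type parameter. The name replace_if! and the predicate-first argument order are assumptions made for this illustration, not part of the original answer.

```julia
# Hypothetical generic version of f_cond!: works for any array element type T.
# The predicate comes first so that do-block syntax can also be used.
function replace_if!(pred, x::AbstractArray{T}, val) where {T}
    v = convert(T, val)              # convert once, outside the loop
    @inbounds for n in eachindex(x)
        pred(x[n]) && (x[n] = v)
    end
    return x
end

ints = [1, 2, 6, 7]
replace_if!(a -> a > 5, ints, 5)

floats = [1.5, 7.0, 5.0]
replace_if!(a -> a > 5, floats, 5)   # 5 is converted to 5.0
```

Converting val up front means a value that cannot be represented in T fails immediately rather than partway through the loop.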

You can broadcast min as well:
x .= min.(x, 5)
Note that this is (slightly) more efficient than using x[x .> 5] .= 5 because it does not allocate the temporary array of Booleans, x .> 5, and it can be automatically vectorized, with a single pass over the memory (as per Oscar's comment below):
julia> using BenchmarkTools
julia> x = [1 2 6 7] ; @btime $x .= min.($x, 5) ; # fast, no allocations
19.144 ns (0 allocations: 0 bytes)
julia> x = [1 2 6 7] ; @btime $x[$x .> 5] .= 5 ; # slower, allocates
148.678 ns (5 allocations: 304 bytes)
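The difference between .= and = matters here: .= writes into the existing array, while plain = would bind x to a newly allocated result. A minimal sketch (the name alias is just for illustration):

```julia
x = [1, 2, 6, 7]
alias = x            # second name for the same array, no copy

# In-place: the fused broadcast writes the results back into x's memory.
x .= min.(x, 5)

# Out-of-place: this allocates a fresh array and leaves x untouched.
y = min.(x, 3)
```

Because alias and x refer to the same array, the in-place update is visible through both names.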

Related

How to iterate through all non zero values of a sparse matrix and normal matrix

I am using Julia and I want to iterate over the values of a matrix. The matrix can be either a normal matrix or a sparse matrix, but I do not know which in advance. I would like to write code that works in both cases and is optimised for both.
For simplicity, I made an example that computes the sum of a vector's entries, each multiplied by a random value. What I actually want to do is similar, but instead of multiplying by a random number I call a function that takes a long time to compute.
using SparseArrays
myiterator(m::SparseVector) = m.nzval
myiterator(m::AbstractVector) = m
function sumtimesrand(m)
    a = 0.
    for i in myiterator(m)
        a += i * rand()
    end
    return a
end
I = [1, 4, 3, 5]; V = [1, 2, -5, 3];
Msparse = sparsevec(I,V)
M = rand(5)
sumtimesrand(Msparse)
sumtimesrand(M)
I want my code to work this way. I.e. most of the code is the same and by using the right iterator the code is optimised for both cases (sparse and normal vector).
My question is: is there any iterator that does what I am trying to achieve? In this case the iterator returns the values, but an iterator over the indices would work too.
Cheers,
Dylan
I think you almost had what you are asking for? I.e., change your AbstractVector and SparseVector into AbstractArray and AbstractSparseArray. But maybe I am missing something? See MWE below:
using SparseArrays
using BenchmarkTools # to compare performance
# note the changes here to "Array":
myiterator(m::AbstractSparseArray) = m.nzval
myiterator(m::AbstractArray) = m
function sumtimesrand(m)
    a = 0.
    for i in myiterator(m)
        a += i * rand()
    end
    return a
end
N = 1000
spV = sprand(N, 0.01); V = Vector(spV)
spM = sprand(N, N, 0.01); M = Matrix(spM)
@btime sumtimesrand($spV); # 0.044936 μs
@btime sumtimesrand($V); # 3.919 μs
@btime sumtimesrand($spM); # 0.041678 ms
@btime sumtimesrand($M); # 4.095 ms
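A variation on the same dispatch idea uses the exported nonzeros function from SparseArrays instead of reaching into the nzval field. This is only a sketch: the names stored_values and sum_stored are made up here, and skipping the unstored entries is only equivalent to the dense loop when the function maps zero to zero.

```julia
using SparseArrays

# Hypothetical helper: the values worth visiting for each kind of array.
stored_values(m::AbstractSparseArray) = nonzeros(m)  # only the stored entries
stored_values(m::AbstractArray) = m                  # every entry

# Sum f over the visited values; f stands in for the expensive function.
function sum_stored(f, m)
    acc = 0.0
    for v in stored_values(m)
        acc += f(v)
    end
    return acc
end

sv = sparsevec([1, 4, 3, 5], [1, 2, -5, 3])  # stored entries at indices 1, 3, 4, 5
dv = Vector(sv)                              # dense copy: [1, 0, -5, 2, 3]

s_sparse = sum_stored(abs, sv)
s_dense = sum_stored(abs, dv)
```

Both calls give the same sum here because abs(0) is 0, so the entries skipped in the sparse case contribute nothing.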

Manipulating data in DataFrame: how to calculate the square of a column

I would like to calculate the square of column A (1, 2, 3, 4), process it with some other calculations, and store the result in column C.
using CSV, DataFrames
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
df.C = ((((df.A./2).^2).*3.14)./1000)
Is there an easier way to write it?
I am not sure how much shorter you would want the formula to be, but you can write:
df.C = @. (df.A / 2) ^ 2 * 3.14 / 1000
to avoid having to write . everywhere.
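To see that the macro form only adds the dots for you, here is a small check on a plain range standing in for the column (no DataFrames needed; the names A, dotted and fused are illustrative):

```julia
A = 1:4   # stand-in for df.A

# Every operation dotted by hand, as in the question.
dotted = ((((A ./ 2) .^ 2) .* 3.14) ./ 1000)

# @. turns every call and operator in the expression into its dotted form.
fused = @. (A / 2)^2 * 3.14 / 1000

same = dotted ≈ fused
```

Both expressions produce a 4-element Vector{Float64}, and same is true.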
Or you can use transform!, but it is not shorter (its benefit is that you can use it in a processing pipeline, e.g. using Pipe.jl):
transform!(df, :A => ByRow(a -> (a / 2) ^ 2 * 3.14 / 1000) => :C)
Try this:
df.D = 0.25 * df.A .^ 2 * 0.00314
Explanation:
Not so many parentheses are needed.
Multiplying a scalar by a vector is here as good as the vectorization for short vectors (up to something like 100 elements).
A simple benchmark using BenchmarkTools:
julia> @btime $df.E = .5*$df.A .^2 * 0.00314;
592.085 ns (9 allocations: 496 bytes)
julia> @btime $df.F = @. ($df.A / 2) ^ 2 * 0.00314;
875.490 ns (11 allocations: 448 bytes)
The fastest, however, is a longer version in which you provide the type information, @. (df.A::Vector{Int} / 2) ^ 2 * 0.00314 (again, this matters mostly for short DataFrames; note that the Z column must already exist, so we create it first):
julia> @btime begin $df.Z = Vector{Float64}(undef, nrow(df)); @. $df.Z = ($df.A::Vector{Int} / 2.0) ^ 2.0 * 0.00314; end;
162.564 ns (3 allocations: 208 bytes)

Julia InexactError: Int64

I am new to Julia and got this InexactError. I should mention that I tried converting to float beforehand and it did not work; maybe I am doing something wrong.
column = df[:, i]
max = maximum(column)
min = minimum(column)
scaled_column = (column .- min)/max # This is the error, I think
df[:, i] = scaled_column
julia> VERSION
v"1.4.2"
Hard to give a sure answer without a minimal working example of the problem, but in general an InexactError happens when you try to convert a value to an exact type (like integer types, but unlike floating-point types) in which the original value cannot be exactly represented. For example:
julia> convert(Int, 1.0)
1
julia> convert(Int, 1.5)
ERROR: InexactError: Int64(1.5)
Other programming languages arbitrarily chose some way of rounding here (often truncation but sometimes rounding to nearest). Julia doesn't guess and requires you to be explicit. If you want to round, truncate, take a ceiling, etc. you can:
julia> floor(Int, 1.5)
1
julia> round(Int, 1.5)
2
julia> ceil(Int, 1.5)
2
Back to your problem: you're not calling convert anywhere, so why are you getting a conversion error? There are various situations where Julia will automatically call convert for you, typically when you try to assign a value to a typed location. For example, if you have an array of Ints and you assign a floating point value into it, it will be converted automatically:
julia> v = [1, 2, 3]
3-element Array{Int64,1}:
1
2
3
julia> v[2] = 4.0
4.0
julia> v
3-element Array{Int64,1}:
1
4
3
julia> v[2] = 4.5
ERROR: InexactError: Int64(4.5)
So that's likely what's happening to you: you get non-integer values by doing (column .- min)/max and then you try to assign it into an integer location and you get the error.
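Here is a small sketch of that situation on a plain vector rather than a DataFrame column. The variable names are illustrative, and the data is made up.

```julia
column = [1, 2, 3, 4]                  # Vector{Int}, like an integer column
lo, hi = extrema(column)
scaled = (column .- lo) ./ hi          # Vector{Float64} with fractional values

# Writing the fractional values into Int storage triggers the implicit convert.
int_target = similar(column)           # Int storage of the same length
err = try
    int_target .= scaled
    nothing
catch e
    e
end

# Writing into Float64 storage works, since no lossy conversion is required.
float_target = Vector{Float64}(undef, length(column))
float_target .= scaled
```

Here err ends up holding an InexactError, while float_target receives the scaled values. With a DataFrame, the analogous fix is to replace the column with a floating-point one rather than assigning into the existing integer column.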
As a side note you can use transform! to achieve what you want like this:
transform!(df, i => (x -> (x .- minimum(x)) ./ maximum(x)) => i)
and this operation will replace the column.

Octave: summing indexed elements

The easiest way to describe this is via example:
data = [1, 5, 3, 6, 10];
indices = [1, 2, 2, 2, 4];
result = zeros(1, 5);
I want result(1) to be the sum of all the elements in data whose index is 1, result(2) to be the sum of all the elements in data whose index is 2, etc.
This works but is really slow when applied (changing 5 to 65535) to 64K element vectors:
result = result + arrayfun(@(x) sum(data(indices==x)), 1:5);
I think it's creating 64K vectors with 64K elements that's taking up the time. Is there a faster way to do this? Or do I need to figure out a completely different approach?
for i = [1:5]
    idx = indices(i);
    result(idx) = result(idx) + data(i);
endfor
But that's a very non-octave-y way to do it.
Seeing how MATLAB is very similar to Octave, I will provide an answer that was tested on MATLAB R2016b. Looking at the documentation of Octave 4.2.1 the syntax should be the same.
All you need to do is this:
result = accumarray(indices(:), data(:), [5 1]).'
Which gives:
result =
1 14 0 10 0
Reshaping to a column vector (arrayName(:) ) is necessary because of the expected inputs to accumarray. Specifying the size as [5 1] and then transposing the result was done to avoid some MATLAB error.
accumarray is also described in depth in the MATLAB documentation.

How to specify the format for printing an array of Floats in julia?

I have an array or matrix that I want to print, but only to three digits of precision. How do I do that. I tried the following.
> @printf("%.3f", rand())
0.742
> @printf("%.3f", rand(3))
LoadError: TypeError: non-boolean (Array{Bool,1}) used in boolean context
while loading In[13], in expression starting on line 1
Update: Ideally, I just want to call a function like printx("{.3f}", rand(m, n)) without having to further process my array or matrix.
The OP said:
Update: Ideally, I just want to call a function like printx("{.3f}", rand(m, n)) without having to further process my array or matrix.
This answer to a similar question suggests something like this:
julia> VERSION
v"1.0.0"
julia> using Printf
julia> m = 3; n = 5;
julia> A = rand(m, n)
3×5 Array{Float64,2}:
0.596055 0.0574471 0.122782 0.829356 0.226897
0.606948 0.0312382 0.244186 0.356534 0.786589
0.147872 0.61846 0.494186 0.970206 0.701587
# For this session of the REPL, redefine show function. Next REPL will be back to normal.
# Note %1.3f spec for printf format string to get 3 digits to right of decimal.
julia> Base.show(io::IO, f::Float64) = @printf(io, "%1.3f", f)
# Now we have the 3 digits to the right spec working in the REPL.
julia> A
3×5 Array{Float64,2}:
0.596 0.057 0.123 0.829 0.227
0.607 0.031 0.244 0.357 0.787
0.148 0.618 0.494 0.970 0.702
# The print function prints with 3 decimals as well, but note the semicolons for rows.
# This may not be what was wanted either, but could have a use.
julia> print(A)
[0.596 0.057 0.123 0.829 0.227; 0.607 0.031 0.244 0.357 0.787; 0.148 0.618 0.494 0.970 0.702]
How about this?
julia> print(round.(rand(3); digits=3))
[0.188,0.202,0.237]
I would do it this way:
julia> using Printf
julia> map(x -> @sprintf("%.3f",x), rand(3))
3-element Array{String,1}:
"0.471"
"0.252"
"0.090"
I don't think @printf accepts a list of arguments as you might be expecting.
One solution you could try is to use @sprintf to create formatted strings, but collect them up in a list comprehension. You might then use join to concatenate them together like so (after using Printf):
join([@sprintf "%3.2f" x for x in rand(3)], ", ")
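For the printx-style function the question asks for, one possible sketch builds on Printf.Format and Printf.format, which accept a format string at run time. As far as I know these require Julia 1.6 or later, so this would not work on the 1.0 session shown above. The name printx comes from the question; its signature here is an assumption, and it takes a C-style format such as "%.3f" rather than "{.3f}".

```julia
using Printf

# Hypothetical helper: print a vector or matrix with one format for every entry.
function printx(io::IO, fmt::AbstractString, A::AbstractVecOrMat)
    f = Printf.Format(fmt)                       # parse the format string once
    for i in axes(A, 1)
        # A[i, j] also works for vectors, where the second axis has length 1.
        row = (Printf.format(f, A[i, j]) for j in axes(A, 2))
        println(io, join(row, " "))
    end
end

# Convenience method that prints to standard output.
printx(fmt::AbstractString, A::AbstractVecOrMat) = printx(stdout, fmt, A)

# Capture the output in a buffer to inspect it.
buf = IOBuffer()
printx(buf, "%.3f", [0.5 0.25; 1.0 2.0])
out = String(take!(buf))
```

Unlike redefining Base.show for Float64, this leaves the display of floats elsewhere in the session unchanged.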