How to show field values in Julia - oop

I was wondering whether it is possible to access field values in Julia.
For example, this Python program gets the instance variable wealth from the Consumer class:
class Consumer:

    def __init__(self, w):
        "Initialize consumer with w dollars of wealth"
        self.wealth = w

    def earn(self, y):
        "The consumer earns y dollars"
        self.wealth += y

    def spend(self, x):
        "The consumer spends x dollars if feasible"
        new_wealth = self.wealth - x
        if new_wealth < 0:
            print("Insufficient funds")
        else:
            self.wealth = new_wealth

c1 = Consumer(10)  # Create instance with initial wealth 10
c1.spend(5)
c1.wealth
The wealth attribute is now 5. How can I translate this code to Julia?

The simplest approach is pretty much like Python:
mutable struct Consumer
    wealth
end

function earn(c::Consumer, y)
    c.wealth += y
end

function spend(c::Consumer, y)
    c.wealth -= y
end
And now you can use it like:
julia> c1 = Consumer(10)
Consumer(10)
julia> spend(c1, 5)
5
julia> c1.wealth
5
You can read more about composite types in the Julia documentation.
But in Julia you would probably write it like this:
mutable struct ConsumerTyped{T<:Real}
    wealth::T
end

function earn(c::ConsumerTyped, y)
    c.wealth += y
end

function spend(c::ConsumerTyped, y)
    c.wealth -= y
end
On the surface this will work almost the same. The difference is T, which specifies the type of wealth. There are two benefits: you get type control in your code, and the functions will run faster.
Given such a definition the only thing you need to know is that the constructor can be called in two flavors:
c2 = ConsumerTyped{Float64}(10) # explicitly specifies T
c3 = ConsumerTyped(10) # T implicitly derived from the argument
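To illustrate the type control this gives you, here is a minimal sketch (the exact error text may vary by Julia version): assigning a value that cannot be converted to T raises an error, while the Float64-parameterized instance accepts it.
julia> c3 = ConsumerTyped(10)    # T inferred as Int64
ConsumerTyped{Int64}(10)

julia> c3.wealth = 5.5           # 5.5 cannot be converted to Int64
ERROR: InexactError: Int64(5.5)

julia> c2.wealth = 5.5           # fine, since c2 was created with T == Float64
5.5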
Now let us compare the performance of both types:
julia> using BenchmarkTools
julia> c1 = Consumer(10)
Consumer(10)
julia> c2 = ConsumerTyped(10)
ConsumerTyped{Int64}(10)
julia> @benchmark spend(c1, 1)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 56.434 ns (0.00% GC)
median time: 57.376 ns (0.00% GC)
mean time: 60.126 ns (0.84% GC)
maximum time: 847.942 ns (87.69% GC)
--------------
samples: 10000
evals/sample: 992
julia> @benchmark spend(c2, 1)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 29.858 ns (0.00% GC)
median time: 30.791 ns (0.00% GC)
mean time: 32.835 ns (1.63% GC)
maximum time: 966.188 ns (90.20% GC)
--------------
samples: 10000
evals/sample: 1000
and you can see that you get roughly a 2x speedup.
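If you want to see where the difference comes from, one quick check (a sketch; the exact printout depends on your Julia version) is to inspect type inference with @code_warntype. In the untyped Consumer the wealth field is inferred as Any, which forces dynamic dispatch:
julia> @code_warntype spend(c1, 1)   # c1.wealth is inferred as Any (highlighted in the REPL)

julia> @code_warntype spend(c2, 1)   # c2.wealth is inferred as Int64, so the code is type-stable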

Julia doesn't support classes (in terms of OOP).
However, there are composite types which can represent the variables of your class:
mutable struct Consumer
    wealth::Float64
end
Now, since Julia doesn't support classes, all methods have to live outside this type, which allows one of the key features of Julia, multiple dispatch, to also work with user-defined types. (https://docs.julialang.org/en/stable/manual/methods/, https://www.juliabloggers.com/julia-in-ecology-why-multiple-dispatch-is-good/)
Hence, you would have to add a method like this:
function earn!(consumer::Consumer, y::Float64)
    println("The consumer earns y dollars")
    consumer.wealth = consumer.wealth + y
end
(Similarly, the spend function can be implemented.)
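For completeness, a spend! function mirroring the Python version might look like the following minimal sketch (it simply follows the same pattern as earn!):
function spend!(consumer::Consumer, x::Float64)
    new_wealth = consumer.wealth - x
    if new_wealth < 0
        println("Insufficient funds")
    else
        consumer.wealth = new_wealth
    end
end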

Related

systemfit 3SLS Testing for Overidentification Restrictions

Currently I'm struggling to find a good way to perform the Hansen/Sargan tests of overidentification restrictions for a three-stage least squares (3SLS) model on panel data in R. I have been digging all day and couldn't find a way to obtain these tests in R using the well-known systemfit package.
Currently, my code is simple.
violence_c_3sls <- Crime ~ ln_GDP + I(ln_GDP^2) + ln_Gini
income_c_3sls   <- ln_GDP ~ Crime + ln_Gini
gini_c_3sls     <- ln_Gini ~ ln_GDP + I(ln_GDP^2) + Crime

inst <- ~ Educ_Gvmnt_Exp + I(Educ_Gvmnt_Exp^2) + Health_Exp + Pov_Head_Count_1.9

system_c_3sls <- list(violence_c_3sls, income_c_3sls, gini_c_3sls)

fitsur_c_3sls <- systemfit(system_c_3sls, "3SLS", inst = inst, data = df_new,
                           methodResidCov = "noDfCor")
summary(fitsur_c_3sls)
However, adding more instruments to create an over-identified system does not produce the Hansen/Sargan test in the output, so I assume the test has to be computed separately, presumably from the systemfit class object.
Thanks in advance.
With g equations, l exogenous variables, and k regressors, the Sargan test statistic for 3SLS is

$\hat{u}^\top \left(\hat{\Sigma}^{-1} \otimes P_W\right) \hat{u} \;\sim\; \chi^2(gl - k),$

where $\hat{u}$ is the vector of stacked residuals, $\hat{\Sigma}$ is the estimated residual covariance matrix, and $P_W$ is the projection matrix onto the exogenous variables. See Ch. 12.4 of Davidson & MacKinnon, Econometric Theory and Methods.
Calculating the Sargan test from systemfit should look something like this:
# requires the stringr package for str_detect() / str_remove()
library(stringr)

sargan.systemfit <- function(results3sls) {
  result <- list()

  u     <- as.matrix(resid(results3sls))  # model residuals, n x n_eq
  n_eq  <- length(results3sls$eq)         # number of equations
  n     <- nrow(u)                        # number of observations
  n_reg <- length(coef(results3sls))      # total number of regressors

  # Matrix of instruments, in block-diagonal form with one block per equation
  w <- model.matrix(results3sls, which = 'z')

  # Need to aggregate into a single block (in case different instruments are used per equation)
  w_list <- lapply(X = 1:n_eq, FUN = function(eq_i) {
    this_eq_label <- results3sls$eq[[eq_i]]$eqnLabel
    this_w <- w[str_detect(rownames(w), this_eq_label), str_detect(colnames(w), this_eq_label)]
    colnames(this_w) <- str_remove(colnames(this_w), paste0(this_eq_label, '_'))
    return(this_w)
  })
  w <- do.call(cbind, w_list)
  w <- w[, !duplicated(colnames(w))]
  n_inst <- ncol(w)  # w is n x n_inst, where n_inst is the number of unique instruments/exogenous variables

  # Estimate the residual covariance (or use residCov, which should be asymptotically equivalent)
  var_u <- crossprod(u) / n  # var_u <- results3sls$residCov
  P_w <- w %*% solve(crossprod(w)) %*% t(w)  # projection matrix on the instruments w

  # as.numeric(u) vectorizes the residuals into an n_eq*n x 1 vector
  result$statistic <- as.numeric(t(as.numeric(u)) %*% kronecker(solve(var_u), P_w) %*% as.numeric(u))
  result$df <- n_inst * n_eq - n_reg
  result$p.value <- 1 - pchisq(result$statistic, result$df)
  result$method <- "Sargan over-identifying restrictions test"
  return(result)
}

How to accelerate my Python code: a function containing nested functions for classifying points by polygons

I have written the following NumPy code in Python:
import numpy as np

def inbox_(points, polygon):
    """ Finding points in a region """
    ll = np.amin(polygon, axis=0)  # lower limit
    ur = np.amax(polygon, axis=0)  # upper limit
    in_idx = np.all(np.logical_and(ll <= points, points < ur), axis=1)  # points in the range [boolean]
    return in_idx

def operation_(r, gap, ends_ind):
    """ calculation formula which is applied on the points specified by inbox_ function """
    r_active = np.take(r, ends_ind)               # taking values from "r" based on indices and shape (paired values) of "ends_ind"
    r_sub = np.subtract.reduce(r_active, axis=1)  # subtracting each paired "r" determined by "ends_ind" [used in the final return formula]
    r_add = np.add.reduce(r_active, axis=1)       # adding each paired "r" determined by "ends_ind" [used in the final return formula]
    paired_cent_dis = np.sum((r_add, gap), axis=0)  # distance between each two paired points
    return (np.power(gap, 2) * (np.power(paired_cent_dis, 2) +
            5 * paired_cent_dis * r_add - 7 * np.power(r_sub, 2))) / (3 * paired_cent_dis)  # formula

def elapses(r, pos, gap, ends_ind, elem_vert, contact_poss):
    if len(gap) > 0:
        elaps = np.empty([len(elem_vert), ], dtype=object)
        operate_ = operation_(r, gap, ends_ind)
        # elbav = np.empty([len(elem_vert), ], dtype=object)
        # con_num = 0
        for i, j in enumerate(elem_vert):         # loop over each section (cell or region) of a mesh
            in_bool = inbox_(contact_poss, j)     # boolean array for points within that section
            elaps[i] = np.sum(operate_[in_bool])  # perform the calculations on those points and sum them for each section
            operate_ = operate_[np.invert(in_bool)]          # delete the points already handled to speed up the next iterations
            contact_poss = contact_poss[np.invert(in_bool)]  # as above
            # con_num += sum(inbox_(contact_poss, j))
            # inba_bool = inbox_(pos, j)
            # elbav[i] = 4 * np.pi * np.sum(np.power(r[inba_bool], 3)) / 3
            # pos = pos[np.invert(inba_bool)]
            # r = r[np.invert(inba_bool)]
        return elaps

r = np.load('a.npy')
pos = np.load('b.npy')
gap = np.load('c.npy')
ends_ind = np.load('d.npy')
elem_vert = np.load('e.npy')
contact_poss = np.load('f.npy')

elapses(r, pos, gap, ends_ind, elem_vert, contact_poss)

# a --------r-------> parameter corresponding to each coordinate (point); here radius            shape (23605,),    float64
# b -------pos------> coordinates of the points                                                  shape (23605, 3),  float64
# c -------gap------> treating points as spheres with radii [r], the maximum overlap length      shape (103832,),   float64
# d ----ends_ind----> indices of each pair of overlapping spheres                                shape (103832, 2), int64
# e ---elem_vert----> vertices of the mesh's sections or cells                                   shape (2000, 8, 3), float64
# f --contact_poss--> a coordinate between the paired spheres                                    shape (103832, 3), float64
This code will be called frequently from another code with big-data inputs, so speeding it up is essential. I have tried to use the jit decorator from the JAX and Numba libraries to accelerate the code, but I could not get it to work properly. I have tested the code on Colab for speed (on 3 data sets with loop counts of 20, 250, and 2000) and the results were:
11 ms, 47 ms, 6.62 s (per loop)  <-- without the commented code lines
137 ms, 1.66 s, 4 min (per loop) <-- with the commented code lines activated
What this code does is find some coordinates within a range and then perform some calculations on them.
I would greatly appreciate any answer that can speed up this code significantly (I believe it can be done). I would also be grateful for any experienced recommendations about speeding up the code by changing (substituting) the NumPy methods used, or by rewriting the math operations.
Notes:
The proposed answers must be executable with Python 2 (being applicable in both versions 2 and 3 would be excellent).
The commented lines in the code are unnecessary for the main aim and are included just for further evaluation. Any recommendation on how to handle these lines in the proposed answers is appreciated (but not required).
Data sets for test:
small data set: https://drive.google.com/file/d/1CswjyoqS8ogLmLQa_oNTOj221chDcbK8/view?usp=sharing
medium data set: https://drive.google.com/file/d/14RJ0Ackx88NzQWloops5FagzuNQYDSrh/view?usp=sharing
large data set: https://drive.google.com/file/d/1dJnXpb3HiAGcRC9PPTwui9joNcij4E_E/view?usp=sharing
First of all, the algorithm can be improved to be much more efficient. Indeed, a polygon can be directly assigned to each point. This is like a classification of points by polygons. Once the classification is done, you can perform one or many reductions by key, where the key is the polygon ID.
This new algorithm consists of:
computing all the bounding boxes of the polygons;
classifying the points by polygons;
performing the reduction by key (where the key is the polygon ID).
This approach is much more efficient than iterating over all the points for each polygon and filtering the attribute arrays (e.g. operate_ and contact_poss). Indeed, filtering is an expensive operation since it requires the target array (which may not fit in the CPU caches) to be fully read and then written back. Not to mention that this operation requires a temporary array to be allocated/deleted if it is not performed in-place, and the operation cannot benefit from SIMD instructions on most x86/x86-64 platforms (as it requires the new AVX-512 instruction set). It is also harder to parallelize, since the filtering steps are too fast for threads to be useful, yet the steps need to be done sequentially.
Regarding the implementation of the algorithm, Numba can be used to speed up the overall computation a lot. The main benefit of using Numba is to drastically reduce the number of expensive temporary arrays created by NumPy in your current implementation. Note that you can specify the function types to Numba so that it compiles the functions when they are defined. Assertions can be used to make the code more robust and to help the compiler know the size of a given dimension, so that it can generate significantly faster code (Numba's JIT compiler can unroll the loops). Ternary operators can help the JIT compiler a bit to generate a faster, branch-less program.
Note that the classification can easily be parallelized using multiple threads. However, one needs to be very careful about constant propagation, since some critical constants (like the shape of the working arrays and assertions) tend not to be propagated to the code executed by the threads, while this propagation is critical to optimizing the hot loops (e.g. vectorization, unrolling). Note also that creating many threads can be expensive on machines with many cores (from 10 ms to 0.1 ms). Thus, it is often better to use the parallel implementation only on big input data.
Here is the resulting implementation (working with both Python 2 and Python 3):
import numpy as np
import numba as nb

@nb.njit('float64[::1](float64[::1], float64[::1], int64[:,::1])')
def operation_(r, gap, ends_ind):
    """ calculation formula which is applied on the points specified by findMatchingPolygons_ function """
    nPoints = ends_ind.shape[0]
    assert ends_ind.shape[1] == 2
    assert gap.size == nPoints

    formula = np.empty(nPoints, dtype=np.float64)

    for i in range(nPoints):
        ind0, ind1 = ends_ind[i]
        r0, r1 = r[ind0], r[ind1]
        r_sub = r0 - r1
        r_add = r0 + r1
        cur_gap = gap[i]
        paired_cent_dis = r_add + cur_gap
        formula[i] = (cur_gap**2 * (paired_cent_dis**2
                      + 5 * paired_cent_dis * r_add
                      - 7 * r_sub**2)) / (3 * paired_cent_dis)

    return formula


# Use `parallel=True` for a parallel implementation
@nb.njit('int32[::1](float64[:,::1], float64[:,:,::1])')
def findMatchingPolygons_(points, polygons):
    """ Assign a region (polygon) to each point """
    nPolygons = polygons.shape[0]
    nPolygonPoints = polygons.shape[1]
    nPoints = points.shape[0]
    assert points.shape[1] == 3
    assert polygons.shape[2] == 3

    # Compute the bounding boxes of all polygons
    ll = np.empty((nPolygons, 3), dtype=np.float64)
    ur = np.empty((nPolygons, 3), dtype=np.float64)

    for i in range(nPolygons):
        ll_x, ll_y, ll_z = polygons[i, 0]
        ur_x, ur_y, ur_z = polygons[i, 0]

        for j in range(1, nPolygonPoints):
            x, y, z = polygons[i, j]
            ll_x = x if x < ll_x else ll_x
            ll_y = y if y < ll_y else ll_y
            ll_z = z if z < ll_z else ll_z
            ur_x = x if x > ur_x else ur_x
            ur_y = y if y > ur_y else ur_y
            ur_z = z if z > ur_z else ur_z

        ll[i] = ll_x, ll_y, ll_z
        ur[i] = ur_x, ur_y, ur_z

    # Find for each point its corresponding polygon
    pointPolygonId = np.empty(nPoints, dtype=np.int32)

    # Use `nb.prange(nPoints)` for a parallel implementation
    for i in range(nPoints):
        x, y, z = points[i, 0], points[i, 1], points[i, 2]
        pointPolygonId[i] = -1

        for j in range(polygons.shape[0]):
            if ll[j, 0] <= x < ur[j, 0] and ll[j, 1] <= y < ur[j, 1] and ll[j, 2] <= z < ur[j, 2]:
                pointPolygonId[i] = j
                break

    return pointPolygonId


@nb.njit('float64[::1](float64[:,:,::1], float64[:,::1], float64[::1])')
def computeSections_(elem_vert, contact_poss, operate_):
    nPolygons = elem_vert.shape[0]
    elaps = np.zeros(nPolygons, dtype=np.float64)

    pointPolygonId = findMatchingPolygons_(contact_poss, elem_vert)

    for i, polygonId in enumerate(pointPolygonId):
        if polygonId >= 0:
            elaps[polygonId] += operate_[i]

    return elaps


def elapses(r, pos, gap, ends_ind, elem_vert, contact_poss):
    if len(gap) > 0:
        operate_ = operation_(r, gap, ends_ind)
        return computeSections_(elem_vert, contact_poss, operate_)


r = np.load('a.npy')
pos = np.load('b.npy')
gap = np.load('c.npy')
ends_ind = np.load('d.npy')
elem_vert = np.load('e.npy')
contact_poss = np.load('f.npy')

elapses(r, pos, gap, ends_ind, elem_vert, contact_poss)
Here are the results on an old 2-core machine (i7-3520M):
Small dataset:
- Original version: 5.53 ms
- Proposed version (sequential): 0.22 ms (x25)
- Proposed version (parallel): 0.20 ms (x27)
Medium dataset:
- Original version: 53.40 ms
- Proposed version (sequential): 1.24 ms (x43)
- Proposed version (parallel): 0.62 ms (x86)
Big dataset:
- Original version: 5742 ms
- Proposed version (sequential): 144 ms (x40)
- Proposed version (parallel): 67 ms (x86)
Thus, the proposed implementation is up to 86 times faster than the original one.

Why do the trigonometric functions in Julia seem to be slower than in NumPy?

I'm new to Julia, so I may be doing something wrong. But I ran a simple test of trigonometric functions, and Julia seems to be significantly slower than NumPy. I need some help to see why.
--- Julia version:
x = rand(100000);
y = similar(x);
@time y .= sin.(x);
--- Numpy version:
import numpy
x = numpy.random.rand(100000)
y = numpy.zeros(x.shape)
%timeit y = numpy.sin(x)
The Julia version regularly gives 1.3 ~ 1.5 ms, but the Numpy version usually gives 0.9 ~ 1 ms. The difference is quite significant. Why is that? Thanks.
x = rand(100000);
y = similar(x);
f(x, y) = (y .= sin.(x));
@time f(x, y)
@time f(x, y)
@time f(x, y)
Gives
julia> @time y .= sin.(x);
0.123145 seconds (577.97 k allocations: 29.758 MiB, 5.70% gc time)
julia> @time y .= sin.(x);
0.000515 seconds (6 allocations: 192 bytes)
julia> @time y .= sin.(x);
0.000512 seconds (6 allocations: 192 bytes)
The first time you call a function, Julia compiles it. A broadcast expression generates and uses an anonymous function, so if you broadcast in global scope it will be compiled each time. Julia works best inside function scopes.
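If you want to benchmark this without the compilation noise, one option is the BenchmarkTools package (a minimal sketch; the $ interpolation avoids also measuring global-variable access):
using BenchmarkTools

x = rand(100000);
y = similar(x);

@btime $y .= sin.($x);   # runs the broadcast many times and reports the minimum time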

Which is faster for file IO in Julia: read(), readline(), or readlines()?

Please correct me if I'm wrong. I assume read is efficient because:
a) read fetches the whole file content into memory in one go, similar to Python.
b) readline and readlines bring one line at a time into memory.
To expand on the comment, here is an example benchmark (which also shows how you can perform such tests yourself).
First create some random test data:
open("testdata.txt", "w") do f
for i in 1:10^6
println(f, "a"^100)
end
end
We will want to read in the data in four ways (and calculate the aggregate length of lines):
f1() = sum(length(l) for l in readlines("testdata.txt"))

f2() = sum(length(l) for l in eachline("testdata.txt"))

function f3()
    s = 0
    open("testdata.txt") do f
        while !eof(f)
            s += length(readline(f))
        end
    end
    s
end

function f4()
    s = 0
    for c in read("testdata.txt", String)
        s += c != '\n' # assume Linux line endings for simplicity
    end
    s
end
Now we compare the performance and memory usage of the given options:
julia> using BenchmarkTools
julia> @btime f1()
239.857 ms (2001558 allocations: 146.59 MiB)
100000000
julia> @btime f2()
179.480 ms (2001539 allocations: 137.59 MiB)
100000000
julia> @btime f3()
189.643 ms (2001533 allocations: 137.59 MiB)
100000000
julia> @btime f4()
158.055 ms (13 allocations: 96.32 MiB)
100000000
If you run it on your machine you should get similar results.

Optimising Sparse Array Math

I have a sparse array: term_doc
Its size is 622256×715, with Float64 entries. It is very sparse:
Of its ~444,913,040 cells, only about 22,215 are normally nonzero.
Of the 622,256 rows only 4,699 are occupied,
though all of the 715 columns are occupied.
The operation I would like to perform can be described as returning the row-normalized and column-normalized versions of this matrix.
The naive non-sparse version I wrote is:
function doUnsparseWay()
    gc() # force garbage collection before I start (and periodically during); this uses a lot of memory
    term_doc
    N = term_doc ./ sum(term_doc, 1)
    println("N done")
    gc()
    P = term_doc ./ sum(term_doc, 2)
    println("P done")
    gc()
    N[isnan(N)] = 0.0
    P[isnan(P)] = 0.0
    N, P, term_doc
end
Running this:
@time N, P, term_doc = doUnsparseWay()
outputs:
N done
P done
elapsed time: 30.97332475 seconds (14466 MB allocated, 5.15% gc time in 13 pauses with 3 full sweep)
It is fairly simple.
It chews through memory, and will crash if garbage collection does not occur at the right times (thus I call it manually).
But it is fairly fast.
I wanted to get it to work on the sparse matrix,
so as not to chew up my memory,
and because logically it is a faster operation: fewer cells need operating on.
I followed suggestions from this post and from the performance page of the docs.
function doSparseWay()
    term_doc::SparseMatrixCSC{Float64,Int64}
    N = spzeros(size(term_doc)...)
    N::SparseMatrixCSC{Float64,Int64}
    for (doc, total_terms::Float64) in enumerate(sum(term_doc, 1))
        if total_terms == 0
            continue
        end
        @fastmath @inbounds N[:, doc] = term_doc[:, doc] ./ total_terms
    end
    println("N done")

    P = spzeros(size(term_doc)...)'
    P::SparseMatrixCSC{Float64,Int64}
    gfs = sum(term_doc, 2)[:]
    gfs::Array{Float64,1}
    nterms = size(term_doc, 1)
    nterms::Int64

    term_doc = term_doc'
    @inbounds @simd for term in 1:nterms
        @fastmath @inbounds P[:, term] = term_doc[:, term] / gfs[term]
    end
    println("P done")
    P = P'

    N[isnan(N)] = 0.0
    P[isnan(P)] = 0.0
    N, P, term_doc
end
It never completes.
It gets as far as outputting "N done",
but never outputs "P done".
I have left it running for several hours.
How can I optimize it so that it completes in a reasonable time?
Or, if this is not possible, explain why.
First, you're making term_doc a global variable, which is a big problem for performance. Pass it as an argument, doSparseWay(term_doc::SparseMatrixCSC). (The type annotation at the beginning of your function does not do anything useful.)
You want to use an approach similar to the answer by walnuss:
function doSparseWay(term_doc::SparseMatrixCSC)
    I, J, V = findnz(term_doc)
    normI = sum(term_doc, 1)
    normJ = sum(term_doc, 2)
    NV = similar(V)
    PV = similar(V)
    for idx = 1:length(V)
        NV[idx] = V[idx] / normI[J[idx]]
        PV[idx] = V[idx] / normJ[I[idx]]
    end
    m, n = size(term_doc)
    sparse(I, J, NV, m, n), sparse(I, J, PV, m, n), term_doc
end
This is a general pattern: when you want to optimize something for sparse matrices, extract the I, J, V and perform all your computations on V.
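For illustration, here is a minimal, self-contained sketch of that pattern on a toy matrix (the matrix A and the "multiply by 2" operation are made up for the example; on Julia 1.x, sparse and findnz live in the SparseArrays standard library):
using SparseArrays

A = sparse([1, 2, 2], [1, 1, 3], [10.0, 20.0, 30.0])  # 2x3 sparse matrix with 3 stored values
I, J, V = findnz(A)              # I = [1, 2, 2], J = [1, 1, 3], V = [10.0, 20.0, 30.0]
W = 2 .* V                       # operate only on the stored values
B = sparse(I, J, W, size(A)...)  # rebuild a sparse matrix with the same structure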