How to add a column of simple moving average of another column to a Julia data frame - dataframe

I have a Julia data frame where one column is called 'close' and I want to add another column to the data frame called 'sma' which is a simple moving average of 'close'. Thanks to anyone who can help!
I noticed a problem in the code amrod. It doesn't account for the first length of SMA that doesn't have enough previous data points for a good SMA and also gives double the SMA that is asked for. I changed it to input zeros up to that point, I also changed the variable names when I was figuring out how it works.
function makeSMA(data, SMA)
len = length(data)
y = Vector{Float64}(len)
for i in 1:SMA-1
y[i] = NaN
end
for i in SMA:len
y[i] = mean(data[i-(SMA-1):i])
end
return y
end

check this:
function ma{T <: Real}(x::Vector{T}, wind::Int)
len = length(x)
y = Vector{Float64}(len)
for i in 1:len
lo = max(1, i - wind)
hi = min(len, i + wind)
y[i] = mean(x[lo:hi])
end
return y
end
x = collect(1:100)
y = ma(x, 4)
then you can hcat(x, y).
EDIT:
If you want a backwards-looking MA you can use something like
function ma{T <: Real}(x::Vector{T}, wind::Int)
len = length(x)
y = Vector{Float64}(len)
for i in 1:len
if i < wind
y[i] = NaN
else
y[i] = mean(x[i - wind + 1:i])
end
end
return y
end

Related

How to calculate the number of scatterplot data points in a particular 'region' of the graph

As my questions says I'm trying to find a way to calculate the number of scatterplot data points (pink dots) in a particular 'region' of the graph or either side of the black lines/boundaries. Open to any ideas as I don't even know where to start. Thank you!!
The code:
################################
############ GES ##############
################################
p = fits.open('GES_DR17.fits')
pfeh = p[1].data['Fe_H']
pmgfe = p[1].data['Mg_Fe']
pmnfe = p[1].data['Mn_Fe']
palfe = p[1].data['Al_Fe']
#Calculate [(MgMn]
pmgmn = pmgfe - pmnfe
ax1a.scatter(palfe, pmgmn, c='thistle', marker='.',alpha=0.8,s=500,edgecolors='black',lw=0.3, vmin=-2.5, vmax=0.65)
ax1a.plot([-1,-0.07],[0.25,0.25], c='black')
ax1a.plot([-0.07,1.0],[0.25,0.25], '--', c='black')
x = np.arange(-0.15,0.4,0.01)
ax1a.plot(x,4.25*x+0.8875, 'k', c='black')
Let's call the two axes x and y. Any line in this plot can be written as
a*x + b*y + c = 0
for some value of a,b,c. But if we plug in a points with coordinates (x,y) in to the left hand side of the equation above we get positive value for all points of the one side of the line, and a negative value for the points on the other side of the line. So If you have multiple regions delimited by lines you can just check the signs. With this you can create a boolean mask for each region, and just count the number of Trues by using np.sum.
# assign the coordinates to the variables x and y as numpy arrays
x = ...
y = ...
line1 = a1*x + b1*y + c1
line2 = a2*x + b2*y + c2
mask = (line1 > 0) & (line2 < 0) # just an example, signs might vary
count = np.sum(mask)

Vectorizing np.minimum & np.minimum over axes with broadcasting

I've roughly got something like
A = np.random.random([n, 2])
B = np.random.random([3, 2])
...
ret = 0
for b in B:
for a in A:
start = np.max([a[0], b[0]])
end = np.min([a[1], b[1]])
ret += np.max([0, end - start])
return ret
Putting it into words, A is an input array of n 2D intervals and B is a known array of 2D intervals, and I'm trying to compute the length of total intersection between all intervals.
Is there a way to vectorize it? My first though was using the np.maximize and np.minimize along with broadcasting, but nothing seems to work.
Broadcast after extending dimensions to vectorize things -
p1 = np.maximum(A[:,None,0],B[:,0])
p2 = np.minimum(A[:,None,1],B[:,1])
ret = np.maximum(0,p2-p1).sum()

How to Plot a function of two variables in Julia with pyplot

I'm trying to plot a function of two variables with pyplot in Julia. The working starting-point is the following (found here at StackOverflow):
function f(z,t)
return z*t
end
z = linspace(0,5,11)
t = linspace(0,40,4)
for tval in t
plot(z, f(z, tval))
end
show()
This works right for me and is giving me exactly what I wanted:
a field of lines.
My own functions are as follows:
## needed functions ##
const gamma_0 = 6
const Ksch = 1.2
const Kver = 1.5
function Kvc(vc)
if vc <= 0
return 0
elseif vc < 20
return (100/vc)^0.1
elseif vc < 100
return 2.023/(vc^0.153)
elseif vc == 100
return 1
elseif vc > 100
return 1.380/(vc^0.07)
else
return 0
end
end
function Kgamma(gamma_t)
return 1-((gamma_t-gamma_0)/100)
end
function K(gamma_t, vc)
return Kvc(vc)*Kgamma(gamma_t)*Ksch*Kver
end
I've tried to plot them as follows:
i = linspace(0,45,10)
j = linspace(0,200,10)
for i_val in i
plot(i,K(i,j))
end
This gives me the following Error:
isless has no method matching isless(::Int64, ::Array{Float64,1})
while loading In[51], in expression starting on line 3
in Kvc at In[17]:2 in anonymous at no file:4
Obviously, my function cant deal with an array.
Next try:
i = linspace(0,200,11)
j = linspace(0,45,11)
for i_val in i
plot(i_val,map(K,i_val,j))
end
gives me a empty plot only with axes
Can anybody please give me a hint...
EDIT
A simpler example:
using PyPlot
function P(n,M)
return (M*n^3)/9550
end
M = linspace(1,5,5)
n = linspace(0,3000,3001)
for M_val in M
plot(n,P(n,M_val))
end
show()
Solution
OK, with your help I found this solution for the shortened example which works for me as intended:
function P(n,M)
result = Array(Float64, length(n))
for (idx, val) in enumerate(n)
result[idx] = (M*val^3)/9550
end
return result
end
n = linspace(0,3000,3001)
for M_val = 1:5
plot(n,P(n,M_val))
end
show()
This gives me what I wanted for this shortened example. The remainig question is: could it be done in a simpler more elegant way?
I'll try to apply it to the original example and post it when I'll succed.
I don't completely follow all the details of what you're trying to accomplish, but here are examples on how you can modify a couple of your functions so that they accept and return arrays:
function Kvc(vc)
result = Array(Float64, length(vc))
for (idx, val) in enumerate(vc)
if val <= 0
result[idx] = 0
elseif val < 20
result[idx] = (100/val)^0.1
elseif val < 100
result[idx] = 2.023/(val^0.153)
elseif val == 100
result[idx] = 1
elseif val > 100
result[idx] = 1.380/(val^0.07)
else
result[idx] = 0
end
end
return result
end
function Kgamma(gamma_t)
return ones(length(gamma_t))-((gamma_t - gamma_0)/100)
end
Also, for your loop, I think you probably want something like:
for i_val in i
plot(i_val,K(i_val,j))
end
rather than plot(i, K(i,j), as that would just print the same thing over and over.
< is defined for scalars. I think you need to broadcast it for arrays, i.e. use .<. Example:
julia> x = 2
2
julia> x < 3
true
julia> x < [3 4]
ERROR: MethodError: no method matching isless(::Int64, ::Array{Int64,2})
Closest candidates are:
isless(::Real, ::AbstractFloat)
isless(::Real, ::Real)
isless(::Integer, ::Char)
in <(::Int64, ::Array{Int64,2}) at .\operators.jl:54
in eval(::Module, ::Any) at .\boot.jl:234
in macro expansion at .\REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at .\event.jl:46
julia> x .< [3 4]
1x2 BitArray{2}:
true true

Iterating over multidimensional Numpy array

What is the fastest way to iterate over all elements in a 3D NumPy array? If array.shape = (r,c,z), there must be something faster than this:
x = np.asarray(range(12)).reshape((1,4,3))
#function that sums nearest neighbor values
x = np.asarray(range(12)).reshape((1, 4,3))
#e is my element location, d is the distance
def nn(arr, e, d=1):
d = e[0]
r = e[1]
c = e[2]
return sum(arr[d,r-1,c-1:c+2]) + sum(arr[d,r+1, c-1:c+2]) + sum(arr[d,r,c-1]) + sum(arr[d,r,c+1])
Instead of creating a nested for loop like the one below to create my values of e to run the function nn for each pixel :
for dim in range(z):
for row in range(r):
for col in range(c):
e = (dim, row, col)
I'd like to vectorize my nn function in a way that extracts location information for each element (e = (0,1,1) for example) and iterates over ALL elements in my matrix without having to manually input each locational value of e OR creating a messy nested for loop. I'm not sure how to apply np.vectorize to this problem. Thanks!
It is easy to vectorize over the d dimension:
def nn(arr, e):
r,c = e # (e[0],e[1])
return np.sum(arr[:,r-1,c-1:c+2],axis=2) + np.sum(arr[:,r+1,c-1:c+2],axis=2) +
np.sum(arr[:,r,c-1],axis=?) + np.sum(arr[:,r,c+1],axis=?)
now just iterate over the row and col dimensions, returning a vector, that is assigned to the appropriate slot in x.
for row in <correct range>:
for col in <correct range>:
x[:,row,col] = nn(data, (row,col))
The next step is to make
rows = [:,None]
cols =
arr[:,rows-1,cols+2] + arr[:,rows,cols+2] etc.
This kind of problem has come up many times, with various descriptions - convolution, smoothing, filtering etc.
We could do some searches to find the best, or it you prefer, we could guide you through the steps.
Converting a nested loop calculation to Numpy for speedup
is a question similar to yours. There's only 2 levels of looping, and sum expression is different, but I think it has the same issues:
for h in xrange(1, height-1):
for w in xrange(1, width-1):
new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] +
t * gr[h+1][w-1]-2 * (gr[h][w-1] + t * gr[h-1][w])
Here's what I ended up doing. Since I'm returning the xv vector and slipping it in to the larger 3D array lag, this should speed up the process, right? data is my input dataset.
def nn3d(arr, e):
r,c = e
n = np.copy(arr[:,r-1:r+2,c-1:c+2])
n[:,1,1] = 0
n3d = np.ma.masked_where(n == nodata, n)
xv = np.zeros(arr.shape[0])
for d in range(arr.shape[0]):
if np.ma.count(n3d[d,:,:]) < 2:
element = nodata
else:
element = np.sum(n3d[d,:,:])/(np.ma.count(n3d[d,:,:])-1)
xv[d] = element
return xv
lag = np.zeros(shape = data.shape)
for r in range(1,data.shape[1]-1): #boundary effects
for c in range(1,data.shape[2]-1):
lag[:,r,c] = nn3d(data,(r,c))
What you are looking for is probably array.nditer:
a = np.arange(6).reshape(2,3)
for x in np.nditer(a):
print(x, end=' ')
which prints
0 1 2 3 4 5

Explain np.polyfit and np.polyval for a scatter plot

I have to make a scatter plot and liner fit to my data. prediction_08.Dem_Adv and prediction_08.Dem_Win are two column of datas. I know that np.polyfit returns coefficients. But what is np.polyval doing here? I saw the documentation, but the explanation is confusing. can some one explain to me clearly
plt.plot(prediction_08.Dem_Adv, prediction_08.Dem_Win, 'o')
plt.xlabel("2008 Gallup Democrat Advantage")
plt.ylabel("2008 Election Democrat Win")
fit = np.polyfit(prediction_08.Dem_Adv, prediction_08.Dem_Win, 1)
x = np.linspace(-40, 80, 10)
y = np.polyval(fit, x)
plt.plot(x, y)
print fit
np.polyval is applying the polynomial function which you got using polyfit. If you get y = mx+ c relationship. The np.polyval function will multiply your x values with fit[0] and add fit[1]
Polyval according to Docs:
N = len(p)
y = p[0]*x**(N-1) + p[1]*x**(N-2) + ... + p[N-2]*x + p[N-1]
If the relationship is y = ax**2 + bx + c,
fit = np.polyfit(x,y,2)
a = fit[0]
b = fit[1]
c = fit[2]
If you do not want to use the polyval function:
y = a*(x**2) + b*(x) + c
This will create the same output as polyval.