Force single line of string in VObject - pandas

I am trying to create vCards (email contacts) using the vobject library and pandas for Python.
When serializing, I get line breaks in the NOTE field of my output (there are none in the source data), and each new line created by .serialize() also starts with a space. I need to get rid of both.
Example of output:
BEGIN:VCARD
VERSION:3.0
EMAIL;TYPE=INTERNET:test#example.at
FN:Valentina test
N:Valentina;test;;;
NOTE:PelletiererIn Mitglieder 2 Preiserhebung Aussendung 2 Pressespiegelver
sand 2 GeschäftsführerIn PPA_PelletiererInnen GeschäftsführerIn_Pellet
iererIn
ORG:Test Company
TEL;TYPE=CELL:
TEL;TYPE=CELL:
TEL;TYPE=CELL:
END:VCARD
Is there a way that I can force the output in a single line?
import vobject  # df is a pandas DataFrame loaded earlier (e.g. with pd.read_csv)

output = ""
for _, row in df.iterrows():
    j = vobject.vCard()
    j.add('n')
    # note: Name(family, given, ...) takes the family name first
    j.n.value = vobject.vcard.Name(row["First Name"], row["Last Name"])
    j.add('fn')
    j.fn.value = str(row["First Name"]) + " " + str(row["Last Name"])
    o = j.add("email")
    o.value = str(row["E-mail Address"])
    o.type_param = "INTERNET"
    #o = j.add("email")
    #o.value = str(row["E-mail 2 Address"])
    #o.type_param = "INTERNET"
    j.add('org')
    j.org.value = [row["Organization"]]
    k = j.add("tel")
    k.value = str(row["Home Phone"])
    k.type_param = "CELL"
    k = j.add("tel")
    k.value = str(row["Business Phone"])
    k.type_param = "CELL"
    k = j.add("tel")
    k.value = str(row["Mobile Phone"])
    k.type_param = "CELL"
    j.add("note")
    j.note.value = row["Notiz für Kontaktexport"]
    output += j.serialize()
print(output)
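The wrapping is vCard line folding rather than a vobject quirk: the vCard specifications (RFC 2425/2426 for VERSION:3.0, RFC 6350 for 4.0) say content lines should not be longer than 75 octets, and a long line is split by inserting a CRLF followed by a single space or tab. Readers are expected to undo this, so most contact apps import the folded NOTE correctly. If one physical line per property is really needed, one option is to unfold the serialized text afterwards. A minimal sketch, where `unfold` is a hypothetical helper name; note that the resulting file is no longer strictly spec-compliant, which some importers may not accept:

```python
import re

# A folded line break is CRLF (or, leniently, a bare LF) followed by exactly
# one space or tab; unfolding simply deletes that sequence.
_FOLD = re.compile(r"\r?\n[ \t]")


def unfold(vcard_text: str) -> str:
    """Join folded vCard content lines back into single physical lines."""
    return _FOLD.sub("", vcard_text)


folded = ("NOTE:PelletiererIn Mitglieder 2 Preiserhebung Aussendung 2 Pressespiegelver\r\n"
          " sand 2\r\nEND:VCARD\r\n")
print(unfold(folded))
```

In the loop above, `output += unfold(j.serialize())` would replace `output += j.serialize()`. vobject's `serialize()` may also accept a `lineLength` keyword in some versions; treat that as an assumption to check against your installed version.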

Related

Julia DifferentialEquations.jl all variable output

I have the following example:
using DifferentialEquations
function test1(du,u,p,t)
    a,b,c = p
    d=a^0.1*(t+1)
    e=u[1]/a
    f=u[2]/d
    du[1] = a*u[1]
    du[2] = d*u[2]
    du[3] = b*u[2] - c*u[3]
end
p = (2,0.75,0.8)
u0 = [1.0;1.0;1.0]
tspan = (0.0,3.0)
prob = ODEProblem(test1,u0,tspan,p)
sol = solve(prob,saveat=0.3)
The sol object contains the state outputs, but I also need the other variables ("d", "e", "f"), computed efficiently.
The closest I can get is:
function test2(du,u,p,t)
    global i
    global Out_values
    global sampletimes
    a,b,c = p
    d=a^0.1*(t+1)
    e=u[1]/a
    f=u[2]/d
    if t in sampletimes
        Out_values[1,i] = d
        Out_values[2,i] = e
        Out_values[3,i] = f
        i=i+1
    end
    du[1] = a*u[1]
    du[2] = d*u[2]
    du[3] = b*u[2] - c*u[3]
end
sampletimes = tspan[1]:0.3:tspan[2]
Out_values = Array{Float64}(undef, 3, 2*11)
i=1
prob = ODEProblem(test2,u0,tspan,p)
sol = solve(prob,saveat=0.3,tstops=sampletimes)
However, this solution is not ideal because:
it duplicates saveat, and I get two sets of slightly different outputs (not sure why), and
it doesn't extend to the case where I drop saveat and want the values at every solver step, i.e. sol = solve(prob).
Any help is appreciated.
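Since d, e and f depend only on (t, u, p), one way around both problems is to compute them after the solve from the saved states, instead of capturing them inside the right-hand side. The duplicates most likely come from the solver calling the right-hand side several times per step (Runge-Kutta stages, and possibly rejected steps), so values captured at a sample time need not belong to the accepted solution. In Julia this amounts to evaluating the formulas over `sol.t` and `sol.u`, which works the same for `solve(prob)` without `saveat`; DiffEqCallbacks.jl's `SavingCallback` may also be worth checking. The sketch below illustrates the post-processing idea in Python, with a hand-written fixed-step RK4 (an assumption standing in for the Julia solver) so it runs without extra packages:

```python
def rhs(t, u, p):
    """Same right-hand side as test1 above (0-based indices)."""
    a, b, c = p
    d = a ** 0.1 * (t + 1)
    return [a * u[0], d * u[1], b * u[1] - c * u[2]]


def derived(t, u, p):
    """The extra outputs d, e, f: pure functions of (t, u, p)."""
    a = p[0]
    d = a ** 0.1 * (t + 1)
    return d, u[0] / a, u[1] / d


def rk4_saveat(f, u0, tspan, p, saveat, substeps=50):
    """Fixed-step RK4 that stores the state only at t0, t0+saveat, ..., t1."""
    t0, t1 = tspan
    n_out = round((t1 - t0) / saveat)
    h = saveat / substeps
    u, ts, us = list(u0), [t0], [list(u0)]
    for k in range(n_out):
        t = t0 + k * saveat
        for _ in range(substeps):
            k1 = f(t, u, p)
            k2 = f(t + h / 2, [x + h / 2 * s for x, s in zip(u, k1)], p)
            k3 = f(t + h / 2, [x + h / 2 * s for x, s in zip(u, k2)], p)
            k4 = f(t + h, [x + h * s for x, s in zip(u, k3)], p)
            u = [x + h / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
                 for x, q1, q2, q3, q4 in zip(u, k1, k2, k3, k4)]
            t += h
        ts.append(t0 + (k + 1) * saveat)
        us.append(list(u))
    return ts, us


p = (2.0, 0.75, 0.8)
ts, us = rk4_saveat(rhs, [1.0, 1.0, 1.0], (0.0, 3.0), p, saveat=0.3)
# Post-processing: one (d, e, f) triple per saved time point, no globals needed.
extras = [derived(t, u, p) for t, u in zip(ts, us)]
```

Because `derived` only reads saved states, it never sees intermediate stage values, and it works unchanged for whatever set of time points the solver happened to save.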

Coloring three pieces of one picture using one colormap

I need to draw a contour plot of a function defined on a hexagonal region of points. I build this function using three separate meshgrids and then draw all three contour plots on one axis. It looks something like this:
steps = int(k0/kstep0) # just a parameter of how many points are taken in hexagon
energies1 = np.zeros((steps,steps))
energies2 = np.zeros((steps,steps))
energies3 = np.zeros((steps,steps))
gridi = np.arange(steps)
gridj = np.arange(steps)
iv,jv = np.meshgrid(gridi,gridj)
Kx1 = -0.5*np.sqrt(3)*k0
Ky1 = -0.5*k0 # Dirac point coordinates
kstepix1 = 0
kstepiy1 = kstep0
kstepjx1 = 0.5*np.sqrt(3)*kstep0
kstepjy1 = -0.5*kstep0
kxv1 = kstepix1*jv+kstepjx1*iv+Kx1
kyv1 = kstepiy1*jv+kstepjy1*iv+Ky1
Kx2 = 0
Ky2 = k0 # Dirac point coordinates
kstepix2 = 0.5*np.sqrt(3)*kstep0
kstepiy2 = -0.5*kstep0
kstepjx2 = -0.5*np.sqrt(3)*kstep0
kstepjy2 = -0.5*kstep0
kxv2 = kstepix2*iv+kstepjx2*jv+Kx2
kyv2 = kstepiy2*iv+kstepjy2*jv+Ky2
Kx3 = 0.5*np.sqrt(3)*k0
Ky3 = -0.5*k0 # Dirac point coordinates
kstepix3 = -0.5*np.sqrt(3)*kstep0
kstepiy3 = -0.5*kstep0
kstepjx3 = 0
kstepjy3 = kstep0
kxv3 = kstepix3*jv+kstepjx3*iv+Kx3
kyv3 = kstepiy3*jv+kstepjy3*iv+Ky3
for i in np.arange(steps):
    for j in np.arange(steps):
        kx = i*kstepix1 + j*kstepjx1 + Kx1
        ky = i*kstepiy1 + j*kstepjy1 + Ky1
        ham = TwistHamiltonian(kx,ky,angle,N,t_layers) # here I solve some matrix and extract its eigenvalues
        eigenvalues, eigenvectors = np.linalg.eigh(ham)
        energies1[i,j] = np.min(np.abs(eigenvalues))
for i in np.arange(steps):
    for j in np.arange(steps):
        kx = i*kstepix2 + j*kstepjx2 + Kx2
        ky = i*kstepiy2 + j*kstepjy2 + Ky2
        ham = TwistHamiltonian(kx,ky,angle,N,t_layers)
        eigenvalues, eigenvectors = np.linalg.eigh(ham)
        energies2[i,j] = np.min(np.abs(eigenvalues))
for i in np.arange(steps):
    for j in np.arange(steps):
        kx = i*kstepix3 + j*kstepjx3 + Kx3
        ky = i*kstepiy3 + j*kstepjy3 + Ky3
        ham = TwistHamiltonian(kx,ky,angle,N,t_layers)
        eigenvalues, eigenvectors = np.linalg.eigh(ham)
        energies3[i,j] = np.min(np.abs(eigenvalues))
from matplotlib import pyplot as plt
from matplotlib.cm import ScalarMappable
save_to = '../plots/ContourPlots/TwistEnergy'+'kstep0_'+str(kstep0)+'tlayers_'+str(t_layers)+'theta'+str(theta)[:5]+"_"+str(N)+'.png'
fig, ax = plt.subplots(figsize=(9,9))
cp1 = ax.contourf(kxv1,kyv1,energies1,cmap='RdGy')
cp2 = ax.contourf(kxv2,kyv2,energies2,cmap='RdGy')
cp3 = ax.contourf(kxv3,kyv3,energies3,cmap='RdGy')
The result is close to the desired output image. However, the three pieces are colored slightly differently, which spoils the whole picture. How can I fix this?
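The mismatch is expected: when `levels` is not given, each `contourf` call chooses its own contour levels from the minimum and maximum of its own data, so the same color stands for different energies in the three pieces. Passing one common list of levels (and the same colormap) to all three calls makes the coloring consistent. A minimal sketch; `shared_levels` and `plot_pieces` are hypothetical helper names, and 20 levels is an arbitrary choice:

```python
def shared_levels(arrays, n_levels=20):
    """Evenly spaced contour levels spanning the min..max of *all* arrays.

    Accepts nested lists or 2-D NumPy arrays (iterates rows, then values).
    """
    values = [float(v) for arr in arrays for row in arr for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant data: keep levels increasing
        hi = lo + 1.0
    step = (hi - lo) / (n_levels - 1)
    # set the top level to exactly hi so the maximum is never left unfilled
    return [lo + k * step for k in range(n_levels - 1)] + [hi]


def plot_pieces(ax, pieces, cmap="RdGy", n_levels=20):
    """Draw several (X, Y, Z) patches on one Axes with a single color scale."""
    levels = shared_levels([z for _, _, z in pieces], n_levels)
    cs = None
    for kx, ky, z in pieces:
        # identical levels and cmap in every call: equal energies, equal colors
        cs = ax.contourf(kx, ky, z, levels=levels, cmap=cmap)
    return cs
```

With the arrays from the question this would be `cs = plot_pieces(ax, [(kxv1, kyv1, energies1), (kxv2, kyv2, energies2), (kxv3, kyv3, energies3)])`, and a single `fig.colorbar(cs, ax=ax)` is then valid for all three pieces.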

Create Dataframe name from 2 strings or variables pandas

I am extracting selected pages from a PDF file and want to name each DataFrame after the page it was extracted from:
file = "abc"
selected_pages = ['10','11'] #can be any combination eg ['6','14','20']
for i in selected_pages():
    df{str(i)} = read_pdf(path + file + ".pdf",encoding = 'ISO-8859-1', stream = True,area = [100,10,740,950],pages= (i), index = False)
    print (df{str(i)} )
The idea, as in the example above, is to end up with DataFrames df10, df11. I have tried "df" + str(i), "df" & str(i) and df{str(i)}, but all of them give the error message: SyntaxError: invalid syntax
Any better way of doing it is most welcome. Thanks!
This is where a dictionary would be a much better option.
Also note the error at the start of your loop: selected_pages is a list, so you can't call selected_pages().
file = "abc"
selected_pages = ['10','11'] # can be any combination, e.g. ['6','14','20']
df = {}
for i in selected_pages:
    df[i] = read_pdf(path + file + ".pdf",encoding = 'ISO-8859-1', stream = True, area = [100,10,740,950], pages= (i), index = False)

i = int(i) - 1 # after the loop i is '11', so this brings it to 10
dfB = df[str(i)]
# select row numbers to drop: 0:4
dfB.drop(dfB.index[0:4],axis =0, inplace = True)
dfB.columns = ['col1','col2','col3','col4','col5']
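The general pattern is to store each result under a key instead of creating variable names at runtime. A minimal sketch with a hypothetical `load_page` function standing in for the `read_pdf(...)` call, so it runs without a PDF:

```python
def load_page(page):
    """Hypothetical stand-in for read_pdf(..., pages=page)."""
    return [f"row from page {page}"]


selected_pages = ["10", "11"]  # any combination, e.g. ["6", "14", "20"]

# One entry per page; tables["10"] plays the role of the wished-for df10.
tables = {page: load_page(page) for page in selected_pages}

for page, table in tables.items():
    print(page, table)
```

Any page can then be looked up with `tables[page]`, and looping over all extracted tables needs no knowledge of which pages were selected.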

How to speed up a simple linear algebra optimization problem in Julia?

I implemented the LSDD change-point detection method described in [1] in Julia, to see if I could make it faster than the existing Python implementation [2], which is based on a grid search for the optimal parameters.
I obtain the desired results, but despite my best efforts my grid-search version takes about as long to compute as the Python one, which is still far too slow for real applications.
I also tried using the Optim package, which only makes things worse (2 or 3 times slower).
Here is the grid search that I implemented:
using Random
using LinearAlgebra
function squared_distance(X::Vector{Float64},C::Vector{Float64})
    sqd = zeros(length(X),length(C))
    for i in 1:length(X)
        for j in 1:length(C)
            sqd[i,j] = X[i]^2 + C[j]^2 - 2*X[i]*C[j]
        end
    end
    return sqd
end
function lsdd(x::Vector{Float64},y::Vector{Float64}; folds = 5, sigma_list = nothing, lambda_list = nothing)
    lx,ly = length(x), length(y)
    b = min(lx+ly,300)
    C = shuffle(vcat(x,y))[1:b]
    CC_dist2 = squared_distance(C,C)
    xC_dist2, yC_dist2 = squared_distance(x,C), squared_distance(y,C)
    Tx,Ty = length(x) - div(lx,folds), length(y) - div(ly,folds)
    # Define the training and testing data sets
    cv_split1, cv_split2 = floor.(collect(1:lx)*folds/lx), floor.(collect(1:ly)*folds/ly)
    cv_index1, cv_index2 = shuffle(cv_split1), shuffle(cv_split2)
    tr_idx1,tr_idx2 = [findall(x->x!=i,cv_index1) for i in 1:folds], [findall(x->x!=i,cv_index2) for i in 1:folds]
    te_idx1,te_idx2 = [findall(x->x==i,cv_index1) for i in 1:folds], [findall(x->x==i,cv_index2) for i in 1:folds]
    xTr_dist, yTr_dist = [xC_dist2[i,:] for i in tr_idx1], [yC_dist2[i,:] for i in tr_idx2]
    xTe_dist, yTe_dist = [xC_dist2[i,:] for i in te_idx1], [yC_dist2[i,:] for i in te_idx2]
    if sigma_list === nothing
        sigma_list = [0.25, 0.5, 0.75, 1, 1.2, 1.5, 2, 2.5, 2.2, 3, 5]
    end
    if lambda_list === nothing
        lambda_list = [1.00000000e-03, 3.16227766e-03, 1.00000000e-02, 3.16227766e-02,
                       1.00000000e-01, 3.16227766e-01, 1.00000000e+00, 3.16227766e+00,
                       1.00000000e+01]
    end
    # memory preallocation
    score_cv = zeros(length(sigma_list),length(lambda_list))
    H = zeros(b,b)
    hx_tr, hy_tr = [zeros(b,1) for i in 1:folds], [zeros(b,1) for i in 1:folds]
    hx_te, hy_te = [zeros(1,b) for i in 1:folds], [zeros(1,b) for i in 1:folds]
    #h_tr,h_te = zeros(b,1), zeros(1,b)
    theta = zeros(b)
    for (sigma_idx,sigma) in enumerate(sigma_list)
        # the expression of H is different for higher dimension
        #H = sqrt((sigma^2)*pi)*exp.(-CC_dist2/(4*sigma^2))
        set_H(H,CC_dist2,sigma,b)
        # check if the sum is performed along the right dimension
        set_htr(hx_tr,xTr_dist,sigma,Tx), set_htr(hy_tr,yTr_dist,sigma,Ty)
        set_hte(hx_te,xTe_dist,sigma,lx-Tx), set_hte(hy_te,yTe_dist,sigma,ly-Ty)
        for i in 1:folds
            h_tr = hx_tr[i] - hy_tr[i]
            h_te = hx_te[i] - hy_te[i]
            #set_h(h_tr,hx_tr[i],hy_tr[i],b)
            #set_h(h_te,hx_te[i],hy_te[i],b)
            for (lambda_idx,lambda) in enumerate(lambda_list)
                set_theta(theta,H,lambda,h_tr,b)
                score_cv[sigma_idx,lambda_idx] += dot(theta,H*theta) - 2*dot(theta,h_te)
            end
        end
    end
    # retrieve the value of the optimal parameters
    # (rows of score_cv index sigma, columns index lambda)
    best = findmin(score_cv)[2]
    sigma_chosen = sigma_list[best[1]]
    lambda_chosen = lambda_list[best[2]]
    # calculating the new "optimal" solution
    H = sqrt((sigma_chosen^2)*pi)*exp.(-CC_dist2/(4*sigma_chosen^2))
    H_lambda = H + lambda_chosen*Matrix{Float64}(I, b, b)
    h = (1/lx)*sum(exp.(-xC_dist2/(2*sigma_chosen^2)),dims = 1) - (1/ly)*sum(exp.(-yC_dist2/(2*sigma_chosen^2)),dims = 1)
    theta_final = H_lambda\transpose(h)
    f = transpose(theta_final).*sum(exp.(-vcat(xC_dist2,yC_dist2)/(2*sigma_chosen^2)),dims = 1)
    L2 = 2*dot(theta_final,h) - dot(theta_final,H*theta_final)
    return L2
end
function set_H(H::Matrix{Float64},dist::Matrix{Float64},sigma::Real,b::Integer)
    for i in 1:b
        for j in 1:b
            H[i,j] = sqrt((sigma^2)*pi)*exp(-dist[i,j]/(4*sigma^2))
        end
    end
end
function set_theta(theta::Vector{Float64},H::Matrix{Float64},lambda::Float64,h::Matrix{Float64},b::Integer)
    Hl = (H + lambda*Matrix{Float64}(I, b, b))
    sol = copy(h)                # posv! overwrites its right-hand side, so keep h intact for the next lambda
    LAPACK.posv!('L', Hl, sol)   # solves Hl * sol = h via Cholesky
    theta .= vec(sol)            # write into the caller's array (`theta = ...` would only rebind a local name)
end
function set_htr(h::Vector{Matrix{Float64}},dists::Vector{Matrix{Float64}},sigma::Real,T::Integer)
    for (CVidx,dist) in enumerate(dists)
        for (idx,value) in enumerate((1/T)*sum(exp.(-dist/(2*sigma^2)),dims = 1))
            h[CVidx][idx] = value
        end
    end
end
function set_hte(h::Vector{Matrix{Float64}},dists::Vector{Matrix{Float64}},sigma::Real,T::Integer)
    for (CVidx,dist) in enumerate(dists)
        for (idx,value) in enumerate((1/T)*sum(exp.(-dist/(2*sigma^2)),dims = 1))
            h[CVidx][idx] = value
        end
    end
end
function set_h(h,h1,h2,b)
    for i in 1:b
        h[i] = h1[i] - h2[i]
    end
end
The set_H, set_h and set_theta functions are there because I read somewhere that modifying preallocated memory in place with a function is faster, but it did not make much of a difference.
To test it, I use two random distributions as input data:
x,y = rand(500),1.5*rand(500)
lsdd(x,y) #returns a value around 0.3
Now here is the version of the code where I try to use Optim:
using Optim
function Theta(sigma::Float64,lambda::Float64,x::Vector{Float64},y::Vector{Float64},folds::Integer)
    lx,ly = length(x), length(y)
    b = min(lx+ly,300)
    # note: C and the folds are reshuffled on every call, so the objective seen by the optimizer is noisy
    C = shuffle(vcat(x,y))[1:b]
    CC_dist2 = squared_distance(C,C)
    xC_dist2, yC_dist2 = squared_distance(x,C), squared_distance(y,C)
    # the subsets are not mutually exclusive!
    Tx,Ty = length(x) - div(lx,folds), length(y) - div(ly,folds)
    shuffled_x, shuffled_y = [shuffle(1:lx) for i in 1:folds], [shuffle(1:ly) for i in 1:folds]
    cv_index1, cv_index2 = floor.(collect(1:lx)*folds/lx)[shuffle(1:lx)], floor.(collect(1:ly)*folds/ly)[shuffle(1:ly)]
    tr_idx1,tr_idx2 = [i[1:Tx] for i in shuffled_x], [i[1:Ty] for i in shuffled_y]
    # start at Tx+1 / Ty+1 so that no point is both a training and a test point within one fold
    te_idx1,te_idx2 = [i[Tx+1:end] for i in shuffled_x], [i[Ty+1:end] for i in shuffled_y]
    xTr_dist, yTr_dist = [xC_dist2[i,:] for i in tr_idx1], [yC_dist2[i,:] for i in tr_idx2]
    xTe_dist, yTe_dist = [xC_dist2[i,:] for i in te_idx1], [yC_dist2[i,:] for i in te_idx2]
    score_cv = 0.0
    Id = Matrix{Float64}(I, b, b)
    H = sqrt((sigma^2)*pi)*exp.(-CC_dist2/(4*sigma^2))
    hx_tr, hy_tr = [transpose((1/Tx)*sum(exp.(-dist/(2*sigma^2)),dims = 1)) for dist in xTr_dist], [transpose((1/Ty)*sum(exp.(-dist/(2*sigma^2)),dims = 1)) for dist in yTr_dist]
    # average over the test points (1/(lx-Tx)), as set_hte does in the grid-search version
    hx_te, hy_te = [(1/(lx-Tx))*sum(exp.(-dist/(2*sigma^2)),dims = 1) for dist in xTe_dist], [(1/(ly-Ty))*sum(exp.(-dist/(2*sigma^2)),dims = 1) for dist in yTe_dist]
    for i in 1:folds
        h_tr, h_te = hx_tr[i] - hy_tr[i], hx_te[i] - hy_te[i]
        #theta = (H + lambda * Id)\h_tr
        theta = copy(h_tr)
        Hl = H + lambda*Id
        LAPACK.posv!('L', Hl, theta)
        score_cv += dot(theta,H*theta) - 2*dot(theta,h_te)
    end
    return score_cv,(CC_dist2,xC_dist2,yC_dist2)
end
function cost(params::Vector{Float64},x::Vector{Float64},y::Vector{Float64},folds::Integer)
    s,l = params[1],params[2]
    return Theta(s,l,x,y,folds)[1]
end
"""
Performs the optimization
"""
function lsdd3(x::Vector{Float64},y::Vector{Float64}; folds = 4)
    start = [1.0,0.1]
    b = min(length(x)+length(y),300)
    lx,ly = length(x),length(y)
    #result = optimize(params -> cost(params,x,y,folds),fill(0.0,2),fill(50.0,2),start, Fminbox(LBFGS(linesearch=LineSearches.BackTracking())); autodiff = :forward)
    result = optimize(params -> cost(params,x,y,folds),start, BFGS(),Optim.Options(f_calls_limit = 5, iterations = 5))
    #bboptimize(rosenbrock2d; SearchRange = [(-5.0, 5.0), (-2.0, 2.0)])
    #result = optimize(cost,[0,0],[Inf,Inf],start, Fminbox(AcceleratedGradientDescent()))
    sigma_chosen,lambda_chosen = Optim.minimizer(result)
    CC_dist2, xC_dist2, yC_dist2 = Theta(sigma_chosen,lambda_chosen,x,y,folds)[2]
    H = sqrt((sigma_chosen^2)*pi)*exp.(-CC_dist2/(4*sigma_chosen^2))
    h = (1/lx)*sum(exp.(-xC_dist2/(2*sigma_chosen^2)),dims = 1) - (1/ly)*sum(exp.(-yC_dist2/(2*sigma_chosen^2)),dims = 1)
    theta_final = (H + lambda_chosen*Matrix{Float64}(I, b, b))\transpose(h)
    f = transpose(theta_final).*sum(exp.(-vcat(xC_dist2,yC_dist2)/(2*sigma_chosen^2)),dims = 1)
    L2 = 2*dot(theta_final,h) - dot(theta_final,H*theta_final)
    return L2
end
No matter which options I use in the optimizer, I always end up with something too slow. Maybe the grid search is the best option, but I don't know how to make it faster... Does anyone have an idea how I could proceed?
[1] : http://www.mcduplessis.com/wp-content/uploads/2016/05/Journal-IEICE-2014-CLSDD-1.pdf
[2] : http://www.ms.k.u-tokyo.ac.jp/software.html
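One direction that may help (a swapped-in technique, not something taken from the question or from [2]): in the grid search, H depends only on sigma, yet H + lambda*I is factorized anew for every fold and every lambda, which with the defaults is 5 × 9 = 45 factorizations per sigma. With a single symmetric eigendecomposition H = V diag(w) V^T per sigma, theta = V (V^T h_tr ./ (w .+ lambda)), and both terms of the score become O(b) dot products in the eigenbasis. Each fold then needs two O(b^2) projections and each lambda only O(b) work. In Julia this would start from `eigen(Symmetric(H))`. A NumPy sketch of the per-sigma scoring, assuming h_tr and h_te are given as 1-D arrays of length b:

```python
import numpy as np


def cv_scores_for_sigma(H, h_tr_folds, h_te_folds, lambdas):
    """Cross-validation scores for every lambda, for one fixed sigma.

    score(lambda) = sum over folds of theta'H theta - 2 theta'h_te,
    with theta = (H + lambda I)^{-1} h_tr.
    """
    w, V = np.linalg.eigh(H)               # H = V @ diag(w) @ V.T, computed once
    scores = np.zeros(len(lambdas))
    for h_tr, h_te in zip(h_tr_folds, h_te_folds):
        p_tr = V.T @ h_tr                  # O(b^2) projections, once per fold
        p_te = V.T @ h_te
        for k, lam in enumerate(lambdas):
            c = p_tr / (w + lam)           # theta = V @ c, never formed explicitly
            # theta'H theta = sum(w * c^2) and theta'h_te = c'p_te
            scores[k] += np.dot(w * c, c) - 2.0 * np.dot(c, p_te)
    return scores
```

The eigendecomposition is more expensive than one Cholesky factorization, so the gain depends on how many (fold, lambda) pairs share it; with many lambdas it should pay off quickly.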

groovy closure instantiate variables

Is it possible to create a set of variables from a list of values using a closure?
The reason for asking is to create some recursive functionality based on a list of (say) two, three, four or five parts.
The code here of course doesn't work, but any pointers would be helpful:
def longthing = 'A for B with C in D on E'
//eg shopping for 30 mins with Fiona in Birmingham on Friday at 15:00
def breaks = [" on ", " in ", "with ", " for "]
def vary = ['when', 'place', 'with', 'event']
i = 0
line = place = with = event = ""
breaks.each{
    shortline = longthing.split(breaks[i])
    longthing= shortline[0]
    //this is the line which obviously will not work
    ${vary[i]} = shortline[1]
    rez[i] = shortline[1]
    i++
}
return place + "; " + with + "; " + event
// looking for answer of D; C; B
EDIT>>
Yes, I am trying to find a groovier way to clean this up, which I have to do after the each loop:
len = rez[3].trim()
if(len.contains("all")){
    len = "all"
} else if (len.contains(" ")){
    len = len.substring(0, len.indexOf(" ")+2 )
}
len = len.replaceAll(" ", "")
with = rez[2].trim()
place = rez[1].trim()
when = rez[0].trim()
event = shortline[0]
And if I decide to add another item to the list (which I just did), I have to remember which [i] it is in order to extract it correctly.
This is the worker part for parsing dates/times; I then use jChronic to convert natural-language text into Gregorian calendar info so I can set an event in a Google Calendar.
How about:
def longthing = 'A for B with C in D on E'
def breaks = [" on ", " in ", "with ", " for "]
def vary = ['when', 'place', 'with', 'event']
rez = []
line = place = with = event = ""
breaks.eachWithIndex{ b, i ->
    shortline = longthing.split(b)
    longthing = shortline[0]
    this[vary[i]] = shortline[1]
    rez[i] = shortline[1]
}
return place + "; " + with + "; " + event
When you use a closure with a List and "each", Groovy loops over the elements of the List, putting each value in the "it" variable. However, since you also want to keep track of the index, Groovy provides eachWithIndex, which also passes in the index:
http://groovy.codehaus.org/GDK+Extensions+to+Object
So something like:
breaks.eachWithIndex {item, index ->
    ... code here ...
}
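The two ideas in the answer, iterating with an index and keeping the results in a map keyed by name instead of creating variables dynamically, carry over to most languages. For comparison, a Python sketch: `parse_phrase` and the extra "rest" key are hypothetical names, `enumerate` plays the role of `eachWithIndex`, and a dict replaces `this[vary[i]]`. Unlike Groovy's `split`, `rpartition` splits at the last occurrence of each separator, and each field is trimmed, which covers part of the clean-up from the EDIT:

```python
def parse_phrase(text, breaks, names):
    """Peel named fields off the end of `text`, one separator at a time."""
    fields = {}
    for i, sep in enumerate(breaks):      # like Groovy's eachWithIndex { b, i -> }
        head, found, tail = text.rpartition(sep)
        if not found:                     # separator missing: record an empty field
            fields[names[i]] = ""
            continue
        fields[names[i]] = tail.strip()   # trim, as the clean-up code does
        text = head
    fields["rest"] = text.strip()         # whatever precedes the first separator
    return fields


breaks = [" on ", " in ", "with ", " for "]
names = ["when", "place", "with", "event"]
fields = parse_phrase("A for B with C in D on E", breaks, names)
print("; ".join([fields["place"], fields["with"], fields["event"]]))  # prints: D; C; B
```

Adding a field then only means appending to `breaks` and `names`; code that reads `fields["place"]` and so on does not depend on list positions.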