Python multiprocessing while loop, return and append outputs - while-loop

I have a conceptually simple piece of code I want to parallelize, but all the other threads I found are too complicated, and I do not understand how to apply them to my case, or even whether they are applicable.
In my code, a function with multiple arguments is called over a while loop and returns both an output and the exit condition from the loop. I want to parallelize the while loop. I am using Python 3.7.3.
Here is a simplified example:
import multiprocessing as mp
import numpy as np
import time

def foo(i, arg1, arg2):
    n = np.random.rand()
    n = arg1*n + arg2
    if n > 0.9:
        stop = True
    else:
        stop = False
    return [i, n], stop

if __name__ == '__main__':
    i = 0
    stop = False
    output = list()
    while not stop:
        out, stop = foo(i, 1, 0)
        i = i + 1
        if not stop:
            output.append(out)
    print(np.asarray(output))
Output:
[[ 0. 0.25295033]
[ 1. 0.53795096]
[ 2. 0.48774803]
[ 3. 0.09281972]
[ 4. 0.75053227]
[ 5. 0.30367072]
[ 6. 0.57043762]
[ 7. 0.4589554 ]
[ 8. 0.33231446]
[ 9. 0.76805717]
[10. 0.22486246]
[11. 0.69499273]
[12. 0.67616563]]
EDIT. I would like to "bump" this thread, as this is something I really need help with and I cannot do it by myself. Meta-etiquette says I should edit by adding value to the question, but I do not think I could add anything else: I just need to parallelize the code presented. I would really appreciate any (practical) feedback.
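One possible approach, sketched under assumptions that are not part of the original post: since each call to foo is independent of the previous one, the loop can be evaluated in fixed-size batches on a multiprocessing.Pool and the results scanned in index order until the first stop flag. The names parallel_while, batch and seed are hypothetical, and foo is changed to draw from a generator seeded by (seed, i), because forked workers would otherwise inherit the same global NumPy random state. Calls in the last batch that come after the stopping index are wasted work.

```python
import multiprocessing as mp
import numpy as np

def foo(i, arg1, arg2, seed=0):
    # Assumption: one generator per call, seeded from (seed, i), so that
    # worker processes do not all replay the same random sequence.
    rng = np.random.default_rng([seed, i])
    n = arg1 * rng.random() + arg2
    return [i, n], bool(n > 0.9)

def parallel_while(arg1, arg2, seed=0, workers=4, batch=16):
    """Evaluate foo in parallel batches; stop at the first stop flag."""
    output = []
    start = 0
    with mp.Pool(workers) as pool:
        while True:
            tasks = [(i, arg1, arg2, seed) for i in range(start, start + batch)]
            # starmap returns results in the same order as the tasks,
            # so the scan below mirrors the serial while loop.
            for out, stop in pool.starmap(foo, tasks):
                if stop:
                    return np.asarray(output)
                output.append(out)
            start += batch

if __name__ == '__main__':
    print(parallel_while(1, 0))
```

A larger batch keeps the workers busier but wastes more calls after the stopping index, so it is a trade-off to tune for the real cost of foo.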

Related

How to solve a set of nonlinear equations multiple times when one of the coefficient of the variable changes

I want to solve for the variables s, L for different values of t. t appears in my second equation and its value changes. I tried to solve for s, L for each t value and then append the results to an empty list, so that I would have the values of s, L for each value of t. But all I get is an empty list. Please help me with this.
from scipy.optimize import fsolve
import numpy as np
import math as m
q0=0.0011
thetas,thetai,thetar=0.43,0.1,0.05
ks=0.0022#m/hr
psib=-0.15# m
lamda=1
eta=2+3*lamda
ki=8.676*10**(-8)
si=0.13157
t=np.array([3,18,24])
S=0.02/24
delta=-0.1001
b=[]
n=[]
for i in range(3):
    def equations(p):
        s, L = p
        f1=(ks*s**(3+(2/lamda))-(psib/(1-eta))*(((ki*si**(-1/lamda))-(ks*s**(3+(1/lamda))))/L)-q0)
        f2=(L*(s*(thetas-thetar))+S*t[i]*0.5*(m.exp(-delta*psib*(-1+s**(-1/lamda))))-(q0-ki)*t[i])
        return(f1,f2)
    s,L=fsolve(equations,([0.19,0.001]))
    b.append(s)
    n.append(L)
print(b)
print(n)
There are several ways to evaluate this system with an adjustable parameter. You could plug each value in before solving, which would keep it compatible with other solvers if fsolve didn't give you the desired results, or you could use the args parameter of fsolve. If I set up a dummy system in which I try to find x, y, z from some initial guess and step through a parameter, I can fill a preallocated solution array with the results:
import numpy as np
from scipy.optimize import fsolve

a = np.linspace(0,10,21)

def equations(variables, a):
    x,y,z = variables
    eq1 = x+y+z*a
    eq2 = x-y-z
    eq3 = x*y*x*a
    return tuple([eq1, eq2, eq3])

solutions = np.zeros((21,3))
for idx, i in enumerate(a):
    # args must be a tuple, hence the trailing comma
    solutions[idx] = fsolve(equations, [-1,0,1], args=(i,))
print(solutions)
which gives
[[ 5.00000000e-01 -5.00000000e-01 1.00000000e+00]
[ 9.86864911e-17 -2.96059473e-16 3.94745964e-16]
[ 1.62191889e-39 -1.28197512e-16 1.28197512e-16]
[-2.15414908e-17 -1.07707454e-16 8.61659633e-17]
[ 2.19853562e-28 6.59560686e-28 -4.39707124e-28]
[-1.20530409e-28 -2.81237621e-28 1.60707212e-28]
[-3.34744837e-17 -6.69489674e-17 3.34744837e-17]
[ 6.53567253e-17 1.17642106e-16 -5.22853803e-17]
[-3.14018492e-17 -5.23364153e-17 2.09345661e-17]
[-5.99115518e-17 -9.41467242e-17 3.42351724e-17]
[ 5.18199815e-29 7.77299722e-29 -2.59099907e-29]
[-2.70691440e-17 -3.90998747e-17 1.20307307e-17]
[-2.57288510e-17 -3.60203914e-17 1.02915404e-17]
[-2.44785120e-17 -3.33797891e-17 8.90127708e-18]
[-1.27252940e-28 -1.69670587e-28 4.24176466e-29]
[ 2.24744956e-56 2.93897250e-56 -6.91522941e-57]
[-2.12580678e-17 -2.73318015e-17 6.07373366e-18]
[-2.03436865e-17 -2.57686696e-17 5.42498307e-18]
[-3.89960988e-17 -4.87451235e-17 9.74902470e-18]
[-1.87148635e-17 -2.31183608e-17 4.40349730e-18]
[-7.19531738e-17 -8.79427680e-17 1.59895942e-17]]

time library not working with PySimpleGUI

Here is the code:
import PySimpleGUI as pg
import time as tm
import numpy as np

num = 0
layout = [
    [pg.Text('0', key='-txt')],
    [pg.Button('Start')],
]
timer_nums = np.arange(0.1, 1, .1)
win = pg.Window('Stopwatch', layout)
while True:
    e, v = win.read()
    if e == pg.WIN_CLOSED:
        break
    if e == 'Start':
        tm.sleep(.14)
        for i in timer_nums:
            win['-txt'].update(round(i, 3))
            print(round(i, 3))
            tm.sleep(.1)
win.close()
So I'm trying to use tm.sleep() to delay the count on a stopwatch. But each time I run the program the GUI freezes. I put the print call in so I can see in the console that the loop is working, but the GUI won't update the text until the for loop finishes, which kinda defeats the purpose of the program lol. Please help.
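A possible way around the freeze, offered as a sketch rather than a definitive answer: the window is only redrawn when control returns to win.read() (or when win.refresh() is called), so sleeping inside the for loop blocks every update. Instead of sleeping, read() can be given a timeout so that it returns on its own every 100 ms. The helper names tick_values and run_stopwatch are hypothetical, and the PySimpleGUI import sits inside run_stopwatch only so the helper can be used without a display.

```python
def tick_values(n=9, step=0.1):
    """Values the stopwatch should show: 0.1, 0.2, ..., 0.9 by default."""
    return [round(step * k, 1) for k in range(1, n + 1)]

def run_stopwatch():
    import PySimpleGUI as pg  # imported here so tick_values works headless

    layout = [
        [pg.Text('0', key='-txt')],
        [pg.Button('Start')],
    ]
    win = pg.Window('Stopwatch', layout)
    pending = []  # values still waiting to be displayed
    while True:
        # With a timeout, read() returns a timeout event every 100 ms;
        # with timeout=None it blocks until the user does something.
        e, v = win.read(timeout=100 if pending else None)
        if e == pg.WIN_CLOSED:
            break
        if e == 'Start':
            pending = tick_values()
        elif e == pg.TIMEOUT_KEY and pending:
            win['-txt'].update(pending.pop(0))
    win.close()

# Call run_stopwatch() to launch the window.
```

Keeping the original for loop and calling win.refresh() right after each update should also make the text change, but the window would still ignore clicks while tm.sleep() is running.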

Possible tensorflow cholesky_solve inconsistency?

I am trying to solve a linear system of equations using tensorflow.cholesky_solve and I'm getting some unexpected results.
I wrote a script to compare the output of a very simple linear system with simple matrix inversion a la tensorflow.matrix_inverse, the non-cholesky based matrix equation solver tensorflow.matrix_solve, and tensorflow.cholesky_solve.
According to my understanding of the docs I've linked, these three cases should all yield a solution of the identity matrix divided by 2, but this is not the case for tensorflow.cholesky_solve. Perhaps I'm misunderstanding the docs?
import tensorflow as tf

I = tf.eye(2, dtype=tf.float32)
X = 2 * tf.eye(2, dtype=tf.float32)
X_inv = tf.matrix_inverse(X)
X_solve = tf.matrix_solve(X, I)
X_chol_solve = tf.cholesky_solve(tf.cholesky(X), I)
with tf.Session() as sess:
    for x in [X_inv, X_solve, X_chol_solve]:
        print('{}:\n{}'.format(x.name, sess.run(x)))
        print()
yielding output:
MatrixInverse:0:
[[ 0.5 0. ]
[ 0. 0.5]]
MatrixSolve:0:
[[ 0.5 0. ]
[ 0. 0.5]]
cholesky_solve/MatrixTriangularSolve_1:0:
[[ 1. 0.]
[ 0. 1.]]
Process finished with exit code 0
I think it's a bug. Notice how the result doesn't even depend on the RHS, unless RHS = 0, in which case you get nan instead of 0. Please report it on GitHub.
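For reference, here is what a Cholesky-based solve is expected to return for this system, written with NumPy only so that it does not depend on the TensorFlow build in question. This is a minimal sketch of the textbook two-step procedure (factor, forward solve, backward solve), not a reproduction of TensorFlow's implementation.

```python
import numpy as np

A = 2.0 * np.eye(2)            # same matrix as X in the question
B = np.eye(2)                  # right-hand side

L = np.linalg.cholesky(A)      # lower-triangular factor, A = L @ L.T
Y = np.linalg.solve(L, B)      # forward step:  L @ Y = B
X = np.linalg.solve(L.T, Y)    # backward step: L.T @ X = Y

print(X)                       # identity / 2, matching the other two solvers
```

If both triangular solves are applied, the result agrees with matrix_inverse and matrix_solve.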

Multiprocessing shared numpy array

I need to share a NumPy array between processes, to store some results in it. I'm not quite sure if what I have done so far is correct. This is my simplified code.
from multiprocessing import Process, Lock, Array
import numpy as np

def worker(shared,lock):
    numpy_arr = np.frombuffer(shared.get_obj())
    # do some work ...
    with lock:
        for i in range(10):
            numpy_arr[0] += 1
        numpy_arr += 1
    return

if __name__ == '__main__':
    jobs = []
    lock = Lock()
    shared_array = Array('d', 1000000)
    for process in range(4):
        p = Process(target=worker, args=(shared_array,lock))
        jobs.append(p)
        p.start()
    for process in jobs:
        process.join()
    m = np.frombuffer(shared_array.get_obj())
    np.save('data', m)
    print(m[:5])
From this code I obtain the expected results, but again, I'm not sure if this is the correct way. And finally, what is the difference between multiprocessing.Array and multiprocessing.sharedctypes.Array?
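A hedged note on both points, with a small sketch. multiprocessing.Array forwards to multiprocessing.sharedctypes.Array using the default context, so the two should behave the same. By default (lock=True) the returned object is a synchronized wrapper that already carries its own lock, reachable through get_lock(), so a separate Lock is optional. The function name run_workers and the smaller array size are assumptions made for illustration.

```python
import multiprocessing as mp
import numpy as np

def worker(shared):
    # Use the lock bundled with the synchronized wrapper instead of a
    # separately created Lock.
    with shared.get_lock():
        arr = np.frombuffer(shared.get_obj())  # float64 view, no copy
        for _ in range(10):
            arr[0] += 1
        arr += 1

def run_workers(n_workers=4, size=1000):
    shared = mp.Array('d', size)  # zero-initialised doubles, lock=True
    jobs = [mp.Process(target=worker, args=(shared,)) for _ in range(n_workers)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()
    # Copy so the result stays valid independently of the shared buffer.
    return np.frombuffer(shared.get_obj()).copy()

if __name__ == '__main__':
    print(run_workers()[:5])
```

With four workers, element 0 ends at 44 and every other element at 4, since each worker adds 10 to element 0 and then 1 to the whole array while holding the lock.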

apply generic function in a vectorized fashion using numpy/pandas

I am trying to vectorize my code and, thanks in large part to some users (https://stackoverflow.com/users/3293881/divakar, https://stackoverflow.com/users/625914/behzad-nouri), I was able to make huge progress. Essentially, I am trying to apply a generic function (in this case max_dd_array_ret) to each of the bins I found (see "vectorize complex slicing with pandas dataframe" for details on date vectorization and "Start, End and Duration of Maximum Drawdown in Python" for the rationale behind max_dd_array_ret).
The problem is the following: I should be able to obtain the result df_2 and, to some degree, ranged_DD(asd_1.values, starts, ends+1) is what I am looking for, except that the first two bins appear to be merged and the last one is missing, as can be seen in the results.
Any explanation and fix is very welcome.
import pandas as pd
import numpy as np
from time import time
from scipy.stats import binned_statistic

def max_dd_array_ret(xs):
    xs = (xs+1).cumprod()
    i = np.argmax(np.maximum.accumulate(xs) - xs) # end of the period
    j = np.argmax(xs[:i])
    max_dd = abs(xs[j]/xs[i] -1)
    return max_dd if max_dd is not None else 0

def get_ranges_arr(starts,ends):
    # Taken from https://stackoverflow.com/a/37626057/3293881
    counts = ends - starts
    counts_csum = counts.cumsum()
    id_arr = np.ones(counts_csum[-1],dtype=int)
    id_arr[0] = starts[0]
    id_arr[counts_csum[:-1]] = starts[1:] - ends[:-1] + 1
    return id_arr.cumsum()

def ranged_DD(arr,starts,ends):
    # Get all indices and the IDs corresponding to same groups
    idx = get_ranges_arr(starts,ends)
    id_arr = np.repeat(np.arange(starts.size),ends-starts)
    slice_arr = arr[idx]
    return binned_statistic(id_arr, slice_arr, statistic=max_dd_array_ret)[0]

asd_1 = pd.Series(0.01 * np.random.randn(500), index=pd.date_range('2011-1-1', periods=500)).pct_change()
index_1 = pd.to_datetime(['2011-2-2', '2011-4-3', '2011-5-1','2011-7-2', '2011-8-3', '2011-9-1','2011-10-2', '2011-11-3', '2011-12-1','2012-1-2', '2012-2-3', '2012-3-1',])
index_2 = pd.to_datetime(['2011-2-15', '2011-4-16', '2011-5-17','2011-7-17', '2011-8-17', '2011-9-17','2011-10-17', '2011-11-17', '2011-12-17','2012-1-17', '2012-2-17', '2012-3-17',])
starts = asd_1.index.searchsorted(index_1)
ends = asd_1.index.searchsorted(index_2)
df_2 = pd.DataFrame([max_dd_array_ret(asd_1.loc[i:j]) for i, j in zip(index_1, index_2)], index=index_1)
print(df_2[0].values)
print(ranged_DD(asd_1.values, starts, ends+1))
results:
df_2
[ 1.75893509 6.08002911 2.60131797 1.55631781 1.8770067 2.50709085
1.43863472 1.85322338 1.84767224 1.32605754 1.48688414 5.44786663]
ranged_DD(asd_1.values, starts, ends+1)
[ 6.08002911 2.60131797 1.55631781 1.8770067 2.50709085 1.43863472
1.85322338 1.84767224 1.32605754 1.48688414]
which are identical except for the first two:
[ 1.75893509 6.08002911 vs [ 6.08002911
and the last two
1.48688414 5.44786663] vs 1.48688414]
P.S.: While looking in more detail at the docs (http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic.html), I found that this might be the problem:
"All but the last (righthand-most) bin is half-open. In other words,
if bins is [1, 2, 3, 4], then the first bin is [1, 2) (including 1,
but excluding 2) and the second [2, 3). The last bin, however, is [3,
4], which includes 4. New in version 0.11.0."
The problem is I don't know how to reset it.
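A likely cause, illustrated with a small sketch on made-up data: binned_statistic defaults to bins=10 equal-width bins over the range of the ids, so 12 group ids (0 to 11) get squeezed into 10 bins, which merges the first two groups and the last two. Passing explicit integer edges, one per group plus a closing edge, gives one bin per id. The arrays ids and values below are hypothetical, and the built-in 'max' statistic stands in for max_dd_array_ret.

```python
import numpy as np
from scipy.stats import binned_statistic

# Hypothetical data: 12 groups (ids 0..11) with 3 values each.
ids = np.repeat(np.arange(12), 3)
values = np.arange(ids.size, dtype=float)

# Default: 10 equal-width bins over [0, 11], so some ids share a bin.
default_stat = binned_statistic(ids, values, statistic='max')[0]

# Explicit edges 0, 1, ..., 12: bins [0,1), [1,2), ..., [11,12],
# i.e. exactly one bin per group id.
edges = np.arange(ids.max() + 2)
per_group = binned_statistic(ids, values, statistic='max', bins=edges)[0]

print(default_stat.size, per_group.size)
```

In ranged_DD, the analogous change would be to pass bins=np.arange(starts.size + 1) to binned_statistic; I have not run this against the data in the question.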