Converting recursive solution to dynamic programming - time-complexity

Problem statement: find the number of "vowel only" strings that can be made from a given sequence of Morse code (the entire string must be used).
I have the following recursive solution, and I want to speed it up to run in O(n) time. I know that I can define my array as S[j] = the number of unique strings that can be created from the first j symbols, but I don't know where to go from there.
morsedict = {'A': '.-',
             'E': '.',
             'I': '..',
             'O': '---',
             'U': '..-'}

maxcombinations = 0

def countCombinations(codelist):
    global maxcombinations
    if len(codelist) == 0:
        maxcombinations += 1
        return
    if codelist[0] in morsedict.values():
        countCombinations(codelist[1:])
    if len(codelist) >= 2 and codelist[:2] in morsedict.values():
        countCombinations(codelist[2:])
    if len(codelist) >= 3 and codelist[:3] in morsedict.values():
        countCombinations(codelist[3:])
    return
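A useful halfway step between the recursion and the bottom-up table is to memoize the recursion, so each suffix is solved only once. A minimal sketch (caching on the suffix string still pays for the slicing, so the position-indexed table in the answer below is what actually gets you to O(n)):

from functools import lru_cache

codes = set(morsedict.values())

@lru_cache(maxsize=None)
def count(code):
    # Number of ways to decode the remaining string `code`.
    if not code:
        return 1  # exactly one way to decode the empty string
    return sum(count(code[k:]) for k in (1, 2, 3) if code[:k] in codes)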

For future researchers, here is the solution after conversion to a DP problem:
morsedict = {'A': '.-',
             'E': '.',
             'I': '..',
             'O': '---',
             'U': '..-'}

def countcombinations(codelist):
    # maxcombinations[j] = how many unique strings can be made from the
    # first j characters of the codeword.
    # j = 0: the empty prefix has exactly one decoding (the empty string),
    # which gives the recurrence somewhere to start.
    maxcombinations = [0] * (len(codelist) + 1)
    maxcombinations[0] = 1
    # For the remaining indices, we look back in the DP array: a valid
    # decoding of the first j characters must end in a code of length 1, 2, or 3.
    for j in range(1, len(codelist) + 1):
        for length in (1, 2, 3):
            if j >= length and codelist[j - length:j] in morsedict.values():
                maxcombinations[j] += maxcombinations[j - length]
    print(maxcombinations[-1])

if __name__ == "__main__":
    input()  # skip the first input line
    codelist = input()
    countcombinations(codelist)


Python 3 vectorizing nested for loop where inner loop depends on parameter

In the geosciences, while porting code from Fortran to Python, I see variations of these nested for loops (sometimes double-nested and sometimes triple-nested) that I would like to vectorize (shown here as a minimal reproducible example):
import numpy as np
import math

def main():
    t = np.arange(0, 300)
    n1 = 7
    tc = test(n1, t)

def test(n1, t):
    n2 = int(2*t.size/(n1+1))
    print(n2)
    tChunked = np.zeros(shape=(n1, n2))
    for i in range(0, n1):
        istart = int(i*n2/2)
        for j in range(0, n2):
            tChunked[i, j] = t[istart+j]
    return tChunked

main()
What have I tried?
I have gotten as far as eliminating istart and using outer addition to get istart+j. But I am stuck on how to use the index k to build the 2D tChunked array in a single line.
istart = np.linspace(0,math.ceil(n1*n2/2),num=n1,endpoint=False,dtype=np.int32)
jstart = np.linspace(0,n2,num=n2,endpoint=False,dtype=np.int32)
k = jstart[:,np.newaxis]+istart
numpy will output a 2D array if the index is 2D. So you simply do this.
def test2(n1, t):
    n2 = int(2 * t.size / (n1 + 1))
    istart = np.linspace(0, math.ceil(n1 * n2 / 2), num=n1, endpoint=False, dtype=np.int32)
    jstart = np.linspace(0, n2, num=n2, endpoint=False, dtype=np.int32)
    k = istart[:, np.newaxis] + jstart  # Note: I switched i and j.
    tChunked = t[k]  # This creates an array of the same shape as k.
    return tChunked
If you have to deal with a lot of nested loops, maybe the solution is to use numba, as it can give better performance than native NumPy, especially for loop-heavy functions like the one you showed. It is as easy as:
from numba import njit

@njit
def test(n1, t):
    n2 = int(2*t.size/(n1+1))
    print(n2)
    tChunked = np.zeros(shape=(n1, n2))
    for i in range(0, n1):
        istart = int(i*n2/2)
        for j in range(0, n2):
            tChunked[i, j] = t[istart+j]
    return tChunked
What made me fall head-over-heels in love with Python was actually NumPy and specifically its amazing indexing and indexing routines!
In test_extra_crispy() we can use zip() to get our ducks (initial conditions) in a row, then indexing using offsets to do the "transplanting" of blocks of values:
i_values = np.arange(7)
istarts = (i_values * n2 / 2).astype(int)

for i, istart in zip(i_values, istarts):
    tChunked[i, :n2] = t[istart:istart+n2]
See also
How to np.roll() faster? (and all the linked questions on that page)
Best way to insert values of 3D array inside of another larger array (for higher dimensions)
We can see that for
t = np.arange(10000000)
n1 = 7
"extra crispy" is a lot faster than the original (91 vs 4246 ms), but only a little faster than test2() from ken's answer which is not significant considering that it does more careful checking than my brute force treatment.
However "extra crispy" avoids adding an additional axis and expanding the memory usage. For modest arrays this doesn't matter at all. But if you have a huge array then disk space and access time becomes a real problem.
"Just use Jit"?
Zaero Divide's answer points out that #jit goes a long way towards converting impractical python nested loops to fast compiled do loops.
But always check out what methods are already available in NumPy which are already running efficiently with compiled code. One can easily get a gigaflop on a laptop for problems amenable to NumPy's many handy methods.
In my case test_jit() ran four times slower than "extra crispy" (350 vs 91 ms)!
You shouldn't write script that depends on #jit to run fast if you don't need to; not everyone wants to "just use #jit".
Strangely shaped indexing
If you need to address a more random-shaped volume within an array, you can use indexing like this:
array = np.array([[0, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 0, 0, 0, 1], [0, 1, 0, 1, 0], [0, 0, 1, 0, 0]])
print(array)
gives
[[0 0 1 0 0]
[0 1 0 1 0]
[1 0 0 0 1]
[0 1 0 1 0]
[0 0 1 0 0]]
and we can get indices for the 1's like this:
i, j = np.where(array == 1)
print(i)
print(j)
which gives
[0 1 1 2 2 3 3 4]
[2 1 3 0 4 1 3 2]
If we want to start with a zeroed array and insert those 1's via numpy indexing, we just do this:
array = np.zeros((5, 5), dtype=int)
array[i, j] = 1
which recreates the original array.
import numpy as np
import matplotlib.pyplot as plt
import time

def test_original(n1, t):
    n2 = int(2*t.size / (n1 + 1))
    tChunked = np.zeros(shape=(n1, n2))
    for i in range(n1):
        istart = int(i * n2 / 2)
        for j in range(0, n2):
            tChunked[i, j] = t[istart + j]
    return tChunked
t = np.arange(10000000)
n1 = 7
t_start = time.process_time()
tc_original = test_original(n1, t)
print('original process time (ms)', round(1000*(time.process_time() - t_start), 3))
# print('tc_original.shape: ', tc_original.shape)
fig, ax = plt.subplots(1, 1)
for thing in tc_original:
    ax.plot(thing)
plt.show()
def test_extra_crispy(n1, t):
    n2 = int(2*t.size / (n1 + 1))
    tChunked = np.zeros(shape=(n1, n2))
    i_values = np.arange(7)
    istarts = (i_values * n2 / 2).astype(int)
    for i, istart in zip(i_values, istarts):
        tChunked[i, :n2] = t[istart:istart+n2]
    return tChunked
t_start = time.process_time()
tc_extra_crispy = test_extra_crispy(n1, t)
print('extra crispy process time (ms)', round(1000*(time.process_time() - t_start), 3))
# print('tc_extra_crispy.shape: ', tc_extra_crispy.shape)
print('np.all(tc_extra_crispy == tc_original): ', np.all(tc_extra_crispy == tc_original))
import math

def test2(n1, t):  # https://stackoverflow.com/a/72492815/3904031
    n2 = int(2 * t.size / (n1 + 1))
    istart = np.linspace(0, math.ceil(n1 * n2 / 2), num=n1, endpoint=False, dtype=np.int32)
    jstart = np.linspace(0, n2, num=n2, endpoint=False, dtype=np.int32)
    k = istart[:, np.newaxis] + jstart  # Note: I switched i and j.
    tChunked = t[k]  # This creates an array of the same shape as k.
    return tChunked
t_start = time.process_time()
tc_test2 = test2(n1, t)
print('test2 process time (ms)', round(1000*(time.process_time() - t_start), 3))
# print('tc_test2.shape: ', tc_test2.shape)
print('np.all(tc_test2 == tc_original): ', np.all(tc_test2 == tc_original))
# from https://stackoverflow.com/a/72494777/3904031
from numba import njit

@njit
def test_jit(n1, t):
    n2 = int(2*t.size/(n1+1))
    print(n2)
    tChunked = np.zeros(shape=(n1, n2))
    for i in range(0, n1):
        istart = int(i*n2/2)
        for j in range(0, n2):
            tChunked[i, j] = t[istart+j]
    return tChunked
t_start = time.process_time()
tc_jit = test_jit(n1, t)
print('jit process time (ms)', round(1000*(time.process_time() - t_start), 3))
# print('tc_jit.shape: ', tc_jit.shape)
print('np.all(tc_jit == tc_original): ', np.all(tc_jit == tc_original))

How to stop the iteration when the Jacobian reaches an arbitrary (small) value in the Newton-CG method?

How do I put a stopping condition on the Jacobian (or gradient) for the Newton-CG method?
I want the algorithm to stop when the Jacobian reaches 1e-2. Is that possible with Newton-CG?
input:
scipy.optimize.minimize(f, [5.0,1.0,2.0,5.0], args=Data, method='Newton-CG',jac=Jacf)
output:
jac: array([7.64265411e-08, 1.74985718e-08, 4.12408407e-07, 5.02972841e-08])
message: 'Optimization terminated successfully.'
nfev: 12
nhev: 0
nit: 11
njev: 68
status: 0
success: True
x: array([0.22545395, 0.3480084 , 1.06811724, 1.64873479])
The BFGS method, which is similar to Newton-CG, has a gtol option that allows stopping the iteration when the gradient reaches some value, but Newton-CG has no such option.
Does anyone know how to stop the iteration when the Jacobian reaches 1e-2?
Here are some details to reproduce my code:
import numpy as np
import pandas as pd

def convert_line2matrix(a):
    n = len(a)
    if (np.sqrt(n) % 1 == 0):
        d = int(np.sqrt(n))
        Mat = np.zeros((d, d))
        for i in range(d):
            for j in range(d):
                Mat[i, j] = a[j + d*i]
    else:
        raise ValueError(f"{a} can't be converted into a (n x n) matrix. The array has {len(a)} elements, \n\t thus it is impossible to build a square matrix with {len(a)} elements.")
    return Mat

def convert_matrix2line(Matrix):
    result = []
    dim = len(Matrix)
    for i in range(dim):
        for j in range(dim):
            result.append(Matrix[i, j])
    return np.array(result)
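(As an aside, NumPy's own reshaping can replace both helpers; a one-line sketch of each, assuming len(a) is a perfect square:)
Mat = np.asarray(a).reshape(int(np.sqrt(len(a))), -1)  # line -> (d x d) matrix
line = Matrix.ravel()  # matrix -> line, row-major like the loops above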
my_data = np.array([[0.21530249, 0.32450331, 0 ],
[0.1930605 , 0.31788079, 0 ],
[0.17793594, 0.31788079, 0 ],
[0.16459075, 0.31125828, 1 ],
[0.24822064, 0.31125828, 0 ],
[0.28647687, 0.32450331, 0 ],
[0.32829181, 0.31788079, 0 ],
[0.38879004, 0.32450331, 0 ],
[0.42882562, 0.32450331, 0 ],
[0.47419929, 0.32450331, 0 ],
[0.5044484 , 0.32450331, 0 ],
[0.1797153 , 0.31125828, 0 ],
[0.16548043, 0.31125828, 1 ],
[0.17793594, 0.29801325, 1 ],
[0.1930605 , 0.31788079, 0 ]])
Data = pd.DataFrame(my_data, columns=['X_1','X_2', 'Allum'])
def logLB(params, Data):
    B = convert_line2matrix(params)
    X = np.array(Data.iloc[:, :len(B)])
    Y = np.array(Data.iloc[:, len(B)])
    result = 0
    n = len(Data)
    BB = np.transpose(B) @ B
    for i in range(n):
        if (1 - np.exp(-X[i].T @ BB @ X[i]) > 0):
            result += Y[i]*(-np.transpose(X[i]) @ BB @ X[i]) + (1 - Y[i])*np.log(1 - np.exp(-X[i].T @ BB @ X[i]))
    return result

def f(params, Data):
    return -logLB(params, Data)

def dlogLB(params, Data):
    B = convert_line2matrix(params)
    X = np.array(Data.iloc[:, :len(B)])
    Y = np.array(Data.iloc[:, len(B)])
    BB = B.T @ B
    N = len(Data)
    M = len(B)
    Jacobian = np.zeros(np.shape(B))
    for n in range(len(B)):
        for m in range(len(B)):
            result = 0
            for c in range(N):
                som = 0
                for i in range(M):
                    som += X[c, m]*B[n, i]*X[c, i]
                if (1 - np.exp(-X[c].T @ BB @ X[c]) > 0):
                    result += -2*Y[c]*som + (1 - Y[c])*np.exp(-X[c].T @ BB @ X[c])*(2*som)/(1 - np.exp(-X[c].T @ BB @ X[c]))
            Jacobian[n, m] = result
    return convert_matrix2line(Jacobian)

def Jacf(params, Data):
    return -dlogLB(params, Data)
I assume that you want to stop the optimizer as soon as the Euclidean norm of the gradient reaches a specific value, which is exactly the meaning of the BFGS method's gtol option. Otherwise, it doesn't make sense mathematically, since the evaluated gradient is a vector and thus can't be compared to a scalar value.
The Newton-CG method doesn't provide a similar option. However, you could use a simple callback that is called after each iteration. Unfortunately, only the trust-constr method lets a callback terminate the optimizer by returning True; for all other methods the callback's return value is ignored, so it's very limited.
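For illustration, a minimal sketch of the trust-constr route, reusing the f, Jacf, and Data defined above (with trust-constr the callback receives the iterate and an OptimizeResult state, and its boolean return value is honored):

from scipy.optimize import minimize

res = minimize(f, [5.0, 1.0, 2.0, 5.0], args=Data, method='trust-constr', jac=Jacf,
               # stop once the gradient norm drops below the threshold
               callback=lambda xk, state: np.linalg.norm(Jacf(xk, Data)) <= 1e-2)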
A possible hacky and ugly way to terminate the optimizer by the callback anyway would be raising an exception:
import numpy as np
from scipy.optimize import minimize

class Callback:
    def __init__(self, eps, args, jac):
        self.eps = eps
        self.args = args
        self.jac = jac
        self.x = None
        self.gtol = None

    def __call__(self, xk):
        self.x = xk
        self.gtol = np.linalg.norm(self.jac(xk, *self.args))
        if self.gtol <= self.eps:
            raise Exception("Gradient norm is below threshold")
Here, xk is the current iterate, eps your desired tolerance, args a tuple containing your optional objective and gradient arguments, and jac the gradient. Then, you can use it like this:
from scipy.optimize import minimize

cb = Callback(1.0e-1, (Data,), Jacf)

try:
    res = minimize(f, [5.0, 1.0, 2.0, 5.0], args=Data, method='Newton-CG',
                   jac=Jacf, callback=cb)
except:
    x = cb.x
    gtol = cb.gtol

print(f"gtol = {gtol:E}, x = {x}")
which yields
gtol = 5.515263E-02, x = [14.43322108 -5.18163542 0.22582261 -0.04859385]

Finding those elements in an array which are "close"

I have a 1-dimensional sorted array and would like to find all pairs of elements whose difference is no larger than 5.
A naive approach would be to make N^2 comparisons, doing something like
diffs = np.tile(x, (x.size, 1)) - x[:, np.newaxis]
D = np.logical_and(diffs > 0, diffs < 5)
indices = np.argwhere(D)
Note here that the output of my example is indices of x. If I wanted the values of x which satisfy the criteria, I could do x[indices].
This works for smaller arrays, but not arrays of the size with which I work.
An idea I had was to find where there are gaps larger than 5 between consecutive elements, split the array into pieces at those gaps, and compare all the elements within each piece.
Is this a more efficient way of finding elements which satisfy my criteria? How could I go about writing this?
Here is a small example:
x = np.array([ 9, 12,
21,
36, 39, 44, 46, 47,
58,
64, 65,])
the result should look like
array([[ 0, 1],
[ 3, 4],
[ 5, 6],
[ 5, 7],
[ 6, 7],
[ 9, 10]], dtype=int64)
Here is a solution that iterates over offsets while shrinking the set of candidates until there are none left:
import numpy as np

def f_pp(A, maxgap):
    d0 = np.diff(A)
    d = d0.copy()
    IDX = []
    k = 1
    idx, = np.where(d <= maxgap)
    vidx = idx[d[idx] > 0]
    while vidx.size:
        IDX.append(vidx[:, None] + (0, k))
        if idx[-1] + k + 1 == A.size:
            idx = idx[:-1]
        d[idx] = d[idx] + d0[idx+k]
        k += 1
        idx = idx[d[idx] <= maxgap]
        vidx = idx[d[idx] > 0]
    return np.concatenate(IDX, axis=0)
data = np.cumsum(np.random.exponential(size=10000)).repeat(np.random.randint(1, 20, (10000,)))
pairs = f_pp(data, 1)
#pairs = set(map(tuple, pairs))
from timeit import timeit
kwds = dict(globals=globals(), number=100)
print(data.size, 'points', pairs.shape[0], 'close pairs')
print('pp', timeit("f_pp(data, 1)", **kwds)*10, 'ms')
Sample run:
99963 points 1020651 close pairs
pp 43.00256529124454 ms
Your idea of slicing the array is a very efficient approach. Since your data are sorted, you can just calculate the differences and split at the gaps (note the + 1: a gap at d[i] means a new piece starts at i + 1):
d = np.diff(x)
ind = np.where(d > 5)[0]
pieces = np.split(x, ind + 1)
Here pieces is a list that you can then loop over, applying your own code to every element.
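A minimal sketch of that loop, reusing the question's brute-force comparison inside each (now small) piece and shifting the piece-local indices back to positions in x:

pairs = []
offset = 0
for piece in pieces:
    diffs = piece[np.newaxis, :] - piece[:, np.newaxis]  # diffs[i, j] = piece[j] - piece[i]
    local = np.argwhere(np.logical_and(diffs > 0, diffs < 5))
    pairs.append(local + offset)  # piece-local indices -> indices into x
    offset += piece.size
pairs = np.concatenate(pairs)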
The best algorithm is highly dependent on the nature of your data, of which I'm unaware. For example, another possibility is to write a nested loop:
pairs = []
for i in range(x.size):
    j = i + 1
    while j < x.size and x[j] - x[i] <= 5:
        pairs.append([i, j])
        j += 1
If you want it to be more clever, you can edit the outer loop so it jumps ahead when j hits a gap.

Solving a sparse non-linear system of equations using scipy.optimize.root

I want to solve the following non-linear system of equations.
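(Reconstructed from the solver code in the answer below, the system is:)
K x + \sum_{k=1}^{N} \alpha_k \,(A_k x + a_k) = 0, \qquad
\tfrac{1}{2}\, x^{\top} A_k x + a_k \cdot x = 0 \quad \text{for } 1 \le k \le N.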
Notes
the dot between a_k and x represents the dot product.
the 0 in the first equation represents the zero vector, and the 0 in the second equation is the scalar 0.
all the matrices are sparse, if that matters.
Known
K is an n x n (positive definite) matrix
each A_k is a known (symmetric) matrix
each a_k is a known n x 1 vector
N is known (let's say N = 50). But I need a method where I can easily change N.
Unknown (trying to solve for)
x is an n x 1 vector.
each alpha_k for 1 <= k <= N is a scalar
My thinking.
I am thinking of using scipy's root to find x and each alpha_k. We essentially have n equations from the rows of the first equation and another N equations from the constraint equations, so we have the n + N equations required to solve for our n + N variables.
I also have a reliable initial guess for x and the alpha_k's.
Toy example.
n = 4
N = 2
K = np.matrix([[0.5, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0.5]])
A_1 = np.matrix([[0.98, 0, 0.46, 0.80], [0, 0, 0.56, 0], [0.93, 0.82, 0, 0.27], [0, 0, 0, 0.23]])
A_2 = np.matrix([[0.23, 0, 0, 0], [0.03, 0.01, 0, 0], [0, 0.32, 0, 0], [0.62, 0, 0, 0.45]])
a_1 = np.matrix(np.random.rand(4, 1))
a_2 = np.matrix(np.random.rand(4, 1))
We are trying to solve for
x = [x1, x2, x3, x4] and alpha_1, alpha_2
Questions:
I can actually brute-force this toy problem and feed it to the solver. But how do I solve this toy problem in such a way that I can extend it easily to the case where, let's say, n = 50 and N = 50?
Will I have to explicitly compute the Jacobian for larger matrices?
Can anyone give me any pointers?
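One pointer, as a hedged aside: scipy.optimize.root also offers Jacobian-free Krylov methods, which only require residual evaluations and suit large sparse systems, so no explicit (n+N) x (n+N) Jacobian ever needs to be formed. A sketch, reusing the lhs residual and x_alpha_0 starting point from the solver in the answer below:

from scipy.optimize import root

# method='krylov' approximates the needed Jacobian-vector products internally.
sol = root(lhs, x_alpha_0, method='krylov')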
I think the scipy.optimize.root approach holds water, but steering clear of the trivial solution might be the real challenge for this system of equations.
In any event, this function uses root to solve the system of equations.
import numpy as np
from scipy.optimize import root

def solver(x0, alpha0, K, A, a):
    '''
    x0 - nx1 numpy array. Initial guess on x.
    alpha0 - Nx1 numpy array. Initial guess on alpha.
    K - nxn numpy.array.
    A - Length N list of nxn numpy.arrays.
    a - Length N list of nx1 numpy.arrays.
    '''
    # Establish the function that produces the lhs of the system of equations.
    n = K.shape[0]
    N = len(A)

    def lhs(x_alpha):
        '''
        x_alpha is a concatenation of x and alpha.
        '''
        x = np.ravel(x_alpha[:n])
        alpha = np.ravel(x_alpha[n:])
        lhs_top = np.ravel(K.dot(x))
        for k in range(N):
            lhs_top += alpha[k]*(np.ravel(np.dot(A[k], x)) + np.ravel(a[k]))
        lhs_bottom = [0.5*x.dot(np.ravel(A[k].dot(x))) + np.ravel(a[k]).dot(x)
                      for k in range(N)]
        return np.array(lhs_top.tolist() + lhs_bottom)

    # Solve the system of equations.
    x0.shape = (n, 1)
    alpha0.shape = (N, 1)
    x_alpha_0 = np.vstack((x0, alpha0))
    sol = root(lhs, x_alpha_0)
    x_alpha_root = sol['x']

    # Compute the norm of the residual.
    res = sol['fun']
    res_norm = np.linalg.norm(res)

    # Break out the x and alpha components.
    x_root = x_alpha_root[:n]
    alpha_root = x_alpha_root[n:]
    return x_root, alpha_root, res_norm
Running on the toy example, however, only produces the trivial solution.
# Toy example.
n = 4
N = 2
K = np.matrix([[0.5, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0.5]])
A_1 = np.matrix([[0.98, 0, 0.46, 0.80], [0, 0, 0.56, 0], [0.93, 0.82, 0, 0.27],
                 [0, 0, 0, 0.23]])
A_2 = np.matrix([[0.23, 0, 0, 0], [0.03, 0.01, 0, 0], [0, 0.32, 0, 0],
                 [0.62, 0, 0, 0.45]])
a_1 = np.matrix(np.random.rand(4, 1))
a_2 = np.matrix(np.random.rand(4, 1))
A = [A_1, A_2]
a = [a_1, a_2]
x0 = np.random.rand(n, 1)
alpha0 = np.random.rand(N, 1)

print('x0 =', x0)
print('alpha0 =', alpha0)

x_root, alpha_root, res_norm = solver(x0, alpha0, K, A, a)

print('x_root =', x_root)
print('alpha_root =', alpha_root)
print('res_norm =', res_norm)
Output is
x0 = [[ 0.00764503]
[ 0.08058471]
[ 0.88300129]
[ 0.85299622]]
alpha0 = [[ 0.67872815]
[ 0.69693346]]
x_root = [ 9.88131292e-324 -4.94065646e-324 0.00000000e+000
0.00000000e+000]
alpha_root = [ -4.94065646e-324 0.00000000e+000]
res_norm = 0.0

Odd-size numpy arrays send/receive

I would like to gather numpy array contents from all processors to one of them. If all arrays are of the same size, it works. However, I don't see a natural way of doing the same task for arrays of proc-dependent size. Please consider the following code:
from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
rank = comm.rank
size = comm.size

if rank >= size/2:
    nb_elts = 5
else:
    nb_elts = 2

# create data
lst = []
for i in range(nb_elts):
    lst.append(rank*3+i)
array_lst = numpy.array(lst, dtype=int)

# communicate array
result = []
if rank == 0:
    result = array_lst
    for p in range(1, size):
        received = numpy.empty(nb_elts, dtype=int)
        comm.Recv(received, p, tag=13)
        result = numpy.concatenate([result, received])
else:
    comm.Send(array_lst, 0, tag=13)
My problem is at the "received" allocation: how can I know what size to allocate? Do I have to first send/receive each array's size?
Based on a suggestion below, I'll go with:
from mpi4py import MPI
import numpy

# setup (implied by the snippet's use of comm_py)
comm_py = MPI.COMM_WORLD
rank = comm_py.rank
size = comm_py.size

data_array = numpy.ones(rank + 3, dtype=int)
data_array *= rank + 5
print('[{}] data: {} ({})'.format(rank, data_array, type(data_array)))

# make all processors aware of data array sizes
all_sizes = {rank: data_array.size}
gathered_all_sizes = comm_py.allgather(all_sizes)
for d in gathered_all_sizes:
    all_sizes.update(d)

# prepare Gatherv as described by @francis
nbsum = 0
sendcounts = []
displacements = []
for p in range(size):
    n = all_sizes[p]
    displacements.append(nbsum)
    sendcounts.append(n)
    nbsum += n

if rank == 0:
    result = numpy.empty(nbsum, dtype=int)
else:
    result = None

comm_py.Gatherv(data_array, [result, tuple(sendcounts), tuple(displacements), MPI.INT64_T], root=0)
print('[{}] gathered data: {}'.format(rank, result))
In the code you pasted, both Send() and Recv() transfer nb_elts elements. The problem is that nb_elts is not the same for every process... Hence, the number of items received does not match the number of elements that were sent, and the program complains:
mpi4py.MPI.Exception: MPI_ERR_TRUNCATE: message truncated
To prevent that, the root process must compute the number of items that the other processes have sent. Hence, in the loop for p in range(1, size), nb_elts must be computed according to p, not rank.
The following code based on yours has been corrected. I would add that the natural way to perform this gathering operation is to use Gatherv(). See http://materials.jeremybejarano.com/MPIwithPython/collectiveCom.html and the mpi4py documentation for instance; I added the corresponding sample code. The only tricky point is that the numpy arrays use a 64-bit integer type, hence the Gatherv() call uses an 8-byte MPI type (MPI_DOUBLE here; MPI.INT64_T, as in your snippet above, matches the data type exactly and also works).
from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
rank = comm.rank
size = comm.size

if rank >= size/2:
    nb_elts = 5
else:
    nb_elts = 2

# create data
lst = []
for i in range(nb_elts):
    lst.append(rank*3+i)
array_lst = numpy.array(lst, dtype=int)

# communicate array
result = []
if rank == 0:
    result = array_lst
    for p in range(1, size):
        if p >= size/2:
            nb_elts = 5
        else:
            nb_elts = 2
        received = numpy.empty(nb_elts, dtype=int)
        comm.Recv(received, p, tag=13)
        result = numpy.concatenate([result, received])
else:
    comm.Send(array_lst, 0, tag=13)

if rank == 0:
    print("Send Recv, result = " + str(result))

# How to use Gatherv:
nbsum = 0
sendcounts = []
displacements = []
for p in range(0, size):
    displacements.append(nbsum)
    if p >= size/2:
        nbsum += 5
        sendcounts.append(5)
    else:
        nbsum += 2
        sendcounts.append(2)

if rank == 0:
    print("nbsum " + str(nbsum))
    print("sendcounts " + str(tuple(sendcounts)))
    print("displacements " + str(tuple(displacements)))

print("rank " + str(rank) + " array_lst " + str(array_lst))
print("int dtype: " + str(numpy.dtype(int)) + " " + str(numpy.dtype(int).itemsize) + " " + str(numpy.dtype(int).name))

if rank == 0:
    result2 = numpy.empty(nbsum, dtype=int)
else:
    result2 = None

comm.Gatherv(array_lst, [result2, tuple(sendcounts), tuple(displacements), MPI.DOUBLE], root=0)

if rank == 0:
    print("Gatherv, result2 = " + str(result2))