random.multivariate_normal on a dask array? - numpy

I've been struggling to find a way to get this calc that works for a dask workflow.
I have code that uses the np.random.multivariate_normal function, and while many of these distributions are available on dask arrays, it seems this one is not. Sooo... I attempted to create my own based on an example provided in the dask documentation.
Here is my attempt, which is giving errors that I am having difficulty understanding. I also provided random input variables to make it easy to replicate:
import numpy as np
from dask.distributed import Client
import dask.array as da
def mvn(mu, sigma, n, blocksize):
    chunks = ((blocksize,) * (n // blocksize),
              (blocksize,) * (n // blocksize))
    name = 'mvn' # unique identifier
    dsk = {(name, i, j): (np.random.multivariate_normal(mu,sigma, blocksize))
                         if i == j else
                         (np.zeros, (blocksize, blocksize))
           for i in range(n // blocksize)
           for j in range(n // blocksize)}
    dtype = np.random.multivariate_normal(0).dtype # take dtype default from numpy
    return da.Array(dsk, name, chunks, dtype)
n = 10000
A = da.random.normal(0, 1, size=(n,n), chunks=(1000, 1000))
sigma = da.dot(A,A.transpose())
mu = 4.0*da.ones(n, chunks = 1000)
R = da.numpy.random.mvn(mu, sigma, n, chunks=(100))
Any suggestions or am I so far off the mark here that I should abandon all hope? Thanks!

If you have a cluster to run this on, you can use my answer from this post, copied here for reference:
A workaround for now is to use a Cholesky decomposition. Note that any covariance matrix C can be expressed as C = G*G'. It then follows that x = G*y is correlated as specified in C if y is standard normal, because Cov(G*y) = G*I*G' = C (see this excellent post on StackExchange Mathematics). In code:
Numpy
import numpy as np
n_dim = 4
size = 100000
A = np.random.randn(n_dim, n_dim)
covm = A.dot(A.T)
x = np.random.multivariate_normal(size=size, mean=np.zeros(len(covm)), cov=covm)
## verify NumPy's covariance is correct
np.cov(x, rowvar=False)
covm
Dask
import dask.array as da
## create covariance matrix
A = da.random.standard_normal(size=(n_dim, n_dim),chunks=(2,2))
covm = A.dot(A.T)
## get cholesky decomp
L = da.linalg.cholesky(covm, lower=True)
## draw standard normal samples
sn= da.random.standard_normal(size=(size, n_dim),chunks=(100,100))
## correct for correlation
x =L.dot(sn.T)
x.shape
## verify
covm.compute()
da.cov(x, rowvar=True).compute()
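The same identity can be checked in plain NumPy without calling multivariate_normal at all. This is a sketch, not the answer's code: it swaps in the newer Generator API (np.random.default_rng) for reproducibility and uses the lower Cholesky factor G, so that covm = G @ G.T and x = G @ y.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim, size = 4, 100_000
A = rng.standard_normal((n_dim, n_dim))
covm = A @ A.T                           # symmetric positive definite (almost surely)
G = np.linalg.cholesky(covm)             # lower-triangular factor with covm == G @ G.T
y = rng.standard_normal((n_dim, size))   # each column is an independent N(0, I) draw
x = G @ y                                # each column now has covariance G @ G.T == covm
est = np.cov(x)                          # rows are variables (np.cov's default rowvar=True)
```

The sample covariance est should agree with covm up to sampling noise of order 1/sqrt(size).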

This answer can be fleshed out, but I imagine you would have an easier time using dask's delayed, da.from_delayed and da.*stack.
One immediate problem I see with what you have: with np.random.multivariate_normal(mu,sigma, blocksize) you are calling the function directly, instead of building a task tuple for the graph. You probably wanted (np.random.multivariate_normal, mu,sigma, blocksize). This shows that working with raw dask dictionaries can be tricky!
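Following the suggestion to use delayed, da.from_delayed and stacking instead of a raw graph, a minimal sketch might look like this. The names mvn_dask and _draw_block are hypothetical, and it assumes mean and cov are ordinary NumPy arrays small enough to be shipped to every task (each block calls Generator.multivariate_normal, which refactorises cov per call). Each block gets its own child seed from np.random.SeedSequence.spawn so blocks are independent rather than identical.

```python
import numpy as np
import dask
import dask.array as da


def _draw_block(mean, cov, rows, seed_seq):
    # one NumPy call per block; each block uses its own independent seed
    rng = np.random.default_rng(seed_seq)
    return rng.multivariate_normal(mean, cov, size=rows)


def mvn_dask(mean, cov, size, blocksize, seed=None):
    """Return a lazy (size, d) dask array of draws from N(mean, cov), chunked along rows."""
    mean = np.asarray(mean, dtype=np.float64)
    cov = np.asarray(cov, dtype=np.float64)
    d = mean.shape[0]
    n_blocks = -(-size // blocksize)  # ceiling division
    seeds = np.random.SeedSequence(seed).spawn(n_blocks)
    blocks = []
    for b in range(n_blocks):
        rows = min(blocksize, size - b * blocksize)
        task = dask.delayed(_draw_block)(mean, cov, rows, seeds[b])
        blocks.append(da.from_delayed(task, shape=(rows, d), dtype=np.float64))
    # join the per-block arrays into one dask array along the sample axis
    return da.concatenate(blocks, axis=0)
```

Nothing is sampled until .compute() (or another dask operation) is called; for example, mvn_dask(np.zeros(3), np.eye(3), 1000, 300, seed=0) has row chunks (300, 300, 300, 100).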

Related

Is there a Rolling implementation of PCA in python?

For time-series analysis, it's useful to have rolling PCA functions to analyse how the dynamics of the time series change over time while avoiding look-ahead bias.
We may want to answer the question: 'how many principal components are needed to keep 90% of the variance?'. The number of principal components needed to explain 90% of the variance may change over time, depending on the dynamics of the time series.
In addition, we may want to reduce the number of components p in a given dataset to k < p on a rolling basis to more easily visualise the data.
While scikit-learn has a PCA module, it does not support rolling calculations; neither does numpy's SVD. We could use these packages in a manual for loop, but for large arrays (>10,000 rows) it would become very slow.
Is there a fast rolling implementation of PCA in python to address some of the questions above?
While I didn't manage to find a rolling implementation of PCA, it is a relatively straightforward matter to use the packages and tools mentioned in the question to code a manual rolling PCA function. In addition, we will use numba to gain a small speed-up, as it supports numpy.linalg.svd and numpy.linalg.eig.
The code in this answer is inspired by the excellent explanations of PCA here and here
import numpy as np
from numpy.linalg import eig
from numba import njit
import numpy.typing as npt
@njit
def rolling_pca(
        arr: npt.NDArray[np.float64],
        n_components: int,
        window: int,
        min_periods: int
) -> npt.NDArray[np.float64]:
    """Perform PCA on the covariance matrix of arr.
    Return the lower dimensional array.
    Data is assumed to have non-zero mean, so will be demeaned
    in the process.
    Args:
        arr: Input data. Shape (n_samples, n_variables).
        n_components: Number of components to reduce data matrix to.
            Must be less than arr.shape[1].
        window: Sliding window size.
        min_periods: Minimum number of observations required to perform calculation.
    Returns:
        Reduced data matrix. Shape (n_samples, n_components)
    """
    # create a copy to ensure we don't change data in place
    arr_copy = arr.copy()
    n = arr_copy.shape[0]
    # create an empty array which will be populated with the output
    reduced_out = np.full((n, n_components), np.nan)
    # iterate over each row (timestamp in a timeseries)
    for i in range(min_periods, n + 1):
        if i < window:
            lookback = i
        else:
            lookback = window
        start_idx = i - lookback
        curr_arr = arr_copy[start_idx: i, :]
        # demean returns
        curr_arr = curr_arr - (np.sum(curr_arr, axis=0) / lookback)
        # calculate the covariance matrix
        cov = (curr_arr.T @ curr_arr) / (lookback - 1)
        # get the eigenvectors; sort eigvals in descending order
        # so we keep the eigenvectors of the largest eigenvalues
        evals, evecs = eig(cov)
        idx = np.argsort(evals)[::-1][:n_components]
        evecs = evecs[:, idx]
        # multiply the top eigenvectors by the current array to get a reduced matrix
        reduced = (evecs.T @ curr_arr.T).T
        reduced_out[start_idx: i, :] = reduced
    return reduced_out
After profiling the code, the two slowest parts are, as expected, the calls to eig() and the matrix multiplication curr_arr.T @ curr_arr. Because curr_arr is limited by the window size, a pure numpy (no numba) matrix multiplication is faster than the numba one here: the arrays involved are small and not contiguous (see this post for more details). I didn't get around to resolving this issue, but if anyone has any suggestions, it would speed up this function quite a bit more.
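One way to sidestep the per-row eig() and matrix-multiplication overhead is to batch every window at once. This is an alternative sketch, not the benchmarked code above: it assumes NumPy 1.20+ (for sliding_window_view), only handles full windows (no min_periods warm-up), and returns explained-variance ratios rather than projected data. The helper names are hypothetical. Memory grows with n_samples × n_variables × window, so very long arrays would need to be processed in chunks.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_explained_variance(arr, window):
    """Explained-variance ratios, largest first, for every full window of arr.

    arr has shape (n_samples, n_variables); the result has shape
    (n_samples - window + 1, n_variables), row t describing arr[t:t + window].
    """
    # (n_windows, n_variables, window) strided view: no data is copied here
    windows = sliding_window_view(arr, window, axis=0)
    centred = windows - windows.mean(axis=2, keepdims=True)
    # one covariance matrix per window, shape (n_windows, n_variables, n_variables)
    covs = np.einsum('nit,njt->nij', centred, centred) / (window - 1)
    # eigvalsh handles the whole stack and returns eigenvalues in ascending order
    evals = np.linalg.eigvalsh(covs)[:, ::-1]
    return evals / evals.sum(axis=1, keepdims=True)


def components_needed(ratios, threshold):
    """Smallest number of components whose cumulative ratio reaches threshold, per window."""
    return (np.cumsum(ratios, axis=1) < threshold).sum(axis=1) + 1
```

For example, components_needed(rolling_explained_variance(arr, 120), 0.9) gives the "how many components keep 90% of the variance" series from the question, one value per window.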
I've compared the average timings between 3 implementations to see the effect of the speedup that numba offers. The 3 implementations are:
Manual for loop using numba, exactly as above
Manual for loop without numba, but otherwise same as the code above
Manual for loop using Sklearn instead of numpy eig, no numba (as numba does not support Sklearn)
Note that the following parameters are fixed so we get as fair a comparison as possible between implementations
Window size = 120
Minimum number of periods = 22
Input data number of variables = 20
Number of components to reduce to via PCA = 10
Number of iterations to time function over so as to get an average timing per implementation = 10
Only the row size (number of samples) is allowed to vary so we can visualise how execution time varies with array length.
We can see for large arrays (100k rows), we can decrease the time from about 14.19s using Sklearn, to 5.36s using numba, about a 2.6X speedup.
PCA Reconstruction
We implement some code similar to what we used above to reconstruct the original data matrix using only the top principal components. Namely, we use SVD to decompose the matrix X into 3 matrices, U, S, and V^T. With these matrices, we can calculate how much variance is kept cumulatively by the components, and keep only the top k components that explain a desired amount of variance.
import numpy as np
from numpy.linalg import svd
from numba import njit
import numpy.typing as npt
@njit
def __get_num_top_components(
        singular_values: npt.NDArray[np.float64],
        threshold: float,
        n: int,
) -> int:
    """Get the number of top eigen-components required by threshold.
    Args:
        singular_values: Singular values from SVD.
        threshold: Minimum amount of explained variance to be kept.
        n: Number of samples in data matrix.
    Returns:
        Required number of components to keep.
    """
    evals = singular_values ** 2 / (n - 1)
    evals = evals / np.sum(evals)
    cumsum_evals = np.cumsum(evals)
    # index of the first component past the threshold; +1 turns it into a count
    top_k = np.argwhere(cumsum_evals > threshold).min() + 1
    return top_k
@njit
def rolling_pca_reconstruction(
        arr: npt.NDArray[np.float64],
        threshold: float,
        window: int,
        min_periods: int
) -> npt.NDArray[np.float64]:
    """Perform PCA on arr and return reconstructed matrix.
    This method follows the logic succinctly outlined here:
    https://stats.stackexchange.com/a/134283/178320
    Args:
        arr: Input data. Shape (n_samples, n_variables).
        threshold: Minimum amount of explained variance to be kept.
            Must be a number in (0., 1.).
        window: Sliding window size.
        min_periods: Minimum number of observations required to perform calculation.
    Returns:
        Reconstructed data matrix. Shape (n_samples, n_variables)
    """
    arr_copy = arr.copy()
    n = arr_copy.shape[0]
    p = arr_copy.shape[1]
    recon_out = np.full((n, p), np.nan)
    for i in range(min_periods, n + 1):
        if i < window:
            lookback = i
        else:
            lookback = window
        start_idx = i - lookback
        curr_arr = arr_copy[start_idx: i, :]
        # demean data
        curr_arr = curr_arr - (np.sum(curr_arr, axis=0) / lookback)
        # perform SVD on data, no need for full matrices, this is faster
        u, s, vh = svd(curr_arr, full_matrices=False)
        # calculate the number of components that explains threshold variance
        top_k = __get_num_top_components(
            singular_values=s,
            threshold=threshold,
            n=lookback,
        )
        # reconstruct the data matrix using the top_k components: U_k S_k V_k^T
        tmp_recon = u[:, :top_k] @ np.diag(s)[:top_k, :top_k] @ vh[:top_k, :]
        recon_out[start_idx: i, :] = tmp_recon
    return recon_out
The output of rolling_pca_reconstruction() is the reconstructed data, with the same dimensions as the input data arr. One useful modification to this code would be to record top_k at each iteration, to understand how many components are needed to explain threshold variance over time.
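To check the reconstruction step on a single (non-rolling) window, here is a small plain-NumPy sketch; svd_reconstruct is a hypothetical helper, not part of the code above. It returns both the rank-top_k reconstruction U_k S_k V_k^T of the demeaned data and top_k itself, which is the quantity one would record per iteration.

```python
import numpy as np


def svd_reconstruct(window_data, threshold):
    """Rebuild one demeaned window from the fewest components explaining > threshold of the variance."""
    centred = window_data - window_data.mean(axis=0)
    u, s, vh = np.linalg.svd(centred, full_matrices=False)
    # covariance eigenvalues are s**2 / (n - 1); the (n - 1) cancels in the ratio
    ratios = s ** 2 / np.sum(s ** 2)
    # index of the first cumulative ratio above threshold, plus one to make it a count
    top_k = int(np.argmax(np.cumsum(ratios) > threshold)) + 1
    recon = u[:, :top_k] @ np.diag(s[:top_k]) @ vh[:top_k, :]
    return recon, top_k
```

With data that is essentially rank one, a 0.9 threshold keeps a single component; with a threshold close to 1, all components are kept and the demeaned data is reproduced exactly.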

How does GEKKO optimization with bounded variables work?

I am using GEKKO to estimate the parameters of a differential equation and I have bounded one of the variables between 0 and 1. However, when I solve the ODE, I get values outside of the bounds for this variable, so I was wondering if somebody knew how GEKKO finds the solution, as this might help me resolve the issue.
Here is the code I use to fit the data. This gives me a solution x and u where x is between 0 and 1.
However, afterwards, I try to solve the ODE using scipy.integrate.solve_ivp with the initial value of u that I got, and the solution I get for u is not within these bounds. Since the solution should be unique, I am wondering what process GEKKO follows to find the solution (does it project the values onto the bounds, or how does it deal with this?). Any comment is very appreciated.
Here is an MVCE. If you run it you can see that with GEKKO, I get a solution between the bounds and then, when I solve the ODE with solve_ivp, I don't get the same solution. Can you explain why this happens and how can I deal with it? I want to use solve_ivp to predict the next values.
from scipy.integrate import solve_ivp
from gekko import GEKKO
import matplotlib.pyplot as plt
time=[0.0, 0.11784511784511785, 0.18855218855218855,\
0.2356902356902357]
m = GEKKO(remote=False)
m.time= [0.0, 0.11784511784511785, 0.18855218855218855,\
0.2356902356902357]
x_data= [0.0003777630481280617, 0.002024573836061331,\
0.0008954383363035536, 0.005331749410182463]
x = m.CV(value=x_data, lb=0); x.FSTATUS = 1 # fit to measurement
x.SPLO = 0
sigma = m.FV(value=0.5, lb= 0, ub=1); sigma.STATUS=1
d = m.Param(0.05)
k = m.Param(0.001)
b = m.Param(0.5)
r = m.FV(value=0.5, lb= 0); r.STATUS=1
m_param = m.Param(1)
u = m.Var(value=0.1, lb=0, ub=1)
m.free(u)
a = m.Param(0.999)
Kmax= m.Param(100000)
m.Equations([x.dt()==x*(r*(1-a*u**2)*(1-x/(Kmax*(1-a*u**2)))-\
m_param/(k+b*u)-d), u.dt() == \
sigma*((-2*a*(b**2)*r*(u**3)+4*a*b*k*r*(u**2)\
+2*a*(k**2)*r*u-b*m_param)/((b*u+k)**2))])
m.options.IMODE = 5 # dynamic estimation
m.options.NODES = 5 # collocation nodes
m.options.EV_TYPE = 1 # linear error (2 for squared)
m.solve(disp=False, debug=False) # solve quietly (disp=True displays solver output)
def model_case_3(t, z, r, k, b, Kmax, sigma):
    m=1
    a=0.999
    x, u= z
    dxdt = x*(r*(1-a*u**2)*(1-x/(Kmax*(1-a*u**2)))-m/(k+b*u)-0.05)
    dudt = sigma*((-2*a*(b**2)*r*(u**3)+4*a*b*k*r*(u**2)\
        +2*a*(k**2)*r*u-b*m)/((b*u+k)**2))
    return [dxdt, dudt]
sol = solve_ivp(fun=model_case_3, t_span=[0.0, 0.2356902356902357],\
y0=[0.0003777630481280617, u.value[0]],\
t_eval=[0.0, 0.11784511784511785, 0.18855218855218855,\
0.2356902356902357], \
args=(r.value[0], 0.001, 0.5,1000000 , sigma.value[0]))
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,3), constrained_layout=True)
ax1.set_title('x')
ax1.plot(time, x.value, time, sol['y'][0])
ax2.set_title('u')
ax2.plot(time, u.value, time, sol['y'][1])
plt.show()
It is not an issue with the version of Gekko, as I have Gekko 0.2.8, so I am wondering if it has anything to do with the initialization of variables. I ran the example I posted in Spyder (I had been using Google Colab) and got the correct solution, but when I ran the rest of the cases I again got negative values for u (solving with solve_ivp), which is quite strange.
You can add a bound to a variable when it is created by setting lb (lower bound) and ub (upper bound).
z = m.Var(lb=0,ub=10)
After you create the variable, the bounds are adjusted with .LOWER and .UPPER.
z.LOWER = 1
z.UPPER = 9
Here is an example problem that shows the use of bounds where x is constrained to be greater than 0.5.
from gekko import GEKKO
t_data = [0, 0.1, 0.2, 0.4, 0.8, 1]
x_data = [2.0, 1.6, 1.2, 0.7, 0.3, 0.15]
m = GEKKO(remote=False)
m.time = t_data
x = m.CV(value=x_data,lb=0.5,ub=3); x.FSTATUS = 1 # fit to measurement
k = m.FV(); k.STATUS = 1 # adjustable parameter
m.Equation(x.dt()== -k * x) # differential equation
m.options.IMODE = 5 # dynamic estimation
m.options.NODES = 5 # collocation nodes
m.solve(disp=False) # solve quietly (disp=True displays solver output)
k = k.value[0]; print(k)
A plot of the results shows that the bounds are enforced but the model prediction does not fit the data because of the lower bound constraint (x>=0.5).
import numpy as np
import matplotlib.pyplot as plt # plot solution
plt.plot(m.time,x.value,'bo',\
label='Predicted (k='+str(np.round(k,2))+')')
plt.plot(m.time,x_data,'rx',label='Measured')
# plot exact solution
t = np.linspace(0,1); xe = 2*np.exp(-k*t)
plt.plot(t,xe,'k:',label='Exact Solution')
plt.legend()
plt.xlabel('Time'), plt.ylabel('Value')
plt.show()
Without the restrictive lower bound, the solver optimizes to best fit the points.
x = m.CV(value=x_data,lb=0.0,ub=3)
Response 1 to Question Edit
The only way that a variable (such as u) can be outside of the bounds is if the solver did not report a successful solution. To report a successful solution, the solver must satisfy the Karush-Kuhn-Tucker conditions for optimality. I recommend that you check that it satisfied all of the equations by checking that m.options.APPSTATUS==1 after the m.solve() command. If you can include an MVCE (https://stackoverflow.com/help/minimal-reproducible-example) that has sample data so the script can run, we can help you check it.
Response 2 to Question Edit
Thanks for including a minimal reproducible example. Here are the results that I get with Gekko 0.2.8. If you are using an earlier version, I recommend that you upgrade with pip install gekko --upgrade.
The solver reports a successful solution.
EXIT: Optimal Solution Found.
The solution was found.
The final value of the objective function is 0.03164650667928192
---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 0.23339999999999997 sec
Objective : 0.0316473666078486
Successful solution
---------------------------------------------------
The constraints x>=0 and 0<=u<=1 are satisfied. Could it just be an issue with an older version of Gekko?

Locally weighted smoothing for binary valued random variable

I have a random variable as follows:
f(x) = 1 with probability g(x)
f(x) = 0 with probability 1-g(x)
where 0 < g(x) < 1.
Assume g(x) = x. Let's say I am observing this variable without knowing the function g and obtained 200 samples as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic
list = np.ndarray(shape=(200,2))
g = np.random.rand(200)
for i in range(len(g)):
    list[i] = (g[i], np.random.choice([0, 1], p=[1-g[i], g[i]]))
print(list)
plt.plot(list[:,0], list[:,1], 'o')
Plot of 0s and 1s
Now, I would like to retrieve the function g from these points. The best I could think of is to draw a histogram and use the mean statistic:
bin_means, bin_edges, bin_number = binned_statistic(list[:,0], list[:,1], statistic='mean', bins=10)
plt.hlines(bin_means, bin_edges[:-1], bin_edges[1:], lw=2)
Histogram mean statistics
Instead, I would like to have a continuous estimation of the generating function.
I guess it is about kernel density estimation but I could not find the appropriate pointer.
It's straightforward with seaborn, without explicitly fitting an estimator:
import pandas as pd
import seaborn as sns
df = pd.DataFrame(list, columns=["x", "y"])  # the (x, 0/1 outcome) samples from the question
g = sns.lmplot(x="x", y="y", data=df, y_jitter=.02, logistic=True)
Set x to the column holding your exogenous variable and, analogously, y to the dependent variable. y_jitter jitters the points vertically for better visibility if you have a lot of data points. logistic=True is the main point here: it will give you the logistic regression curve of the data (seaborn uses statsmodels for this, so it must be installed).
Seaborn is built on top of matplotlib and works great with pandas DataFrames.
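If you want the locally weighted estimate the title asks for, rather than a logistic curve, a kernel-weighted average of the 0/1 outcomes estimates g(x) = P(f = 1 | x) directly. This is a sketch of the Nadaraya-Watson estimator with a Gaussian kernel, a technique swapped in here rather than taken from the answer above; kernel_smooth and the bandwidth value are illustrative assumptions, and the estimate is biased near the edges of the data range.

```python
import numpy as np


def kernel_smooth(x_obs, y_obs, x_grid, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] (here P(f = 1 | x)) with a Gaussian kernel."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    x_grid = np.asarray(x_grid, dtype=float)
    # weights[i, j] = exp(-0.5 * ((x_grid[i] - x_obs[j]) / bandwidth) ** 2)
    scaled = (x_grid[:, None] - x_obs[None, :]) / bandwidth
    weights = np.exp(-0.5 * scaled ** 2)
    # locally weighted average of the 0/1 outcomes at each grid point
    return weights @ y_obs / weights.sum(axis=1)
```

With the question's array, kernel_smooth(list[:, 0], list[:, 1], np.linspace(0, 1, 101), 0.1) gives a continuous curve that can be plotted over the points; a smaller bandwidth follows the data more closely but is noisier.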

NumPy vectorization with integration

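I have a vector ws and wish to make another vector of the same length whose k-th component is $\int_{-1}^{1} f(x, w_k)\,\log\left(\sum_{j} f(x, w_j)\right)dx$, with f as defined in the code below.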
The question is: how can we vectorize this for speed? NumPy vectorize() is actually a for loop, so it doesn't count.
Veedrac pointed out that "There is no way to apply a pure Python function to every element of a NumPy array without calling it that many times". Since I'm using NumPy functions rather than "pure Python" ones, I suppose it's possible to vectorize, but I don't know how.
import numpy as np
from scipy.integrate import quad
ws = 2 * np.random.random(10) - 1
n = len(ws)
integrals = np.empty(n)
def f(x, w):
    if w < 0: return np.abs(x * w)
    else: return np.exp(x) * w
def temp(x): return np.array([f(x, w) for w in ws]).sum()
def integrand(x, w): return f(x, w) * np.log(temp(x))
## Python for loop
for k in range(n):
    integrals[k] = quad(integrand, -1, 1, args = ws[k])[0]
## NumPy vectorize
integrals = np.vectorize(quad)(integrand, -1, 1, args = ws)[0]
On a side note, is a Cython for loop always faster than NumPy vectorization?
The function quad executes an adaptive algorithm, which means the computations it performs depend on the specific thing being integrated. This cannot be vectorized in principle.
In your case, a for loop of length 10 is a non-issue. If the program takes long, it's because integration takes long, not because you have a for loop.
When you absolutely need to vectorize integration (not in the example above), use a non-adaptive method, with the understanding that precision may suffer. These can be directly applied to a 2D NumPy array obtained by evaluating all of your functions on some regularly spaced 1D array (a linspace). You'll have to choose the linspace yourself since the methods aren't adaptive.
numpy.trapz (numpy.trapezoid in NumPy 2.0 and later) is the simplest and least precise
scipy.integrate.simps (now scipy.integrate.simpson; simps was removed in SciPy 1.14) is equally easy to use and more precise (Simpson's rule requires an odd number of samples, but the method works around having an even number, too).
scipy.integrate.romb is in principle of higher accuracy than Simpson (for smooth data) but it requires the number of samples to be 2**n+1 for some integer n.
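Applied to the question, the fixed-grid approach evaluates f for every (w, x) pair at once and integrates each row. This is a sketch, assuming a SciPy version that provides scipy.integrate.simpson; f_grid and integrals_fixed_grid are hypothetical names, and the grid size is an arbitrary choice. It assumes at least one w is positive so the sum inside the log stays above zero.

```python
import numpy as np
from scipy.integrate import simpson


def f_grid(x, ws):
    """Evaluate the question's f on every (w, x) pair; result shape (len(ws), len(x))."""
    x = np.asarray(x, dtype=float)[None, :]
    ws = np.asarray(ws, dtype=float)[:, None]
    return np.where(ws < 0, np.abs(x * ws), np.exp(x) * ws)


def integrals_fixed_grid(ws, n_points=2001):
    """Approximate all k integrals at once with Simpson's rule on a fixed grid."""
    x = np.linspace(-1.0, 1.0, n_points)  # odd sample count; x = 0 lies on the grid
    vals = f_grid(x, ws)                   # row k holds f(x, w_k)
    log_temp = np.log(vals.sum(axis=0))    # log(sum_j f(x, w_j)), shape (n_points,)
    return simpson(vals * log_temp, x=x, axis=1)
```

Because x = 0 (where |x*w| has a kink) falls on a panel boundary here, the result agrees closely with the quad loop, but in general a fixed grid gives no error estimate.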
@zaq's answer focusing on quad is spot on, so I'll look at some other aspects of the problem.
In a recent answer, https://stackoverflow.com/a/41205930/901925, I argue that vectorize is of most value when you need to apply the full broadcasting mechanism to a function that only takes scalar values. Your quad qualifies as taking scalar inputs. But you are only iterating on one array, ws. The x that is passed on to your functions is generated by quad itself. quad and integrand are still Python functions, even if they use numpy operations.
Cython improves low-level iteration, stuff that it can convert to C code. Your primary iteration is at a high level, calling an imported function, quad. Cython can't touch or rewrite that.
You might be able to speed up integrand (and the functions it calls) with Cython, but first focus on getting the most speed from it with regular numpy code.
def f(x, w):
    if w < 0: return np.abs(x * w)
    else: return np.exp(x) * w
Because of the if w < 0 test, w must be a scalar. Can f be written so it works with an array w? If so, then
np.array([f(x, w) for w in ws]).sum()
could be rewritten as
fn(x, ws).sum()
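One possible fn, sketched under the assumption that x stays a scalar (as quad supplies) while ws is an array: np.where replaces the scalar-only if test. The names fn and temp here are illustrative; temp takes ws explicitly instead of reading the global used in the question.

```python
import numpy as np


def fn(x, ws):
    """Hypothetical array version of the question's f(x, w).

    np.where picks |x*w| for negative weights and exp(x)*w otherwise.
    """
    ws = np.asarray(ws, dtype=float)
    return np.where(ws < 0, np.abs(x * ws), np.exp(x) * ws)


def temp(x, ws):
    # replaces np.array([f(x, w) for w in ws]).sum() with one vectorised call
    return fn(x, ws).sum()
```

np.where evaluates both branches for every element, which is harmless here because neither branch can fail.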
Alternatively, since both x and w are scalar, you might get a bit of speed improvement by using math.exp etc instead of np.exp. Same for log and abs.
I'd try to write f(x,w) so it takes arrays for both x and w, returning a 2d result. If so, then temp and integrand would also work with arrays. Since quad feeds a scalar x, that may not help here, but with other integrators it could make a big difference.
If f(x,w) can be evaluated on a regular nx10 grid of x=np.linspace(-1,1,n) and ws, then an integral (of sorts) just requires a couple of summations over that space.
You can use quadpy for fully vectorized computation. You'll have to adapt your function to allow for vector inputs first, but that is done rather easily:
import numpy as np
import quadpy
np.random.seed(0)
ws = 2 * np.random.random(10) - 1
def f(x):
    out = np.empty((len(ws), *x.shape))
    out0 = np.abs(np.multiply.outer(ws, x))
    out1 = np.multiply.outer(ws, np.exp(x))
    out[ws < 0] = out0[ws < 0]
    out[ws >= 0] = out1[ws >= 0]
    return out
def integrand(x):
    return f(x) * np.log(np.sum(f(x), axis=0))
val, err = quadpy.quad(integrand, -1, +1, epsabs=1.0e-10)
print(val)
[0.3266534 1.44001826 0.68767868 0.30035222 0.18011948 0.97630376
0.14724906 2.62169217 3.10276876 0.27499376]

Numpy - AttributeError: 'Zero' object has no attribute 'exp'

I'm having trouble resolving a discrepancy: something breaks at runtime, but running the exact same data and operations in the Python console works fine.
# f_err - currently has value 1.11819388872025
# l_scales - currently a numpy array [1.17840183376334 1.13456764589809]
sq_euc_dists = self.se_term(x1, x2, l_scales) # this is fine. It calls cdists on x1/l_scales, x2/l_scales vectors
return (f_err**2) * np.exp(-0.5 * sq_euc_dists) # <-- errors on this line
The error that I get is
AttributeError: 'Zero' object has no attribute 'exp'
However, calling those exact same lines, with the same f_err, l_scales, and x1, x2 in the console right after it errors out, somehow does not produce errors.
I was not able to find a post referring to the 'Zero' object error specifically, and the non-'Zero' ones I found didn't seem to apply to my case here.
EDIT: The question was a bit lacking in info, so here's an actual (extracted) runnable example with sample data taken straight from a failed run. When run in isolation it works fine; I can't reproduce the error except at runtime.
Note that the sqeucl_dist function below is quite bad and I should be using scipy's cdist instead. However, because I'm using sympy's symbols for matrix elementwise gradients with over 15 partial derivatives in my real data, cdist is not an option, as it doesn't deal with arbitrary objects.
import numpy as np
def se_term(x1, x2, l):
    return sqeucl_dist(x1/l, x2/l)
def sqeucl_dist(x, xs):
    return np.sum([(i-j)**2 for i in x for j in xs], axis=1).reshape(x.shape[0], xs.shape[0])
x = np.array([[-0.29932052, 0.40997373], [0.40203481, 2.19895326], [-0.37679417, -1.11028267], [-2.53012051, 1.09819485], [0.59390005, 0.9735], [0.78276777, -1.18787904], [-0.9300892, 1.18802775], [0.44852545, -1.57954101], [1.33285028, -0.58594779], [0.7401607, 2.69842268], [-2.04258086, 0.43581565], [0.17353396, -1.34430191], [0.97214259, -1.29342284], [-0.11103534, -0.15112815], [0.41541759, -1.51803154], [-0.59852383, 0.78442389], [2.01323359, -0.85283772], [-0.14074266, -0.63457529], [-0.49504797, -1.06690869], [-0.18028754, -0.70835799], [-1.3794126, 0.20592016], [-0.49685373, -1.46109525], [-1.41276934, -0.66472598], [-1.44173868, 0.42678815], [0.64623684, 1.19927771], [-0.5945761, -0.10417961]])
f_err = 1.11466725760716
l = [1.18388412685279, 1.02290811104357]
result = (f_err**2) * np.exp(-0.5 * se_term(x, x, l)) # This runs fine, but fails with the exact same calls and data during runtime
Any help greatly appreciated!
Here is how to reproduce the error you are seeing:
import sympy
import numpy
zero = sympy.sympify('0')
numpy.exp(zero)
You will see the same exception you are seeing.
You can fix this (inefficiently) by changing your code to the following to make things floating point.
def sqeucl_dist(x, xs):
    return np.sum([np.vectorize(float)(i-j)**2 for i in x for j in xs],
                  axis=1).reshape(x.shape[0], xs.shape[0])
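A compact way to see both the failure and a conversion fix: a sketch that builds a small object array of SymPy numbers (standing in for what the sympy-based distance code produces) and converts it with ndarray.astype, which calls float() on each element, much like np.vectorize(float) above. The exact exception type depends on the NumPy version, so both are caught.

```python
import numpy as np
import sympy

# object array of SymPy numbers, similar to what the sympy-based distances produce
dists = np.array([sympy.Integer(0), sympy.Rational(1, 2)], dtype=object)

try:
    np.exp(dists)  # object dtype: NumPy looks for an .exp() method on each element
    raised = False
except (AttributeError, TypeError):  # older NumPy raises AttributeError, newer TypeError
    raised = True

as_float = dists.astype(np.float64)  # calls float() on every SymPy number
result = np.exp(-0.5 * as_float)     # ordinary float64 ufunc call, no error
```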
It will be better to fix your gradient function using lambdify.
Here's an example of how lambdify can be used on partial derivatives:
from sympy.abc import x, y, z
expression = x**2 + sympy.sin(y) + z
derivatives = [expression.diff(var, 1) for var in [x, y, z]]
derivatives is now [2*x, cos(y), 1], a list of Sympy expressions. To create a function which will evaluate this numerically at a particular set of values, we use lambdify as follows (passing 'numpy' as an argument like that means to use numpy.cos rather than sympy.cos):
derivative_calc = sympy.lambdify((x, y, z), derivatives, 'numpy')
Now derivative_calc(1, 2, 3) will return [2, -0.41614683654714241, 1]. These are ints and numpy.float64s.
A side note: np.exp(M) calculates the element-wise exponential of each of the elements of M. If you are trying to compute a matrix exponential, you need scipy.linalg.expm (NumPy itself has no matrix-exponential function).
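The difference is easy to see on a nilpotent matrix, where the power series for the matrix exponential stops after two terms. A small sketch using scipy.linalg.expm:

```python
import numpy as np
from scipy.linalg import expm

M = np.array([[0.0, 1.0],
              [0.0, 0.0]])  # nilpotent: M @ M is the zero matrix

elementwise = np.exp(M)  # exp applied to each entry: [[1, e], [1, 1]]
matrix_exp = expm(M)     # I + M + M @ M / 2 + ... = I + M = [[1, 1], [0, 1]]
```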