numpy vectorization instead of loop - numpy

I have the following equation:
result = (v - μ)ᵀ Σ⁻¹ (v - μ)
where v, μ ∈ ℝ^3, Σ ∈ ℝ^(3×3), and the result is a scalar value. Implementing this in NumPy is no problem:
result = np.transpose(v - mu) @ Sigma_inv @ (v - mu)
Now I have a bunch of v-vectors (let's call them V ∈ ℝ^(3×n)), and I would like to evaluate the above equation in a vectorized manner so that, as a result, I get a new vector Result ∈ ℝ^(1×n).
# loop version (what I want to vectorize)
Result = np.zeros((n, 1))
for i, v in enumerate(V.T):  # iterate over the columns of V
    Result[i, :] = np.transpose(v - mu) @ Sigma_inv @ (v - mu)
I looked at np.vectorize, but the documentation says it is essentially a loop over all entries, which I would prefer to avoid. What would be an elegant vectorized solution?
As a side note: n might be quite large, and an ℝ^(n×n) matrix will certainly not fit into my memory!
Edit: working code sample:
import numpy as np
S = np.array([[1, 2], [3, 4]])
V = np.array([[10, 11, 12, 13, 14, 15], [20, 21, 22, 23, 24, 25]])
Res = np.zeros((V.shape[1], 1))
for i in range(V.shape[1]):
    v = np.transpose(np.atleast_2d(V[:, i]))  # column vector of shape (2, 1)
    Res[i, :] = (np.transpose(v) @ S @ v)[0][0]
print(Res)

Using a combination of matrix-multiplication and np.einsum -
np.einsum('ij,ij->j',V,S.dot(V))
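The einsum line above covers the example from the edit (mu = 0, with S in place of Sigma_inv). A minimal sketch of the general case from the question, assuming V stores one vector per column (shape (d, n)), mu has shape (d,), and Sigma_inv is already inverted; the helper name quadratic_forms is hypothetical:

```python
import numpy as np


def quadratic_forms(V, mu, Sigma_inv):
    """Return (v - mu)^T Sigma_inv (v - mu) for every column v of V.

    V: (d, n) column vectors, mu: (d,), Sigma_inv: (d, d).
    Only (d, n)-sized temporaries are created, never an (n, n) matrix.
    """
    D = V - mu[:, np.newaxis]                       # (d, n): centered columns
    return np.einsum('ij,ij->j', D, Sigma_inv @ D)  # (n,): column-wise dot products


# The data from the question's edit (mu = 0, S plays the role of Sigma_inv).
S = np.array([[1, 2], [3, 4]])
V = np.array([[10, 11, 12, 13, 14, 15], [20, 21, 22, 23, 24, 25]])
res = quadratic_forms(V, np.zeros(2), S)
print(res.reshape(1, -1))  # a (1, n) row, as asked for in the question

# Random 3-D check against the explicit loop.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 500))
mu = rng.standard_normal(3)
Sigma_inv = np.linalg.inv(np.cov(W))  # any (3, 3) matrix would do for the check
loop = np.array([(w - mu) @ Sigma_inv @ (w - mu) for w in W.T])
print(np.allclose(quadratic_forms(W, mu, Sigma_inv), loop))  # True
```

Memory stays proportional to d·n, so this scales to large n without ever forming the n×n product.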

Does this work for you?
res = np.diag(V.T @ S @ V).reshape(-1, 1)
It seems to give the result you want:
import numpy as np
S = np.array([[1, 2], [3, 4]])
V = np.array([[10, 11, 12, 13, 14, 15], [20, 21, 22, 23, 24, 25]])
# loop version from the question
Res = np.zeros((V.shape[1], 1))
for i in range(V.shape[1]):
    v = np.transpose(np.atleast_2d(V[:, i]))
    Res[i, :] = (np.transpose(v) @ S @ v)[0][0]
# vectorized version: keep only the diagonal of V^T S V
res = np.diag(V.T @ S @ V).reshape(-1, 1)
print(np.all(np.isclose(Res, res)))
# output: True
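Note, though, that V.T @ S @ V builds the full n×n matrix before np.diag keeps only its diagonal, so this needs O(n²) memory; a solution using np.einsum is probably more memory-efficient.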

Here is a simple solution:
import numpy as np
S = np.array([[1, 2], [3,4]])
V = np.array([[10, 11, 12, 13, 14, 15],[20, 21, 22, 23, 24, 25]])
Res = np.sum((V.T @ S) * V.T, axis=1)  # row i is v_i^T S, multiplied elementwise by v_i^T and summed

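These are multiplications of stacks of matrices and vectors. numpy.matmul can handle them once S and V are brought into the right shapes:
import numpy as np
S = np.array([[1, 2], [3, 4]])
V = np.array([[10, 11, 12, 13, 14, 15], [20, 21, 22, 23, 24, 25]])
S3 = S[np.newaxis, :, :]      # (1, 2, 2): a stack holding one matrix
VT = V.T[:, np.newaxis, :]    # (n, 1, 2): stack of row vectors
Vcol = VT.transpose(0, 2, 1)  # (n, 2, 1): stack of column vectors
tmp = np.matmul(S3, Vcol)     # (n, 2, 1)
Res = np.matmul(VT, tmp)      # (n, 1, 1)
print(Res)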
#[[[2700]]
# [[3040]]
# [[3400]]
# [[3780]]
# [[4180]]
# [[4600]]]
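All four answers compute the same numbers but return different shapes, which matters if you want the (1, n) row from the question. A quick side-by-side on the question's data, written with the @ operator; since np.matmul broadcasts a single 2-D S over a stack, the explicit np.newaxis on S is optional:

```python
import numpy as np

S = np.array([[1, 2], [3, 4]])
V = np.array([[10, 11, 12, 13, 14, 15], [20, 21, 22, 23, 24, 25]])

r_einsum = np.einsum('ij,ij->j', V, S.dot(V))  # (n,)
r_sum = np.sum((V.T @ S) * V.T, axis=1)        # (n,)
r_diag = np.diag(V.T @ S @ V).reshape(-1, 1)   # (n, 1), but builds an (n, n) matrix first
VT = V.T[:, np.newaxis, :]                     # (n, 1, 2) stack of row vectors
r_matmul = VT @ S @ VT.transpose(0, 2, 1)      # (n, 1, 1); S is broadcast over the stack

for name, r in [('einsum', r_einsum), ('sum', r_sum), ('diag', r_diag), ('matmul', r_matmul)]:
    # reshape(1, -1) turns any of them into the (1, n) row from the question
    print(name, r.shape, r.reshape(1, -1))
```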

Related

How do I input a Time Series in spmvg nfoursid

I want to use this algorithm for n4sid model estimation. However, the documentation builds its input DataFrame from random samples, whereas I want to input a time-series DataFrame. Calling NFourSID leads to a TypeError or a ValueError.
Documentation:
https://github.com/spmvg/nfoursid/blob/master/examples/Overview.ipynb
Imported libs:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from nfoursid.kalman import Kalman
from nfoursid.nfoursid import NFourSID
from nfoursid.state_space import StateSpace
import time
import datetime
import math
import scipy as sp
My input Time Series as Data Frame (flawless):
import yfinance as yfin
from pandas_datareader import data as pdr
yfin.pdr_override()
spy = pdr.get_data_yahoo('AAPL', start='2022-08-23', end='2022-10-24')
spy['Log Return'] = np.log(spy['Adj Close'] / spy['Adj Close'].shift(1))
AAPL = pd.DataFrame(spy['Log Return'])
The input DataFrame as proposed in the documentation:
state_space = StateSpace(A, B, C, D)
for _ in range(NUM_TRAINING_DATAPOINTS):
    input_state = np.random.standard_normal((INPUT_DIM, 1))
    noise = np.random.standard_normal((OUTPUT_DIM, 1)) * NOISE_AMPLITUDE
    state_space.step(input_state, noise)
The call using the input proposed in the documentation:
#---->libs already imported
pd.set_option('display.max_columns', None)
np.random.seed(0) # reproducible results
NUM_TRAINING_DATAPOINTS = 1000
# create a training-set by simulating a state-space model with this many datapoints
NUM_TEST_DATAPOINTS = 20 # same for the test-set
INPUT_DIM = 3 #---->this probably needs to be adapted to the AAPL dimensions
OUTPUT_DIM = 2
INTERNAL_STATE_DIM = 4 # actual order of the state-space model in the training- and test-set
NOISE_AMPLITUDE = .1 # add noise to the training- and test-set
FIGSIZE = 8
# define system matrices for the state-space model of the training- and test-set
A = np.array([
[1, .01, 0, 0],
[0, 1, .01, 0],
[0, 0, 1, .02],
[0, -.01, 0, 1],
]) / 1.01
B = np.array([
[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[0, 1, 1],
]
) / 3
C = np.array([
[1, 0, 1, 1],
[0, 0, 1, -1],
])
D = np.array([
[1, 0, 1],
[0, 1, 0]
]) / 10
#---->maybe I have to input the DataFrame already here at the state-space model:
state_space = StateSpace(A, B, C, D)
for _ in range(NUM_TRAINING_DATAPOINTS):
    input_state = np.random.standard_normal((INPUT_DIM, 1))
    noise = np.random.standard_normal((OUTPUT_DIM, 1)) * NOISE_AMPLITUDE
    state_space.step(input_state, noise)
#----
#---->This is the method with the input DF, in this case the random state-space model
nfoursid = NFourSID(
state_space.to_dataframe(), # the state-space model can summarize inputs and outputs as a dataframe
output_columns=state_space.y_column_names,
input_columns=state_space.u_column_names,
num_block_rows=10
)
nfoursid.subspace_identification()
Passing my DataFrame directly to NFourSID leads to an error:
df2 = pd.DataFrame()
nfoursid = NFourSID(
output_columns=df2,
input_columns=AAPL,
num_block_rows=10
)
TypeError: NFourSID.__init__() missing 1 required positional argument: 'dataframe'
Passing the DataFrame to the state-space model led to:
ValueError: Dimensions of u (43, 1) are inconsistent. Expected (3, 1).
and
TypeError: 'DataFrame' object is not callable

Numpy.polyfit Not Returning Polynomial

I am trying to create a Python program in which the user inputs a set of data and the program outputs a graph with the line/polynomial that best fits the data.
This is the code:
from matplotlib import pyplot as plt
import numpy as np
x = []
y = []
x_num = 0
while True:
    sequence = int(input("Input 1 number in the sequence, type 9040321 to stop"))
    if sequence == 9040321:
        poly = np.polyfit(x, y, deg=2, rcond=None, full=False, w=None, cov=False)
        plt.plot(poly)
        plt.scatter(x, y, c="blue", label="data")
        plt.legend()
        plt.show()
        break
    else:
        y.append(sequence)
        x.append(x_num)
        x_num += 1
I tested it by entering 1, 2, 4, 8, each as a separate input. Matplotlib plotted the points properly; however, for degree 2, the output was the following image:
This is clearly not correct, but I am unsure what the problem is. I think it has something to do with the degree; however, when I change the degree to 3, it still does not fit. I am looking for a curve like y=sqrt(x) that goes through each of the points and, when that is not possible, the line that fits best.
Edit: I added print(poly), and for the input above it gives [0.75 0.05 1.05]. I do not know what to make of this.
Approximation by a second-degree polynomial
np.polyfit gives the coefficients of a polynomial close to the given points. To plot the polynomial as a smooth curve with matplotlib, you need to calculate a lot of x,y pairs. Using np.linspace(start, stop, numsteps) for the xs, NumPy's vectorization allows calculating all the corresponding ys in one go, e.g. ys = a * xs**2 + b * xs + c.
from matplotlib import pyplot as plt
import numpy as np
x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 32, 64]
plt.scatter(x, y, color='crimson', label='given points')
poly = np.polyfit(x, y, deg=2, rcond=None, full=False, w=None, cov=False)
xs = np.linspace(min(x), max(x), 100)
ys = poly[0] * xs ** 2 + poly[1] * xs + poly[2]
plt.plot(xs, ys, color='dodgerblue', label=f'$({poly[0]:.2f})x^2+({poly[1]:.2f})x + ({poly[2]:.2f})$')
plt.legend()
plt.show()
Higher-degree approximating polynomials
Given N points with distinct x values, a polynomial of degree N-1 can pass exactly through each of them. Here is an example with 7 points and polynomials of up to degree 6:
from matplotlib import pyplot as plt
import numpy as np
x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 32, 64]
plt.scatter(x, y, color='black', zorder=3, label='given points')
for degree in range(0, len(x)):
    poly = np.polyfit(x, y, deg=degree, rcond=None, full=False, w=None, cov=False)
    xs = np.linspace(min(x) - 0.5, max(x) + 0.5, 100)
    # poly is highest power first, so reverse it to pair each coefficient with xs**i
    ys = sum(poly_i * xs**i for i, poly_i in enumerate(poly[::-1]))
    plt.plot(xs, ys, label=f'degree {degree}')
plt.legend()
plt.show()
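The claim about N points can be checked numerically: with the 7 points above, a degree-6 fit reproduces every y value up to rounding, while a degree-5 fit cannot. A small sketch, using np.polyval to evaluate the coefficients at the given x values:

```python
import numpy as np

x = np.arange(7)
y = 2.0 ** x  # 1, 2, 4, ..., 64, the same points as above

max_error = {}
for degree in (5, 6):
    coeffs = np.polyfit(x, y, deg=degree)
    # np.polyval expects the highest-power coefficient first, as polyfit returns them
    max_error[degree] = np.max(np.abs(np.polyval(coeffs, x) - y))
    print(f'degree {degree}: largest miss {max_error[degree]:.2e}')
```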
Another example, running the code above with these points instead:
x = [0, 1, 2, 3, 4]
y = [1, 1, 6, 5, 5]
import numpy as np
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 2, 4, 8]
coeffs = np.polyfit(x, y, 2)
print(coeffs)
poly = np.poly1d(coeffs)
print(poly)
x_cont = np.linspace(0, 4, 81)
y_cont = poly(x_cont)
plt.scatter(x, y)
plt.plot(x_cont, y_cont)
plt.grid(1)
plt.show()
Executing the code plots the points with the fitted parabola and prints this in the terminal:
[ 0.75 -1.45  1.75]
      2
0.75 x - 1.45 x + 1.75
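It seems to me that you had the wrong expectations about the output of polyfit: it returns the coefficients of the polynomial, highest power first, not points on the curve. To draw the curve, evaluate the polynomial (here with np.poly1d) on a dense grid of x values.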
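Applied to the question's own input, the fix is to evaluate the returned coefficients rather than plot them. A minimal sketch without the interactive input loop (plotting calls left as a comment); np.polyval is used here as a shorthand for writing out a*x**2 + b*x + c:

```python
import numpy as np

# The question's input: 1, 2, 4, 8 entered at x = 0, 1, 2, 3
x = np.arange(4)
y = np.array([1, 2, 4, 8])

coeffs = np.polyfit(x, y, deg=2)  # [a, b, c] for a*x**2 + b*x + c, roughly [0.75 0.05 1.05]

# plt.plot(coeffs) draws these three numbers against the indices 0, 1, 2,
# which is the unexpected line in the question. Evaluate the polynomial instead:
xs = np.linspace(x.min(), x.max(), 100)
ys = np.polyval(coeffs, xs)  # same as coeffs[0]*xs**2 + coeffs[1]*xs + coeffs[2]
print(np.polyval(coeffs, x))  # fitted values at the input points
# With matplotlib: plt.scatter(x, y); plt.plot(xs, ys); plt.show()
```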

Efficiently compute product of all other elements in Numpy

Let A be a 2D matrix. How can I compute a matrix B, such that each element of B is the product of all other entries in the same row of A?
Example:
A = np.array([[5, 0, 6],   # the input
              [3, 1, 9],
              [2, 0, 0]])
B = np.array([[0, 30, 0],  # the result
              [9, 27, 3],
              [0, 0, 0]])
The naïve strategy (B = np.prod(A, axis=-1, keepdims=True) / A) runs into division-by-zero errors, and unfortunately these zeros are important elsewhere in the program and cannot trivially be replaced with tiny epsilons.
I've tried using np.where to address the three cases (rows without zeros, rows with one zero, rows with multiple zeros), but although that prevents NaNs in the output, it still requires computing everything up front before letting np.where pick and choose element-wise, which seems like a lot of code and unnecessary computational effort (and still produces div-by-zero warnings in the process).
What is the smartest, fastest way of solving this problem?
I found this answer and, inspired by it, came up with the following efficient-ish solution:
import numpy as np

def products_of_others(a, axes=None):
    # for every element, the product of all other elements along `axes`, without division
    if axes is None:
        axes = tuple(range(a.ndim))
    if isinstance(axes, int):
        axes = (axes,)
    axes = tuple(ax % a.ndim for ax in axes)  # accept negative axes such as -1
    # flatten the desired axes into one last dimension
    original_shape = a.shape
    other_axes = tuple([ax for ax in range(a.ndim) if ax not in axes])
    new_ax_order = other_axes + axes
    old_ax_order = np.argsort(new_ax_order)
    a = np.transpose(a, new_ax_order)
    a = np.reshape(a, [original_shape[ax] for ax in other_axes] + [np.prod([original_shape[ax] for ax in axes])])
    # exclusive products: everything after each element, and everything before it
    after = np.concatenate([a[..., 1:], np.ones_like(a[..., 0:1])], axis=-1)
    before = np.concatenate([np.ones_like(a[..., 0:1]), a[..., :-1]], axis=-1)
    after_prod = np.cumprod(after[..., ::-1], axis=-1)[..., ::-1]
    before_prod = np.cumprod(before, axis=-1)
    # undo the flattening
    out = np.reshape(after_prod * before_prod, [original_shape[ax] for ax in other_axes] + [original_shape[ax] for ax in axes])
    out = np.transpose(out, old_ax_order)
    return out
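Without the axis bookkeeping, the core idea is two exclusive cumulative products: the product of everything before each entry times the product of everything after it, so no division (and no special handling of zeros) is needed. A minimal row-wise sketch for the 2-D case in the question; the name row_products_of_others is hypothetical:

```python
import numpy as np


def row_products_of_others(a):
    """B[r, k] = product of a[r, m] over all m != k, computed without division."""
    ones = np.ones_like(a[:, :1])
    # before[:, k] = a[:, 0] * ... * a[:, k-1]   (1 for the first column)
    before = np.cumprod(np.concatenate([ones, a[:, :-1]], axis=1), axis=1)
    # after[:, k] = a[:, k+1] * ... * a[:, -1]   (1 for the last column)
    after = np.cumprod(np.concatenate([a[:, 1:], ones], axis=1)[:, ::-1], axis=1)[:, ::-1]
    return before * after


A = np.array([[5, 0, 6],
              [3, 1, 9],
              [2, 0, 0]])
print(row_products_of_others(A))
# [[ 0 30  0]
#  [ 9 27  3]
#  [ 0  0  0]]
```

Each row costs two cumulative products over its length, so the work is linear in the number of elements and no warnings are raised for zeros.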

extracting diagonals (sideway down) from 5d matrices using einsum

I only managed to extract one diagonal using Numpy einsum. How do I get the other diagonals like [6, 37, 68, 99] with help of einsum?
import numpy as np
x = np.arange(1, 26).reshape(5, 5)
y = np.arange(26, 51).reshape(5, 5)
z = np.arange(51, 76).reshape(5, 5)
t = np.arange(76, 101).reshape(5, 5)
p = np.arange(101, 126).reshape(5, 5)
a4 = np.array([x, y, z, t, p])
Extracting one diagonal:
>>> np.einsum('iii->i', a4)
array([  1,  32,  63,  94, 125])
I don't have any "easy" solution using einsum but it is quite simple with a for loop:
import numpy as np
# Generation of a 3x3x3 matrix
x = np.arange(1 , 10).reshape(3,3)
y = np.arange(11, 20).reshape(3,3)
z = np.arange(21, 30).reshape(3,3)
M = np.array([x, y, z])
# Generation of the index
I = np.arange(0,len(M))
# Generation of all the possible diagonals
for ii in [1, -1]:
    for jj in [1, -1]:
        print(M[I[::ii], I[::jj], I])
# OUTPUT:
# [ 1 15 29]
# [ 7 15 23]
# [21 15  9]
# [27 15  3]
We keep the index of the last dimension running forward and try every combination of forward and backward indexing for the other two dimensions.
Do you realize that this einsum is the same as:
In [64]: a4=np.arange(1,126).reshape(5,5,5)
In [65]: i=np.arange(5)
In [66]: a4[i,i,i]
Out[66]: array([ 1, 32, 63, 94, 125])
It should be easy to tweak the indices to get other diagonals.
In [73]: a4[np.arange(4),np.arange(1,5),np.arange(4)]
Out[73]: array([ 6, 37, 68, 99])
That 'iii->i' producing the main diagonal is more of a happy accident than a designed feature. Don't try to push it.
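Generalizing the indexing above: every diagonal of this kind is a[k + o0, k + o1, k + o2] for a fixed set of start offsets. A small sketch with a hypothetical helper offset_diagonal (non-negative offsets only), plus the same [6, 37, 68, 99] obtained from two nested np.diagonal calls:

```python
import numpy as np


def offset_diagonal(a, offsets):
    """Entries a[k + o0, k + o1, k + o2] for k = 0, 1, ... while all indices are in bounds.

    Hypothetical helper: `offsets` holds one non-negative start index per axis.
    """
    n = min(size - off for size, off in zip(a.shape, offsets))
    k = np.arange(n)
    return a[tuple(k + off for off in offsets)]  # advanced indexing, one index array per axis


a4 = np.arange(1, 126).reshape(5, 5, 5)
print(offset_diagonal(a4, (0, 0, 0)))  # main diagonal, same as np.einsum('iii->i', a4)
print(offset_diagonal(a4, (0, 1, 0)))  # [ 6 37 68 99]

# np.diagonal reaches the same entries: the offset-1 diagonal over axes (0, 1)
# gives r[m, k] == a4[k, k + 1, m]; the main diagonal of r then keeps m == k.
r = np.diagonal(a4, offset=1, axis1=0, axis2=1)
print(np.diagonal(r))  # [ 6 37 68 99]
```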

How to append a element to mxnet NDArray?

In numpy, one can append an element to an array by using np.append().
But although NumPy and MXNet arrays are supposed to be similar, there is no append() function in the NDArray class.
Update (18/04/24):
Thanks, Thom. What I was actually trying to achieve is this, in NumPy:
import numpy as np
np_a1 = np.empty((0,3), int)
np_a1 = np.append(np_a1, np.array([[1,2,3],[4,5,6]]), axis=0)
np_a1 = np.append(np_a1, np.array([[7,8,9]]), axis=0)
print("\nnp_a1:\n", np_a1)
print(np_a1.shape)
Thanks to your answer, I did this:
import mxnet as mx
nd_a1 = mx.nd.array([[0, 0, 0]])
# nd_a1 = mx.nd.empty((0,3))
nd_a1 = mx.nd.concat(nd_a1, mx.nd.array([[1,2,3],[4,5,6]]), dim=0)
nd_a1 = mx.nd.concat(nd_a1, mx.nd.array([[7, 8, 9]]), dim=0)
print("\nnd_a1", nd_a1)
print(nd_a1.shape)
But I can't figure out how to start from an empty nd array.
Starting from :
nd_a1 = mx.nd.empty((0,3))
does not work
You can use mx.nd.concat to achieve this. Taking the example from the NumPy docs, you need to be careful with dimensions before concatenating. MXNet works well with data in batches (the first dimension is often the batch dimension), as this is useful when training or using neural networks, but it makes the example below look more complicated than it would be in practice.
import numpy as np
import mxnet as mx
a = np.array([1, 2, 3])
b = np.array([[4, 5, 6], [7, 8, 9]])
out = np.append(a, b)
print(out)
a = mx.nd.array([1, 2, 3])
b = mx.nd.array([[4, 5, 6], [7, 8, 9]])
a = a.expand_dims(0)
out = mx.nd.concat(a, b, dim=0)
out = out.reshape(shape=(-1,))
print(out)
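Another way around the empty starting array is not to start from an array at all: collect the blocks in a Python list and concatenate once at the end, which also avoids copying the growing array on every append. Sketched in NumPy below; the assumption (not run here) is that the same pattern carries over to MXNet by passing the list to mx.nd.concat(*pieces, dim=0).

```python
import numpy as np

pieces = []  # plain Python list: no empty (0, 3) array needed to start
pieces.append(np.array([[1, 2, 3], [4, 5, 6]]))
pieces.append(np.array([[7, 8, 9]]))

# Concatenate once, falling back to an explicit empty array if nothing was collected.
out = np.concatenate(pieces, axis=0) if pieces else np.empty((0, 3), dtype=int)
print(out)
print(out.shape)  # (3, 3)
```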