How do I input a Time Series in spmvg nfoursid - pandas

I want to use this algorithm for N4SID model estimation. However, the documentation builds its input DataFrame from random samples, whereas I want to pass in a time series DataFrame. Calling the NFourSID constructor with my data leads to a TypeError or a ValueError.
Documentation:
https://github.com/spmvg/nfoursid/blob/master/examples/Overview.ipynb
Imported libs:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from nfoursid.kalman import Kalman
from nfoursid.nfoursid import NFourSID
from nfoursid.state_space import StateSpace
import time
import datetime
import math
import scipy as sp
My input Time Series as Data Frame (flawless):
import yfinance as yfin
from pandas_datareader import data as pdr
yfin.pdr_override()
spy = pdr.get_data_yahoo('AAPL',start='2022-08-23',end='2022-10-24')
spy['Log Return'] = np.log(spy['Adj Close']/spy['Adj Close'].shift(1))
AAPL=pd.DataFrame((spy['Log Return']))
The input DataFrame as proposed in the documentation:
state_space = StateSpace(A, B, C, D)
for _ in range(NUM_TRAINING_DATAPOINTS):
    input_state = np.random.standard_normal((INPUT_DIM, 1))
    noise = np.random.standard_normal((OUTPUT_DIM, 1)) * NOISE_AMPLITUDE
    state_space.step(input_state, noise)
The call using the input proposed in the documentation:
#---->libs already imported
pd.set_option('display.max_columns', None)
np.random.seed(0) # reproducible results
NUM_TRAINING_DATAPOINTS = 1000
# create a training-set by simulating a state-space model with this many datapoints
NUM_TEST_DATAPOINTS = 20 # same for the test-set
INPUT_DIM = 3 #---->this probably needs to adapted to the AAPL dimensions
OUTPUT_DIM = 2
INTERNAL_STATE_DIM = 4 # actual order of the state-space model in the training- and test-set
NOISE_AMPLITUDE = .1 # add noise to the training- and test-set
FIGSIZE = 8
# define system matrices for the state-space model of the training- and test-set
A = np.array([
[1, .01, 0, 0],
[0, 1, .01, 0],
[0, 0, 1, .02],
[0, -.01, 0, 1],
]) / 1.01
B = np.array([
[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[0, 1, 1],
]
) / 3
C = np.array([
[1, 0, 1, 1],
[0, 0, 1, -1],
])
D = np.array([
    [1, 0, 1],
    [0, 1, 0]
]) / 10
#---->maybe I have to input the DataFrame already here at the state-space model:
state_space = StateSpace(A, B, C, D)
for _ in range(NUM_TRAINING_DATAPOINTS):
    input_state = np.random.standard_normal((INPUT_DIM, 1))
    noise = np.random.standard_normal((OUTPUT_DIM, 1)) * NOISE_AMPLITUDE
    state_space.step(input_state, noise)
#----
#---->This is the method with the input DF, in this case the random state-space model
nfoursid = NFourSID(
    state_space.to_dataframe(),  # the state-space model can summarize inputs and outputs as a dataframe
    output_columns=state_space.y_column_names,
    input_columns=state_space.u_column_names,
    num_block_rows=10
)
nfoursid.subspace_identification()
Pasting my DF at the call of the method nfoursid which leads to an error:
df2 = pd.DataFrame()
nfoursid = NFourSID(
output_columns=df2,
input_columns=AAPL,
num_block_rows=10
)
TypeError: NFourSID.__init__() missing 1 required positional argument: 'dataframe'
Pasting DF in the state_space led to:
ValueError: Dimensions of u (43, 1) are inconsistent. Expected (3, 1).
and
TypeError: 'DataFrame' object is not callable
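Judging from the documented call above, the first positional argument of NFourSID is the DataFrame itself, while output_columns and input_columns take lists of column names rather than DataFrames. That would explain the missing 'dataframe' argument. StateSpace.step, in turn, appears to advance the simulation by a single time step, so it rejects a whole (43, 1) series. The following is a minimal sketch under stated assumptions: synthetic prices stand in for the Yahoo download, the "Market Return" input column and the build_model helper are hypothetical, and the NaN that shift(1) leaves in the first row is dropped before the frame is handed over.

```python
import numpy as np
import pandas as pd

# Synthetic prices stand in for the Yahoo download (assumption, keeps the example offline).
rng = np.random.default_rng(0)
dates = pd.date_range("2022-08-23", periods=44, freq="B")
prices = pd.DataFrame(
    {
        "Adj Close": 150 * np.exp(np.cumsum(rng.normal(0, 0.02, len(dates)))),
        # Hypothetical exogenous series used as the model input.
        "Market Close": 400 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))),
    },
    index=dates,
)

# One column per signal; shift(1) leaves NaN in the first row, so drop it.
data = pd.DataFrame(
    {
        "Log Return": np.log(prices["Adj Close"] / prices["Adj Close"].shift(1)),
        "Market Return": np.log(prices["Market Close"] / prices["Market Close"].shift(1)),
    }
).dropna()


def build_model(frame):
    """Hypothetical helper mirroring the documented call, with column names instead of DataFrames."""
    # Imported lazily so the data preparation above runs without nfoursid installed.
    from nfoursid.nfoursid import NFourSID

    return NFourSID(
        frame,                            # the DataFrame is the first positional argument
        output_columns=["Log Return"],    # list of column names, not a DataFrame
        input_columns=["Market Return"],  # list of column names, not a DataFrame
        num_block_rows=10,
    )
```

With nfoursid installed, build_model(data).subspace_identification() would then follow the same steps as the notebook.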

Related

Python: create (sparse) stacked diagonal block matrix

I need to create a matrix with the form
M=[
[a1, 0, 0],
[0, b1, 0],
[0, 0, c1],
[a2, 0, 0],
[0, b2, 0],
[0, 0, c2],
[a3, 0, 0],
[0, b3, 0],
[0, 0, c3],
...]
where a(i), b(i) and c(i) are [1xp] blocks. The resulting matrix M has the form [3m x 3p]. I am given the input data in the form of 3 matrices [m x p]:
A = [[a1.T, a2.T, a3.T, ...]].T
B = [[b1.T, b2.T, b3.T, ...]].T
C = [[c1.T, c2.T, c3.T, ...]].T
How can I create the matrix M? Ideally it would be sparse using the scipy.sparse library but I am even struggling creating it as a dense matrix using numpy. Is there no way around a loop or at least list comprehension in this case?
No need to make it complicated. For your scale, the following executes in less than a second.
import numpy as np
import scipy.sparse
from numpy.random import default_rng
rand = default_rng(seed=0)
m = 70_000
p = 20
abc = rand.random((3, m, p))
M_dense = np.zeros((m, 3, 3*p))
for i in range(3):
    M_dense[:, i, i*p:(i+1)*p] = abc[i, ...]
M_sparse = scipy.sparse.csr_matrix(M_dense.reshape((-1, 3*p)))
print(M_sparse.shape)
(210000, 60)
Far better, though, is to construct the sparse matrix directly. Note the permuted shape of abc.
abc = rand.random((m, 3, p))
data = abc.ravel()
indices = np.tile(np.arange(3*p), m)
indptr = np.arange(0, data.size+1, p)
M_sparse = scipy.sparse.csr_matrix((data, indices, indptr))
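To check that the direct CSR construction really produces the stacked block layout, it can be compared against the dense version on a small case. This is a sketch with small m and p chosen for illustration; passing shape explicitly is an addition here, so the result does not rely on scipy inferring the number of columns.

```python
import numpy as np
import scipy.sparse

rng = np.random.default_rng(0)
m, p = 4, 2
abc = rng.random((m, 3, p))  # abc[k, i] is the i-th block (a, b or c) of group k

# Dense reference: row 3*k + i holds abc[k, i] in columns i*p .. (i+1)*p - 1.
M_dense = np.zeros((m, 3, 3 * p))
for i in range(3):
    M_dense[:, i, i * p:(i + 1) * p] = abc[:, i, :]
M_dense = M_dense.reshape(-1, 3 * p)

# Direct CSR construction: every row stores exactly p consecutive values.
data = abc.ravel()
indices = np.tile(np.arange(3 * p), m)   # column indices repeat for each group of 3 rows
indptr = np.arange(0, data.size + 1, p)  # row r starts at offset r * p
M_sparse = scipy.sparse.csr_matrix((data, indices, indptr), shape=(3 * m, 3 * p))

same = np.array_equal(M_sparse.toarray(), M_dense)
print(M_sparse.shape, same)
```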

how to take numpy array as an input in logistic regression?

Currently I'm working on a video recommendation system which predicts a video as 0 (negative) or 1 (positive). I successfully scraped a data set from YouTube and computed the sentiment of the YouTube comments as 0 (negative) or 1 (positive). I encoded the text data of my CSV using a one-hot encoder and got the output as a numpy array. My question is: how do I give the numpy array as the input (X) to logistic regression? Below are my code, the output, and the CSV shape (1874 x 2).
Target variable is Comments_Sentiments
#OneHotEncoding
import numpy as np
import pandas as pd
from sklearn import preprocessing
X = pd.read_csv("C:/Users/Shahnawaz Irfan/Desktop/USIrancrisis/demo.csv")
#X.head(5)
X = X.select_dtypes(include=[object])
#X.head(5)
#X.shape
#X.columns
le = preprocessing.LabelEncoder()
X_2 = X.apply(le.fit_transform)
X_2.head()
enc = preprocessing.OneHotEncoder()
enc.fit(X_2)
onehotlabels = enc.transform(X_2).toarray()
onehotlabels.shape
onehotlabels
Output is:
array([[1.],
[1.],
[1.],
...,
[1.],
[1.],
[1.]])
Can anyone show how to take this numpy array as an input to logistic regression?
You can use the inverse function:
enc.inverse_transform([[0, 1, 1, 0, 0], [0, 0, 0, 1, 0]])
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
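scikit-learn estimators accept NumPy arrays directly: the encoded features go in as X and the target column as y when calling fit. Below is a minimal sketch with made-up comments and labels; the data and variable names are assumptions, not the asker's CSV. Recent OneHotEncoder versions handle string columns themselves, so the LabelEncoder step is not needed here. The target (Comments_Sentiments) should stay out of X, and if the encoded array really is a single constant column, as the output above suggests, it gives the model nothing to learn from.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Made-up stand-ins for one text column and the 0/1 sentiment target.
texts = np.array([["good"], ["bad"], ["good"], ["great"], ["bad"], ["great"]])
y = np.array([1, 0, 1, 1, 0, 1])

# One-hot encode the text column; toarray() turns the sparse result into a NumPy array.
enc = OneHotEncoder()
X = enc.fit_transform(texts).toarray()

# The NumPy array is passed to fit and predict as-is.
clf = LogisticRegression()
clf.fit(X, y)
pred = clf.predict(X)
print(X.shape, pred)
```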

Vectorization of selective cumulative sum

I have a pandas Series where each element is a list with indices:
series_example = pd.Series([[1, 3, 2], [1, 2]])
In addition, I have an array with values associated to every index:
arr_example = np.array([3., 0.5, 0.25, 0.1])
I want to create a new Series with the cumulative sums of the elements of the array given by the indices in the row of the input Series. In the example, the output Series would have the following contents:
0 [0.5, 0.6, 0.85]
1 [0.5, 0.75]
dtype: object
The non-vectorized way to do it would be the following:
def non_vector_transform(series, array):
    series_output = pd.Series(np.zeros(len(series)), dtype=object)
    for i in range(len(series)):
        element_list = series[i]
        series_output[i] = []
        acum = 0
        for element in element_list:
            acum += array[element]
            series_output[i].append(acum)
    return series_output
I would like to do this in a vectorized way. Any vectorization magician to help me in here?
Use Series.apply and np.cumsum:
import numpy as np
import pandas as pd
series_example = pd.Series([[1, 3, 2], [1, 2]])
arr_example = np.array([3., 0.5, 0.25, 0.1])
result = series_example.apply(lambda x: np.cumsum(arr_example[x]))
print(result)
Or if you prefer a for loop:
import numpy as np
import pandas as pd
series_example = pd.Series([[1, 3, 2], [1, 2]])
arr_example = np.array([3., 0.5, 0.25, 0.1])
# Copy only if you do not want to overwrite the original series
result = series_example.copy()
for i, x in result.items():
    result[i] = np.cumsum(arr_example[x])
print(result)
Output:
0 [0.5, 0.6, 0.85]
1 [0.5, 0.75]
dtype: object
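Both versions above still run one Python-level call per row. If that becomes a bottleneck, one option is a single cumulative sum over the flattened indices, followed by subtracting the total that had accumulated before each row started. This is a sketch of that idea, not a drop-in replacement: the intermediate names are made up for illustration, and subtracting the offsets can introduce small floating-point differences compared with a per-row cumsum.

```python
import numpy as np
import pandas as pd

series_example = pd.Series([[1, 3, 2], [1, 2]])
arr_example = np.array([3., 0.5, 0.25, 0.1])

# Flatten all index lists into one array and remember each row's length.
lengths = series_example.map(len).to_numpy()
flat_idx = np.concatenate(series_example.tolist())

# One cumulative sum over everything.
running = np.cumsum(arr_example[flat_idx])

# Total accumulated before each row begins (0 for the first row).
ends = np.cumsum(lengths)
starts = ends - lengths
offsets = np.concatenate(([0.0], running))[starts]

# Remove each row's offset, then cut the flat array back into rows.
per_row = running - np.repeat(offsets, lengths)
result = pd.Series(np.split(per_row, ends[:-1]), index=series_example.index)
print(result)
```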

How to append an element to mxnet NDArray?

In numpy, one can append an element to an array by using np.append().
But although numpy and mxnet arrays are supposed to be similar, there is no append() function in the NDArray class.
Update(18/04/24):
Thanks Thom. In fact, this is what I am trying to achieve, shown here in numpy:
import numpy as np
np_a1 = np.empty((0,3), int)
np_a1 = np.append(np_a1, np.array([[1,2,3],[4,5,6]]), axis=0)
np_a1 = np.append(np_a1, np.array([[7,8,9]]), axis=0)
print("\nnp_a1:\n", np_a1)
print(np_a1.shape)
Thanks to your answer, I did this:
import mxnet as mx
nd_a1 = mx.nd.array([[0, 0, 0]])
# nd_a1 = mx.nd.empty((0,3))
nd_a1 = mx.nd.concat(nd_a1, mx.nd.array([[1,2,3],[4,5,6]]), dim=0)
nd_a1 = mx.nd.concat(nd_a1, mx.nd.array([[7, 8, 9]]), dim=0)
print("\nnd_a1", nd_a1)
print(nd_a1.shape)
But I can't figure out how to start from an empty NDArray.
Starting from:
nd_a1 = mx.nd.empty((0,3))
does not work
You can use mx.nd.concat to achieve this. Using the example given in the numpy docs, you need to be careful with dimensions before concatenating. MXNet works well with data in batches (the first dimension is often the batch dimension), as this is useful when training or using neural networks, but it makes the example below look more complicated than it would be in practice.
import numpy as np
import mxnet as mx
a = np.array([1, 2, 3])
b = np.array([[4, 5, 6], [7, 8, 9]])
out = np.append(a, b)
print(out)
a = mx.nd.array([1, 2, 3])
b = mx.nd.array([[4, 5, 6], [7, 8, 9]])
a = a.expand_dims(0)
out = mx.nd.concat(a, b, dim=0)
out = out.reshape(shape=(-1,))
print(out)

Eigenvector normalization in numpy

I'm using the linalg in numpy to compute eigenvalues and eigenvectors of matrices of signed reals.
I've read this previous question but still don't grasp the normalization of eigenvectors.
Here is an example straight off Wikipedia:
import numpy as np
from numpy import linalg as la
a = np.matrix([[2, 1], [1, 2]], dtype=float)
eigh_vals, eigh_vects = np.linalg.eig(a)
print('eigen_values=')
print(eigh_vals)
print('eigen_vectors=')
print(eigh_vects)
The eigenvalues are 1 and 3.
For eigenvectors we expect scalar multiples of [1, -1] and [1, 1], which I get:
eig_vals=
[ 3. 1.]
eig_vets=
[[ 0.70710678 -0.70710678]
[ 0.70710678 0.70710678]]
I understand the 1/sqrt(2) factor is to have the norm=1 but why?
Can normalization be 'switched off'?
Thanks!
The key message for the first eigenvector in the Wikipedia article is
Any non-zero vector with v1 = −v2 solves this equation.
So the actual solution is V1 = [x, -x]. Picking the vector V1 = [1, -1] may be pleasing to the human eye, but it is just as arbitrary as picking V1 = [104051, -104051] or any other non-zero real value of x.
Actually, picking V1 = [1, -1] / sqrt(2) is the least arbitrary. Of all the possible vectors for V1, it is the only one (up to sign) that has unit length.
However, if instead of unit length you prefer the first component of each eigenvector to be 1, note that the eigenvectors are the columns of eigh_vects and divide each column by its first entry:
eigh_vects /= eigh_vects[0, :]
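Rescaling does not change the fact that the columns are eigenvectors, which is easy to confirm numerically. Here is a small sketch using a plain ndarray instead of np.matrix (an assumption made for clarity); it scales each column so that its first component is 1, which only works when that component is non-zero.

```python
import numpy as np

a = np.array([[2.0, 1.0], [1.0, 2.0]])
vals, vecs = np.linalg.eig(a)

# eig returns unit-length eigenvectors as the columns of vecs.
norms = np.linalg.norm(vecs, axis=0)

# Divide each column by its first entry so every eigenvector starts with 1.
scaled = vecs / vecs[0, :]

# Each rescaled column v still satisfies a @ v == lambda * v.
residual = a @ scaled - scaled * vals
print(scaled)
print(np.abs(residual).max())
```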
import numpy as np
import sympy as sp
v = sp.Matrix([[2, 1], [1, 2]])
v_vec = v.eigenvects()
v_vec is a list containing 2 tuples:
[(1, 1, [Matrix([
[-1],
[ 1]])]), (3, 1, [Matrix([
[1],
[1]])])]
1 and 3 are the two eigenvalues. The second element of each tuple (the '1' after the eigenvalue) is the multiplicity of that eigenvalue. The third element of each tuple is the list of eigenvectors for that eigenvalue, each a sympy Matrix object. You can convert a Matrix object to a numpy array.
v_vec1 = np.array(v_vec[0][2], dtype=float)
v_vec2 = np.array(v_vec[1][2], dtype=float)
print('v_vec1 =', v_vec1)
print('v_vec2 =', v_vec2)
Here are the eigenvectors you would get (not scaled to unit length):
v_vec1 = [[-1. 1.]]
v_vec2 = [[1. 1.]]
If sympy is an option for you, it appears to normalize less aggressively:
import sympy
a = sympy.Matrix([[2, 1], [1, 2]])
a.eigenvects()
# [(1, 1, [Matrix([
# [-1],
# [ 1]])]), (3, 1, [Matrix([
# [1],
# [1]])])]