Is there a Rolling implementation of PCA in python? - numpy

For time-series analysis, it's useful to have rolling PCA functions to analyse how the dynamics of a time series change over time while avoiding look-ahead bias.
We may want to answer the question: 'how many principal components are needed to keep 90% of the variance?'. The number of principal components needed to explain 90% of the variance may change over time, depending on the dynamics of the time series.
In addition, we may want to reduce the number of variables p in a given dataset to k < p components on a rolling basis to more easily visualise the data.
While scikit-learn has a PCA module, it does not support rolling calculations. The same goes for numpy's SVD. We could use these packages in a manual for loop, but for large arrays (>10,000 rows) it would become very slow.
Is there a fast rolling implementation of PCA in python to address some of the questions above?

While I didn't manage to find a rolling implementation of PCA, it is a relatively straightforward matter to use the packages and tools mentioned in the question to code a manual rolling PCA function. In addition, we will use numba to gain a small speed-up, as it supports numpy.linalg.svd and numpy.linalg.eig.
The code in this answer is inspired by the excellent explanations of PCA here and here
import numpy as np
from numpy.linalg import eig
from numba import njit
import numpy.typing as npt

@njit
def rolling_pca(
    arr: npt.NDArray[np.float64],
    n_components: int,
    window: int,
    min_periods: int
) -> npt.NDArray[np.float64]:
    """Perform PCA on the covariance matrix of arr.
    Return the lower dimensional array.
    Data is assumed to have non-zero mean, so will be demeaned
    in the process.
    Args:
        arr: Input data. Shape (n_samples, n_variables).
        n_components: Number of components to reduce data matrix to.
            Must be less than arr.shape[1].
        window: Sliding window size.
        min_periods: Minimum number of observations required to perform calculation.
    Returns:
        Reduced data matrix. Shape (n_samples, n_components)
    """
    # create a copy to ensure we don't change data in place
    arr_copy = arr.copy()
    n = arr_copy.shape[0]
    # create an empty array which will be populated with the output
    reduced_out = np.full((n, n_components), np.nan)
    # iterate over each row (timestamp in a timeseries)
    for i in range(min_periods, n + 1):
        if i < window:
            lookback = i
        else:
            lookback = window
        start_idx = i - lookback
        curr_arr = arr_copy[start_idx: i, :]
        # demean returns
        curr_arr = curr_arr - (np.sum(curr_arr, axis=0) / lookback)
        # calculate the covariance matrix
        cov = (curr_arr.T @ curr_arr) / (lookback - 1)
        # get the eigenvectors
        # sort eigvals in descending order to get the eigenvectors
        # that belong to the largest eigenvalues
        evals, evecs = eig(cov)
        idx = np.argsort(evals)[::-1][:n_components]
        evecs = evecs[:, idx]
        # multiply the top eigenvectors by the current array to get a reduced matrix
        reduced = (evecs.T @ curr_arr.T).T
        reduced_out[start_idx: i, :] = reduced
    return reduced_out
After profiling the code, the two slowest parts are, as expected, the calls to eig() and the matrix multiplication curr_arr.T @ curr_arr. As the array curr_arr is limited by the window size, a pure numpy (no numba) implementation of the matrix multiplication is faster than using numba. This is because the arrays used in the matrix multiplication are small and not contiguous (see this post for more details). I didn't get around to resolving this issue, but if anyone has any suggestions, it would speed up this function quite a bit more.
I've compared the average timings between 3 implementations to see the effect of the speedup that numba offers. The 3 implementations are:
Manual for loop using numba, exactly as above
Manual for loop without numba, but otherwise same as the code above
Manual for loop using Sklearn instead of numpy eig, no numba (as numba does not support Sklearn)
Note that the following parameters are fixed so we get as fair a comparison as possible between implementations
Window size = 120
Minimum number of periods = 22
Input data number of variables = 20
Number of components to reduce to via PCA = 10
Number of iterations to time function over so as to get an average timing per implementation = 10
Only the row size (number of samples) is allowed to vary so we can visualise how execution time varies with array length.
We can see for large arrays (100k rows), we can decrease the time from about 14.19s using Sklearn, to 5.36s using numba, about a 2.6X speedup.
PCA Reconstruction
We implement some code similar to what we used above to reconstruct the original data matrix, using only the top principal components. Namely, we use SVD to decompose the matrix X into 3 matrices, U, S, and V^T. With these matrices, we can calculate how much variance is kept cumulatively by the components, and only keep the top k components that explain a desired amount of variance.
import numpy as np
from numpy.linalg import svd
from numba import njit
import numpy.typing as npt

@njit
def __get_num_top_components(
    singular_values: npt.NDArray[np.float64],
    threshold: float,
    n: int,
) -> int:
    """Get the number of top eigen-components required by threshold.
    Args:
        singular_values: Singular values from SVD.
        threshold: Minimum amount of explained variance to be kept.
        n: Number of samples in data matrix.
    Returns:
        Required number of components to keep.
    """
    evals = singular_values ** 2 / (n - 1)
    evals = evals / np.sum(evals)
    cumsum_evals = np.cumsum(evals)
    # argwhere gives the zero-based index of the first component that
    # crosses the threshold, so add 1 to turn it into a count
    top_k = np.argwhere(cumsum_evals > threshold).min() + 1
    return top_k

@njit
def rolling_pca_reconstruction(
    arr: npt.NDArray[np.float64],
    threshold: float,
    window: int,
    min_periods: int
) -> npt.NDArray[np.float64]:
    """Perform PCA on arr and return reconstructed matrix.
    This method follows the logic succinctly outlined here:
    https://stats.stackexchange.com/a/134283/178320
    Args:
        arr: Input data. Shape (n_samples, n_variables).
        threshold: Minimum amount of explained variance to be kept.
            Must be a number in (0., 1.).
        window: Sliding window size.
        min_periods: Minimum number of observations required to perform calculation.
    Returns:
        Reconstructed data matrix. Shape (n_samples, n_variables)
    """
    arr_copy = arr.copy()
    n = arr_copy.shape[0]
    p = arr_copy.shape[1]
    recon_out = np.full((n, p), np.nan)
    for i in range(min_periods, n + 1):
        if i < window:
            lookback = i
        else:
            lookback = window
        start_idx = i - lookback
        curr_arr = arr_copy[start_idx: i, :]
        # demean data
        curr_arr = curr_arr - (np.sum(curr_arr, axis=0) / lookback)
        # perform SVD on data, no need for full matrices, this is faster
        u, s, vh = svd(curr_arr, full_matrices=False)
        # calculate the number of components that explains threshold variance
        top_k = __get_num_top_components(
            singular_values=s,
            threshold=threshold,
            n=lookback,
        )
        # reconstruct the (demeaned) data matrix using the top_k components;
        # the rows of vh are the principal directions, so keep the first top_k rows
        tmp_recon = u[:, :top_k] @ np.diag(s)[:top_k, :top_k] @ vh[:top_k, :]
        recon_out[start_idx: i, :] = tmp_recon
    return recon_out
The output of rolling_pca_reconstruction() is the reconstructed data, of the same dimension as the input data arr. One useful modification that could be made to this code is to record top_k at each iteration, to understand how many components are needed to explain the threshold variance over time.
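The modification suggested above could be sketched roughly as follows. This is a plain numpy illustration, not part of the original answer: the function name rolling_n_components is hypothetical, numba is left out for simplicity, and the count is stored at the last row of each window.

```python
import numpy as np

def rolling_n_components(arr, threshold, window, min_periods):
    """Hypothetical helper: components needed to reach `threshold` variance, per window."""
    n = arr.shape[0]
    out = np.full(n, np.nan)
    for i in range(min_periods, n + 1):
        lookback = min(i, window)
        curr = arr[i - lookback:i, :]
        curr = curr - curr.mean(axis=0)            # demean inside the window
        s = np.linalg.svd(curr, compute_uv=False)  # singular values only
        explained = np.cumsum(s ** 2) / np.sum(s ** 2)
        # index of the first cumulative share >= threshold, converted to a count
        count = np.searchsorted(explained, threshold) + 1
        out[i - 1] = min(count, len(s))            # guard against round-off at the end
    return out

# synthetic data driven by a single factor plus tiny noise (illustrative only)
rng = np.random.default_rng(0)
factor = rng.standard_normal((300, 1))
loadings = rng.standard_normal((1, 5))
data = factor @ loadings + 1e-3 * rng.standard_normal((300, 5))
n_comp = rolling_n_components(data, threshold=0.9, window=120, min_periods=22)
```

Plotting n_comp against time would then show when the data need more or fewer components to reach the chosen threshold.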

Related

random.multivariate_normal on a dask array?

I've been struggling to find a way to do this calculation that works in a dask workflow.
I have code that uses the np.random.multivariate_normal function, and while many functions of this type are available to us on dask arrays, it seems this one is not. So I attempted to create my own based on an example provided in the dask documentation.
Here is my attempt, which is giving errors that I am having difficulty understanding. I also provided random input variables to make it easy to replicate:
import numpy as np
from dask.distributed import Client
import dask.array as da

def mvn(mu, sigma, n, blocksize):
    chunks = ((blocksize,) * (n // blocksize),
              (blocksize,) * (n // blocksize))
    name = 'mvn'  # unique identifier
    dsk = {(name, i, j): (np.random.multivariate_normal(mu,sigma, blocksize))
                         if i == j else
                         (np.zeros, (blocksize, blocksize))
           for i in range(n // blocksize)
           for j in range(n // blocksize)}
    dtype = np.random.multivariate_normal(0).dtype  # take dtype default from numpy
    return da.Array(dsk, name, chunks, dtype)

n = 10000
A = da.random.normal(0, 1, size=(n,n), chunks=(1000, 1000))
sigma = da.dot(A,A.transpose())
mu = 4.0*da.ones(n, chunks = 1000)
R = da.numpy.random.mvn(mu, sigma, n, chunks=(100))
Any suggestions or am I so far off the mark here that I should abandon all hope? Thanks!
If you have a cluster to run this on, you can use my answer from this post, copied here for reference:
A workaround for now is to use a Cholesky decomposition. Note that any covariance matrix C can be expressed as C=G*G'. It then follows that x = G*y is correlated as specified in C if y is standard normal (see this excellent post on Mathematics Stack Exchange). In code:
Numpy
n_dim =4
size = 100000
A = np.random.randn(n_dim, n_dim)
covm = A.dot(A.T)
x= np.random.multivariate_normal(size=size, mean=np.zeros(len(covm)),cov=covm)
## verify numpys covariance is correct
np.cov(x, rowvar=False)
covm
Dask
## create covariance matrix
A = da.random.standard_normal(size=(n_dim, n_dim),chunks=(2,2))
covm = A.dot(A.T)
## get cholesky decomp
L = da.linalg.cholesky(covm, lower=True)
## drawn standard normal
sn= da.random.standard_normal(size=(size, n_dim),chunks=(100,100))
## correct for correlation
x =L.dot(sn.T)
x.shape
## verify
covm.compute()
da.cov(x, rowvar=True).compute()
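As a small sanity check of the identity behind this answer, here is a numpy-only sketch (no dask involved, seed and sizes chosen arbitrarily) showing that multiplying standard normal draws by the lower Cholesky factor gives samples whose covariance approaches the target:

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim, size = 4, 200_000

# build a random positive definite covariance matrix
A = rng.standard_normal((n_dim, n_dim))
covm = A @ A.T

# lower-triangular factor G with covm == G @ G.T
G = np.linalg.cholesky(covm)

# standard normal draws, one column per sample
y = rng.standard_normal((n_dim, size))

# correlated draws: cov(G y) = G cov(y) G.T = G G.T = covm
x = G @ y

# rows are variables here, which is np.cov's default layout
sample_cov = np.cov(x)
max_abs_err = np.abs(sample_cov - covm).max()
```

The remaining gap between sample_cov and covm is sampling noise and shrinks as size grows.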
This answer can be fleshed out, but I imagine you would have an easier time using dask's delayed, da.from_delayed and da.*stack.
One immediate problem I see with what you have: with np.random.multivariate_normal(mu,sigma, blocksize) you are directly calling the function, instead of making the spec. You probably wanted (np.random.multivariate_normal, mu,sigma, blocksize). This shows that working with raw dask dictionaries can be tricky!

Scipy Butter bandpass is not producing the desired results

So I'm trying to bandpass filter a 24-bit PCM wav file sampled at 44.1 kHz. What I would like to do is bandpass each frequency from 0 Hz to 22 kHz.
So far I have loaded the data and can display it with Matplotlib, and it looks like the following.
But when I go to apply the bandpass filter which I got from here
http://scipy-cookbook.readthedocs.io/items/ButterworthBandpass.html
I get the following result:
So I'm trying to bandpass at 100-101Hz as a test, here is my code:
from WaveData import WaveData
import matplotlib.pyplot as plt
from scipy.signal import butter, lfilter, freqz
from scipy.io.wavfile import read
import numpy as np

class Filter:
    def __init__(self, wav):
        self.waveData = WaveData(wav)

    def butter_bandpass(self, lowcut, highcut, fs, order=5):
        nyq = 0.5 * fs
        low = lowcut / nyq
        high = highcut / nyq
        b, a = butter(order, [low, high], btype='band')
        return b, a

    def butter_bandpass_filter(self, data, lowcut, highcut, fs, order):
        b, a = self.butter_bandpass(lowcut, highcut, fs, order=order)
        y = lfilter(b, a, data)
        return y

    def getFilteredSignal(self, freq):
        return self.butter_bandpass_filter(data=self.waveData.file['Data'], lowcut=100, highcut=101, fs=44100, order=3)

    def getUnprocessedData(self):
        return self.waveData.file['Data']

    def plot(self, signalA, signalB=None):
        plt.plot(signalA)
        if signalB != None:
            plt.plot(signalB)
        plt.show()

if __name__ == "__main__":
    # file = WaveData("kick.wav")
    # fileA = read("kick0.wav")
    f = Filter("kick.wav")
    a, b = f. butter_bandpass(lowcut=100, highcut=101, fs=44100)
    w, h = freqz(b, a, worN=22000) ##Filted signal is not working?
    f.plot(h, w)
    print("break")
I don't understand where I have gone wrong.
Thanks
What @WoodyDev said is true: 1 Hz out of 44.1 kHz is way too narrow a passband for any kind of filter. Just look at the filter coefficients butter returns:
In [3]: butter(5, [100/(44.1e3/2), 101/(44.1e3/2)], btype='band')
Out[3]:
(array([ 1.83424060e-21, 0.00000000e+00, -9.17120299e-21, 0.00000000e+00,
1.83424060e-20, 0.00000000e+00, -1.83424060e-20, 0.00000000e+00,
9.17120299e-21, 0.00000000e+00, -1.83424060e-21]),
array([ 1. , -9.99851389, 44.98765092, -119.95470631,
209.90388506, -251.87018009, 209.88453023, -119.93258575,
44.9752074 , -9.99482662, 0.99953904]))
Look at the b coefficients (the first array): their values are around 1e-20, meaning the filter design has broken down numerically, and if you apply it to any signal, the output will be essentially zero, which is what you found.
You didn't mention your application but if you really really want to keep the signal's frequency content between 100 and 101 Hz, you could take a zero-padded FFT of the signal, zero out the portions of the spectrum outside that band, and IFFT (look at rfft, irfft, and rfftfreq in numpy.fft module).
Here's a function that applies a brick-wall bandpass filter in the Fourier domain using FFTs:
import numpy.fft as fft
import numpy as np

def fftBandpass(x, low, high, fs=1.0):
    """
    Apply a bandpass filter via FFTs.

    Parameters
    ----------
    x : array_like
        Input signal vector. Assumed to be real-only.
    low : float
        Lower bound of the passband in Hertz. (If less than or equal
        to zero, nothing is removed at the low end, so the result is
        a low-pass filter.)
    high : float
        Upper bound of the passband, Hertz.
    fs : float
        Sample rate in units of samples per second. If `high > fs / 2`,
        nothing is removed at the high end, so the result is a
        high-pass filter.

    Returns
    -------
    y : ndarray
        Output signal vector with all frequencies outside the `[low, high]`
        passband zeroed.

    Caveat
    ------
    Note that the energy in `y` will not exceed the energy in `x`, i.e.,
    `sum(abs(y)**2) <= sum(abs(x)**2)`.
    """
    xf = fft.rfft(x)
    f = fft.rfftfreq(len(x), d=1 / fs)
    xf[f < low] = 0
    xf[f > high] = 0
    return fft.irfft(xf, len(x))

if __name__ == '__main__':
    fs = 44.1e3
    N = int(fs)
    x = np.random.randn(N)
    t = np.arange(N) / fs
    import matplotlib.pyplot as plt
    plt.figure()
    plt.plot(t, x, t, 100 * fftBandpass(x, 100, 101, fs=fs))
    plt.xlabel('time (seconds)')
    plt.ylabel('signal')
    plt.legend(['original', 'scaled bandpassed'])
    plt.show()
You can put this in a file, fftBandpass.py, and just run it with python fftBandpass.py to see it create the following plot:
Note I had to scale the 1 Hz bandpassed signal by 100 because, after bandpassing that much, there's very little energy in the signal. Also note that the signal living inside this small a passband is pretty much just a sinusoid at around 100 Hz.
If you put the following in your own code: from fftBandpass import fftBandpass, you can use the fftBandpass function.
Another thing you could try is to decimate the signal 100x, so convert it to a signal that was sampled at 441 Hz. 1 Hz out of 441 Hz is still a crazy-narrow passband but you might have better luck than trying to bandpass the original signal. See scipy.signal.decimate, but don't try and call it with q=100, instead recursively decimate the signal, by 2, then 2, then 5, then 5 (for total decimation of 100x).
So there are some problems with your code which means you aren't plotting the results correctly, although I believe this isn't your main problem.
Check your code
In the example you linked, they show precisely the process for calculating, and plotting the filter at different orders:
for order in [3, 6, 9]:
b, a = butter_bandpass(lowcut, highcut, fs, order=order)
w, h = freqz(b, a, worN=2000)
plt.plot((fs * 0.5 / np.pi) * w, abs(h), label="order = %d" % order)
You are currently not scaling your frequency axis correctly, nor taking the absolute value of h to get its magnitude, as the correct code above does.
Check your theory
However, your main issue is such a steep bandpass (i.e. only 100Hz - 101Hz). I have very rarely seen a filter this sharp, as it is very processing intensive (it requires a lot of filter coefficients), and because you are only looking at a range of 1Hz, it will completely get rid of all other frequencies.
So the graph you have shown with the gain as 0 may very well be correct. If you use their example and change the bandpass cutoff frequencies to 100Hz -> 101Hz, then the output result is an array of (almost if not completely) zeros. This is because it will only be looking at the energy of the signal in a 1Hz range which will be very very small if you think about it.
If you are doing this for analysis, the frequency spacing tends to be much larger i.e. Octave Bands (or smaller divisions of octave bands).
The Spectrogram
As I am not sure of your end purpose I cannot clarify exactly which route you should take to get there. However, using bandpass filters on every single frequency up to 20kHz seems kind of silly in this day and age.
If I remember correctly, some of the first spectrogram attempts with needles on paper used this technique with analog band pass filter banks to analyze the frequency content. So this makes me think you may be looking for something to do with a spectrogram? It lets you analyze the whole signal's frequency information vs time and still has all of the signal's amplitude information. Python already has spectrogram functionality included as part of scipy or Matplotlib.
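To make the spectrogram idea concrete, here is a hand-rolled numpy sketch of a short-time FFT. It is only a stand-in for the ready-made scipy and Matplotlib spectrogram functions mentioned above; the function name stft_magnitude, the frame length, the hop size and the test tone are all assumptions chosen for illustration.

```python
import numpy as np

def stft_magnitude(x, fs, frame_len=1024, hop=512):
    """Hypothetical helper: magnitude spectrum of overlapping, windowed frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # cut the signal into overlapping frames and taper each one
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.abs(np.fft.rfft(frames, axis=1))   # shape (n_frames, frame_len // 2 + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1 / fs)     # frequency of each column, in Hz
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs  # frame centres, in seconds
    return freqs, times, spectrum

# one second of a 1 kHz tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
freqs, times, spec = stft_magnitude(tone, fs)

# strongest frequency in every frame
peak_freqs = freqs[spec.argmax(axis=1)]
```

Each row of spec is the frequency content of one time slice, which gives the "all frequencies versus time" view without building a separate bandpass filter per frequency.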

Locally weighted smoothing for binary valued random variable

I have a random variable as follows:
f(x) = 1 with probability g(x)
f(x) = 0 with probability 1-g(x)
where 0 < g(x) < 1.
Assume g(x) = x. Let's say I am observing this variable without knowing the function g and obtained 200 samples as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic
list = np.ndarray(shape=(200,2))
g = np.random.rand(200)
for i in range(len(g)):
    list[i] = (g[i], np.random.choice([0, 1], p=[1-g[i], g[i]]))
print(list)
plt.plot(list[:,0], list[:,1], 'o')
Plot of 0s and 1s
Now, I would like to retrieve the function g from these points. The best I could think of is to draw a histogram and use the mean statistic:
bin_means, bin_edges, bin_number = binned_statistic(list[:,0], list[:,1], statistic='mean', bins=10)
plt.hlines(bin_means, bin_edges[:-1], bin_edges[1:], lw=2)
Histogram mean statistics
Instead, I would like to have a continuous estimation of the generating function.
I guess it is about kernel density estimation but I could not find the appropriate pointer.
This is straightforward with seaborn, without explicitly fitting an estimator:
import seaborn as sns
g = sns.lmplot(x= , y= , y_jitter=.02 , logistic=True)
Plug in x= your exogenous variable and, analogously, y= your dependent variable. y_jitter jitters the points for better visibility if you have a lot of data points. logistic=True is the main point here: it gives you the logistic regression line of the data.
Seaborn is built on top of matplotlib and works great with pandas, in case you want to extend your data to a DataFrame.
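If a logistic shape is too restrictive, a locally weighted average is another option. The sketch below is a Nadaraya-Watson style kernel smoother written in plain numpy; it is a swapped-in technique rather than what the seaborn call does, and the function name kernel_smooth, the Gaussian kernel, the bandwidth of 0.1 and the simulated data are all assumptions.

```python
import numpy as np

def kernel_smooth(x_obs, y_obs, x_grid, bandwidth=0.1):
    """Hypothetical helper: Gaussian-weighted local mean of y_obs at each grid point."""
    # scaled distance between every grid point and every observation
    diffs = (x_grid[:, None] - x_obs[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    # weighted average of the 0/1 outcomes estimates P(y = 1 | x)
    return (weights @ y_obs) / weights.sum(axis=1)

# simulate the setup from the question with g(x) = x
rng = np.random.default_rng(0)
x_obs = rng.random(5000)
y_obs = (rng.random(5000) < x_obs).astype(float)

# evaluate away from the edges, where a plain kernel average is biased
x_grid = np.linspace(0.2, 0.8, 7)
g_hat = kernel_smooth(x_obs, y_obs, x_grid)
```

A smaller bandwidth follows the data more closely but gives a noisier curve, so it usually needs some tuning.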

NumPy vectorization with integration

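I have a vector w and wish to make another vector of the same length whose k-th component is
$$I_k = \int_{-1}^{1} f(x, w_k)\,\log\Big(\sum_{j} f(x, w_j)\Big)\,dx,$$
where f is the piecewise function defined in the code below.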
The question is: how can we vectorize this for speed? NumPy vectorize() is actually a for loop, so it doesn't count.
Veedrac pointed out that "There is no way to apply a pure Python function to every element of a NumPy array without calling it that many times". Since I'm using NumPy functions rather than "pure Python" ones, I suppose it's possible to vectorize, but I don't know how.
import numpy as np
from scipy.integrate import quad
ws = 2 * np.random.random(10) - 1
n = len(ws)
integrals = np.empty(n)
def f(x, w):
    if w < 0: return np.abs(x * w)
    else: return np.exp(x) * w
def temp(x): return np.array([f(x, w) for w in ws]).sum()
def integrand(x, w): return f(x, w) * np.log(temp(x))
## Python for loop
for k in range(n):
    integrals[k] = quad(integrand, -1, 1, args = ws[k])[0]
## NumPy vectorize
integrals = np.vectorize(quad)(integrand, -1, 1, args = ws)[0]
On a side note, is a Cython for loop always faster than NumPy vectorization?
The function quad executes an adaptive algorithm, which means the computations it performs depend on the specific thing being integrated. This cannot be vectorized in principle.
In your case, a for loop of length 10 is a non-issue. If the program takes long, it's because integration takes long, not because you have a for loop.
When you absolutely need to vectorize integration (not in the example above), use a non-adaptive method, with the understanding that precision may suffer. These can be directly applied to a 2D NumPy array obtained by evaluating all of your functions on some regularly spaced 1D array (a linspace). You'll have to choose the linspace yourself since the methods aren't adaptive.
numpy.trapz is the simplest and least precise (recent NumPy releases name it numpy.trapezoid).
scipy.integrate.simps is equally easy to use and more precise (Simpson's rule requires an odd number of samples, but the method works around having an even number, too). Recent SciPy releases name it scipy.integrate.simpson.
scipy.integrate.romb is in principle of higher accuracy than Simpson (for smooth data) but it requires the number of samples to be 2**n+1 for some integer n.
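To illustrate the non-adaptive approach without depending on a particular NumPy or SciPy version, here is a sketch that writes the trapezoid and Simpson rules by hand and applies them to every row of a 2D array at once. The helper names and the test functions exp(k * x) are assumptions picked because their exact integrals are known.

```python
import numpy as np

def trapezoid_rows(values, dx):
    """Composite trapezoid rule along axis 1 for equally spaced samples."""
    return dx * (values[:, 0] / 2 + values[:, 1:-1].sum(axis=1) + values[:, -1] / 2)

def simpson_rows(values, dx):
    """Composite Simpson rule along axis 1; needs an odd number of samples."""
    odd = values[:, 1:-1:2].sum(axis=1)    # interior samples at odd indices
    even = values[:, 2:-1:2].sum(axis=1)   # interior samples at even indices
    return dx / 3 * (values[:, 0] + 4 * odd + 2 * even + values[:, -1])

# a regular grid on [-1, 1] with an odd number of points
x = np.linspace(-1.0, 1.0, 201)
dx = x[1] - x[0]

# one function per row: exp(k * x) for several k, evaluated in one shot
ks = np.array([1.0, 2.0, 3.0])
values = np.exp(ks[:, None] * x[None, :])

exact = (np.exp(ks) - np.exp(-ks)) / ks
trap_err = np.abs(trapezoid_rows(values, dx) - exact)
simp_err = np.abs(simpson_rows(values, dx) - exact)
```

On smooth integrands like these, simp_err comes out much smaller than trap_err for the same grid, which matches the ordering of the methods listed above.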
@zaq's answer focusing on quad is spot on. So I'll look at some other aspects of the problem.
In a recent answer (https://stackoverflow.com/a/41205930/901925) I argue that vectorize is of most value when you need to apply the full broadcasting mechanism to a function that only takes scalar values. Your quad qualifies as taking scalar inputs. But you are only iterating on one array, ws. The x that is passed on to your functions is generated by quad itself. quad and integrand are still Python functions, even if they use numpy operations.
cython improves low level iteration, stuff that it can convert to C code. Your primary iteration is at a high level, calling an imported function, quad. Cython can't touch or rewrite that.
You might be able to speed up integrand (and on down) with cython, but first focus on getting the most speed from that with regular numpy code.
def f(x, w):
    if w < 0: return np.abs(x * w)
    else: return np.exp(x) * w
With if w<0 w must be scalar. Can it be written so it works with an array w? If so, then
np.array([f(x, w) for w in ws]).sum()
could be rewritten as
fn(x, ws).sum()
Alternatively, since both x and w are scalar, you might get a bit of speed improvement by using math.exp etc instead of np.exp. Same for log and abs.
I'd try to write f(x,w) so it takes arrays for both x and w, returning a 2d result. If so, then temp and integrand would also work with arrays. Since quad feeds a scalar x, that may not help here, but with other integrators it could make a big difference.
If f(x,w) can be evaluated on a regular nx10 grid of x=np.linspace(-1,1,n) and ws, then an integral (of sorts) just requires a couple of summations over that space.
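A rough sketch of that grid idea, assuming an array-aware rewrite of f based on np.where (not the asker's original function), a fixed seed, and a hand-written trapezoid sum in place of quad:

```python
import numpy as np

rng = np.random.default_rng(0)
ws = 2 * rng.random(10) - 1

def f(x, w):
    """Array-aware f: x of shape (n,), w of shape (m,), result of shape (m, n)."""
    x = np.asarray(x)[None, :]
    w = np.asarray(w)[:, None]
    # np.where evaluates both branches and picks per element based on the sign of w
    return np.where(w < 0, np.abs(x * w), np.exp(x) * w)

def grid_integrals(n_points):
    x = np.linspace(-1.0, 1.0, n_points)
    dx = x[1] - x[0]
    fx = f(x, ws)                                   # (10, n_points)
    integrand = fx * np.log(fx.sum(axis=0))         # the log term is shared by all rows
    # trapezoid rule along the x axis, one integral per w
    return dx * (integrand[:, 0] / 2
                 + integrand[:, 1:-1].sum(axis=1)
                 + integrand[:, -1] / 2)

integrals = grid_integrals(2001)
```

Because the grid is fixed, accuracy is not controlled adaptively the way it is with quad; re-running with a finer grid is a cheap way to check whether the values have settled.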
You can use quadpy for fully vectorized computation. You'll have to adapt your function to allow for vector inputs first, but that is done rather easily:
import numpy as np
import quadpy
np.random.seed(0)
ws = 2 * np.random.random(10) - 1
def f(x):
    out = np.empty((len(ws), *x.shape))
    out0 = np.abs(np.multiply.outer(ws, x))
    out1 = np.multiply.outer(ws, np.exp(x))
    out[ws < 0] = out0[ws < 0]
    out[ws >= 0] = out1[ws >= 0]
    return out
def integrand(x):
    return f(x) * np.log(np.sum(f(x), axis=0))
val, err = quadpy.quad(integrand, -1, +1, epsabs=1.0e-10)
print(val)
[0.3266534 1.44001826 0.68767868 0.30035222 0.18011948 0.97630376
0.14724906 2.62169217 3.10276876 0.27499376]

Exponential decay curve fitting in numpy and scipy

I'm having a bit of trouble with fitting a curve to some data, but can't work out where I am going wrong.
In the past I have done this with numpy.linalg.lstsq for exponential functions and scipy.optimize.curve_fit for sigmoid functions. This time I wished to create a script that would let me specify various functions, determine parameters and test their fit against the data. While doing this I noticed that Scipy leastsq and Numpy lstsq seem to provide different answers for the same set of data and the same function. The function is simply y = e^(l*x) and is constrained such that y=1 at x=0.
Excel trend line agrees with the Numpy lstsq result, but as Scipy leastsq is able to take any function, it would be good to work out what the problem is.
import scipy.optimize as optimize
import numpy as np
import matplotlib.pyplot as plt
## Sampled data
x = np.array([0, 14, 37, 975, 2013, 2095, 2147])
y = np.array([1.0, 0.764317544, 0.647136491, 0.070803763, 0.003630962, 0.001485394, 0.000495131])
# function
fp = lambda p, x: np.exp(p*x)
# error function
e = lambda p, x, y: (fp(p, x) - y)
# using scipy least squares
l1, s = optimize.leastsq(e, -0.004, args=(x,y))
print(l1)
# [-0.0132281]
# using numpy least squares
l2 = np.linalg.lstsq(np.vstack([x, np.zeros(len(x))]).T,np.log(y))[0][0]
print(l2)
# -0.00313461628963 (same answer as Excel trend line)
# smooth x for plotting
x_ = np.arange(0, x[-1], 0.2)
plt.figure()
plt.plot(x, y, 'rx', x_, fp(l1, x_), 'b-', x_, fp(l2, x_), 'g-')
plt.show()
Edit - additional information
The MWE above includes a small sample of the dataset. When fitting the actual data the scipy.optimize.curve_fit curve presents an R^2 of 0.82, while the numpy.linalg.lstsq curve, which is the same as that calculated by Excel, has an R^2 of 0.41.
You are minimizing different error functions.
When you use numpy.linalg.lstsq, the error function being minimized is
np.sum((np.log(y) - p * x)**2)
while scipy.optimize.leastsq minimizes the function
np.sum((y - np.exp(p * x))**2)
The first case requires a linear dependency between the dependent and independent variables, but the solution is known analytically, while the second can handle any dependency, but relies on an iterative method.
On a separate note, I cannot test it right now, but when using numpy.linalg.lstsq, you don't need to vstack a row of zeros; the following works as well:
l2 = np.linalg.lstsq(x[:, None], np.log(y))[0][0]
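The difference between the two error functions can be checked directly on the sample data from the question. The sketch below uses numpy only: the closed-form slope of a line through the origin for the log-space error, and a brute-force grid search (an assumption standing in for scipy.optimize.leastsq) for the direct error. The grid range and resolution are arbitrary choices.

```python
import numpy as np

x = np.array([0, 14, 37, 975, 2013, 2095, 2147], dtype=float)
y = np.array([1.0, 0.764317544, 0.647136491, 0.070803763,
              0.003630962, 0.001485394, 0.000495131])

def sse_log(p):
    """Error minimized by the log-linear fit."""
    return np.sum((np.log(y) - p * x) ** 2)

def sse_direct(p):
    """Error minimized when fitting exp(p * x) to y directly."""
    return np.sum((y - np.exp(p * x)) ** 2)

# closed-form least-squares slope of log(y) against x through the origin
p_log = np.sum(x * np.log(y)) / np.sum(x ** 2)

# grid search over candidate decay rates for the direct error
grid = np.linspace(-0.05, 0.0, 50001)
direct_errors = np.sum((y[None, :] - np.exp(grid[:, None] * x[None, :])) ** 2, axis=1)
p_direct = grid[np.argmin(direct_errors)]
```

Each estimate wins under its own criterion and loses under the other, which is why the two fitted curves differ. The log transform gives the small y values at large x much more influence than they have in the direct fit.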
To expound a bit on Jaime's point, any non-linear transformation of the data will lead to a different error function and hence to different solutions. These will lead to different confidence intervals for the fitting parameters. So you have three possible criteria to use to make a decision: which error you want to minimize, which parameters you want more confidence in, and finally, if you are using the fitting to predict some value, which method yields less error in the interesting predicted value. Playing around a bit analytically and in Excel suggests that different kinds of noise in the data (e.g. if the noise function scales the amplitude, affects the time-constant or is additive) lead to different choices of solution.
I'll also add that while this trick "works" for exponential decay to 0, it can't be used in the more general (and common) case of damped exponentials (rising or falling) to values that cannot be assumed to be 0.