Cache intermediate results of prior calls to lapacke gelsd - least-squares

An iterative algorithm calls LAPACKE_sgelsd on each iteration with a single column of B. Subsequent calls often use the same A matrix. I believe that caching or otherwise reusing intermediate results from the previous iteration, when A has not changed, would give a substantial performance improvement, similar to the gains available when passing multiple columns of B in one call. Is that correct? How difficult would it be to implement, and how could it be done? The code uses OpenBLAS. Thank you.

Instead of caching intermediate results, you can compute the pseudo-inverse once and cache it. The approach, in summary:
Calculate the SVD of A.
Set all "small" singular values to zero.
Invert all non-zero singular values.
Multiply the three matrices again.
The pseudo-inverse is the transpose of the result.
The least-squares solution is then pseudo-inverse * B.
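As a rough illustration of these steps, here is a minimal NumPy sketch rather than LAPACKE code. The function name cached_pinv, the rcond cutoff and the random test data are assumptions made for the example, and it covers real matrices only.

```python
import numpy as np

def cached_pinv(A, rcond=1e-6):
    """Pseudo-inverse of a real matrix A via SVD (illustrative helper)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Treat singular values below rcond * largest as zero (assumed cutoff).
    keep = s > rcond * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    # Multiply the three factors again, then transpose:
    # (U S^+ V^T)^T = V S^+ U^T, which is the pseudo-inverse.
    return ((U * s_inv) @ Vt).T

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
A_pinv = cached_pinv(A)      # computed once, reused while A is unchanged

b = rng.standard_normal(8)   # one column of B per iteration
x = A_pinv @ b               # each later solve is a matrix-vector product
```

Once A_pinv is cached, every later right-hand side costs one matrix-vector product. Whether that is faster or as accurate as calling gelsd again depends on the matrix sizes and the conditioning of A.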

Related

How to solve this quadratic optimization problem

My problem is described in this picture (it's like a pyramid structure):
The objective function is below:
In this problem, D is known and A is what I want to find. A is a layered structure: each block in an upper layer is divided into four sub-blocks in the layer below, and the value of an upper-layer node equals the sum of its four child nodes in the lower layer. In the example above I used only 2 layers.
What I want to do is approximate the distribution of D with A, so the objective function compares the ratio of two adjacent squares in each row of A with the corresponding value in D. I do this comparison on each layer and sum the results; that sum is my whole objective function. In the finest layer, the values in A are subject to the constraint A<=1, so each value in A is a number between 0 and 1. I have tried to solve it with quadratic programming using the Python library CVXPY, but it is slow.
So I want to solve it another way. Because this is a convex optimization problem, any optimum found is guaranteed to be global. My idea is to use derivatives: each term has two unknowns, namely the two entries of A in the formula. I would take partial derivatives with respect to them, add the restriction A<=1, and then solve using gradient descent. Is this mathematically feasible? I don't know much about optimization, so if it is possible, how should I do it? If not, what other methods can I use?
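Gradient descent with a bound such as A<=1 is usually done as projected gradient descent: take a gradient step, then clip back into the allowed range. The actual objective is not reproduced in the text, so the sketch below uses a stand-in convex quadratic, ||M x - d||^2 with 0 <= x <= 1. The names M, d and projected_gradient and the data are hypothetical.

```python
import numpy as np

def projected_gradient(M, d, steps=500):
    """Minimize ||M x - d||^2 subject to 0 <= x <= 1 (stand-in objective)."""
    x = np.zeros(M.shape[1])
    # Step 1/L, where L = 2 * sigma_max(M)^2 is the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 2)
    for _ in range(steps):
        grad = 2.0 * M.T @ (M @ x - d)
        # Gradient step, then project back onto the box [0, 1].
        x = np.clip(x - step * grad, 0.0, 1.0)
    return x

M = np.eye(3)                    # hypothetical data, not the D from the question
d = np.array([0.5, 2.0, -1.0])
x = projected_gradient(M, d)
```

For a convex objective with a simple box constraint this iteration converges to the constrained optimum. For the real problem, the gradient line would be replaced by the partial derivatives of the actual objective.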

Implementing a 2D recursive spatial filter using Scipy

Minimally, I would like to know how to achieve what is stated in the title. Specifically, signal.lfilter seems like the only implementation of a difference equation filter in scipy, but it is 1D, as shown in the docs. I would like to know how to implement a 2D version as described by this difference equation. If that's as simple as "bro, use this function," please let me know, pardon my naiveté, and feel free to disregard the rest of the post.
I am new to DSP and acknowledge there might be a different approach to answering my question, so I will explain the broader goal and give context in the hope that someone knows how to do what I want with Scipy, or perhaps a better way than what I explicitly asked for.
To get straight into it: broadly speaking, I am using vectorized computation methods (Numpy/Scipy) to implement a Monte Carlo simulation, improving upon a naive for loop. I have successfully abstracted most of my operations to array computation / linear algebra, but a few specific ones (recursive computations) have eluded my intuition, and I continually end up in the digital signal processing world when I go looking for how this type of thing has been done by others (that or machine learning, but those "frameworks" are very opinionated). The reason most of my Google searches end up on scipy.signal or scipy.ndimage library references is clear to me at this point, and having accepted the "signal" representation of my data, I have spent a considerable amount of time (about as much as is reasonable for a field that is not my own) climbing the learning curve to figure out what I need from these libraries.
My simulation entails updating a vector of data representing the state of a system each period for n periods, and then repeating that whole process a "Monte Carlo" number of times. The updates in each of the n periods are inherently recursive, as each state depends on the prior one. It can be characterized as a difference equation as linked above. Additionally, this vector is theoretically indexed on a grid of points with uneven step size. Here is an example vector y and its theoretical grid t:
y = np.r_[0.0024, 0.004, 0.0058, 0.0083, 0.0099, 0.0133, 0.0164]
t = np.r_[0.25, 0.5, 1, 2, 5, 10, 20]
I need to iteratively perform numerous operations to y for each of n "updates." Specifically, I am computing the curvature along the curve y(t) using finite difference approximations and using the result at each point to adjust the corresponding y(t) prior to the next update. In a loop this amounts to inplace variable reassignment with the desired update in each iteration.
y += some_function(y)
Not only does this seem inefficient, but vectorizing things seems intuitive given y is a vector to begin with. Furthermore, I am interested in preserving each "updated" y(t) along the n updates, which would require a data structure of dimensions len(y) x n. At this point, why not perform the updates in place in the array? Therein lies the question. Many of the update operations I have successfully vectorized the "Numpy way" (such as adding random variates to each point), but some appear overly complex in the array world.
Specifically, as mentioned above, the one involving computing curvature at each element using its two neighbouring elements, and then immediately using that result to update the next row of the array before performing its own curvature "update." I was able to implement a non-recursive version (each row fails to consider its "updated self" from the prior row) of the curvature operation using ndimage generic_filter. Given the uneven grid, I have unique coefficients (kernel weights) for each triplet in the kernel footprint (instead of always using [1,-2,1] for y'', as I would on a uniform grid). This last part has already forced me to use a spatial filter from ndimage rather than a 1D convolution. I'll point out that something conceptually similar was discussed in this math.exchange post, and it seems to me only the third response saliently addressed the difference between the mathematical notion of "convolution", which should be associative, and general spatial filtering kernels, which would require two sequential filtering operations or a cleverly merged kernel.
In any case this does not seem to actually address my concern, as it is not about 2D recursive filtering but rather about having a backward-looking kernel footprint. Additionally, I think I've concluded it is not applicable, in that it only allows for "recursion" (backward-looking kernel footprints in the spatial filtering world) in a manner directly proportional to the size of the recursion. Meaning if I wanted to filter each of n rows incorporating calculations on all prior rows, it would require a convolution kernel far too big (for my n anyway). If I'm understanding all this correctly, a recursive linear filter is algorithmically more efficient in that it returns (for use in computation) the result of itself applied over the previous n samples (up to a level where the stability of the algorithm is affected) using another companion vector (z). In my case, I would only need to look back one step at output signal y[n-1] to compute y[n] from curvature at x[n], as the rest works itself out like a cumsum. signal.lfilter works for this, but I can't use it to compute curvature, as that requires a kernel footprint that can "see" at least its left and right neighbors (pixels), which is how I ended up using generic_filter.
It seems to me I should be able to do both simultaneously with one filter, namely spatial and recursive filtering; or somehow I've missed the maths of how this could be simplified/combined (convolution of multiple kernels?).
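One way to picture combining the two operations, sketched under assumptions: take the second derivative as the curvature measure and write its three-point weights on the uneven grid as a matrix D2. Each recursive update then becomes a single matrix-vector product. The weight k, the number of updates and the zero boundary rows are hypothetical choices, not taken from the post.

```python
import numpy as np

def second_derivative_matrix(t):
    """Three-point second-derivative weights on a non-uniform grid t.

    Boundary rows are left as zeros (an assumption; the post does not
    say how the end points are treated).
    """
    n = len(t)
    D2 = np.zeros((n, n))
    for i in range(1, n - 1):
        h1, h2 = t[i] - t[i - 1], t[i + 1] - t[i]
        D2[i, i - 1] = 2.0 / (h1 * (h1 + h2))
        D2[i, i] = -2.0 / (h1 * h2)
        D2[i, i + 1] = 2.0 / (h2 * (h1 + h2))
    return D2

y0 = np.r_[0.0024, 0.004, 0.0058, 0.0083, 0.0099, 0.0133, 0.0164]
t = np.r_[0.25, 0.5, 1, 2, 5, 10, 20]

k = 0.01          # hypothetical weight on the curvature adjustment
n_updates = 5     # hypothetical number of periods
D2 = second_derivative_matrix(t)
step = np.eye(len(t)) + k * D2     # one update: y_next = y + k * D2 @ y

history = np.empty((n_updates + 1, len(t)))
history[0] = y0
for i in range(n_updates):
    # Each row is computed from the already-updated previous row.
    history[i + 1] = step @ history[i]
```

Because this particular update is linear, row n equals the n-th matrix power of step applied to y0. Any per-period random variates would be added inside the loop.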
It seems like this should be a common problem, but perhaps it is rarely relevant to do both at once in signal processing and image filtering. Perhaps this is why you don't use signal libraries solely to implement a fast Monte Carlo simulation; though it seems less esoteric than using a tensor math library to implement a recursive neural network scan ... which I'm attempting to do right now.
EDIT: For those familiar with the theoretical side of DSP, I know that what I am describing, the process of designing recursive filters with arbitrary impulse responses, is achieved by employing a mathematical technique called the z-transform, which I understand is generally used for two things:
converting between the recursion coefficients and the frequency response
combining cascaded and parallel stages into a single filter
Both are exactly what I am trying to accomplish.
Also, reworded title away from FIR / IIR because those imply specific definitions of "recursion" and may be confusing or a misnomer.

Select important features then impute or first impute then select important features?

I have a dataset with lots of features (mostly categorical Yes/No features) and lots of missing values.
One of the techniques for dimensionality reduction is to generate a large and carefully constructed set of trees against a target attribute and then use each attribute’s usage statistics to find the most informative subset of features. That is, we can generate a large set of very shallow trees, with each tree being trained on a small fraction of the total number of attributes. If an attribute is often selected as the best split, it is most likely an informative feature to retain.
I am also using an imputer to fill the missing values.
My question is about the order of the two steps above. Which of the two (dimensionality reduction and imputation) should be done first, and why?
From a mathematical perspective, you should always avoid data imputation (in the sense: use it only if you have to). In other words, if you have a method that can work with missing values, use it (if you do not, you are left with data imputation).
Data imputation is nearly always heavily biased. This has been shown many times; I believe I even read a paper about it that is ~20 years old. In general, to do statistically sound data imputation you need to fit a very good generative model. Just imputing the "most common" value, the mean, etc. makes assumptions about the data of similar strength to those of Naive Bayes.
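One small illustration of the kind of distortion meant here, on made-up numbers: filling the gaps with the mean leaves the mean unchanged but shrinks the variance of the feature.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, np.nan, np.nan])  # made-up feature

observed = x[~np.isnan(x)]
imputed = np.where(np.isnan(x), observed.mean(), x)       # mean imputation

var_observed = observed.var()   # variance of the values actually seen
var_imputed = imputed.var()     # variance after filling the gaps
```

Here var_imputed comes out smaller than var_observed, so anything downstream that relies on the spread of the feature sees a distorted picture.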

Element-wise operations on arrays of different rank

How do I multiply two arrays of different rank, element-wise? For example, element-wise multiplying every row of a matrix with a vector.
real :: a(m,n), b(n)
My initial thought was to use spread(b,...), but it is my understanding that this tiles b in memory, which would make it undesirable for large arrays.
In MATLAB I would use bsxfun for this.
If the result of the expression is simply being assigned to another variable (versus being an intermediate in a more complicated expression or being used as an actual argument), then a loop (DO [CONCURRENT]) or FORALL assignment is likely to be best from the point of view of execution speed (though it will be processor dependent).

Optimize Blas-like operation - A`*B*A

Given two matrices, A and B, where B is symmetric (and positive semi-definite), what is the best (fastest) way to calculate A`*B*A?
Currently, using BLAS, I first compute C=B*A using dsymm (introducing a temporary matrix C) and then A`*C using dgemm.
Is there a better (faster, no temporaries) way to do this using BLAS and MKL?
Thanks.
I'll offer some kind of answer: compared to the general case A*B*C, you know that the end result is a symmetric matrix. After computing C=B*A with the BLAS subroutine dsymm, you want to compute A'C, but you only need to compute the upper triangular part of the result and then copy the strictly upper triangular part to the lower triangular part.
Unfortunately there doesn't seem to be a BLAS routine where you can declare beforehand that, given two general matrices, the output matrix will be symmetric. I'm not sure if it would be beneficial to write your own function for this. This probably depends on the size of your matrices and the implementation.
EDIT:
This idea seems to be addressed recently here: A Matrix Multiplication Routine that Updates Only the Upper or Lower Triangular Part of the Result Matrix
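The upper-triangle idea from the answer can be illustrated in NumPy. This is a sketch of which entries get computed, not a fast implementation: the Python loop is slower than a plain A.T @ B @ A. The function name atba_upper and the test matrices are made up for the example.

```python
import numpy as np

def atba_upper(A, B):
    """Compute A.T @ B @ A by filling only the upper triangle, then mirroring."""
    C = B @ A                        # stands in for the dsymm call
    n = A.shape[1]
    R = np.zeros((n, n))
    for j in range(n):
        # Column j of the upper triangle: rows 0..j of A.T @ C[:, j].
        R[: j + 1, j] = A[:, : j + 1].T @ C[:, j]
    # Copy the strictly upper part into the lower part.
    return R + np.triu(R, 1).T

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
G = rng.standard_normal((5, 5))
B = G @ G.T                          # symmetric positive semi-definite test matrix
R = atba_upper(A, B)
```

In a compiled implementation, skipping the lower triangle roughly halves the work of the second multiplication. Whether that pays off in practice depends on the matrix sizes and the BLAS in use, as the answer notes.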