Multiscale morphological dilation and erosion

Can anyone specify what is meant by multiscale morphological filtering? I understand the basic concepts of dilation and erosion, but in multiscale filtering a scaled structuring function is used. What does the term scaled mean?
Please find more relevant information at the link. I want to apply this structuring element in Matlab code but cannot work out how to do so. Can anyone help me?

Here the multiscale operator is described as:
F(x, s1, s2) = (f ⊖ s1) ⊕ s2
where f(x) is the original function, s1(x) and s2(x) are the structuring functions, ⊖ denotes erosion, and ⊕ denotes dilation. Apparently, erosion and dilation with different scales can filter positive and negative noise more effectively. This operation satisfies the four quantification principles of morphological filters. (from the paper)
This operator is known in the morphology community as an Alternating Sequential Filter, which performs filtering using an alternating series of dilations and erosions (or openings and closings) of increasing radii on the same image. The series of radii for the given structuring function can be chosen based on the structure of the object/detail to be extracted or filtered. Note that two different structuring elements, s1 and s2, are used to set different scales for the erosions and dilations. This Matlab thread discusses how to test it.
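For example, here is a minimal Python sketch of an alternating sequential filter using scipy.ndimage. Note this is a generic opening/closing cascade with disk-shaped elements of increasing radius, not the paper's exact (f ⊖ s1) ⊕ s2 operator, and the radii are arbitrary; in Matlab the counterparts would be strel, imopen and imclose.

import numpy as np
from scipy import ndimage

def alternating_sequential_filter(image, radii=(1, 2, 3)):
    # Apply grey-scale opening followed by closing with structuring
    # elements (disks) of increasing radius.
    filtered = image.astype(float)
    for r in radii:
        # build a disk-shaped structuring element of radius r
        y, x = np.ogrid[-r:r + 1, -r:r + 1]
        selem = (x**2 + y**2) <= r**2
        filtered = ndimage.grey_opening(filtered, footprint=selem)
        filtered = ndimage.grey_closing(filtered, footprint=selem)
    return filtered

noisy = np.random.rand(64, 64)
smoothed = alternating_sequential_filter(noisy, radii=(1, 2))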


Justification for the multiplication step in the proof of the front door adjustment

The proofs of the front-door adjustment that I've read take three steps:
Show P(M|do(X)) is identifiable
Show P(Y|do(M)) is identifiable
Multiply the do-free expressions for P(M|do(X)) and P(Y|do(M)) to obtain P(Y|do(X))
where Y, X, M meet the assumptions for the front-door adjustment. A graph meeting these assumptions is:
X->M;M->Y;U->X;U->Y
I'm sure I'm being daft here, but I don't understand what justifies simply multiplying the expressions together to get P(Y|do(X)).
This is like saying:
P(Y|do(X)) = P(Y|do(M)) * P(M|do(X))
(where perhaps the assumptions for the front-door adjustment are necessary) but I don't recognize this rule in my study of causal inference.
Formal Description: In a graph with a mediator and an unobserved confounder like the one you described, the effect of X on Y along the path X->M->Y is confounded. Yet, the edge X->M is unconfounded (the backdoor path through U is blocked because Y is a collider on it), and we can estimate the effect of M on Y by controlling for X. Doing so gives us the two causal quantities that make up the path X->M->Y. For a graph with any kind of causal functions, propagating an effect along several edges means composing the functions that describe each edge. In the case of linear functions, this simply amounts to their multiplication (the assumption of linearity is very common in the context of causal inference).
Just like you said, we can write this as
P(M|do(X)) = P(M|X)                           # no backdoor path from X to M, so no adjustment is needed
P(Y|do(M)) = sum_x P(Y|M, X=x) P(X=x)         # backdoor adjustment, controlling for X
P(Y|do(X)) = sum_m P(M=m|do(X)) P(Y|do(M=m))  # composing the two pieces (sum over the mediator M)
Intuition: The whole "trick" of the front door procedure lies in realizing that we can break up the estimation of a causal path along more than one edge into the estimation of its components, which can be solved without adjustment (X->M in our case), or using the backdoor criterion (M->Y in our case). Then, we can simply compose the individual parts to get the overall effect.
A nice explanation is also given in this video.
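To make the composition concrete, here is a small numerical sketch (the joint probability table is made up, and the variables are assumed binary): the front-door estimate simply chains the two adjusted pieces and sums over M.

import numpy as np

# Made-up joint distribution P(X, M, Y); p[x, m, y] holds the probabilities.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()  # normalize to a proper joint distribution

p_x = p.sum(axis=(1, 2))                          # P(X)
p_m_given_x = p.sum(axis=2) / p_x[:, None]        # P(M|X)
p_y_given_xm = p / p.sum(axis=2, keepdims=True)   # P(Y|X, M)

def p_y_do_x(x, y):
    # Front-door formula: sum_m P(m|x) * sum_x' P(y|x', m) P(x')
    total = 0.0
    for m in range(2):
        # backdoor-adjusted P(y|do(m)): control for X on the M -> Y edge
        p_y_do_m = sum(p_y_given_xm[xp, m, y] * p_x[xp] for xp in range(2))
        total += p_m_given_x[x, m] * p_y_do_m
    return total

print(p_y_do_x(x=1, y=1))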

How is hashing implemented in SGNN (Self-Governing Neural Networks)?

So I've read the paper named Self-Governing Neural Networks for On-Device Short Text Classification which presents an embedding-free approach to projecting words into a neural representation. To quote them:
The key advantage of SGNNs over existing work is that they surmount the need for pre-trained word embeddings and complex networks with huge parameters. [...] our method is a truly embedding-free approach unlike majority of the widely-used state-of-the-art deep learning techniques in NLP
Basically, from what I understand, they proceed as follows:
You'd first need to compute n-grams (side question: is that skip-gram in the old sense, or skip-gram as in word2vec? I assume the former for what remains) on the characters of words to obtain a featurized representation of the words in a text. As an example, with 4-grams you could get a 1M-dimensional sparse feature vector per word. Since it's sparse, memory needn't be fully allocated for it, because it's almost one-hot (or count-vectorized, or tf-idf-vectorized n-grams with lots of zeros).
Then you'd need to hash those sparse n-gram vectors using Locality-Sensitive Hashing (LSH). They seem to use Random Projection from what I've understood. Also, instead of n-gram vectors, they use tuples of (n-gram feature index, value) for the non-zero n-gram features (which is also by definition a "sparse matrix" computed on the fly, e.g. from a defaultdict of non-zero features instead of a full vector).
I found an implementation of Random Projection in scikit-learn. From my tests, it doesn't seem to yield a binary output, although the whole thing uses sparse on-the-fly computations with scikit-learn's sparse matrices, as expected for a memory-efficient (dictionary-of-non-zero-features) implementation, I guess.
What doesn't work in all of this, and where my question lies, is how they could end up with binary features from the sparse projection (the hashing). They seem to be saying that the hashing is done at the same time as computing the features, which is confusing; I would have expected the hashing to come in the order I wrote above, as in steps 1-2-3, but their steps 1 and 2 seem to be merged somehow.
My confusion arises mostly from the paragraphs starting with the phrase "On-the-fly Computation." on page 888 (page 2 of the PDF) of the paper, in the right column.
I'd like to bring my school project to a successful conclusion (I'm trying to mix BERT with SGNNs instead of using word embeddings). So, how would you demystify that? More precisely, how could a similar random hashing projection be achieved with scikit-learn, TensorFlow, or PyTorch? Trying to connect the dots here, I've researched this quite a bit, but the paper doesn't give implementation details, which is what I'd like to reproduce. I at least know that the SGNN uses 80 fourteen-dimensional LSH functions on character-level n-grams of words (is my understanding right in the first place?).
Thanks!
EDIT: after starting to code, I realized that the output of scikit-learn's SparseRandomProjection() looks like this:
[0.7278244729081154,
-0.7278244729081154,
0.0,
0.0,
0.7278244729081154,
0.0,
...
]
For now this looks fine; it's close to binary, and it could still be cast to integers instead of floats by choosing the right parameters in the first place. I still wonder about the skip-gram question; I assume character n-grams of words for now, but that's probably wrong. Will post code soon to GitHub.
EDIT #2: I coded something here, but with n-grams instead of skip-grams: https://github.com/guillaume-chevalier/SGNN-Self-Governing-Neural-Networks-Projection-Layer
More discussion threads on this here: https://github.com/guillaume-chevalier/SGNN-Self-Governing-Neural-Networks-Projection-Layer/issues?q=is%3Aissue
First of all, thanks for your implementation of the projection layer, it helped me get started with my own.
I read your discussion with #thinline72, and I agree with him that the features are calculated on the whole line of text, char by char, not word by word. I am not sure this difference in features is too relevant, though.
Answering your question: I interpret that they do steps 1 and 2 separately, as you suggested and did. Right, in the article excerpt that you include, they talk about hashing both in feature construction and in projection, but I think those are two different hashes. And I interpret that the first hashing (feature construction) is done automatically by the CountVectorizer method.
Feel free to take a look at my implementation of the paper, where I built the end-to-end network and trained on the SwDA dataset, as split in the SGNN paper. I obtain a max of 71% accuracy, which is somewhat lower than the paper claims. I also used the binary hasher that #thinline72 recommended, and nltk's implementation of skipgrams (I am quite certain the SGNN paper is talking about "old" skipgrams, not "word2vec" skipgrams).
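In case it helps, here is a rough scikit-learn sketch of the two separate steps (character n-gram feature hashing, then a sign-binarized random projection). This is my own approximation rather than the paper's exact scheme; T=80 and d=14 are taken from your question, and the n-gram range and hash-space size are assumptions.

import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.random_projection import SparseRandomProjection

T, d = 80, 14  # number of hash functions and bits per hash (from the question)

# Step 1: character n-gram features, hashed into a fixed-size sparse vector.
vectorizer = HashingVectorizer(analyzer="char", ngram_range=(1, 4),
                               n_features=2**20, norm=None)

# Step 2: one random projection of T*d components, binarized by sign.
projection = SparseRandomProjection(n_components=T * d, dense_output=True,
                                    random_state=0)

texts = ["hello how are you", "fine thanks and you"]
X = vectorizer.transform(texts)        # sparse matrix, shape (n_samples, 2**20)
Z = projection.fit_transform(X)        # dense matrix, shape (n_samples, T*d)
B = (Z > 0).astype(np.int8)            # binary features in {0, 1}
print(B.shape)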

Implementing a 2D recursive spatial filter using Scipy

Minimally, I would like to know how to achieve what is stated in the title. Specifically, signal.lfilter seems like the only implementation of a difference equation filter in scipy, but it is 1D, as shown in the docs. I would like to know how to implement a 2D version as described by this difference equation. If that's as simple as "bro, use this function," please let me know, pardon my naiveté, and feel free to disregard the rest of the post.
I am new to DSP, and I acknowledge there might be a different approach to answering my question, so I will explain the broader goal and give context in the hope that someone knows how to do what I want with Scipy, or perhaps a better way than what I explicitly asked for.
To get straight into it: broadly speaking, I am using vectorized computation methods (Numpy/Scipy) to implement a Monte Carlo simulation, to improve upon a naive for loop. I have successfully abstracted most of my operations to array computation / linear algebra, but a few specific ones (recursive computations) have eluded my intuition, and I continually end up in the digital signal processing world when I go looking for how this type of thing has been done by others (that, or machine learning, but those "frameworks" are much more opinionated). The reason most of my Google searches end up on scipy.signal or scipy.ndimage library references is clear to me at this point, and having accepted the "signal" representation of my data, I have spent a considerable amount of time (about as much as is reasonable for a field that is not my own) climbing the learning curve to try and figure out what I need from these libraries.
My simulation entails updating a vector of data representing the state of a system each period for n periods, and then repeating that whole process a "Monte Carlo" number of times. The updates in each of the n periods are inherently recursive, as the next depends on the state of the prior. It can be characterized as a difference equation, as linked above. Additionally, this vector is theoretically indexed on a grid of points with uneven step size. Here is an example vector y and its theoretical grid t:
y = np.r_[0.0024, 0.004, 0.0058, 0.0083, 0.0099, 0.0133, 0.0164]
t = np.r_[0.25, 0.5, 1, 2, 5, 10, 20]
I need to iteratively perform numerous operations to y for each of n "updates." Specifically, I am computing the curvature along the curve y(t) using finite difference approximations and using the result at each point to adjust the corresponding y(t) prior to the next update. In a loop this amounts to inplace variable reassignment with the desired update in each iteration.
y += some_function(y)
Not only does this seem inefficient, but vectorizing things seems intuitive given that y is a vector to begin with. Furthermore, I am interested in preserving each "updated" y(t) along the n updates, which would require a data structure of dimensions len(y) x n. At this point, why not perform the updates in place in the array? This is where the question lies. Many of the update operations I have successfully vectorized the "Numpy way" (such as adding random variates to each point), but some appear overly complex in the array world.
Specifically, as mentioned above, the one involving computing curvature at each element using its two neighbouring elements, and then immediately using that result to update the next row of the array before performing its own curvature "update". I was able to implement a non-recursive version (each row fails to consider its "updated self" from the prior row) of the curvature operation using ndimage's generic_filter. Given the uneven grid, I have unique coefficients (kernel weights) for each triplet in the kernel footprint (instead of always using [1, -2, 1] for y'' as I would on a uniform grid). This last part has already forced me to use a spatial filter from ndimage rather than a 1D convolution. I'll point out that something conceptually similar was discussed in this Math StackExchange post, and it seems to me only the third response saliently addressed the difference between the mathematical notion of "convolution", which is associative, and general spatial filtering kernels, which would require either two sequential filtering operations or a cleverly merged kernel.
In any case, this does not seem to actually address my concern, as it is not about 2D recursive filtering but rather about having a backwards-looking kernel footprint. Additionally, I think I've concluded it is not applicable, in that it only allows for "recursion" (backward-looking kernel footprints, in the spatial-filtering world) with a kernel whose size grows with the depth of the recursion. Meaning, if I wanted to filter each of n rows incorporating calculations on all prior rows, it would require a convolution kernel far too big (for my n, anyway). If I'm understanding all this correctly, a recursive linear filter is algorithmically more efficient in that it returns (for use in computation) the result of itself applied over the previous n samples (up to a level where the stability of the algorithm is affected) using another companion vector (z). In my case, I would only need to look back one step at the output y[n-1] to compute y[n] from the curvature at x[n], as the rest works itself out like a cumsum. signal.lfilter works for this, but I can't use it to compute curvature, as that requires a kernel footprint that can "see" at least its left and right neighbors, which is how I ended up using generic_filter.
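For concreteness, here is a rough sketch of the naive looped version I'm trying to replace (the 0.1 coefficient on curvature and the noise term are placeholders, not my actual model):

import numpy as np

def curvature_nonuniform(y, t):
    # Second derivative of y(t) on an uneven grid, three-point stencil,
    # interior points only (endpoints left at zero).
    h1 = t[1:-1] - t[:-2]   # left spacings
    h2 = t[2:] - t[1:-1]    # right spacings
    d2 = np.zeros_like(y)
    d2[1:-1] = 2 * (y[:-2] / (h1 * (h1 + h2))
                    - y[1:-1] / (h1 * h2)
                    + y[2:] / (h2 * (h1 + h2)))
    return d2

y = np.r_[0.0024, 0.004, 0.0058, 0.0083, 0.0099, 0.0133, 0.0164]
t = np.r_[0.25, 0.5, 1, 2, 5, 10, 20]

n_periods = 5
history = np.empty((n_periods + 1, y.size))
history[0] = y
rng = np.random.default_rng(0)
for i in range(1, n_periods + 1):
    # recursive part: this period's update depends on the prior period's state
    y = y + 0.1 * curvature_nonuniform(y, t) + 1e-4 * rng.standard_normal(y.size)
    history[i] = y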
It seems to me I should be able to do both, spatial and recursive filtering, simultaneously with one filter; or perhaps I've missed the maths of how this could be simplified/combined (a convolution of multiple kernels?).
It seems like this should be a common problem, but perhaps it is rarely relevant to do both at once in signal processing and image filtering. Perhaps this is why you don't use signal-processing libraries solely to implement a fast Monte Carlo simulation; though it seems less esoteric than using a tensor math library to implement a recursive neural network scan ... which I'm attempting to do right now.
EDIT: For those familiar with the theoretical side of DSP, I know that what I am describing, the process of designing a recursive filter with an arbitrary impulse response, is achieved by employing a mathematical technique called the z-transform, which I understand is generally used for two things:
converting between the recursion coefficients and the frequency response
combining cascaded and parallel stages into a single filter
Both are exactly what I am trying to accomplish.
Also, I reworded the title away from FIR / IIR because those imply specific definitions of "recursion" and may be confusing / a misnomer.

How to identify relevant features in WEKA?

I would like to perform feature analysis in WEKA. I have a data set of 8 features and 65 instances.
I would like to use the feature-selection and optimization functionality that is available for machine learning methods like SVM.
For example, I would like to know how, in WEKA, I can display which features contribute most to the classification result.
I think WEKA provides a nice graphical user interface and allows a very detailed analysis of the influence of single features, but I don't know how to use it. Any help?
You have two options:
You can perform attribute selection using filters. For instance, you can use the AttributeSelection tab (or filter) with the search method Ranker and the attribute evaluation metric InfoGainAttributeEval. This way you get a ranked list of the most predictive features according to their Information Gain score. I have done this many times with good results. Sometimes it even helps to increase the accuracy of SVMs, which are known not to need (much) feature selection. You can try other search methods in order to find subgroups of coupled predictors, and other metrics as well.
You can just look at the coefficients in the SVM output. For instance, in linear SVMs, the classifier is a linear function like a1·f1 + a2·f2 + ... + an·fn + fn+1 > 0, where the ai are the attribute values for an instance and the fi are the "weights" obtained by the SVM training algorithm. Consequently, weights with values close to 0 represent attributes that do not count for much and are thus bad predictors; extreme weights (either positive or negative) represent good predictors.
Additionally, you can check the visualization options available for a particular classifier (e.g. J48 is a decision tree, and the attribute used in the root test is typically the best predictor). You can check the AttributeSelection tab's visualization options as well.
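If you ever want to script the same two checks outside the WEKA GUI, here is a rough scikit-learn analogue (not WEKA itself, and the built-in dataset is just a stand-in for your 8-feature/65-instance set): mutual information plays the role of Ranker + InfoGainAttributeEval, and the linear-SVM weights correspond to the coefficients discussed above.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset

# 1) Information-gain-style ranking (analogue of Ranker + InfoGainAttributeEval)
scores = mutual_info_classif(X, y, random_state=0)
print("Top features by mutual information:", np.argsort(scores)[::-1][:5])

# 2) Linear SVM weights: large |weight| means an influential attribute
svm = LinearSVC(C=1.0, max_iter=10000).fit(StandardScaler().fit_transform(X), y)
weights = svm.coef_.ravel()
print("Top features by |SVM weight|:", np.argsort(np.abs(weights))[::-1][:5])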

Interpolation from irregular grid to regular grid

I have some 1D data (time series data) that is sampled irregularly, i.e., with a non-constant sample rate. I would like to transform these data into a regularly sampled (uniform sample rate) time series. I have used linear interpolation in an attempt to accomplish this; however, it is not very effective when there is a large variation in the time between samples. This is no surprise. I have also attempted some ad hoc methods that, again, are not very effective.
I have looked at several papers on the use of matching pursuit for interpolation over irregular grids, but how this approach could be used to obtain samples over a regular grid is not clear to me (at least not yet).
I would appreciate any suggestions on algorithms for interpolation from irregular grids to regular grids (1D data).
If you want to fit the data points exactly, run scipy.interpolate.UnivariateSpline with s=0 (and ask further if that's not clear).
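A minimal sketch of what that looks like (the irregular sample times and the uniform output grid are made up for illustration):

import numpy as np
from scipy.interpolate import UnivariateSpline

# irregularly sampled time series
t_irregular = np.array([0.0, 0.3, 0.35, 1.1, 1.7, 2.4, 2.5, 3.6])
y_irregular = np.sin(t_irregular)

# s=0 forces the spline to pass through every data point exactly
spline = UnivariateSpline(t_irregular, y_irregular, s=0)

# evaluate on a regular (uniformly spaced) grid
t_uniform = np.arange(0.0, 3.6, 0.1)
y_uniform = spline(t_uniform)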