I have a two-dimensional NumPy array that represents a multi-channel bio-signal. The array has dimensions 20 x n_samples, where the 20 rows hold the sample number, the 16 data channels, and the time.
Due to the Bluetooth connection I get some packet drops, so there are gaps in the signal. The array has to be imported into MNE-Python for further analysis. That library assumes the sampling rate is constant (it is not able to handle gaps, since it assumes we MUST have a sample every 4 ms), so I have tried three different approaches (sketched in code after the list):
Don't fill the gaps and let the signal be spliced together (MNE-Python creates a structure with equally spaced data)
Fill the gaps with np.nan
Fill the gaps with 0s
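Roughly, the gap filling I do for the last two strategies looks like this (fill_value = np.nan or 0 selects the filler; leaving the array untouched corresponds to the spliced case):

import numpy as np

# data: 20 x n_samples array; row 0 = sample number, rows 1-16 = channels, last row = time
def fill_gaps(data, fill_value=np.nan):
    # Re-insert the dropped samples on a regular grid, putting fill_value in the gaps
    idx = (data[0] - data[0, 0]).astype(int)          # sample numbers -> grid positions
    filled = np.full((data.shape[0], idx[-1] + 1), fill_value, dtype=float)
    filled[:, idx] = data
    return filled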
My question is about the filtering I need to apply to the data. I have used scipy.signal.welch to get the PSD of the signal. The signal with NaN as filler seems to perform better than the spliced one and the one filled with 0s, but the behaviour becomes strange once I compute the PSD of a low-pass and a high-pass filtered version of the signal.
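The PSD comparison was done along these lines (the sampling rate follows from one sample every 4 ms; the Welch parameters here are just placeholders):

from scipy.signal import welch

fs = 250.0                       # one sample every 4 ms
channels = filled[1:17]          # the 16 signal rows from the sketch above

# Welch PSD per channel; NaN-filled gaps propagate NaNs through the FFT,
# while zero-filled gaps and spliced data each bias the spectrum in their own way
freqs, psd = welch(channels, fs=fs, nperseg=1024, axis=-1)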
Does anyone know what is the best approach?
Here are three images for the different filling strategies (the top plots are the PSDs obtained with the MNE library, the bottom ones with scipy.signal.welch). The filter used is a FIR filter.
Filled with NaN
Filled with 0s
Spliced
I have a large quantity of missing values that appear at random in my data. Unfortunately, I cannot simply drop observations with missing data as I am grouping observations by a feature and cannot drop NaNs without affecting the entire group.
I was hoping to simply mask features that were missing. So a single group might have 8 items in it, and each item may have 0 to N features, depending on how many got masked due to being missing.
I have been experimenting a lot with RaggedTensors, but have run into a number of issues: I cannot flatten the RaggedTensor, I cannot concatenate it with regular tensors of uniform shape, and Dense layers require the last dimension of their input (i.e. the number of features) to be known.
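For reference, the masking idea I have been trying looks roughly like this (the NaN-as-missing convention and the toy values are just for illustration):

import numpy as np
import tensorflow as tf

# hypothetical batch: 2 items x 4 features, NaN marks a missing feature
x = tf.constant([[1.0, np.nan, 3.0, 4.0],
                 [np.nan, 2.0, np.nan, 4.0]])

valid = tf.logical_not(tf.math.is_nan(x))     # True where a feature is present
ragged = tf.ragged.boolean_mask(x, valid)     # per-item, variable-length feature lists
# <tf.RaggedTensor [[1.0, 3.0, 4.0], [2.0, 4.0]]>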
Does anybody know if there is a way to do this?
Leaving aside that they come from two different libraries:
I know that a Series/DataFrame can hold any data type, and an ndarray can also hold heterogeneous data (with an object dtype).
Also, all of NumPy's slicing operations are applicable to a Series.
Is there any other difference between them?
After some research I found the answer to the question I asked above. For anyone who needs it, here it is from the pandas docs:
A key difference between Series and ndarray is that operations between Series automatically align the data based on the label. Thus, you can write computations without giving consideration to whether the Series involved have the same labels.
An example:
s[1:] + s[:-1]
The result of the above produces NaN for both the first and the last index.
If a label is not found in one Series or the other, the result will be marked as missing (NaN).
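A minimal runnable version of that example (the index labels are just for illustration):

import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# "a" only exists in s[:-1] and "d" only in s[1:], so both become NaN after alignment
print(s[1:] + s[:-1])
# a    NaN
# b    4.0
# c    6.0
# d    NaN
# dtype: float64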
I have been looking for ways to transfer matrix data from one block to another, and I was wondering if that is possible. What I've thought of so far is converting the NumPy matrix to a list and sending the list through after padding it with the number of rows and columns at the end. After receiving, I would just reshape the list into a NumPy matrix and process it as required. But from what I understand, the length of a list must be known when making the blocks.
I'd like to know if it's possible to implement this, or if I'll have to look at it in some other way.
GNU Radio doesn't care what your items actually represent, only their size in bytes.
Therefore, you can define arbitrary item sizes and put multiple numbers in one item. In fact, that is exactly what the stream_to_vector and vector_to_stream blocks do.
You'd use output_signature = gr.io_signature(1, 1, N_elements * gr.sizeof_gr_complex), with N_elements being your number of matrix entries.
As a side note: exchanging matrices does reek of things like channel estimates or equalization; these are often handled more elegantly by asynchronous message passing than by item streams.
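A minimal sketch of an embedded Python block along these lines (block name, dtype and matrix size are all assumptions):

import numpy as np
from gnuradio import gr

ROWS, COLS = 4, 4
N_ELEMENTS = ROWS * COLS                         # one stream item = one flattened matrix

class matrix_passthrough(gr.sync_block):
    def __init__(self):
        gr.sync_block.__init__(
            self,
            name="matrix_passthrough",
            in_sig=[(np.complex64, N_ELEMENTS)],     # vector items of N_ELEMENTS samples
            out_sig=[(np.complex64, N_ELEMENTS)],
        )

    def work(self, input_items, output_items):
        for i, item in enumerate(input_items[0]):
            matrix = item.reshape(ROWS, COLS)        # recover the 2-D shape
            output_items[0][i] = matrix.flatten()    # process, then flatten again
        return len(output_items[0])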
I have a dataframe data containing real values and some NaN values. I'm trying to perform locality-sensitive hashing using random projections to reduce the dimension to 25 components, specifically with the sklearn.random_projection.GaussianRandomProjection class. However, when I run:
from sklearn import random_projection

tx = random_projection.GaussianRandomProjection(n_components=25)
data25 = tx.fit_transform(data)
I get the error Input contains NaN, infinity or a value too large for dtype('float64'). Is there a work-around for this? I tried changing all the NaN values to a value that is never present in my dataset, such as -1. How valid would my output be in that case? I'm not an expert on the theory behind locality-sensitive hashing/random projections, so any insight would be helpful as well. Thanks.
NA / NaN values (not-available / not-a-number) are, I have found, just plain troublesome.
You don't want to just substitute an arbitrary value like -1. If you are inclined to do that, use one of the Imputer classes instead. Otherwise, you are likely to change the distances between points quite substantially, and you probably want to preserve distances as much as possible if you are using random projection:
The dimensions and distribution of random projections matrices are controlled so as to preserve the pairwise distances between any two samples of the dataset.
However, this may or may not result in reasonable values for learning. As far as I know, imputation is an open field of study, which (for instance) this gentleman has specialized in studying.
If you have enough examples, consider dropping rows or columns that contain NaN values. Another possibility is training a generative model such as a Restricted Boltzmann Machine and using it to fill in the missing values:
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.preprocessing import Imputer  # SimpleImputer in newer scikit-learn

# Fit on the complete rows, then re-sample a mean-imputed copy and keep only the NaN slots
rbm = BernoulliRBM().fit(data_with_no_nans)
mean_imputed_data = Imputer().fit_transform(all_data)
rbm_imputation = rbm.gibbs(mean_imputed_data)
nan_mask = np.isnan(all_data)
all_data[nan_mask] = rbm_imputation[nan_mask]
Finally, you might consider imputing using nearest neighbors. For a given column, train a nearest neighbors model on all the variables except that column using all complete rows. Then, for a row missing that column, find the k nearest neighbors and use the average value among them. (This gets very costly, especially if you have rows with more than one missing value, as you will have to train a model for every combination of missing columns).
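Newer scikit-learn versions ship a ready-made version of this idea; a minimal sketch, assuming sklearn.impute.KNNImputer is available:

import numpy as np
from sklearn.impute import KNNImputer

# toy data with missing entries
X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

# each NaN is filled from the k nearest rows, measured on the non-missing features
X_imputed = KNNImputer(n_neighbors=2).fit_transform(X)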
I have some Kinect data of somebody standing (reasonably) still and performing sets of punches. It is given to me as an x, y, z coordinate for each of the 20 joints, so I have 60 data points per frame.
I'm trying to perform a classification task on the punches, but I'm having some problems normalising my data. As you can see from the graph, there are sections with much higher 'amplitude' than the others; my belief is that this is due to how close the person was to the Kinect sensor when the readings were taken. (The graph is actually the first principal component score obtained by PCA for each frame; multiple sequences of the same punch are strung together in this graph.)
Looking back at the data files, it looks like the sequences that are 'out' have a z coordinate (depth from the sensor) of ~2.7, whereas the others tend to hover around 3.3-3.6.
How can I perform a normalisation with the depth values to make the sequences closer to each other? I've already tried differentiating to get the velocity; although it helps to normalise, the output ends up too similar and makes classification very hard.
Edit: I should mention that I am already normalising by subtracting the hip position from each joint, in an attempt to make the coordinates relative.
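For reference, the hip-relative normalisation I am doing now, plus one way the depth could be folded in (joint index, array layout and the depth scaling are only assumptions/ideas):

import numpy as np

HIP_IDX = 0   # index of the hip-centre joint (depends on the skeleton layout)

def hip_relative(frames):
    # frames: (n_frames, 20, 3) x/y/z joint positions; subtract the hip from every joint
    return frames - frames[:, HIP_IDX:HIP_IDX + 1, :]

def depth_scaled(frames):
    # additionally divide by the hip depth so sequences recorded at ~2.7 m and ~3.4 m
    # end up on a comparable scale (a rough per-frame scaling, not a proper calibration)
    hip_z = frames[:, HIP_IDX, 2][:, None, None]
    return hip_relative(frames) / hip_z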
The Kinect can output some strange values when the tracked person is standing near the edges of its field of view. I would either completely ignore that data or replace it with the average of the previous two and next two values.
For example:
1,2,1,12,1,2,3
Replace 12 with (2 + 1 + 1 + 2) / 4 = 1.5
You can basically do this with the whole array of values you have, this way you have a more normalised line/graph.
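A minimal sketch of that replacement rule (the spike threshold is an arbitrary choice):

import numpy as np

def smooth_outliers(values, threshold=5.0):
    # replace a sample with the mean of its two previous and two next neighbours
    # whenever it deviates from that mean by more than `threshold`
    values = np.asarray(values, dtype=float)
    out = values.copy()
    for i in range(2, len(values) - 2):
        neighbours = np.r_[values[i - 2:i], values[i + 1:i + 3]]
        if abs(values[i] - neighbours.mean()) > threshold:
            out[i] = neighbours.mean()
    return out

# smooth_outliers([1, 2, 1, 12, 1, 2, 3]) -> [1, 2, 1, 1.5, 1, 2, 3]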
You can also use the clippedEdges value to determine whether one or more joints are outside the view.