Miscalculation of the new function "pcsegdist" in MATLAB R2018b

I am testing the new function "pcsegdist" in MATLAB R2018b, which segments a point cloud into clusters based on Euclidean distance. However, the result appears to be wrong.
Example: I test with 1797 3D data points (please see the attached test.txt file). Note that the smallest distance between two neighboring points is 0.3736.
clear; clc;
tic
data = load('test.txt');        % each row holds the coordinates (x, y, z) in columns 1-3
P = data(:, 1:3);
ptCloud = pointCloud(P);
minDistance = 0.71;             % this value should be less than the smallest 3D distance between 2 clusters
[labels, numClusters] = pcsegdist(ptCloud, minDistance);  % numClusters: the number of clusters
% labels: Mx1 vector giving the cluster index of each point
toc
%% Generate the cell_cluster
cell_cluster = cell(1, numClusters);    % (1 x numClusters) cell array
x = P(:,1); y = P(:,2); z = P(:,3);
for i = 1:numClusters
    % x, y, z coordinates of all points belonging to cluster i
    cell_cluster{i} = [x(labels==i), y(labels==i), z(labels==i)];
end
figure; Plot_cell(cell_cluster); view(3);   % plot the resulting clusters (user-defined plotting function)
But when I verify manually against ground-truth data, the result should be as in the figure below:
Thus, I wonder whether the result of the new function "pcsegdist" in MATLAB R2018b is wrong, or whether I have misunderstood something or made a mistake somewhere.
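For reference, segmentation by Euclidean distance is essentially equivalent to single-linkage clustering cut at minDistance (points closer than the threshold end up in the same connected component), so the expected clustering can be cross-checked independently. A minimal sketch in Python/SciPy, assuming the same test.txt:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Cross-check: connected components under "distance < 0.71" are exactly
# what single-linkage clustering cut at that threshold produces.
P = np.loadtxt('test.txt')[:, :3]                   # x, y, z coordinates
Z = linkage(P, method='single')                     # single linkage on Euclidean distances
labels = fcluster(Z, t=0.71, criterion='distance')
print(len(np.unique(labels)), 'clusters')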


How to fill a line in 2D image along a given radius with the data in a given line image?

I want to fill a 2D image along its polar radius; the data are stored in an image where each row or column corresponds to a radius in the target image. How can I fill the target image efficiently, for example with iradius or some other function? I would prefer to avoid a pixel-by-pixel operation.
Are you looking for something like this?
number maxR = 100
image rValues := RealImage("I(r)", 4, maxR)           // 1D image holding the radial profile I(r)
rValues = 10 + trunc(100 * random())                  // fill it with random test values
image plot := RealImage("Ring", 4, 2*maxR, 2*maxR)    // 2D target image
rValues.ShowImage()
plot.ShowImage()
plot = rValues.warp(iradius, 0)                       // look up each pixel's value at its radius
You might also want to check out the relevant example code in the F1 help documentation of GMS itself.
Explaining warp a bit:
plot = rValues.warp(iradius,0)
Assigns values to plot based on a value-lookup in rValues.
For each pixel in plot a coordinate position in rValues is computed, and the value is simply looked up. If the computed coordinate is non-integer, bilinear interpolation between the 4 closest points is used.
In the example, the two 'formulas' for the coordinate calculation are simply x' = iradius and y' = 0, where iradius is an expression computed from the coordinates in plot, for convenience.
You can feed any expression into the parameters of warp( ), and the command is closely related to just using the square-bracket notation for addressing values. In fact, the only difference is that warp performs bilinear interpolation of the values instead of truncating the coordinates to integer values.
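For readers more at home in Python, the same radial fill can be sketched with NumPy; this is only an illustrative translation of the script above, not GMS code:

import numpy as np

maxR = 100
r_values = 10 + np.trunc(100 * np.random.rand(maxR))   # random radial profile I(r)

# For each pixel of the 2*maxR x 2*maxR target, compute its distance from the
# centre (the role iradius plays above) and look the value up in the 1D profile.
# np.interp does linear interpolation and clamps radii beyond the profile end.
yy, xx = np.mgrid[0:2*maxR, 0:2*maxR]
radius = np.hypot(xx - maxR, yy - maxR)
ring = np.interp(radius, np.arange(maxR), r_values)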

Can't extract clusters from fcluster after using scipy's hierarchical clustering

After doing hierarchical clustering on my dataset and plotting it with the dendrogram function, it seems that the data were clustered correctly, but when I call the function fcluster to extract the cluster ids, I always get just one cluster id.
Why is this happening?
My code:
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

for key, values in use_case_idx.items():
    vectors = []
    labels = []
    for value in values:
        labels.append(value[0])
        vectors.append(value[1])
    try:
        distance_matrix = pdist(vectors, metric='cosine')
        Z = linkage(distance_matrix, 'ward')
        plt.title("Ward")
        dendrogram(Z, labels=labels)
    except Exception:
        continue
    plt.show()

    clusters = fcluster(Z, 10, criterion='distance')
    print(clusters)
And thus, the output is a single cluster id for every element.
More examples at: https://imgur.com/a/kEfub
What's wrong with this code?
Note: Each vector has 50 dimensions
The y-axis of the dendrogram shows the cophenetic distance between different nodes. Because you are using the distance criterion with a large value (much larger than any cophenetic distance in the tree), all elements are grouped into the same cluster.
Try using a smaller threshold (e.g. 0.025 for the first dendrogram you show). The dendrogram can act as a guide for choosing "good" thresholds, although "good" is very subjective.
If you want to cluster your data into n distinct clusters, you can instead use the criterion 'maxclust', for example fcluster(Z, n, criterion='maxclust').
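A minimal self-contained sketch of the difference between the two criteria (the data and threshold values here are synthetic, just to illustrate the calls):

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated groups of 50-dimensional vectors
rng = np.random.default_rng(0)
base1 = np.ones(50)
base2 = np.concatenate([np.ones(25), -np.ones(25)])
vectors = np.vstack([base1 + rng.normal(0, 0.05, (10, 50)),
                     base2 + rng.normal(0, 0.05, (10, 50))])

Z = linkage(pdist(vectors, metric='cosine'), 'ward')

print(fcluster(Z, 10, criterion='distance'))    # threshold too large: all one cluster
print(fcluster(Z, 0.5, criterion='distance'))   # threshold below the top merge: two clusters
print(fcluster(Z, 2, criterion='maxclust'))     # ask for exactly two clusters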

comparing two frequency spectra

I'm trying to compare two frequency spectra, but I am confused about a number of points.
One device samples at 40 Hz and the other at 100 Hz, so I'm not sure whether I need to take this into account. In any case, I have produced frequency spectra from both devices and now I wish to compare them. How can I compute a Pearson correlation at each point, so that I can see where the correlation is strong and where it is weak? I know how to compute an overall correlation, of course, but I want to see point-by-point agreement.
If you are calculating power spectral densities P(f), then it doesn't matter how your original signal x(t) is sampled. You can directly and quantitatively compare both spectra. To make sure that you have calculated the spectral densities correctly, you can explicitly check Parseval's theorem:
$ \int P(f)\, df = \int x(t)^2\, dt $
Of course you have to think about which frequencies are actually evaluated. Remember that an FFT gives you frequencies from f = 1/T up to or just below the Nyquist frequency f_ny = 1/(2 dt), depending on whether the number of samples in x(t) is even or odd.
Here's a Python example code for the PSD:
import numpy as np

def psd(x, dt=1.):
    """Computes the one-sided power spectral density of x.

    The PSD is estimated via abs**2 of the Fourier transform of x.
    Takes care of an even or odd number of elements in x:
    - if len(x) is even, both f=0 and the Nyquist freq. appear once
    - if len(x) is odd, f=0 appears once and the Nyquist freq. does not appear
    Note that no tapers are applied: this may lead to leakage!
    Parseval's theorem (the variance of the time series equals the integral
    over the PSD) holds and can be checked via
        print(np.var(x), np.sum(Px) * f[1])
    Accordingly, the estimated PSD is independent of the time series length.
    Author/date: M. von Papen / 16.03.2017
    """
    N = np.size(x)
    xf = np.fft.fft(x)
    Px = abs(xf)**2. / N * dt            # two-sided spectral density
    f = np.arange(N//2 + 1) / (N*dt)     # one-sided frequency axis
    if np.mod(N, 2) == 0:
        Px[1:N//2] = 2.*Px[1:N//2]       # double all bins except f=0 and Nyquist
    else:
        Px[1:N//2+1] = 2.*Px[1:N//2+1]   # double all bins except f=0
    # Take the one-sided spectrum
    Px = Px[0:N//2+1]
    return Px, f
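For example, a quick check of the stated Parseval property with the function above (synthetic data, 40 Hz sampling as in the question):

import numpy as np

np.random.seed(0)
dt = 1.0 / 40.0                       # the 40 Hz device
x = np.random.randn(1000)
Px, f = psd(x, dt=dt)
# variance of x vs. integral over the PSD (f[1] is the frequency spacing)
print(np.var(x), np.sum(Px) * f[1])   # the two numbers should be close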

Units of frequency when using FFT in NumPy

I am using the FFT function in NumPy to do some signal processing. I have an array called signal which has one data point for each hour, 576 data points in total. I use the following code on signal to look at its Fourier transform.
t = len(signal)
ft = fft(signal, n=t)
mgft = abs(ft)
plot(mgft[0:t//2 + 1])
I see two peaks but I am unsure as to what the units of the x axis are i.e., how they map onto hours? Any help would be appreciated.
Given sampling rate FSample and transform blocksize N, you can calculate the frequency resolution deltaF, sampling interval deltaT, and total capture time capT using the relationships:
deltaT = 1/FSample = capT/N
deltaF = 1/capT = FSample/N
Keep in mind also that the FFT returns values from 0 to FSample, or equivalently -FSample/2 to FSample/2. In your plot, you're already dropping the -FSample/2 to 0 part. NumPy includes a helper function to calculate all this for you: fftfreq.
For your values of deltaT = 1 hour and N = 576, you get deltaF = 0.001736 cycles/hour = 0.04167 cycles/day, from -0.5 cycles/hour to 0.5 cycles/hour. So if you have a magnitude peak at, say, bin 48 (and bin 528), that corresponds to a frequency component at 48*deltaF = 0.0833 cycles/hour = 2 cycles/day.
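A tiny numeric check of these relationships for the dataset in the question (N = 576 hourly samples):

FSample = 1.0                    # 1 sample per hour
N = 576
capT = N / FSample               # 576 hours total capture time
deltaF = 1.0 / capT              # frequency resolution
print(deltaF, deltaF * 24)       # -> 0.001736... cycles/hour, 0.04166... cycles/day
print(48 * deltaF * 24)          # bin 48 -> 2.0 cycles/day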
In general, you should apply a window function to your time domain data before calculating the FFT, to reduce spectral leakage. The Hann window is almost never a bad choice. You can also use the rfft function to skip the negative-frequency part of the output entirely, with rfftfreq giving the matching frequency axis. So then, your code would be:
ft = np.fft.rfft(signal * np.hanning(len(signal)))
mgft = abs(ft)
xVals = np.fft.rfftfreq(len(signal), d=1.0)  # in cycles/hour; use d=1.0/24 for cycles/day
plot(xVals, mgft)
The result of the FFT does not map to hours but to the frequencies contained in your dataset. It would be helpful to see your transformed graph, so we could tell where the spikes are.
You might have a spike at the beginning of the transformed buffer, since you didn't do any windowing.
In general, the dimensional units of frequency from an FFT are the same as the dimensional units of the sample rate attributed to the data fed to the FFT, for example: per meter, per radian, per second, or in your case, per hour.
The frequency spacing per FFT result bin index is theSampleRate / N, with the same dimensional units as above, where N is the length of the full FFT (you might only be plotting half of this length in the case of strictly real data).
Note that each FFT result peak bin represents a filter with a non-zero bandwidth, so you might want to add some uncertainty or error bounds to the result points you map onto frequency values. Or even use an interpolation estimation method, if needed and appropriate for the source data.
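A common interpolation estimate of this kind is quadratic (parabolic) interpolation around the peak bin; a minimal sketch (the function name and inputs here are illustrative, not from any library):

import numpy as np

def refined_peak_freq(mgft, freqs):
    # Fit a parabola through the peak magnitude bin and its two
    # neighbours; its vertex gives a sub-bin frequency estimate.
    k = int(np.argmax(mgft[1:-1])) + 1       # peak bin, kept interior
    a, b, c = mgft[k-1], mgft[k], mgft[k+1]
    delta = 0.5 * (a - c) / (a - 2*b + c)    # fractional bin offset in [-0.5, 0.5]
    return freqs[k] + delta * (freqs[1] - freqs[0])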

Faster way to perform point-wise interpolation of numpy array?

I have a 3D datacube, with two spatial dimensions and the third being a multi-band spectrum at each point of the 2D image.
H[x, y, bands]
Given a wavelength (or band number), I would like to extract the 2D image corresponding to that wavelength. This would be simply an array slice like H[:,:,bnd]. Similarly, given a spatial location (i,j) the spectrum at that location is H[i,j].
I would also like to 'smooth' the image spectrally, to counter low-light noise in the spectra. That is, for band bnd, I choose a window of size wind and fit an n-degree polynomial to the spectrum in that window. With polyfit and polyval I can find the fitted spectral value at that point for band bnd.
Now, if I want the whole image of bnd from the fitted value, then I have to perform this windowed-fitting at each (i,j) of the image. I also want the 2nd-derivative image of bnd, that is, the value of the 2nd-derivative of the fitted spectrum at each point.
Running over the points, I could polyfit-polyval-polyder each of the x*y spectra. While this works, it is a point-wise operation. Is there some Pythonic/NumPy way to do this faster?
If you do least-squares polynomial fitting to points (x+dx[i],y[i]) for a fixed set of dx and then evaluate the resulting polynomial at x, the result is a (fixed) linear combination of the y[i]. The same is true for the derivatives of the polynomial. So you just need a linear combination of the slices. Look up "Savitzky-Golay filters".
EDITED to add a brief example of how S-G filters work. I haven't checked any of the details and you should therefore not rely on it to be correct.
So, suppose you take a filter of width 5 and degree 2. That is, for each band (ignoring, for the moment, ones at the start and end) we'll take that one and the two on either side, fit a quadratic curve, and look at its value in the middle.
So, if f(x) ~= ax^2+bx+c and f(-2),f(-1),f(0),f(1),f(2) = p,q,r,s,t then we want 4a-2b+c ~= p, a-b+c ~= q, etc. Least-squares fitting means minimizing (4a-2b+c-p)^2 + (a-b+c-q)^2 + (c-r)^2 + (a+b+c-s)^2 + (4a+2b+c-t)^2, which means (taking partial derivatives w.r.t. a,b,c):
4(4a-2b+c-p)+(a-b+c-q)+(a+b+c-s)+4(4a+2b+c-t)=0
-2(4a-2b+c-p)-(a-b+c-q)+(a+b+c-s)+2(4a+2b+c-t)=0
(4a-2b+c-p)+(a-b+c-q)+(c-r)+(a+b+c-s)+(4a+2b+c-t)=0
or, simplifying,
34a+10c = 4p+q+s+4t
10b = -2p-q+s+2t
10a+5c = p+q+r+s+t
so a, b, c = (2p-q-2r-s+2t)/14, (2(t-p)+(s-q))/10, (-3p+12q+17r+12s-3t)/35.
And of course c is the value of the fitted polynomial at 0, and therefore is the smoothed value we want. So for each spatial position, we have a vector of input spectral data, from which we compute the smoothed spectral data by multiplying by a matrix whose rows (apart from the first and last couple) look like [0 ... 0 -3/35 12/35 17/35 12/35 -3/35 0 ... 0], with the central 17/35 on the main diagonal of the matrix.
So you could do a matrix multiplication for each spatial position; but since it's the same matrix everywhere you can do it with a single call to tensordot. So if S contains the matrix I just described (er, wait, no, the transpose of the matrix I just described) and A is your 3-dimensional data cube, your spectrally-smoothed data cube would be numpy.tensordot(A, S, axes=1), contracting A's band axis with the first axis of S.
This would be a good point at which to repeat my warning: I haven't checked any of the details in the few paragraphs above, which are just meant to give an indication of how it all works and why you can do the whole thing in a single linear-algebra operation.
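With that caveat in mind, here is a short sketch of the window-5, degree-2 version of the idea, cross-checked against scipy.signal.savgol_filter, which implements exactly this kind of filter (including derivatives) and handles the window edges for you; the array shapes are illustrative:

import numpy as np
from scipy.signal import savgol_filter

A = np.random.rand(32, 32, 100)     # stand-in datacube A[x, y, bands]
nb = A.shape[2]

# Row k of M holds the smoothing filter centred on band k
# (edge rows are left as identity here for simplicity).
w = np.array([-3., 12., 17., 12., -3.]) / 35.
M = np.eye(nb)
for k in range(2, nb - 2):
    M[k, k-2:k+3] = w

smoothed = np.tensordot(A, M.T, axes=1)   # one linear-algebra op for the whole cube

# The same smoothing (with proper edge handling) and the 2nd-derivative cube:
smoothed_sp = savgol_filter(A, window_length=5, polyorder=2, axis=2)
d2 = savgol_filter(A, window_length=5, polyorder=2, deriv=2, axis=2)
print(np.allclose(smoothed[..., 2:-2], smoothed_sp[..., 2:-2]))   # interior bands agree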