Stata - Multiple rotated plots on graph (including distributions on sides of axes) - data-visualization

I would like to produce a single graph containing both: (1) a scatter plot (2) either histograms or kernel density functions of the Y and X variables to the left of the Y axis and below the X axis.
I found a graph that does this in MATLAB -- I would just like to produce something similar in Stata:
That graph was produced using the following MATLAB code:
n = 1000;
rho = .7;
Z = mvnrnd([0 0], [1 rho; rho 1], n);
U = normcdf(Z);
X = [gaminv(U(:,1),2,1) tinv(U(:,2),5)];
[n1,ctr1] = hist(X(:,1),20);
[n2,ctr2] = hist(X(:,2),20);
subplot(2,2,2); plot(X(:,1),X(:,2),'.'); axis([0 12 -8 8]); h1 = gca;
title('1000 Simulated Dependent t and Gamma Values');
xlabel('X1 ~ Gamma(2,1)'); ylabel('X2 ~ t(5)');
subplot(2,2,4); bar(ctr1,-n1,1); axis([0 12 -max(n1)*1.1 0]); axis('off'); h2 = gca;
subplot(2,2,1); barh(ctr2,-n2,1); axis([-max(n2)*1.1 0 -8 8]); axis('off'); h3 = gca;
set(h1,'Position',[0.35 0.35 0.55 0.55]);
set(h2,'Position',[.35 .1 .55 .15]);
set(h3,'Position',[.1 .35 .15 .55]);
colormap([.8 .8 1]);
UPDATE: The Stata13 manual entry for "graph combine" has precisely this example (http://www.stata.com/manuals13/g-2graphcombine.pdf). Here is the code:
use http://www.stata-press.com/data/r13/lifeexp, clear
generate loggnp = log10(gnppc)
label var loggnp "Log base 10 of GNP per capita"
scatter lexp loggnp, ysca(alt) xsca(alt) xlabel(, grid gmax) fysize(25) saving(yx)
twoway histogram lexp, fraction xsca(alt reverse) horiz fxsize(25) saving(hy)
twoway histogram loggnp, fraction ysca(alt reverse) ylabel(,nogrid) xlabel(,grid gmax) saving(hx)
graph combine hy.gph yx.gph hx.gph, hole(3) imargin(0 0 0 0) graphregion(margin(l=22 r=22)) title("Life expectancy at birth vs. GNP per capita") note("Source: 1998 data from The World Bank Group")

There's probably a better a way to do it, but this is my quick attempt to take up the challenge.
sysuse auto,clear
set obs 1000
twoway scatter mpg price, saving(sct,replace) ///
xsc(r(0(5000)20000) off ) ysc(r(10(10)50) off) ///
xti("") yti("") xlab(,nolab) ylab(,nolab)
kdensity mpg, n(1000) k(gauss) gen(x0 d0)
line x0 d0, xsc(rev off) ysc(alt) xlab(,nolab) xtick(,notick) saving(hist0, replace)
kdensity price, n(1000) k(gauss) gen(x1 d1)
line d1 x1, xsc(alt) ysc(rev off) ylab(,nolab) ytick(,notick) saving(hist1, replace)
graph combine hist0.gph sct.gph hist1.gph, cols(2) holes(3)
I'd also like to know if there are ways to improve on it. The codes are not very neat, and I had trouble properly aligning the line plot and the scatter plot without removing the ticks and labels of the scatter plot (xcommon and ycommon did not really do the job for me).
credit to this post on the Statalist.

Related

How to calculate approximate fourier coefficients using np.trapz

I have a dataset which looks roughly as follows (and is sinusoidal in nature):
TW-240-run1.txt
Point Number Temperature
0 51.504781
1 51.487722
2 51.487722
3 51.828893
4 51.828893
5 51.436547
6 51.368312
7 51.726542
8 51.368312
9 51.317137
10 51.317137
11 51.283020
12 51.590073
.
.
.
9599 51.675366
I am tasked with finding the fundamental/first fourier coefficients, a_n and b_n for this dataset, by means of a numerical integration technique. In this case, I am simply using numpy.trapz from numpy, which aims to implement the trapezium rule. The fourier coefficients, a_n and b_n can be calculated with the following formulae:
where tau (𝛕) is the time period of the sine function. For my case, 𝛕 = 240 seconds (referring to the point number 240 on the data sheet), and thus the bounds of integration are 0 to 240. T(t) from the above formulae is the data set and n = 1.
My current code for trying to calculate the fourier coefficients is as follows:
# Packages
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
#input data from datasheet, the loadtxt below takes in the data from t = 0s to t = 240s
x1, y1 = np.loadtxt(r'C:\Users\Sidharth\Documents\y2python\y2python\thermal_4min_a.txt', unpack=True, skiprows=3)
tau_4min = 240.0
def cosine(period, t, n):
return np.cos((2*np.pi*n*t)/(period)) #defines the cos term for the a_n formula
def sine(period, t, n): #defines the sin term for the a_n formula
return np.sin((2*np.pi*n*t)/(period))
a_1_4min = (2/tau_4min)*np.trapz((y1*cos_term_4min), x1) #implement a_n formula (trapezium rule for T(t)*cos)
print('a_1 is', a_1_4min)
b_1_4min = (2/tau_4min)*np.trapz((y1*sin_term_4min), x1) #implement b_n formula (trapezium rule for T(t)*cos)
print('b_1 is', b_1_4min)
Essentially what this is doing is, it takes in the data, but only up to the row index 241 (point number 240), and then multiplies it by the sine/cosine term from each of the above formulae. However, I realise this isn't calculating the fourier coefficients properly.
My question(s) are as follows:
Will my code work if I can find a way to set limits of integration for np.trapz and then importing the entire data set, instead of only importing the data points from 0 to 240 and multiplying it by the cos or sine term, then using np trapz on that product, as I am currently doing (0 and 240 are supposed to be my limits of integration)

How to implement a method to generate Poincaré sections for a non-linear system of ODEs?

I have been trying to work out how to calculate Poincaré sections for a system of non-linear ODEs, using a paper on the exact system as reference, and have been wrestling with numpy to try and make it run better. This is intended to run within a bounded domain.
Currently, I have the following code
import numpy as np
from scipy.integrate import odeint
X = 0
Y = 1
Z = 2
def generate_poincare_map(function, initial, plane, iterations, delta):
intersections = []
p_i = odeint(function, initial.flatten(), [0, delta])[-1]
for i in range(1, iterations):
p_f = odeint(function, p_i, [i * delta, (i+1) * delta])[-1]
if (p_f[Z] > plane) and (p_i[Z] < plane):
intersections.append(p_i[:2])
if (p_f[Z] > plane) and (p_i[Z] < plane):
intersections.append(p_i[:2])
p_i = p_f
return np.stack(intersections)
This is pretty wasteful due to the integration solely between successive time steps, and seems to produce incorrect results. The original reference includes sections along the lines of
whereas mine tend to result in something along the lines of
Do you have any advice on how to proceed to make this more correct, and perhaps a little faster?
To get a Pointcaré map of the ABC flow
def ABC_ode(u,t):
A, B, C = 0.75, 1, 1 # matlab parameters
x, y, z = u
return np.array([
A*np.sin(z)+C*np.cos(y),
B*np.sin(x)+A*np.cos(z),
C*np.sin(y)+B*np.cos(x)
])
def mysolver(u0, tspan): return odeint(ABC_ode, u0, tspan, atol=1e-10, rtol=1e-11)
you have first to understand that the dynamical system is really about the points (cos(x),sin(x)) etc. on the unit circle. So values different by multiples of 2*pi represent the same point. In the computation of the section one has to reflect this, either by computing it on the Cartesian product of the 3 circles. Let's stay with the second variant, and chose [-pi,pi] as the fundamental period to have the zero location well in the center. Keep in mind that jumps larger pi are from the angle reduction, not from a real crossing of that interval.
def find_crosssections(x0,y0):
u0 = [x0,y0,0]
px = []
py = []
u = mysolver(u0, np.arange(0, 4000, 0.5)); u0 = u[-1]
u = np.mod(u+pi,2*pi)-pi
x,y,z = u.T
for k in range(len(z)-1):
if z[k]<=0 and z[k+1]>=0 and z[k+1]-z[k]<pi:
# find a more exact intersection location by linear interpolation
s = -z[k]/(z[k+1]-z[k]) # 0 = z[k] + s*(z[k+1]-z[k])
rx, ry = (1-s)*x[k]+s*x[k+1], (1-s)*y[k]+s*y[k+1]
px.append(rx);
py.append(ry);
return px,py
To get a full picture of the Poincare cross-section and avoid duplicate work, use a grid of squares and mark if one of the intersections already fell in it. Only start new iterations from the centers of free squares.
N=20
grid = np.zeros([N,N], dtype=int)
for i in range(N):
for j in range(N):
if grid[i,j]>0: continue;
x0, y0 = (2*i+1)*pi/N-pi, (2*j+1)*pi/N-pi
px, py = find_crosssections(x0,y0)
for rx,ry in zip(px,py):
m, n = int((rx+pi)*N/(2*pi)), int((ry+pi)*N/(2*pi))
grid[m,n]=1
plt.plot(px, py, '.', ms=2)
You can now play with the density of the grid and the length of the integration interval to get the plot a little more filled out, but all characteristic features are already here. But I'd recommend re-programming this in a compiled language, as the computation will take some time.

Transform a vector to another frame of reference

I have a green vehicle which will shortly collide with a blue object (which is 200 away from the cube)
It has a Kinect depth camera D at [-100,0,200] which sees the corner of the cube (grey sphere)
The measured depth is 464 at 6.34° in the X plane and 12.53° in the Y plane.
I want to calculate the position of the corner as it would appear if there was a camera F at [150,0,0], which would see this:
in other words transform the red vector into the yellow vector. I know that this is achieved with a transformation matrix but I can't find out how to compute the matrix from the D-F vector [250,0,-200] or how to use it; my high-school maths dates back 40 years.
math.se has a similar question but it doesn't cover my problem and I can't find anything on robotices.se either.
I realise that I should show some code that I've tried, but I don't know where to start. I would be very grateful if somebody could help me to solve this.
ROS provides the tf library which allows you to transform between frames. You can simply set a static transform between the pose of your camera and the pose of your desired location. Then, you can get the pose of any point detected by your camera in the reference frame of your desired point on your robot. ROS tf will do everything you need and everything I explain below.
The longer answer is that you need to construct a transformation tree. First, compute the static transformation between your two poses. A pose is a 7-dimensional transformation including a translation and orientation. This is best represented as a quaternion and a 3D vector.
Now, for all poses in the reference frame of your kinect, you must transform them to your desired reference frame. Let's call this frame base_link and your camera frame camera_link.
I'm going to go ahead and decide that base_link is the parent of camera_link. Technically these transformations are bidirectional, but because you may need a transformation tree, and because ROS cares about this, you'll want to decide who is the parent.
To convert rotation from camera_link to base_link, you need to compute the rotational difference. This can be done by multiplying the quaternion of base_link's orientation by the conjugate of camera_link's orientation. Here's a super quick Python example:
def rotDiff(self,q1: Quaternion,q2: Quaternion) -> Quaternion:
"""Finds the quaternion that, when applied to q1, will rotate an element to q2"""
conjugate = Quaternion(q2.qx*-1,q2.qy*-1,q2.qz*-1,q2.qw)
return self.rotAdd(q1,conjugate)
def rotAdd(self, q1: Quaternion, q2: Quaternion) -> Quaternion:
"""Finds the quaternion that is the equivalent to the rotation caused by both input quaternions applied sequentially."""
w1 = q1.qw
w2 = q2.qw
x1 = q1.qx
x2 = q2.qx
y1 = q1.qy
y2 = q2.qy
z1 = q1.qz
z2 = q2.qz
w = w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2
x = w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2
y = w1 * y2 + y1 * w2 + z1 * x2 - x1 * z2
z = w1 * z2 + z1 * w2 + x1 * y2 - y1 * x2
return Quaternion(x,y,z,w)
Next, you need to add the vectors. The naive approach is to simply add the vectors, but you need to account for rotation when calculating these. What you really need is a coordinate transformation. The position of camera_link relative to base_link is some 3D vector. Based on your drawing, this is [-250, 0, 200]. Next, we need to reproject the vectors to your points of interest into the rotational frame of base_link. I.e., all the points your camera sees at 12.53 degrees that appear at the z = 0 plane to your camera are actually on a 12.53 degree plane relative to base_link and you need to find out what their coordinates are relative to your camera as if your camera was in the same orientation as base_link.
For details on the ensuing math, read this PDF (particularly starting at page 9).
To accomplish this, we need to find your vector's components in base_link's reference frame. I find that it's easiest to read if you convert the quaternion to a rotation matrix, but there is an equivalent direct approach.
To convert a quaternion to a rotation matrix:
def Quat2Mat(self, q: Quaternion) -> rotMat:
m00 = 1 - 2 * q.qy**2 - 2 * q.qz**2
m01 = 2 * q.qx * q.qy - 2 * q.qz * q.qw
m02 = 2 * q.qx * q.qz + 2 * q.qy * q.qw
m10 = 2 * q.qx * q.qy + 2 * q.qz * q.qw
m11 = 1 - 2 * q.qx**2 - 2 * q.qz**2
m12 = 2 * q.qy * q.qz - 2 * q.qx * q.qw
m20 = 2 * q.qx * q.qz - 2 * q.qy * q.qw
m21 = 2 * q.qy * q.qz + 2 * q.qx * q.qw
m22 = 1 - 2 * q.qx**2 - 2 * q.qy**2
result = [[m00,m01,m02],[m10,m11,m12],[m20,m21,m22]]
return result
Now that your rotation is represented as a rotation matrix, it's time to do the final calculation.
Following the MIT lecture notes from my link above, I'll arbitrarily name the vector to your point of interest from the camera A.
Find the rotation matrix that corresponds with the quaternion that represents the rotation between base_link and camera_link and simply perform a matrix multiplication. If you're in Python, you can use numpy to do this, but in the interest of being explicit, here is the long form of the multiplication:
def coordTransform(self, M: RotMat, A: Vector) -> Vector:
"""
M is my rotation matrix that represents the rotation between my frames
A is the vector of interest in the frame I'm rotating from
APrime is A, but in the frame I'm rotating to.
"""
APrime = []
i = 0
for component in A:
APrime.append(component * M[i][0] + component * M[i][1] + component * m[i][2])
i += 1
return APrime
Now, the vectors from camera_link are represented as if camera_link and base_link share an orientation.
Now you may simply add the static translation between camera_link and base_link (or subtract base_link -> camera_link) and the resulting vector will be your point's new translation.
Putting it all together, you can now gather the translation and orientation of every point your camera detects relative to any arbitrary reference frame to gather pose data relevant to your application.
You can put all of this together into a function simply called tf() and stack these transformations up and down a complex transformation tree. Simply add all the transformations up to a common ancestor and subtract all the transformations down to your target node in order to find the transformation of your data between any two arbitrary related frames.
Edit: Hendy pointed out that it's unclear what Quaternion() class I refer to here.
For the purposes of this answer, this is all that's necessary:
class Quaternion():
def __init__(self, qx: float, qy: float, qz: float, qw: float):
self.qx = qx
self.qy = qy
self.xz = qz
self.qw = qw
But if you want to make this class super handy, you can define __mul__(self, other: Quaternion and __rmul__(self, other: Quaternion) to perform quaternion multiplication (order matters, so make sure to do both!). conjugate(self), toEuler(self), toRotMat(self), normalize(self) may also be handy additions.
Note that due to quirks in Python's typing, the above other: Quaternion is only for clarity. You'll need a longer-form if type(other) != Quaternion: raise TypeError('You can only multiply quaternions with other quaternions) error handling block to make that into valid python :)
The following definitions are not necessary for this answer, but they may prove useful to the reader.
import numpy as np
def __mul__(self, other):
if type(other) != Quaternion:
print("Quaternion multiplication only works with other quats")
raise TypeError
r1 = self.qw
r2 = other.qw
v1 = [self.qx,self.qy,self.qz]
v2 = [other.qx,other.qy,other.qz]
rPrime = r1*r2 - np.dot(v1,v2)
vPrimeA = np.multiply(r1,v2)
vPrimeB = np.multiply(r2,v1)
vPrimeC = np.cross(v1,v2)
vPrimeD = np.add(vPrimeA, vPrimeB)
vPrime = np.add(vPrimeD,vPrimeC)
x = vPrime[0]
y = vPrime[1]
z = vPrime[2]
w = rPrime
return Quaternion(x,y,z,w)
def __rmul__(self, other):
if type(other) != Quaternion:
print("Quaternion multiplication only works with other quats")
raise TypeError
r1 = other.qw
r2 = self.qw
v1 = [other.qx,other.qy,other.qz]
v2 = [self.qx,self.qy,self.qz]
rPrime = r1*r2 - np.dot(v1,v2)
vPrimeA = np.multiply(r1,v2)
vPrimeB = np.multiply(r2,v1)
vPrimeC = np.cross(v1,v2)
vPrimeD = np.add(vPrimeA, vPrimeB)
vPrime = np.add(vPrimeD,vPrimeC)
x = vPrime[0]
y = vPrime[1]
z = vPrime[2]
w = rPrime
return Quaternion(x,y,z,w)
def conjugate(self):
return Quaternion(self.qx*-1,self.qy*-1,self.qz*-1,self.qw)

Formatting a txt file of equations into the same format and then manipulating them for linear algebra calculations in Python

I'm looking for an universal way of transforming equations in Python 3.2. I've only recently begun playing around with it and stumbled upon some of my old MATLAB homework. I'm able to calculate this in MATLAB but pylab is still a bit of a mystery to me.
So, I have a text file of equations that I'm trying to convert into the the same form of A x = b and then solve some linear algebra problems associated with them in PYLAB.
The text file, "equations.txt",contains collections of linear equations in the following format:
-38 y1  +  35 y2  +  31 y3  = -3047
11 y1  + -13 y2  + -34 y3  = 784
34 y1  + -21 y2  +  19 y3  = 2949
etc.
The file contains the equations for four sets of equations, each set with a different number of variables. Each set of equations is of the exact form shown (3 examples above) with one empty line between each set.
I want to write a program to read all the sets of equations in the files, convert sets of equations into a matrix equation A x = b, and solve the set of equations for the vector x.
My approach has been very "MATLABy", which is a problem because I want to be able to write a program that will solve for all of the variables.
I've tried reading a single equation as a text line, stripped of the carriage return at the end, and splitting line at the = sign because as we know the 2nd element in the split is the right hand side of the equation, that goes into the vector b.
The first element in the split is the part you have to get the coefficients that go in the A matrix.  If you split this at white space ' ', you will get a list like
['-38', 'y1', '+', '35', 'y2', '+', '31', 'y3']
Note now that you can pull every 3rd element and get the coefficients that go into the matrix A.
Partial answers would be:
y1 = 90; c2 = 28; x4 = 41; z100 = 59
I'm trying to manipulate them to give me the sum of the entries of the solutions y1, ..., y3 for the first block of equations, the sum of the entries of the solutions c1, ..., c6 for the second block of equations, the sum of the entries of the solutions x1, ..., x13 for the third block of equations, and the sum of the entries of the solutions z1, ..., z100 for the fourth block of equations.
Like, I said - I'm able to do this in MATLAB but not in Python so I'm probably approaching this from the wrong way but this is what I have so far:
import pylab
f = open('equations.txt', 'r')
L=f.readlines()
list_final = []
for line in L:
line_l = line.rstrip()
list_l = line_l.split(";")
list_l = filter(None, list_l)
for expression in list_l:
and ending it with
f.close()
This was just my go at trying to format the equations to all look the same. I realise it's not a lot but I was really hoping someone could get my started because even though I know some python I normally don't use it for math because I have MATLAB for that.
I think this could be useful for many of us who have prior MATLAB experience but not pylab.
How would you go around this? Thank you!
For your example format, it's very easy to process it by numpy.loadtxt():
import numpy as np
data = np.loadtxt("equations.txt", dtype=str)[:, ::3].astype(np.float)
a = data[:, :-1]
b = data[:, -1]
x = np.linalg.solve(a, b)
The steps are:
An alternative approach that is possibly more robust to unstructured input is to use a combination of the Python symbolic math package (sympy), and a few parsing tricks. This scales to the variables in the equations being written in an arbitrary order.
Although sympy has some tools for parsing, (your input is very close in appearance to Mathematica), it appears that the sympy.parsing.mathematica module can't deal with some of the input (particularly leading minus signs).
import sympy
from sympy.parsing.sympy_parser import parse_expr
import re
def text_to_equations(text):
lines = text.split('\n')
lines = [line.split('=') for line in lines]
eqns = []
for lhs, rhs in lines:
# clobber all the spaces
lhs = lhs.replace(' ','')
# *assume* that a number followed by a letter is an
# implicit multiplication
lhs = re.sub(r'(\d)([a-z])', r'\g<1>*\g<2>', lhs)
eqns.append( (parse_expr(lhs), parse_expr(rhs)) )
return eqns
def get_all_symbols(eqns):
symbs = set()
for lhs, rhs in eqns:
for sym in lhs.atoms(sympy.Symbol):
symbs.add(sym)
return symbs
def text_to_eqn_matrix(text):
eqns = text_to_equations(text)
symbs = get_all_symbols(eqns)
n = len(eqns)
m = len(symbs)
A = numpy.zeros((m, n))
b = numpy.zeros((m, 1))
for i, (lhs, rhs) in enumerate(eqns):
d = lhs.as_coefficients_dict()
b[i] = int(rhs)
for j, s in enumerate(symbs):
A[i, j] = d[s]
x = sympy.Matrix([list(symbs)]).T
return sympy.Matrix(A), x, sympy.Matrix(b)
s = '''-38 y1 + 35 y2 + 31 y3 = -3047
11 y1 + -13 y2 + -34 y3 = 784
34 y1 + -21 y2 + 19 y3 = 2949'''
A, x, b = text_to_eqn_matrix(s)
print A
print x
print b

Is it possible to optimize this Matlab code for doing vector quantization with centroids from k-means?

I've created a codebook using k-means of size 4000x300 (4000 centroids, each with 300 features). Using the codebook, I then want to label an input vector (for purposes of binning later on). The input vector is of size Nx300, where N is the total number of input instances I receive.
To compute the labels, I calculate the closest centroid for each of the input vectors. To do so, I compare each input vector against all centroids and pick the centroid with the minimum distance. The label is then just the index of that centroid.
My current Matlab code looks like:
function labels = assign_labels(centroids, X)
labels = zeros(size(X, 1), 1);
% for each X, calculate the distance from each centroid
for i = 1:size(X, 1)
% distance of X_i from all j centroids is: sum((X_i - centroid_j)^2)
% note: we leave off the sqrt as an optimization
distances = sum(bsxfun(#minus, centroids, X(i, :)) .^ 2, 2);
[value, label] = min(distances);
labels(i) = label;
end
However, this code is still fairly slow (for my purposes), and I was hoping there might be a way to optimize the code further.
One obvious issue is that there is a for-loop, which is the bane of good performance on Matlab. I've been trying to come up with a way to get rid of it, but with no luck (I looked into using arrayfun in conjunction with bsxfun, but haven't gotten that to work). Alternatively, if someone know of any other way to speed this up, I would be greatly appreciate it.
Update
After doing some searching, I couldn't find a great solution using Matlab, so I decided to look at what is used in Python's scikits.learn package for 'euclidean_distance' (shortened):
XX = sum(X * X, axis=1)[:, newaxis]
YY = Y.copy()
YY **= 2
YY = sum(YY, axis=1)[newaxis, :]
distances = XX + YY
distances -= 2 * dot(X, Y.T)
distances = maximum(distances, 0)
which uses the binomial form of the euclidean distance ((x-y)^2 -> x^2 + y^2 - 2xy), which from what I've read usually runs faster. My completely untested Matlab translation is:
XX = sum(data .* data, 2);
YY = sum(center .^ 2, 2);
[val, ~] = max(XX + YY - 2*data*center');
Use the following function to calculate your distances. You should see an order of magnitude speed up
The two matrices A and B have the columns as the dimenions and the rows as each point.
A is your matrix of centroids. B is your matrix of datapoints.
function D=getSim(A,B)
Qa=repmat(dot(A,A,2),1,size(B,1));
Qb=repmat(dot(B,B,2),1,size(A,1));
D=Qa+Qb'-2*A*B';
You can vectorize it by converting to cells and using cellfun:
[nRows,nCols]=size(X);
XCell=num2cell(X,2);
dist=reshape(cell2mat(cellfun(#(x)(sum(bsxfun(#minus,centroids,x).^2,2)),XCell,'UniformOutput',false)),nRows,nRows);
[~,labels]=min(dist);
Explanation:
We assign each row of X to its own cell in the second line
This piece #(x)(sum(bsxfun(#minus,centroids,x).^2,2)) is an anonymous function which is the same as your distances=... line, and using cell2mat, we apply it to each row of X.
The labels are then the indices of the minimum row along each column.
For a true matrix implementation, you may consider trying something along the lines of:
P2 = kron(centroids, ones(size(X,1),1));
Q2 = kron(ones(size(centroids,1),1), X);
distances = reshape(sum((Q2-P2).^2,2), size(X,1), size(centroids,1));
Note
This assumes the data is organized as [x1 y1 ...; x2 y2 ...;...]
You can use a more efficient algorithm for nearest neighbor search than brute force.
The most popular approach are Kd-Tree. O(log(n)) average query time instead of the O(n) brute force complexity.
Regarding a Maltab implementation of Kd-Trees, you can have a look here