Swarm initialization in particle swarm optimization (PSO) algorithm

In the particle swarm optimization (PSO) algorithm, is it possible to use a dataset to initialize the positions of the particles, instead of using uniform random numbers?

Yes, it is possible to initialize the swarm particles with values from the dataset instead of random numbers. Select random samples from the dataset and assign each one to the position vector of a swarm particle.
Initialisation of swarm particles
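import numpy as np

# Placeholder data: replace with your own array of shape (rows, columns),
# where rows is the number of samples and columns the number of features
Training_dataset = np.random.rand(100, 4)
rows, columns = Training_dataset.shape

class Particle:
    def __init__(self, sample):
        # start the particle at a data sample instead of a uniform random point
        self.position = np.array(sample, dtype=float)
        # one velocity component per feature (dimension)
        self.velocity = np.random.rand(columns)
        # the personal best starts at the initial position
        self.pbest = self.position.copy()

swarm = []
no_of_particles = 20  # swarm size; must not exceed rows when sampling without replacement
for i in np.random.choice(rows, size=no_of_particles, replace=False):
    p = Particle(Training_dataset[i, :])
    swarm.append(p)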


Importance of seed and num_runs in the KMeans clustering

New to ML, so I'm trying to make sense of the following code. Specifically:
1. In for run in np.arange(1, num_runs+1), what is the need for this loop? Why didn't the author use the setMaxIter method of KMeans?
2. What is the importance of seeding in clustering?
3. Why did the author choose to set the seed explicitly rather than using the default one?
import time

import numpy as np
import pandas as pd
from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator

def optimal_k(df_in, index_col, k_min, k_max, num_runs):
    '''
    Determine the optimal number of clusters by using Silhouette score analysis.
    :param df_in: the input dataframe
    :param index_col: the name of the index column
    :param k_min: the minimum number of clusters
    :param k_max: the maximum number of clusters
    :param num_runs: the number of runs for each fixed number of clusters
    :return k: optimal number of clusters
    :return silh_lst: Silhouette score
    :return r_table: the running results table
    :author: Wenqiang Feng
    :email: von198#gmail.com.com
    '''
    start = time.time()
    silh_lst = []
    k_lst = np.arange(k_min, k_max+1)
    r_table = df_in.select(index_col).toPandas()
    r_table = r_table.set_index(index_col)
    centers = pd.DataFrame()
    for k in k_lst:
        silh_val = []
        for run in np.arange(1, num_runs+1):
            # Train a k-means model with a fresh random seed for each run
            kmeans = KMeans()\
                .setK(int(k))\
                .setSeed(int(np.random.randint(100, size=1)))
            model = kmeans.fit(df_in)
            # Make predictions
            predictions = model.transform(df_in)
            # Assign positionally (.values) so pandas does not try to align the
            # RangeIndex from toPandas() with r_table's index_col index
            r_table['cluster_{k}_{run}'.format(k=k, run=run)] = \
                predictions.select('prediction').toPandas()['prediction'].values
            # Evaluate clustering by computing the Silhouette score
            evaluator = ClusteringEvaluator()
            silhouette = evaluator.evaluate(predictions)
            silh_val.append(silhouette)
        # Average the Silhouette score over all runs for this k
        silh_array = np.asanyarray(silh_val)
        silh_lst.append(silh_array.mean())
    elapsed = time.time() - start
    silhouette = pd.DataFrame(list(zip(k_lst, silh_lst)), columns=['k', 'silhouette'])
    print('+------------------------------------------------------------+')
    print("| The finding optimal k phase took %8.0f s. |" % (elapsed))
    print('+------------------------------------------------------------+')
    return k_lst[np.argmax(silh_lst, axis=0)], silhouette, r_table
I'll try to answer your questions based on my reading of the material.
1. The reason for this loop is that the author sets a new seed on every iteration using int(np.random.randint(100, size=1)). If the feature variables exhibit patterns that naturally group them into visible clusters, the starting seed should not affect the final cluster memberships. If the data is evenly distributed, however, we might end up with different cluster memberships depending on the random initial centers. I believe the author changes the seed on each run to test different initial distributions. setMaxIter, by contrast, only sets the maximum number of iterations for a single seed (initial distribution).
2. Similarly, the seed determines the initial positions of the k points around which the data are clustered. Depending on the underlying data distribution, different starting points can converge to different final clusterings.
3. Setting the seed explicitly gives the author control over it, as discussed in points 1 and 2. You can see for which seeds your code converges to the clusters you want and for which it might not converge. Also, if you iterate over, say, 100 different seeds and your code still converges to the same final clusters, you can simply use the default seed, since it likely doesn't matter. Another reason comes from software engineering: setting an explicit seed is very important if you want to, for example, write tests for your code that don't fail at random.
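To make points 1 to 3 concrete, here is a small NumPy-only illustration (not Spark). It is a toy Lloyd's k-means in which the seed only chooses which rows become the initial centers; the function name kmeans, the uniform test data and k=4 are assumptions for the demo. Spark's KMeans uses its own initialization scheme, so its numbers would differ, but the two effects are the same: different seeds can end in different clusterings on structureless data, and the same seed always reproduces the same result.

```python
import numpy as np

def kmeans(X, k, seed, n_iter=50):
    """Toy Lloyd's k-means; the seed picks which k rows become the initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign every point to its nearest center (squared Euclidean distance)
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # move each center to the mean of its points; keep it if the cluster is empty
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    # within-cluster sum of squares for the final assignment
    inertia = float(((X - centers[labels]) ** 2).sum())
    return labels, inertia

# Evenly spread data with no real cluster structure: here the seed can matter.
X = np.random.default_rng(0).uniform(size=(300, 2))
results = {seed: kmeans(X, k=4, seed=seed) for seed in range(5)}
for seed, (labels, inertia) in results.items():
    print(seed, round(inertia, 3))

# Re-running with the same seed reproduces the same clustering exactly.
labels_a, _ = kmeans(X, k=4, seed=3)
labels_b, _ = kmeans(X, k=4, seed=3)
print(np.array_equal(labels_a, labels_b))  # True
```

Comparing the printed inertias across seeds is a cheap way to see whether your own data is sensitive to initialization, which is exactly what averaging over num_runs in optimal_k smooths out.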

Using Kmeans to initialize EM-Algorithm

I've been reading about Expectation Maximization (EM) recently, and it keeps coming up that initializing EM with k-means is a good idea, but I'm having difficulty grasping this notion.
As far as I know, k-means gives you the coordinates of the cluster centroids for a pre-defined number of clusters, so how can this be used to initialize EM? To make things clearer, this is the problem I'm currently trying to solve:
I have a dataset of noisy data points Y that originate from samples X taken from an 8-ASK set. I loaded my dataset and used a k-means algorithm to identify the centroids, but I don't know what the next step is. The EM algorithm I use requires as parameters the initial start values for the centroids and their probability distribution, as well as the initial mean and variance, but I do not understand how to get those exactly.
To summarize, my question is: how can I calculate the mean, variance and initial distribution of the centroids generated by the k-means algorithm when I run it on my data Y?
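One common recipe, sketched here as an assumption rather than as the method your EM code requires, is to hard-assign each sample to its nearest k-means centroid and read the initial Gaussian-mixture parameters off those clusters: the mixing weight of component j is the fraction of samples in cluster j, its mean is the cluster mean, and its variance is the within-cluster variance. The sketch assumes real-valued, one-dimensional 8-ASK amplitudes; the function name em_init_from_kmeans and the min_var floor (which keeps a one-sample cluster from starting EM with zero variance) are hypothetical.

```python
import numpy as np

def em_init_from_kmeans(Y, centroids, min_var=1e-6):
    """Initial GMM parameters (weights, means, variances) from 1-D k-means centroids."""
    Y = np.asarray(Y, dtype=float).ravel()
    centroids = np.asarray(centroids, dtype=float).ravel()
    k = len(centroids)
    # hard-assign every sample to its nearest centroid, as k-means does
    labels = np.abs(Y[:, None] - centroids[None, :]).argmin(axis=1)
    weights = np.empty(k)
    means = np.empty(k)
    variances = np.empty(k)
    for j in range(k):
        members = Y[labels == j]
        # mixing weight = fraction of samples that fell into cluster j
        weights[j] = len(members) / len(Y)
        if len(members) > 0:
            means[j] = members.mean()
            variances[j] = members.var()
        else:
            # empty cluster: fall back to the centroid and the overall variance
            means[j] = centroids[j]
            variances[j] = Y.var()
    # floor the variances so EM never starts from a degenerate (zero-width) component
    return weights, means, np.maximum(variances, min_var)

# Synthetic example: assumed 8-ASK amplitudes -7, -5, ..., 7 plus Gaussian noise.
levels = np.arange(-7, 8, 2)
rng = np.random.default_rng(0)
X = rng.choice(levels, size=4000)
Y = X + rng.normal(scale=0.4, size=X.size)
centroids = levels + 0.1  # stand-in for your k-means output; use your own centroids
weights, means, variances = em_init_from_kmeans(Y, centroids)
print(np.round(weights, 3), np.round(means, 2), np.round(variances, 3))
```

The three returned arrays map onto the inputs described in the question: weights is the initial distribution over centroids, and means and variances are the initial per-component mean and variance.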

What is meaning of "parameter optimization of SVM by PSO"?

I can change the parameters C and epsilon manually to obtain an optimized result, but I found that there is parameter optimization of SVM by PSO (or any other optimization algorithm). I can't find an actual algorithm for it. What does it mean: how can PSO automatically optimize the SVM parameters? I have read several papers on this topic, but I'm still not sure.
Particle Swarm Optimization is a technique that uses the ML parameters (SVM parameters, in your case) as its features.
Each "particle" in the swarm is characterized by those parameter values. For instance, you might have initial coordinates like these:
      degree  epsilon  gamma  C
p1    3       0.001    0.25   1.0
p2    3       0.003    0.20   0.9
p3    2       0.0003   0.30   1.2
p4    4       0.010    0.25   0.5
...
pn    ...
The "fitness" of each particle (p1-p4 shown here, out of a population of n particles) is measured by the accuracy of the resulting model: the PSO algorithm trains and tests a model for each particle and returns that model's error rate as the value to minimize, playing the role that a training loss function plays elsewhere.
On each iteration, particles move toward the fittest neighbours. The process repeats until a maximum (hopefully the global one) appears as a convergence point. PSO is an iterative search like gradient descent, but it uses no gradients: it only compares the fitness values of the particles.
There are two basic PSO variants. In gbest (global best), every particle affects every other particle, sort of a universal gravitation principle. It converges quickly, but may well miss a global max in favor of a local max that happened to be nearer to the swarm's original center. In lbest (local best), a particle responds to only its k closest neighbors. This can form localized clusters; it converges more slowly, but is more likely to find the global max in a non-convex space.
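For comparison with the description above, here is a minimal sketch of the commonly used inertia-weight form of gbest PSO, in which each particle's velocity combines its previous velocity, a pull toward its own best position (pbest) and a pull toward the swarm's best position. The coefficient values (w=0.7, c1=c2=1.5) are common choices, not requirements. toy_error is a hypothetical stand-in for "train an SVM with C=10**z[0] and gamma=10**z[1] and return its validation error", chosen so the sketch runs without any ML library.

```python
import numpy as np

def pso_gbest(loss, lower, upper, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `loss` over a box with the standard global-best PSO update."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = len(lower)
    x = rng.uniform(lower, upper, size=(n_particles, dim))  # positions
    v = np.zeros_like(x)                                    # velocities
    pbest = x.copy()                                        # each particle's best position
    pbest_loss = np.array([loss(p) for p in x])
    g = pbest[pbest_loss.argmin()].copy()                   # swarm's best position
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia + pull toward personal best + pull toward global best
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)
        f = np.array([loss(p) for p in x])
        improved = f < pbest_loss
        pbest[improved] = x[improved]
        pbest_loss[improved] = f[improved]
        g = pbest[pbest_loss.argmin()].copy()
    return g, pbest_loss.min()

def toy_error(z):
    # hypothetical stand-in for an SVM's validation error at (log10 C, log10 gamma);
    # its minimum is at z = (1, -2)
    return (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2

best, best_loss = pso_gbest(toy_error, lower=[-3, -5], upper=[3, 1])
print(best, best_loss)
```

Searching over log10(C) and log10(gamma) rather than the raw values is a design choice here: it lets the swarm cover several orders of magnitude evenly.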
I'll try to briefly explain enough to answer your clarification questions. If that doesn't work, I'm afraid you'll probably have to find someone to discuss this with in front of a whiteboard.
To use PSO, you have to decide which SVM parameters you'll try to optimize, and how many particles you want to use. PSO is a meta-algorithm, so its features are the SVM parameters. The PSO parameters are the population (how many particles you want to use), the update neighbourhood (lbest size and a distance function; gbest is the all-inclusive case), and the velocity (learning rate for the SVM parameters).
For a bit of illustration, let's assume the particle table above, extended to a population of 20 particles. We'll use lbest with a neighbourhood of 4, and a velocity of 0.1. We choose (randomly, in a grid, or however we think might give us nice results) the initial values of degree, epsilon, gamma, and C for each of the 20 particles.
Each iteration of PSO works like this:
    # Train the model described by each particle's "position"
    for each of the 20 particles:
        train an SVM on the SVM input data with the particle's parameters
        test the SVM; return the error rate as the PSO loss function value
    # Update the particle positions
    for each of the 20 particles:
        find the nearest 4 neighbours (using the PSO distance function)
        identify the neighbour with the lowest loss (SVM error rate)
        adjust this particle's features (degree, epsilon, gamma, C) 0.1 of the way toward that neighbour's features
        (0.1 is our learning rate / velocity. Yes, I realize that degree, a discrete value, is unlikely to change without a special case in the update routine.)
Continue iterating PSO until the particles have converged to your liking.
gbest is simply lbest with an infinite neighbourhood; in that case, you don't need a distance function on the particle space.
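The simplified lbest procedure above can be sketched in NumPy as follows. This follows the answer's "move 0.1 of the way toward the best nearby particle" rule rather than the full textbook velocity update. Two assumptions are made here: each particle's neighbourhood also includes the particle itself, so the best particle in a neighbourhood stays put instead of drifting toward a worse one, and degree is left out because it is discrete. toy_error is a hypothetical stand-in for training and testing an SVM; in practice you would also scale the features before computing distances.

```python
import numpy as np

def lbest_step(positions, losses, n_neighbours=4, rate=0.1):
    """One update of the simplified lbest scheme: move toward the best nearby particle."""
    new_positions = positions.copy()
    for i, p in enumerate(positions):
        # Euclidean distance from particle i to every particle (itself included, distance 0)
        dist = np.linalg.norm(positions - p, axis=1)
        neighbourhood = np.argsort(dist)[: n_neighbours + 1]  # itself + nearest 4
        best = neighbourhood[np.argmin(losses[neighbourhood])]
        # move `rate` of the way toward the best particle in the neighbourhood
        new_positions[i] = p + rate * (positions[best] - p)
    return new_positions

def toy_error(params):
    # hypothetical stand-in for "train an SVM with these parameters, return its error rate"
    log_eps, gamma, C = params
    return (log_eps + 2) ** 2 + (gamma - 0.25) ** 2 + (C - 1.0) ** 2

rng = np.random.default_rng(1)
# 20 particles; columns are log10(epsilon), gamma, C
positions = np.column_stack([rng.uniform(-4, -1, 20),
                             rng.uniform(0.1, 0.4, 20),
                             rng.uniform(0.5, 1.5, 20)])
best_history = []
for _ in range(50):
    losses = np.array([toy_error(p) for p in positions])
    best_history.append(losses.min())
    positions = lbest_step(positions, losses)
print(best_history[0], best_history[-1])
```

Setting n_neighbours to the swarm size minus one turns this into the gbest case, matching the remark that gbest is lbest with an infinite neighbourhood.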

I understand the k-means algorithm, but I don't understand how to apply it to testing data

I am having problems not with understanding the k-means algorithm but with how to apply it to training, validation and testing data. Is it like this:
Training phase: apply k-means to the input data to get the centroid values (three, in my case). For each centroid, assign a label, say 1, 2, 3. Suppose in the training phase I input sixty such samples. In total I get 60*3 centroids, each labelled 1, 2 or 3.
Testing phase: apply k-means to the input signal to get its centroids, compare them with the centroids obtained in the training phase, and give each the label of whichever training centroid is closest?
k-means does not have a "training" and a "testing" phase. It is an unsupervised algorithm.
At most, it is only applied to testing data.
Do not approach it like a classifier. It is not a classification algorithm.
The objective of k-means is:
Split my input data set into k convex partitions, such that the sum of all squared deviations in all dimensions from the mean of each partition is smallest.
There are no labels.
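The objective stated above can be written as a short function. This sketch (the name kmeans_objective is hypothetical) scores a given partition by summing the squared deviations from each partition's mean over all dimensions. Note that the cluster ids are arbitrary: swapping them leaves the objective unchanged, which is one way to see that they are not class labels.

```python
import numpy as np

def kmeans_objective(X, labels):
    """Sum over partitions of squared deviations (all dimensions) from the partition mean."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

X = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(kmeans_objective(X, [0, 0, 1, 1]))  # 4.0   (good split: left pair vs right pair)
print(kmeans_objective(X, [1, 1, 0, 0]))  # 4.0   (same split, ids swapped)
print(kmeans_objective(X, [0, 1, 0, 1]))  # 100.0 (bad split: bottom pair vs top pair)
```

k-means searches for the partition with the smallest value of this quantity on the data it is given; nothing in it refers to a training set or a test set.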

PyMC: How can I describe a state space model?

I used to code my MCMC using C. But I'd like to give PyMC a try.
Suppose X_n is the underlying state, whose dynamics follow a Markov chain, and Y_n is the observed data. In particular,
Y_n has a Poisson distribution with a mean depending on X_n and a multidimensional unknown parameter theta
X_n | X_{n-1} has a distribution depending on theta
How should I describe this model using PyMC?
Another question: I can find conjugate priors for theta but not for X_n. Is it possible to specify which posteriors are updated using conjugate priors and which using MCMC?
Here is an example of a state-space model in PyMC on the PyMC wiki. It basically involves populating a list and allowing PyMC to treat it as a container of PyMC nodes.
As for the second part of the question, you could certainly calculate some of your conjugate posteriors ahead of time and put them into the model. For example, if you observed binomial data x=4, n=10 under a uniform Beta(1, 1) prior, you could insert a Beta node p = Beta('p', 5, 7) to represent that posterior (it's really just a prior as far as the model is concerned, but it is the posterior given data x). PyMC would then draw a sample from this posterior at every iteration, to be used wherever it is needed in the model.
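The arithmetic behind Beta('p', 5, 7) is the standard Beta-binomial conjugate update: a Beta(a, b) prior combined with x successes in n trials gives a Beta(a + x, b + n - x) posterior. A minimal sketch, with a hypothetical helper name and NumPy draws standing in for the per-iteration samples PyMC would take (this is not PyMC code):

```python
import numpy as np

def beta_binomial_posterior(a, b, x, n):
    """Conjugate update: Beta(a, b) prior + x successes in n trials -> Beta(a + x, b + n - x)."""
    return a + x, b + n - x

a_post, b_post = beta_binomial_posterior(1, 1, 4, 10)
print(a_post, b_post)  # 5 7

# Drawing from the precomputed posterior, as the model would at each iteration
samples = np.random.default_rng(0).beta(a_post, b_post, size=10_000)
print(samples.mean(), a_post / (a_post + b_post))  # sample mean vs exact mean 5/12
```

Because the posterior is available in closed form, no MCMC is needed for p itself; only the non-conjugate parts of the model, such as X_n, have to be sampled.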