K-means as specialized case of generalized EM Algorithm - k-means

I am using a dataset to make 2 clusters using EM and then K-means. I already have implemented K-means and EM Algorithm separately. Now I am trying to derive k-means from my implementation of EM Algorithm to do clustering. I have 2 questions in mind.
K-means is viewed as a special case of the generalized EM algorithm. But what assumptions do we need to make to derive k-means from EM algorithm?
Also, in coding perspective, what changes do we need to make in implementation of EM algorithm so that it starts behaving exactly like k-means algorithm? I assume that we need to share same co-variance matrix between both clusters. Is that right to assume?
This is what I am getting using k-means.
This is clustering using EM.

K-means and EM clustering are very related, but not exactly the same. Two changes to EM would make it very, very similar to K-means:
EM uses multi-dimensional distributions. Constrain the standard deviations of the distributions to be the same in all dimensions.
Revise the output of EM to produce only the most probably cluster. EM produces soft-clusters (a separate probability that a point is in each cluster), whereas K-means produces hard clusters (a single cluster choice).
I don't know how these "fixes" translate into your particular code.
I'm not 100% sure that under all circumstances, such an EM approach would converge to exactly the same clusters as K-means. I am confident that the two methods would produce very comparable results under most circumstances.

Related

Why differential evolution works so well?

What is the idea behind the mutation in differential evolution and why should this kind of mutation perform well?
I do not see any good geometric reason behind it.
Could anyone point me to some technical explanation of this?
Like all evolutionary algorithms, DE uses a heuristic, so my explanation is going to be a bit hand-wavy. What DE is trying to do, like all evolutionary algorithms, is to do a random search that's not too random. DE's mutation operator first computes the vector between two random members of the population, then adds that vector to a third random member of the population. This works well because it uses the current population as a way of figuring out how large of a step to take, and in what direction. If the population is widely dispersed, then it's reasonable to take big steps; if it's tightly concentrated, then it's reasonable to take small steps.
There are many reasons DE works better than Goldberg's GA, but focusing on the variation operators I'd say that the biggest difference is that DE uses real-coded variables and GA uses binary encoding. When optimizing on a continuous space, binary encoding is not a good choice. This has been known since the early 1990s, and one of the first things to come out of the encounter between the primarily German Evolution Strategy community and the primarily American Genetic Algorithm community was Deb's Simulated Binary Crossover. This operator acts like the GA's crossover operator, but on real-coded variables.

Is there any unsupervised clustering technique which can identify numbers clusters itself?

I checked unsupervised clsutering on gensim, fasttext, sklearn but did not find any documentation where I can cluster my text data using unsupervised learn without mentioning numbers of cluster to be identified
for example in sklearn KMneans clustering
km = KMeans(n_clusters=true_k, init='k-means++', max_iter=100)
Where I have to provide n_clusters.
In my case, I have text and it should be automatically identify numbers of clusters in it and cluster the text. Any reference article or link much appreciated.
DBSCAN is a density-based clustering method that we don't have to specify the number of clusters beforehand.
sklearn implementation : http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
Here is a good tutorial that gives an intuitive understanding on DBSCAN: http://mccormickml.com/2016/11/08/dbscan-clustering/
I extracted following from the above tutorial, which may be useful for you.
k-means requires specifying the number of clusters, ‘k’. DBSCAN does not, but does require specifying two parameters which influence the decision of whether two nearby points should be linked into the same cluster.
These two parameters are a distance threshold, ε (epsilon), and “MinPts” (minimum number of points), to be explained.
There are other methods (follow the link given in the comments) also, However, DBSCAN is a popular choice.

What parameters to optimize in KNN?

I want to optimize KNN. There is a lot about SVM, RF and XGboost; but very few for KNN.
As far as I know the number of neighbors is one parameter to tune.
But what other parameters to test? Is there any good article?
Thank you
KNN is so simple method that there is pretty much nothing to tune besides K. The whole method is literally:
for a given test sample x:
- find K most similar samples from training set, according to similarity measure s
- return the majority vote of the class from the above set
Consequently the only thing used to define KNN besides K is the similarity measure s, and that's all. There is literally nothing else in this algorithm (as it has 3 lines of pseudocode). On the other hand finding "the best similarity measure" is equivalently hard problem as learning a classifier itself, thus there is no real method of doing so, and people usually end up using either simple things (Euclidean distance) or use their domain knowledge to adapt s to the problem at hand.
Lejlot, pretty much summed it all. K-NN is so simple that it's an instance based nonparametric algorithm, that's what makes it so beautiful, and works really well for certain specific examples. Most of K-NN research is not in K-NN itself but in the computation and hardware that goes into it. If you'd like some readings on K-NN and machine learning algorithms Charles Bishop - Pattern Recognition and Machine Learning. Warning: it is heavy in the mathematics, but, Machine Learning and real computer science is all math.
By optimizing if you are also focusing on the reduction of prediction time (you should) then there are other aspects which you can implement to make the algorithm more efficient (But these are not parameter tuning). The major draw back with the KNN is that with the increasing number of training examples, the prediction time also goes high thus performance go low.
To optimize, you can check on the KNN with KD-trees, KNN with inverted lists(index) and KNN with locality sensitive hashing (KNN with LSH).
These will reduce the search space during the prediction time thus optimizing the algorithm.

Best multi-objective 3D path optimization algorithm?

I would like to calculate 3D balanced paths to building exits at once, for huge amount of people (2000). Since the problem is related to evacuation and solutions for 3D paths (the fastest and others) can be precalcualted and I am going to store 3D paths in database to accelerate the process. As I see it, there are two solutions so far:
Calculation of a number of passing through nodes, in graph environment representation, but probably the time calculation will be intolerable.
Using GA. However, I cannot find a good described optimization example, where is used genetic algorithm.
Can you tell me a way of using GA for multiobjective optimization, because I found only implementation of GA for finding shortest path? and Which algorithm is the best for multi-object optimization?
Genetic Algorithm as it is cannot be easily used for multi-objective optimisation directly. If you want to use the purest GA you have to combine the objectives into a single objective, e.g. by summing weighted values of each objective. But this often does not work very well, especially when there is a strong tradeoff between the objectives.
However, there are genetic (evolutionary) algorithms designed specifically for multi-objective optimisation. Probably the most famous one and one of the best is the NSGA-II which stands for Nondominated Sorting Genetic Algorithm II. It's also relatively easy to implement (I did once). But there other MOEAs (Multi-Objective Evolutionary Algorithm) too, just google it. Some of them also use the nondomination idea, others not.

What are the typical use cases of Genetic Programming?

Today I read this blog entry by Roger Alsing about how to paint a replica of the Mona Lisa using only 50 semi transparent polygons.
I'm fascinated with the results for that particular case, so I was wondering (and this is my question): how does genetic programming work and what other problems could be solved by genetic programming?
There is some debate as to whether Roger's Mona Lisa program is Genetic Programming at all. It seems to be closer to a (1 + 1) Evolution Strategy. Both techniques are examples of the broader field of Evolutionary Computation, which also includes Genetic Algorithms.
Genetic Programming (GP) is the process of evolving computer programs (usually in the form of trees - often Lisp programs). If you are asking specifically about GP, John Koza is widely regarded as the leading expert. His website includes lots of links to more information. GP is typically very computationally intensive (for non-trivial problems it often involves a large grid of machines).
If you are asking more generally, evolutionary algorithms (EAs) are typically used to provide good approximate solutions to problems that cannot be solved easily using other techniques (such as NP-hard problems). Many optimisation problems fall into this category. It may be too computationally-intensive to find an exact solution but sometimes a near-optimal solution is sufficient. In these situations evolutionary techniques can be effective. Due to their random nature, evolutionary algorithms are never guaranteed to find an optimal solution for any problem, but they will often find a good solution if one exists.
Evolutionary algorithms can also be used to tackle problems that humans don't really know how to solve. An EA, free of any human preconceptions or biases, can generate surprising solutions that are comparable to, or better than, the best human-generated efforts. It is merely necessary that we can recognise a good solution if it were presented to us, even if we don't know how to create a good solution. In other words, we need to be able to formulate an effective fitness function.
Some Examples
Travelling Salesman
Sudoku
EDIT: The freely-available book, A Field Guide to Genetic Programming, contains examples of where GP has produced human-competitive results.
Interestingly enough, the company behind the dynamic character animation used in games like Grand Theft Auto IV and the latest Star Wars game (The Force Unleashed) used genetic programming to develop movement algorithms. The company's website is here and the videos are very impressive:
http://www.naturalmotion.com/euphoria.htm
I believe they simulated the nervous system of the character, then randomised the connections to some extent. They then combined the 'genes' of the models that walked furthest to create more and more able 'children' in successive generations. Really fascinating simulation work.
I've also seen genetic algorithms used in path finding automata, with food-seeking ants being the classic example.
Genetic algorithms can be used to solve most any optimization problem. However, in a lot of cases, there are better, more direct methods to solve them. It is in the class of meta-programming algorithms, which means that it is able to adapt to pretty much anything you can throw at it, given that you can generate a method of encoding a potential solution, combining/mutating solutions, and deciding which solutions are better than others. GA has an advantage over other meta-programming algorithms in that it can handle local maxima better than a pure hill-climbing algorithm, like simulated annealing.
I used genetic programming in my thesis to simulate evolution of species based on terrain, but that is of course the A-life application of genetic algorithms.
The problems GA are good at are hill-climbing problems. Problem is that normally it's easier to solve most of these problems by hand, unless the factors that define the problem are unknown, in other words you can't achieve that knowledge somehow else, say things related with societies and communities, or in situations where you have a good algorithm but you need to fine tune the parameters, here GA are very useful.
A situation of fine tuning I've done was to fine tune several Othello AI players based on the same algorithms, giving each different play styles, thus making each opponent unique and with its own quirks, then I had them compete to cull out the top 16 AI's that I used in my game. The advantage was they were all very good players of more or less equal skill, so it was interesting for the human opponent because they couldn't guess the AI as easily.
http://en.wikipedia.org/wiki/Genetic_algorithm#Problem_domains
You should ask yourself : "Can I (a priori) define a function to determine how good a particular solution is relative to other solutions?"
In the mona lisa example, you can easily determine if the new painting looks more like the source image than the previous painting, so Genetic Programming can be "easily" applied.
I have some projects using Genetic Algorithms. GA are ideal for optimization problems, when you cannot develop a fully sequential, exact algorithm do solve a problem. For example: what's the best combination of a car characteristcs to make it faster and at the same time more economic?
At the moment I'm developing a simple GA to elaborate playlists. My GA has to find the better combinations of albums/songs that are similar (this similarity will be "calculated" with the help of last.fm) and suggests playlists for me.
There's an emerging field in robotics called Evolutionary Robotics (w:Evolutionary Robotics), which uses genetic algorithms (GA) heavily.
See w:Genetic Algorithm:
Simple generational genetic algorithm pseudocode
Choose initial population
Evaluate the fitness of each individual in the population
Repeat until termination: (time limit or sufficient fitness achieved)
Select best-ranking individuals to reproduce
Breed new generation through crossover and/or mutation (genetic
operations) and give birth to
offspring
Evaluate the individual fitnesses of the offspring
Replace worst ranked part of population with offspring
The key is the reproduction part, which could happen sexually or asexually, using genetic operators Crossover and Mutation.