Is it possible for a k-means cluster to have no members? - k-means

I'm currently using k-means for clustering files. A question occurred to me: is it possible for a cluster to have no members at all? If so, what happens to the centroid of that cluster? Does it keep its previous value?
Thanks

Yes, this can happen.
What happens then depends on your implementation: some leave the cluster center as is, some reduce k, some choose a new cluster center, and some crash badly.
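For illustration, here is a minimal sketch of a single 1-D k-means update step in which one centroid ends up with no members, using the "leave the center as is" strategy. The data and centroid values are hypothetical, and real implementations differ as noted above:

```python
# Minimal 1-D k-means step illustrating an empty cluster (hypothetical data).
# If a centroid attracts no points, one common strategy is to keep its old value.

def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            for p in points]

def update(points, labels, centroids):
    """Recompute each centroid; keep the old value if its cluster is empty."""
    new = []
    for i, c in enumerate(centroids):
        members = [p for p, lbl in zip(points, labels) if lbl == i]
        # empty cluster: fall back to the previous centroid value
        new.append(sum(members) / len(members) if members else c)
    return new

points = [1.0, 1.1, 0.9, 5.0, 5.2]
centroids = [1.0, 5.0, 100.0]   # the third centroid is far from every point
labels = assign(points, centroids)
centroids = update(points, labels, centroids)
# the third cluster stays empty, so its centroid remains at 100.0
```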

Related

Why is CPUUtilization higher than EngineCPUUtilization for ElastiCache, and why is SwapUsage above zero even though memory is available?

I am trying to understand the use case in which the EngineCPUUtilization metric is more relevant than CPUUtilization for AWS ElastiCache. Since Redis is single-threaded, I would expect EngineCPUUtilization to peak before CPUUtilization, but in my monitoring both metrics show similar values. Can anyone help me understand the exact use of these two metrics? Also, even though memory is available, the SwapUsage metric has a non-zero value. Why is this? Is swap used even when physical memory is available, which seems contrary to the usual definition of swap?

Can ZooKeeper coordinate across clusters?

Background:
We have a single cluster containing 2 app server nodes and 3 replicated DB nodes. Our application is a .NET app deployed on Linux app servers.
We will move to a multi-cluster architecture in separate continents in the near future. Those clusters will replicate our existing cluster.
Question 1:
Can I use ZooKeeper as a means to achieve consistency? Example: I would like to avoid an operation in NY and a similar operation in the EU occurring simultaneously and being inconsistent.
Everything I read about ZooKeeper points to a single-cluster solution, and I would like to avoid implementing my own distributed locks.
Question 2:
Do you have a suggestion different than implementing Zookeeper?
Many thanks.
The reason I did not get any responses is, more than likely, that what I am asking for is impossible at worst or, at best, would bring the clusters to their knees due to the synchronization demands.
I now believe that I will be better off allowing inconsistencies and dealing with them during the eventual consistency sync step.
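That sync step could take the shape of a last-writer-wins merge; the following is only a hedged sketch of the idea, with hypothetical keys, timestamps, and function names:

```python
# Hypothetical last-writer-wins merge for an eventual-consistency sync step:
# each cluster keeps a {key: (timestamp, value)} map, and the sync keeps the
# entry with the newer timestamp for every key.

def merge(local, remote):
    """Merge two {key: (timestamp, value)} maps, keeping the newer entry."""
    merged = dict(local)
    for key, (ts, val) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)
    return merged

ny = {"order-1": (10, "created"), "order-2": (12, "paid")}
eu = {"order-1": (11, "cancelled"), "order-3": (9, "created")}
state = merge(ny, eu)
# order-1 takes the EU write (newer timestamp); order-2 and order-3 survive
```

Last-writer-wins silently drops the older of two conflicting writes, so it only fits operations where that loss is acceptable.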

How to solve the hot partition problem in consistent-hashing load balancing

I am trying to understand how different load-balancing strategies work.
One way is to use a consistent hashing algorithm, where we divide the entire hash space into multiple virtual nodes and each physical node takes a set of vnodes.
What I am not able to understand is how the hot partition problem would be solved. Couldn't it happen that one particular node is busier than the rest?
Can someone add their experience handling similar use cases? Any pointer to the right document/literature would be helpful.
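To make the vnode idea concrete, here is a minimal sketch of a consistent-hash ring with virtual nodes (node names and vnode count are hypothetical). Spreading many vnodes per physical node around the ring is the standard first defense against hot partitions, because each physical node's load becomes the sum of many small, independently placed segments:

```python
# Minimal consistent-hash ring with virtual nodes (hypothetical node names).
# More vnodes per physical node smooths the key distribution across nodes.

import hashlib
from bisect import bisect, insort
from collections import Counter

class Ring:
    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, physical node)
        for node in nodes:
            for i in range(vnodes):
                h = int(hashlib.md5(f"{node}#{i}".encode()).hexdigest(), 16)
                insort(self._ring, (h, node))

    def lookup(self, key):
        """Return the physical node owning the first vnode at or after key's hash."""
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect(self._ring, (h,)) % len(self._ring)  # wrap around the ring
        return self._ring[idx][1]

ring = Ring(["node-a", "node-b", "node-c"])
counts = Counter(ring.lookup(f"key-{i}") for i in range(10000))
# with enough vnodes, each physical node serves a roughly equal share of keys
```

Note that vnodes only balance the *number* of keys; if a single key is itself hot, further techniques (e.g., replicating that key's reads across several nodes) are needed.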

Why should a producer write to an odd number of servers in a distributed message queue?

In a recent interview, I was asked to design a distributed message queue. I modeled it as a multi-partitioned system where each partition has a replica set with one primary and one or more secondaries for high availability. Writes from the producer are processed by the primary and replicated synchronously, which means a message is not committed unless a quorum of the replica set has applied it.
The interviewer then identified the potential availability problem when the primary of a replica set dies (a producer writing to that partition cannot write until a new primary is elected) and asked me about the solution where the producer writes the same message to multiple servers (favoring availability over consistency). He then asked what the difference would be if the client wrote to 2 servers vs. 3 servers, a question I failed to answer. In general, I thought it was more of an even-vs-odd question, and I guessed it had something to do with quorums (i.e., majority), but I failed to see how it would impact a consumer reading data. Needless to say, this question cost me the job and still puzzles me to this day. I would appreciate any solutions, insights, or suggestions.
OK, this is what I understood from your question about the new system:
You won't have a primary replica anymore, so you don't need to elect one, and will instead work on a purely quorum-based system for higher availability. If that is correct, then maybe this will give you some closure :) - otherwise feel free to correct me.
Assuming you read and write from/to multiple random nodes and those nodes don't replicate the data on their own, the solution lies in the principle of quorums. In simple cases, that means you always need to write to and read from at least n/2 + 1 nodes. So if you wrote to 3 nodes you could have up to 5 servers, while if you wrote to 2 nodes you could only have up to 3 servers.
The slightly more complicated quorum is based on the rules:
R + W > N
W > N / 2
(R - read quorum, W - write quorum, N - number of nodes)
This would give you some more variation in:
- how many servers you need to read from
- how many servers you can have in general
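The two rules above can be checked with a tiny helper (the function name is mine, purely for illustration):

```python
# Check the quorum rules R + W > N and W > N / 2 (hypothetical helper name).

def valid_quorum(r, w, n):
    """True if every read overlaps the latest write (R + W > N)
    and two concurrent writes cannot both succeed (W > N / 2)."""
    return r + w > n and w > n / 2

# writing to 3 of 5 servers: reading 3 guarantees overlap with the last write
assert valid_quorum(3, 3, 5)
# writing to 2 of 3 servers works with a read quorum of 2
assert valid_quorum(2, 2, 3)
# writing to only 2 of 5 cannot form a majority
assert not valid_quorum(4, 2, 5)
```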
From my understanding of the question, that is what I would have used to formulate an answer, and I don't think the difference between 2 and 3 has anything to do with even or odd numbers. Do you think this is the answer your interviewer was looking for, or did I miss something?
Update:
To clarify the thoughts from the comments: which value would be accepted?
In the quorum as I've described it, you would accept the latest value. That can be determined with a simple logical clock. The quorums guarantee that you will retrieve at least one item with the latest information. And in the case of a network partition or failure, when you can't read a quorum, you will know that it is impossible to guarantee retrieving the latest value.
On the other hand, you suggested reading all items and accepting the most common one. I'm not sure that this alone guarantees always having the latest item.
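A sketch of the logical-clock comparison, with hypothetical (clock, value) pairs, also shows why counting the most common value can go wrong:

```python
# Among the replies from a read quorum, accept the value carrying the
# highest logical clock (all clock values and payloads are hypothetical).

def latest(replies):
    """Pick the value with the highest logical clock from (clock, value) pairs."""
    return max(replies, key=lambda r: r[0])[1]

# two replicas already saw write #3, one replica is stale at write #2
assert latest([(3, "v3"), (2, "v2"), (3, "v3")]) == "v3"

# here a majority of replies is stale: latest() still returns "v3",
# while picking the most common value would wrongly return "v2"
assert latest([(2, "v2"), (2, "v2"), (3, "v3")]) == "v3"
```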

Can one have a Variable Length Chromosome for Particle Swarm Optimization?

Can the particles have different lengths? For instance, some have 10 genes and others have 20?
And if so, how would one go about updating the velocity, since the global best, local best, and current position could all be of different lengths?
It seems like you are looking for a solution with multiple swarms.
You could run a separate optimization for each number of "genes" you want to use.
Another option would be to add a variable holding the number of additional "genes" to the decision vector, and communicate only between particles that have an equal value for that variable.
Then one needs a way to communicate between the swarms, and possibly the ability for particles to join other swarms.
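The second option can be sketched roughly as follows; all particle data and coefficients are hypothetical, and the group best is a placeholder rather than a real fitness comparison:

```python
# Sketch: group particles by chromosome length and run the standard PSO
# velocity update only within each group, so all vectors being combined
# have the same length (all values here are hypothetical).

import random
from collections import defaultdict

random.seed(0)

def velocity_update(v, x, pbest, gbest, w=0.7, c1=1.4, c2=1.4):
    """Standard PSO velocity update; all four vectors share one length."""
    return [w * vi
            + c1 * random.random() * (p - xi)
            + c2 * random.random() * (g - xi)
            for vi, xi, p, g in zip(v, x, pbest, gbest)]

# particles carrying their own dimensionality (10 or 20 "genes")
particles = [
    {"x": [0.5] * 10, "v": [0.0] * 10, "pbest": [0.4] * 10},
    {"x": [0.2] * 20, "v": [0.0] * 20, "pbest": [0.1] * 20},
    {"x": [0.9] * 10, "v": [0.0] * 10, "pbest": [0.8] * 10},
]

# group by length so bests are only shared between compatible particles
by_length = defaultdict(list)
for p in particles:
    by_length[len(p["x"])].append(p)

for length, group in by_length.items():
    gbest = group[0]["pbest"]  # placeholder for the group's best solution
    for p in group:
        p["v"] = velocity_update(p["v"], p["x"], p["pbest"], gbest)
```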
I refer you to the paper by Niu et al., 2006, "MCPSO: A multi-swarm cooperative particle swarm optimizer".
Hope that helps.
Cheers!