I am reading about Paxos on Wikipedia, and it says:
"Rounds fail when multiple Proposers send conflicting Prepare messages, or when the Proposer does not receive a Quorum of responses (Promise or Accepted). In these cases, another round must be started with a higher proposal number."
But I don't understand how the proposer tells the difference between its proposal not being approved and the messages simply taking longer to arrive.
One of the tricky parts to understanding Paxos is that the original paper and most others, including the wiki, do not describe a full protocol capable of real-world use. They only focus on the algorithmic necessities. For example, they say that a proposer must choose a number "n" higher than any previously used number. But they say nothing about how to actually go about doing that, the kinds of failures that can happen, or how to resolve the situation if two proposers simultaneously try to use the same proposal number (as in both choosing n=2). That actually completely breaks the protocol and would lead to incorrect results but I'm not sure I've ever seen that specifically called out. I guess it's just supposed to be "obvious".
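To make the proposal-number collision concrete, here is a minimal sketch of one common way to keep numbers unique: pair a counter with a node identifier and compare the pairs lexicographically. This is an assumption for illustration, not something the paper or the wiki prescribes, and `ProposalNumber` and `next_proposal` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ProposalNumber:
    # Compared field by field: counter first, then node_id as a tie-breaker.
    counter: int
    node_id: int


def next_proposal(highest_seen: ProposalNumber, my_node_id: int) -> ProposalNumber:
    """Return a number higher than anything seen so far and unique to this node."""
    return ProposalNumber(highest_seen.counter + 1, my_node_id)


# Two proposers that have both seen (1, 0) pick the same counter
# but still end up with distinct, totally ordered proposal numbers.
seen = ProposalNumber(1, 0)
a = next_proposal(seen, my_node_id=1)
b = next_proposal(seen, my_node_id=2)
```

Because the node identifier is part of the number, two proposers can never both "choose n=2"; this assumes every node has a distinct identifier.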
Specifically to your question, there's no perfect way to tell the difference using the raw algorithm. Practical implementations typically go the extra mile by sending a Nack message to the Proposer rather than just silently ignoring it. There are plenty of other tricks that can be used but all of them, including the nacks, come with varying downsides. Which approach is best generally depends on both the kind of application employing Paxos and the environment it's intended to run in.
If you're interested, I put together a much longer-winded description of Paxos that includes many of the issues practical implementations must address in addition to the core components. It covers this issue along with several others.
Specific to your question, it isn't possible for a proposer to distinguish between lost messages, delayed messages, crashed acceptors or stalled acceptors. In each case you get no response. Typically an implementation will time out when it receives fewer than a quorum of responses and resend the proposal, on the assumption that messages were dropped or acceptors are rebooting.
Often implementations add "nack" messages as negative acknowledgements, an optimisation to speed up recovery. The proposer only gets "nack" responses from nodes that are reachable and that have made a higher promise. The "nack" can show both the highest promise and the highest instance known to be fixed. How this helps will be outlined below.
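As a rough sketch of what a proposer can conclude when its timeout fires, the function below classifies a round from the counts of promises and nacks received so far. It is an illustration under stated assumptions, not code from any particular implementation: `Outcome` and `classify_round` are hypothetical names, and a simple majority quorum is assumed.

```python
from enum import Enum


class Outcome(Enum):
    QUORUM = "quorum"      # enough promises: the round can proceed
    REJECTED = "rejected"  # enough nacks that a quorum is no longer possible
    UNKNOWN = "unknown"    # silence: lost, delayed, crashed or stalled


def classify_round(cluster_size: int, promises: int, nacks: int) -> Outcome:
    """Classify a round when the timeout fires, assuming a majority quorum."""
    quorum = cluster_size // 2 + 1
    if promises >= quorum:
        return Outcome.QUORUM
    # Nodes that nacked will not promise, so count who could still say yes.
    if cluster_size - nacks < quorum:
        return Outcome.REJECTED
    return Outcome.UNKNOWN
```

Only `UNKNOWN` needs the blind "resend and hope" path; `REJECTED` tells the proposer to pick a higher number straight away.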
I wrote an implementation of Paxos called TRex that uses some of these techniques while sticking as closely as possible to the description of the algorithm in the paper Paxos Made Simple. I wrote up a description of the practical considerations of timeouts and nacks in a blog post.
One of the interesting techniques it uses is for a timed-out node to make its first proposal with a very low number. This will always get "nack" messages. Why? Consider a three-node cluster where one network link breaks between a stable proposer and one other node. The other node will time out and issue a prepare. If it issues a high prepare, it will get a promise from the third node. This will interrupt the stable leader. You then have a symmetric situation where the two nodes that cannot message one another fight over the leadership, which swaps back and forth with no forward progress.
To avoid this, a timed-out node can start with a low prepare. It can then look at the "nack" messages to learn from the third node that there is a leader who is making progress. It will see this because the highest instance known to be fixed, as reported in the nack, will be greater than its local value. The timed-out node can then refrain from issuing a high prepare and instead ask the third node to send it the latest fixed and accepted values. With that enhancement, a timed-out node can distinguish between a stable proposer crashing and the connection failing. Such "nack"-based techniques don't affect the correctness of the implementation; they are only an optimisation to ensure fast failover and forward progress.
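The decision a timed-out node makes after its low prepare could be sketched roughly as follows. This is not TRex's actual code; `Nack`, `after_low_prepare` and the returned action names are hypothetical, and the nack is assumed to carry the two fields described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Nack:
    highest_promise: int         # highest proposal number the sender has promised
    highest_fixed_instance: int  # highest instance the sender knows to be fixed


def after_low_prepare(local_fixed_instance: int, nacks: List[Nack]) -> str:
    """Decide what a timed-out node does once nacks to its low prepare arrive."""
    if not nacks:
        # Nobody answered: we may be isolated, so keep probing.
        return "retry_low_prepare"
    if any(n.highest_fixed_instance > local_fixed_instance for n in nacks):
        # Someone has seen progress we missed, so a leader is alive:
        # catch up from that node instead of interrupting the leader.
        return "request_catch_up"
    # Reachable nodes have seen no progress either: the leader looks dead.
    return "issue_high_prepare"
```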
Part of federated learning research is based on operations performed on the communications between the server and clients, such as dropping part of the updates (dropping some of the gradients describing a model) exchanged between clients and server, or discarding an update from a specific client in a certain communication round. I want to know whether such capabilities are supported by the TensorFlow Federated (TFF) framework and how they are supported because, from a first look, it seems to me the level of abstraction of the TFF API does not allow such operations. Thank you.
TFF's language design intentionally avoids a notion of client identity; there is a desire to avoid making a "Client X" addressable and discarding its update or sending it different data.
However, there may be a way to run simulations of the type of computations mentioned. TFF does support expressing the following:
Computations that condition on properties of tensors, for example ignoring an update that has NaN values. One way this could be accomplished would be by writing a tff.tf_computation that conditionally zeros out the weight of updates before tff.federated_mean. This technique is used in tff.learning.build_federated_averaging_process().
Simulations that run different computations on different sets of clients (where a set may be a single client). Since the reference executor parameterizes clients by the data they possess, a writer of TFF could write two tff.federated_computations, apply them to different simulation data, and combine the results.
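The zero-weight idea from the first item can be illustrated in plain Python. This is deliberately not TFF code and uses no TFF API; `masked_weighted_mean` and its arguments are hypothetical names that only show the arithmetic a weighted mean would perform once bad updates have had their weight set to zero.

```python
import math
from typing import List, Sequence


def masked_weighted_mean(updates: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> List[float]:
    """Weighted mean of client updates, ignoring any update that contains NaN."""
    totals = [0.0] * len(updates[0])
    total_weight = 0.0
    for update, weight in zip(updates, weights):
        if any(math.isnan(x) for x in update):
            weight = 0.0  # conditionally zero out the weight of a bad update
        if weight == 0.0:
            continue  # skip entirely: NaN * 0.0 is still NaN
        for i, x in enumerate(update):
            totals[i] += weight * x
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("every update was dropped")
    return [t / total_weight for t in totals]


updates = [[1.0, 2.0], [float("nan"), 0.0], [3.0, 4.0]]
mean_without_nan = masked_weighted_mean(updates, [1.0, 1.0, 1.0])
```

Note that the condition is on a property of the update, not on which client sent it, which is consistent with TFF avoiding client identity.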
How should one decide between RabbitMQ, Kafka, Akka and Vert.x, or choose a combination of a few of these?
I have a use case where I want to ingest a huge amount of market data (half a TB each day) using a Java client API provided by an upstream.
We have currently implemented a distributed ETL using Akka but want to know what other improvements, better choices, or combinations of choices (like Akka + Kafka) can be considered.
Regarding the choice between akka and vert.x the following Devoxx talk is to the point:
https://www.youtube.com/watch?v=EMv_8dxSqdE
It compares concurrency models, among them event bus (vert.x being the example) and actor systems (akka being the example).
In the summary slide (1h00m40s into the talk), the difference is summarised as follows: akka provides hierarchical supervision for error handling, which is presented as an advantage over vert.x.
akka-stream-kafka (formerly reactive-kafka) feels like a natural fit to bridge the two and we are happy users of it, but cannot comment on how it compares to rabbitmq.
I am trying to construct a small application that will run on a robot with very limited sensory capabilities (NXT with gyroscope/ultrasonic/touch) and the actual AI implementation will be based on hierarchical perceptual control theory. I'm just looking for some guidance regarding the implementation as I'm confused when it comes to moving from theory to implementation.
The scenario
My candidate scenario will have 2 behaviors, one is to avoid obstacles, second is to drive in circular motion based on given diameter.
The problem
I've read several papers but could not determine how I should classify my virtual machines (layers of behavior?), how they should communicate with lower levels, and how they should resolve internal conflicts.
This is the list of papers I've gone through to find my answers, but sadly could not:
pct book
paper on multi-legged robot using hpct
pct alternative perspective
and the following ideas are the results of my brainstorming:
The avoidance layer would be part of my 'sensation layer' and that is because it only identifies certain values like close objects e.g. ultrasonic sensor specific range of values. The other second layer would be part of the 'configuration layer' as it would try to detect the pattern in which the robot is driving like straight line, random, circle, or even not moving at all, this is using the gyroscope and motor readings. 'Intensity layer' represents all sensor values so it's not something to consider as part of the design.
The second idea is to have both of the layers as 'configuration' because they would be responding to direct sensor values from the 'intensity layer', and they would be represented in a mesh-like design where each layer can send its reference values to the lower layer that interfaces with the actuators.
My problem here is how conflicting behavior would be handled (maneuvering around objects while continuing to run in circles). Should it be similar to Subsumption, where certain layers get suppressed/inhibited, with some sort of priority system? Forgive my short explanation, as I did not want to make this a lengthy question.
/Y
Here is an example of a robot which implements HPCT and addresses some of the issues relevant to your project, http://www.youtube.com/watch?v=xtYu53dKz2Q.
It is interesting to see a comparison of these two paradigms, as they both approach the field of AI at a similar level, that of embodied agents exhibiting simple behaviors. However, there are some fundamental differences between the two which means that any comparison will be biased towards one or the other depending upon the criteria chosen.
The main difference is one of biological plausibility. Subsumption architecture, although inspired by some aspects of biological systems, is not intended to theoretically represent such systems. PCT, on the other hand, is exactly that: a theory of how living systems work.
As far as PCT is concerned then, the most important criterion is whether or not the paradigm is biologically plausible, and criteria such as accuracy and complexity are irrelevant.
The other main difference is that Subsumption concerns action selection whereas PCT concerns control of perceptions (control of output versus control of input), which makes any comparison on other criteria problematic.
I had a few specific comments about your dissertation on points that may need clarification or may be typos.
"creatures will attempt to reach their ultimate goals through alternating their behaviour" - do you mean altering?
"Each virtual machine's output or error signal is the reference signal of the machine below it" - A reference signal can be a function of one or more output signals from higher-level systems, so more strictly this would be, "Each virtual machine's output or error signal contributes to the reference signal of a machine at a lower level".
"The major difference here is that Subsumption does not incorporate the ideas of 'conflict' " - Well, it does as the purpose of prioritising the different layers, and sub-systems, is to avoid conflict. Conflict is implicit, as there is not a dedicated system to handle conflicts.
"'reorganization' which require considering the goals of other layers." This doesn't quite capture the meaning of reorganisation. Reorganisation happens when there is prolonged error in perceptual control systems, and is a process whereby the structure of the systems changes. So, rather than just the reference signals changing, the connections between systems or the gain of the systems will change.
"Design complexity: this is an essential property for both theories." Rather than an essential property, in the sense of being required, it is a characteristic, though it is an important property to consider with respect to the implementation or usability of a theory. Complexity, though, has no bearing on the validity of the theory. I would say that PCT is a very simple theory, though complexity arises in defining the transfer functions, but this applies to any theory of living systems.
"The following step was used to create avoidance behaviour:" Having multiple nodes for different speeds seems unnecessarily complex. With PCT it should only be necessary to have one such node, where the distance is controlled by varying the speed (which could be negative).
Section 4.2.1 "For example, the avoidance VM tries to respond directly to certain intensity values with specific error values." This doesn't sound like PCT at all. With PCT, systems never respond with specific error (or output) values, but change the output in order to bring the intensity (in this case) input into line with the reference.
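A single distance-controlling node of this kind could be sketched as below. This is a minimal simulation under stated assumptions, not an NXT program: the names `speed_command` and `simulate`, the proportional output function, the gain and the toy world model (positive speed reduces the distance to the obstacle) are all illustrative choices.

```python
def speed_command(reference_distance: float, perceived_distance: float,
                  gain: float) -> float:
    """One control unit: output speed that pulls perceived distance to the reference."""
    error = reference_distance - perceived_distance
    # Too close (positive error) gives a negative speed, i.e. the robot backs away.
    return -gain * error


def simulate(start_distance: float, reference_distance: float = 30.0,
             gain: float = 0.5, dt: float = 0.1, steps: int = 200) -> float:
    """Toy world: driving forward at `speed` shrinks the distance by speed * dt."""
    distance = start_distance
    for _ in range(steps):
        speed = speed_command(reference_distance, distance, gain)
        distance -= speed * dt
    return distance


final_from_far = simulate(100.0)   # starts too far away, drives forward
final_from_near = simulate(10.0)   # starts too close, reverses
```

The same node handles "approach", "stop" and "retreat" without separate nodes per speed, because the speed is simply whatever output reduces the error.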
"Therefore, reorganisation is required to handle that conflicting behaviour. I". If there is conflict reorganisation may be necessary if the current systems are not able to resolve that conflict. However, the result of reorganisation may be a set of systems that are able to resolve conflict. So, it can be possible to design systems that resolve conflict but do not require reorganisation. That is usually done with a higher-level control system, or set of systems; and should be possible in this case.
In this section there is no description of what the controlled variables are, which is of concern. I would suggest being clear about what the goals (controlled variables) of each of the systems are.
"Therefore, the designed behaviour is based on controlling reference values." If it is only reference values that are altered then I don't think it is accurate to describe this as 'reorganisation'. Such a node would better be described as a "conflict resolution" node, which should be a higher-level control system.
Figure 4.1. The links annotated as "error signals" are actually output signals. The error signals are the links between the comparator and the output.
"the robot never managed to recover from that state of trying to reorganise the reference values back and forth." I'd suggest the way to resolve this would be to have a system at a level above the conflicted systems that takes inputs from one or both of them. The variable that it controls could simply be something like 'circular-motion-while-in-open-space', with its input a function of the avoidance system's perception and a function of its output used as the reference for the circular motion system. That may result in a low, or zero, reference value, essentially switching off the system and thus avoiding conflict or interference. Remember that a reference signal may be a weighted function of a number of output signals. Those weights, or signals, could be negative, inhibiting the effect of a signal and resulting in suppression in a similar way to the Subsumption architecture.
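The "reference as a weighted function of outputs" idea might look roughly like this. It is a sketch with made-up numbers, not a complete HPCT implementation; `weighted_reference`, the weights and the clipping at zero are assumptions chosen to show how a negative weight can switch the circular-motion system off.

```python
from typing import Sequence


def weighted_reference(outputs: Sequence[float], weights: Sequence[float]) -> float:
    """Reference for a lower-level system as a weighted sum of higher-level outputs.

    Clipped at zero here so that strong inhibition switches the system off
    rather than asking it to circle in the opposite direction.
    """
    total = sum(w * o for w, o in zip(weights, outputs))
    return max(0.0, total)


# outputs[0]: desired turn rate from the circular-motion goal
# outputs[1]: output of the avoidance system (0.0 in open space)
weights = [1.0, -1.0]  # negative weight: avoidance inhibits circling

ref_open_space = weighted_reference([0.5, 0.0], weights)     # circling active
ref_near_obstacle = weighted_reference([0.5, 0.8], weights)  # circling suppressed
```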
"In reality, HPCT cannot be implemented without the concept of reorganisation because conflict will occur regardless". As described above HPCT can be implemented without reorganisation.
"Looking back at the accuracy of this design, it is difficult to say that it can adapt." Provided the PCT system is designed with clear controlled variables in mind, PCT is highly adaptive, or resistant to the effects of disturbances, which is the PCT way of describing adaptation in the present context.
In general, it may just require clarification in the text, but as there is a lack of description of controlled variables in the model of the PCT implementation and that, it seems, some 'behavioural' modules used were common to both implementations it makes me wonder whether PCT feedback systems were actually used or whether it was just the concept of the hierarchical architecture that was being contrasted with that of the Subsumption paradigm.
I am happy to provide more detail of HPCT implementation though it looks like this response is somewhat overdue and you've gone beyond that stage.
Partial answer from RM of the CSGnet list:
https://listserv.illinois.edu/wa.cgi?A2=ind1312d&L=csgnet&T=0&P=1261
Forget about the levels. They are just suggestions and are of no use in building a working robot.
A far better reference for the kind of robot you want to develop is the CROWD program, which is documented at http://www.livingcontrolsystems.com/demos/tutor_pct.html.
The agents in the CROWD program do most of what you want your robot to do. So one way to approach the design is to try to implement the control systems in the CROWD programs using the sensors and outputs available for the NXT robot.
Approach the design of the robot by thinking about what perceptions should be controlled in order to produce the behavior you want to see the robot perform. So, for example, if one behavior you want to see is "avoidance" then think about what avoidance behavior is (I presume it is maintaining a goal distance from obstacles) and then think about what perception, if kept under control, would result in you seeing the robot maintain a fixed distance from objects. I suspect it would be the perception of the time delay between sending and receiving of the ultrasound pulses. Since the robot is moving in two-space (I presume) there might have to be two pulse sensors in order to sense the 2-D location of objects.
There are potential conflicts between the control systems that you will need to build; for example, I think there could be conflicts between the system controlling for moving in a circular path and the system controlling for avoiding obstacles. The agents in the CROWD program have the same problem and sometimes get into dead-end conflicts. There are various ways to deal with conflicts of this kind; for example, you could have a higher-level system monitoring the error in the two potentially conflicting systems and have it reduce the gain in one system or the other if the conflict (error) persists for some time.
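A higher-level monitor of that kind could be sketched as follows. This is only one possible reading of the suggestion; `ConflictMonitor`, its thresholds and the choice to halve the gain are hypothetical and are not taken from the CROWD program.

```python
class ConflictMonitor:
    """Watches the errors of two control systems and lowers one gain on persistent conflict."""

    def __init__(self, error_threshold: float, patience: int, reduction: float):
        self.error_threshold = error_threshold  # error size that counts as "in conflict"
        self.patience = patience                # consecutive steps before acting
        self.reduction = reduction              # factor applied to the gain
        self.steps_in_conflict = 0

    def update(self, error_a: float, error_b: float, gain_b: float) -> float:
        """Return the (possibly reduced) gain for system B after one time step."""
        both_large = (abs(error_a) > self.error_threshold
                      and abs(error_b) > self.error_threshold)
        self.steps_in_conflict = self.steps_in_conflict + 1 if both_large else 0
        if self.steps_in_conflict >= self.patience:
            self.steps_in_conflict = 0  # start counting again after acting
            return gain_b * self.reduction
        return gain_b


monitor = ConflictMonitor(error_threshold=0.2, patience=3, reduction=0.5)
gain = 1.0
history = []
for _ in range(3):
    # Both systems keep reporting large error, i.e. they are fighting.
    gain = monitor.update(0.9, 0.8, gain)
    history.append(gain)
```

Here the gain of system B is left alone for the first two steps and halved on the third, once the conflict has persisted for `patience` steps.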
Is there a limit to the number of Topics that may be created for a particular Domain in DDS? Is this implementation-dependent?
What is the maximum for RTI Connext DDS 5.0.0? I don't see it specified in the documentation.
The 'magic' limit of 240 you recalled was most likely either the maximum number of DomainParticipants that can run on a single computer on the same domain ID, which is 120, or else the maximum number of DDS domain IDs, which is 233. See http://community.rti.com/kb/what-maximum-number-participants-domain
As Reinier mentioned there are no intrinsic limits to the number of endpoints.
Gerardo
With Connext, the limiting factor is not so much the number of Topics, but more the number of DataReaders and DataWriters created in a particular Domain. Of course, each DataReader and DataWriter is associated with exactly one Topic, so indirectly there is a dependency on the number of Topics.
With regard to the maximum number of DataReaders and DataWriters in a Domain (often collectively referred to as Endpoints), the practical limitations depend on the resources in your system. Memory consumption due to administration of the topology of your DDS system will increase with the number of Endpoints. There is no hard or hard-coded limit on the number of Endpoints though.
If you have any particular scale in mind, I could indicate where you are in comparison to other users of the product.
This answer is indeed implementation dependent. My remarks apply to RTI Connext DDS and are not necessarily true for other DDS implementations.