What is the difference between Byzantine fault tolerance and Practical Byzantine fault tolerance?

As the title says, what is the difference between Byzantine fault tolerance and Practical Byzantine fault tolerance? I tried finding answers online but was not successful.

You can achieve Byzantine fault tolerance in several possible ways; the most straightforward is for all nodes to talk to all nodes, telling each other what value they plan to accept. This requires n * (n-1) messages in the network, with each of the n nodes sending n-1 messages to the other n-1 nodes.
Later down the line, Castro and Liskov proposed the pBFT algorithm, which can achieve Byzantine fault tolerance with fewer message exchanges. Their paper says:
"We implemented a Byzantine-fault-tolerant NFS service using our
algorithm and measured its performance. The results show that
our service is only 3% slower than a standard unreplicated NFS."
https://pmg.csail.mit.edu/papers/osdi99.pdf
Later, some attacks were found on their algorithm, so the solution is not fully Byzantine fault tolerant.
So, in short, practical Byzantine fault tolerant systems use the pBFT protocol, which is mostly Byzantine fault tolerant, with a couple of known possible attacks.
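For a sense of the arithmetic behind pBFT's fault model, here is a minimal Python sketch (the cluster sizes are just illustrative) of the standard sizing rule: tolerating f Byzantine nodes requires n >= 3f + 1 replicas, with quorums of 2f + 1 matching messages.

    # Standard pBFT sizing rule: n >= 3f + 1, quorum = 2f + 1.
    def max_byzantine_faults(n: int) -> int:
        return (n - 1) // 3

    def quorum_size(n: int) -> int:
        return 2 * max_byzantine_faults(n) + 1

    for n in (4, 7, 10):
        f = max_byzantine_faults(n)
        print(f"n={n}: tolerates f={f} Byzantine nodes, quorum={quorum_size(n)}")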

Related

Which part of the CAP Theorem does CRDT Sacrifice?

A CRDT, or Conflict-Free Replicated Data Type, follows a strong eventual consistency guarantee, essentially meaning consistency is guaranteed to be achieved at some point in the future.
My question is, is the Consistency part of the CAP theorem sacrificed or else which one is?
CRDTs sacrifice consistency to achieve availability, at least in the most straightforward use of them, which does nothing to check that you have received inputs from all potential clients (nodes in the network).
However, a CRDT is a kind of data structure, not a distributed algorithm, so its behavior in a distributed environment will depend on the full distributed algorithm it participates in.
Some similar ideas are discussed in https://blog.acolyer.org/2017/08/17/on-the-design-of-distributed-programming-models/:
Lasp is an example of an AP model that sacrifices consistency for availability. In Lasp, all data structures are CRDTs...
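To see why merges never conflict, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs; the class and node ids are my own illustration, not from Lasp or the linked post.

    # G-Counter: each node increments only its own slot; merge takes the
    # element-wise max, which is commutative, associative, and idempotent,
    # so replicas converge no matter the order of state exchange.
    class GCounter:
        def __init__(self, node_id: str):
            self.node_id = node_id
            self.counts = {}

        def increment(self, amount: int = 1) -> None:
            self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

        def merge(self, other: "GCounter") -> None:
            for node, count in other.counts.items():
                self.counts[node] = max(self.counts.get(node, 0), count)

        def value(self) -> int:
            return sum(self.counts.values())

    # Two replicas diverge, then converge after merging in either order.
    a, b = GCounter("a"), GCounter("b")
    a.increment(3)
    b.increment(5)
    a.merge(b)
    b.merge(a)
    assert a.value() == b.value() == 8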

How does a proposer know its proposal is not approved by a quorum of acceptors?

I am reading the "Paxos" article on the wiki, and it reads:
"Rounds fail when multiple Proposers send conflicting Prepare messages, or when the Proposer does not receive a Quorum of responses (Promise or Accepted). In these cases, another round must be started with a higher proposal number."
But I don't understand how the proposer tells the difference between its proposal not being approved and the messages simply taking more time to arrive.
One of the tricky parts to understanding Paxos is that the original paper and most others, including the wiki, do not describe a full protocol capable of real-world use. They only focus on the algorithmic necessities. For example, they say that a proposer must choose a number "n" higher than any previously used number. But they say nothing about how to actually go about doing that, the kinds of failures that can happen, or how to resolve the situation if two proposers simultaneously try to use the same proposal number (as in both choosing n=2). That actually completely breaks the protocol and would lead to incorrect results but I'm not sure I've ever seen that specifically called out. I guess it's just supposed to be "obvious".
Specifically to your question, there's no perfect way to tell the difference using the raw algorithm. Practical implementations typically go the extra mile by sending a Nack message to the Proposer rather than just silently ignoring it. There are plenty of other tricks that can be used but all of them, including the nacks, come with varying downsides. Which approach is best generally depends on both the kind of application employing Paxos and the environment it's intended to run in.
If you're interested, I put together a much longer-winded description of Paxos that includes many of the issues practical implementations must address in addition to the core components. It covers this issue along with several others.
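As an aside on the "choose a higher n" problem above, one common trick (an assumption on my part, not something the wiki spells out) is to make proposal numbers globally unique by encoding the node id into the low digits, so two proposers can never collide on the same n:

    # Hypothetical sketch: proposal numbers that are unique per node and
    # always increase past any previously seen number.
    NUM_NODES = 5

    def next_proposal_number(last_seen: int, node_id: int) -> int:
        round_no = last_seen // NUM_NODES + 1
        return round_no * NUM_NODES + node_id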
Specific to your question, it isn't possible for a proposer to distinguish between lost messages, delayed messages, crashed acceptors, or stalled acceptors. In each case you get no response. Typically an implementation will time out on getting less than a quorum of responses and resend the proposal on the assumption that messages were dropped or acceptors are rebooting.
Often implementations add "nack" messages as negative acknowledgements, an optimisation to speed up recovery. The proposer only gets "nack" responses from reachable nodes that have accepted a higher promise. The "nack" can carry both the highest promise and the highest instance known to be fixed. How this helps is outlined below.
I wrote an implementation of Paxos called TRex with some of these techniques, sticking as closely as possible to the description of the algorithm in the paper Paxos Made Simple. I wrote up a description of the practical considerations of timeouts and nacks in a blog post.
One of the interesting techniques it uses is for a timed-out node to make its first proposal with a very low number. This will always get "nack" messages. Why? Consider a three-node cluster where the network link breaks between a stable proposer and one other node. The other node will time out and issue a prepare. If it issues a high prepare, it will get a promise from the third node. This will interrupt the stable leader. You then have symmetry, where the two nodes that cannot message one another fight, with the leadership swapping and no forward progress.
To avoid this, a timed-out node can start with a low prepare. It can then look at the "nack" messages to learn from the third node that there is a leader who is making progress: the highest instance known to be fixed in the nack will be greater than its local value. The timed-out node can then refrain from issuing a high prepare and instead ask the third node to send it the latest fixed and accepted values. With that enhancement, a timed-out node can distinguish between a stable proposer crashing and the connection failing. Such "nack"-based techniques don't affect the correctness of the implementation; they are only an optimisation to ensure fast failover and forward progress.
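Here is a rough Python sketch (not TRex itself; the message shapes, transport helpers, and timeout value are all assumptions for illustration) of how a proposer's prepare round can end in three distinct ways, with silence being the ambiguous outcome the question asks about:

    import time

    QUORUM = 2          # quorum for an assumed three-node cluster
    TIMEOUT_SECS = 1.0  # assumed retry timeout

    def run_prepare_round(proposal_number, acceptors, send_prepare, recv):
        """Returns 'promised', 'nacked', or 'timeout'."""
        for acceptor in acceptors:
            send_prepare(acceptor, proposal_number)
        promises = 0
        deadline = time.monotonic() + TIMEOUT_SECS
        while time.monotonic() < deadline:
            msg = recv(timeout=deadline - time.monotonic())
            if msg is None:
                break  # nothing arrived before the deadline
            if msg.kind == "promise" and msg.number == proposal_number:
                promises += 1
                if promises >= QUORUM:
                    return "promised"
            elif msg.kind == "nack":
                # A nack proves a reachable acceptor holds a higher promise,
                # so there is no point waiting out the timeout.
                return "nacked"
        # Silence is ambiguous: lost messages, a slow network, and crashed
        # acceptors all look the same from here, so the only option is to
        # retry (typically with a higher number) after the timeout.
        return "timeout"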

What is the need for dynamic consensus in Hyperledger projects

I read that Hyperledger Sawtooth supports dynamic consensus, meaning the consensus algorithm can be changed dynamically. My question is: what is the need for this, and when is it necessary to change the consensus dynamically? What forces us to change the consensus dynamically?
I read the Fabric and Sawtooth documentation but could not find the necessity for dynamic consensus.
Nothing forces any blockchain to change consensus--you can keep the same consensus algorithm forever.
However, consensus algorithms are an active area of research, and new, more efficient algorithms are being proposed, so a blockchain may want to switch to a newer algorithm. Or perhaps the current algorithm is not suitable. For example, some algorithms are efficient with a few nodes (e.g., PBFT) but are O(n^2), meaning the number of messages grows quadratically as nodes are added, so they do not scale.
Some consensus algorithms are BFT, Byzantine Fault Tolerant, meaning they withstand bad or malicious actors (nodes). Other algorithms are just CFT, Crash Fault Tolerant, meaning they can withstand a node crashing, but not a bad actor. So one may want to change from a CFT algorithm to a BFT-friendly algorithm (such as PoET SGX).
Hyperledger Sawtooth, by the way, supports PoET, RAFT, and DevMode consensus. The last is for experimental and testing use only--not production. Soon to be added is PBFT consensus. For more detail on Sawtooth consensus, see https://github.com/danintel/sawtooth-faq/blob/master/consensus.rst

Can RabbitMQ suggest a prefetch/QoS value based on its metrics?

RabbitMQ allows QoS.
https://www.rabbitmq.com/consumer-prefetch.html
The question is more about the optimal values: can RabbitMQ propose optimal values based on its own metrics?
No, it doesn't. However, you can estimate it based on your use case.
I suggest you read this blog post from CloudAMQP: https://www.cloudamqp.com/blog/2017-12-29-part1-rabbitmq-best-practice.html
This is really well written and provides a lot of useful advice. In their section "How to set correct prefetch value?" they describe three cases:
Single/few consumers and short processing time: prefetch ~= round_trip / processing_time
Many consumers and short processing time: prefetch < round_trip / processing_time
Many consumers and long processing time: prefetch set to 1
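As an illustration of applying the first rule of thumb with the pika client (the round-trip and processing times below are made-up numbers; you would measure your own workload):

    import pika

    round_trip_ms = 50   # assumed network round trip to the broker and back
    processing_ms = 5    # assumed time to process one message

    # Single consumer, short processing time: prefetch ~= round_trip / processing_time
    prefetch = max(1, round(round_trip_ms / processing_ms))

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.basic_qos(prefetch_count=prefetch)  # per-consumer prefetch limit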

What is cyclic and acyclic communication?

So I've already searched to see if a question like this was posted before, but I wasn't able to find an answer I liked.
I've been working with some PLCs and variable frequency drives lately and thought it was about time I finally found out what cyclic and non-cyclic communication is.
So correct me if I'm wrong, but when I think of cyclic data, I think of data that is continuously being updated and is able to be sent/sampled to other devices. With relation to what I'm doing, I'm thinking that the variable frequency drive is able to update information such as speed and frequency that can be sampled/read from a PLC. This is what I would consider cyclic communication, something that is always updating a certain type of information that can be sent as data.
So I might be completely wrong with this assumption, and that leaves me with the question of what exactly would be considered non-cyclic or acyclic communication.
Any help?
Forenote: This is mostly a programming-based site, and while your question does have an answer within the context of programming, I happen to know that in your industrial application the importance of cyclic vs acyclic tends to be very hardware/protocol-specific, and is really more of a networking problem than a programming one.
Cyclic data is not simply "continuous" data. In industry, it refers to data delivered on a guaranteed (or at least highly predictable) schedule. If the data stream were to violate the schedule, it could have disastrous consequences (a VFD misses its shutdown command by a fraction of a second, and you lose your arm!).
Acyclic data is still reliable for machine control; it is just delivered in a less deterministic way (on the order of milliseconds, sometimes up to several seconds). When accessing a single VFD with a single PLC, you will probably never notice this bursting behavior, and in fact you may perceive smoother and quicker data transmissions. From the hardware interface perspective, acyclic data transfer does not provide as strong a guarantee about whether or when one machine will respond to the request of another.
Both forms of data transfer deliver data at speeds much faster than humans can deal with, but in certain applications they will each have their own consequences.
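To make the distinction concrete in code, here is a toy Python sketch; the cycle period, register names, and read function are stand-ins for whatever your PLC/VFD library actually provides.

    import time

    CYCLE_PERIOD = 0.010  # assumed 10 ms bus cycle

    def read_drive_register(name):
        # Stub standing in for a real fieldbus read from the drive.
        return {"speed_rpm": 1500, "heatsink_temp_c": 41}[name]

    # Cyclic: the same process data is exchanged every cycle, on a fixed
    # schedule, whether or not anything changed.
    deadline = time.monotonic()
    for _ in range(5):
        speed = read_drive_register("speed_rpm")
        deadline += CYCLE_PERIOD
        time.sleep(max(0.0, deadline - time.monotonic()))

    # Acyclic: a one-off, on-demand parameter read with no schedule and
    # no hard guarantee about when (or whether) the answer arrives.
    temp = read_drive_register("heatsink_temp_c")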
Cyclic networks usually must take the form of master/slave, where only one device is allowed to speak at a time, and answers are always returned, even if just to confirm that the message was received. Cyclic networks usually do not allow as many devices on the same wire, and often they will pass larger amounts of data at slower rates.
Acyclic networks might be thought of as a bit more chaotic, but since they skip handshaking formalities, they can often cheat more devices onto the network and get higher speeds, all at the same time. This comes at the cost of occasional data collisions/bottlenecks, and sometimes requests for critical data are simply ignored/lost with no indication of failure or success from the target (in that case the sender will likely sit waiting desperately for a message it will not get, which often triggers process watchdogs that shut down the system).
From a programmer perspective, not much is different between these two transmission types.
What will usually dictate the choice:
how many devices are running on the wire (sometimes this forces the answer right away)
how sensitive/volatile the data they want to share is (how useful are messages if they are a little late)
how much data they might be required to send at any given time (shifting demands on a network that already produces race conditions can be hard to anticipate/avoid if you don't see them coming beforehand)
Hope that helps :)