What happens to blocks in a deleted blockchain?

A chain has blocks 0..n. Simultaneously, at topologically distant points in the network, miner A adds block x as block n+1 and miner B adds block y as block n+1. Then miner A adds another block z as block n+2.
As I understand it, the longer chain, containing blocks x and z as n+1 and n+2, will survive, and the chain containing block y as n+1 will be deleted.
What happens to block y in the shorter chain, which contained valid transactions? If its chain is deleted and it is not in the longer chain, are those transactions lost? That doesn't make sense to me.
In a large, active network these asynchronous actions must always be happening, like routing in IP.
How is this handled?
Thanks,
David

This happens every day in the Bitcoin network. The nodes will follow the chain with the most cumulative proof-of-work, commonly described as the longest valid chain.
Even the miner who proposed block y as n+1 will reorganize his chain if he receives a longer valid chain than his own. The transactions in the orphaned block y are not lost: they go back into the mempool and are typically included in a later block, unless the winning chain already confirmed them.
To avoid the dangers of a chain reorganization, most exchanges wait for 3 or 4 confirmation blocks before accepting a transaction as final.
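To make the reorganization concrete, here is a minimal Python sketch (the block structures and function name are illustrative, not real node code) of a node switching to a longer chain and returning the orphaned transactions to its mempool:

    # Hypothetical sketch: each chain is a list of blocks; each block carries
    # a set of transaction ids. Real nodes compare cumulative proof-of-work,
    # not just length, but length works as a stand-in here.

    def reorganize(current_chain, candidate_chain, mempool):
        """Switch to the candidate chain if it is longer, and put any
        transactions that were only in the abandoned blocks back into
        the mempool so they can be mined again."""
        if len(candidate_chain) <= len(current_chain):
            return current_chain, mempool          # keep what we have

        # find the fork point (last common block)
        fork = 0
        while (fork < len(current_chain) and fork < len(candidate_chain)
               and current_chain[fork]["hash"] == candidate_chain[fork]["hash"]):
            fork += 1

        abandoned_txs = {tx for blk in current_chain[fork:] for tx in blk["txs"]}
        confirmed_txs = {tx for blk in candidate_chain[fork:] for tx in blk["txs"]}

        # transactions from orphaned blocks are not lost: they return to the
        # mempool unless the winning chain already confirmed them
        mempool = mempool | (abandoned_txs - confirmed_txs)
        return candidate_chain, mempool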

Blockchain depends on each block containing the hash of the previous block, so if a block were deleted or altered, the hash links would no longer validate and the entire chain built on top of it would become invalid. This is explained in more detail in the IBM Hyperledger project docs.


How does a proposer know its proposal is not approved by a quorum of acceptors?

I am reading "paxos" on wiki, and it reads:
"Rounds fail when multiple Proposers send conflicting Prepare messages, or when the Proposer does not receive a Quorum of responses (Promise or Accepted). In these cases, another round must be started with a higher proposal number."
But I don't understand how the proposer can tell the difference between its proposal not being approved and the responses simply taking longer to arrive.
One of the tricky parts to understanding Paxos is that the original paper and most others, including the wiki, do not describe a full protocol capable of real-world use. They only focus on the algorithmic necessities. For example, they say that a proposer must choose a number "n" higher than any previously used number. But they say nothing about how to actually go about doing that, the kinds of failures that can happen, or how to resolve the situation if two proposers simultaneously try to use the same proposal number (as in both choosing n=2). That actually completely breaks the protocol and would lead to incorrect results but I'm not sure I've ever seen that specifically called out. I guess it's just supposed to be "obvious".
Specifically to your question, there's no perfect way to tell the difference using the raw algorithm. Practical implementations typically go the extra mile by sending a Nack message to the Proposer rather than just silently ignoring it. There are plenty of other tricks that can be used but all of them, including the nacks, come with varying downsides. Which approach is best generally depends on both the kind of application employing Paxos and the environment it's intended to run in.
If you're interested, I put together a much longer-winded description of Paxos that includes many of the issues practical implementations must address in addition to the core components. It covers this issue along with several others.
Specific to your question it isn't possible for a proposer to distinguish between lost messages, delayed messages, crashed acceptors or stalled acceptors. In each case you get no response. Typically an implementation will timeout on getting less than a quorum response and resend the proposal on the assumption messages were dropped or acceptors are rebooting.
Often implementations add "nack" messages as negative acknowledgements, as an optimisation to speed up recovery. The proposer only gets "nack" responses from reachable nodes that have accepted a higher promise. The "nack" can carry both the highest promise and the highest instance known to be fixed. How this helps is outlined below.
I wrote an implementation of Paxos called TRex with some of these techniques sticking as closely as possible to the description of the algorithm in the paper Paxos Made Simple. I wrote up a description of the practical considerations of timeouts and nacks on a blog post.
One of the interesting techniques it uses is for a timed-out node to make its first proposal with a very low number. This will always get "nack" messages. Why? Consider a three-node cluster where the network link breaks between the stable proposer and one other node. The other node will time out and issue a prepare. If it issues a high prepare, it will get a promise from the third node. This will interrupt the stable leader. You then have symmetry where the two nodes that cannot message one another fight, with the leadership swapping back and forth and no forward progress.
To avoid this, a timed-out node can start with a low prepare. It can then look at the "nack" messages to learn from the third node that there is a leader who is making progress: it will see that the highest instance known to be fixed in the nack is greater than its local value. The timed-out node can then refrain from issuing a high prepare and instead ask the third node to send it the latest fixed and accepted values. With that enhancement, a timed-out node can distinguish between a stable proposer crashing and the connection failing. Such "nack"-based techniques don't affect the correctness of the implementation; they are only an optimisation to ensure fast failover and forward progress.
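A rough Python sketch of that timeout-and-nack handling (the message fields and function names here are illustrative, not taken from TRex or the Paxos papers):

    # Illustrative sketch of a proposer reacting to nacks after a timeout.
    # A nack carries the acceptor's highest promised ballot and the highest
    # log index (instance) it knows to be fixed.

    from collections import namedtuple

    Nack = namedtuple("Nack", ["highest_promise", "highest_fixed_instance"])

    def handle_timeout(my_fixed_instance, send_low_prepare):
        """On timeout, probe with a deliberately low ballot number.
        The prepare will be rejected, but the nacks tell us whether a
        stable leader is still making progress somewhere."""
        nacks = send_low_prepare(ballot=0)          # always gets nacked

        if not nacks:
            return "retry"                          # nothing reachable: resend later

        max_fixed = max(n.highest_fixed_instance for n in nacks)
        if max_fixed > my_fixed_instance:
            # someone else is still fixing values: don't fight for leadership,
            # just ask a reachable node to retransmit what we missed
            return "catch_up"
        # no evidence of a live leader: escalate with a genuinely high ballot
        return "issue_high_prepare"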

Who dictates the 'global rules' of a Cryptocurrency such as block reward amount, how many zeroes the hash must start with, block size, etc?

I've watched a lot of cryptocurrency lectures on how they work and I think I am about 75% of the way to completely understanding how they work. One question has been bothering me though.
When a miner solves a block, he gets a block reward made out of thin air. For Bitcoin this is currently around 12.5 BTC. What dictates this specific amount of money? Is it the locally run software? If so, can't that be tampered with? Does the miner ask other clients what the current block reward amount is? If so, how does it know it's being fed the right, updated information?
Same goes for the number of zeroes found on the hash. If a miner finds a hash value like 00000000000000000000000000000000000000000000000000000000010101111110110101010101, he would then check how many zeroes it starts with. Let's say the current solve requires 30 zeroes. Who makes that rule? How is it updated? At what point does it change from 30 -> 31? Who makes the decision to increase or decrease it? What if one computer thinks it's 29 and not 30? What stops people from gaming the system?
Same with block sizes. What stops miners from sending blocks with increased maximum sizes? Would clients reject the block if it doesn't match a certain size? If so, how do they know the maximum number of transactions allowed? Who told them?
A single miner can tamper with a block as much as they want, changing the block reward or difficulty or double-spending, but such a block will not be accepted by the rest of the network.
The Bitcoin network needs consensus to accept a specific block. As long as more than half of the network consists of "good" nodes, the tampered block will be rejected.
This functionality is implemented in the node software and the Bitcoin P2P protocol: every full node independently checks each block against the consensus rules (reward schedule, difficulty target, block size) and only relays blocks that pass.
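For example, the block subsidy is never asked for over the network; every node computes it locally from the block height using the halving schedule compiled into its software and rejects any block that claims more. A simplified Python sketch (the 50 BTC starting subsidy and 210,000-block halving interval are Bitcoin's real parameters; the function names are illustrative):

    HALVING_INTERVAL = 210_000            # blocks between reward halvings
    INITIAL_SUBSIDY = 50 * 100_000_000    # 50 BTC, in satoshis

    def block_subsidy(height):
        """Subsidy every node computes for itself from the block height."""
        halvings = height // HALVING_INTERVAL
        if halvings >= 64:
            return 0
        return INITIAL_SUBSIDY >> halvings

    def validate_coinbase(height, claimed_reward, total_fees):
        """Reject blocks whose miner pays themselves more than the rules allow.
        Every full node runs this same check, so a tampered block is simply
        ignored by the rest of the network."""
        return claimed_reward <= block_subsidy(height) + total_fees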

In distributed systems, why do we need 2n+1 nodes to handle n failures?

I recently read that in order to handle the failure of n nodes, the data has to be replicated on 2n+1 nodes. I could not understand the reasoning behind that. Could someone please explain?
This (2f+1 processes to tolerate f faults, with f playing the role of the n in your question) is the valid quorum configuration that requires the smallest total number of processes.
In detail: for fault tolerance, you can never wait for a read or write to reach all processes, otherwise you would block as soon as one of them crashes. You need to read from and write to subsets.
Given that you're not writing to and reading from all of them, you have to be sure that (1) you read from at least one process that has the latest version of the data, and that (2) any two writes intersect in at least one process, such that one of them aborts. These are the quorum rules.
Finally, having n = 2f+1 processes and writing to f+1 of them is the configuration that needs the smallest n for a given f. You could still obey the quorum rules with a larger write quorum and a smaller read quorum, but then you would need more processes to ensure that writes never block waiting for failed processes.
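A small Python check that makes the arithmetic concrete: with n = 2f+1 nodes and quorums of size f+1, a quorum still exists after f crashes, and any two quorums overlap in at least one node, so a read always meets the latest write.

    def quorums_intersect(n, quorum_size):
        """Two quorums of the given size always overlap iff
        2 * quorum_size > n (pigeonhole argument)."""
        return 2 * quorum_size > n

    for f in range(1, 4):
        n = 2 * f + 1
        q = f + 1
        # with f nodes crashed, a quorum of f+1 live nodes still exists,
        # and any two quorums share at least one node
        assert n - f >= q
        assert quorums_intersect(n, q)
        print(f"f={f}: n={n}, quorum={q} -> tolerates {f} failures")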
Ok, so think about it like this. A polynomial of degree n is defined uniquely by n+1 points. The proof is rather long and requires some knowledge of linear algebra, so I will just link it here. Thus, if you want to send a message, you can derive the polynomial that encodes it (ideally using some mutually agreed standard, so the person who receives the message will know what to do with it). But how many points do you send through your channel? If you know the channel will drop n packets and the receiver requires n+1 packets to read the message, you interpolate your polynomial using the n+1 points you want to send, calculate n additional points that lie on that polynomial, and send the whole set of 2n+1 points, so that the receiver will always be able to reconstruct your polynomial and read the message.
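A toy Python illustration of that idea, using Lagrange interpolation over the rationals (real erasure codes such as Reed-Solomon work over finite fields, which this sketch glosses over):

    from fractions import Fraction

    def interpolate(points, x):
        """Evaluate at x the unique degree-(k-1) polynomial through k points,
        using Lagrange interpolation."""
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total

    # message encoded as a degree-2 polynomial: n = 2, so n+1 = 3 points define it
    poly = lambda x: 7 + 3 * x + 2 * x * x
    sent = [(x, poly(x)) for x in range(5)]      # 2n+1 = 5 points sent

    received = sent[:1] + sent[3:]               # channel dropped n = 2 points
    # any 3 surviving points reconstruct the polynomial (and thus the message) exactly
    assert all(interpolate(received[:3], x) == y for x, y in sent)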

Why should a producer write to an odd number of servers in the case of a distributed message queue?

In a recent interview, I was asked to design a distributed message queue. I modeled it as a multi-partitioned system where each partition has a replica set with one primary and one or more secondaries for high availability. The writes from the producer are processed by the primary and are replicated synchronously, which means a message is not committed unless a quorum of the replica set has applied it. He then identified the potential availability problem when the primary of a replica set dies (which means a producer writing to that partition won't be able to write until a new primary is elected for the replica set) and asked me about the solution where the producer writes the same message to multiple servers (favoring availability over consistency). He then asked me what the difference would be if the client wrote to 2 servers vs 3 servers, a question I failed to answer. In general, I thought it was more of an even vs. odd question and I guessed it had something to do with quorums (i.e. majority), but failed to see how it would impact a consumer reading data. Needless to say, this question cost me the job and still continues to puzzle me to this day. I would appreciate any solutions, insights, or suggestions.
Ok, this is what I understood from your question about the new system:
You won't have a primary replica anymore, so you don't need to elect one and will instead work on a simple quorum-based system to get higher availability? If that is correct, then maybe this will give you some closure :) - otherwise feel free to correct me.
Assuming you read and write from/to multiple random nodes and those nodes don't replicate the data on their own, the solution lies in the principle of quorums. In simple cases that means you always need to write to and read from at least n/2 + 1 nodes. So if you write to 3 nodes you can have up to 5 servers, while if you write to 2 nodes you can only have up to 3 servers.
The slightly more complicated quorum is based on the rules:
R + W > N
W > N / 2
(R - read quorum, W - write quorum, N - number of nodes)
This gives you some more variations for:
how many servers you need to read from
how many servers you can have in general
(The small check below makes this concrete.)
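For instance, a small illustrative Python helper that enumerates the allowed read quorums:

    def read_quorums(n, w):
        """Read-quorum sizes allowed by R + W > N and W > N / 2."""
        if not (w > n / 2):
            return []                      # write quorum is not a majority of n
        return [r for r in range(1, n + 1) if r + w > n]

    print(read_quorums(3, 2))   # W=2, N=3 -> R in [2, 3]
    print(read_quorums(5, 2))   # W=2, N=5 -> [] : 2 is not a majority of 5
    print(read_quorums(5, 3))   # W=3, N=5 -> R in [3, 4, 5]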
From my understanding of the question, that is what I would have used to formulate an answer, and I don't think the difference between 2 and 3 has anything to do with even or odd numbers. Do you think this is the answer your interviewer was looking for, or did I miss something?
Update:
To clarify, following the thoughts in the comments: which value would be accepted?
In the quorum as I've described it, you would accept the latest value. This can be determined with a simple logical clock. The quorums guarantee that you will retrieve at least one item with the latest information. And in the case of a network partition or failure, when you can't read a quorum, you will know that it is impossible to guarantee retrieving the latest value.
On the other hand, you suggested reading all items and accepting the most common one. I'm not sure that this alone will always guarantee having the latest item.
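A minimal sketch of that read path, assuming each stored value carries a logical clock (version number); the names are illustrative:

    def quorum_read(replies, n):
        """Given (version, value) pairs from the replicas that answered,
        return the latest value if we heard from a read quorum, else fail."""
        r = n // 2 + 1                      # majority read quorum
        if len(replies) < r:
            raise RuntimeError("no quorum: cannot guarantee the latest value")
        # quorum intersection guarantees at least one reply carries the
        # latest committed version, so the highest version wins
        return max(replies, key=lambda reply: reply[0])[1]

    # e.g. 5 replicas, 3 answered; version 7 is the newest
    print(quorum_read([(5, "a"), (7, "b"), (6, "a")], n=5))   # -> "b"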

In OOP, if objects send each other messages, won't there be easily an infinite loop happening?

In an Apple paper about Object Oriented Programming, it depicts objects sending messages to each other. So Appliance can send a message to Valve requesting water, and the Valve object can then send a message back to the Appliance, "giving the water".
(to send a message is actually calling the method of the other object)
So I wonder, won't this cause a subtle infinite loop in some way that even the programmer did not anticipate? For example, suppose we program two objects, each pulling the other by gravity: one sends the other object a message that there is a "pull" force, the other object's method gets called and in turn sends a message back to the first object, and they go into an infinite loop. If the computer program has only 1 process or 1 thread, it will simply loop forever and never run anything else in that program (even if the two objects finally collide, they still continue to pull each other). How does this programming paradigm work in reality to prevent this?
Update: this is the Apple paper: http://developer.apple.com/library/mac/documentation/cocoa/conceptual/OOP_ObjC/OOP_ObjC.pdf
Update: for all the people who just look at this obvious example and say "You are wrong! Programs should be finite, blah blah blah", well, what I am aiming at is: what if there are hundreds or thousands of objects, and they send each other messages, and on receiving a message they might in turn send other messages to other objects? Then how can you be sure there can't be an infinite loop and that the program can make any further progress?
On the other hand, for the people who said "a program must be finite": what about a simple GUI program? It has an event loop, which is an infinite loop, running UNTIL the user explicitly asks the program to stop. And what about a program that keeps looking for prime numbers? It can keep looking (with BigNum support, such as in Ruby, so an integer can have any number of digits), writing each larger prime it finds to the hard disk (or writing to disk once every million primes it finds, so it finds a million primes, writes the millionth to the hard drive, then keeps looking for the next million and writes the two-millionth, writing only one number each time, not a million of them). For a computer with 12 GB of RAM and 2 TB of hard drive, maybe you can say it could take 20 years for the program to exceed the capability of the computer, when the hard disk is full or when the 12 GB of RAM cannot hold all the variables (it might take billions of years before an integer cannot fit in 1 GB of RAM), but as far as the program is concerned it just keeps running; only when the memory manager cannot allocate another BigNum, or the hard drive is full, is an exception raised and the program forced to stop, yet the program is written to run indefinitely. So not all programs HAVE TO BE written to be finite.
Why should Appliance request for water repeatedly?
Why should Valve bombard Appliance saying that water is being provided?
In theory - it's likely to create infinite loop, but in practice - it comes down to proper modeling of Your objects.
Appliance should send an ICanHasWater message only once, wait for a response, and either receive water or receive an answer that water cannot be provided now but may be in the future, when Appliance might want to try requesting it again.
That's why I went into the 2 objects and gravity example instead.
An infinite loop of gravity-effect calculations between objects would happen only if you triggered a new calculation from within a calculation.
I think the common approach is to introduce a Time concept and calculate gravitation for a particular TimeFrame, then move on to the next one for the next round of calculation. That way your World keeps control of the thread between TimeFrames, and your application can do something more useful than endless calculation of gravity effects.
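A minimal Python sketch of that time-step idea (the class names are made up for the example): the World drives the simulation in discrete frames, so objects never call each other back recursively.

    # Illustrative: the world owns the loop; bodies only update state when asked.
    G = 1.0   # toy gravitational constant

    class Body:
        def __init__(self, x, v, mass):
            self.x, self.v, self.mass = x, v, mass

        def apply_gravity(self, other, dt):
            """React to one 'pull' message for this frame; no call back."""
            d = other.x - self.x
            if d != 0:
                self.v += G * other.mass / (d * d) * (1 if d > 0 else -1) * dt

        def step(self, dt):
            self.x += self.v * dt

    def run(world, frames, dt=0.01):
        for _ in range(frames):            # bounded loop, not recursion
            for a in world:
                for b in world:
                    if a is not b:
                        a.apply_gravity(b, dt)
            for body in world:
                body.step(dt)

    run([Body(0.0, 0.0, 1.0), Body(10.0, 0.0, 1.0)], frames=100)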
Without OOP it is just as easy to create infinite loops unintentionally, whether you use imperative or functional programming languages. So I cannot see what is special about OOP in this case.
If you think of your objects as actors sending each other messages, it's not necessarily wrong to go into an infinite loop. GUI toolkits work this way. Depending on the programming language used, this is made obvious by a call to toolKit.mainLoop() or the like.
I think that even your example of modelling gravity with objects pulling at each other is not wrong per se. You have to ensure that something happens as a result of the message (i.e. the object is accelerated and moves a little), and you will get a rough discretization of the underlying formulae. You will want to check for collisions nevertheless, to make your model more complete :-)
Using this model requires some level of concurrency in your program to ensure that messages are processed in proper order.
In real-life implementations there's no infinite loop; there's infinite indirect recursion instead: A() calls B(), B() calls C(), and on some branch C() calls A() again. In your example, if Appliance sends GetWater, Valve sends HeresYourWaterSir immediately, and Appliance's handler of HeresYourWaterSir for whatever reason sends GetWater again, infinite indirect recursion will begin.
So yes, you're right, in some cases problems can happen. The OOP paradigm itself doesn't protect against that - it's up to the developer.
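A tiny Python illustration of that failure mode and a common guard against it (the class names and the flag are made up for the example):

    class Appliance:
        def __init__(self, valve):
            self.valve = valve
            self.waiting_for_water = False

        def request_water(self):
            if self.waiting_for_water:      # guard: don't re-request while a
                return                      # request is already outstanding
            self.waiting_for_water = True
            self.valve.get_water(self)

        def heres_your_water(self):
            self.waiting_for_water = False  # without this flag, re-requesting
            # here would restart the cycle and recurse until the stack overflows

    class Valve:
        def get_water(self, appliance):
            appliance.heres_your_water()

    valve = Valve()
    Appliance(valve).request_water()        # terminates: one round trip only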