How does a bitcoin SPV client know it didn't miss a block from a full node?

A bitcoin SPV (Simplified Payment Verification) client doesn't have to trust any full node. It submits a bloom filter to a full node so that the node processes only a small subset of all blocks and transactions on the client's behalf.
But what if an SPV client connects to a malicious full node, and that node deliberately withholds a block that contains a transaction the SPV client is interested in?
How can an SPV client protect itself against missing a transaction for its wallet address because a full node withheld it?

The SPV node connects to multiple, semi-trusted nodes to prevent this attack. Only a single node needs to be honest.
First, while the SPV client cannot easily be fooled into thinking a transaction is in a block when it is not, the reverse is not true. A full node can simply lie by omission, leading an SPV client to believe a transaction has not occurred. This can be considered a form of denial of service. One mitigation strategy is to connect to a number of full nodes, and send the requests to each node. However, this can be defeated by network partitioning or Sybil attacks, since identities are essentially free, and it can be bandwidth intensive. Care must be taken to ensure the client is not cut off from honest nodes.
From https://bitcoin.org/en/operating-modes-guide#potential-spv-weaknesses
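To make the mitigation concrete, here is a minimal sketch of the multi-peer check in Python. The peer classes are hypothetical stand-ins for real P2P connections; an actual SPV client would speak the Bitcoin network protocol and verify any returned merkle proof against the block header's merkle root.

    # Sketch: detecting a lie-by-omission by asking several peers.
    class HonestPeer:
        def __init__(self, chain_txs):
            self.chain_txs = chain_txs        # txids this node will report

        def has_transaction(self, txid):
            return txid in self.chain_txs

    class LyingPeer(HonestPeer):
        def has_transaction(self, txid):
            return False                      # lies by omission

    def tx_seen_by_any_peer(peers, txid):
        # A single honest peer is enough: a merkle proof it returns can be
        # verified independently against the block header, so the client
        # need not trust a majority.
        return any(p.has_transaction(txid) for p in peers)

    peers = [LyingPeer({"abc123"}), LyingPeer({"abc123"}), HonestPeer({"abc123"})]
    print(tx_seen_by_any_peer(peers, "abc123"))   # True: omission detected

Note that this only helps while at least one honest peer is reachable; as the quote above says, a network partition or Sybil attack can still cut the client off from all honest nodes.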

Related

Hashgraph: what is it and how does it work?

Does anyone know what a hashgraph is and how it differs from a blockchain? If you could explain to me how it works, I would be pleased!
Thank you in advance!
Hedera Hashgraph is very different from a regular blockchain (which, as its name suggests, is a chain of blocks); it is a DAG (Directed Acyclic Graph).
You can read more about what a DAG is in the link provided. Beyond that, Hedera differs in that it tackles the problems below, which any DLT must overcome to become scalable, secure and fast without losing trust:
(1) Performance,
(2) Security,
(3) Governance,
(4) Stability,
(5) Regulatory Compliance.
How it solves all of this simultaneously without losing trust is something you can read more about in their whitepaper, which explains it beautifully, or refer to this article from Rejolut.
One primary problem with any public blockchain is the TPS, or transactions per second, it can handle. Hedera Hashgraph is miles ahead of the others here, being able to clock 100,000 TPS, which makes it a valuable and preferred option, as it offers services like the Crypto Service, File Service, Smart Contract Service and Consensus Service.
You can have a look at some of the explorers like Kabuto and Dragonglass to get an idea of the transaction speed and other details.
If you want to play around with Hedera Hashgraph, their docs are very well written, or you can use the console playground to work on the Smart Contract and Consensus services directly, and maybe use fairorder to do the account management for you to begin with.
Hedera does not use a blockchain. It uses a DAG (Directed Acyclic Graph): it keeps records of the communication of all the nodes in the DAG and timestamps all these messages. Sending messages between nodes is called gossiping. Its source code is not open source.
How does the gossip protocol work?
In hashgraph, when sending information between nodes, "Alice" will choose another member at random, such as "Bob". Then Alice will tell Bob all of the information she knows so far. Alice then repeats this information with a different random member. Bob repeatedly does the same, and all other members do the same. In this way, if a single member becomes aware of new information, it will spread exponentially fast through the community, until every member is aware of it.
The synchronization of information between two members through the gossip protocol is called a gossip sync. Upon completion of a gossip sync, each participating member commemorates the gossip sync with an event. An event is stored in memory as a data structure composed of a timestamp, an array of zero or more transactions, two parent hashes, and a cryptographic signature. The two parent hashes are the hash of the last event created by the self-parent prior to the gossip sync and the hash of the last event created by the other-parent prior to the gossip sync.
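As a rough illustration of that event structure, here is a sketch in Python. The field layout follows the description above, but the SHA-384 choice is my own assumption and the signature is a placeholder; a real member would sign the event with its private key.

    import hashlib, json, time

    def make_event(self_parent_hash, other_parent_hash, transactions):
        event = {
            "timestamp": time.time(),          # when the gossip sync completed
            "transactions": transactions,      # an array of zero or more txs
            "self_parent": self_parent_hash,   # last event by this member
            "other_parent": other_parent_hash, # last event by the gossip partner
        }
        body = json.dumps(event, sort_keys=True).encode()
        # Placeholder signature: a real member signs with its private key.
        event["signature"] = hashlib.sha384(body).hexdigest()
        return event

    event = make_event(
        hashlib.sha384(b"alice-previous-event").hexdigest(),
        hashlib.sha384(b"bob-previous-event").hexdigest(),
        [{"from": "Alice", "to": "Bob", "amount": 5}],
    )
    print(event["signature"][:16], "...")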
Gossiping is the basis of the hashgraph consensus algorithm, which is asynchronous byzantine fault tolerant (aBFT). This means there is no specific time at which all the nodes reach consensus; in other words, the aBFT consensus mechanism has no block time. (There is no blockchain here, and no block reward either.)
Since gossiping is the fastest way of spreading information, it takes 3-5 seconds for the entire network to reach consensus.
Hedera claims that the network achieves 10,000 HBAR cryptocurrency transactions per second. (I personally do not believe those claims.)
You can also write smart contracts on Hedera. But you cannot execute 10,000 smart contract operations per second, because Hedera uses the EVM and the EVM is very slow.
Hedera has no slashing. If validators misbehave, they will not lose any of their crypto holdings, because Hedera claims its hashgraph gives validators less opportunity to misbehave. Validators receive 90% of the network fees.

How to prevent fake data from being sent to a Blockchain

I am developing a blockchain for IoT applications, where there are a number of gateways (miners) spread throughout the city and several nodes (sensors) connected to each of them. Each gateway can be added by an end user, so this is an untrusted environment. How can I make sure that there isn't fake data being sent to the chain by one of the miners?
I have looked up some consensus protocols but find that none fit this specific problem, since there is no value being exchanged.
Every miner sends a ping to a master server and receives from it the list of miners on the network. Then they connect to each other via p2p.
Any ideas of how I could solve this?
Blockchain can be used in both cases, permissionless or permissioned. If you want to prevent just anyone from broadcasting data, then you have to authenticate the nodes before they can join the network. If, even after authenticating the nodes, there is a chance that an authenticated node sends fake data, then a trust mechanism must be implemented: nodes verify the trustworthiness of the data's source and decide whether the node is trusted and whether to accept the data.
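A minimal sketch of that authentication step, assuming a pre-shared key per registered sensor. In practice you would more likely use asymmetric signatures (e.g. Ed25519), so the gateway never holds a sensor's signing secret; the registry and key here are illustrative.

    import hmac, hashlib

    # Hypothetical registry of authenticated sensors and their keys.
    REGISTERED_SENSORS = {"sensor-42": b"pre-shared-secret-key"}

    def sign_reading(sensor_id, payload: bytes) -> str:
        key = REGISTERED_SENSORS[sensor_id]
        return hmac.new(key, payload, hashlib.sha256).hexdigest()

    def gateway_accepts(sensor_id, payload: bytes, signature: str) -> bool:
        key = REGISTERED_SENSORS.get(sensor_id)
        if key is None:
            return False                          # unknown sensor: reject
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature)

    reading = b'{"temp": 21.5}'
    sig = sign_reading("sensor-42", reading)
    print(gateway_accepts("sensor-42", reading, sig))          # True
    print(gateway_accepts("sensor-42", b'{"temp": 99}', sig))  # False: tampered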
In order to prevent spamming or fake data being posted, it has to be added as a consensus rule to the protocol. Otherwise, it requires another layer that validates data based on off-chain data (but that doesn't prevent data from being stored in blocks). Blockchain is for achieving distributed consensus in a permissionless system. Restricting who can participate is not a permissionless system; it would be a centralized system, because someone has to determine who is allowed to participate.
The answer to the query lies in Blockchain Oracles.
Oracles to-date are centralized services, meaning any smart contract using such services has a single point of failure, which nullifies any benefits gained from the decentralized nature of smart contracts.
To fill this gap, Chainlink was developed as the first decentralized oracle that can provide external data to smart contracts. As a result, the security and determinism of smart contracts can be combined with the knowledge and breadth of real-world external events. Chainlink will provide a smart contract with access to any external API needed.
According to Chainlink here and here:
Blockchains and smart contracts cannot access data from outside of their network. In order to know what to do, a smart contract often needs access to information from the outside world that is relevant to the contractual agreement, in the form of electronic data, also referred to as oracles. These oracles are services that send and verify real world occurrences and submit this information to smart contracts, triggering state changes on the blockchain.
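For intuition only, here is a highly simplified oracle sketch. The API URL and submit_to_contract are hypothetical placeholders, and a single script like this is exactly the centralized single point of failure described above; a real deployment would use a decentralized oracle network such as Chainlink.

    import json, urllib.request

    PRICE_API = "https://api.example.com/price?pair=ETH-USD"  # hypothetical

    def fetch_external_price():
        # Pull a real-world value that the chain cannot see on its own.
        with urllib.request.urlopen(PRICE_API) as resp:
            return json.load(resp)["price"]

    def submit_to_contract(value):
        # Placeholder: in practice this would be a signed transaction
        # calling the smart contract's update function (e.g. via web3.py).
        print(f"submitting {value} on-chain")

    if __name__ == "__main__":
        submit_to_contract(fetch_external_price())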

Peers vs Members - Consul

Peer set - The peer set is the set of all members participating in log replication. For Consul's purposes, all server nodes are in the peer set of the local datacenter.
~ Quote from Official docs
What is the difference between peers and members then?
Why do we have the following two APIs then? (Is one not enough?)
i. /status/peers
ii. /agent/members
Could you please shed light on the internal details?
Is there a possibility of inconsistency in results of above APIs?
Here is a comparison of /agent/members, /status/peers and /catalog/nodes.
The possible difference in responses comes from each API endpoint getting its data from a different source.
/catalog/nodes: The request received by any agent is forwarded to the leader, and the leader provides the response from the catalog.
/agent/members: The agent receives the request and returns member information obtained from gossip. This can differ from the catalog endpoint (which follows from the log replication mechanism; Consul uses the Raft protocol).
/status/peers: This API returns the nodes participating in log replication.
Ideally, this should be the same as /catalog/nodes. But if there is a partition in the cluster, it is possible that, until the cluster recovers, not all members are taking part in log replication. In this case /catalog/nodes and /status/peers can give different results.
To understand this properly, you need to know the Raft protocol well. Reference.
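A quick way to see the three views side by side is to query the standard v1 HTTP API on a local agent. A sketch using the Python requests library, with the default host and port assumed:

    import requests

    CONSUL = "http://127.0.0.1:8500/v1"

    peers   = requests.get(f"{CONSUL}/status/peers").json()   # Raft peer set
    members = requests.get(f"{CONSUL}/agent/members").json()  # gossip view
    nodes   = requests.get(f"{CONSUL}/catalog/nodes").json()  # leader's catalog

    print("raft peers:    ", sorted(peers))
    print("gossip members:", sorted(m["Name"] for m in members))
    print("catalog nodes: ", sorted(n["Node"] for n in nodes))
    # During a partition, or while a node is joining or leaving, these
    # three views can legitimately disagree with one another.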

RabbitMQ clustering and mirror queues behavior behind the scenes

Can someone please explain what is going on behind the scenes in a RabbitMQ cluster with multiple nodes and queues in mirrored fashion when publishing to a slave node?
From what I read, it seems that all actions other than publishes go only to the master, and the master then broadcasts the effect of the actions to the slaves (this is from the documentation). From my understanding this means a consumer will always consume messages from the master queue. Also, if I send a request to a slave for consuming a message, that slave will do an extra hop by going to the master to fetch that message.
But what happens when I publish to a slave node? Will this node do the same thing of sending first the message to the master?
It seems there are so many extra hops when dealing with slaves that you could get better performance if you talked only to the master. But how do you handle master failure? Then one of the slaves will be elected master, so how do you know where to connect to?
Asking all of this because we are using a RabbitMQ cluster with HAProxy in front, so we can decouple the cluster structure from our apps. This way, whenever a node goes down, HAProxy will redirect to the living nodes. But we have problems when we kill one of the rabbit nodes. The connection to rabbit is permanent, so if it fails, you have to recreate it. Also, you have to resend the messages in such cases, otherwise you will lose them.
Even with all of this, messages can still be lost, because they may be in transit when I kill a node (in some buffers, somewhere on the network, etc.). So you have to use transactions or publisher confirms, which guarantee delivery only after all the mirrors have received the message. But here is another issue. You may have duplicate messages, because the broker might have sent a confirmation that never reached the producer (due to network failures, etc.). Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner.
Is there a way of avoiding this? Or do I have to decide whether I can accept losing a couple of messages versus duplicating some messages?
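For concreteness, publisher confirms with the pika Python client look roughly like the sketch below. The queue name, persistence settings and localhost broker are assumptions for illustration; a confirmation that is lost on its way back to the producer is precisely what forces a resend, and hence a possible duplicate.

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.confirm_delivery()                    # enable publisher confirms
    channel.queue_declare(queue="orders", durable=True)

    try:
        channel.basic_publish(
            exchange="",
            routing_key="orders",
            body=b"order-1234",
            properties=pika.BasicProperties(delivery_mode=2),  # persistent
            mandatory=True,
        )
        print("message confirmed by broker")
    except (pika.exceptions.UnroutableError, pika.exceptions.NackError):
        print("not confirmed: resend, and accept possible duplicates")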
Can someone please explain what is going on behind the scenes in a RabbitMQ cluster with multiple nodes and queues in mirrored fashion when publishing to a slave node?
This blog outlines exactly what happens.
But what happens when I publish to a slave node? Will this node do the same thing of sending first the message to the master?
The message will be redirected to the master Queue - that is, the node on which the Queue was created.
But how do you handle master failure? Then one of the slaves will be elected master, so you have to know where to connect to?
Again, this is covered here. Essentially, you need a separate service that polls RabbitMQ and determines whether nodes are alive or not. RabbitMQ provides a management API for this. Your publishing and consuming applications need to refer to this service, either directly or through a mutual data-store, in order to determine the correct node to publish to or consume from.
The connection to rabbit is permanent, so if it fails, you have to recreate it. Also, you have to resend the messages in this cases, otherwise you will lose them.
You need to subscribe to connection-interrupted events to react to severed connections. You will need to build in some level of redundancy on the client in order to ensure that messages are not lost. I suggest, as above, that you introduce a service specifically designed to interrogate RabbitMQ. Your client can attempt to publish a message to the last known active connection, and should this fail, the client might ask the monitor service for an up-to-date listing of the RabbitMQ cluster. Assuming that there is at least one active node, the client may then establish a connection to it and publish the message successfully.
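A sketch of that client-side failover with pika, assuming a static list of known node hostnames; in practice the list would come from the monitoring service described above:

    import pika

    KNOWN_NODES = ["rabbit-1.internal", "rabbit-2.internal", "rabbit-3.internal"]

    def connect_to_any_node():
        for host in KNOWN_NODES:
            try:
                return pika.BlockingConnection(pika.ConnectionParameters(host))
            except pika.exceptions.AMQPConnectionError:
                continue                          # node down: try the next one
        raise RuntimeError("no RabbitMQ node reachable")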
Even with all of this, messages can still be lost, because they may be in transit when I kill a node
There are certain edge-cases that you can't cover with redundancy, and neither can RabbitMQ. For example, consider the moment when a message lands in a Queue and the HA policy invokes a background process to copy the message to a backup node. During this process there is potential for the message to be lost before it is persisted to the backup node. Should the active node immediately fail, the message will be lost for good. There is nothing that can be done about this. Unfortunately, when we get down to the level of actual bytes travelling across the wire, there's a limit to the amount of safeguards that we can build.
Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner.
You can handle this a number of ways. For example, setting the message-ttl to a relatively low value will ensure that duplicated messages don't remain on the Queue for extended periods of time. You can also tag each message with a unique reference, and check that reference at the consumer level. Of course, this would require storing a cache of processed messages to compare incoming messages against; the idea being that if a previously processed message arrives, its tag will have been cached by the consumer, and the message can be ignored.
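A minimal sketch of that tag-and-cache approach, with an in-memory set standing in for whatever cache of processed references you choose; production systems would typically use Redis or a database, with an expiry matching the message-ttl:

    # The unique reference would be set by the publisher, e.g. as the
    # AMQP message_id property; the names here are illustrative.
    seen_ids = set()

    def process(body):
        print("processing", body)     # your idempotent business logic

    def handle_message(message_id, body):
        if message_id in seen_ids:
            return                    # duplicate: already processed, ignore
        process(body)
        seen_ids.add(message_id)

    handle_message("msg-001", b"payload")
    handle_message("msg-001", b"payload")   # second delivery is ignored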
One thing that I'd stress with AMQP and Queue-based solutions in general is that your infrastructure provides the tools, but not the entire solution. You have to bridge those gaps based on your business needs. Often, the best solution is derived through trial and error. I hope my suggestions are of use. I blog about a number of RabbitMQ design solutions, including the issues you mentioned, here if you're interested.

Why is transport security only hop to hop?

I have read at several places that transport security is only hop to hop (vs. endpoint to endpoint), and thus has limited use in internet scenarios where there may be several hops in-between your endpoints. First, is this correct? Second, why is transport security only hop to hop? What is preventing the intermediary nodes from simply relaying what they have gotten from their respective receivers?
What they mean when they say that transport security provides only hop-by-hop protection is that at the intermediate steps, the incoming data stream is decrypted, and the intermediary can see the message in plain text if it so wishes. The intermediary does encrypt the message again before passing it on to the next node. So if the intermediary nodes are trusted nodes (your own servers), there is no harm in using transport security; but if the intermediary nodes are not owned by you, these nodes can see the plain message and your data is vulnerable.
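To make the distinction concrete, here is a small sketch of message-level protection layered on top of transport security, using the Python cryptography package's Fernet purely as a stand-in for a real message-security mechanism (in WCF this role is played by message security, e.g. WS-Security):

    from cryptography.fernet import Fernet

    # The key is shared only between the two endpoints, never with
    # the intermediaries that terminate each TLS hop.
    key = Fernet.generate_key()
    endpoint_cipher = Fernet(key)

    plaintext = b"account=42;amount=100"
    ciphertext = endpoint_cipher.encrypt(plaintext)

    # An intermediary that decrypts its TLS hop sees only `ciphertext`,
    # then re-encrypts it for the next hop.
    print(ciphertext)

    # Only the final endpoint, holding the key, recovers the message.
    print(endpoint_cipher.decrypt(ciphertext))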
This brings me to the question: what are intermediary nodes? Are these the nodes specified in clientVia? So if I don't have any clientVia entries (as is mostly the case), can I safely use transport security without the need for message-level security?
References:
http://www.silverlighthack.com/post/2008/12/10/WCF-101-Understanding-Transfer-Security-Visually.aspx. There is a nice diagram here, but per my understanding above, I think part of it is wrong.
http://msdn.microsoft.com/en-us/library/ff647370.aspx