Which part of the CAP Theorem does CRDT Sacrifice?

CRDT or Conflict-Free Replicated Data Type follows a strong eventual consistency guarantee, essentially meaning consistency is guaranteed to be achieved at some point in time in the future.
My question is: is the Consistency part of the CAP theorem sacrificed, and if not, which part is?

CRDTs sacrifice consistency to achieve availability, at least in their most straightforward use, which does nothing to check that you have received inputs from all potential clients (nodes in the network).
However, a CRDT is a kind of data structure, not a distributed algorithm, so its behavior in a distributed environment depends on the full distributed algorithm it participates in.
Some similar ideas are discussed in https://blog.acolyer.org/2017/08/17/on-the-design-of-distributed-programming-models/:
Lasp is an example of an AP model that sacrifices consistency for availability. In Lasp, all data structures are CRDTs...

What is the need for dynamic consensus in hyperledger projects

I read that Hyperledger Sawtooth supports dynamic consensus, meaning the consensus algorithm can be changed dynamically. My question is: why is dynamic consensus needed, and when is it necessary to change the consensus algorithm dynamically? What would force us to change it?
I read the Fabric and Sawtooth documentation but could not find the rationale for dynamic consensus.
Nothing forces any blockchain to change consensus--you can keep the same consensus algorithm forever.
However, consensus algorithms are an active area of research, and new and more efficient algorithms are being proposed. A blockchain may want to switch to a new algorithm, or the current algorithm may turn out to be unsuitable. For example, some algorithms are efficient with a few nodes (e.g., PBFT) but are O(n^2), meaning the number of messages grows quadratically as nodes are added, so they do not scale.
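To make the O(n^2) growth concrete, here is a rough sketch. It assumes a simplified model in which one all-to-all round has every node send one message to every other node; the function name is made up for illustration, and this is not a model of any particular PBFT implementation.

```python
def all_to_all_messages(n: int) -> int:
    """Messages in one round where each of n nodes messages every other node."""
    return n * (n - 1)


if __name__ == "__main__":
    # Doubling the node count roughly quadruples the message count.
    for n in (4, 8, 16, 32, 64):
        print(f"{n:>3} nodes -> {all_to_all_messages(n):>5} messages per round")
```

Under this model, going from 4 to 64 nodes (16x) raises the per-round message count from 12 to 4032 (336x), which is why such algorithms are usually kept to small networks.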
Some consensus algorithms are BFT (Byzantine Fault Tolerant), meaning they withstand bad or malicious actors (nodes). Other algorithms are just CFT (Crash Fault Tolerant), meaning they can withstand a node crashing, but not a bad actor. So one may want to change from a CFT algorithm to a BFT algorithm (such as PoET SGX).
Hyperledger Sawtooth, by the way, supports PoET, RAFT, and DevMode consensus. The last is for experimental and testing use only--not production. Soon to be added is PBFT consensus. For more detail on Sawtooth consensus, see https://github.com/danintel/sawtooth-faq/blob/master/consensus.rst

Data model design guidelines with GEODE

We are soon going to start a project with GEODE involving reference data, and I would like some guidelines for it.
As you know, in the financial reference data world there are complex relationships between reference data entities such as Instrument, Account, Client, etc., which may be stored in a database in 3NF.
If my queries are mostly read-intensive and require joins across tables (2-5 tables), what's the best way to handle them in an in-memory grid?
Case 1:
Create a separate region for each table in the database and then do a similar join using OQL, as you would in the database?
Even then, you have to design carefully so that related entities are always co-located within the same partition.
Or model 1-to-many and many-to-many relationships using an object graph?
Case 2:
If you know what your join queries look like, create a view model per join query having equi-join characteristics.
Confusion:
(1) I have one join query requiring Employee and Department using emp.deptId = dept.deptId [OK, fantastic: one region with such a view model exists].
(2) I have another join query requiring Employee, Department, Salary, and Address joins to address a different requirement.
So again I have to create a view model for (2), which will contain Employee and Department data similar to (1). This may soon reach the memory threshold.
Changes in the database can still be managed by event listeners, but what are the recommendations for that?
Thanks,
Dharam
I think your general question is pretty broad, and there isn't just one recommended approach that covers all use cases (primarily all the analytical views/models of your data required by your application(s)).
Such questions involve many factors, such as the size of individual data elements, the volume of data, the frequency of access or access patterns originating from the application or applications, the timely delivery of information, how accurate the data needs to be, the size of your cluster, the physical resources of each (virtual) machine, and so on. Thus, any given approach will require application tuning, GemFire tuning, and JVM tuning regardless of your data model. Still, a carefully crafted data model can determine the extent of such tuning.
In GemFire specifically, such tuning will involve different configuration such as, but not limited to: data management policies, eviction (LRU-based, with a destroy or overflow-to-disk action) and expiration (time-to-live or idle-timeout, or perhaps custom) settings along with different eviction/expiration thresholds, maybe storing data in Off-Heap memory, employing different partition strategies (PartitionResolver), and so on.
For example, if your Address information is relatively static, unchanging (i.e. actual "reference" data) then you might consider storing Address data in a REPLICATE Region. Data that is written to frequently (typically "transactional" data) is better off in a PARTITION Region.
Of course, as you know, any PARTITION data (managed in separate Regions) you "join" in a query (using OQL) must be collocated. GemFire/Geode does not currently support distributed joins.
Additionally, certain nodes could host certain Regions, thus dividing your cluster into "transactional" vs. "analytical" nodes, where the analytical-based nodes are updated from CacheListeners on Regions in transactional nodes (be careful of this), or perhaps better yet, asynchronously using an AEQ with AsyncEventListeners. AEQs can be separately made highly available and durable as well. This transactional vs analytical approach is the basis for CQRS.
The size of your data is also impacted by the form in which it is stored, i.e. serialized vs. not serialized, and GemFire's proprietary serialization format (PDX) is quite optimal compared with Java Serialization. It all depends on how "portable" your data needs to be and whether you can keep your data in serialized form.
Also, you might consider how expensive it is to join the data on-the-fly. That is, if you are able to aggregate, transform and enrich data at runtime relatively cheaply (compute vs. memory/storage), then you might consider using GemFire's Function Execution service, bringing your logic to the data rather than the data to your logic (the fundamental basis of MapReduce).
You should know, and I am sure you are aware, GemFire is a Key-Value store, therefore mapping a complex object graph into separate Regions is not a trivial problem. Dividing objects up by references (especially many-to-many) and knowing exactly when to eagerly vs. lazily load them is an overloaded problem, especially in a distributed, replicated data store such as GemFire where consistency and availability tradeoffs exist.
There are different APIs and frameworks to simplify persistence and querying with GemFire. One of the more notable approaches is Spring Data GemFire's extension of Spring Data Commons Repository abstraction.
It also might be a matter of using the right data model for the job. If you have very complex data relationships, then perhaps creating analytical models using a graph database (such as Neo4j) would be a simpler option. Spring also provides great support for Neo4j, led by the Neo4j team.
Any design choice you make will most likely involve a hybrid approach. Often the path is not clear, since it really "depends" (i.e. on the application and data access patterns, load, all that).
But one thing is certain: make sure you have a good working knowledge and understanding of the underlying data store and its data management capabilities, particularly as it pertains to consistency and availability, beginning with this.
Note, there is also a GemFire Slack channel as well as an Apache DEV mailing list you can use to reach out to the GemFire experts and the community of (advanced) GemFire/Geode users if you have more specific problems as you proceed down this architectural design path.

What is Commutative Replicated Data Type, and how does it do Replication without requiring consensus?

I am trying to do research on Commutative Replicated Data Type, and do not find any good definitions that aren't mired down in a ton of technical terms that makes it hard to understand how this allows for replication of data in a distributed environment without using consensus.
In layman's terms you can think of CRDTs as follows:
CRDTs are data types that achieve strong eventual consistency in distributed environments without explicit synchronization. Their attractive property is that they are both conflict-free and need no synchronization, which is a bit confusing at first, since you'd think there must be some sort of synchrony. For example, what happens if I write 2 and then 3 to the replicas, and replica A receives update 3 before 2 while replica B receives them in the correct order, first 2 then 3? Don't we then have a conflict?
The key to CRDTs is that they are restricted to specific operations for which the scenario above does not yield a conflict. The simplest case is incrementing an aggregated value: if A and B just add all the incoming values, they will both eventually end up with the value 5, without conflict, relying only on the weak assumptions of eventual consistency. Specifically, the requirements are typically that concurrent operations are commutative and that operations don't violate causal order.
Basically, CRDTs guarantee that all concurrent operations commute with each other.
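The summation example can be sketched with a grow-only counter (G-Counter). Note that this is the state-based variant of the idea (replicas exchange whole states and merge them) rather than the operation-based addition described above; the class and method names are illustrative and not taken from any library.

```python
class GCounter:
    """Grow-only counter: each replica only ever increases its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> total increments made by that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent, so
        # merges may arrive in any order, and more than once, safely.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self):
        return sum(self.counts.values())


if __name__ == "__main__":
    a, b = GCounter("A"), GCounter("B")
    a.increment(2)  # the update "2" lands on replica A
    b.increment(3)  # the update "3" lands on replica B
    a.merge(b)
    b.merge(a)
    b.merge(a)      # a duplicate delivery changes nothing
    print(a.value(), b.value())
```

Both replicas converge to 5 regardless of which merge happens first, with no coordination between them.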
Of course, if CRDTs could only implement simple summation they would not be very interesting. However, it turns out that clever people have developed CRDT algorithms for more useful things, such as editing a shared document (see "CRDT Logoot"), grow-only sets, etc.
But still, bear in mind that by removing the need for consensus, CRDTs are inherently limited, and there are many simple things they cannot do.
I hope this description makes sense to you. For a more exact description, I think the mathematical definition is the most intuitive.

Clustering: Cluster validation

I want to apply a clustering method to a large social network dataset. The problem is how to evaluate the clustering method. Yes, I can use external, internal, and relative cluster validation methods. I used normalized mutual information (NMI) as an external validation method on synthetic data. I generated synthetic datasets with 5 clusters, each with an equal number of nodes, with strong links inside each cluster and weak links between clusters. Then I evaluated spectral clustering and modularity-based community detection on these synthetic datasets. I applied the method with the best NMI to my real-world dataset and checked the error (cost function) of my algorithm, and the result was good. Is this a sound way to test my cost function, or should I also validate the clusters of my real-world dataset?
Thanks.
Try more than one measure.
There are a dozen cluster validation measures, and it's hard to predict which one is most appropriate for a problem. The differences between them are not really understood yet, so it's best if you consult more than one.
Also note that if you don't use a normalized measure, the baseline may be really high. So the measures are mostly useful to say "result A is more similar to result B than result C", but should not be taken as an absolute measure of quality. They are a relative measure of similarity.
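As an illustration of consulting more than one measure, here is a small sketch that computes NMI next to the plain (unadjusted) Rand index. It assumes NMI normalized by the arithmetic mean of the two entropies, which is only one of several conventions; the function names are made up for this example.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(
        (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    mean_entropy = (entropy(labels_a) + entropy(labels_b)) / 2
    # Two single-cluster labelings are identical partitions by convention.
    return 1.0 if mean_entropy == 0 else mi / mean_entropy


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which both labelings agree (not normalized)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    unrelated = [0, 1, 0, 1]
    print(nmi(truth, unrelated), rand_index(truth, unrelated))
```

For these two unrelated labelings NMI is 0 while the Rand index is still 1/3, a tiny example of an unnormalized measure having a non-zero baseline.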

Travelling Salesman and Map/Reduce: Abandon Channel

This is an academic rather than practical question. In the Traveling Salesman Problem, or any other which involves finding a minimum optimization ... if one were using a map/reduce approach it seems like there would be some value to having some means for the current minimum result to be broadcast to all of the computational nodes in some manner that allows them to abandon computations which exceed that.
In other words if we map the problem out we'd like each node to know when to give up on a given partial result before it's complete but when it's already exceeded some other solution.
One approach that comes immediately to mind would be for the reducer to have a means of providing feedback to the mapper. Suppose we had 100 nodes, with millions of paths being fed to them by the mapper. If the reducer feeds the best result back to the mapper, then that value could be included as an argument along with each new path (problem subset). In this approach the granularity is fairly rough ... the 100 nodes will each keep grinding away on their partition of the problem to completion and only get the new minimum with their next request from the mapper. (For a small number of nodes and a huge number of problem partitions/subsets to work across, this granularity would be inconsequential; also, it's likely that one could apply heuristics to the sequence in which the possible routes or problem subsets are fed to the nodes, to get rapid convergence towards the optimum and thus minimize the amount of "wasted" computation performed by the nodes.)
Another approach that comes to mind would be for the nodes to be actively subscribed to some sort of channel, or multicast or even broadcast from which they could glean new minimums from their computational loop. In that case they could immediately abandon a bad computation when notified of a better solution (by one of their peers).
So, my questions are:
Is this concept covered by any terms of art in existing map/reduce discussions?
Do any of the current map/reduce frameworks provide features to support this sort of dynamic feedback?
Is there some flaw with this idea ... some reason why it's stupid?
That's a cool topic, and there isn't much literature on it. So this is more of a brainstorming post than an answer to all your problems ;)
Every TSP can be expressed as a graph that might look like this one: (taken from the German Wikipedia)
Now you can run a graph algorithm on it. MapReduce can be used for graph processing quite well, although it has much overhead.
You need a paradigm that is called "Message Passing". It was described in this paper here: Paper.
And I blog'd about it in terms of graph exploration, it tells quite simple how it works. My Blogpost
This is the way how you can tell the mapper what is the current minimum result (maybe just for the vertex itself).
With all this in the back of your mind, it should be fairly natural to think of a branch and bound algorithm (which you described) to get to the goal. For example, pick a random start vertex and branch to every adjacent vertex. This causes a message to be sent to each adjacent vertex with the cost at which it can be reached from the start vertex (Map step). The vertex itself only updates its cost if the new cost is lower than the currently stored cost (Reduce step). Initially this cost should be set to infinity.
You do this over and over again until you've reached the start vertex again (obviously after having visited every other one). So you have to somehow keep track of the currently best way to reach a vertex; this can be stored in the vertex itself, too. And every now and then you have to bound this branching and cut off branches that are too costly; this can be done in the reduce step after reading the messages.
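The map and reduce steps described above can be sketched in a single process as repeated rounds of message passing. This only shows the cost-propagation part (shortest cost from a start vertex), not a full TSP solver; the graph layout and function name are assumptions for illustration, and edge weights are assumed to be non-negative so that the loop terminates.

```python
import math


def propagate_costs(graph, start):
    """graph maps each vertex to a dict of {neighbor: edge weight}."""
    cost = {vertex: math.inf for vertex in graph}  # initially infinity
    inbox = {start: [0]}
    while inbox:
        outbox = {}
        for vertex, messages in inbox.items():
            best = min(messages)  # "reduce": keep only the cheapest offer
            if best < cost[vertex]:
                cost[vertex] = best
                # "map": tell each neighbor what it costs to arrive via here
                for neighbor, weight in graph[vertex].items():
                    outbox.setdefault(neighbor, []).append(best + weight)
            # otherwise the branch is cut off: no messages are sent onward
        inbox = outbox
    return cost


if __name__ == "__main__":
    graph = {
        "A": {"B": 1, "C": 4},
        "B": {"A": 1, "C": 2},
        "C": {"A": 4, "B": 2},
    }
    print(propagate_costs(graph, "A"))
```

A vertex that receives no improvement sends nothing further, which is the bounding step: costly branches simply stop producing messages.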
Basically this is just a mix of graph algorithms in MapReduce and a kind of shortest paths.
Note that this won't necessarily yield the optimal route between the nodes; it is still a heuristic. And you're just parallelizing an NP-hard problem.
BUT, a little self-advertising again (maybe you've already read it in the blog post I linked): there is an alternative abstraction to MapReduce that has far less overhead for this kind of graph processing. It is called BSP (Bulk Synchronous Parallel). It is freer in its communication and its computing model, so I'm sure this can be implemented a lot better with BSP than with MapReduce. The channels you mentioned can be realized better with it.
I'm currently involved in a Summer of Code project that targets these SSSP problems with BSP. Maybe you want to have a look if you're interested. This could then be a partial solution; it is described very well in my blog, too. SSSP's in my blog
I'm excited to hear some feedback ;)
It seems that Storm implements what I was thinking of. It's essentially a computational topology (think of how each compute node might be routing results based on a key/hashing function to the specific reducers).
This is not exactly what I described, but might be useful if one had a sufficiently low-latency way to propagate current bounding (i.e. local optimum information) which each node in the topology could update/receive in order to know which results to discard.
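The "abandon when the bound is exceeded" idea can be sketched in a single process, with a shared variable standing in for the broadcast channel. This is a plain branch-and-bound sketch, not tied to Storm or any map/reduce framework; the names are illustrative, and distances are assumed to be non-negative so that pruning on the partial cost is safe.

```python
import math


def tsp_branch_and_bound(dist):
    """dist[i][j] is the distance from city i to city j; tours start at city 0."""
    n = len(dist)
    # Stands in for the current minimum that would be broadcast to all nodes.
    best = {"cost": math.inf, "tour": None}

    def extend(tour, visited, cost):
        if cost >= best["cost"]:
            return  # abandon: this partial tour already exceeds the bound
        if len(tour) == n:
            total = cost + dist[tour[-1]][tour[0]]  # close the loop
            if total < best["cost"]:
                best["cost"], best["tour"] = total, tour[:]
            return
        for city in range(n):
            if city not in visited:
                visited.add(city)
                tour.append(city)
                extend(tour, visited, cost + dist[tour[-2]][city])
                tour.pop()
                visited.remove(city)

    extend([0], {0}, 0)
    return best["cost"], best["tour"]


if __name__ == "__main__":
    dist = [
        [0, 1, 4, 3],
        [1, 0, 2, 5],
        [4, 2, 0, 1],
        [3, 5, 1, 0],
    ]
    print(tsp_branch_and_bound(dist))
```

In a distributed setting each worker would hold its own copy of the bound and refresh it from the channel; a stale bound only costs wasted work, never correctness, because any pruning against an older (larger) bound is still valid.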