Practical examples of client-centric consistency

In Distributed Systems: Principles and Paradigms by Maarten van Steen and Andrew S. Tanenbaum, 3rd edition, Chapter 7: Consistency and Replication,
I am trying to understand the difference between client-centric consistency and data-centric consistency.
I am looking for practical examples of the client-centric consistency models:
- Monotonic Reads.
- Monotonic Writes.
- Read your writes.
- Writes follow reads.
The book already gives an example of a web-based mail application to explain eventual consistency.
But I'm still not clear on where exactly these models are used.
If anyone could point me to some material, or explain some practical scenarios where these consistency models are used, it would be very helpful.
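For concreteness, here is a rough sketch of the kind of mechanism I imagine behind "read your writes" on top of eventually consistent replicas (the version-tracking session is my own assumption, not something taken from the book):

```python
# A toy sketch (my own illustration, not from the book) of a client session
# enforcing "read your writes" against eventually consistent replicas: the
# session remembers the version of its last write and refuses to read from a
# replica that has not yet seen that version.
import random

class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def write(self, value, version):
        self.version, self.value = version, value

    def read(self):
        return self.version, self.value

class Session:
    """Tracks the highest version this client has written or seen."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.last_seen_version = 0

    def write(self, value):
        # The write goes to one replica; replication to the others is assumed
        # to happen later, asynchronously.
        self.last_seen_version += 1
        self.replicas[0].write(value, self.last_seen_version)

    def read(self):
        # Read-your-writes: only accept a replica that is at least as up to
        # date as this session's last write. Remembering the version read
        # also gives monotonic reads (never accept an older state again).
        for replica in random.sample(self.replicas, len(self.replicas)):
            version, value = replica.read()
            if version >= self.last_seen_version:
                self.last_seen_version = version
                return value
        raise RuntimeError("no sufficiently fresh replica available")

replicas = [Replica(), Replica(), Replica()]
session = Session(replicas)
session.write("draft saved")
print(session.read())  # never returns stale data from a lagging replica
```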
Thanks!

Related

Which part of the CAP Theorem does CRDT Sacrifice?

A CRDT (Conflict-free Replicated Data Type) provides a strong eventual consistency guarantee, essentially meaning that all replicas are guaranteed to converge to the same state at some point in the future.
My question is: is it the consistency part of the CAP theorem that is sacrificed, and if not, which one is?
CRDTs sacrifice consistency to achieve availability, at least in their most straightforward use, which does nothing to check that you have received inputs from all potential clients (nodes in the network).
However, a CRDT is a kind of data structure, not a distributed algorithm, so its behavior in a distributed environment depends on the full distributed algorithm it participates in.
Some similar ideas are discussed in https://blog.acolyer.org/2017/08/17/on-the-design-of-distributed-programming-models/:
Lasp is an example of an AP model that sacrifices consistency for availability. In Lasp, all data structures are CRDTs...
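To make the trade-off concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs; it is a generic illustration rather than anything specific to Lasp. Each replica accepts increments locally with no coordination (so it stays available), and replicas only converge once they exchange and merge state:

```python
# A minimal G-Counter sketch: one entry per replica id, merged by
# element-wise maximum, which is commutative, associative, and idempotent,
# so replicas converge regardless of message order or duplication.

class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> count contributed by that replica

    def increment(self, amount: int = 1) -> None:
        # A replica can always apply its own increment locally,
        # without coordinating with the other replicas (availability).
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of all replicas' contributions.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise maximum: applying the same merge twice, or in a
        # different order, always yields the same result.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

# Two replicas accept writes independently, then converge after merging.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```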

Performance benchmark between boost graph and tigerGraph, Amazon Neptune, etc

This might be a controversial topic, but I am concerned about the performance of the Boost Graph Library vs commercial software such as TigerGraph, since we need to choose one.
I am inclined to choose Boost, but I am not sure whether its performance is good enough.
Disregarding anything around persistence and management, I am concerned with the core performance of Boost Graph's algorithms.
If it is good enough, we can build our application logic on top of it without worry.
Also, I looked at the results of the LDBC Social Network Benchmark (LDBC benchmark); it seems that TuGraph is the fastest...
Is LDBC's benchmark authoritative in the realm of graph analysis software?
Thank you
I would say that any benchmark question is a controversial topic, as benchmarks tend to represent a single workload, which may or may not be representative of yours. Additionally, performance is only one of the aspects you should look at, as each option is built to target different workloads and offers different features:
Boost is a library, not a database, so anything around persistence and management would fall on the application to manage.
TigerGraph is an analytics platform that is focused on running real-time graph analytics, such as deep link analysis.
Amazon Neptune is a fully managed service focused on highly concurrent transactional graph workloads.
All three have strong capabilities and will perform well when used as intended. I'd suggest you figure out which option best matches the type of workload you are looking to run, the type of support you need, and the amount of operational work you are willing to take on; that should make the choice more straightforward.

MongoDB / Redis / SQL concurrency pattern: read-modify-write by multiple processes

Relative DB newbie here.
I'm facing a recurring problem where multiple processes attempt read-modify-write operations against the same DB instance, be it MongoDB, Redis, or SQL.
In Redis, one solution is to leverage Lua scripting, which Redis executes atomically, but that may mean moving a considerable amount of application logic into Redis (for better or worse?).
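For example, something along these lines, a minimal sketch using redis-py against a local Redis server; the key name and limit are made up:

```python
# A minimal sketch of an atomic read-modify-write done as a Lua script:
# Redis runs the whole script as a single atomic operation, so no other
# client can interleave between the GET and the INCR. Assumes redis-py
# and a Redis server on localhost; key name and threshold are hypothetical.
import redis

r = redis.Redis()

# Increment a counter only if it is still below a limit, atomically.
LUA_BOUNDED_INCR = """
local current = tonumber(redis.call('GET', KEYS[1]) or '0')
if current < tonumber(ARGV[1]) then
    return redis.call('INCR', KEYS[1])
end
return current
"""

bounded_incr = r.register_script(LUA_BOUNDED_INCR)
new_value = bounded_incr(keys=["requests:today"], args=[1000])
```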
In SQL, it seems that transactions and stored procedures can achieve similar results, but this also risks moving too much application logic into the DB itself (for better or worse?).
MongoDB doesn't really have a concept of server-side scripting (the JavaScript solution seems to be deprecated).
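Though, as far as I understand, single-document updates that use operators like $inc are applied atomically on the server, which covers some read-modify-write cases without any scripting; a minimal sketch with pymongo (database, collection, and field names are made up):

```python
# A minimal sketch of an atomic single-document read-modify-write in MongoDB:
# the filter and the update are applied as one atomic operation on the
# server, so concurrent processes cannot interleave between read and write.
# Assumes pymongo and a local mongod; all names are hypothetical.
from pymongo import MongoClient, ReturnDocument

client = MongoClient()
inventory = client.shop.inventory

# Decrement stock only if enough items remain; returns the updated
# document, or None if the condition did not match.
doc = inventory.find_one_and_update(
    {"sku": "ABC-123", "stock": {"$gte": 1}},
    {"$inc": {"stock": -1}},
    return_document=ReturnDocument.AFTER,
)
```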
In the general sense, as implied above, it might be better (?) to keep application logic outside of the data store, so that the logic can be distributed and scaled across multiple service nodes.
But distributing application logic across multiple processes (nodes) that concurrently access a shared data store requires the read-modify-write cycle to be guarded against race conditions.
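To illustrate the race I mean, here is a toy sketch of the lost-update problem with an unguarded read-modify-write (pure Python, no real DB involved):

```python
# A toy illustration of the lost-update race: several workers read the same
# value, modify it in application code, and write it back, so some of the
# increments are silently overwritten.
import threading
import time

store = {"counter": 0}  # stands in for a shared record in the data store

def unguarded_increment():
    current = store["counter"]          # READ
    time.sleep(0.01)                    # another worker can sneak in here
    store["counter"] = current + 1      # WRITE clobbers any concurrent update

# In a real deployment these would be separate processes on separate nodes.
workers = [threading.Thread(target=unguarded_increment) for _ in range(10)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(store["counter"])  # usually far less than 10: updates were lost
```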
So my questions are:
for Redis or SQL, should I lean on the provided atomic scripting/transaction support to avoid any possible race, at the cost of putting more and more application logic into the data store, or
is the read-modify-write model the more common pattern for concurrent DB access, and if so, are there "standard" guidelines on how to synchronize concurrent read-modify-write cycles across multiple processes?
Thanks!
NoSQL databases are generally not ACID compliant; they are distributed databases, for example MongoDB, Redis, Cassandra, etc. These NoSQL databases satisfy either the CP or the AP side of the CAP theorem, while ACID-compliant databases such as RDBMSs sit on the CA side.
The use case for NoSQL databases is typically either read-heavy or write-heavy, not both. It is mostly about performance, i.e. speed and high availability.
Hope that is clear.

Vertical AND Horizontal scaling in databases

I am new to databases and am currently comparing RDBMS and key-value database systems. I understand that key-value NoSQL systems are optimized for horizontal scaling and relational database systems are optimized for vertical scaling. Is there a reason why vertical scaling is not efficient in key-value database systems? If not, why aren't key-value database systems used everywhere?
It's not as simple as you think.
There are a number of articles and talks on this controversial issue. While NoSQL systems have many benefits, they obviously have underlying issues too. Just to mention a few: NoSQL databases are relatively new, and an organization needs to invest in educating its engineers in order to make use of them. On the other hand, SQL has been around for a long time, which means it has a wealth of useful tools for monitoring, analysis, logging, etc. And the most important problem: due to the CAP theorem, you cannot have a database architecture that handles consistency, availability, and partition tolerance all at once. So NoSQL systems give up some features in order to scale; e.g. they might not be strongly consistent, offering eventual consistency instead.
But I recommend you dig further and not rely only on my answer. Technology changes fast, and tomorrow you may consider my opinion deprecated! While it is not new, I consider this article a good starting point.

How would I implement separate databases for reading and writing operations?

I am interested in implementing an architecture that has two databases, one for read operations and the other for writes. I have never implemented something like this; I have always built single-database, highly normalised systems, so I am not quite sure where to begin. This question has a few parts.
1. What would be a good resource to find out more about this architecture?
2. Is it just a question of replicating between two identical schemas, or would the schemas differ depending on the operations? Would normalisation vary too?
3. How do you ensure that data written to one database is immediately available for reading from the second?
Any further help, tips, resources would be appreciated. Thanks.
EDIT
After some research I found this article, which I found very informative, for those interested:
http://www.codefutures.com/database-sharding/
I also found this highscalability article very informative.
I'm not a specialist, but the read/write master database with read-only slaves is a "common" pattern, especially for big applications doing mostly read accesses, or for data warehouses:
it allows you to scale (you add more read-only slaves if required)
it allows you to tune the databases differently (for either efficient reads or efficient writes)
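To make the pattern concrete, here is a minimal sketch of routing at the application level, with writes going to the primary and reads going to a replica (the choice of PostgreSQL/psycopg2 and all connection details are assumptions on my part):

```python
# A minimal sketch of read/write routing in application code, assuming a
# PostgreSQL primary with a streaming-replication replica and psycopg2;
# hostnames, credentials, and table names are hypothetical.
import psycopg2

primary = psycopg2.connect("host=db-primary dbname=app user=app")
replica = psycopg2.connect("host=db-replica dbname=app user=app")

def save_order(order_id: int, total: float) -> None:
    # All writes go to the primary.
    with primary, primary.cursor() as cur:
        cur.execute(
            "INSERT INTO orders (id, total) VALUES (%s, %s)",
            (order_id, total),
        )

def list_orders() -> list:
    # Reads go to the replica; they may lag slightly behind the primary
    # unless synchronous replication is configured.
    with replica, replica.cursor() as cur:
        cur.execute("SELECT id, total FROM orders ORDER BY id")
        return cur.fetchall()
```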
What would be a good resource to find out more about this architecture?
There are good resources available on the Internet. For example:
Highscalability.com has good examples (e.g. Wikimedia architecture, the master-slave category,...)
Handling Data in Mega Scale Systems (starting from slide 29)
MySQL Scale-Out approach for better performance and scalability as a key factor for Wikipedia’s growth
Chapter 24. High Availability and Load Balancing in PostgreSQL documentation
Chapter 16. Replication in MySQL documentation
http://www.google.com/search?q=read%2Fwrite+master+database+and+read-only+slaves
Is it just a question of replicating between two identical schemas, or would your schemas differ depending on the operations, would normalisation vary too?
I'm not sure - I'm eager to read answers from experts - but I think the schemas are identical in traditional replication scenarios (the tuning may be different, though). Maybe people are doing more exotic things, but I wonder whether they rely on database replication in that case; it sounds more like "real-time ETL".
How do you insure that data written to one database is immediately available for reading from the second?
I guess you would need synchronous replication for that (which is of course slower than asynchronous). While some databases do support this mode, not all do AFAIK. But have a look at this answer or this one for SQL Server.
You might look up data warehouses.
These serve as 'denormalized for reporting' type databases, while you keep a normalized OLTP-style instance for the data maintenance.
I don't think the idea of 'immediate' equivalence will be a reality. There will be some delay while the new data and changes are migrated into the other system. The schedule and scope will be your big decisions here.
In regard to question 2:
It really depends on what you are trying to achieve by having two databases. If it is for performance reasons (which I suspect it may be), I would suggest you look into denormalizing the read-only database as needed for performance. If performance isn't an issue, then I wouldn't mess with the read-only schema.
I've worked on similar systems where there would be a read/write database that was only lightly used by administrative users. That database would then be replicated to the read-only database during a nightly process.
Question 3:
How immediate are we talking here? Less than a second? 10 seconds? Minutes?