Can Distributed SQL tools be applied as alternatives to two-phase commit or saga patterns for distributed transaction coordination?

I am currently reading Microservices Patterns, and it says there are essentially two approaches to distributed transactions: two-phase commit (2PC) and the saga pattern.
I've also heard about the currently evolving Distributed SQL (DSQL) tools such as CockroachDB, YugabyteDB and YDB, which support distributed ACID transactions through their own low-level communication between database nodes.
So the question is: could the latter be applied as an alternative to the former?
To illustrate the question, consider the following typical microservices distributed transaction example. Here we need 2PC or a saga to coordinate the red zone.
What I would like is to completely eliminate the need to develop and maintain coordination on the business-logic side, moving it instead into a general DSQL engine.
On the one hand, it is clear that such an approach in a way breaks the microservices' responsibility-segregation principle. Also, as far as I understand, DSQL tools evolved mostly for replication/sharding, not for coordinating microservices' business logic. On the other hand, it would greatly simplify developing and supporting such solutions.
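To make the "red zone" concrete, here is a minimal sketch of what it could look like if both services wrote to a single PostgreSQL-compatible DSQL cluster (CockroachDB or YugabyteDB, for instance): one ordinary ACID transaction instead of a saga or 2PC in the business logic. The table names, connection details and the psycopg2 driver are purely illustrative.

```python
# Minimal sketch: one ACID transaction spanning what would otherwise be two
# services' databases, via a PostgreSQL-compatible DSQL cluster and psycopg2.
# Table names, host and credentials are invented for illustration.
import psycopg2

conn = psycopg2.connect(host="dsql-cluster.example", port=5433,
                        dbname="shop", user="app", password="secret")
try:
    with conn:  # psycopg2 commits on success, rolls back on exception
        with conn.cursor() as cur:
            # what the "order service" would write
            cur.execute(
                "INSERT INTO orders (id, customer_id, total) VALUES (%s, %s, %s)",
                (1001, 42, 99.90))
            # what the "payment service" would write - same ACID transaction,
            # no saga or 2PC in the application code
            cur.execute(
                "INSERT INTO payments (order_id, amount, status) VALUES (%s, %s, %s)",
                (1001, 99.90, "CAPTURED"))
finally:
    conn.close()
```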

I think it depends on what you want to de-couple.
With distributed SQL databases, many operations on one database have no impact on the other databases in the same cluster: rolling upgrades (versus monolithic databases, where you have to take down all applications sharing the same database), scaling up, scaling out.
You can also, within the same cluster, dedicate nodes to specific applications, or move them to different regions. And with PostgreSQL compatibility, you can serve many use cases (relational, JSON, key-value, time series...) with the same database. For these, you benefit from sharing the same infrastructure, managed service provider, skills... and still de-couple applications.
If you need to de-couple further, for example replicating asynchronously, YugabyteDB has xCluster replication. And there are many levels of coordination possible in the SQL layer, between the application and the data. PostgreSQL compatibility brings triggers, which can call external actions, and Foreign Data Wrappers, which can interact with other databases through a standard API.
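As a rough illustration of the Foreign Data Wrapper point (not a recommendation for any particular product), here is a sketch driven from Python. The server, schema and table names are invented, and it assumes the cluster's PostgreSQL compatibility extends to the postgres_fdw extension.

```python
# Hedged sketch: exposing another service's table through a Foreign Data
# Wrapper. Names (billing_srv, invoices, hosts, credentials) are made up.
import psycopg2

DDL = """
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER IF NOT EXISTS billing_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'billing-db.example', dbname 'billing');
CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER SERVER billing_srv
    OPTIONS (user 'app', password 'secret');
IMPORT FOREIGN SCHEMA public LIMIT TO (invoices)
    FROM SERVER billing_srv INTO public;
"""

conn = psycopg2.connect(host="dsql-cluster.example", dbname="shop", user="app")
try:
    with conn:
        with conn.cursor() as cur:
            cur.execute(DDL)  # one-time setup
            # the other database's table is now queryable as a local foreign table
            cur.execute("SELECT order_id, amount FROM invoices LIMIT 5")
            print(cur.fetchall())
finally:
    conn.close()
```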
So I would say distributed databases offer more options between de-coupling everything (including the choice of database vendor) and a full consolidation that would impact applications.

Having separate databases in a microservice architecture has a few different benefits. Some are obviated by using a distributed database: CockroachDB's resilience, scalability, and dynamic load balancing mean that you generally won't need to worry about tuning your database to the workload of an individual service. But others aren't.
Having separate databases forces your services to interact through well-defined APIs, which prevents their logic from becoming tightly coupled, allowing them to develop and update independently. If you're using a shared database to enforce consistency, you're necessarily introducing logical coupling no matter the mechanism, in that you need a shared understanding of what needs to be consistent. This may be unavoidable, in which case the key is really to make the consistency logic explicit and legible, which can be a bit trickier when implemented at the Raft level than in an API.

Business rules in DMN or database table?

I'm learning Camunda, the workflow engine. I understand that for some long-running processes, process modeling brings many tactical and strategic benefits such as expressiveness, fault tolerance and observability, with additional overhead of course.
The book I'm reading also advocates the use of DMN (decision tables) to bundle business rules inside the process model. The motive is to centralize maintenance and decouple configuration from the code. I'm taking this advice with a grain of salt, as decision tables seem somewhat clunky to work with: there is no strong typing and no powerful IDE support. I'm used to implementations where business parameters are stored in a database table and consumed by the application. The implementation also provides an admin GUI to maintain these parameters at runtime.
For what reason should I favor DMN over a more solid database-based solution?
You are moving the business logic to BPMN to get it out of the code, make it transparent in a graphical model, accessible to all stakeholders, support business-IT alignment, empower the business to own their business process/logic, support multi-version enactment at runtime, and more...
The same reasoning applies to business rules that are too complex to be modeled out as graphs in BPMN diagrams. The DMN standard is also aimed at business people, and the expression language used is intentionally kept simpler than an Excel formula: the "Friendly Enough Expression Language" (FEEL). So you see where this is going.
Database tables
are not easily accessible to business users
do not allow flexible changes to the table structure(s)/schema at runtime
usually do not support multi-version enactment at runtime
do not support a graphical, logical decomposition of rules (into DRDs) unless you work with multiple tables - but database schemas are not flexible
cannot be easily deployed to many systems
cannot be easily tested in unit tests
likely do not automatically generate audit data that is accessible for audits and analytics
These are just a few points. So, for business rules, definitely DMN over DB tables.
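To give a feel for the difference in consumption, here is a hedged sketch of an application evaluating a deployed DMN decision table at runtime through Camunda 7's REST API, rather than reading parameters from its own database table. The decision key (discount-rules), the inputs and the engine URL are hypothetical.

```python
# Sketch: evaluating a DMN decision table deployed to a Camunda 7 engine via
# its REST API. The decision key, variables and URL are invented for this example.
import requests

CAMUNDA = "http://localhost:8080/engine-rest"

def evaluate_discount(order_amount: float, customer_segment: str) -> dict:
    payload = {
        "variables": {
            "orderAmount": {"value": order_amount, "type": "Double"},
            "customerSegment": {"value": customer_segment, "type": "String"},
        }
    }
    resp = requests.post(
        f"{CAMUNDA}/decision-definition/key/discount-rules/evaluate",
        json=payload, timeout=5)
    resp.raise_for_status()
    # each matched rule comes back as a dict of output name -> typed value
    return resp.json()[0]

print(evaluate_discount(1200.0, "GOLD"))
```

The point of the comparison: the rules live in a versioned, deployable artifact that business users can read and change, while the application only submits inputs and consumes outputs.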

Should a private API also be a REST API?

I'm building a website which internally calls some APIs to interact with data on a server, but I'm not planning to make those APIs officially public.
Even in this case, should I make them RESTful?
There are lots of trade-offs. You say you're prototyping, but you may need to implement it "seriously" later.
Firstly, even for prototyping, you get a lot of benefit from sticking with a consistent API approach, ideally based on client and server libraries in your framework of choice. Common choices are "synchronous/asynchronous", "function-based/resource-based", "JSON/XML" etc. Mixing and matching those choices just makes everything much harder.
Some business domains are great for resource-based API structures. Order management systems, social networks, question-and-answer web sites all work well. Some are not so easy to represent as resources - real-time/IoT applications, chat/messaging systems, etc.
If you decide that "synchronous" and "resource based" are a good fit for your business domain, you may as well take advantage of the libraries that exist to build and consume RESTful APIs. You can decide for yourself how "pure" and "future-proof" you want to make those APIs. You may not care about versioning, for instance.
If "synchronous" and "resource-based" are not a good fit, I'd not try to shoe horn them into a RESTful API design.
The REST interface is designed to be efficient for large-grain hypermedia data transfer, optimizing for the common case of the Web, but resulting in an interface that is not optimal for other forms of architectural interaction. -- Fielding, 2000
REST is software design on the scale of decades: every detail is intended to promote software longevity and independent evolution. Many of the constraints are directly opposed to short-term efficiency. -- Fielding, 2008
That doesn’t mean that I think everyone should design their own systems according to the REST architectural style. REST is intended for long-lived network-based applications that span multiple organizations. If you don’t see a need for the constraints, then don’t use them. -- Fielding, 2008
For a private API, you probably control both the clients and the server, and it's unlikely that you are going to be facing Web Scale[tm] traffic levels where you will need to offload work to caches.
Which means that you aren't likely to get full return on your investment.

When is tight coupling essential or a good thing?

From all my reading and research on OO design/patterns/principles, I've found that the general consensus is that loose coupling (and high cohesion) is almost always the better design. I completely agree, speaking from my past software project experience.
But let's say some particular software company (which I don't work at) has some questionably designed large-scale software that interacts with some hardware. The modules (that I never worked on) are tightly coupled, with function calls that go 20+ levels deep to manage state. Class boundaries are never clearly defined and use cases are poorly thought out. A good software developer (not me) would bring up these issues, only to be turned down by the more senior developers on the grounds that development practices (like SOLID or TDD) don't really apply, because the software has worked for years using the "traditional" methodology and it's too late to change. And the biggest complaints from the customers (whom I don't know) are about the quality of the product.
Because of the above unrealistic scenario (which I was never a part of), I wondered whether there are cases where tight coupling is preferred or even required. When does a developer need to cross module boundaries, share state, increase dependencies and reduce testability? What are some examples of systems so complex that they would require this? I couldn't come up with a good case myself, so I'm hoping some of the more experienced craftsmen can help me out.
Thanks. Again, I don't know this company.
A tightly coupled architecture integrates enterprise applications around a single point of truth, which is often a single spatially-enabled RDBMS. The types of applications that are linked include engineering design (CAD), facility records management (GIS), asset management, workflow, ERP, CRM, outage management, and other enterprise applications.
A major advantage of a tightly coupled architecture is that it enables the rapid and efficient processing of large volumes of data, provides a single point of truth instead of several, often redundant, data sources, and enables open access to data throughout the organization.
Tightly coupled architectures rely on standards such as SQL, ODBC, JDBC, OLE DB, SQL/MM, and the Simple Feature Specification for SQL from the OGC to provide open and secure access to data, including geo-spatial data, throughout the organization.
Loosely coupled web services require substantial redundancy, unlike tight coupling between clients and service, which minimizes redundancy.
One problem with asynchronous, loosely coupled web services is that, for some business functions, they can exceed the resource capacity of the message-queuing servers or system.
Loosely coupled web services can be made to switch to a tightly coupled mode to avoid overloading scarce resources.

Does a CQRS project need a messaging framework like NServiceBus?

The last six months' learning curve has been challenging, with CQRS and DDD the main culprits.
It has been fun, we are halfway through our project, and the area I have not had time to delve into is a messaging framework.
Currently I don't use the DTC, so there is a very good likelihood that if my read model is not updated I will have inconsistency between the read and write databases. Also, my read and write databases will be on the same machine. I doubt we will ever put them on separate machines.
I don't have a large volume of messages in my system, so my concern is more to do with the consistency and reliability of the system.
So, do I have to put in a messaging framework like NServiceBus (even though both read and write databases are on the same machine), or do I have other options? Yes, there is a learning curve, but I suppose there would be a hell of a lot to learn if I don't use it.
Also, I don't want to put in a layer if it is not necessary.
Thoughts?
Currently I don't use the DTC, so there is a very good likelihood that if my read model is not updated I will have inconsistency between the read and write databases.
Personally, I dislike the DTC and try to avoid it. Instead, it is often possible to implement a compensation mechanism, especially for something like a read model where eventual consistency is already acceptable and updates are idempotent. For example, you could implement a version on entities and have a background task which ensures the versions are in sync. Having a DTC provides transactional retry functionality, but it still won't solve cases where failure occurs after retries - you still have to watch the error log and have procedures in place to deal with errors.
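A minimal sketch of that compensation idea, with the data access reduced to in-memory dictionaries so it stays self-contained; in a real system the two version maps would come from the write and read databases, and the rebuild step would re-project the entity.

```python
# Sketch: periodic reconciliation of read-model versions against the write model.
# The dicts stand in for version queries against the two stores.
from typing import Callable, Dict

def reconcile(write_versions: Dict[str, int],
              read_versions: Dict[str, int],
              rebuild: Callable[[str], None]) -> None:
    """Re-project every entity whose read-model version lags the write model."""
    for entity_id, version in write_versions.items():
        if read_versions.get(entity_id, -1) < version:
            rebuild(entity_id)  # must be idempotent: safe to run twice

# toy usage with in-memory stand-ins for the two stores
write_side = {"order-1": 3, "order-2": 5}
read_side = {"order-1": 3, "order-2": 4}
reconcile(write_side, read_side, lambda eid: print("rebuilding", eid))
```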
So, do I have to put in a messaging framework like NServiceBus (even though both read and write databases are on the same machine), or do I have other options?
It depends on a few things. What you often encounter in a CQRS system is a need for pub/sub, where several sub-systems publish events to which the query/caching system subscribes. If you see a need for pub/sub beyond basic point-to-point messaging, then go with something like NServiceBus. Also, I wouldn't immediately shy away from using NServiceBus even if you don't need it for scalability purposes, because I think the logical partitioning is beneficial on its own. On the other hand, as you point out, adding layers of complexity is costly, so first try to see if the simplest possible thing will work.
Another question to ask is whether you need a separate query store at all. If all you have is a single machine, why bother? You could use something simpler like the read-model pattern and still reap a lot of the benefits of CQRS.
Does a CQRS project need a messaging framework like NServiceBus?
The short answer: no.
This is the first time I have heard about the 'read-model pattern' mentioned by eulerfx. It is a nice enough name, but there is a bit more to it:
The general idea behind the 'query' part is to query a denormalized view of your data. In the 'read-model pattern' link you will notice that the query used to populate the read model is doing some lifting. In the mentioned example the required data manipulation is not that complex, but what if it does become more complex? This is where denormalization comes in. When you perform your 'command' part, the next action is to denormalize the data and store the results for easy reading. All the heavy lifting should be done by your domain.
This is why you are asking about the messaging. There are several techniques here:
denormalized data in same database, same table, different columns
denormalized data in same database, different table
denormalized data in different database
That's the storage. How about consistency?
immediately consistent
eventually consistent
The simplest solution (quick win) is to denormalize your data in your domain and then, after saving your domain objects through the repository, immediately save the denormalized data to the same data store, same table(s), different columns. It is 100% consistent, and you can start reading the denormalized data immediately.
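A small sketch of that quick win, using sqlite3 only so the example is self-contained; the table and columns are invented. The point is that the write model and the denormalized read columns are saved in one transaction against one store.

```python
# Sketch: repository saves the aggregate and its denormalized read columns in
# the same transaction, same store. sqlite3 keeps the example self-contained.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    total REAL,
    -- denormalized, read-side columns kept alongside the write model
    customer_name TEXT,
    line_count INTEGER)""")

def save_order(order_id, customer_id, total, customer_name, line_count):
    with conn:  # single transaction: write model and read columns stay consistent
        conn.execute(
            "INSERT INTO orders (id, customer_id, total, customer_name, line_count) "
            "VALUES (?, ?, ?, ?, ?)",
            (order_id, customer_id, total, customer_name, line_count))

save_order(1, 42, 99.90, "Jane Doe", 3)
# the query side reads the denormalized columns directly, no joins or mapping
print(conn.execute("SELECT id, customer_name, total FROM orders").fetchall())
```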
If you really want to, you can create a separate bunch of objects to transport that data, but it is simpler to just write a simple query layer that returns some data-carrying object provided by your data-access framework (in the case of .NET that would be a DataRow/DataTable). There is absolutely no reason to get fancy. There will always be exceptions, but then you can go ahead and write a data container.
For eventual consistency you will need some form of queuing and related processing. You can roll your own solution or you may opt for a service bus. That is up to you and your time / technical constraints :)
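For the eventually consistent route, the shape of the flow looks roughly like this. The in-process queue below only stands in for a durable broker or service bus; nothing here is specific to Shuttle.Esb, NServiceBus or MassTransit.

```python
# Toy illustration of eventual consistency: the command side publishes an event
# and a separate worker updates the read model later. A real system would use a
# durable broker or a service bus instead of an in-process queue.
import queue
import threading

events: "queue.Queue[dict]" = queue.Queue()
read_model: dict = {}

def command_side_place_order(order_id: int, total: float) -> None:
    # ... persist the domain objects here, then publish ...
    events.put({"type": "OrderPlaced", "order_id": order_id, "total": total})

def projection_worker() -> None:
    while True:
        event = events.get()
        if event["type"] == "OrderPlaced":
            read_model[event["order_id"]] = {"total": event["total"]}
        events.task_done()

threading.Thread(target=projection_worker, daemon=True).start()
command_side_place_order(1, 99.90)
events.join()  # wait until the projection has caught up
print(read_model)
```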
BTW: I have a free open-source service bus, Shuttle.Esb, with documentation available online. Any feedback would be welcome. But any old service bus will do (MassTransit / NServiceBus / etc.).
Hope that helps.

Why do O/R Mappers seem to be the favored persistence approach for DDD users?

Why aren't there many DDD stories that use newer NoSQL tools such as db4o and Cassandra for persistence?
In my view, the effort involved in O/R mapping seems too high for the returned value. Being able to report right off the database is the main advantage I can see for my projects.
On the other hand, db4o seems to almost be the Repository pattern and Cassandra's concept of Column Families and SuperColumns seems to be perfect for defining Aggregates and their value objects (the scalability would just be an added bonus). Yet, most of the online resources giving examples of DDD projects seem to always default to using [N]Hibernate.
I don't have enough time/resources to take big risks by trying these newer tools on my projects, which makes me want to opt for a very well-documented approach to persistence. Is it possible that O/R mapping remains the norm just because people are afraid to give up the oh-so-reliable SQL? Should I make the leap?
From what I've seen, DDD is most common in long-lived, business-oriented code bases. That's an area where the SQL database mindset reigns almost unchallenged so far. Some factors that play into that:
People writing long-lived code bases tend to like technologies that have been around a long time.
Large, business-oriented projects often take place in large businesses, which are naturally conservative.
If you are starting your project with any existing data, it's likely to be in an SQL database to start with, and existing code likely tied to that.
Most business projects are not very performance-sensitive, at least not in the same way that purely technical or consumer-focused efforts are.
And I'm sure there are more.
If you can't afford the financial risks that come with trying novel tools, then you should probably stick with the known thing. Some of the alternative persistence approaches are fantastic, and can get you radically more performance depending on need. But they are all early in their lifecycles. Although SQL databases have a lot of limitations, at least those limitations are pretty well known, both by you and the developers who will inherit your code.
Relational databases are designed for a specific category of use cases, particularly in business applications. As such, they have certain capabilities that are valuable in these scenarios. Data retrieval is often accompanied by sophisticated search and analysis. If you use NoSQL or object databases, you may be giving up some of these capabilities in favor of others, such as the handling of huge, distributed datasets, a task at which NoSQL databases typically excel.
In other words, you may need more capabilities than just data persistence, capabilities which relational databases already provide. Relational databases are a mature, well-known and predictable technology, and there are many experts with deep expertise in them. All of these are good reasons for continuing to choose relational databases over more "exotic" solutions.