How does Two Phase Commit really work at a low level? - sql

There are tons of articles on 2 phase commit on the internet.
They are all saying the same thing and I am not getting it. I need a low-level understanding of it.
The Orders-and-Payments services example is the most popular one on the internet.
Let's say we have an Orders service and a Payments service. When an order is placed, the Orders service writes it to its database, but the Payments service must also write it to its own database for the transaction to be complete.
Here is my inadequate understanding:
User sends a place order request to Orchestrator
Orchestrator invokes the Orders service as well as the Payments service at the same time. Now, according to what I have read, the Order and Payment services are supposed to respond to the Orchestrator by telling it whether or not they are ready. What does that mean? What does it mean to be "ready" here?
Order and Payment service respond back, telling Orchestrator that they are "ready" (whatever that means).
Orchestrator sends another request to both the services (commit request).
Order writes the record to its database. The Payment service writes the record to its own database. They both respond back with status 200 to Orchestrator.
Orchestrator checks whether both of the participants have returned status code 200. If yes, then it does nothing. If no, then it asks them to ABORT?? How? One of the participants has already written the transaction to its database.

Two-phase commit is all about handling failures and what they mean.
In phase 1, the orchestrator tells the order and payment services to prepare to commit. If everything goes well, they both respond with "prepared", which means:
The transaction is recorded durably, and marked prepared
There is no possibility that it will need to be rolled back due to conflicts or for any other reason, except the failure of some other service to prepare. Once the transaction is prepared, only the orchestrator can normally roll it back.
If both the order and payment processors successfully prepare, then the orchestrator will tell them to finalize the commit. Finalized means:
The transaction is recorded durably and marked committed
It cannot be rolled back.
If anything goes wrong during this process, it's possible to check the durably recorded states of the transaction in both the payment and order services in order to determine whether it "really happened" and how to recover:
If not all of the services prepare successfully, then the transaction did not happen. The orchestrator will roll back the transaction in all the services that did prepare it. If things are broken, this may require manual intervention, or the orchestrator may not be able to complete this operation until things come back up.
If all of the services did prepare successfully, then the transaction did happen. The orchestrator will tell all the services that haven't finalized it to go ahead and do that. Again, this might have to wait until systems that are down come back up.
Also, sometimes it's not the orchestrator's job to recover. If the orchestrator gives up, then the individual services can check with each other to see if the transaction happened or not.
The important point is that once the two-phase commit starts, no matter what happens you can return the system to a consistent state by checking the durable transaction records.
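The protocol described above can be sketched in a few lines. This is an illustrative toy, not a real library: the `Participant` and `two_phase_commit` names are made up, and "durable" state is just an attribute here, where a real participant would write to a log on disk.

```python
# Toy two-phase commit: participants durably stage a change in phase 1 and
# promise not to roll it back on their own; the coordinator's decision in
# phase 2 is final.

class Participant:
    def __init__(self, name):
        self.name = name
        self.state = "idle"      # idle -> prepared -> committed / aborted
        self.staged = None

    def prepare(self, data):
        # "Ready" means: the change is durably staged and the only thing
        # that can undo it now is an abort decision from the coordinator.
        self.staged = data
        self.state = "prepared"
        return True

    def commit(self):
        assert self.state == "prepared"
        self.state = "committed"  # point of no return for this participant

    def rollback(self):
        self.staged = None
        self.state = "aborted"


def two_phase_commit(participants, data):
    # Phase 1: ask everyone to prepare; any refusal aborts the whole thing.
    prepared = []
    for p in participants:
        if p.prepare(data):
            prepared.append(p)
        else:
            for q in prepared:    # undo the ones that did prepare
                q.rollback()
            return False
    # Phase 2: everyone prepared, so the decision is "commit" and is final.
    for p in participants:
        p.commit()
    return True
```

The "ready" response the question asks about is exactly the return value of `prepare()`: a durable promise that commit cannot fail for local reasons.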
In practice, two-phase commit is not used that often, because while transactions are in the prepared-but-not-finalized state, any other transactions that use their data cannot themselves commit, because they might need to be rolled back. This kind of contention, where transactions must wait for each other, slows the whole system down.

I will explain the steps of a successful purchase. The customer registers an order, goes to the payment gateway, makes the payment there, and returns to the store site; the payment record is saved and the operator can track the steps.
Order table:
CREATE TABLE [Order] (
    Id int,
    CustomerName nvarchar(150),
    TotalPrice int,
    SendState tinyint,
    InsertDate datetime
)
Payment table:
CREATE TABLE Payment (
    Id int,
    OrderID int,
    PayState tinyint,
    PayPrice int
)
The customer now registers an order in the database
Insert into [Order] values(1, 'bill', 30000, 1, '5/1/2021 8:30:52 AM')
Now the customer connects to the payment gateway and makes the payment successfully and the store site returns.
Insert into Payment values(1,1,200,30000)
The results are now displayed for the operator
select o.*, p.PayState, p.PayPrice
from [Order] o join Payment p on o.Id = p.OrderID
If the payment fails, an error status of 500 is recorded in the database, and the operator, seeing the 500 status, will understand that the payment was not successful.
Insert into Payment values(2, 1, 500, 0)

Related

Task queue using RabbitMQ

We have a system that receives payment information for invoices from the bank's system. It works like this:
An invoice is created in our system.
Payment for the invoice is made through the bank's system. The bank's system requests invoice details from our system, and our system returns them.
The bank's system goes through its payment process, sends the payment details to our system, and waits 30 seconds for a confirmation message. If the bank's system does not receive the confirmation within 30 seconds, the bank cancels the payment but does not inform our system about the cancellation.
Our system receives the payment info, saves the payment, and then sends the confirmation message to the bank. But sometimes, because of a network or system issue, the confirmation message is not delivered within 30 seconds, and we remain unaware that the payment was cancelled.
So the problem is that our system saves the payment but sometimes cannot respond to the payment confirmation request in time (within 30 seconds); in that case the bank cancels the payment and our system does not know the payment was cancelled.
I have developed a solution that checks each payment for success (30 seconds after receiving the payment) by sending a request to the check-payment method the bank provides. The tasks (send the payment id to the bank system's check_payment method, which returns the payment status) are executed in separate threads using the thread pool of the Spring framework. But I am afraid this is not the best solution, as there is a risk of exhausting the thread pool when a network failure happens.
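For reference, the delayed-check approach described above could be sketched like this. The names (`PaymentChecker`, `check_payment`) are illustrative stand-ins, not Spring or bank APIs, and the pool is deliberately bounded to show exactly where the exhaustion risk sits:

```python
# 30 seconds after a payment is received, ask the bank whether it really
# went through. check_payment is a stand-in for the bank's status-check API.
import time
from concurrent.futures import ThreadPoolExecutor

class PaymentChecker:
    def __init__(self, check_payment, delay_seconds=30, max_workers=4):
        self.check_payment = check_payment   # payment_id -> "OK" / "CANCELLED"
        self.delay = delay_seconds
        # A bounded pool: this is the part the question worries about, since
        # a network outage can keep every worker blocked at once.
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def schedule_check(self, payment_id):
        return self.pool.submit(self._check_later, payment_id)

    def _check_later(self, payment_id):
        time.sleep(self.delay)               # wait out the bank's 30s window
        return self.check_payment(payment_id)
```

Because each pending check holds a worker thread for the whole delay, a burst of payments during an outage can fill the pool; a durable queue of due-times (or a workflow engine, as the answer below-style recommendations suggest) avoids that.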
What solutions would you recommend? Can we use RabbitMQ for this issue?
You are essentially implementing stateful orchestration. I would recommend looking into Cadence Workflow, which is capable of supporting your use case with minimal effort.
Cadence offers a lot of other advantages over using queues for task processing:
Built-in exponential retries with an unlimited expiration interval
Failure handling. For example, it allows executing a task that notifies another service if both updates could not succeed within a configured interval.
Support for long-running, heartbeating operations
The ability to implement complex task dependencies. For example, chaining of calls, or compensation logic in case of unrecoverable failures (SAGA)
Complete visibility into the current state of the update. With queues, all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Cadence, every event is recorded.
The ability to cancel an update in flight.
See the presentation that goes over the Cadence programming model.

Maintain Consistency in Microservices [duplicate]

What is the best way to achieve DB consistency in microservice-based systems?
At the GOTO in Berlin, Martin Fowler was talking about microservices and one "rule" he mentioned was to keep "per-service" databases, which means that services cannot directly connect to a DB "owned" by another service.
This is super-nice and elegant but in practice it becomes a bit tricky. Suppose that you have a few services:
a frontend
an order-management service
a loyalty-program service
Now, a customer makes a purchase on your frontend, which calls the order-management service, which saves everything in the DB -- no problem. At this point there will also be a call to the loyalty-program service so that it credits / debits points from the customer's account.
Now, when everything is on the same DB / DB server it all becomes easy since you can run everything in one transaction: if the loyalty program service fails to write to the DB we can roll the whole thing back.
When we do DB operations across multiple services this isn't possible, as we can't rely on a single connection or take advantage of running a single transaction.
What are the best patterns to keep things consistent and live a happy life?
I'm quite eager to hear your suggestions!..and thanks in advance!
This is super-nice and elegant but in practice it becomes a bit tricky
What it means "in practice" is that you need to design your microservices in such a way that the necessary business consistency is fulfilled when following the rule:
that services cannot directly connect to a DB "owned" by another service.
In other words - don't make any assumptions about their responsibilities and change the boundaries as needed until you can find a way to make that work.
Now, to your question:
What are the best patterns to keep things consistent and live a happy life?
For things that don't require immediate consistency, and updating loyalty points seems to fall in that category, you could use a reliable pub/sub pattern to dispatch events from one microservice to be processed by others. The reliable bit is that you'd want good retries, rollback, and idempotence (or transactionality) for the event processing stuff.
If you're running on .NET some examples of infrastructure that support this kind of reliability include NServiceBus and MassTransit. Full disclosure - I'm the founder of NServiceBus.
Update: Following comments regarding concerns about the loyalty points: "if balance updates are processed with delay, a customer may actually be able to order more items than they have points for".
Many people struggle with these kinds of requirements for strong consistency. The thing is that these scenarios can usually be dealt with by introducing additional rules, like notifying a user if they end up with a negative loyalty-points balance. If T goes by without the loyalty points being sorted out, notify the user that they will be charged M based on some conversion rate. This policy should be visible to customers when they use points to purchase things.
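The "reliable" part of the pub/sub pattern above mostly comes down to idempotent event processing: the subscriber must tolerate the same event being delivered more than once. A minimal sketch, with made-up event fields (`id`, `customer`, `points`) rather than any NServiceBus or MassTransit API:

```python
# The subscriber records processed event IDs so that a redelivered event
# is applied exactly once.
class LoyaltyPointsHandler:
    def __init__(self):
        self.processed = set()   # in a real system: a table in the service's own DB
        self.balances = {}

    def handle(self, event):
        if event["id"] in self.processed:
            return False         # duplicate delivery: safe to ignore
        customer = event["customer"]
        self.balances[customer] = self.balances.get(customer, 0) + event["points"]
        # In a real system this insert happens in the SAME database
        # transaction as the balance update, so the two can't diverge.
        self.processed.add(event["id"])
        return True
```

With this in place, the messaging infrastructure is free to retry aggressively, since redelivery is harmless.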
I don’t usually deal with microservices, and this might not be a good way of doing things, but here’s an idea:
To restate the problem, the system consists of three independent-but-communicating parts: the frontend, the order-management backend, and the loyalty-program backend. The frontend wants to make sure some state is saved in both the order-management backend and the loyalty-program backend.
One possible solution would be to implement some type of two-phase commit:
First, the frontend places a record in its own database with all the data. Call this the frontend record.
The frontend asks the order-management backend for a transaction ID, and passes it whatever data it would need to complete the action. The order-management backend stores this data in a staging area, associating with it a fresh transaction ID and returning that to the frontend.
The order-management transaction ID is stored as part of the frontend record.
The frontend asks the loyalty-program backend for a transaction ID, and passes it whatever data it would need to complete the action. The loyalty-program backend stores this data in a staging area, associating with it a fresh transaction ID and returning that to the frontend.
The loyalty-program transaction ID is stored as part of the frontend record.
The frontend tells the order-management backend to finalize the transaction associated with the transaction ID the frontend stored.
The frontend tells the loyalty-program backend to finalize the transaction associated with the transaction ID the frontend stored.
The frontend deletes its frontend record.
If this is implemented, the changes will not necessarily be atomic, but it will be eventually consistent. Let’s think of the places it could fail:
If it fails in the first step, no data will change.
If it fails in the second, third, fourth, or fifth, when the system comes back online it can scan through all frontend records, looking for records without an associated transaction ID (of either type). If it comes across any such record, it can replay beginning at step 2. (If there is a failure in step 3 or 5, there will be some abandoned records left in the backends, but it is never moved out of the staging area so it is OK.)
If it fails in the sixth, seventh, or eighth step, when the system comes back online it can look for all frontend records with both transaction IDs filled in. It can then query the backends to see the state of these transactions—committed or uncommitted. Depending on which have been committed, it can resume from the appropriate step.
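The staging-area protocol above can be sketched as follows. `Backend`, `stage`, and `finalize` are hypothetical names invented for this illustration; durable storage is simulated with plain dicts and a list:

```python
# Each backend stages data under a fresh transaction id; finalize() moves it
# out of staging. The frontend record ties the two transaction ids together
# so recovery can resume from the right step.
import itertools

class Backend:
    def __init__(self):
        self.staging = {}
        self.committed = {}
        self._ids = itertools.count(1)

    def stage(self, data):
        txn_id = next(self._ids)
        self.staging[txn_id] = data
        return txn_id

    def finalize(self, txn_id):
        self.committed[txn_id] = self.staging.pop(txn_id)

    def status(self, txn_id):
        return "committed" if txn_id in self.committed else "staged"


def place_order(frontend_db, orders, loyalty, order_data, points_data):
    record = {"order": order_data, "points": points_data}
    frontend_db.append(record)                          # step 1: frontend record
    record["order_txn"] = orders.stage(order_data)      # steps 2-3
    record["loyalty_txn"] = loyalty.stage(points_data)  # steps 4-5
    orders.finalize(record["order_txn"])                # step 6
    loyalty.finalize(record["loyalty_txn"])             # step 7
    frontend_db.remove(record)                          # step 8: all done
```

The recovery scans described above would walk `frontend_db` for leftover records and use `status()` to decide where to resume.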
I agree with what @Udi Dahan said. I just want to add to his answer.
I think you need to persist the request to the loyalty program so that if it fails, it can be retried at some other point. There are various ways to do this.
1) Make the loyalty program API failure recoverable. That is to say it can persist requests so that they do not get lost and can be recovered (re-executed) at some later point.
2) Execute the loyalty program requests asynchronously. That is to say, persist the request somewhere first then allow the service to read it from this persisted store. Only remove from the persisted store when successfully executed.
3) Do what Udi said and place it on a good queue (the pub/sub pattern, to be exact). This usually requires that the subscriber do one of two things: either persist the request before removing it from the queue (go to 1) -- or -- first borrow the request from the queue, then, after successfully processing the request, have it removed from the queue (this is my preference).
All three accomplish the same thing: they move the request to a persisted place where it can be worked on until successful completion. The request is never lost, and is retried if necessary until a satisfactory state is reached.
I like to use the example of a relay race. Each service or piece of code must take hold and ownership of the request before allowing the previous piece of code to let go of it. Once it's handed off, the current owner must not lose the request until it gets processed or handed off to some other piece of code.
Even with distributed transactions you can end up in a "transaction in doubt" status if one of the participants crashes in the midst of the transaction. If you design the services as idempotent operations, life becomes a bit easier, and you can write programs to fulfill business conditions without XA. Pat Helland has written an excellent paper on this called "Life Beyond XA". Basically, the approach is to make as few assumptions about remote entities as possible. He also illustrates an approach called Open Nested Transactions (http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper142.pdf) for modelling business processes. In this specific case, the purchase transaction would be the top-level flow, and loyalty and order management would be next-level flows. The trick is to create granular, idempotent services with compensation logic, so that if anything fails anywhere in the flow, the individual services can compensate for it. For example, if the order fails for some reason, loyalty can deduct the points accrued for that purchase.
Another approach is to model eventual consistency using CALM or CRDTs. I've written a blog post highlighting the use of CALM in real life - http://shripad-agashe.github.io/2015/08/Art-Of-Disorderly-Programming May be it will help you.
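The compensation idea in this answer is essentially a saga. A minimal sketch, assuming each step pairs an action with a compensating action (the `run_saga` name and shape are invented for illustration):

```python
# Saga-style execution: run steps in order; if one fails, undo the steps
# that already succeeded, in reverse order, using their compensations.
def run_saga(steps):
    """steps: list of (action, compensation) callables."""
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for comp in reversed(done):   # undo what already succeeded
                comp()
            return False
    return True
```

In the purchase example, "accrue loyalty points" would be paired with "deduct loyalty points", so a failed order write triggers the deduction automatically.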

Is good practice to extract request into multiple commands?

In my system, when a new transaction is added, the request carries the following information:
A) the client that made the transaction
B) whether the transaction will be paid in installments, and the frequency of the installments (monthly, every 15 days, etc.)
Also, if the transaction will not be paid in installments, the analysis data must be updated (clear-in of current month, etc.).
So when a new transaction is submitted by the user, the following must be done:
1) Add a new client if the client in the request does not exist
2) Add the new transaction to the database
and, if there are installments,
3) add the installments to the database
else
4) update the analysis data
So my solution is, in my controller AddNewTransactionController, to extract the request into two separate commands, AddNewClientCommand and AddNewTransactionCommand, and invoke the associated command handlers, AddNewClientCommandHandler and AddNewTransactionCommandHandler.
Also, the AddNewTransactionCommandHandler will have domain services injected, like UpdateAnalysisData.
Is the above considered a good solution from an architectural point of view?
I would normally expect that approach to be implemented as a process, rather than as a collection of commands.
The client commits to an order, which is to say some remote entity outside the boundary of our solution offers to us the opportunity to earn some business value. So the immediate priority is to capture that opportunity.
So you write that opportunity to your durable store, and publish a domain event.
In response to the domain event, a bunch of other commands can now be fired (extracting the data that they need from either the domain event, or the representation of the opportunity in the store).
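The flow this answer describes can be sketched as: persist the opportunity, publish a domain event, and let independently subscribed handlers fire their own commands. The `EventBus` and event name below are illustrative, not a framework API:

```python
# Capture the request durably first, then publish a domain event that the
# other commands react to, pulling the data they need from the event.
class EventBus:
    def __init__(self):
        self.handlers = {}

    def subscribe(self, event_type, handler):
        self.handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers.get(event_type, []):
            handler(payload)


def submit_transaction(store, bus, request):
    store.append(request)                    # capture the opportunity first
    bus.publish("TransactionSubmitted", request)
```

A handler corresponding to AddNewClientCommand and one for installments/analysis would each subscribe to the event, so the controller never needs to know about all of them.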

Nservicebus Sequence

We have a requirement for all our messages to be processed in the order of arrival to MSMQ.
We will be exposing a WCF service to the clients, and this WCF service will post the messages using NServiceBus (Sendonly Bus) to MSMQ.
We are going to develop a Windows service (MessageHandler) which will use NServiceBus to read messages from MSMQ and save them to the database. Our database will not be available for a few hours every day.
During the DB downtime, we expect the process to retry the first message in MSMQ and halt processing other messages until the database is up. Once the database is up, we want NServiceBus to process messages in the order they were sent.
Will setting MaximumConcurrencyLevel="1" MaximumMessageThroughputPerSecond="1" help in this scenario?
What is the best way using NServiceBus to handle this scenario?
We have a requirement for all our messages to be processed in the order of arrival to MSMQ.
See the answer to this question, How to handle message order in nservicebus?, and also this post here.
I agree that while in-order delivery is possible, it is much better to design your system so that order does not matter. The linked article outlines the following solution:
Add a sequence number to all messages
In the receiver, check that the sequence number is the last-seen number + 1; if not, throw an out-of-sequence exception
Enable second-level retries (so if messages are out of order they will be retried later, hopefully after the correct message has been received)
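The sequence-number check from those steps can be sketched as follows (the class and exception names are illustrative, not NServiceBus types; throwing is what hands the message back for a later retry):

```python
# The receiver tracks the last sequence number seen and rejects gaps, so the
# retry mechanism can re-deliver out-of-order messages later.
class OutOfSequenceError(Exception):
    pass

class OrderedReceiver:
    def __init__(self):
        self.last_seen = 0

    def handle(self, message):
        seq = message["seq"]
        if seq != self.last_seen + 1:
            # Throwing sends the message back for retry, hopefully after
            # the missing message has arrived in the meantime.
            raise OutOfSequenceError(f"expected {self.last_seen + 1}, got {seq}")
        self.last_seen = seq
        return message["body"]
```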
However, in the interest of answering your specific question:
Will setting MaximumConcurrencyLevel="1" MaximumMessageThroughputPerSecond="1" help in this scenario?
Not really.
Whenever you have a requirement for ordered delivery, the fundamental laws of logic dictate that somewhere along your message processing pipeline you must have a single-threaded process in order to guarantee in-order delivery.
Where this happens is up to you (check out the resequencer pattern), but you could certainly throttle the NServiceBus handler to a single thread (I don't think you need to set MaximumMessageThroughputPerSecond to make it single-threaded, though).
However, even if you did this, and even if you used transactional queues, you could still not guarantee that each message would be dequeued and processed to the database in order, because if there are any permanent failures on any of the messages they will be removed from the queue and the next message processed.
During the DB downtime, we expect the process to retry the first message in MSMQ and halt processing other messages until the database is up. Once the database is up, we want NServiceBus to process messages in the order they were sent.
This is not recommended. The second level retry functionality in NServiceBus is designed to handle unexpected and short-term outages, not planned and long-term outages.
For starters, when your NServiceBus message handler endpoint tries to process a message in its input queue and finds the database unavailable, it will apply its second-level retry policy, which by default attempts the dequeue 5 times at increasing intervals and then fails permanently, sticking the failed message in its error queue. It will then move on to the next message in the input queue.
While this doesn't violate your in-order delivery requirement on its own, it will make life very difficult for two reasons:
The permanently failed messages will need to be re-processed with priority once the database becomes available again, and
there will be a ton of unwanted failure logging, which will obfuscate any genuine handling errors.
If you have regular planned outages that you know about in advance, then the simplest way to deal with them is to implement a service window, which is another term for a schedule.
However, the Windows service manager does not support the concept of service windows, so you would have to use a scheduled task to stop and then start your service, or look at other options such as Hangfire, Quartz.NET, or some other cron-type library.
It kind of depends on why you need the messages to arrive in order. If, say, you first receive an Order message and then various OrderLine messages that all belong to a certain order, there are multiple possibilities.
One is to just accept that there can be OrderLine messages without an Order. The Order will come in later anyway. Eventual Consistency.
Another one is to collect the messages (and possibly state) in an NServiceBus saga. When MessageA normally needs to arrive first, only to be followed by MessageB and MessageC, give all three messages the ability to start the saga. All three messages need to have something that ties them together, like a unique GUID. The saga will then make sure it collects them properly, and when all messages have arrived, perhaps store its final state and mark the saga as completed.
Another option is to just persist all messages directly into the database and have something else figure out what belongs to what. This is a scenario useful for a data warehouse where the data just needs to be collected, no matter what. Some data might not be 100% accurate (or consistent) but that's okay.
Asynchronous messaging makes it hard to process messages 100% in order, especially when the client calling the WCF service makes mistakes and/or sends them out of order. It wouldn't be the first time I've had such a requirement and seen out-of-order messages.
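The collect-in-a-saga idea can be sketched like this. `OrderSaga` and its message shape are invented for illustration (a real NServiceBus saga would correlate on a GUID and persist its state between messages):

```python
# Any of the correlated messages can "start" the saga; it completes once
# every expected message kind has arrived, regardless of arrival order.
class OrderSaga:
    def __init__(self, expected_kinds):
        self.expected = set(expected_kinds)
        self.collected = {}

    def handle(self, message):
        # Messages would share a correlation GUID; here we collect by kind.
        self.collected[message["kind"]] = message
        return self.is_complete()

    def is_complete(self):
        return self.expected <= set(self.collected)
```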

PayPal API transaction final status

I am using the PayPalAPIInterfaceClient (SOAP service) to get information about a transaction (the GetTransactionDetails() method) and need to be absolutely sure about the transaction status (meaning: money has been sent, in whichever direction).
When is the transaction really completed, and when is it still "on the road"?
For example, I assume Processed will be followed by InProgress and finally change to Completed, or something like this. On the other hand, Denied or - I don't know - Voided will not change in the future.
Can you help me decide which statuses can be accepted as final (like Completed - though maybe even Completed does not mean a final money transfer) and which ones are still in one of their sub-states?
I would expect simple "Money finally transferred" & "Money finally not transferred" result, but reality is different.
In short, I need to know this in order to mirror the transaction result into the database and manage automatic transactions (from and to the client).
I am using the PaymentStatusCodeType enumeration values, and my service iterates over the transaction history to check whether the money was transferred or not.
Completed means it's done. You may also want to look into Instant Payment Notification (IPN). It sends real-time updates when transactions hit your PayPal account so you can automate post-transaction tasks accordingly. This includes handling e-checks or other pending payments which won't complete for a few days, refunds, disputes, etc.
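One way to act on this is to bucket the status values into final vs. in-flight states. The mapping below is a sketch based on commonly listed PaymentStatusCodeType values; verify it against PayPal's own documentation before relying on it, since the exact set of values and their semantics are an assumption here:

```python
# Hypothetical classification of PayPal payment statuses. Check PayPal's
# PaymentStatusCodeType documentation before using a mapping like this.
FINAL_SUCCESS = {"Completed"}
FINAL_FAILURE = {"Denied", "Voided", "Failed", "Expired", "Reversed", "Refunded"}

def classify(status):
    if status in FINAL_SUCCESS:
        return "money transferred"
    if status in FINAL_FAILURE:
        return "money not transferred (or returned)"
    # Pending, Processed, In-Progress, etc.: keep polling or wait for IPN.
    return "still in flight"
```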