Transaction-Style HTTP requests - sql

I recently ran into the following problem.
For each user, I need to do the following on server side:
First
(SQL) Insert user's record with a Unique constraint on ID
Then Parallel
(Http) Subscribe user to Service A, get subscription_id_A
(Http) Subscribe user to Service B, get subscription_id_B
Finally
(SQL) Update user's record with both subscription ids
Ideally, I want this entire operation to be transactional, i.e. if any of the HTTP requests or SQL statements fails, it would be as if nothing had happened. Added: if request A fails but B succeeds, I am stuck: do I cancel the transaction and end up with an untracked subscription, or do I commit it and end up with a user missing a subscription?
Given that this is likely impossible to achieve, what would be the next best thing I can do?
Services A and B do provide APIs to check for the existence of a subscription and to modify or delete one, but I want to avoid the check-then-act style. The SQL server runs at the highest isolation level.

This is indeed a standard problem. (Often, developers are not aware of it and only find out in production.) There is no standard solution, and it is impossible to solve in general: see the Two Generals' Problem (http://en.wikipedia.org/wiki/Two_Generals%27_Problem), where two systems can never agree with 100% certainty on whether they should commit or abort.
Maybe you can perform all the SQL work first. Insert the user, but without subscription IDs. You then try to add the subscriptions one by one and store their IDs in separate transactions once you have them.
Install a background job that periodically checks for users that were created a long time ago but still do not have subscriptions. If you find any discrepancies, fix them and log the fact.
This periodic cleanup ensures that temporary failures (which will occur due to network glitches, timeouts, redeployments, bugs, ...) stay temporary. It also ensures that they are detected and, if you like, reported to developers.
This would be an eventually consistent system. The idea is to first transactionally record the target state (the user and the goal to create two subscriptions) and then have a background job try to converge the data to the target state.
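A minimal sketch of that idea, in Python with sqlite3 standing in for the real SQL server. The table layout, the column names, and the `subscribe` callables (stand-ins for the HTTP calls to Service A and B) are all assumptions made for illustration, not part of the original setup.

```python
import sqlite3

COLUMNS = ("subscription_id_a", "subscription_id_b")  # hypothetical column names


def create_schema(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS users (
               id TEXT PRIMARY KEY,
               created_at REAL NOT NULL,
               subscription_id_a TEXT,
               subscription_id_b TEXT)"""
    )


def register_user(conn, user_id, now):
    # Step 1: transactionally record the target state (user exists, no IDs yet).
    with conn:
        conn.execute(
            "INSERT INTO users (id, created_at) VALUES (?, ?)", (user_id, now)
        )


def try_subscribe(conn, user_id, column, subscribe):
    # Step 2: call the remote service, then store its ID in a separate transaction.
    if column not in COLUMNS:
        raise ValueError(column)
    try:
        subscription_id = subscribe(user_id)  # stand-in for the HTTP request
    except Exception:
        return False  # leave the column NULL; the background job retries later
    with conn:
        conn.execute(
            f"UPDATE users SET {column} = ? WHERE id = ?", (subscription_id, user_id)
        )
    return True


def reconcile(conn, services, now, max_age):
    # Background job: find old users with a missing subscription and retry.
    repaired = []
    for column, subscribe in services.items():
        rows = conn.execute(
            f"SELECT id FROM users WHERE {column} IS NULL AND created_at <= ?",
            (now - max_age,),
        ).fetchall()
        for (user_id,) in rows:
            if try_subscribe(conn, user_id, column, subscribe):
                repaired.append((user_id, column))  # log this in a real system
    return repaired
```

A retry can create a duplicate if the earlier request actually succeeded but its response was lost, so a real job would probably use the services' existence-check APIs before subscribing again.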

Related

Delete a record after a period of time automatically in SQL Firebird 2.5?

We have a table with a datetime stamp field that records when each record was created. How can we create a trigger or procedure to delete a record after 30 days?
Is there any advice on how we can run this deletion scheduler?
Firebird doesn't have a scheduler. You will need to create an application that executes a clean up routine on a schedule yourself. You could do this as part of the normal application, or you could write a small application specifically for this purpose, and execute it with the scheduler of your OS (e.g. Windows Scheduled Tasks, or Linux Cron).
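Such a clean-up routine can be very small. The sketch below uses Python with sqlite3 purely as a stand-in for a Firebird connection; the table name `records` and the column `created_at` (stored as `YYYY-MM-DD HH:MM:SS` text) are assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_old_rows(conn, now, max_age_days=30):
    # Delete every row whose creation stamp is older than the cutoff.
    cutoff = (now - timedelta(days=max_age_days)).isoformat(sep=" ")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM records WHERE created_at < ?", (cutoff,))
    return cur.rowcount  # number of rows removed in this run
```

A script that calls `purge_old_rows(conn, datetime.now())` once a day from cron or Windows Scheduled Tasks would be enough for this purpose.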
Firebird 2.1 introduced global triggers fired on database connection/disconnection and on transaction starting/ending.
https://www.firebirdsql.org/file/documentation/chunk/en/refdocs/fblangref30/fblangref30-ddl-trigger.html
While it is not exactly what you need it can be used to achieve similar results. Whether that similarity is good enough for you or not is for you to evaluate.
Regarding "to delete a record after 30 days": the question is what exactly you mean by that. Would it still be okay if the row were deleted after 31 days, or after 40?
In our case, a client-server office application, there was no time pressure; moreover, deletion was not safe as long as the programs had "documents" open.
We had to delete some global data, and while the database held some marks recording which documents used that data and which documents were currently open, they were not very reliable. This also meant that the existing method of immediate deletion could occasionally lead to application crashes.
So we reformulated a problem similar to yours in the following way:
We need rows that are not deleted immediately but stay pending deletion for 30 days or more. Those records would be rendered in the application in a special way, as a warning to users, and with a way for them to cancel the deletion if they changed their mind (or if other users had different ideas).
The deletion would happen, in logical terms, "when no application is connected". In technical terms that could mean either "when the first application connects, but before it starts actual (business-related) work" or "when the last application disconnects, after it has finished its actual work". We settled on the latter and used an ON DISCONNECT global database trigger.
We had not only the main business-domain application but also a number of technical helper utilities. From Firebird's point of view there is no difference between them. So we had to modify the "login sequence" in our main application: right after a successful login it registered its own CURRENT_CONNECTION in a special table. This is potentially slightly fragile.
The ON DISCONNECT trigger performed three actions:
It checked whether CURRENT_CONNECTION was in the table, and if it was, called a special stored procedure, SP_LOCAL_CLEANUP.
It removed CURRENT_CONNECTION from the table (a BEFORE DELETE trigger on that table could have called the procedure instead, but we decided our helper utilities should have a way to hook in if they needed to, so the call was put in the ON DISCONNECT trigger).
It checked whether that table (the list of known connected business-domain applications) had become empty, and if it had, called another special stored procedure, SP_GLOBAL_CLEANUP.
Those stored procedures were "umbrella" procedures consisting solely of calls to other procedures, which did the actual work of checking for inconsistencies and fixing them: for example, removing "this document is opened for editing" marks if an application (or computer, or network) had crashed without releasing the lock in the normal way. This way we could add or remove functionality without breaking Firebird object dependency chains.
In particular, one of the global sub-procedures looked at the "deletion pending" records and deleted those that had been "kept in the recycle bin" for more than 30 days. Actually, the records just had a column with the planned deletion date, which could be more or less than 30 days away, but that is a technicality.
This meant that the actual deletion happened "some time after 30 days", and only when all main apps were shut down. When those apps were started again later, they would re-read the global dictionary tables in their updated, pruned state. The applications were never again in an inconsistent state, using records that had been removed from the database.
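The logic of that trigger can be sketched outside Firebird as follows. This is Python with sqlite3 as a stand-in, not PSQL; the table names `known_connections` and `dictionary_rows` and the column `delete_after` are hypothetical, and the connection ID is passed in explicitly where the real trigger would read CURRENT_CONNECTION.

```python
import sqlite3


def create_schema(conn):
    conn.executescript(
        """
        CREATE TABLE known_connections (connection_id INTEGER PRIMARY KEY);
        CREATE TABLE dictionary_rows (
            id INTEGER PRIMARY KEY,
            delete_after TEXT  -- planned deletion date (ISO), NULL if not pending
        );
        """
    )


def on_login(conn, connection_id):
    # Main application registers itself right after a successful login.
    with conn:
        conn.execute("INSERT INTO known_connections VALUES (?)", (connection_id,))


def global_cleanup(conn, today):
    # Stand-in for SP_GLOBAL_CLEANUP: purge rows whose planned date has passed.
    cur = conn.execute(
        "DELETE FROM dictionary_rows "
        "WHERE delete_after IS NOT NULL AND delete_after <= ?",
        (today,),
    )
    return cur.rowcount


def on_disconnect(conn, connection_id, today):
    # Stand-in for the ON DISCONNECT trigger body; returns number of purged rows.
    purged = 0
    with conn:
        cur = conn.execute(
            "DELETE FROM known_connections WHERE connection_id = ?",
            (connection_id,),
        )
        was_registered = cur.rowcount == 1  # helper utilities never registered
        # (a per-connection SP_LOCAL_CLEANUP equivalent would run here)
        (remaining,) = conn.execute(
            "SELECT COUNT(*) FROM known_connections"
        ).fetchone()
        # Interpretation: "became empty" means this disconnect emptied the table.
        if was_registered and remaining == 0:
            purged = global_cleanup(conn, today)
    return purged
```

Only the disconnect of the last registered application triggers the purge, which is why the deletion date is a lower bound rather than an exact time.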
Potential fragile point: if users do not shut down the application at night but just go home, the state "last application disconnected" might never occur. This, however, would be a maintenance nightmare for their network admins (Windows updates and reboots, antivirus updates and reboots), so we documented the recommendation that those admins make sure that, at least once a week, all users are disconnected from the database at the same time.
Potential fragile point: if the Firebird server crashes (not the applications, but the server engine), the "known connections" table would hold stale values. We did not consider this a practical problem, as CURRENT_CONNECTION would then restart from 1 and count upward, eventually cleaning the table. But we also added a function to the helper app that uses SYSDBA and the monitoring tables to clear non-existent connections out of the table.
You can re-use this framework if you have no time pressure and are okay with the actual deletion being deferred for a few days.
You could also use an ON TRANSACTION START trigger instead, to shorten the delay to mere minutes, but I expect this would slow down your application badly, so I would advise against it.

How to allow just one out of two simultaneous popping (SELECT + DELETE) on the same row of data in a database relation?

I am currently developing a backend system with two endpoints of concern that interact with a common relational database table. The main purpose of this system is after-registration email verification with a time limit.
Let's suppose there are three tables that contain the users that are pending verification, already verified, and out of time for verification. These tables will contain similar attributes of the users. One user (represented by a unique ID) should exist in only one of these tables.
The first endpoint is the verification endpoint, which will be triggered by the user through a verification link (e.g., www.hello.com/verify?token=XXXX). The to-be-verified user will be looked up in the pending table. If the user is not found, the token has expired and nothing further is done. Otherwise, the user will be moved to the verified table. Moving, in this case, means that the selected row will be removed from the first table and then inserted into the second table. Therefore, at least 3 queries will be executed, as below, of which the last two could run in a single transaction.
SELECT * FROM pending WHERE pending.id = id;
DELETE FROM pending WHERE pending.id = id;
INSERT INTO verified VALUES (what we get from SELECT);
The second endpoint is the expired users cleaning endpoint, which will be triggered by some kind of scheduler. Let's assume it will be triggered exactly when the user's verification token just expired. The overall task will be similar to the first endpoint, but the data row will be moved into the out of time table instead, and we assume that the user is already verified when we could not find the specified user when using SELECT.
SELECT * FROM pending WHERE pending.id = id;
DELETE FROM pending WHERE pending.id = id;
INSERT INTO outoftime VALUES (what we get from SELECT);
I believe a problem may arise if these two endpoints are unfortunately triggered at the same time (i.e., the user verifies themselves right at the expiration time) by two concurrent processes. Both processes might successfully find the user with SELECT before either runs DELETE. Both will then also run INSERT, causing the user data to be inserted into two tables and violating our rule (one user should exist in only one of these tables).
An ideal solution for me would be to find a way to detect and "fail" one of the two processes, which will produce a similar result to the more common situation where that process starts after another process has already done its job (i.e., the second process will terminate when it fails to retrieve a user from SELECT). The choice of the process to be failed is not significant in this case; either of the two would work.
I am aware that using locks is one of the possible solutions theoretically, by covering each critical section with a lock acquisition and release. However, I am not sure whether it is a good practice or not in this problem.
Are there any common design patterns or ideas that could solve this problem?
Please note that no specific technology/database stacks have been chosen yet.
Thanks!
Edit: There are multiple tables in this case because I found that the frequency of access for each type of user may not be equal, so we could use different system specifications for each table. For example, the out-of-time table is more like an archive--just a big pile of data with minimal access, while the active table will be accessed every time there are changes to the user, so it might require better hardware, etc. Using a status column seems to be one solution, though. However, are there similar situations in system design where this kind of problem is inevitable? How would it be dealt with?
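One way to get the "fail one of the two processes" behaviour described above, sketched under assumptions, is to let the DELETE decide the winner: a process only runs its INSERT if its own DELETE actually removed the row. The sketch uses Python with sqlite3 as an arbitrary stand-in since no stack has been chosen; the `email` column and the `move_user` helper are hypothetical. It relies on the database serializing concurrent deletes of the same row, which should be checked for whichever engine is eventually picked.

```python
import sqlite3

TARGETS = ("verified", "outoftime")


def create_schema(conn):
    conn.executescript(
        """
        CREATE TABLE pending   (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
        CREATE TABLE verified  (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
        CREATE TABLE outoftime (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
        """
    )


def move_user(conn, user_id, target):
    # Returns True if this caller moved the user, False if someone else already did.
    if target not in TARGETS:
        raise ValueError(target)
    with conn:  # DELETE and INSERT commit together or not at all
        row = conn.execute(
            "SELECT id, email FROM pending WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return False  # already verified or already expired
        deleted = conn.execute(
            "DELETE FROM pending WHERE id = ?", (user_id,)
        ).rowcount
        if deleted != 1:
            return False  # lost the race: the other process removed the row first
        conn.execute(f"INSERT INTO {target} (id, email) VALUES (?, ?)", row)
    return True
```

Both endpoints would call the same helper with a different target, so whichever one loses simply sees `False`, exactly as if it had started after the other had finished.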

Akka.net persistence delete messages from a certain sequence number

Is there a way to delete messages after a certain sequence number in Akka.net? I know that DeleteMessages(seqNumber) deletes all messages up to a certain sequence number; is there a way to delete those after a seqNumber? The main goal would be to revert to a previous state (perhaps those messages were created in error).
It's obviously possible to edit the database manually (or set is_deleted to true for those events), but I'm not sure that would be a great idea.
Thanks
DeleteMessages(seqNr) exists only to save space when you are using event sourcing with snapshots and your system can tolerate an incomplete history of events.
Deleting events goes against event sourcing as a concept. The purpose of an event is to describe a fact that has already happened. You cannot alter the past, as other consumers may already have read that event and updated some state or performed an action based on it.
Correcting the effects of events in event-sourced systems usually comes down to producing a compensating event that reverses the effects of the one you want to fix.
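To illustrate the idea of a compensating event independently of Akka.net, here is a small Python sketch. The `Event` type, the event kinds `Deposited` and `DepositReversed`, and the in-memory journal are all made up for the example; they are not part of any Akka.net API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    seq_nr: int
    kind: str                      # "Deposited" or "DepositReversed" (hypothetical)
    amount: int
    reverses: Optional[int] = None  # seq_nr of the event being compensated


def append(journal: List[Event], kind: str, amount: int,
           reverses: Optional[int] = None) -> Event:
    # The journal is append-only: sequence numbers only ever grow.
    event = Event(len(journal) + 1, kind, amount, reverses)
    journal.append(event)
    return event


def compensate(journal: List[Event], seq_nr: int) -> Event:
    # Instead of deleting the faulty event, record a new one that undoes it.
    original = journal[seq_nr - 1]
    return append(journal, "DepositReversed", original.amount, reverses=seq_nr)


def replay(journal: List[Event]) -> int:
    # Rebuild state by folding over the complete history.
    balance = 0
    for event in journal:
        if event.kind == "Deposited":
            balance += event.amount
        elif event.kind == "DepositReversed":
            balance -= event.amount
    return balance
```

The faulty event stays in the history, so anything that already consumed it can also consume the reversal and correct itself.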

how to deal with race conditions among jobs with e.g. beanstalkd

I want to set up a job queue with multiple workers. Right now I am looking at beanstalkd, but I believe this is more of a conceptual problem: how can you ensure that jobs related to a single entity get handled in order?
Let's say the workers manage an email platform for some UI. For a given mailbox, jobs need to be performed serially. For example, sometimes a user will want to re-push their password into the mail platform while troubleshooting. So, they change their password, then change it back right away. That's two password-change jobs submitted to beanstalkd.
Now, most of the time this will go fine, as beanstalkd will hand those jobs out to workers in order. However, some transient error like a DNS lookup delay could cause the second password change (back to the proper one) to go through before the first, leaving the mailbox with an incorrect password.
I have thought about introducing semaphores/mutexes and having a 1:1 worker-machine:beanstalkd-server ratio, but even that would only work if the lock requests are granted in the order requested, which doesn't seem fully reliable. Having a queue per entity opens up some other options, but this needs to support hundreds of thousands of entities.
Judging by how little discussion I've found around this topic, this must not be as common a scenario as I initially thought. Does anyone have experience dealing with this problem?
A couple of potential methods come to mind.
As you point out, unless you are changing priorities, Beanstalkd is a FIFO queue. This means that, if only one worker is dealing with changing the password, it would handle the jobs in order.
If there are multiple workers, you could store metadata alongside the password: a last-modified time (more exactly, the time the password change request was made). That time would be set from the job, and if the time already in the database (alongside the password) is ever newer than that of the incoming request, the incoming request would be dropped as out of date.
Depending on the user data storage, you may need additional locking around the database (with an SQL database, this is quite easy, but a file-based store would need additional locking to avoid potential file corruption).
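With an SQL store, the timestamp check and the write can be combined into a single conditional UPDATE, so no separate lock is needed. A minimal sketch in Python with sqlite3; the `mailboxes` table, its columns, and the `apply_password_change` helper are assumptions for illustration, and `requested_at` is the time the job was submitted, carried in the job payload.

```python
import sqlite3


def create_schema(conn):
    conn.execute(
        """CREATE TABLE mailboxes (
               id TEXT PRIMARY KEY,
               password TEXT NOT NULL,
               requested_at REAL NOT NULL)"""
    )


def apply_password_change(conn, mailbox_id, new_password, requested_at):
    # Apply the change only if it is newer than what is already stored.
    with conn:
        cur = conn.execute(
            "UPDATE mailboxes SET password = ?, requested_at = ? "
            "WHERE id = ? AND requested_at < ?",
            (new_password, requested_at, mailbox_id, requested_at),
        )
    return cur.rowcount == 1  # False means the job was stale and got dropped
```

If the second job overtakes the first, the first one matches no row when it finally runs and is discarded, leaving the most recently requested password in place.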

How to Avoid SQL Server hangs due to uncommitted transaction caused by poor SW design

The problem: a .NET application is trying to save many records to SQL Server. BeginTrans is used, and right before the commit a warning message asks the end user to confirm whether to proceed with saving the data. The user simply left the computer and walked away!
Now all other users are unable to access the locked records. Sometimes almost the entire system is affected. Almost all transactions update the same records; the confirmation message must be shown after the data is updated and before the commit, so that the user can roll back. What would be the best solution?
If no solution is found, the last thing I might do is roll back, show the confirmation message, and, if the user accepts, save the data again without any confirmation message (which I don't think is the right way).
My question is: what is the best I can do? Any ideas?
This sounds like a WinForms app? It also sounds like you want to confirm the intent of user's action. Are you in a position to only start the transaction once they confirm they intend to save the data?
Ideally, you should
Prompt the user via [OK | Cancel]
Perform the database transaction
If the result of the transaction is deadlock (or any other failure), inform the user the save operation failed
In other words, the update of records should be a synchronous call.
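The order of those steps can be sketched like this. It is Python with sqlite3 rather than .NET and SQL Server, purely to show the shape; the `stock` table, the `save_records` helper, and the `confirm` callback (a stand-in for the OK/Cancel dialog) are hypothetical.

```python
import sqlite3


def save_records(conn, records, confirm):
    # 1. Ask the user first, while no transaction is open and no lock is held.
    if not confirm(f"Save {len(records)} record(s)?"):
        return "cancelled"
    # 2. Run a short transaction with no user interaction inside it.
    try:
        with conn:  # commits on success, rolls back if any statement fails
            conn.executemany(
                "INSERT INTO stock (item, qty) VALUES (?, ?)", records
            )
    except sqlite3.Error:
        # 3. Report the failure instead of leaving a transaction hanging.
        return "failed"
    return "saved"
```

Because the prompt happens before the transaction begins, a user who walks away from the dialog holds no locks.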
EDIT: after understanding the specifics mentioned in the comment below, I would recommend some form of server-side task queue that all these requests would need to flow through. Your client would submit a request to the server, and the server application would then become the software responsible for updating records in the database. The clients would make their requests to this application, and the requests would be processed in the order they were received. I don't have much experience with inventory tracking software, but I understand that it needs to be absolutely correct. So this is just a rough idea; I'm sure someone with more experience in inventory tracking will have a better pattern. The proposed pattern creates a large bottleneck on the server that is responsible for updating the records. For example, this pattern would be terrible for someone like Amazon.