Akka.net persistence delete messages from a certain sequence number

Is there a way to delete messages after a certain sequence number in Akka.net? I know that DeleteMessages(seqNumber) deletes all messages up to a certain sequence number; is there a way to delete the messages after a given seqNumber instead? The main goal would be to revert to a previous state (perhaps those messages were created in error).
It's obviously possible to edit the database manually (or set is_deleted to true for those events) but I'm not sure if that would be a great idea.
Thanks

DeleteMessages(seqNr) exists only for the purpose of saving space, for the case where you're using event sourcing with snapshots and your system can tolerate an incomplete history of events.
Deleting events goes against event sourcing as a concept. The purpose of an event is to describe a fact that has already happened. You cannot alter the past, as there may be other consumers that have already read that event and updated some state or performed an action based on it.
Correcting the effects of events in event-sourced systems usually comes down to producing a compensating event that reverses the effects of the one you want to fix.
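To make this concrete, here is a minimal sketch of the compensating-event approach with Akka.NET Persistence. All command, event, and actor names (AddItem, ItemAdded, ItemRemoved, BasketActor) are hypothetical; the shape of your own events will differ.

using System.Collections.Generic;
using Akka.Persistence;

// Hypothetical commands and events, made up for illustration.
public sealed class AddItem     { public AddItem(string itemId)     { ItemId = itemId; } public string ItemId { get; } }
public sealed class UndoAddItem { public UndoAddItem(string itemId) { ItemId = itemId; } public string ItemId { get; } }
public sealed class ItemAdded   { public ItemAdded(string itemId)   { ItemId = itemId; } public string ItemId { get; } }
public sealed class ItemRemoved { public ItemRemoved(string itemId) { ItemId = itemId; } public string ItemId { get; } }

public class BasketActor : ReceivePersistentActor
{
    private readonly HashSet<string> _items = new HashSet<string>();

    public override string PersistenceId => "basket-1";

    public BasketActor()
    {
        // State is always rebuilt from the full, untouched event history.
        Recover<ItemAdded>(e => _items.Add(e.ItemId));
        Recover<ItemRemoved>(e => _items.Remove(e.ItemId));

        Command<AddItem>(cmd =>
            Persist(new ItemAdded(cmd.ItemId), e => _items.Add(e.ItemId)));

        // Instead of deleting the erroneous ItemAdded event, persist a
        // compensating event that reverses its effect on the state.
        Command<UndoAddItem>(cmd =>
            Persist(new ItemRemoved(cmd.ItemId), e => _items.Remove(e.ItemId)));
    }
}

The erroneous ItemAdded event stays in the journal, so replays and any other readers see a consistent history; only the resulting state is corrected.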

Related

Fix inconsistent state right away or lazily when data is requested

Our users go through several steps of a workflow - the further they go, the more objects we create. We also allow users to go back to Step#1 and change one of the existing objects, which may cause inconsistencies, so we must update/delete some of the objects at Step#2. I see 2 options:
Option 1: Update/delete objects from Step#2 right away. This leads to:
An operation that's supposed to be a simple PATCH of an entity field becomes complicated. And it's a shared object between multiple workflows - so we'll have to add if-statements and do different things depending on the workflow.
Circular dependencies. Operations on Step#1 have to know about objects/operations on Step#2.
On each request in Step#1 we'd have to load data for Step#2 in order to determine whether Step#2 really needs to be updated, which slows down operations on Step#1. So to change 1 record in the DB we'd have to load hundreds (or even thousands) of records for Step#2.
Many actions on Step#1 may require fixing state at Step#2, so we have to ensure we don't forget anything, today or in the future.
Option 2: Fix Step#2 lazily - when the user goes there (our current approach). Step#2 will recognize that the objects are inconsistent and fix them. This leaves just one place where we need to care, but:
Until the user opens Step#2, the DB will contain inconsistent objects. This hasn't resulted in any problems so far, but I can imagine it may complicate future SQL migrations.
We update DB state on a GET request. This one doesn't seem like that big of a deal since GET stays idempotent anyway, but it still feels awkward.
Anyone knows better approaches? Or maybe improvements to these two?
Update
I haven't found a perfect solution, but eventually we implemented an improved version of #1. When updating state on Step#1 we also set a "need to rebuild Step#2" flag; when the UI opens Step#2 it first checks this flag and issues a PUT to rebuild the state, and only then it GETs Step#2.
This still means that the DB state is inconsistent for some period of time, but at least we'll know this for sure from the flag in the DB. And if needed, we could write migrations taking this flag into account. It also allows us (if needed in the future) to create an async job to fix the state.
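A rough sketch of that flag-and-rebuild flow; the store interface and every name below are hypothetical, since the question doesn't show the actual API.

using System;

public sealed class Step1Change { /* fields elided */ }

// Hypothetical persistence abstraction holding the workflow data and the flag.
public interface IWorkflowStore
{
    void ApplyStep1Change(Guid workflowId, Step1Change change);
    bool Step2NeedsRebuild(Guid workflowId);
    void SetStep2NeedsRebuild(Guid workflowId, bool value);
    void RebuildStep2(Guid workflowId);
}

public class WorkflowService
{
    private readonly IWorkflowStore _store;
    public WorkflowService(IWorkflowStore store) => _store = store;

    // PATCH handler on Step #1: apply the change and mark Step #2 as stale.
    public void UpdateStep1(Guid workflowId, Step1Change change)
    {
        _store.ApplyStep1Change(workflowId, change);
        _store.SetStep2NeedsRebuild(workflowId, true);
    }

    // The PUT the UI issues before it GETs Step #2.
    public void RebuildStep2IfNeeded(Guid workflowId)
    {
        if (!_store.Step2NeedsRebuild(workflowId)) return;
        _store.RebuildStep2(workflowId);
        _store.SetStep2NeedsRebuild(workflowId, false);
    }
}

The rebuild call is idempotent: if the flag is already clear it does nothing, so the UI can issue it unconditionally.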
I think it is more flexible to separate the state from the context in which the objects are stored. Any creation of a new object, at any step, must preserve the invariants and consistency of that context.
There are separate rules for states - rules for transitioning from one state to another and for which objects may be created - and separate rules for the context: rules for its consistency, which are enforced every time it changes.
What about dirty data asynchronous cleanup?
1) Whenever the user goes back to Step #1 and changes something, mark all related data as "dirty" (e.g. add links to it in a "DirtyData" table) and be done for now.
2) Have a DataCleanup worker (e.g. a separate thread or similar) that constantly looks for data to be cleaned up.
3) Before editing data for Step #2, check that the data is not dirty.
Depending on your logic, 3) might result in a user error (e.g. the user would need to repeat Step #2). If the DataCleanup worker has enough resources (i.e. it processes the DirtyData table almost instantaneously), that should happen only on very rare occasions. If that is not OK, you could opt for checking for dirty data on each fetch, but that could be expensive.
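A minimal sketch of that scheme; the DirtyData store interface and all names in it are hypothetical.

using System;
using System.Threading;

// Hypothetical abstraction over the "DirtyData" table described above.
public interface IDirtyDataStore
{
    Guid[] TakeDirtyEntries(int batchSize);   // claim a batch of dirty object ids
    bool IsDirty(Guid objectId);              // guard used before editing Step #2 data
    void CleanUp(Guid objectId);              // fix the related Step #2 objects and clear the flag
}

// Background cleanup worker (step 2 of the scheme).
public class DataCleanupWorker
{
    private readonly IDirtyDataStore _store;
    public DataCleanupWorker(IDirtyDataStore store) => _store = store;

    public void Run(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            var batch = _store.TakeDirtyEntries(batchSize: 100);
            foreach (var id in batch)
                _store.CleanUp(id);

            if (batch.Length == 0)
                Thread.Sleep(TimeSpan.FromSeconds(1)); // nothing to do, back off briefly
        }
    }
}

The IsDirty check is the guard from step 3): Step #2 code calls it before touching its objects and either waits or asks the user to retry.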
It sounds like you're familiar with the HTTP spec regarding GET requests, but for future readers:
Why shouldn't a GET request change data on the server?
Why is using a HTTP GET to update state on the server in a RESTful call incorrect?
For the other bullet under 2, we probably don't need a specification to agree that persisting valid data is preferable to persisting invalid data.
So what can we do for the bullets under 1 to avoid complex branching logic in a particular step and also circular dependencies? My suggestion is an event-driven design. When the data owned by a step changes (here, step #1), that step should fire a change event. The publishing step has no knowledge of the concrete listener(s) who may receive its events, so it remains decoupled from any complex handling logic.
There's probably no way to guarantee you don't forget anything in the future; but if every step in the workflow is defined as a listener, it forces you to consider change events to some extent every time you implement a new step.
One side note on granularity: if a step has many changes, it can batch up its events rather than fire each one individually. You can tune the batch size for efficiency.
In summary, I would strongly consider the Observer design pattern.
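A bare-bones sketch of that observer/event-driven idea; every type name here is made up for illustration.

using System;
using System.Collections.Generic;

// Hypothetical change event raised when Step #1 data changes.
public sealed class Step1Changed
{
    public Step1Changed(Guid workflowId) { WorkflowId = workflowId; }
    public Guid WorkflowId { get; }
}

public interface IStepChangeListener
{
    void OnStep1Changed(Step1Changed evt);
}

// The publisher: Step #1 knows nothing about Step #2, only the listener interface.
public class Step1Service
{
    private readonly List<IStepChangeListener> _listeners = new List<IStepChangeListener>();
    public void Subscribe(IStepChangeListener listener) => _listeners.Add(listener);

    public void PatchEntity(Guid workflowId /*, patch payload */)
    {
        // ... apply the simple PATCH here ...
        foreach (var listener in _listeners)
            listener.OnStep1Changed(new Step1Changed(workflowId));
    }
}

// A listener: Step #2 decides for itself how to react to Step #1 changes.
public class Step2Listener : IStepChangeListener
{
    public void OnStep1Changed(Step1Changed evt)
    {
        // e.g. mark this workflow's Step #2 objects as needing a rebuild,
        // or fix them right away - whichever strategy you choose
    }
}

Adding a Step #3 later just means registering another listener, which is what gives you some protection against "forgetting something in the future".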

How to deal with race conditions among jobs with e.g. beanstalkd

I want to set up a job queue with multiple workers. Right now I am looking at beanstalkd, but this is more of a conceptual problem, I believe: how can you ensure that jobs related to a single entity get handled in order?
Let's say the workers manage an email platform for some UI. For a given mailbox, jobs need to be performed serially. For example, sometimes a user will want to re-push their password into the mail platform while troubleshooting. So, they change their password, then change it back right away. That's two password-change jobs submitted to beanstalkd.
Now, most of the time this will go fine, as beanstalkd will hand those jobs out to workers in order. However, some transient error like a DNS lookup delay could cause the second password change (back to the proper one) to go through before the first, leaving the mailbox with an incorrect password.
I have thought about introducing semaphores/mutexes, and having a 1:1 worker-machine:beanstalkd-server ratio, but even that would only work if lock requests are granted in the order requested, which doesn't seem fully reliable. Having a queue per entity opens some other options, but this needs to support hundreds of thousands of entities.
Judging by how little discussion I've found around this topic, this must not be as common a scenario as I initially thought. Does anyone have experience dealing with this problem?
A couple of potential methods come to mind.
As you point out, unless you are changing priorities, Beanstalkd is a FIFO queue. This means that, if only one worker is dealing with changing the password, it would handle the jobs in order.
If there are multiple workers, then you could store metadata alongside the password - a last-modified time (more exactly, the time the password change request was made). That time would be set from the job, but if the time already in the database (alongside the password) is newer than the one on the incoming request, the request would be dropped as out of date.
Depending on the user data storage, you may need additional locking around the database (with an SQL database, this is quite easy, but a file-based store would need additional locking to avoid potential file corruption).
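A sketch of that last-modified guard, assuming the mailbox data lives in SQL Server; the table, column, and method names are invented for illustration.

using System;
using Microsoft.Data.SqlClient;

public static class PasswordJobHandler
{
    // Applies a password-change job only if it is newer than what is already stored.
    // Returns false when the job is stale and has been dropped.
    public static bool ApplyPasswordChange(string connectionString, Guid mailboxId,
                                           string newPasswordHash, DateTime requestedAtUtc)
    {
        const string sql = @"
            UPDATE Mailboxes
            SET    PasswordHash = @hash,
                   PasswordChangedAtUtc = @requestedAt
            WHERE  MailboxId = @id
              AND  (PasswordChangedAtUtc IS NULL OR PasswordChangedAtUtc < @requestedAt);";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@hash", newPasswordHash);
            cmd.Parameters.AddWithValue("@requestedAt", requestedAtUtc);
            cmd.Parameters.AddWithValue("@id", mailboxId);
            conn.Open();
            // 0 rows affected means a newer change is already in place: drop this job.
            return cmd.ExecuteNonQuery() == 1;
        }
    }
}

Because the timestamp comparison and the update happen in a single statement, two workers racing on the same mailbox cannot both win; the row lock serialises them.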

Transaction-Style HTTP requests

I recently ran into the following problem:
For each user, I need to do the following on server side:
First
(SQL) Insert user's record with a Unique constraint on ID
Then Parallel
(Http) Subscribe user to Service A, get subscription_id_A
(Http) Subscribe user to Service B, get subscription_id_B
Finally
(SQL) Update user's record with both subscription ids
Ideally I want this entire operation to be transactional, e.g. if any of the HTTP requests or the SQL fails, it would be as if nothing had happened. Added: if Request A fails but B succeeds, I would be stuck: do I cancel the transaction and end up with an untracked subscription, or do I commit it and end up with a user missing a subscription?
Given that this is likely impossible to achieve, what would be the next best thing I can do?
Services A and B do provide APIs to check for the existence of a subscription and to modify or delete one, but I want to avoid the check-then-act style. The SQL server runs at the highest isolation level.
This is indeed a standard problem. (Often, developers are not aware of this problem and only find out in production.) There is no standard solution. It is impossible to solve in general (see the Two Generals' Problem, http://en.wikipedia.org/wiki/Two_Generals%27_Problem - two systems can never agree with 100% certainty on whether they should commit or abort).
Maybe you can perform all the SQL work first. Insert the user, but without subscription IDs. You then try to add the subscriptions one by one and record their IDs in separate transactions once you get them.
Install a background job that periodically checks for users that were created a long time ago but still do not have subscriptions. If you find any discrepancies, fix them and log the fact.
This periodic cleanup ensures that temporary failures (which will occur due to network glitches, timeouts, redeployments, bugs, ...) are temporary. It also ensures that they are being detected and reported to developers if you like.
This would be an eventually consistent system. The idea is to first transactionally record the target state (the user and the goal to create two subscriptions) and then have a background job try to converge the data to the target state.
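A sketch of that eventually consistent flow; IUserStore and ISubscriptionClient below are hypothetical stand-ins for your SQL layer and the two HTTP services.

using System;
using System.Threading.Tasks;

// Hypothetical abstractions over the SQL database and the remote services.
public interface IUserStore
{
    Task InsertUserAsync(Guid userId);                                   // transactional insert, no subscription ids yet
    Task<Guid[]> FindUsersMissingSubscriptionsAsync(TimeSpan olderThan); // used by the background job
    Task SaveSubscriptionIdAsync(Guid userId, string service, string subscriptionId);
}

public interface ISubscriptionClient
{
    string ServiceName { get; }
    Task<string> SubscribeAsync(Guid userId); // returns subscription_id; assumed idempotent or
                                              // backed by the services' existence-check APIs
}

public class UserProvisioner
{
    private readonly IUserStore _store;
    private readonly ISubscriptionClient[] _services;

    public UserProvisioner(IUserStore store, params ISubscriptionClient[] services)
    {
        _store = store;
        _services = services;
    }

    // Happy path: record the target state first, then try to converge immediately.
    public async Task CreateUserAsync(Guid userId)
    {
        await _store.InsertUserAsync(userId);   // the unique constraint protects against duplicates
        await TryCompleteSubscriptionsAsync(userId);
    }

    // Also called by the periodic background job for users that are still incomplete.
    public async Task TryCompleteSubscriptionsAsync(Guid userId)
    {
        foreach (var service in _services)
        {
            try
            {
                var id = await service.SubscribeAsync(userId);
                await _store.SaveSubscriptionIdAsync(userId, service.ServiceName, id);
            }
            catch (Exception ex)
            {
                // Leave the gap for the background job to retry; log it for the developers.
                Console.Error.WriteLine($"Subscription to {service.ServiceName} failed for {userId}: {ex.Message}");
            }
        }
    }
}

The same TryCompleteSubscriptionsAsync method serves both the happy path and the reconciliation job, so the convergence logic lives in one place.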

Broker Queue - Move Poisoned Messages to Table

Currently I have a queue that stores merge queries, which are run once they are read off the queue. This all works well, and currently if there is an error with the merge the queue becomes disabled and I have to manually remove the message (or fix the merge, as it were).
I was wondering whether it is possible to simply move the poisoned message to a table? The queues run important (and different) merges that must continually run to ensure data is updated. It is not beneficial for the queue to, say, become disabled overnight and build up a huge backlog.
Is there any way for me to simply push the bad message into a table? I have attempted this myself, but I wound up with a TRY...CATCH inside a TRANSACTION, which performs a rollback on the error anyway (thus triggering the five-rollbacks-then-disable rule). Most solutions online mention only manually removing the message.
Any suggestions? Is this just a bad idea? If so, why?
Thanks.
The disable-after-five-rollbacks behaviour can be switched off by setting the POISON_MESSAGE_HANDLING status to OFF in the CREATE/ALTER QUEUE statement. You can then use TRY...CATCH to deal with failing transactions yourself.
Like you, I don't find this feature very useful, so I almost always turn it off in my applications and deal with problem messages in whatever way seems best.
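For illustration, a sketch of the "park it in a table" idea for the case where the queue is read from .NET code rather than a T-SQL activation procedure (the same pattern works inside an activation procedure with BEGIN TRY/CATCH). The queue, table, and column names are hypothetical, and conversation handling (END CONVERSATION) is omitted for brevity.

using System;
using Microsoft.Data.SqlClient;

public static class MergeQueueReader
{
    // One-off setup: stop Service Broker disabling the queue after five rollbacks,
    // so failures can be handled explicitly instead.
    public const string DisablePoisonHandlingSql =
        "ALTER QUEUE dbo.MergeQueue WITH POISON_MESSAGE_HANDLING (STATUS = OFF);";

    public static void ProcessOneMessage(string connectionString)
    {
        using var conn = new SqlConnection(connectionString);
        conn.Open();
        using var tx = conn.BeginTransaction();

        // Receive the next message; here the body is assumed to hold the MERGE statement.
        string mergeSql = null;
        using (var receive = new SqlCommand(
            "RECEIVE TOP(1) CAST(message_body AS NVARCHAR(MAX)) AS Body FROM dbo.MergeQueue;",
            conn, tx))
        using (var reader = receive.ExecuteReader())
        {
            if (reader.Read() && !reader.IsDBNull(0))
                mergeSql = reader.GetString(0);
        }

        if (mergeSql != null)
        {
            try
            {
                using var merge = new SqlCommand(mergeSql, conn, tx);
                merge.ExecuteNonQuery();
            }
            catch (SqlException ex)
            {
                // Instead of rolling back (which counts towards disabling the queue),
                // park the bad message in a table so the queue keeps draining.
                using var park = new SqlCommand(
                    "INSERT INTO dbo.PoisonedMergeMessages (MessageBody, Error, ReceivedAtUtc) " +
                    "VALUES (@body, @error, SYSUTCDATETIME());", conn, tx);
                park.Parameters.AddWithValue("@body", mergeSql);
                park.Parameters.AddWithValue("@error", ex.Message);
                park.ExecuteNonQuery();
            }
        }

        tx.Commit(); // the message is consumed either way, so the rollback counter never increments
    }
}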

Undo in a transactional database

I do not know how to implement the undo feature of user-friendly interfaces on top of a transactional database.
On one hand, it is advisable for the user to have multilevel (infinite) undo, as stated in the answer here. Patterns that may help with this problem are Memento and Command.
However, with a complex database involving triggers, ever-growing sequence numbers, and non-invertible procedures, it is hard to imagine how an undo action could work at points other than transaction boundaries.
In other words, undoing to the point where the transaction last committed is simply a rollback, but how is it possible to go back to other moments?
UPDATE (based on the answers so far): I do not necessarily need undo to work once the modification is already committed; I would focus on a running application with an open transaction. Whenever the user clicks save, that means a commit, but before saving - during the same transaction - undo should work. I know that using a database as the persistence layer is just an implementation detail and the user should not have to bother with it. But if we accept that "The idea of undo in a database and in a GUI are fundamentally different things" and we do not use undo with a database, then infinite undo is just a buzzword.
I know that "rollback is ... not a user-undo".
So how to implement a client-level undo given the "cascading effects as the result of any change" inside the same transaction?
The idea of undo in a database and in a GUI are fundamentally different things; the GUI is going to be a single user application, with low levels of interaction with other components; a database is a multi-user application where changes can have cascading effects as the result of any change.
The thing to do is allow the user to try and apply the previous state as a new transaction, which may or may not work; or alternatively just don't have undos after a commit (similar to no undos after a save, which is an option in many applications).
Some (all?) DBMSs support savepoints, which allow partial rollbacks:
savepoint s1;
insert into mytable (id) values (1);
savepoint s2;
insert into mytable (id) values (2);
savepoint s3;
insert into mytable (id) values (3);
rollback to s2;
commit;
In the above example, only the first insert would remain, the other two would have been undone.
I don't think it is practical in general to attempt undo after commit, for the reasons you gave and probably others. If it is essential in some scenario then you will have to build a lot of code to do it, and take into account the effects of triggers etc.
I don't see any problem with ever-increasing sequences though?
We developed such a facility in our database by keeping track of all transactions applied to the data (not really all, just the ones that are less than 3 months old). The basic idea was to be able to see who did what and when. Each database record, uniquely identified by its GUID, can then be considered the result of one INSERT, multiple UPDATE statements, and finally one DELETE statement. As we keep track of all these SQL statements, and as INSERTs are full INSERTs (the values of all fields are kept in the INSERT statement), it is then possible to:
Know who modified which field and when (Paul inserted a new line in the proforma invoice, Bill renegotiated the unit price of the item, Pat modified the final ordered quantity, etc.)
'undo' all previous transactions with the following rules:
an 'INSERT' undo is a 'DELETE' based on the unique identifier
an 'UPDATE' undo is equivalent to re-applying the previous 'UPDATE'
a 'DELETE' undo is equivalent to replaying the initial INSERT followed by all subsequent UPDATEs (see the sketch below)
As we do not keep track of transactions older than 3 months, UNDOs are not always available.
Access to these functionalities is strictly limited to database managers, as other users are not allowed to perform any data update outside of the business rules (for example: what would an 'undo' on a Purchase Order line mean once the Purchase Order has been agreed by the supplier?). To tell you the truth, we use this option very rarely (a few times a year?).
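A rough sketch of those undo rules, assuming the audit trail is available as a list of logged statements per record GUID; every type, table, and column name here is hypothetical.

using System;
using System.Collections.Generic;
using System.Linq;

public enum AuditAction { Insert, Update, Delete }

// One entry of the hypothetical audit trail: the SQL that was applied to a record.
public sealed class AuditEntry
{
    public Guid RecordId { get; set; }
    public AuditAction Action { get; set; }
    public string Sql { get; set; }          // the full INSERT / UPDATE / DELETE statement
    public DateTime AppliedAtUtc { get; set; }
}

public static class UndoPlanner
{
    // Produces the SQL needed to undo the most recent action on a record,
    // following the three rules above.
    public static IEnumerable<string> PlanUndo(IReadOnlyList<AuditEntry> history)
    {
        var ordered = history.OrderBy(e => e.AppliedAtUtc).ToList();
        var last = ordered.LastOrDefault();
        if (last == null) yield break;

        switch (last.Action)
        {
            case AuditAction.Insert:
                // Undo of an INSERT is a DELETE by unique identifier.
                yield return $"DELETE FROM TargetTable WHERE RecordGuid = '{last.RecordId}';";
                break;

            case AuditAction.Update:
                // Undo of an UPDATE re-applies the previous UPDATE; if there is none,
                // the values would have to come from the original INSERT (not shown here).
                var previousUpdate = ordered.Take(ordered.Count - 1)
                                            .LastOrDefault(e => e.Action == AuditAction.Update);
                if (previousUpdate != null) yield return previousUpdate.Sql;
                break;

            case AuditAction.Delete:
                // Undo of a DELETE replays the initial INSERT and every later UPDATE.
                foreach (var entry in ordered.Where(e => e.Action != AuditAction.Delete))
                    yield return entry.Sql;
                break;
        }
    }
}

The statements produced by PlanUndo would then be applied in their own transaction, subject to the same business rules as any other change.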
It's nearly the same as William's post (which I actually voted up), but I'll try to explain in a little more detail why it is necessary to implement a user-level undo (rather than relying on a database rollback).
It would be helpful to know more about your application, but I think that for a user(-friendly) Undo/Redo the database is not the right layer in which to implement the feature.
The user wants to undo the actions he performed, regardless of whether they led to zero, one, or many database transactions.
The user wants to undo the actions HE did (not anyone else's)
The database, from my point of view, is an implementation detail, a tool you use as a programmer for storing data. The rollback is a kind of undo that helps YOU in doing so; it's not a user undo. Using the rollback would mean involving the user in things he doesn't want to know about and does not understand (and doesn't have to), which is never a good idea.
As William posted, you need an implementation inside the client, or on the server side as part of a session, that tracks the steps of what you define as a user transaction and is able to undo them. If database transactions were committed during those user transactions, you need further db transactions to revoke them (if possible). Make sure to give useful feedback if the undo is not possible, which again means explaining it business-wise, not database-wise.
To maintain arbitrary rollback-to-previous semantics you will need to implement logical deletion in your database. This works as follows:
Each record has a 'Deleted' flag, a version number and/or a 'Current Indicator' flag, depending on how clever you need your reconstruction to be. In addition, you need a per-entity key shared across all versions of that entity so you know which records actually refer to which specific entity. If you need to know the times a version was applicable, you can also have 'From' and 'To' columns.
When you delete a record, you mark it as 'deleted'. When you change it, you create a new row and update the old row to reflect its obsolescence. With the version number, you can just find the previous number to roll back to.
If you need referential integrity (which you probably do, even if you think you don't) and can cope with the extra I/O you should also have a parent table with the key as a placeholder for the record across all versions.
On Oracle, clustered tables are useful for this; both the parent and version tables can be co-located, minimising the overhead for the I/O. On SQL Server, a covering index (possibly clustered on the entity key) including the key will reduce the extra I/O.
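A sketch of what such a versioned schema and the "change = new row" update could look like on SQL Server; the table and column names are illustrative only.

using System;
using Microsoft.Data.SqlClient;

public static class VersionedCustomerStore
{
    // Illustrative schema for logical deletion / versioning.
    public const string CreateTablesSql = @"
        CREATE TABLE CustomerEntity (          -- parent table: one row per logical entity
            EntityKey  UNIQUEIDENTIFIER PRIMARY KEY
        );
        CREATE TABLE CustomerVersion (
            EntityKey     UNIQUEIDENTIFIER NOT NULL REFERENCES CustomerEntity(EntityKey),
            VersionNo     INT              NOT NULL,
            Name          NVARCHAR(200)    NOT NULL,
            IsDeleted     BIT              NOT NULL DEFAULT 0,
            IsCurrent     BIT              NOT NULL DEFAULT 1,
            ValidFromUtc  DATETIME2        NOT NULL,
            ValidToUtc    DATETIME2        NULL,
            CONSTRAINT PK_CustomerVersion PRIMARY KEY (EntityKey, VersionNo)
        );";

    // A change never overwrites: it retires the current version and inserts a new one.
    // (Assumes at least one version already exists for the entity.)
    public static void UpdateName(SqlConnection conn, SqlTransaction tx, Guid entityKey, string newName)
    {
        const string sql = @"
            UPDATE CustomerVersion
            SET    IsCurrent = 0, ValidToUtc = SYSUTCDATETIME()
            WHERE  EntityKey = @key AND IsCurrent = 1;

            INSERT INTO CustomerVersion (EntityKey, VersionNo, Name, IsDeleted, IsCurrent, ValidFromUtc)
            SELECT @key, MAX(VersionNo) + 1, @name, 0, 1, SYSUTCDATETIME()
            FROM   CustomerVersion
            WHERE  EntityKey = @key;";

        using var cmd = new SqlCommand(sql, conn, tx);
        cmd.Parameters.AddWithValue("@key", entityKey);
        cmd.Parameters.AddWithValue("@name", newName);
        cmd.ExecuteNonQuery();
    }
}

A logical delete is the same move with IsDeleted = 1 on the new version, and rolling back means making an older version current again (or copying its values into a new current version) rather than physically removing anything.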