Fix inconsistent state right away or lazily when data is requested - oop

Our users go through several workflow steps; the further they go, the more objects we create. We also allow users to go back to Step#1 and change one of the existing objects. This may cause inconsistencies, so we must update or delete some of the objects at Step#2. I see 2 options:
Update/delete objects from Step#2 right away. This leads to:
Operation that's supposed to be a simple PATCH of an entity field becomes complicated. And it's a shared object between multiple workflows - so we'll have to add if-statements and do different things depending on the workflow.
Circular dependencies. Operations on Step#1 have to know about objects/operations on Step#2.
On each request in Step#1 we'd have to load data for Step#2 in order to determine whether Step#2 really needs to be updated, which slows down operations on Step#1. To change 1 record in the DB we'd have to load hundreds (or even thousands) of records for Step#2.
Many actions on Step#1 may need fixing state at Step#2. So we have to ensure we don't forget anything today and in the future.
Fix Step#2 lazily, when the user goes there (our current approach). Step#2 will recognize that objects are inconsistent and fix them. This leaves just 1 place where we need to care, but:
Until user opens Step#2 - DB will contain inconsistent objects. This hasn't resulted in any problems so far. But I can imagine it may complicate future SQL migrations.
We update DB state on GET request. This one doesn't seem like that big of a deal since GET stays idempotent anyway. But still it feels awkward.
Does anyone know better approaches? Or maybe improvements to these two?
Update
I haven't found a perfect solution, but eventually we implemented an improved version of #1. When updating state on Step#1 we also set a flag "need to rebuild Step#2". When the UI opens Step#2, it first checks this flag and issues a PUT to rebuild the state, and only then does it GET Step#2.
This still means that DB state is inconsistent for some period of time. But at least we'll know this for sure from the flag in DB. And if needed - we could write migrations taking this flag into account. This also allows (if needed in the future) to create an async job to fix the state.
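The flag approach from this update can be sketched roughly as follows. This is only an in-memory illustration, not our actual code: the `Workflow` class, the function names, and the "keep items that match the current Step#1 value" rule are all hypothetical stand-ins for the real tables and rebuild logic.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """In-memory stand-in for the persisted workflow rows (hypothetical)."""
    step1_value: str = ""
    step2_items: list = field(default_factory=list)
    step2_needs_rebuild: bool = False  # the "need to rebuild Step#2" flag


def patch_step1(wf: Workflow, new_value: str) -> None:
    """PATCH on Step#1: stays cheap and never loads Step#2 data."""
    if new_value != wf.step1_value:
        wf.step1_value = new_value
        wf.step2_needs_rebuild = True


def rebuild_step2(wf: Workflow) -> None:
    """PUT issued by the UI (or a future async job) before reading Step#2."""
    # Hypothetical consistency rule: keep only items that match the Step#1 value.
    wf.step2_items = [i for i in wf.step2_items if i.startswith(wf.step1_value)]
    wf.step2_needs_rebuild = False


def get_step2(wf: Workflow) -> list:
    """GET stays read-only and refuses to serve state known to be stale."""
    if wf.step2_needs_rebuild:
        raise RuntimeError("Step#2 is stale; rebuild it first")
    return list(wf.step2_items)
```

The point of the sketch is that the write path on Step#1 only flips a boolean, and the knowledge of how to repair Step#2 lives in one function that either the UI or a background job can call.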

I think it is more flexible to separate the state and the context where the objects are stored. Any creation of a new object at any step is accompanied by the preservation of the invariant and consistency of context.
There are separate rules for states (rules for transitions from one state to another and for which objects may be created) and separate rules for the context (rules for its consistency, which is enforced every time the context changes).

What about dirty data asynchronous cleanup?
1) Whenever the user goes back to Step #1 and changes something, mark all related data as "dirty" (e.g. add links to it in a "DirtyData" table) and be done for now.
2) Have a DataCleanup worker (e.g. a separate thread or something similar) that constantly looks for data to be cleaned up.
3) Before editing data for Step #2, check that the data is not dirty.
Depending on your logic, 3) might result in a user error (e.g. the user would need to repeat Step #2). If the DataCleanup worker has enough resources (i.e. it processes the DirtyData table almost instantaneously), that should happen only on very rare occasions. If that is not OK, you could opt for checking for dirty data on each fetch, but that could be expensive.
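A minimal sketch of this idea, assuming an in-process queue in place of the "DirtyData" table and a thread in place of the DataCleanup worker. The names `mark_dirty`, `cleanup_worker`, `edit_step2` and the "rebuilt" placeholder value are made up for illustration; a real cleanup would re-derive the Step #2 objects.

```python
import queue
import threading

dirty = queue.Queue()      # stands in for the DirtyData table
dirty_ids = set()          # ids currently awaiting cleanup
lock = threading.Lock()    # guards dirty_ids and step2_data
step2_data = {1: "old", 2: "ok"}


def mark_dirty(record_id):
    """Called when Step #1 changes: record the id and return immediately."""
    with lock:
        dirty_ids.add(record_id)
    dirty.put(record_id)


def cleanup_worker():
    """Drain the queue and repair each record; None is a stop signal."""
    while True:
        record_id = dirty.get()
        try:
            if record_id is None:
                return
            with lock:
                step2_data[record_id] = "rebuilt"  # placeholder repair
                dirty_ids.discard(record_id)
        finally:
            dirty.task_done()


def edit_step2(record_id, value):
    """Reject edits to records that are still waiting for cleanup."""
    with lock:
        if record_id in dirty_ids:
            raise RuntimeError("record is awaiting cleanup; retry later")
        step2_data[record_id] = value
```

To try it, start the worker with `threading.Thread(target=cleanup_worker, daemon=True).start()`, call `mark_dirty(1)`, and wait with `dirty.join()`; after that `edit_step2(1, ...)` goes through. Without a running worker the edit is rejected, which is the user error case described above.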

It sounds like you're familiar with the HTTP spec regarding GET requests, but for future readers:
Why shouldn't a GET request change data on the server?
Why is using a HTTP GET to update state on the server in a RESTful call incorrect?
For the other bullet under 2, we probably don't need a specification to agree that persisting valid data is preferable to persisting invalid data.
So what can we do for the bullets under 1 to avoid complex branching logic in a particular step and also circular dependencies? My suggestion is an event-driven design. When Step #1 changes, it should fire a change event. In this scenario, Step #1 has no knowledge of the concrete listener(s) that may receive its events, so it remains decoupled from any complex handling logic.
There's probably no way to guarantee you don't forget anything in the future; but if every step in the workflow is defined as a listener, it forces you to consider change events to some extent every time you implement a new step.
One side note on granularity: if a step has many changes, it can batch up its events rather than fire each one individually. You can adjust the size for efficiency.
In summary, I would strongly consider the Observer design pattern.
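Here is a rough sketch of what I mean, assuming a tiny in-process event bus. `EventBus`, `EditedStep`, `DependentStep` and the `"step.changed"` event name are invented for the example; in a real system the bus could just as well be a message queue.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe registry (illustrative only)."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, event_name, listener):
        self._listeners[event_name].append(listener)

    def publish(self, event_name, payload):
        for listener in self._listeners[event_name]:
            listener(payload)


class EditedStep:
    """The step being changed: it only announces that something changed."""

    def __init__(self, bus):
        self._bus = bus
        self.fields = {}

    def patch(self, name, value):
        self.fields[name] = value
        self._bus.publish("step.changed", {"field": name, "value": value})


class DependentStep:
    """A later step: it decides for itself how to react to a change."""

    def __init__(self, bus):
        self.stale = False
        self.seen = []
        bus.subscribe("step.changed", self.on_change)

    def on_change(self, payload):
        self.seen.append(payload)
        self.stale = True
```

The edited step never imports or references the dependent step, so the dependency points in one direction only. Adding another workflow means adding another listener rather than another if-statement in the PATCH handler.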

Related

Keeping multi-user state across DB sessions

The situation
Suppose we have a web application connected to a (Postgre)SQL database whose task can be summarized as:
A SELECT operation to visualize the data.
An UPDATE operation that stores modifications based on the visualized data.
Simple, but... the data involved isn't user specific, so it might potentially be changed during the process by other users. The editing task may take a long time (perhaps more than an hour), meaning that the probability of these collisions happening isn't low: it makes sense to implement a robust solution to the problem.
The approach
The idea would be that, once the user tries to submit the changes (i.e. firing the UPDATE operation), a number of database checks will be triggered to ensure that the involved data didn't change in the meantime.
Assuming we have timestamped every change on the data, it would be as easy as keeping the access time when the data was SELECTed and ensuring that no new changes were done after that time on the involved data.
The problem
We could easily just keep that access time in the frontend application while the user performs the editing, and later provide it as an argument to the trigger function when performing the UPDATE, but that's not desirable for security reasons. The database should store the user's access time.
An intuitive solution could be a TEMPORARY TABLE associated with the database session. But, again, the user might take a long time doing the task, so capturing a connection from the pool and keeping it idle for such a long time doesn't seem like a good option either. The SELECT and the UPDATE operations will be performed under different sessions.
The question
Is there any paradigm or canonical way to address and solve this problem efficiently?
This problem is known as the "lost update" problem.
There are several solutions that depend on whether a connection pool is used or not and on the transaction isolation level used:
pessimistic locking with SELECT ... FOR UPDATE without connection pool
optimistic locking with timestamp column if connection pool is used.
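As a rough illustration of the optimistic variant, here is a sketch using Python's built-in `sqlite3` module instead of PostgreSQL, and an integer `version` column standing in for the timestamp column. The `doc` table and the helper names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO doc VALUES (1, 'draft', 1)")
conn.commit()


def read_doc(conn, doc_id):
    """Return (body, version); the version is remembered for the later write."""
    return conn.execute(
        "SELECT body, version FROM doc WHERE id = ?", (doc_id,)
    ).fetchone()


def save_doc(conn, doc_id, new_body, seen_version):
    """Write only if nobody changed the row since seen_version was read."""
    cur = conn.execute(
        "UPDATE doc SET body = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_body, doc_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows means a concurrent update won
```

No connection or lock is held between the read and the write, so this works with a connection pool. Two editors that read the same version can both try to save, but only the first succeeds; the second gets `False` and has to reload.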

Multiple application on network with same SQL database

I will have multiple computers on the same network with the same C# application running, connecting to a SQL database.
I am wondering if I need to use the service broker to ensure that if I update record A in table B on Machine 1, the change is pushed to Machine 2. I have seen applications that need to use messaging servers to accomplish this before but I was wondering why this is necessary, surely if they connect to the same database, any changes from one machine will be reflected on the other?
Thanks :)
This is mostly about consistency and latency.
If your applications always perform atomic operations on the database, and they always read whatever they need with no caching, everything will be consistent.
In practice, this is seldom the case. There are plenty of hidden opportunities for caching, like when you have an edit form: it has the values the entity had before you started the edit process, but what if someone modified those in the meantime? You'd just overwrite their changes with your data.
Solving this is a bunch of architectural decisions. Different scenarios require different approaches.
Once data is committed in the database, everyone reading it will see the same thing - but only if they actually get around to reading it, and the two reads aren't separated by another commit.
Update notifications are mostly concerned with invalidating caches, and perhaps some push-style processing (e.g. IM client might show you a popup saying you got a new message). However, SQL Server notifications are not reliable - there is no guarantee that you'll get the notification, and even less so that you'll get it in time. This means that to ensure consistency, you must not depend on the cached data, and you have to force an invalidation once in a while anyway, even if you didn't get a change notification.
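A small sketch of that last point, assuming a hypothetical `ExpiringCache` class: a change notification can invalidate an entry early, but every entry also expires after a maximum age, so a missed notification only delays the refresh. The injectable `clock` parameter is there purely to make the behaviour easy to demonstrate.

```python
import time


class ExpiringCache:
    """Cache that tolerates lost notifications by expiring entries by age."""

    def __init__(self, loader, max_age_seconds, clock=time.monotonic):
        self._loader = loader          # function that reads from the database
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}             # key -> (value, loaded_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or self._clock() - entry[1] >= self._max_age:
            entry = (self._loader(key), self._clock())
            self._entries[key] = entry
        return entry[0]

    def invalidate(self, key):
        """Call this when a change notification does arrive."""
        self._entries.pop(key, None)
```

The maximum age is the upper bound on how stale a read can be when no notification arrives, so it should be chosen from what the application can tolerate rather than from how often notifications usually work.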
Remember, even if you're actually using a database that's close enough to ACID, it's usually not the default setting (for performance and availability, mostly). You need to understand what kind of guarantees you're getting, and how to write code to handle this. Even the most perfect ACID database isn't going to help your consistency if your application introduces those inconsistencies :)

Core data : how to undo operations once managed objects are saved with context

I am trying to implement downloading of bulk data from several tables on the server.
In my case there are 16 tables. For all these tables I will be firing 10 requests to the server. This means I have done a bit of logical groupings for related tables, but it is like all tables are inter-related with each other through one or the other relationship.
I need to consider three cases while doing downloading:
Saving data to each table at local.
Managing relationships between inserted objects.
Handling situation when one of the requests fails during download, say 8th request failed.
I will be following this approach for each response:
Inserting data in managed object context.
Managing relationships by firing NSPredicate and associating the related objects.
Saving the context.
In case of a response failure, I have two options:
Next time continue from the failed response.
Revert all saved data to its previous state.
1st approach may lead to some data inconsistency, so I am going with 2nd approach.
I know that if a managed object context is not saved, we can revert the changes, but is it possible to revert the changes if the managed object context is saved?
I require some useful answers from the community.
Please suggest.
Is it possible to revert the changes, if the managed object context is saved?
After saving? Maybe, but it could be tricky. If you set up a separate managed object context for your network operations, and give it an NSUndoManager, you could later on tell the undo manager to roll everything back to the previous state.
It would be simpler to just not save changes until you're finished, though. Using an undo manager doesn't really help much: the memory needed to store up all the undo actions will at least match the memory use from keeping all of the unsaved changes around until you're finished. If you're working on a separate managed object context (whether a child context or a completely separate context), handling the error case is as simple as letting the MOC get deallocated without saving changes first.

Good ways to decouple GUIs from SOAP/WS-API update/write calls?

Let's assume we have some configuration GUI that in its current form uses direct DB transactions to submit new configurations for more than one configurable component in a consistent manner.
Now let's move the data (DB) stuff behind some SOAP/WS API. The GUI has no direct DB access anymore. The transactional behaviour must remain, but the API should NOT be designed to explicitly accommodate the GUI form submissions. In fact, I don't even know how the new GUI will work or how the user input will be structured. Therefore I need to provide something like WS-AtomicTransaction on the API server side. However, there are (at least) two caveats:
The GUI is written in PHP: I don't think there is any WS-Transaction support in PHP available.
I don't want to keep DB transactions open on the server side while waiting for additional client requests.
Solutions I can think of:
using Camel's aggregation. However, that would make things more complicated in at least two ways:
You cannot use DB row ids of newly inserted rows in the subsequent calls inside the same transaction. You need to use some sort of symbolic back-referencing because there would be no communication between client and server while processing the aggregated messages.
Call replies would not be immediate (or the immediate and separate reply to each single call would only be some sort of a stub, i.e. not containing any useful information beyond "your message has been attached to TX xyz", if that's at all possible in the Camel aggregation case).
the two disadvantages of the previous solution make me think of request batches where possibly the WS standards provide means for referencing call results in subsequent calls inside the batch transaction. Is there any such thing already available? Maybe even as a PHP client?
trying to eliminate lock contention in the database by carefully using row-level locks etc. However, when inserting new elements, my guess is that usually pages and index pages need to be locked by the DB.
maybe some server-side persistence layer using optimistic locking? But again, that would not return any DB IDs back to the client before the final commit if DB writes would be postponed until the commit (don't know if that's possible at all).
What do YOU think?
Transactions are a powerful tool and we easily get into a thinking pattern in which we see every problem as a nail we hit with this big hammer. I can relate to your confusion because I've experienced it myself. Unfortunately I have no better advice for you than to try not to think in terms of transactions but of atomic API calls.
When I think in terms of transactions, my thought pattern usually goes like this:
start transaction
read (repeat as required)
update (repeat as required)
commit/roll back
It takes some time to realize that we overuse this pattern. Actual conflicts are rare and there are many other ways of dealing with them. Here is one commonly used in APIs:
read and send data to client (atomic API call)
update data (on the client)
send original + updates back to the server (atomic API call)
start transaction (on server)
read
compare with original from client
if not same, return error (client should retry)
if same, update
commit
The last six points are part of the implementation of the API call.
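The pattern above might look roughly like this. It is only a sketch: `ConfigStore`, `ConflictError` and `update_with_retry` are hypothetical names, and a lock around an in-memory dict stands in for the short server-side database transaction.

```python
import threading


class ConflictError(Exception):
    """Raised when the stored value no longer matches the client's original."""


class ConfigStore:
    def __init__(self, initial):
        self._data = dict(initial)
        self._lock = threading.Lock()  # stands in for the server-side transaction

    def read(self, key):
        """Atomic API call 1: read and send data to the client."""
        with self._lock:
            return self._data[key]

    def update(self, key, original, new_value):
        """Atomic API call 2: compare with the original, then update."""
        with self._lock:
            if self._data[key] != original:
                raise ConflictError(key)
            self._data[key] = new_value


def update_with_retry(store, key, change, attempts=3):
    """Client side: re-read and re-apply the change after a conflict."""
    for _ in range(attempts):
        original = store.read(key)
        try:
            store.update(key, original, change(original))
            return True
        except ConflictError:
            continue
    return False
```

The transaction (here, the lock) only spans the compare and the write inside one call, so nothing stays open on the server while the client is busy editing.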
Ferenc Mihaly
http://theamiableapi.com

Asynchronous SQL Operations

I've got a problem I'm not sure how best to solve.
I have an application which updates a database in response to ad hoc requests. One request in particular is quite common. The request is an update that by itself is quite simple, but has some complex preconditions.
For this request the business layer first requests a set of data from the data layer.
The business logic layer evaluates the data from the database and parameters from the request; from this the action to be performed is determined, and the request's response message(s) are created.
The business layer now executes the actual update command that is the purpose of the request.
This last step is the problem: the command depends on the state of the database, which might have changed since the business logic ran. Locking down the data read in this operation across several round-trips to the database doesn't seem like a good idea either. Is there a 'best-practice' way to accomplish something like this?
Thanks!
In simple terms when you execute the update command you are concerned that the database may have changed?
Then call stored procedures that are written defensively and will only update if the data is in an acceptable state when they are called (by checking the foreign key references, data integrity etc.).
Let me know if I can help in mocking up some aspect of this.
You could store the original state of the modified business objects and compare the original objects to their database counterparts to check if anything has been changed.
If changes have been made, then you either have the choice to merge the objects based on the original, modified and stored (database) objects, or to cancel the update and tell the client the update has failed.
This is kind of difficult because there are not many specifics in the question, so I'll just give a simple example that you may be able to apply to your situation.
Load all the data as well as the last changed date (yyyy-mm-dd hh:mi:ss.mmm)
SELECT AAA,BBB,LastChgDate FROM YourTable WHERE ID=xxxxxx
do your business logic
save the data
UPDATE YourTable SET AAA=aaaaa,BBB=bbbbb WHERE ID=xxxxxx AND LastChgDate=zzzzzz
If the row count != 1, report an error: someone else has changed the data. Otherwise the data is saved.
Use a proper transaction isolation mode and do everything in a single database transaction (i.e. start the transaction in step 1 and commit after step 3).
Your question is a little bit vague, but my guess is that you need either SNAPSHOT or READ COMMITTED mode.