Asynchronous SQL Operations - sql

I've got a problem I'm not sure how best to solve.
I have an application which updates a database in response to ad hoc requests. One request in particular is quite common. The request is an update that by itself is quite simple, but has some complex preconditions.
For this request, the business layer first requests a set of data from the data layer.
The business logic layer then evaluates the data from the database together with the parameters from the request; from this, the action to be performed is determined and the request's response message(s) are created.
The business layer now executes the actual update command that is the purpose of the request.
This last step is the problem: the command depends on the state of the database, which might have changed since the business logic ran. Locking the data read by this operation across several round-trips to the database doesn't seem like a good idea either. Is there a 'best-practice' way to accomplish something like this?
Thanks!

In simple terms, when you execute the update command you are concerned that the database may have changed?
Then call stored procedures that are written defensively and will only update if the data is in an acceptable state when they are called (by checking the foreign key references, data integrity etc.).
Let me know if I can help in mocking up some aspect of this.
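As a rough illustration of what "written defensively" could mean, here is a minimal T-SQL sketch, assuming a hypothetical dbo.Orders table with a Status column; the update is only applied if the row is still in a state that permits it:
CREATE PROCEDURE dbo.ApplyOrderUpdate
    @OrderId INT,
    @NewStatus INT
AS
BEGIN
    SET NOCOUNT ON;
    -- Only update if the row is still in an acceptable state.
    UPDATE dbo.Orders
    SET Status = @NewStatus
    WHERE OrderId = @OrderId
      AND Status = 1;  -- e.g. still 'Pending'
    IF @@ROWCOUNT = 0
        RAISERROR('The order is no longer in a state that allows this update.', 16, 1);
END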

You could store the original state of the modified business objects and compare the original objects to their database counterparts to check if anything has been changed.
If changes have been made, then you either have the choice to merge the objects based on the original, modified and stored (database) objects, or to cancel the update and tell the client the update has failed.

This is kind of difficult, because there aren't many specifics in the question, so I'll just give a simple example that you may be able to apply to your situation.
Load all the data as well as the last-changed date (yyyy-mm-dd hh:mi:ss.mmm):
SELECT AAA,BBB,LastChgDate FROM YourTable WHERE ID=xxxxxx
Do your business logic, then save the data:
UPDATE YourTable SET AAA=aaaaa,BBB=bbbbb WHERE ID=xxxxxx AND LastChgDate=zzzzzz
If the row count != 1, raise an error: someone else has changed the data. Otherwise, the data has been saved.
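As a rough T-SQL sketch of that final check (using the same hypothetical table and columns as above; bumping LastChgDate in the same statement keeps the timestamp fresh for the next editor):
UPDATE YourTable
SET AAA = @NewAAA,
    BBB = @NewBBB,
    LastChgDate = GETDATE()                    -- refresh the timestamp for subsequent edits
WHERE ID = @Id
  AND LastChgDate = @LastChgDateReadAtSelect;  -- the value read in the SELECT above
IF @@ROWCOUNT <> 1
    RAISERROR('The row was changed by someone else; reload and retry.', 16, 1);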

Use a proper transaction isolation level and do everything in a single database transaction (i.e. start the transaction in step 1 and commit after step 3).
Your question is a little bit vague, but my guess is that you need either SNAPSHOT or READ COMMITTED mode.
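A minimal sketch of that shape, assuming SQL Server and reusing the hypothetical YourTable from the answer above (SNAPSHOT isolation also requires ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON):
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    -- step 1: read the data the business logic needs
    SELECT AAA, BBB FROM YourTable WHERE ID = @Id;
    -- step 2: the business logic evaluates the data and decides what to do
    -- step 3: the actual update, still inside the same transaction; under SNAPSHOT,
    -- a concurrent change to the same row makes this fail with an update-conflict error
    UPDATE YourTable SET AAA = @NewAAA WHERE ID = @Id;
COMMIT TRANSACTION;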

Related

How to handle authentication count to be compliant with CQRS pattern?

I need to count improper authentication attempts for some accounts in my application; if a certain threshold is reached, I need to block the account. In my understanding of CQS/CQRS, an authentication request is a kind of query, and queries should not modify any data on the server side, in my opinion. To solve this problem I would have to update some attribute in the database while handling the query, which I guess would violate CQRS principles. What should I do? Is authentication a command in my case (remember, commands cannot return any value, so how would I know, for example, that the authentication succeeded)? Maybe I should publish some event after an unsuccessful authentication? How can I solve such a problem? Thanks for any answer.
Queries should not modify domain state.
I'm not aware of a general prohibition on a command returning data (strictly interpreted, that constraint would preclude any sort of acknowledgement of a command).
The "S" in CQRS is often interpreted too strictly in my experience: any write model (at least any write model which retains the right to decide that a command is inapplicable based on the results of previous commands) carries with it a read model (if event sourcing, that read model is typically going to be a snapshot derived from the write model, but the principle holds). If there's a query which can be effectively answered from that read model, there's not necessarily any gain from having a different read model handle the query (the best reason to do that separation in that scenario is if there'd be enough query volume that handling "query commands" degrades write performance).
So I'd advise modeling authentication as a command (it almost certainly changes the state of the system) and returning whatever auth status info is relevant/available from an auth as a response to the command.
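For illustration, the state change behind such a command could be as small as the following SQL sketch (the Accounts table, its columns, and the threshold of 5 are all hypothetical):
-- Handle a failed authentication attempt as part of the command.
UPDATE Accounts
SET FailedAttempts = FailedAttempts + 1,
    IsLocked = CASE WHEN FailedAttempts + 1 >= 5 THEN 1 ELSE IsLocked END
WHERE AccountId = @AccountId;
-- On success, the command would instead reset the counter:
-- UPDATE Accounts SET FailedAttempts = 0 WHERE AccountId = @AccountId;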

Fix inconsistent state right away or lazily when data is requested

Our users go through several steps of a workflow; the further they go, the more objects we create. We also allow users to go back to Step #1 and change one of the existing objects. This may cause inconsistencies, so we must update/delete some of the objects at Step #2. I see 2 options:
1. Update/delete objects from Step #2 right away. This leads to:
   - An operation that's supposed to be a simple PATCH of an entity field becomes complicated. And it's a shared object between multiple workflows, so we'd have to add if-statements and do different things depending on the workflow.
   - Circular dependencies. Operations on Step #1 have to know about objects/operations on Step #2.
   - On each request in Step #1 we'd have to load data for Step #2 in order to determine whether Step #2 really needs to be updated. This slows down operations on Step #1: to change 1 record in the DB we'd have to load hundreds (or even thousands) of records for Step #2.
   - Many actions on Step #1 may need to fix state at Step #2, so we have to ensure we don't forget anything, today and in the future.
2. Fix Step #2 lazily, when the user goes there (our current approach). Step #2 will recognize that objects are inconsistent and fix them. This leaves just 1 place where we need to care, but:
   - Until the user opens Step #2, the DB will contain inconsistent objects. This hasn't resulted in any problems so far, but I can imagine it may complicate future SQL migrations.
   - We update DB state on a GET request. This doesn't seem like that big of a deal since the GET stays idempotent anyway, but it still feels awkward.
Does anyone know better approaches? Or maybe improvements to these two?
Update
I haven't found a perfect solution, but eventually we implemented an improved version of #1. When updating state on Step #1 we also set a flag "needs to rebuild Step #2"; when the UI opens Step #2 it first checks this flag and issues a PUT to rebuild the state, and only then GETs Step #2.
This still means that DB state is inconsistent for some period of time, but at least we'll know this for sure from the flag in the DB, and if needed we could write migrations that take this flag into account. This also allows us (if needed in the future) to create an async job to fix the state.
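In database terms the flag can be as simple as the following sketch (the Workflows table and column names are hypothetical):
-- Step #1 marks the workflow as needing a rebuild of Step #2:
UPDATE Workflows SET NeedsStep2Rebuild = 1 WHERE WorkflowId = @WorkflowId;
-- The PUT issued when the UI opens Step #2 rebuilds the Step #2 objects
-- and then clears the flag:
UPDATE Workflows SET NeedsStep2Rebuild = 0 WHERE WorkflowId = @WorkflowId;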
I think it is more flexible to separate the state from the context in which the objects are stored. Any creation of a new object at any step must preserve the invariants and the consistency of the context.
There are separate rules for states (rules for transitioning from one state to another and for which objects may be created) and separate rules for the context (rules for its consistency, which are enforced every time it changes).
What about asynchronous cleanup of the dirty data?
1. Whenever the user goes back to Step #1 and changes something, mark all related data as "dirty" (e.g. add links to it in a "DirtyData" table) and be done for now.
2. Have a DataCleanup worker (e.g. a separate thread or similar) that constantly looks for data to be cleaned up.
3. Before editing data for Step #2, check that the data is not dirty.
Depending on your logic, step 3 might result in a user error (e.g. the user would need to repeat Step #2). If the DataCleanup worker has enough resources (i.e. it processes the DirtyData table almost instantaneously), that should happen only on very rare occasions. If that is not OK, you could opt for checking for dirty data on each fetch, but that could be expensive.
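A minimal SQL sketch of what the marking could look like (table and column names are hypothetical):
CREATE TABLE DirtyData (
    EntityId  INT       NOT NULL PRIMARY KEY,
    MarkedAt  DATETIME  NOT NULL DEFAULT GETDATE()
);
-- Step #1 marks the affected Step #2 data as dirty:
INSERT INTO DirtyData (EntityId) VALUES (@EntityId);
-- The DataCleanup worker repeatedly picks up entries, fixes the related
-- Step #2 objects, and then removes the marker:
SELECT TOP (100) EntityId FROM DirtyData ORDER BY MarkedAt;
DELETE FROM DirtyData WHERE EntityId = @FixedEntityId;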
It sounds like you're familiar with the HTTP spec regarding GET requests, but for future readers:
Why shouldn't a GET request change data on the server?
Why is using a HTTP GET to update state on the server in a RESTful call incorrect?
For the other bullet under 2, we probably don't need a specification to agree that persisting valid data is preferable to persisting invalid data.
So what can we do for the bullets under 1 to avoid complex branching logic in a particular step and also circular dependencies? My suggestion is an event-driven design. When step #2 changes it should fire a change event. In this scenario, step #2 has no knowledge of the concrete listener(s) who may receive its events, so it remains decoupled from any complex handling logic.
There's probably no way to guarantee you don't forget anything in the future; but if every step in the workflow is defined as a listener, it forces you to consider change events to some extent every time you implement a new step.
One side note on granularity: if a step has many changes, it can batch up its events rather than firing each one individually. You can adjust the batch size for efficiency.
In summary, I would strongly consider the Observer design pattern.

Keeping multi-user state across DB sessions

The situation
Suppose we have a web application connected to a (Postgre)SQL database whose task can be summarized as:
A SELECT operation to visualize the data.
An UPDATE operation that stores modifications based on the visualized data.
Simple, but... the data involved isn't user specific, so it might potentially be changed by other users during the process. The editing task may take a long time (perhaps more than an hour), meaning that the probability of such collisions isn't low: it makes sense to implement a robust solution to the problem.
The approach
The idea would be that, once the user tries to submit the changes (i.e. firing the UPDATE operation), a number of database checks will be triggered to ensure that the involved data didn't change in the meantime.
Assuming we have timestamped every change on the data, it would be as easy as keeping the access time when the data was SELECTed and ensuring that no new changes were done after that time on the involved data.
The problem
We could easily just keep that access time in the frontend application while the user performs the editing, and later provide it as an argument to the trigger function when performing the UPDATE, but that's not desirable for security reasons. The database should store the user's access time.
An intuitive solution could be a TEMPORARY TABLE associated with the database session. But, again, the user might take a long time doing the task, so capturing a connection from the pool and keeping it idle for such a long time doesn't seem like a good option either. The SELECT and the UPDATE operations will be performed under different sessions.
The question
Is there any paradigm or canonical way to address and solve this problem efficiently?
This problem is known as the "lost update" problem.
There are several solutions; which applies depends on whether a connection pool is used and on the transaction isolation level:
pessimistic locking with SELECT ... FOR UPDATE, if no connection pool is used (the same connection must hold the lock for the whole edit);
optimistic locking with a timestamp column, if a connection pool is used.
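Rough PostgreSQL sketches of both approaches (table and column names are hypothetical):
-- Pessimistic locking: the row stays locked until the transaction ends,
-- so the same connection/transaction must be kept open for the whole edit.
BEGIN;
SELECT payload FROM items WHERE id = 42 FOR UPDATE;
UPDATE items SET payload = 'new value' WHERE id = 42;
COMMIT;

-- Optimistic locking: works across pooled connections. The UPDATE only
-- succeeds if the row still carries the timestamp read at SELECT time.
UPDATE items
SET payload = 'new value', updated_at = now()
WHERE id = 42
  AND updated_at = '2021-01-01 12:00:00';  -- the value read earlier
-- If zero rows were affected, another user changed the row: report a conflict.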

How to efficiently trigger system command with SQL query or table change?

I have a data conversion and caching service running as a self-hosted WCF service.
Currently it polls the database at short, constant intervals to update its data.
I think that's unnecessary. The data can change only if one of the tables is changed, and when that happens depends on the actions of the system's users.
There is no problem in setting a trigger on the specific tables; however, I would need an action outside SQL Server to update my cache. My WCF service could perform the update when it receives a specific URI via HTTP. So all I need is a command in a table trigger which would send such a request. Is that even possible?
I'm thinking about a hack I used back in the day with HTTP requests: I held the HTTP response at the server until a data packet arrived from somewhere else. There was no delay between polling requests, and I achieved fully asynchronous, "real-time" updates.
Maybe this approach can be applied to SQL? I'm thinking of a query which blocks until it receives a signal. Well, it would eventually time out, but it's good enough to try. Then, how do I signal and wait in SQL? By locking and unlocking a shared resource, like a cursor or a dummy table?
Any other options?
I need the cache update done at the lowest possible frequency (it's pretty expensive, so once per minute is great), but I need an immediate update when the data is changed.
To answer your question, have you looked at xp_cmdshell?
https://msdn.microsoft.com/en-us/library/ms175046.aspx
However, the security/performance implications of such a decision could be non-trivial depending on your use case.
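As a hedged sketch only (xp_cmdshell is disabled by default and has to be enabled via sp_configure; the table name and the cache-refresh URI are hypothetical), a trigger could shell out to notify the service:
CREATE TRIGGER trg_SourceTable_NotifyCache ON dbo.SourceTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Call an external command-line HTTP client to hit the WCF refresh URI.
    EXEC master..xp_cmdshell 'curl -s http://localhost:8080/cache/refresh', NO_OUTPUT;
END
Keep in mind this runs synchronously inside the triggering transaction, so a slow HTTP call delays the original DML, which is part of the performance concern mentioned above.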

How to use database triggers in a real world project?

I've learned a lot about triggers and active databases in the last weeks, but I have some questions about real-world examples for these.
At work we use the Entity Framework with ASP.NET and an MSSQL Server. We just use the auto-generated constraints and no triggers.
When I heard about triggers I asked myself the following questions:
Which tasks can be performed by triggers?
e.g.: Generation of reporting data: currently the data for the reports is created in VB, but I think a trigger could handle this as well. The creation in VB takes a lot of time and the user should not need to wait for it, because it's not necessary for his work.
Is this an example of a perfect task for a trigger?
How do OR mappers handle trigger-manipulated data?
e.g.: Does the OR mapper recognize when a trigger has manipulated data? The Entity Framework seems to cache a lot of data, so I'm not sure whether it reads the updated data if a trigger manipulates it after the insert/update/delete from the framework is processed.
How much constraint handling should be within the database?
e.g.: Sometimes constraints in the database seem much easier and faster than in the layer above (vb.net, ...), but how do you throw exceptions to the upper layer so that they can be handled by the OR mapper?
Is there a good solution for handling SQL exceptions (from triggers) in any OR mapper?
Thanks in advance
When you hear about a new tool or feature, it doesn't mean you have to use it everywhere. You should think about the design of your application.
Triggers are used a lot when the logic lives in the database, but if you build an ORM layer on top of your database, you want the logic in the business layer, using your ORM. That doesn't mean you should never use triggers; it means you should use them with an ORM in the same way as stored procedures or database functions: only when it makes sense or when it improves performance. If you push a lot of logic into the database, you can throw away the ORM and perhaps your whole business layer and use a two-layered architecture where the UI talks directly to a database that does everything you need; such an architecture is considered "old".
When using an ORM, a trigger can be helpful for some DB-generated data like audit columns or custom sequences of primary key values.
Current ORMs mostly don't like triggers: they can only react to changes to the currently processed record, so, for example, if you save an Order record and your update trigger modifies all ordered items, there is no automatic way to let the ORM know about that; you must reload the data manually. In EF, all data modified or generated in the database must be marked with StoreGeneratedPattern.Identity or StoreGeneratedPattern.Computed; EF fully follows the pattern where logic is either in the database or in the application. Once you define that a value is assigned in the database, you cannot change it in the application (it will not be persisted).
Your application logic should be responsible for data validation and should call persistence only if validation passes. You should avoid unnecessary transactions and round trips to the database when you know upfront that the transaction will fail.
I use triggers for two main purposes: auditing and updating modification/insertion times. When auditing, the triggers push data to related audit tables. This doesn't affect the ORM in any way as those tables are not typically mapped in the main data context (there's a separate auditing data context used when needed to look at audit data).
When recording/modifying insert/modification times, I typically mark those properties in the model as [DatabaseGenerated(DatabaseGeneratedOption.Computed)]. This prevents any values set in the data layer from being persisted back to the DB and allows the trigger to enforce setting the DateTime fields properly.
It's not a hard and fast rule that I manage auditing and these dates in this way. Sometimes I need more auditing information than is available in the database itself and handle auditing in the data layer instead. Sometimes I want to force the application to update dates/times (since they may need to be the same over several rows/tables updated at the same time). In those cases I might make the field nullable, but [Required] in the model to force a date/time to be set before the model can be persisted.
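A minimal sketch of such a modification-time trigger (the dbo.Orders table and its columns are hypothetical; ModifiedAt would be the property mapped as Computed in the model):
CREATE TRIGGER trg_Orders_SetModifiedAt ON dbo.Orders
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Stamp the rows touched by the triggering statement.
    UPDATE o
    SET ModifiedAt = GETUTCDATE()
    FROM dbo.Orders AS o
    JOIN inserted AS i ON i.OrderId = o.OrderId;
END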
The old Infomodeler/Visiomodeler ORM (not what you think - it was Object Role Modeling) provided an alternative when generating the physical model. It would provide all the referential integrity with triggers. For two reasons:
Some dbmses (notably Sybase/SQL Server) didn't have declarative RI yet, and
It could provide much more finely grained integrity - e.g. "no more than two children" or "sons or daughters but not both" or "mandatory son or daughter but not both".
So the trigger logic related to the model in the same way that any RI constraint does. In SQL Server it handled violations with RAISERROR.
A conceptual issue with triggers is that they are essentially context-free: they always fire regardless of context (at least not without great pain), so you might be better off keeping their logic with the rest of the context-specific logic. Global domain constraints are therefore the only place I find them useful, which I guess is another general way of identifying "referential integrity".
Triggers are used to maintain the integrity and consistency of data (by using constraints), to help the database designer ensure certain actions are completed, and to create database change logs.
For example, given numeric input, if you want the value to be constrained to, say, less than 100, you could write a trigger that fires for every row on update or insert and raises an application error if the value of that column does not meet that constraint.
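In SQL Server (where triggers are per statement rather than per row), a rough sketch of such a check might look like this (table and column names are hypothetical):
CREATE TRIGGER trg_Payments_CheckAmount ON dbo.Payments
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (SELECT 1 FROM inserted WHERE Amount >= 100)
    BEGIN
        RAISERROR('Amount must be less than 100.', 16, 1);
        ROLLBACK TRANSACTION;   -- undo the offending statement
    END
END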
Suppose you want to log historical changes to a table. You could create a trigger that fires AFTER each INSERT, UPDATE, and DELETE and also inserts the data into a logging table. If you need to execute custom logic, then triggers may appeal to you.
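A minimal sketch of such a change-log trigger (the Customers and Customers_Log tables are hypothetical):
CREATE TRIGGER trg_Customers_Log ON dbo.Customers
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Log the new values (covers inserts and updates)...
    INSERT INTO dbo.Customers_Log (CustomerId, Name, ChangedAt, Operation)
    SELECT CustomerId, Name, GETUTCDATE(), 'new values' FROM inserted;
    -- ...and the old values (covers updates and deletes).
    INSERT INTO dbo.Customers_Log (CustomerId, Name, ChangedAt, Operation)
    SELECT CustomerId, Name, GETUTCDATE(), 'old values' FROM deleted;
END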