Undo in a transactional database

I do not know how to implement the undo feature of user-friendly interfaces on top of a transactional database.
On one hand, it is advisable for the user to have multilevel (effectively infinite) undo, as stated here in the answer. Patterns that may help with this problem are Memento and Command.
However, with a complex database involving triggers, ever-growing sequence numbers, and non-invertible procedures, it is hard to imagine how an undo action could work at points other than transaction boundaries.
In other words, undoing back to the point of the last commit is simply a rollback, but how is it possible to go back to other moments?
UPDATE (based on the answers so far): I do not necessarily need undo to work once the modification has been committed; I would focus on a running application with an open transaction. Whenever the user clicks Save, it means a commit, but before saving - during the same transaction - undo should work. I know that using a database as the persistence layer is just an implementation detail and the user should not have to bother with it. But if we accept that "The idea of undo in a database and in a GUI are fundamentally different things" and we do not combine undo with a database at all, then infinite undo is just a buzzword.
I know that "rollback is ... not a user-undo".
So how can a client-level undo be implemented, given the "cascading effects as the result of any change" inside the same transaction?

The idea of undo in a database and in a GUI are fundamentally different things: the GUI is going to be a single-user application with low levels of interaction with other components, while a database is a multi-user application where changes can have cascading effects as the result of any change.
The thing to do is to let the user try to apply the previous state as a new transaction, which may or may not succeed; or alternatively just don't offer undo after a commit (similar to having no undo after a save, which is how many applications behave).

Some (all?) DBMSs support savepoints, which allow partial rollbacks:
savepoint s1;
insert into mytable (id) values (1);
savepoint s2;
insert into mytable (id) values (2);
savepoint s3;
insert into mytable (id) values (3);
rollback to s2;
commit;
In the above example, only the first insert would remain; the other two would be undone.
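Mapped onto the question's scenario (undo within an open transaction, commit on Save), a minimal sketch could take a savepoint before every user action and keep the names on a client-side stack; the table, column and savepoint names below are hypothetical:
savepoint action_1;                     -- taken before the first user action
update orders set quantity = 5 where id = 42;
savepoint action_2;                     -- taken before the second user action
delete from order_lines where order_id = 42 and line_no = 3;
rollback to action_2;                   -- Undo: reverts the delete only
rollback to action_1;                   -- Undo again: reverts the update as well
commit;                                 -- Save: commits whatever was not undone
Triggers that fired inside the undone span are rolled back along with it, but side effects outside the transaction (consumed sequence values, sent mails) are not.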
I don't think it is practical in general to attempt undo after commit, for the reasons you gave and probably others. If it is essential in some scenario then you will have to build a lot of code to do it, and take into account the effects of triggers etc.
I don't see any problem with ever-increasing sequences, though: a rollback simply leaves a gap in the sequence, which is harmless.

We developed such a possibility in our database by keeping track of all transactions applied to the data (not really all, just the ones that are less than 3 months old). The basic idea was to be able to see who did what and when. Each database record, uniquely identified by its GUID, can then be considered as the result of one INSERT, multiple UPDATE statements, and finally one DELETE statement. As we keep track of all these SQL statements, and as INSERTs are full INSERTs (the values of all fields are recorded in the INSERT statement), it is then possible to:
Know who modified which field and when (Paul inserted a new line in the proforma invoice, Bill renegotiated the unit price of the item, Pat modified the final ordered quantity, etc.)
'undo' all previous transactions with the following rules:
an 'INSERT' undo is a 'DELETE' based on the unique identifier
an 'UPDATE' undo re-executes the previous 'UPDATE' (or the original 'INSERT'), restoring the prior field values
a 'DELETE' undo is equivalent to replaying the original INSERT followed by all subsequent UPDATEs
As we do not keep track of transactions older than 3 months, undo is not always available.
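As a rough illustration of the kind of tracking table this implies (all names here are hypothetical, not our actual schema):
create table change_log (
    log_id      bigint primary key,     -- monotonically increasing entry id
    record_guid char(36)    not null,   -- identifies the record across its lifetime
    changed_at  timestamp   not null,
    changed_by  varchar(64) not null,   -- who did it
    action      char(1)     not null,   -- 'I', 'U' or 'D'
    sql_text    text        not null    -- the full statement, with all field values
);
-- Undo = read the entries for a record_guid in reverse order and apply the
-- inverse rule for each (DELETE for an INSERT, previous values for an
-- UPDATE, replayed INSERT plus UPDATEs for a DELETE).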
Access to this functionality is strictly limited to database managers, as other users are not allowed to make any data update outside of the business rules (for example: what would be the meaning of an 'undo' on a Purchase Order line once the Purchase Order has been agreed by the supplier?). To tell you the truth, we use this option very rarely (a few times a year, perhaps).

It's nearly the same as William's post (which I actually voted up), but I'll try to point out in a little more detail why it is necessary to implement a user undo (vs. using a database rollback).
It would be helpful to know more about your application, but I think for a user(-friendly) Undo/Redo the database is not the appropriate layer to implement the feature.
The user wants to undo the actions he did, regardless of whether these lead to no/one/more database transactions
The user wants to undo the actions HE did (not anyone else's)
The database, from my point of view, is an implementation detail, a tool you use as a programmer for storing data. The rollback is a kind of undo that helps YOU in doing so; it's not a user-undo. Using the rollback would mean involving the user in things he doesn't want to know about and does not understand (and doesn't have to), which is never a good idea.
As William posted, you need an implementation inside the client, or on the server side as part of a session, which tracks the steps of what you define as a user transaction and is able to undo them. If database transactions were committed during those user transactions, you need other db transactions to revoke them (if possible). Make sure to give useful feedback if the undo is not possible, which again means explaining it business-wise, not database-wise.

To maintain arbitrary rollback-to-previous semantics you will need to implement logical deletion in your database. This works as follows:
Each record has a 'Deleted' flag, a version number and/or a 'Current Indicator' flag, depending on how clever you need your reconstruction to be. In addition, you need a per-entity key shared across all versions of that entity so you know which records actually refer to which specific entity. If you need to know the period a version was applicable for, you can also have 'From' and 'To' columns.
When you delete a record, you mark it as 'Deleted'. When you change it, you create a new row and update the old row to reflect its obsolescence. With the version number, you can just find the previous number to roll back to.
If you need referential integrity (which you probably do, even if you think you don't) and can cope with the extra I/O you should also have a parent table with the key as a placeholder for the record across all versions.
On Oracle, clustered tables are useful for this; both the parent and version tables can be co-located, minimising the overhead for the I/O. On SQL Server, a covering index (possibly clustered on the entity key) including the key will reduce the extra I/O.
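A minimal sketch of such a versioned table, using hypothetical entity and column names:
create table customer_version (
    customer_key int          not null,  -- per-entity key, stable across versions
    version_no   int          not null,  -- increments with every change
    is_current   char(1)      not null,  -- 'Y' on the live row only
    is_deleted   char(1)      not null,  -- 'Y' once logically deleted
    valid_from   timestamp    not null,
    valid_to     timestamp    null,      -- null while the version is current
    name         varchar(100) not null,  -- ...plus the actual entity columns
    primary key (customer_key, version_no)
);
-- A change inserts version_no + 1 as the new current row and closes the old
-- one (is_current = 'N', valid_to = now); rolling back re-promotes the
-- previous version the same way.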

Related

Create trigger upon each table creation in SQL Server 2008 R2

I need to create an Audit table that is going to track the actions (insert, update, delete) on my tables in the database and add a new row with the date, row id, table name and a few more details, so I will know what action happened and when.
So basically, from my understanding, I need a trigger for each table that is going to track insert/update/delete, and a trigger on the database that is going to track new table creation.
My main problem is understanding how to connect these things, so that when a new table is created, a trigger will be created for that table which tracks the actions and adds new rows to the Audit table as needed.
Is it possible to make a DDL trigger for CREATE_TABLE and, inside of it, another trigger for insert/update/delete?
What you're hoping for is not possible. And I'd strongly advise that you'd be better off thinking about what you really want to achieve at a business level with auditing. It will yield a much simpler and more practical solution.
First up
...trigger on the database which is going to track new table creation.
I cannot stress enough how terrible this idea is. Who exactly has such unfettered access to your database that they can create tables without going through code review and QA? Which should, of course, be on the gated pathway towards production. Once you realise that schema changes should not happen ad hoc, it's patently obvious that you don't need triggers (which are by their very nature reactive) to do something because the schema changed.
Even if you could write such triggers: it's at a meta-programming level that simply isn't worth the effort of trying to foresee all possible permutations.
Better options include:
Requirements assessment and acceptance: This is new information in the system. What are the audit requirements?
Design review: New table; does it need auditing?
Test design: How do we test the audit requirements?
Code Review: You've added a new table. Does it need auditing?
Not to mention features provided by tools such as:
Source Control.
Db deployment utilities (whether home-grown or third party).
Part two
... a trigger will be created for that table which is going to track the actions and add new rows for the Audit table as needed.
I've already pointed out why doing the above automatically is a terrible idea. Now I'm going a step further, to point out that doing the above at all is also a bad idea.
It's a popular approach, and I'm sure to get some flak from people who've nicely compartmentalised their particular flavour of it, swearing blind how much time it "saves" them. (There may even be claims of it being a "business requirement", which I can assure you is more likely a misstated version of the real requirement.)
There are fundamental problems with this approach:
It's reactive instead of proactive. So it usually lacks context.
You'll struggle to audit attempted changes that get rolled back. (Which can be a nightmare for debugging and usually violates real business audit requirements.)
Interpreting the audit data will be a nightmare because it's just raw data: the information is lost in the detail.
As columns are added/renamed/deleted your audit data loses cohesion. (This is usually the least of problems though.)
These extra tables that always get updated as part of other updates can wreak havoc on performance.
Usually this style of auditing involves: every time a column is added to the "base" table, it's also added to the "audit" table. (This ultimately makes the "audit" table very much like a poorly architected persistent transaction log.)
Most people following this approach overlook the significance of NULLable columns in the "base" tables.
I can tell you from first-hand experience, interpreting such audit trails in any but the simplest of cases is not easy. The amount of time wasted is ridiculous: investigating issues, training others to be able to interpret them correctly, writing utilities to try to make working with these audit trails less painful, painstakingly documenting findings (because the information is not immediately apparent in the raw data).
If you have any sense of self-preservation you'll heed my advice.
Make it great
(Sorry, couldn't resist.)
A better approach is to proactively plan for what needs auditing. Push for specific business requirements. Note that different cases may need different auditing techniques:
If user performs action X, record A details about the action for legal traceability.
If user attempts to do Y but is prevented by system rules, record B details to track rule system integrity.
If user fails to log in, record C details for security purposes.
If system is upgraded, record D details for troubleshooting.
If certain system events occur, record E details ...
The important thing is that once you know the real business requirements, you won't be saying: "Uh, let's just track everything. It might be useful." Instead you'll:
Be able to produce a cleaner, more appropriate, and reliable design for each distinct kind of auditing.
Be able to test that it behaves as required!
Be able to use the audit data more easily whenever it's needed.

Firebird lock table / lock record

Suppose you have one table for a Desktop application and several users.
When a user opens a record, I want to lock that record. I have tried the "WITH LOCK" statement. It works fine.
But when a second user wants to update the same record, I want to show a message: "Sorry, you cannot work on this order because it is locked. Somebody else has opened this record before you." Firebird waits for the first user to commit/rollback. I don't want to wait; I want to show an error message. Is there a simple way to ask Firebird for a record's lock status?
Is there a way to lock a full table? Or to use a semaphore/mutex (like GET_LOCK in MySQL)?
I have tried RESERVING on the SET TRANSACTION statement, but it does not work.
My wish is to display a message to the user, not to wait.
Thanks
If you don't want to wait, then configure your transaction to use NO WAIT, or a wait timeout. However, controlling business rules like this through database transactions is not advisable, as it requires long-running transactions, which inhibit garbage collection, lengthen the chain of interesting transactions, and increase the chance of update conflicts.
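For reference, a minimal sketch of those options in Firebird's SET TRANSACTION syntax (the timeout value is just an example):
set transaction no wait;                -- a lock conflict fails immediately
-- or:
set transaction wait lock timeout 5;    -- give up after 5 seconds
-- Either way the conflict surfaces as an error, which the application can
-- translate into your "Sorry, this record is locked" message.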
I'd advise using different options, like:
First to update wins
Change detection (e.g. by a timestamp or record version counter which is also used as a condition in the update statement), allowing the user to overwrite or abandon his update (or maybe merge); see the sketch after this list
Explicit reservation by updating the record (setting the username) in a separate transaction. This might require cleanup or the ability for a user to break the reservation (e.g. if someone has had it open for too long).
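A minimal sketch of the change-detection option, assuming a hypothetical rec_version column:
update orders
   set status = 'APPROVED',
       rec_version = rec_version + 1
 where id = :id
   and rec_version = :version_read_when_opened;
-- If no row was updated, somebody else changed (or reserved) the record in
-- the meantime; tell the user and offer reload / merge / abandon.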
Note that Firebird uses multi-version concurrency control (MVCC), so explicit locking is not really natural for it. See also this answer to Locking tables firebird, delphi.
Locking tables using RESERVING should be possible, but I have never used it, so I am not entirely sure how to use it, although you probably also need to specify FOR PROTECTED READ (see the InterBase 6.0 Embedded SQL Guide, pages 70-71).

Are single statement UPDATES atomic, regardless of the isolation level? (SQL Server 2005)

In an app, Users and Cases have a many-to-many relationship. Users pull their list of Cases often, and Users can update a single Case at a time (a 1-10 second operation, requiring more than one UPDATE). Under READCOMMITTED, any in-use Case would block all associated Users from pulling their list of Cases. Also, the most recent data is a hotspot for both reads and writes to the Cases table.
I think I want to employ dirty reads to keep the experience snappy. READPAST on Cases won't work for this purpose. NOLOCK will work, but I'd like to be able to show which records are dirty when they are listed.
I don't know of any native way to show which records are dirty, so I'm thinking that for each update or insert to Cases, an INUSE flag will be set. This flag must be cleared by the end of the updating transaction such that under READCOMMITTED, this flag will never appear to be set. Note that this is NOT to replace concurrency management, only to show which records are potentially dirty to the User.
My question is whether this is reliable: if we UPDATE two or more fields (INUSE plus the other fields) in a single statement, is it possible that a concurrent NOLOCK query would read some of the new values but not others? If so, is it possible to guarantee that INUSE is set first?
And if I'm thinking about this all wrong, please enlighten me. My ideal situation would be to be able, in a manageable way, to show the values as they were PRIOR to any related transaction, so the data is immediately available and always consistent (but partially out-dated). But I don't think this is available - especially in the more complex actual database.
Thanks!
Restating the problem just to be sure: User A on connection A updates two columns (col1, col2) in MyTable. While this is going on, user B on connection B issues a dirty read, selecting data from that row. You are wondering if user B could get, say, the updated value in col1 AND the old/not updated value in col2. Correct?
I have to say: no way could this happen. As I understand it, updates are indeed atomic transactions, and if you're writing data to the page (in memory), then the entire row update would have to finish on that set of bytes before anything else (another thread) could get access to them.
But I don't know for sure, and I can't imagine how to set up a test to confirm or deny this. The only answer I'd rely on would have to come from someone who actually had a hand in writing the code, or perhaps a Microsoft technician who has similar access. If you don't get any good answers here, posting the question on the appropriate MSDN forum (link) might get a good answer.
Have you considered using the SNAPSHOT isolation level? When used for a query, it requires no locks whatsoever, and it gives precisely the semantics that you're asking for:
show the values as they were PRIOR to any related transaction so the data is immediately available and always consistent (but partially out-dated)
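A minimal sketch of how that could look on SQL Server 2005 (database, table and column names are hypothetical):
alter database MyApp set allow_snapshot_isolation on;   -- one-time setup

set transaction isolation level snapshot;
begin transaction;
select CaseId, Col1, Col2
from Cases
where UserId = @UserId;   -- reads the last committed version; takes no shared locks
commit;
Readers see the row versions that were committed when their transaction started, so in-flight updates never block or taint the list queries.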

How do I securely delete a row from a database?

For compliance reasons, when I delete a user's personal information from the database in my current project, the relevant rows need to be really, irrecoverably deleted.
The database we are using is PostgreSQL 8.x.
Is there anything I can do beyond running COMPACT/VACUUM regularly?
Thankfully, our backups will be held by others, and they are allowed to keep the deleted information.
"Irrecoverable deletion" is harder than it sounds, and extends beyond your database. For example, are you planning on going back to all previous instances of your database on tape/backup where this row also exists, and deleting it there too?
Consider a regular deletion and the periodic VACUUMing that you mentioned before.
To accomplish the "D" in ACID, relational databases use a transaction-log-style system for changes to the database. When a delete is made, that delete is applied to an in-memory copy of the data (the buffer cache) and then written to a transaction log file synchronously. If the database were to crash, the transaction log would be replayed to bring the system back to the correct state. So a delete exists in multiple locations from which it would have to be removed. Only at some later time is the record actually deleted from the data file on disk (and any indexes). How long that takes varies by database.
Do you back up your database? If yes, make sure you delete it from the backups too.
Is that because of a security risk? In that case, I'd change the data in the row and then delete the row.
Perhaps I'm off on a tangent, but do you really want to delete users like that? Most identity & access management approaches recommend keeping users around but in a flagged-as-deleted state, so as not to lose auditing ability (what has this user been up to in the previous five years?).
Deleting user information might be needed for integrity compliance reasons, or for nefarious black-hat purposes. In neither case is there a deletion method which guarantees that no traces could be left of the user's existence, as has been noted in other posts.
Perhaps you should elaborate as to why such an irrevocable delete is desirable...?
This is not something that you can do on the software side. It's a hardware issue: to really delete it, you need to physically destroy the drive.
How about overwriting the record with random characters/dates/numbers etc. before deleting it?
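To make that last suggestion concrete, a rough sketch for PostgreSQL (table and column names are hypothetical); note that the overwrite itself creates a new row version, so the original bytes still linger in dead tuples and in the WAL until VACUUM and log recycling catch up:
update users
   set full_name = md5(random()::text),   -- replace with meaningless values
       email     = md5(random()::text)
 where user_id = 42;
delete from users where user_id = 42;
vacuum users;   -- reclaim dead row versions (must run outside a transaction)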

Audit Logging Strategies

I am trying to decide on the best method for audit logging within my application. The main reason for the log is reporting the sequence of events (changes).
I have a hierarchy of objects, and I need to create reports when something changes on any part of that hierarchy, at a later date.
I think that I have three options:
Have a log for each table, matching the hierarchy of objects, then create a view for the report.
Flatten the hierarchy and de-normalise the table, making reporting easier - a simple select statement.
Have one log table with a record for each change, making reporting harder but more flexible to changes.
I am currently leaning towards option 1.
I have to weigh in on this subject even though it's old.
It is usually a poor idea to have only one audit table, as you will create locking problems in the database as everything hits that table. Use separate audit tables for each table.
It is also a poor idea to have the application do the auditing. Auditing must be done at the database level or you risk losing some of the information. Data does not change only from applications in most databases; no one is going to change the prices of all their products one at a time from the user interface when you need a 10% increase on all 10,000,000 of them. Auditing should capture all changes, not just some of them. This should be done in a trigger in most databases (SQL Server 2008 has a built-in auditing function). Some of the worst possible changes (employees committing fraud or wanting to maliciously destroy data) also frequently come from places other than the application, especially if you allow table-level access to users (which you should not do in any financial database or one that contains personal information). Auditing from the application won't catch this. Developers often forget that, in protecting their data, outside sources are not the only threat.
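As an illustration of the trigger approach, a minimal per-table sketch in SQL Server syntax (table and column names are hypothetical):
create trigger trg_Products_Audit
on Products
after update
as
begin
    insert into Products_Audit
        (ProductId, OldPrice, NewPrice, ChangedAt, ChangedBy)
    select d.ProductId, d.Price, i.Price, getdate(), suser_sname()
    from deleted d
    join inserted i on i.ProductId = d.ProductId;
end;
Because the trigger runs in the database, it catches bulk updates and ad-hoc changes that never pass through the application.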
An audit log is basically a chronological list of events that occurred, who performed these events, and what the events were.
I think a flat view would be better as it can be easily ordered and queried. So I'm leaning more towards your options #2/#3.
Include things like the transaction type, the time, the user id, a description of what's changed, and other pertinent information related to your product.
You can also add things to your product over time and you won't need to continually modify your audit log module.
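A minimal sketch of such a flat audit table (SQL Server syntax; all names are hypothetical):
create table audit_log (
    audit_id    bigint identity primary key,
    occurred_at datetime     not null,
    user_id     int          not null,
    txn_type    varchar(20)  not null,  -- 'INSERT', 'UPDATE', 'DELETE', ...
    entity_name varchar(64)  not null,  -- which object in the hierarchy changed
    entity_id   varchar(64)  not null,
    description varchar(max) not null   -- human-readable summary of the change
);
Because new entity types only add rows (not columns), the audit module does not have to change as the product grows.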
If it's for auditing purposes I'd use a true append-only medium rather than a table/tables in the same db.
You suggest it's for change history purposes - in which case I would restructure your application/db to record the actual events in the first place rather than just the current state.
I would go with (2) and (3): create a single table for all Audit entries.
A flat view is good, provided the extra work flattening does not impact performance.
You could look into an AOP framework to help with this. It would allow you to inject logging functionality at the beginning or end of any/all methods. If you go down this road, it might help define what would make sense for storing the log data.