Mark "deleted" instead of physical deletion with Castle ActiveRecord - nhibernate

In my current project, we got a rather unusual request (unusual to me, at least). The client wants every deletion procedure to mark a flag instead of physically deleting the record from the database table. It looked pretty easy at first glance; I just have to change
public void DeleteRecord(Record record)
{
    record.DeleteAndFlush();
}
public IList GetAllRecords()
{
    return Record.FindAll().ToList();
}
To
public void DeleteRecord(Record record)
{
    record.Deleted = true;
    record.UpdateAndFlush();
}
public IList GetAllRecords()
{
    return Record.FindAll().Where(x => x.Deleted == false).ToList();
}
But after I got a bit of time to think it through again, I found that this little change would cause a huge problem with my cascade settings. As I am pretty new to the ActiveRecord business, I wouldn't trust myself to simply change all the CascadeEnum.Delete settings to CascadeEnum.SaveUpdate. So I am looking for some input here.
1) Is the "mark a flag instead of physically deleting" requirement a common one?
2) If the answer to question 1 is yes, then I believe there is something built into NHibernate to handle this. Can someone tell me the right approach for this kind of problem?
Thanks for your input.

This is known as Soft Deletes and it is very common. There is some debate about the best practice - check out this recent blog post: http://ayende.com/Blog/archive/2009/08/30/avoid-soft-deletes.aspx

This is quite common and called "soft delete". Here's an implementation of this for NHibernate.
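For illustration, here is a minimal sketch of what such an implementation can look like with Castle ActiveRecord, assuming a Record class like the one in the question; the table and column names are assumptions, and the class-level Where filter is used so that flagged rows drop out of normal queries:

using Castle.ActiveRecord;

// Sketch only: the Where filter keeps soft-deleted rows out of FindAll()
// and other normal queries, so callers don't need to repeat the condition.
[ActiveRecord("Records", Where = "Deleted = 0")]
public class Record : ActiveRecordBase<Record>
{
    [PrimaryKey]
    public int Id { get; set; }

    [Property]
    public bool Deleted { get; set; }

    // Soft delete: mark the row and update instead of issuing a SQL DELETE.
    public void SoftDelete()
    {
        Deleted = true;
        UpdateAndFlush();
    }
}

Note that a class-level filter like this only covers the class it is declared on; child collections need the same treatment, which is exactly why the cascade settings in the question become tricky.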

This is a relatively common request and it's sometimes implemented for (at least) two reasons:
Auditing and history - Marking a row as deleted rather than physically deleting it means the information is still available if needed (including recovery of information if, for example, you accidentally delete the wrong customer).
Performance - I've seen systems that batch up deletes with this method so they can be performed physically at a quiet time. I'm doubtful this is needed with modern DBMSs, but I can see how it might have been useful in the past if you wanted to avoid cascaded deletes on severely overloaded systems (of course, you shouldn't be running on such a system in the first place). Oracle 8 introduced a feature like this where you could drop columns in this manner and it would only physically remove them when you asked - you couldn't use the column at all even though the information had not yet been fully removed. Granted, removal of a column is more intensive than removal of a row, but it may still help.

Related

Should I ignore _NSCoreDataConstraintViolationException?

For some reason I only recently found out about unique constraints for Core Data. It looks way cleaner than the alternative (doing a fetch first, then inserting the missing entities in the designated context) so I decided to refactor all my existing persistence code.
If I got it right, the gist of it is to always insert a new entity and, as long as I have a proper merge policy, saving the context will take care of uniqueness in a more efficient way. The problem is that every time I save a context with the inserted entity I get an NSCoreDataConstraintViolationException (no error, though). When I do a fetch to make sure that
there is indeed only one instance with the unique field, and
other changes to this entity were applied,
everything seems to be okay, but I'm still concerned about this exception, since I save often and therefore get it quite frequently, a few times per second in some cases.
My project is in Objective-C, and I know exceptions are expensive there, so I'm having doubts about whether I'm missing something.
Here is a sample project with this issue (just a few lines of code, be sure to add an exception breakpoint)
NSMergeByPropertyObjectTrumpMergePolicy and constraints are not useful tools and should also never be used. The correct way to manage uniqueness is with a fetch before the insert as it appears you have already been doing.
Let's start with why the only correct merge policy is NSErrorMergePolicy. You should only be writing to Core Data in one synchronous way (performBackgroundTask is not enough; you also need an operation queue). If you have two performBackgroundTask blocks running at the same time and they contradict each other, you will lose data. A merge policy answers the question "Which data would you like to lose?"; the correct answer is "Don't lose my data!", which is NSErrorMergePolicy.
The same issue happens when you have a constraint. Let's say you have an entity with a unique constraint on the phone number, and you try to insert another entity with the same phone number. What would you like to happen? It depends on what exactly the data is. It might be two different people, and the phone number should be made different (perhaps one was lacking an area code), or it might be one person and the data should be merged. Or you might have a constraint on a uniqueID and the number should just be incremented. But at the database level it doesn't know; it always just does a merge, and it will silently lose data.
You can create a custom NSMergePolicy and inspect NSConstraintConflict to decide what to do. But in practice you'd have to think about every place you edit the database and what each change means, which can be very hard outside the context of writing a change to the database. In other words, the problem with constraints and merge policies is that they run at the wrong level of your application to deal with the problem effectively.
Using constraints with a merge policy of error is OK, as it is a way to find problems with your app (as long as you are monitoring crashes and fixing them). But you still need to do the fetch before the insert to make sure the error doesn't happen.
If you want to clean up code, then just have one place where you create your objects - something like objectWithId:createIfNeed:inContext: that does the fetch and create.

Model validation logic in the database via constraints. Good idea, bad idea, or not worth it?

It's always rubbed me the wrong way to write code in my model's clean method to validate various constraints on the data when these same constraints aren't also present in the database.
After all, the database already has constraints for some of my data, like NOT NULL.
So, in my most recent project I've been writing RawSQL migrations that ADD CONSTRAINT some_logic matching whatever logic I have in my clean() method.
It works OK, but it isn't an insignificant task to remember to add these constraints, add tests for these migrations, and update them when my model changes. Also, of course, I'm violating DRY by writing code in two places to do the same thing.
Should I give up this quixotic quest?
This is by no means a comprehensive answer, but at least I wanted to give my opinion.
There have been many frameworks that pushed the idea of removing constraints from the database in order to check them at the application level. The idea seemed nice to me at first (in the early 2000s), but after some years I came to the (very personal) conclusion that it is a bad idea.
To me it boils down to two things:
Data survives much longer than the applications. Whole systems go obsolete, but the data survives many more years. Sometimes the application is replaced, but the database is still the same one.
The application is not as reliable when it comes to validating data. I'm talking about programming defects here. One version of the app may work well, and then the next one has a bug. Or one developer leaves the company and the new replacement, who doesn't know the system as well, changes the app with disastrous consequences. All that time, a simple database constraint (which is usually very cheap to implement) could have enforced data quality.
Yep, I'm a fan of strict database constraints. Nevertheless, this doesn't mean I'm against application validations; those can show much nicer error messages.
If writing too much logic in clean() feels dirty, an in-between solution would be to use Django's built-in validators directly on your model fields.
The validation logic isn't saved in the database, but it is tracked in migrations. Like clean() logic, Validators require you to call Model.clean_fields(), but a ModelForm does this automatically.
You can also dig into django-db-constraints. The library might help do what you're looking to do, and the source code might help you roll a solution that fits your needs.

Is adding a bit mask to all tables in a database useful?

A colleague is adding a bit mask to all our database tables. In theory this is so we can track certain properties of each row across the entire system. For example...
Is the row shipped with the system or added by the client once they've started using the system
Has the row been deleted from the table (soft deletes)
Is the row a default value within a set of rows
Is this a good idea? Are there other uses where this approach would be beneficial?
My preference: these properties are obviously important, and having a dedicated column for each property is justified to make what is happening clearer to fellow developers.
Not really, no.
You can only store bits in it, and only so many. So it seems to me like it's asking for a lot of application-level headaches later on, keeping track of what each bit means, and potential abuse because "hey, they're everywhere". Is every bitmask on every table going to use the same definition for each bit? Will it be different on each table? What happens when you run out of bits - add another mask?
There are lots of potential things you could do with it, but it raises the question "why do it that way instead of identifying what we will use those bits for right now and just making them proper columns?" You don't really circumvent the possibility of schema changes this way anyway, so it seems like it's trying to solve a problem that you can't really "solve", and especially not with bitmasks.
Each of the things you mentioned can be (and should be) solved with real columns on the database, and those are far more self-documenting than "bit 5 of the BitMaskOptions field".
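To illustrate the readability point, here is a hypothetical C# sketch (the class, enum, and column names are invented for this example, not taken from the question) contrasting a bitmask field with dedicated columns:

using System;

// Bitmask approach: the meaning of each bit lives in documentation (or someone's head).
[Flags]
public enum RowOptions
{
    None              = 0,
    ShippedWithSystem = 1,  // bit 0
    Deleted           = 2,  // bit 1
    IsDefault         = 4   // bit 2
}

public class RowWithMask
{
    public int Id { get; set; }
    public int BitMaskOptions { get; set; }  // quick: what does bit 5 mean?

    public bool IsDeleted
    {
        get { return (BitMaskOptions & (int)RowOptions.Deleted) != 0; }
    }
}

// Dedicated-column approach: self-documenting and individually indexable.
public class Row
{
    public int Id { get; set; }
    public bool ShippedWithSystem { get; set; }
    public bool Deleted { get; set; }
    public bool IsDefault { get; set; }
}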
A dedicated column is better, because it's undoubtedly more obvious and less error-prone. SQL Server already stores BIT columns efficiently, so performance is not an issue.
The only argument I could see for a bitmask is not having to change the DB schema every time you add a new flag, but really, if you're adding new flags that often then something is not right.
No, it is not even remotely a good idea IMO. Each column should represent a single concept and value. Bit masks have all kinds of performance and maintenance problems. How do new developers understand what each of the bits means? How do you prevent someone from accidentally mixing up the order or meaning of the bits?
It would be better to have a many-to-many relationship or separate columns rather than a bit mask. You will be able to index on it, enable referential integrity (depending on approach), easily add new items and change the order of the results to fit different reports and so on.

ORM & Logical Delete

Do any of the available ORMs support using a bit field to represent row removal?
More information: I'm working in C#. I need to delete this way to support synchronization of remote database changes to a central database. I'm looking for a possible ORM, but am also interested in general approaches to the problem, so if anyone knows of any ORM in any language/environment that addresses this problem I would be interested in looking at it. Thanks for the questions; feel free to ask more if anything is unclear.
This may not apply if you're not using .NET, but the LightSpeed ORM has a built-in feature called "soft delete". Basically, when you have a DeletedOn field on your table, LightSpeed will insert the time it was deleted. It automatically handles this on normal selects (e.g. where DeletedOn == null) so that the deleted items are not seen again. You could then write a sync process that detects the deleted state by checking that field.
You can of course instruct the querying engine to include deleted results.
Mindscape LightSpeed ORM
I am making an assumption also that we're talking about the same thing here :-)
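A rough sketch of that sync idea in plain C# (this is not the LightSpeed API; the RemoteRow shape, the DeletedOn/ModifiedOn fields, and the delegate parameters are assumptions made for the example):

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a soft-deletable row, mirroring the DeletedOn convention.
public class RemoteRow
{
    public int Id { get; set; }
    public DateTime ModifiedOn { get; set; }
    public DateTime? DeletedOn { get; set; }
}

public static class SyncProcess
{
    // Push changes made since the last sync to the central database.
    // Because deleted rows still exist, their deletion can be propagated.
    public static void Sync(IEnumerable<RemoteRow> remoteRows, DateTime lastSync,
                            Action<RemoteRow> upsertCentral, Action<int> deleteCentral)
    {
        foreach (var row in remoteRows.Where(r => r.ModifiedOn > lastSync))
        {
            if (row.DeletedOn.HasValue)
                deleteCentral(row.Id);
            else
                upsertCentral(row);
        }
    }
}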
I recommend implementing logical delete externally in your application; it's not very complex, and it will be more flexible. See this article for details.
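As an illustration of what "externally in your application" can look like, here is a minimal sketch of a repository-style wrapper that sets the flag on delete and filters it out on reads; the Entity and SoftDeleteRepository names are invented for the example, and the in-memory list stands in for whatever ORM session you actually use:

using System.Collections.Generic;
using System.Linq;

// Hypothetical base class for anything that supports logical delete.
public abstract class Entity
{
    public int Id { get; set; }
    public bool Deleted { get; set; }
}

public class SoftDeleteRepository<T> where T : Entity
{
    private readonly List<T> _store = new List<T>();  // stand-in for the ORM session

    public void Add(T entity)
    {
        _store.Add(entity);
    }

    // Logical delete: mark the row instead of removing it.
    // With a real ORM this would be an update + flush, never a SQL DELETE.
    public void Delete(T entity)
    {
        entity.Deleted = true;
    }

    public IEnumerable<T> GetAll()
    {
        return _store.Where(e => !e.Deleted);
    }

    // A sync or reporting process can still see the deleted rows.
    public IEnumerable<T> GetDeleted()
    {
        return _store.Where(e => e.Deleted);
    }
}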

SQL Server DB - Deleting Records, or setting an IsDeleted flag? [duplicate]

Possible Duplicate:
What to do when I want to use database constraints but only mark as deleted instead of deleting?
Is it more appropriate to set some kind of "IsDeleted" flag in a heavily used database to simply mark records for deletion (and then delete them later), or should they be deleted directly?
I like the IsDeleted flag approach because it gives an easy option to restore data in case something went terribly wrong, and I could even provide some kind of "Undo" function to the user. The data I'm dealing with is fairly important.
I don't like IsDeleted because it really messes with data retrieval queries, as I'd have to filter on the IsDeleted flag in addition to the regular query conditions. Since a query typically uses no more than one index per table, I'd also assume this would slow things down tremendously unless I create composite indexes.
So, what is more appropriate? Is there a better "middle way" to get the benefits of both, and what are you using & why?
As a rule of thumb, I never delete any data. In the type of business I am in, there are always questions such as 'Of the customers that cancelled, how many of them had a widget of size 4?' If I had deleted the customer, how could I answer that? Or, more likely, if I had deleted a widget of size 4 from the widget table, it would cause a problem with referential integrity. An 'Active' bit flag seems to work for me, and with indexing there is no big performance hit.
I would be driven by business requirements. If the client expects you to restore deleted data instantly, and undeleting data is part of the business logic and/or use cases, then an IsDeleted flag makes sense.
Otherwise, by leaving deleted data in the database, you address problems that are more suitable to be addressed by database backups and maintenance procedures.
The mechanism for doing this has been discussed several times before.
Question 771197
Question 68323
My personal favourite, a deleted_at column, is documented in Question 771197.
The answer is: it depends on the scenario.
Does it require undo-delete?
What is the frequency of users doing that?
How many records will it result in over time?
If it is required, you can create tables with an identical structure, named with a _DELETED suffix, e.g. Customers__DELETED.
I think you should also consider what happens if there is a conflict when you undelete a record and some other user has since entered a record with similar content.
I have learnt that deleting data rarely makes sense, as there's always some reporting that needs it, or, more often, someone deletes it by mistake and needs it back. Personally, I move all "deleted" items to an archive version of the database. This is then backed up separately, and reports can use it. The main DB size is kept lean, and restoring the data is not too much of an issue.
But like others have said, it depends on your business requirements and the scale/size of the DB. An archived/deleted field may be enough.