SQL Schema design question - delete flags - sql

In our database schema, we like to use delete flags. When a record is deleted, we update that field rather than run a DELETE statement. The rest of our queries then check the delete flag when returning data.
Here is the problem:
The delete flag is a date, with a default value of NULL. This is convenient because when a record is deleted we can easily see the date that it was deleted on.
However, to enforce unique constraints properly, we need to include the delete flag in the unique constraint. The problem is that MS SQL behaves the way we want (for this design), but in PostgreSQL, if any field in a multi-column unique constraint is NULL, the constraint allows duplicate rows. This behavior fits the SQL standard, but it breaks our design.
The options we are considering are:
1. make the default value for the deleted field some hardcoded date
2. add a bit flag for deleted, so each table would have 2 delete-related fields - date_deleted and is_deleted (for example)
3. change date_deleted to is_deleted (bit field)
I suspect option 1 is a performance hit, since each query would have to check for the hardcoded date rather than just checking IS NULL. Plus, it feels wrong.
Option 2, also, feels wrong - 2 fields for "deleted" is non-dry.
Option 3, we lose the "date" information. There is a modified field, which would, in theory reflect the date deleted, but only assuming the last update to the row was the update to the delete bit.
So, any suggestions? What have you done in the past to deal with "delete flags"?
Update
Thanks to everyone for the super quick, and thoughtful responses.
We ended up going with a simple boolean field and a modified date field (with a trigger). I just noticed the partial index suggestion, and that looks like the perfect solution for this problem (but I haven't actually tried it).

If just retaining the deleted records is important to you, have you considered just moving them to a history table?
This could easily be achieved with a trigger.
Application logic doesn't need to account for this deleted flag.
Your tables would stay lean and mean when selecting from it.
It would solve your problem with unique indexes.

"Option 3, we lose the 'date' information. There is a modified field, which would, in theory reflect the date deleted, but only assuming the last update to the row was the update to the delete bit."
Is there a business reason that the record would be modified after it was deleted? If not, are you worrying about something that's not actually an issue? =)
In the system I currently work on we have the following "metadata" columns: _Deleted, _CreatedStamp, _UpdatedStamp, _UpdatedUserId, _CreatedUserId. That is quite a bit, but it's important for this system to carry that much data. I'd suggest having a Deleted flag that is separate from the Modified Date / Deleted Date. "Disk space is cheap", and having two fields to represent a deleted record isn't world-ending, if that's what you have to do for the RDBMS you're using.

What about triggers? When a record is deleted, a post-update trigger copies the row into an archive table that has the same structure plus additional columns for the date/time of the deletion and perhaps the user who deleted it.
That way your "live" table only has records that are actually live, so is better performance-wise, and your application doesn't have to worry about whether a record has been deleted or not.
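A minimal sketch of the archive-table idea, run through Python's built-in sqlite3 module purely so it is easy to try. It assumes the row is physically deleted and an AFTER DELETE trigger does the copying; the table, column, and trigger names (customer, customer_archive, customer_archive_on_delete) are made up for illustration, and trigger syntax will differ on other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical live table: only rows that are actually live stay here.
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Archive table: same columns plus the time of deletion.
    CREATE TABLE customer_archive (
        id         INTEGER,
        name       TEXT,
        deleted_at TEXT
    );
    -- Copy each deleted row into the archive.
    CREATE TRIGGER customer_archive_on_delete
    AFTER DELETE ON customer
    BEGIN
        INSERT INTO customer_archive (id, name, deleted_at)
        VALUES (OLD.id, OLD.name, datetime('now'));
    END;
""")

conn.execute("INSERT INTO customer (name) VALUES ('alice')")
conn.execute("DELETE FROM customer WHERE name = 'alice'")
# The plain UNIQUE constraint is not affected by archived rows.
conn.execute("INSERT INTO customer (name) VALUES ('alice')")

live = conn.execute("SELECT name FROM customer").fetchall()
archived = conn.execute(
    "SELECT id, name, deleted_at FROM customer_archive"
).fetchall()
```

Because deleted rows leave the live table, the unique constraint needs no delete flag at all in this variant.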

One of my favourite solutions is an is_deleted bit flag, and a last_modified date field.
The last_modified field is updated automatically every time the row is modified (using any technique supported by your DBMS.) If the is_deleted bit flag is TRUE, then the last_modified value implies the time when the row was deleted.
You will then be able to set the default value of last_modified to GETDATE(). No more NULL values, and this should work with your unique constraints.

Just create a conditional unique constraint:
CREATE UNIQUE INDEX i_bla ON yourtable (colname) WHERE date_deleted IS NULL;
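To see the effect of such a partial unique index, here is a small sketch using Python's built-in sqlite3 module. SQLite is used only because it is easy to run and also supports a WHERE clause on CREATE UNIQUE INDEX; the column types and sample values are assumptions, and the statement above is the PostgreSQL form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (
        id           INTEGER PRIMARY KEY,
        colname      TEXT NOT NULL,
        date_deleted TEXT            -- NULL means the row is live
    );
    -- Uniqueness is enforced only among rows that are not soft-deleted.
    CREATE UNIQUE INDEX i_bla ON yourtable (colname) WHERE date_deleted IS NULL;
""")

conn.execute("INSERT INTO yourtable (colname) VALUES ('alice')")

# A second live 'alice' violates the partial unique index.
try:
    conn.execute("INSERT INTO yourtable (colname) VALUES ('alice')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Soft-delete the first row; it is no longer covered by the index.
conn.execute(
    "UPDATE yourtable SET date_deleted = '2015-01-20' WHERE colname = 'alice'"
)
# The value can now be reused by a new live row.
conn.execute("INSERT INTO yourtable (colname) VALUES ('alice')")

live_rows = conn.execute(
    "SELECT COUNT(*) FROM yourtable WHERE date_deleted IS NULL"
).fetchone()[0]
total_rows = conn.execute("SELECT COUNT(*) FROM yourtable").fetchone()[0]
```

The deleted row keeps its date, and any number of deleted rows can share the same colname value.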

Would creating a multi column unique index that included the deleted date achieve the same constraint limit you need?
http://www.postgresql.org/docs/current/interactive/indexes-unique.html
Alternatively, can you store a non-NULL sentinel for undeleted records, such as the minimum SQL date (0 or "1/1/1753"), and check for that value instead of NULL?

Is it possible to exclude the deleted date field from your unique index? In what way does this field contribute to the uniqueness of each record, especially if the field is usually null?

Related

Informix select trigger to update column

Is it possible to increase the value of a number in a column with a trigger every time it gets selected? We have special tables where we store the new id and when we update it in the app, it tends to get conflicts before the update happens, even when it all takes less than a second. So I was wondering if it is not possible to set database to increase value after every select on that column? Do not ask me why we do not use autoincrement for ids because I do not know.
Informix provides the SERIAL and BIGSERIAL types (and also SERIAL8, but don't use that) which provide autoincrement support. It also provides SEQUENCES with more sophisticated autoincrements. You should aim to use one of those.
Trying to use a SELECT trigger to update the table being selected from is, at best, fraught with problems about transactions and the like (problems which both the types and sequences carefully avoid).
If your design team needs help making effective use of these, ask a new question outlining what you want to achieve.
Normally, the correct way to proceed is to make the ID column in each table that defines 'something' (the Orders table, the Customer table, …) into a SERIAL column and either not insert a value into the ID column or insert 0 into it. The generated value can be retrieved and used when creating auxiliary information (order items, etc.).
Note that you could think about using:
CREATE TABLE xyz_sequence
(
xyz SERIAL NOT NULL PRIMARY KEY
);
and using:
INSERT INTO xyz_sequence VALUES(0);
and then retrieving the inserted value — in Informix ESQL/C, you'd use sqlca.sqlerrd[1], in other languages, other techniques. You can also delete the newly inserted record, or even all the records in the table. You can afford to ignore errors from the DELETE statement; sooner or later, the rows will be deleted. The next value inserted will continue where the prior ones left off.
In a stored procedure, you'd use DBINFO('sqlca.sqlerrd1') to get the inserted value. You'd use DBINFO('bigserial') to get the value if you use a BIGSERIAL type.
I found a possible answer in the question "update with return value". Instead of doing it with a SELECT, it seems better to return the value directly from the UPDATE; since an UPDATE takes locks, it should be safer even in a multithreaded application. But these are just my assumptions. Hopefully it will help someone.

SQL Server - store datetime and decimal

I'm developing a change history table where I'll basically record the old and new value for changes in fields of two types: decimal and datetime.
To keep it simple, I was thinking about creating a string field and converting the values to strings before storing them in the table.
My problem is that later I'll have to create a field in the report to show the difference between the changes (for example, if the date has changed from 01/20/2015 to 01/27/2015, the difference will be 7, and so on). I do not want to create a field in the table to record the difference between the fields; I want to do it on the report side.
My question is:
Is there any way to store those two kinds of data (decimal and datetime) that makes it simple to do comparisons later? If I store them as strings, I'll have to convert them twice: once before creating the record in the DB and again to work out the difference between them.
I believe the best approach would be what I like to call the never delete, never update approach.
Basically, you add a column to your source table for the record status, which can be current, historic, or deleted (use a tinyint for that, and link it to a row status table for readability). Then, instead of deleting a record, you update its status to deleted, and instead of updating it, you change its status to historic and insert a new record with the new data.
Naturally, this approach has its price, since you will have to write an INSTEAD OF UPDATE trigger, but that is a small price to pay compared to other approaches to keeping history data.
Also, if your primary key is not an identity column, you will need to add this column to your primary key (and any other unique constraints you might have).
You also might want to add a filter to your non-clustered indexes so that they will only index the records where the status is current.

rebuild/refresh my table's PK list - gap in numbers

I have finished all my changes to a database table in sql server management studio 2012, but now I have a large gap between some values due to editing. Is there a way to keep my data, but re-assign all the ID's from 1 up to my last value?
I would like this cleaned up as I populate dropdownlists with these values and then I make interactions with my database with the assumption that my dropdownlist index and the table's ID match up, which is not the case right now.
My current DB has a large gap from 7 to 28. I would like to shift everything from 28 and up back down to 8, 9, 10, 11, etc., so that my database has NO gaps from 1 onward.
If the solution is tricky please give me some steps as I am new to SQL.
Thank you!
Yes, there are any number of ways to "close the gaps" in an auto generated sequence. You say you're new to SQL so I'll assume you're also new to relational concepts. Here is my advice to you: don't do it.
The ID field is a surrogate key. There are several aspects of surrogates one must be mindful of when using them, but the one I want to impress upon you is,
-- A surrogate key is used to make the row unique. Other than the guarantee that
-- the value is unique, no other assumptions may be made concerning the value.
-- In particular, no meaning may be derived from the value as to the contents of
-- the row or the row's relationship to any other row.
You have designed your app with a built-in assumption of the value of the key field (that they will be consecutive). Already it is causing you problems. Do you really want to go through this every time you make changes to the table? And suppose a future feature requires you to filter out some of the choices according to an option the user has selected? Or enable the user to specify the order of the items? Not going to be easy. So what is the solution?
You can create an additional (non-visible) field in the dropdown list that contains the key value. When the user makes a selection, use that index to get the key value of the selection and then go out to the database and get whatever additional data you need. This will work if you populate the list from the entire table or just select a few according to some as yet unknown filtering criteria or change the order in any way.
Voilà. You never have this problem again, no matter how often you add and remove rows in the table.
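As a rough illustration of keeping the key alongside each dropdown entry, here is a small Python sketch. The DropdownItem type, the sample rows, and the key_for_selection helper are all hypothetical; a real UI toolkit would have its own way of attaching a hidden value to each entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DropdownItem:
    key: int     # primary key value from the table (not shown to the user)
    label: str   # text shown to the user


# Rows as they might come back from the table; note the gap between 7 and 28.
rows = [(1, "Red"), (7, "Green"), (28, "Blue"), (29, "Yellow")]
items = [DropdownItem(key, label) for key, label in rows]


def key_for_selection(items, selected_index):
    """Translate the dropdown's positional index into the row's key."""
    return items[selected_index].key


# The user picks the third entry (index 2); the query uses key 28, not 2 or 3.
selected_key = key_for_selection(items, 2)

# Reordering or filtering the list does not break the lookup.
by_label = sorted(items, key=lambda item: item.label)
first_key_alphabetically = key_for_selection(by_label, 0)
```

The positional index is only ever used to look up the stored key, so gaps in the ID sequence stop mattering.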
However, on the off chance that you are as stubborn as me (not likely!) or just refuse to listen to the melodious voice of reason and experience, then try this:
Create a new table exactly like the old table, including auto incrementing PK.
Populate the new table using a Select from the old table. You can specify any order you want.
Drop the old table.
Rename the new table to the old table name.
You will have to drop and redefine any FKs from other tables. But this entire process can be placed in a script because if you do this once, you'll probably do it again.
Now all the values are consecutive. Until you edit the table again...
You should refactor the code for your dropdown list and not the PK of the table.
If you do not agree, you can do one of the following:
Insert another column holding the dropdown's "order of appearance", make a unique index on it and fill this by hand (or programmatically).
Replace the SERIAL with an INT, make a unique index on the column, and fill it by hand (or programmatically).
Remove the large ids and reseed your serial; the code for this depends on your DBMS.
This happens to me all the time. If you don't have any foreign key constraints then it should be an easy fix.
Remember a DELETE statement will remove the record but keep the identity seed the same. (If I remove id # 5 and #5 was the last record inserted then SQL server still stores the identity seed value at "6").
TRUNCATING the table will reset the identity seed back to its original value.
SET IDENTITY_INSERT [TABLE] ON can also be used to insert the correct data in the correct order if truncating cannot happen.
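-- Copy the existing rows into a temporary table.
SELECT *
INTO #tempTable
FROM [TableTryingToFix];

-- TRUNCATE resets the identity seed to its original value.
TRUNCATE TABLE [TableTryingToFix];

-- Re-insert without the identity column so new consecutive IDs are generated.
INSERT INTO [TableTryingToFix] (COL1, COL2, COL3, ETC)
SELECT COL1, COL2, COL3, ETC
FROM #tempTable
ORDER BY oldTableID;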

Check for a field value being updated/changed SQL (Access)

This is certainly a long shot, and is by no means vital to my development requirements, so if there's not a readily available solution then please note that I won't be too upset ;)
I was wondering if there was a way to see if a field value had been changed or updated within a date range in Access.
For example, I have a status field in, let's say, table1 that may read "active" or "inactive" (simply via validation, no related tables for this field). I would like to see how many records changed from "inactive" to "active" within 30 days.
I have found a solution for timestamping a form update, and if worst comes to worst, I can just amend this to apply to a field, but I would rather be able to search for the value changes than the date the field was last changed.
Again, if this strikes anyone as impossible, then please don't worry yourself too much.
Regards,
Andy
You need to have a change history.
That is a separate table which stores the key of the row as a foreign key, the status, and the timestamp. Every change inserts a new row into that table.
Depending on the technology you are using, the easiest way is to use a trigger. The trigger can check whether the field has changed (old.status <> new.status) and insert a new row into the history table.
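A sketch of that history-table-plus-trigger idea, assuming a database with row-level triggers. It is run through Python's built-in sqlite3 module only to keep it self-contained; the status_history table, its columns, and the trigger name are invented for illustration, and the history row here records both the old and the new status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        id     INTEGER PRIMARY KEY,
        status TEXT NOT NULL
    );
    -- Hypothetical history table: one row per actual status change.
    CREATE TABLE status_history (
        table1_id  INTEGER NOT NULL REFERENCES table1 (id),
        old_status TEXT NOT NULL,
        new_status TEXT NOT NULL,
        changed_at TEXT NOT NULL DEFAULT (datetime('now'))
    );
    -- Fires only when the status value really changes.
    CREATE TRIGGER table1_status_change
    AFTER UPDATE OF status ON table1
    WHEN OLD.status <> NEW.status
    BEGIN
        INSERT INTO status_history (table1_id, old_status, new_status)
        VALUES (OLD.id, OLD.status, NEW.status);
    END;
""")

conn.executemany(
    "INSERT INTO table1 (id, status) VALUES (?, ?)",
    [(1, "inactive"), (2, "inactive"), (3, "active")],
)
conn.execute("UPDATE table1 SET status = 'active' WHERE id = 1")
conn.execute("UPDATE table1 SET status = 'active' WHERE id = 3")    # unchanged, not logged
conn.execute("UPDATE table1 SET status = 'inactive' WHERE id = 3")

# How many records went from inactive to active in the last 30 days?
activated_last_30_days = conn.execute("""
    SELECT COUNT(*) FROM status_history
    WHERE old_status = 'inactive' AND new_status = 'active'
      AND changed_at >= datetime('now', '-30 days')
""").fetchone()[0]
history_rows = conn.execute("SELECT COUNT(*) FROM status_history").fetchone()[0]
```

With the history in its own table, the 30-day question becomes an ordinary filtered count.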
If you do not like to keep history, then only one field in the same table can do the job.
The field can be datetime, and also the trigger can update it when the status is changed.
A row-level timestamp will not do the job, because it also changes when some other field is changed.
So in this case, too, the trigger can do the job.
Depending on the type of client, the client can also detect whether the field has changed and update the datetime field itself.

SQL Server Database - Hidden Fields?

I'm implementing CRUD on my silverlight application, however I don't want to implement the Delete functionality in the traditional way, instead I'd like to set the data to be hidden instead inside the database.
Does anyone know of a way of doing this with an SQL Server Database?
Help greatly appreciated.
You can add another column to the table "deleted" which has value 0 or 1, and display only those records with deleted = 0.
ALTER TABLE TheTable ADD deleted BIT NOT NULL DEFAULT 0
You can also create view which takes only undeleted rows.
CREATE VIEW undeleted AS SELECT * FROM TheTable WHERE deleted = 0
And your delete command would look like this:
UPDATE TheTable SET deleted = 1 WHERE id = ...
Extending Lukasz' idea, a datetime column is useful too.
NULL = current
Value = when soft deleted
This adds simple versioning that a bit column cannot, which may work better.
In most situations I would rather archive the deleted rows to an archive table with a delete trigger. This way I can also capture who deleted each row and the deleted rows don't impact my performance. You can then create a view that unions both tables together when you want to include the deleted ones.
You could do as Lukasz Lysik suggests, and have a field that serves as a flag for "deleted" rows, filtering them out when you don't want them showing up. I've used that in a number of applications.
An alternate suggestion would be to add an extra status assignment if there's a pre-existing status code. For example, in a class attendance app we use internally an attendance record could be "Imported", "Registered", "Completed", "Incomplete", etc.* - we added a "Deleted" option for times where there are unintentional duplicates. That way we have a record and we're not just throwing a new column at the problem.
*That is the display name for a numeric code used behind the scenes. Just clarifying. :)
Solution with triggers
If you are friends with DB triggers, then you might consider:
add DeletedAt and DeletedBy columns to your tables
create a view for each table (ex: for table Customer have a CustomerView view) which filters out rows where DeletedAt is not null (idea of gbn with date columns)
all your CRUD operations perform as usual, but on the CustomerView rather than on the Customer table
add an INSTEAD OF DELETE trigger that marks the row as deleted instead of physically deleting it.
you may want to do more complex things there, like ensuring that all FK references to this row are also "logically" deleted, in order to keep logical referential integrity
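The steps above can be sketched roughly as follows, using Python's built-in sqlite3 module so the example is self-contained. This is an assumption-laden illustration: the trigger is placed on the view (which is where SQLite allows INSTEAD OF triggers), 'app_user' is a placeholder for however the deleting user would really be determined, and SQL Server syntax differs in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        Id        INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        DeletedAt TEXT,              -- NULL means the row is live
        DeletedBy TEXT
    );
    -- The view hides soft-deleted rows from client code.
    CREATE VIEW CustomerView AS
        SELECT Id, Name FROM Customer WHERE DeletedAt IS NULL;
    -- A DELETE against the view marks the row instead of removing it.
    CREATE TRIGGER CustomerView_soft_delete
    INSTEAD OF DELETE ON CustomerView
    BEGIN
        UPDATE Customer
        SET DeletedAt = datetime('now'),
            DeletedBy = 'app_user'   -- placeholder value
        WHERE Id = OLD.Id;
    END;
""")

conn.executemany("INSERT INTO Customer (Name) VALUES (?)", [("alice",), ("bob",)])
conn.execute("DELETE FROM CustomerView WHERE Name = 'alice'")

visible = [row[0] for row in conn.execute("SELECT Name FROM CustomerView ORDER BY Id")]
stored = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
deleted_by = conn.execute(
    "SELECT DeletedBy FROM Customer WHERE Name = 'alice'"
).fetchone()[0]
```

Client code issues an ordinary DELETE, while the base table keeps the row along with when and by whom it was deleted.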
If you choose to use this pattern, I would probably name the tables differently, like TCustomer, and name the views just Customer, for clarity of client code.
Be careful with this kind of implementation because soft deletes break referential integrity and you have to enforce integrity in your entities using custom logic.