Background:
I am trying to solve a simple problem. I have a database with two tables: one stores text (something like articles), and the other stores the category each text belongs to. Users can make changes to the text, and I need to record who made each change and when; when saving changes, the user also writes a comment on their changes, which I save as well.
What I have done so far:
I added another table in which I save everything related to changes: who made them and when, a comment on the changes, and the ID of the text the changes apply to.
The problem:
Deleting a text also needs to be recorded in the history, but since the history records have a foreign key constraint referencing the text, I have to delete the entire history associated with that text to avoid an error.
What else I have tried:
I tried adding a "Deleted" attribute to the text table, so that the row is not physically deleted but simply has the "Deleted" = 1 flag set. This lets me keep the history and even record the moment of deletion. But there is another problem: the text table has a "Name" attribute that must be unique, and since the record is not physically deleted, when I try to insert a new record with a "Name" value that already exists, I get a uniqueness error, even though the old record with that name is considered deleted.
Question:
What approaches could solve this problem, so that the history of changes can be kept in another table even after records are deleted from the main table, while still preserving the uniqueness of certain attributes of the main table and maintaining data integrity?
I would be grateful for any options and hints.
A good practice is to use a unique identifier such as a UUID as the primary key for your primary record (i.e. your text record). That way, you can safely soft-delete the primary record, and any associated metadata can be kept without fear of collisions in the future.
If you need to enforce uniqueness of certain attributes (such as the Name you mentioned), you can create a secondary index (a non-clustered index in SQL terminology) on that column and then, when performing the soft delete, set the Name to NULL and record the old Name value in some other column. In SQL Server (since 2008), in order to allow multiple NULL values in a unique index you need to create what they call a filtered index, where you explicitly say you want to ignore NULL values.
In other words, your schema would consist of something like this:
a UUID as primary key for the text record
change metadata would have a foreign key relation to text record via the UUID
a Name column with a non-clustered UNIQUE index
a DeletedName column that will store the Name when the record is deleted
a Deleted bit column that can be NULL for non-deleted records and set to 1 for deleted
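A minimal sketch of what that could look like in SQL Server syntax (the table name and column sizes here are illustrative assumptions, not taken from your actual schema):

CREATE TABLE Texts (
    Id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY DEFAULT NEWID(),
    Name NVARCHAR(200) NULL,        -- set to NULL on soft delete
    DeletedName NVARCHAR(200) NULL, -- preserves the old Name after deletion
    Deleted BIT NULL                -- NULL = live record, 1 = soft-deleted
);

-- filtered unique index: uniqueness is enforced only for live (non-NULL) names
CREATE UNIQUE NONCLUSTERED INDEX UX_Texts_Name
    ON Texts (Name)
    WHERE Name IS NOT NULL;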
When you do a soft delete, you would execute an atomic transaction (sketched after the list) that would:
set the DeletedName = Name
set Name = NULL (so as not to break the UNIQUE index)
mark record as deleted by setting Deleted = 1
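Assuming the schema sketched above, the soft delete itself might look like this (@TextId is a placeholder for the id of the record being deleted):

BEGIN TRANSACTION;
    UPDATE Texts
    SET DeletedName = Name, -- remember the old name
        Name = NULL,        -- free the name for reuse under the unique index
        Deleted = 1         -- mark the record as deleted
    WHERE Id = @TextId;
COMMIT TRANSACTION;

Since all three changes happen in one UPDATE, the statement is atomic even without the explicit transaction; the transaction matters once you also write a deletion entry to your history table in the same step.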
There are other ways too, but this one seems easily achievable given what you already have.
In my opinion, you can do it in one of two ways:
Use audit tables corresponding to each main table, with an added Action field, filled from the DELETE, INSERT, and UPDATE triggers of the main tables.
ArticlesTable(Id,Name) -> AuditArticlesTable(Id,Name,Action,User,ModifiedDate)
You can use a filtered unique index (https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-filtered-indexes?view=sql-server-ver15) on the “Name” field to solve your issue of adding the same name when another instance exists as a deleted record.
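A sketch of what the trigger part could look like in SQL Server, for the DELETE case (INSERT and UPDATE triggers would follow the same pattern; the table and column names come from the mapping above, the rest is an assumption):

CREATE TRIGGER TR_ArticlesTable_Delete ON ArticlesTable
AFTER DELETE
AS
BEGIN
    -- copy the removed rows into the audit table before they are gone
    INSERT INTO AuditArticlesTable (Id, Name, Action, [User], ModifiedDate)
    SELECT d.Id, d.Name, 'DELETE', SUSER_SNAME(), GETDATE()
    FROM deleted AS d;
END;

Note that AuditArticlesTable deliberately has no foreign key back to ArticlesTable; that is what lets the history outlive the deleted row.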
In my design, I have many tables which use FKs. The issue is that, because certain records will be deleted and re-added at various points in time (they are linked to specific project files), the references will always be inaccurate if I rely on the traditional auto-incrementing ID, because each time the records are re-added they are given a new ID.
I previously asked a question (Sqlite - composite PK with two auto-incrementing values) as to whether I could create a composite auto-incrementing ID; however, it appears not to be possible, as answered in the question I was linked to.
The only automatic value I can think of that will always be unique and never repeated is a full date value, down to the second; however, using a date for the tables' IDs feels like bad design. So, if I instead place a full date field in every table and use these as the FK reference, am I looking at any potential issues down the line? And am I correct in thinking it would be more efficient to store it as an integer rather than a text value?
Thanks for the help
Update
To clarify, I am not asking about primary keys. The PK will be a standard auto-incrementing ID. I am asking about basing hundreds of FKs on dates.
Thank you for the replies below. The difficulty I'm having is that I can't find a similar model to learn from. The end result is that I'd like the application to use project files (like Word has its docx files) to import data into the database. Once a new project is loaded, the previous project's records are cleared, but their data is preserved in the project file (the application's custom file format / a txt file) so they can be added once again. The FKs will all be project-based, so they will only reference records that exist in the database at the time.

For example, as it's a world-building application, say a user adds a subject type that would be relevant to any project (e.g. mathematics); due to the form it's entered on in the application, the record is given a type number of 1, meaning it persists regardless of the project loaded. Another subject type, however, may be Demonology, which only applies to the specific project loaded (e.g. a fantasy world). A school_subject junction table needs both of these in the same table to reference as the FK. So let's say Demonology is the second record in the subject types table: it has an auto-increment value of 2, and thus the junction table records 2 as its FK value. The issue is that, before this project is re-opened again, the user may have added 10 more subject types that are universal and persist, so the next time the project's subject type and school_subject records are added back, Demonology is given the ID of 11, while the school_subject junction table is re-created with the same record having 2 as its value.

This is why I'd like an FK that will always remain the same. I don't want all projects to be present in the database, because I want users to be able to back up and duplicate individual projects, and to know that even if the application is deleted, they can re-download and re-open their project files.
This is a bit long for a comment.
Something seems wrong with your design. When you delete a row in a table, there should be no remaining foreign key references to that key. The entity is gone; as far as the database is concerned, it does not exist. Under most circumstances, you will get an error if you try to delete a row in one table while a row in another table refers to it through a foreign key reference.
When you insert a row into a table, the database becomes aware of that entity; before that, there should be no references to it.
Hence, you have an unusual situation. It sounds like you have primary keys that represent something in the real world -- such as a social security number or vehicle identification number. If that is the case, you might want this id to be the primary key of the table.
Another option is soft deletion. Once one of these rows is inserted into the table, it cannot be deleted. However, you can set a flag saying that it is deleted. Then, foreign key references to the "soft"-deleted row can remain.
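A minimal sketch of that idea in SQLite, with illustrative names based on your subject-type example:

CREATE TABLE subject_type (
    name TEXT PRIMARY KEY,                 -- stable real-world key instead of an auto-increment id
    is_deleted INTEGER NOT NULL DEFAULT 0  -- soft-delete flag
);

-- "deleting" keeps the row, so foreign key references to it stay valid
UPDATE subject_type SET is_deleted = 1 WHERE name = 'Demonology';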
I'm learning SQLite from this website: SQLite Tutorial.
I was reading the article they had on the AUTOINCREMENT command.
My question has to do with their explanation of why this feature is useful:
The main purpose of using AUTOINCREMENT attribute is…
To prevent SQLite to reuse value that has not been used or from the previously deleted row.
I'm confused by this explanation, as it doesn't go into detail about the implications of this statement.
Could someone please give more detail about what happens in the background, and whether this feature is implemented differently on different platforms or in specific packagings of the engine (npm packages etc.)?
Also, more importantly, could someone give examples of use cases where this feature would be necessary, and of the proper and improper ways of using it?
Thanks to all!
To prevent SQLite to reuse value that has not been used or from the previously deleted row.
The AUTOINCREMENT property ensures that a newly generated id will be unique: it will not be any id already used in that column, nor any id that has since been deleted. It is mostly used on a table's primary key, where we need a unique value that has not been used so far.
Most relational databases have an AutoIncrement property; in Oracle, I've seen Sequences, which act similarly to AutoIncrement.
For example: if you have 10 rows with an AutoIncrement column called id, holding values 1 to 10, and you delete all the rows and insert a new one, the new row will have id = 11, because 1 to 10 have already been used. You do not need to specify the id value, as it is filled in automatically based on the previously inserted values.
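For instance, you can see this behavior directly in the sqlite3 shell (the table name is made up for the demonstration):

CREATE TABLE demo (id INTEGER PRIMARY KEY AUTOINCREMENT, val TEXT);
INSERT INTO demo (val) VALUES ('a'), ('b'); -- rows get ids 1 and 2
DELETE FROM demo;                           -- table is now empty
INSERT INTO demo (val) VALUES ('c');        -- this row gets id 3, not 1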
This feature is usually used on the table's primary key (I personally prefer to name it ID), like this:
CREATE TABLE MYTABLE(
ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
...
);
If you are learning SQLite, you should know that a table's primary key absolutely must be unique for each record in the table.
So if the primary key of your table is not generated automatically, the database will force you to specify the ID of each new record.
If there are already some records in your table, you may ask yourself a question like "What ID would I put in the record to ensure that it will be unique?"
This is what AUTOINCREMENT was created for. If AUTOINCREMENT is set on the table's primary key, you no longer need to specify it when inserting a record, so you no longer need to think about what ID to put there.
Now, how does it work? If AUTOINCREMENT is set on the table's primary key, SQLite stores the largest ID the table has ever used along with the table's data in the database; initially this value is 0. When you issue an INSERT command on this table, the new record's ID is calculated as that stored value + 1, and the stored value is then incremented accordingly (autoINCREMENT).
For example, as Akash KC already said, if 10 records were added to the table, the next record's ID will be 11.
The detail is that AUTOINCREMENT doesn't mind deletions: if you take an empty table, add 10 records to it, delete the one with ID 5 (for example), and then add a new one, its ID will be 11 as well.
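You can actually inspect this stored value yourself: SQLite keeps it in the internal sqlite_sequence table, with one row per table that uses AUTOINCREMENT:

SELECT seq FROM sqlite_sequence WHERE name = 'MYTABLE';
-- returns the largest ID ever used, even if that row has since been deleted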
For a project with offline storage, I created a database with a primary key for the id and also a unique field holding the date.
When I delete an entry, I cannot create a new one with the same date the deleted entry had.
What can I do to get the date out of the index when I delete that entry?
This is how I create Table:
CREATE TABLE IF NOT EXISTS userdata (id INTEGER PRIMARY KEY ASC, entrydate DATE UNIQUE, strength, comments);
I wonder if I need to add something to tell the DB engine to allow me to use the same value again as soon as it is free again. Maybe I need to run some kind of update where SQLite refreshes its internal records.
I think there are a few possibilities here.
1. Your delete statement is in an uncommitted transaction, so the unique value hasn't actually been removed from the table before your attempt to insert.
2. The value you are deleting and the new value you are inserting are not actually the same value. Run a SELECT and make sure the value you are attempting to insert is actually available.
3. You have a corrupt index and need to reindex it.
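Roughly, the checks for each case could look like this in SQLite, using your userdata table (the date value is just an example):

-- case 1: if you opened a transaction, commit it so the delete takes effect
COMMIT;
-- case 2: confirm the value really is gone before inserting
SELECT * FROM userdata WHERE entrydate = '2021-06-01';
-- case 3: rebuild the table's indexes in case one is corrupt
REINDEX userdata;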
I'm looking for a way to set up a one-to-many relationship between 2 tables. The table structure is explained below, but I've tried to leave out everything that has nothing to do with the problem.
Table objects has 1 column called uuid.
Table contents has 3 columns called content, object_uuid and timestamp.
The basic idea is to insert a row into objects and get a new uuid back from the database. This uuid is then stored with every row in contents to associate contents with objects.
Now I'm trying to use the database to enforce that:
Each row in contents references a row in objects (a foreign key should do)
No row in objects exists without at least a row in contents
These constraints should be enforced on commit of transactions.
Ordinary triggers probably can't help, because when a row in the objects table is written, there can't yet be a row in contents. Postgres does have so-called constraint triggers that can be deferred until the end of the transaction. It would be possible to use those, but they seem to be some sort of internal construct not intended for everyday use.
Ideas or solutions should be standard SQL (preferred) or work with Postgres (version does not matter). Thanks for any input.
Your main problem is that, other than foreign key constraints, no constraint can reference another table.
Your best bet is to denormalize this a little and keep a column on objects containing the count of contents rows that reference it. You can create a trigger to keep this up to date.
contents_count INTEGER NOT NULL DEFAULT 0
This won't be unbreakable unless you put some user security over who can update this column. But if you keep it up to date with a trigger, and all you're looking to avoid is accidental corruption, this should be sufficient.
EDIT: As per the comment, CHECK constraints are not deferrable. This solution would raise an error if all the contents rows were removed, even if the intention was to add more in the same transaction.
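A rough Postgres sketch of that counter approach (the function name is my own, the uuid columns are assumed to use the uuid type, and the problematic CHECK from the edit is left out):

ALTER TABLE objects ADD COLUMN contents_count INTEGER NOT NULL DEFAULT 0;

CREATE FUNCTION refresh_contents_count() RETURNS trigger AS $$
DECLARE
    target uuid;
BEGIN
    -- OLD is only defined for DELETE, NEW only for INSERT
    IF TG_OP = 'DELETE' THEN
        target := OLD.object_uuid;
    ELSE
        target := NEW.object_uuid;
    END IF;
    -- recount the referencing rows for the affected object
    UPDATE objects o
    SET contents_count = (SELECT count(*) FROM contents c
                          WHERE c.object_uuid = o.uuid)
    WHERE o.uuid = target;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER contents_count_trigger
AFTER INSERT OR DELETE ON contents
FOR EACH ROW EXECUTE PROCEDURE refresh_contents_count();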
Maybe what you want to do is normalize a bit more. You need a third table that references elements of the other two. Table objects should have its own uuid, and table contents should also have its own uuid and no reference to the objects table. The third table should hold only the references to the other two tables, with the primary key being the combination of both references.
So, for example, if you have a uuid from the objects table and you want all the contents for that uuid, assuming the third table has columns object_uuid and content_uuid and the contents table has its own serial column named uuid, your query would look like this:
SELECT * FROM thirdtable
JOIN contents ON thirdtable.content_uuid = contents.uuid
WHERE thirdtable.object_uuid = 34;
Then you can use an on-insert trigger on every table:
CREATE TRIGGER my_insert_trigger AFTER INSERT OR UPDATE ON contents
FOR EACH ROW EXECUTE PROCEDURE my_check_function();
and then have my_check_function() delete every row in objects that is not present in the third table. Somebody else answered first while I was writing this; if you like my solution, I can help you build the my_check_function() function.
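For what it's worth, a rough sketch of what my_check_function() could do (assuming the column names above):

CREATE FUNCTION my_check_function() RETURNS trigger AS $$
BEGIN
    -- remove objects that are no longer referenced by the third table
    DELETE FROM objects o
    WHERE NOT EXISTS (
        SELECT 1 FROM thirdtable t WHERE t.object_uuid = o.uuid
    );
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;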
Assume that I know that updating a primary key is bad.
There are other questions which imply that the records in the inserted and deleted tables match by position (the first row of one matches the first row of the other). Is this a fact or a coincidence?
Is there anything that could join the two tables together when the primary key changes on an update?
There is no positional match between rows of the inserted and deleted virtual tables.
And no, you can't match the rows.
Some options:
there is another unique, unchanging (for that update) key to link the rows
limit the trigger to single-row actions
use a stored procedure with the OUTPUT clause to capture before and after keys (sketched below)
INSTEAD OF trigger with OUTPUT clause (TBH not sure if you can do this)
disallow primary key updates (added after comment)
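For the stored-procedure route, the OUTPUT clause can capture both keys in a single statement; a sketch with made-up table and column names (@OldKey and @NewKey would be procedure parameters):

DECLARE @KeyLog TABLE (OldKey INT, NewKey INT);

UPDATE dbo.MyTable
SET KeyCol = @NewKey
OUTPUT deleted.KeyCol, inserted.KeyCol INTO @KeyLog (OldKey, NewKey)
WHERE KeyCol = @OldKey;

-- @KeyLog now pairs each row's before and after key values
SELECT OldKey, NewKey FROM @KeyLog;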
Each table is allowed to have one identity column. Identity columns are not updateable; they are assigned a value when the records are inserted (or when the column is added), and they can never change. If the primary key is updateable, it must not be an identity column. So either the table has another column which is an identity column, or you can add one to it. There is no rule that says the identity column has to be the primary key. Then, in the trigger, rows in inserted and deleted that have the same identity value are the same row, and you can support updating the primary key on multiple rows at a time.
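A sketch of such a trigger (SQL Server; RowId stands for the hypothetical identity column and KeyCol for the updatable primary key):

CREATE TRIGGER trg_MyTable_Update ON dbo.MyTable
AFTER UPDATE
AS
BEGIN
    -- the identity column never changes, so it pairs old and new rows reliably
    SELECT d.KeyCol AS OldKey, i.KeyCol AS NewKey
    FROM inserted AS i
    INNER JOIN deleted AS d ON d.RowId = i.RowId;
END;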
Yes -- create an "old_primary_key" field in the table you're updating, and populate it first.
There's nothing you can do to match up the inserted and deleted pseudo-table record keys -- even if you store their data in a log table somewhere.
I guess, alternatively, you could create a separate log table that tracks changes to primary keys (old and new). This might be more useful than adding a field to the table you're updating, as I suggested at first, since it would allow you to track more than one change for a given record. It just depends on your situation, I guess.
But that said -- before you do anything, please go find a chalkboard and write this 100 times:
I know that updating a primary key is bad.
I know that updating a primary key is bad.
I know that updating a primary key is bad.
I know that updating a primary key is bad.
I know that updating a primary key is bad.
...
:-) (just kidding)