I have a few tables in postgres that refer to each other. I want to set up a mechanism to "archive" rows in one of my tables. That is, I want to still hold onto the data and be able to read from it, but I don't want to be able to edit that row anymore or edit the foreign keys in other tables to reference this now "archived" row.
Is this something that can be achieved? Essentially, I want the rest of the database to act like this row's primary key is no longer there, the same way that if you try to set an invalid foreign key, postgres will throw an error that that key was not found in the referenced table.
Thanks
EDIT:
I don't want to actually archive any of the data. I say "archive" because I can't think of a better way to describe it. Essentially, I just want to be able to change a bool value in a row of the table and have that signal to postgres to no longer allow any changes to that row or any use of that row's id as a foreign key in other tables. The only thing that someone should be able to do would be to change that bool back to true and then interact normally.
Add a flag column, either a bool or an enum if there are multiple states. Postgres won't check this for you; you have to add a WHERE clause to every applicable query.
This is error-prone. You can make it safer by defining a view which already has the WHERE clause. Do all queries on that view. Rename the table to something like "table_all" and let the view use the table's name, then all existing queries will Just Work.
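As a rough sketch of the rename-plus-view idea above, assuming a hypothetical table called items and a boolean flag column named archived (neither name comes from the question):

```sql
-- Hypothetical names: "items" is the original table, "archived" is the flag column.
ALTER TABLE items RENAME TO items_all;
ALTER TABLE items_all ADD COLUMN archived boolean NOT NULL DEFAULT false;

-- The view takes over the old table name and hides archived rows.
CREATE VIEW items AS
SELECT *
FROM items_all
WHERE NOT archived;
```

Simple single-table views like this are automatically updatable in PostgreSQL 9.3 and later, so existing INSERT and UPDATE statements against items should keep working. Note that foreign keys in other tables still point at items_all and will continue to accept the ids of archived rows; the view only affects queries that go through it.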
Related
I am designing a database to contain a table reference, with a column type that is one of several predefined values (e.g., book, movie, magazine, etc.). I intend the range of possible values to expand over time (e.g. if I realize that I missed the academic_paper type, I want to be able to put that in).
The easiest solution would seem to be to simply store a string representing the type into the table. But this sounds like it would result in a lot of wasted space.
The other solution I thought of is creating a new table reference_types, which the type column references in its foreign key. This seems to have the added benefit of ensuring valid foreign keys (so that I won't accidentally mistype a "magzine" somewhere in my code), and possibly allowing faster queries for all media of a certain type (since integer comparisons should be much faster than string comparisons). But it would also slow my application down a bit, as joins would be required whenever I need the reference type, and it would probably complicate the logic because of those extra joins.
What are your thoughts on schema design for this problem?
Your second solution is the correct one. Create a secondary table to store your reference types and link them using a foreign key.
For further reading on this subject the search term you'd want to use is 'database normalisation'.
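A minimal sketch of that second solution in PostgreSQL. The table names reference and reference_types come from the question; the column names (id, name, title, type_id) are assumptions made for illustration:

```sql
-- Lookup table: one row per permitted reference type.
CREATE TABLE reference_types (
    id   serial PRIMARY KEY,
    name text NOT NULL UNIQUE
);

-- Main table: stores the integer key, validated by the foreign key.
CREATE TABLE reference (
    id      serial PRIMARY KEY,
    title   text NOT NULL,
    type_id integer NOT NULL REFERENCES reference_types (id)
);

INSERT INTO reference_types (name) VALUES ('book'), ('movie'), ('magazine');

-- Adding a type later is a plain INSERT, not a schema change.
INSERT INTO reference_types (name) VALUES ('academic_paper');

-- Fetching the type name requires one join.
SELECT r.title, t.name AS type_name
FROM reference r
JOIN reference_types t ON t.id = r.type_id
WHERE t.name = 'book';
```

With this layout, an insert into reference with a type_id that is not in reference_types is rejected by the foreign key.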
Create the reference_types table. In your references table, use an integer key and also add a reference_type_name field.
You can query the references table to get the integer key and print its name when needed without joining to the other table, and still use that table to perform other operations. Just keep the type names in both tables in sync.
I know it sounds redundant, but it's really the fastest way to do a simple query by int key and have it all together.
It depends. If you will want to add other information to reference types, use the second approach. If not, use the first one, because it's faster and the information stored is only a string (you can always select unique to retrieve your types). Read this article for more info.
I'm looking for a way to set up a one-to-many relationship between 2 tables. The table structure is explained below, but I've tried to leave out everything that has nothing to do with the problem.
Table objects has 1 column called uuid.
Table contents has 3 columns called content, object_uuid and timestamp.
The basic idea is to insert a row into objects and get a new uuid from the database. This uuid is then stored in every row in contents to associate contents with objects.
Now I'm trying to use the database to enforce that:
Each row in contents references a row in objects (a foreign key should do)
No row in objects exists without at least a row in contents
These constraints should be enforced on commit of transactions.
Ordinary triggers probably can't help, because when a row in the objects table is written, there can't be a row in contents yet. Postgres does have so-called constraint triggers that can be deferred until the end of the transaction. It would be possible to use those, but they seem to be some sort of internal construct not intended for everyday use.
Ideas or solutions should be standard SQL (preferred) or work with Postgres (version does not matter). Thanks for any input.
Your main problem is that, other than foreign key constraints, no constraint can reference another table.
Your best bet is to denormalize this a little and have a column on objects containing the count of contents that reference it. You can create a trigger to keep this up to date.
contents_count INTEGER NOT NULL DEFAULT 0
This won't be unbreakable unless you put some user security over who can update this column. But if you keep it up to date with a trigger and all you're looking to avoid is accidental corruption, this should be sufficient.
EDIT: As per the comment, CHECK constraints are not deferrable. This solution would raise an error if all the contents are removed even if the intention is to add more in the same transaction.
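The trigger that keeps the count up to date might look something like the following sketch. It assumes the contents_count column shown above and the uuid and object_uuid columns from the question; the function and trigger names are hypothetical, and it does not handle an UPDATE that moves a contents row to a different object:

```sql
-- Hypothetical helper: adjusts objects.contents_count as contents rows come and go.
CREATE FUNCTION sync_contents_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE objects
        SET contents_count = contents_count + 1
        WHERE uuid = NEW.object_uuid;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE objects
        SET contents_count = contents_count - 1
        WHERE uuid = OLD.object_uuid;
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER contents_count_sync
AFTER INSERT OR DELETE ON contents
FOR EACH ROW EXECUTE PROCEDURE sync_contents_count();
```

This only maintains the counter; whether and when a zero count is rejected still depends on the constraint placed on the column, with the deferral limitation noted in the EDIT.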
Maybe what you want to do is normalize a little bit more. You need a third table that references elements of the other two tables. Table objects should have its own uuid, and table contents should also have its own uuid and no reference to the table objects. The third table should have only the references to the other two tables, with the combination of both references as its primary key.
So, for example, say you have a uuid from the table objects and you want all the contents for that uuid. Assuming that the third table has the columns object_uuid and content_uuid, and the table contents has its own serial column named uuid, your query should look like this:
SELECT *
FROM thirdtable
JOIN contents ON contents.uuid = thirdtable.content_uuid
WHERE thirdtable.object_uuid = 34;
Then you can use an ON INSERT trigger on every table:
CREATE TRIGGER my_insert_trigger AFTER INSERT OR UPDATE ON contents
FOR EACH ROW EXECUTE PROCEDURE my_check_function();
Then, in the function my_check_function(), delete every row in objects that is not present in the third table. Somebody else answered first while I was answering; if you guys like my solution, I could help you write the my_check_function() function.
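One possible shape for my_check_function(), following the description above. This is only a sketch: it assumes the link table is literally named thirdtable with an object_uuid column, as in the example query, and it would have to be created before the CREATE TRIGGER statement that refers to it:

```sql
-- Sketch only: removes objects that have no row in the link table.
CREATE FUNCTION my_check_function() RETURNS trigger AS $$
BEGIN
    DELETE FROM objects o
    WHERE NOT EXISTS (
        SELECT 1
        FROM thirdtable t
        WHERE t.object_uuid = o.uuid
    );
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;
```

Be aware that, as written, this also deletes freshly inserted objects whose link rows have not been written yet, so the order of inserts within a transaction matters. Running a whole-table DELETE once per row is also likely to be slow on large tables.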
Say I have field Ice_Cream.flavor, with the current choices in lookup table Flavor.flavor.
I use Flavor.flavor like an enum() list, storing the value, not the record ID, in Ice_Cream.flavor. If Flavor.flavor changes, I don't want to update Ice_Cream.flavor. I want it to stay as created.
Should I set up Ice_Cream.Flavor as a foreign key, so I can see the source of the values in my ER diagram, or not?
If you want Ice_Cream.flavor to stay as created even if there is no matching record in Flavor (which is what your question sounds like), then you cannot create a FOREIGN KEY relationship; it will not allow that condition to occur in your database.
Furthermore, if you're storing the actual text Flavor.Flavor string in Ice_Cream.Flavor, there's no particular reason to have a separate RecordID column in Flavor.
IMHO, you do not need a FK here unless you have additional information about a flavor in the Flavor table besides the name in the column flavor. That is because you do not keep an ID; you keep the name AND you want to keep the old value.
I am also assuming that you do not want to keep old flavors in the Flavor table or anywhere else except in the Ice_Cream table.
Last but not least, a FK would require that any flavor stored in Ice_Cream.flavor exists in the Flavor table. That is not the case here, if I understand your question correctly.
I am refactoring an old Oracle 10g schema to try to introduce some normalization. In one of the larger tables, there is a text field that has at most 10-15 possible values. In my mind, this field is an example of unnecessary data duplication and should be extracted to a separate table.
After examining the data, I cannot find one relevant piece of information that could be associated with that text value. Basically, if I pulled that value out and put it into its own table, it would be the only field in that table. It exists today as more of a 'flag' field. Should I create a two-column table with a surrogate key, keep it as it is, or do something entirely different? Am I doing more harm than good by trying to minimize data duplication on this field?
You might save some space by extracting the column to a separate table. This is called a lookup table. It can give you a couple of other benefits:
You can declare a foreign key constraint to the lookup table, so you can rely on the column in the main table never having any value other than the 10-15 values you want.
It's easy to query for a concise list of all permitted values, by querying the lookup table. This can be faster than using SELECT DISTINCT on the main table's column. It also returns values that are permitted, but not currently used in the main table.
If you change a value in the lookup table, it automatically applies to all rows in the main table that reference it.
However, creating a lookup table with one column is not strictly normalization. You're just replacing one value with another. The attribute in the main table either already supports a normal form, or not.
Using surrogate keys (vs. natural keys) also has nothing to do with normalization. A lot of people make this mistake.
However, if you move other attributes into the lookup table, attributes that depend only on the lookup value and therefore would create transitive dependencies (violating 3NF) in the main table if you left them there, then that would be normalization.
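For illustration, a one-column lookup table with a foreign key could be set up along these lines. All names here (main_table, flag_value, flag_values) are hypothetical, and the VARCHAR(50) length is an assumption about the existing data:

```sql
-- Lookup table holding the permitted values, keyed by the value itself.
CREATE TABLE flag_values (
    flag_value VARCHAR(50) PRIMARY KEY
);

-- Seed it from the values currently in use, so the constraint below can be added.
INSERT INTO flag_values (flag_value)
SELECT DISTINCT flag_value
FROM main_table
WHERE flag_value IS NOT NULL;

-- From here on, the main table can only contain values present in the lookup table.
ALTER TABLE main_table
    ADD CONSTRAINT main_table_flag_value_fk
    FOREIGN KEY (flag_value) REFERENCES flag_values (flag_value);

-- Concise list of permitted values, without SELECT DISTINCT on the big table.
SELECT flag_value FROM flag_values ORDER BY flag_value;
```

With a natural key like this, changing a value in the lookup table does not by itself change the rows in the main table; that benefit relies on the main table storing a surrogate key instead of the text.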
If you want normalization, break it out.
I think of these types of data in DBs as the equivalent of enums in C, C++, and C#. Mostly you put them in the table as documentation.
I often have an ID, Name, Description, and auditing columns for them (e.g. modified by, modified date, create date, create by, active). The description field is rarely used.
Example (some might say there are more than just 2):
Gender
ID  Name    Audit Columns...
1   Male
2   Female
Then in your contacts you would have a GenderID column which would link to this one.
Of course you don't "need" the table. You could have external documentation somewhere that says 1=Male, 2=Female -- but I think these tables serve to document a system.
If it's really a free-entry text field that's not re-used somewhere else in the database, and there's just a single field without repeated instances, I'd probably go ahead and leave it as it is. If you're determined to break it out I'd create a 'validation' table with a surrogate key and the text value, then put the surrogate key in the base table.
Share and enjoy.
Are these 10-15 values actually meaningful, or are they really just flags? If they're meaningful pieces of text and it seems wasteful to replicate them, then sure create a lookup table. But if they're just arbitrary flag values, then your new table will be nothing more than a mapping from one arbitrary value to another, and not terribly helpful.
A completely separate question is whether all or most of the rows in your big table even have a value for this column. If not, then indeed you have a good opportunity for normalization and can create a separate table linking the primary key from your base table with the flag value.
Edit: One thing. If there's some chance that one of these "flag" values is likely to be wholesale replaced with another value at some point in the future, that would be another good reason to create a table.
Say I'm mapping a simple object to a table that contains duplicate records and I want to allow duplicates in my code. I don't need to update/insert/delete on this table, only display the records.
Is there a way that I can put a fake (generated) ID column in my mapping file to trick NHibernate into thinking the rows are unique? Creating a composite key won't work because there could be duplicates across all of the columns.
If this isn't possible, what is the best way to get around this issue?
Thanks!
Edit: Query seemed to be the way to go
The NHibernate mapping makes the assumption that you're going to want to save changes, hence the requirement for an ID of some kind.
If you're allowed to modify the table, you could add an identity column (SQL Server naming - your database may differ) to autogenerate unique Ids - existing code should be unaffected.
If you're allowed to add to the database, but not to the table, you could try defining a view that includes a RowNumber synthetic (calculated) column, and using that as the data source to load from. Depending on your database vendor (and the product's handling of views and indexes), this may face some performance issues.
The other alternative, which I've not tried, would be to map your class to a SQL query instead of a table. IIRC, NHibernate supports having named SQL queries in the mapping file, and you can use those as the "data source" instead of a table or view.
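The view-with-row-number option could be sketched as follows, assuming a hypothetical table duplicate_rows with columns col_a and col_b and a database that supports the ROW_NUMBER() window function:

```sql
-- Hypothetical names throughout; "id" is generated at query time.
CREATE VIEW duplicate_rows_with_id AS
SELECT ROW_NUMBER() OVER (ORDER BY col_a, col_b) AS id,
       col_a,
       col_b
FROM duplicate_rows;
```

The generated id is unique within a single result set, but because fully duplicated rows tie in the ordering, a given physical row is not guaranteed to get the same id from one query to the next. That is acceptable only for read-only display, which is the case described in the question.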
If your data is read-only, one simple way we found was to wrap the query in a view, build the entity off the view, and add a generated GUID column (NEWID() on SQL Server; other databases have their own equivalent). The result is something like
SELECT NEWID() AS ID, * FROM TABLE
ID then becomes your unique primary key. As stated above, this is only useful for read-only views, as the ID has no relevance after the query.