Composite Foreign Keys in Postgres - sql

I currently am working with a database for a social media application. The database is running on postgresql, and I have ran into a logic issue.
I have two different types of content that can be submitted, topics, and posts. Each of these with their own primary key.
All of these have items can have some media attached to them. In my media table I have the columns content_type_id and content_id where content type is a key in a look up table with the different types of content, and the content_id is the primary key in the table where that particular piece of content is stored.
The issue I have ran into is that I cannot create a foreign key on content_id because depending on content_type it could be referring to one of two tables. Is there a way that I can set up a foreign key to look at the proper table depending on the value of the content_type_id column?

I'm not sure to understand your question, but you have a design problem. If I've interpreted that right, maybe you need a design like this:
but I can't know that if you don't provide your current design.
On this design:
CONTENT_TYPE can be a POST or TOPIC.
MEDIA can have 1 CONTENT_TYPE (POST or TOPIC).
CONTENT_TYPE can be related to N MEDIA.

The issue was resolved. Rather than having each table have it's own sequence for the primary key, a single sequence is used across all the tables, with the entity type lookup table becoming the entity map table mapping the now global id to the type of entity it is (post, topic ect). So that there will no longer be a need for a secondary table to differentiate if the primary key is a post or topic.
For example before when a post is created it is made with a sequental id as a primary key (1,2,3,4...) and when a topic is created the same thing happens sequential keys (1,2,3,4,...).
When media would be stored in the media_table, the media_table would have the issue of duplicate entity keys (both post with id 1 and topic with id 1 have a picture). This was originally designed to be resolved by having an additional column in the media table to differentiate between it being a post or topic.
By having the same sequence used for both posts and topics, they no longer will share any primary keys, so entity type is no longer needed to differentiate between the two, and the primary key in both topic and post will act as a super key pointing to the media table's entity id.

Related

Postgres Foreign Keys on a generalizing link table

I'm doing some work with a system that has many types of entities that may or may not have access to many types of resources. Setting up these tables I have a structure where I set up an Entity_Sequence and a Resource Sequence, and then create a link table for each different entity and each different Resource to associate them with their respective sequence.
For example, I have a Users_Entities_Link table with the columns user_id referencing User, and entity_id, which is a bigint default nextval(Entity_Sequence).
To top this structure off is an Entities_Resources_Access of (entity_id, resource_id) to denote whether the entity has access to the resource. However, given that each of these entities could be related to any one of the entity link tables and the same for resources and each of the resource tables, I'm trying to figure out what the best way to handle the relationship is. This seems like a fairly rare problem, so I couldn't find help elsewhere on it.
The best that I could determine myself was to run an after deletion trigger on each of the Entity or Resource link tables that would check if the entity or resource exists in the access table, but that's a lot of debt to handle when adding in new potential entities or resources.
Is there a better solution to either the structural problem of how to deal with this many entities accessing many resources issue, or how to handle the sequence relationship better? Do I need to add in a dummy entity table and a dummy resource table that each only have an ID for the link and access tables to link foreign keys to? That seems like a lot of wasted space if I have a large quantity of any given entity or resource, and also something that I would have to manually unlink if it floated without anything referencing it on deletion of a row in an associated table (like a user)
Here's how the setup is currently designed:
Table some_entity_entity_link
some_entity_id FK refs some_entity(id)
entity_id not null default nextval(entity_sequence)
Table another_entity_entity_link
another_entity_id FK refs another_entity(id)
entity_id not null default nextval(entity_sequence)
Table some_resource_resource_link
some_resource_id FK refs some_resource(id)
resource_id not null default nextval(resource_sequence)
Table another_resource_resource_link
another_resource_id FK refs another_resource(id)
resource_id not null default nextval(resource_sequence)
Table entities_resources_access
entity_id
resource_id

Oracle SQL optional primary key

I'm having a problem in an oracle SQL project I'm working on:
I have a (weak) entity 'Ticket' that has as primary key: (Customer_id,packet_id,project_id,ticket_id). Customer_id, packet_id and project_id are also foreign keys.
But, a ticket can only be for a packet OR a project, and because they are both in the primary key, they can't be null, while one of them actually will always be null. But I need both because a ticket always belongs to a one of them.
I thought of a possible solution and I thought I could make 1 ID for all products, but then there is another problem, because if I want to implement this product_id as a foreign key, I don't know to which entity I should let it reference to.
Is there a way to make 'optional primary keys' or work with if-statements when creating tables? Or a way to make an optional reference? I've tried if-statements and a case but it didn't work.
A lot will depend on exactly what these entities represent. Since a ticket can be for either a packet or a project, that implies that there is some higher-level entity that combines both packets and projects. That could either be modeled as a new table that is the parent to both PACKET and PROJECT or it could be modeled by combining the PACKET and PROJECT table and adding a TYPE column that differentiates between packets and projects. The TICKET table would then reference either the new parent table or the combined table.
Taking a step back, though, if the TICKET entity has a TICKET_ID, it seems likely that the TICKET_ID alone should be the primary key. It seems unusual that you would have a composite primary key if you have a TICKET_ID column-- do you really want to allow multiple rows in the TICKET table with the same TICKET_ID that are related to different customers, packets, and projects? If TICKET_ID alone is the primary key, then both PACKET_ID and PROJECT_ID can be nullable foreign keys and you can create a CHECK constraint that ensures that exactly one of the two is non-NULL.

Relational database theory and keys

I'm designing a schema to hold player data within a browser based game.
I have three relations. Two of them have at least two candidate keys, however the third has only three attributes: {playerId, message, date}
This relation will hold no unique rows as there is a 1..1:0..* relationship, meaning there can be any number of news tuples for each player. I don't need to be able to uniquely identify any tuple and none of the attributes can actually be a candidate, anyway.
My question is: I understand the relational model states there cannot be duplicate tuples and each relation must have a key. My schema above contradicts both of those constraints but works for my purpose. I know I could simply add an index attribute (like an ID) that is unique, but that seems unnecessary. Am I missing something?
Thanks for your time.
I think what you are missing is a composite primary key.
In your case if you are save to get no dublicate entries you want to use a composite primary key.
But think about the same player sends the same message at the same date....
In this case you will have a conflict with a composite primary key.
A virtual unique id as primary key is a saver way.
Tricky question ! I don't have a clear answer, but i think you may run into trouble if you don't have at least a unicity constraint on the whole tuple : imagine some app runs amok and tries to insert 1.000.000.000 times the same tuple in your table...

SQL: Do you need an auto-incremental primary key for Many-Many tables?

Say you have a Many-Many table between Artists and Fans. When it comes to designing the table, do you design the table like such:
ArtistFans
ArtistFanID (PK)
ArtistID (FK)
UserID (FK)
(ArtistID and UserID will then be contrained with a Unique Constraint
to prevent duplicate data)
Or do you build use a compound PK for the two relevant fields:
ArtistFans
ArtistID (PK)
UserID (PK)
(The need for the separate unique constraint is removed because of the
compound PK)
Are there are any advantages (maybe indexing?) for using the former schema?
ArtistFans
ArtistID (PK)
UserID (PK)
The use of an auto incremental PK has no advantages here, even if the parent tables have them.
I'd also create a "reverse PK" index automatically on (UserID, ArtistID) too: you will need it because you'll query the table by both columns.
Autonumber/ID columns have their place. You'd choose them to improve certain things after the normalisation process based on the physical platform. But not for link tables: if your braindead ORM insists, then change ORMs...
Edit, Oct 2012
It's important to note that you'd still need unique (UserID, ArtistID) and (ArtistID, UserID) indexes. Adding an auto increments just uses more space (in memory, not just on disk) that shouldn't be used
Assuming that you're already a devotee of the surrogate key (you're in good company), there's a case to be made for going all the way.
A key point that is sometimes forgotten is that relationships themselves can have properties. Often it's not enough to state that two things are related; you might have to describe the nature of that relationship. In other words, there's nothing special about a relationship table that says it can only have two columns.
If there's nothing special about these tables, why not treat it like every other table and use a surrogate key? If you do end up having to add properties to the table, you'll thank your lucky presentation layers that you don't have to pass around a compound key just to modify those properties.
I wouldn't even call this a rule of thumb, more of a something-to-consider. In my experience, some slim majority of relationships end up carrying around additional data, essentially becoming entities in themselves, worthy of a surrogate key.
The rub is that adding these keys after the fact can be a pain. Whether the cost of the additional column and index is worth the value of preempting this headache, that really depends on the project.
As for me, once bitten, twice shy – I go for the surrogate key out of the gate.
Even if you create an identity column, it doesn't have to be the primary key.
ArtistFans
ArtistFanId
ArtistId (PK)
UserId (PK)
Identity columns can be useful to relate this relation to other relations. For example, if there was a creator table which specified the person who created the artist-user relation, it could have a foreign key on ArtistFanId, instead of the composite ArtistId+UserId primary key.
Also, identity columns are required (or greatly improve the operation of) certain ORM packages.
I cannot think of any reason to use the first form you list. The compound primary key is fine, and having a separate, artificial primary key (along with the unique contraint you need on the foreign keys) will just take more time to compute and space to store.
The standard way is to use the composite primary key. Adding in a separate autoincrement key is just creating a substitute that is already there using what you have. Proper database normalization patterns would look down on using the autoincrement.
Funny how all answers favor variant 2, so I have to dissent and argue for variant 1 ;)
To answer the question in the title: no, you don't need it. But...
Having an auto-incremental or identity column in every table simplifies your data model so that you know that each of your tables always has a single PK column.
As a consequence, every relation (foreign key) from one table to another always consists of a single column for each table.
Further, if you happen to write some application framework for forms, lists, reports, logging etc you only have to deal with tables with a single PK column, which simplifies the complexity of your framework.
Also, an additional id PK column does not cost you very much in terms of disk space (except for billion-record-plus tables).
Of course, I need to mention one downside: in a grandparent-parent-child relation, child will lose its grandparent information and require a JOIN to retrieve it.
In my opinion, in pure SQL id column is not necessary and should not be used. But for ORM frameworks such as Hibernate, managing many-to-many relations is not simple with compound keys etc., especially if join table have extra columns.
So if I am going to use a ORM framework on the db, I prefer putting an auto-increment id column to that table and a unique constraint to the referencing columns together. And of course, not-null constraint if it is required.
Then I treat the table just like any other table in my project.

How to design a database schema to support tagging with categories?

I am trying to so something like Database Design for Tagging, except each of my tags are grouped into categories.
For example, let's say I have a database about vehicles. Let's say we actually don't know very much about vehicles, so we can't specify the columns all vehicles will have. Therefore we shall "tag" vehicles with information.
1. manufacture: Mercedes
model: SLK32 AMG
convertible: hardtop
2. manufacture: Ford
model: GT90
production phase: prototype
3. manufacture: Mazda
model: MX-5
convertible: softtop
Now as you can see all cars are tagged with their manufacture and model, but the other categories don't all match. Note that a car can only have one of each category. IE. A car can only have one manufacturer.
I want to design a database to support a search for all Mercedes, or to be able to list all manufactures.
My current design is something like this:
vehicles
int vid
String vin
vehicleTags
int vid
int tid
tags
int tid
String tag
int cid
categories
int cid
String category
I have all the right primary and foreign keys in place, except I can't handle the case where each car can only have one manufacturer. Or can I?
Can I add a foreign key constraint to the composite primary key in vehicleTags? IE. Could I add a constraint such that the composite primary key (vid, tid) can only be added to vehicleTags only if there isn't already a row in vehicleTags such that for the same vid, there isn't already a tid in the with the same cid?
My guess is no. I think the solution to this problem is add a cid column to vehicleTags, and make the new composite primary key (vid, cid). It would look like:
vehicleTags
int vid
int cid
int tid
This would prevent a car from having two manufacturers, but now I have duplicated the information that tid is in cid.
What should my schema be?
Tom noticed this problem in my database schema in my previous question, How do you do many to many table outer joins?
EDIT
I know that in the example manufacture should really be a column in the vehicle table, but let's say you can't do that. The example is just an example.
This is yet another variation on the Entity-Attribute-Value design.
A more recognizable EAV table looks like the following:
CREATE TABLE vehicleEAV (
vid INTEGER,
attr_name VARCHAR(20),
attr_value VARCHAR(100),
PRIMARY KEY (vid, attr_name),
FOREIGN KEY (vid) REFERENCES vehicles (vid)
);
Some people force attr_name to reference a lookup table of predefined attribute names, to limit the chaos.
What you've done is simply spread an EAV table over three tables, but without improving the order of your metadata:
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER,
tid INTEGER,
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid),
FOREIGN KEY (tid) REFERENCES tags(tid)
);
CREATE TABLE categories (
cid INTEGER PRIMARY KEY,
category VARCHAR(20) -- "attr_name"
);
CREATE TABLE tags (
tid INTEGER PRIMARY KEY,
tag VARCHAR(100) -- "attr_value"
);
If you're going to use the EAV design, you only need the vehicleTags and categories tables.
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER, -- reference to "attr_name" lookup table
tag VARCHAR(100, -- "attr_value"
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid)
);
But keep in mind that you're mixing data with metadata. You lose the ability to apply certain constraints to your data model.
How can you make one of the categories mandatory (a conventional column uses a NOT NULL constraint)?
How can you use SQL data types to validate some of your tag values? You can't, because you're using a long string for every tag value. Is this string long enough for every tag you'll need in the future? You can't tell.
How can you constrain some of your tags to a set of permitted values (a conventional table uses a foreign key to a lookup table)? This is your "softtop" vs. "soft top" example. But you can't make a constraint on the tag column because that constraint would apply to all other tag values for other categories. You'd effectively restrict engine size and paint color to "soft top" as well.
SQL databases don't work well with this model. It's extremely difficult to get right, and querying it becomes very complex. If you do continue to use SQL, you will be better off modeling the tables conventionally, with one column per attribute. If you have need to have "subtypes" then define a subordinate table per subtype (Class-Table Inheritance), or else use Single-Table Inheritance. If you have an unlimited variation in the attributes per entity, then use Serialized LOB.
Another technology that is designed for these kinds of fluid, non-relational data models is a Semantic Database, storing data in RDF and queried with SPARQL. One free solution is RDF4J (formerly Sesame).
I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.
What you describe are not tags, tags are only values, they do not have an associated key.
Tags are normally implemented as a string column, the value being a list of values delimited.
For example #1, a tag field would contain a value such as:
"manufacture_Mercedes,model_SLK32 AMG,convertible_hardtop"
The user then would normally be able to easily filter entries, by the existence of one or more tags. It is essentially schemaless data from a database perspective. There are downsides to tags, but they also avoid the extreme complications that come from using an EAV model. If you really need an EAV model, it also might be worth considering an attributes field, which contains JSON data. It's more painful to query, but still not as horrible as querying EAV across multiple tables.
I think your solution is to simply add a manufacturer column to your vehicles table. It's an attribute that you know all the vehicles will have (i.e. cars don't spontaneously appear by themselves) and by making it a column in your vehicle table you solve the issue of having one and only one manufacturer for each vehicle. This approach would apply to any attributes that you know will be shared by all vehicles. You can then implement the tagging system for the other attributes that aren't universal.
So taking from your example the vehicle table would be something like:
vehicle
vid
vin
make
model
One way would be to slightly rethink your schema, normalising tag keys away from values:
vehicles
int vid
string vin
tags
int tid
int cid
string key
categories
int cid
string category
vehicleTags
int vid
int tid
string value
Now all you need is a unique constraint on vehicleTags(vid, tid).
Alternatively, there are ways to create constraints beyond simple foreign keys: depending on your database, can you write a custom constraint or an insert/update trigger to enforce vehicle-tag uniqueness?
I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.
These days, I almost never end up building a database-backed web app that doesn't involve a full-text indexer. This problem and the general issue of search just come up way too often to omit indexers from your toolbox.