Just a really quick question that I can't seem to find an answer for. I'm making these tables, and I have been told that every table needs to have some form of random unique ID, that is separate to the PK. My question is if I can relate two tables with UID/FK in the same way you would PK/FK.
Is this bad practice? What are the advantages/disadvantages?
Your "mentor" is right in a sense. In migrations and in BI star-schema DWHs, it's best to come up with a new UID.
The reason for this: when joining similar tables together, their primary keys may collide with each other or come in different formats.
Though, as others have said, it isn't necessary. Just best practice when joining data in a BI environment.
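To the original question of whether UID/FK can work like PK/FK: in most engines a foreign key may reference any column that carries a UNIQUE constraint, not only the primary key. Here is a minimal sketch using Python's sqlite3 module; the customer and invoice tables and their column names are made up for illustration.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
CREATE TABLE customer (
    customer_id  INTEGER PRIMARY KEY,   -- the ordinary primary key
    customer_uid TEXT NOT NULL UNIQUE   -- the separate random unique ID
);
CREATE TABLE invoice (
    invoice_id   INTEGER PRIMARY KEY,
    -- foreign key that points at the UNIQUE column instead of the PK
    customer_uid TEXT NOT NULL REFERENCES customer (customer_uid)
);
""")

uid = str(uuid.uuid4())
conn.execute("INSERT INTO customer (customer_uid) VALUES (?)", (uid,))
conn.execute("INSERT INTO invoice (customer_uid) VALUES (?)", (uid,))


def orphan_invoice_accepted():
    """Try to insert an invoice whose UID matches no customer."""
    try:
        conn.execute("INSERT INTO invoice (customer_uid) VALUES (?)", ("no-such-uid",))
    except sqlite3.IntegrityError:
        return False
    return True


orphan_accepted = orphan_invoice_accepted()
invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
```

The trade-off is that a random text UID is usually wider than an integer key, so joins and indexes on it tend to cost a bit more.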
Hope that helps.
Let's say I have two tables, called Person and Couple, where each Couple record stores a pair of Person ids (also assume that each person is bound to at most one other person).
I am planning to support a lot of queries where I will ask for Person records that are not married yet. Do you guys think it's worthwhile to add a 'partnerId' field to Person? (It would be set to null if that person is not married yet)
I am hesitant to do this because the partnerId field is something that is computable - just go through the Couple table to find out. The performance cost of creating a new couple will also increase because I have to do this extra bookkeeping.
I hope that it doesn't sound like I am asking two different questions here, but I felt that this is relevant. Is it a good/common idea to include extra fields that are redundant (computable/inferable by joining with other tables), but will make your query a lot easier to write and faster?
Thanks!
A better option is to keep the data normalized, and utilize a view (indexed, if supported by your rdbms). This gets you the convenience of dealing with all the relevant fields in one place, without denormalizing your data.
Note: Even if a database doesn't support indexed views, you'll likely still be better off with a view as the indexes on the underlying tables can be utilized.
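As a rough sketch of the view approach (SQLite via Python here; the column names Person1Id/Person2Id and the view name PersonWithPartner are assumptions, since the question doesn't give them), the partner id is computed by the view rather than stored:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (
    PersonId INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL
);
CREATE TABLE Couple (
    -- UNIQUE on each side: a person appears in at most one couple per column
    Person1Id INTEGER NOT NULL UNIQUE REFERENCES Person (PersonId),
    Person2Id INTEGER NOT NULL UNIQUE REFERENCES Person (PersonId)
);
-- The view derives PartnerId, so the base tables stay normalized
CREATE VIEW PersonWithPartner AS
SELECT p.PersonId,
       p.Name,
       CASE WHEN c.Person1Id = p.PersonId THEN c.Person2Id
            ELSE c.Person1Id
       END AS PartnerId
FROM Person p
LEFT JOIN Couple c ON p.PersonId IN (c.Person1Id, c.Person2Id);

INSERT INTO Person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Couple VALUES (1, 2);
""")

# People who are not married yet: no Couple row matched, so PartnerId is NULL
unmarried = [row[0] for row in conn.execute(
    "SELECT Name FROM PersonWithPartner WHERE PartnerId IS NULL ORDER BY PersonId")]

bobs_partner = conn.execute(
    "SELECT PartnerId FROM PersonWithPartner WHERE PersonId = 2").fetchone()[0]
```

SQLite has no indexed views, so this is the plain-view case from the note above; the UNIQUE indexes on Couple are what the join can use.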
Is there always a zero to one relationship between Person and Couples? i.e. a person can have zero or one partner? If so then your Couple table is actually redundant, and your new field is a better approach.
The only reason to split Couple off to another table is if one Person can have many partners.
When someone gets a partner you either write one record to the Couple table or update one record in the Person table. I argue that your Couple table is redundant here. You haven't indicated that there is any extra info on the Couple record besides the link, and it appears that there is only ever zero or one Couple record for every Person record.
How about one table?
-- A self-referencing foreign key: PartnerId points back at Person.
-- PartnerId stays NULL for anyone without a partner.
CREATE TABLE Person
(
    PersonId int NOT NULL PRIMARY KEY,
    PartnerId int NULL REFERENCES Person (PersonId)
);
With this,
Everyone on the system has a row and a PersonId
If you have a partner, they are listed in the PartnerId column
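A quick way to try the one-table idea is something like the following, using SQLite from Python. The marry helper and the UNIQUE constraint on PartnerId are my own additions for the sketch, not part of the design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
CREATE TABLE Person (
    PersonId  INTEGER PRIMARY KEY,
    -- assumption: UNIQUE so nobody can be listed as partner of two people
    PartnerId INTEGER UNIQUE REFERENCES Person (PersonId)
)""")
conn.executemany("INSERT INTO Person (PersonId) VALUES (?)", [(1,), (2,), (3,)])


def marry(conn, a, b):
    """Hypothetical helper: link two people in both directions, atomically."""
    with conn:  # commits on success, rolls back if either update fails
        conn.execute("UPDATE Person SET PartnerId = ? WHERE PersonId = ?", (b, a))
        conn.execute("UPDATE Person SET PartnerId = ? WHERE PersonId = ?", (a, b))


marry(conn, 1, 2)

# The "not married yet" query needs no join at all
unmarried = [row[0] for row in conn.execute(
    "SELECT PersonId FROM Person WHERE PartnerId IS NULL ORDER BY PersonId")]
```

Note that each marriage touches two rows, so the two updates should go in one transaction to keep both sides consistent.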
Unnormalized data is always bad. Denormalized data, now, that can be beneficial under very specific circumstances. The best advice I ever heard on this subject is to first fully normalize your data, assess performance/goals/objectives, and then carefully denormalize only if it's demonstrably worth the extra overhead.
I agree with Nick. Also consider the need for history of the couples. You could use row versioning in the same table, but this doesn't work very well for application databases; it works best in a DW scenario. A history table in theory would duplicate all the data in the table, not just the relationship. A secondary table would give you the flexibility to add additional information about the relationship, including StartDate and EndDate.
Good Morning,
in the design of a database, I have a table (let's call it TabA) that can be related to four other tables. It may be linked to the first, the second, the third and the fourth, but it may also have no links at all, or one link (to any of the tables), or two links (to any two of them), and so on.
In TabA I added four fields that refer to the four tables; each of them can be null when there is no link.
Is this design (the four fields in TabA) optimal, or is there a better design for this type of situation?
Many thanks for your reply.
dave
In answer to the question and clarification in your comment, the answer is that your design can't be improved in terms of the number of foreign key columns. Having a specific foreign key column for every potential foreign key relationship is a best practice design.
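For illustration, this is roughly what one nullable foreign key column per relationship looks like in SQLite (run from Python). The table names T1..T4 and the column names are placeholders, not the ones from your diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE T1 (id INTEGER PRIMARY KEY);
CREATE TABLE T2 (id INTEGER PRIMARY KEY);
CREATE TABLE T3 (id INTEGER PRIMARY KEY);
CREATE TABLE T4 (id INTEGER PRIMARY KEY);
CREATE TABLE TabA (
    id    INTEGER PRIMARY KEY,
    -- each column is optional (NULL = no link) but checked when filled in
    t1_id INTEGER REFERENCES T1 (id),
    t2_id INTEGER REFERENCES T2 (id),
    t3_id INTEGER REFERENCES T3 (id),
    t4_id INTEGER REFERENCES T4 (id)
);
INSERT INTO T1 (id) VALUES (10);
INSERT INTO T3 (id) VALUES (30);
""")

conn.execute("INSERT INTO TabA (id) VALUES (1)")                         # no links
conn.execute("INSERT INTO TabA (id, t1_id, t3_id) VALUES (2, 10, 30)")   # two links


def bad_link_rejected():
    """A link to a T2 row that does not exist should fail the FK check."""
    try:
        conn.execute("INSERT INTO TabA (id, t2_id) VALUES (3, 99)")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = bad_link_rejected()
row_count = conn.execute("SELECT COUNT(*) FROM TabA").fetchone()[0]
```

A NULL in a foreign key column is simply not checked, which is what lets any combination of zero to four links coexist in one row.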
However, the schema design itself seems questionable. I don't have enough information to tell whether the "Distributori_[N]_Livello" tables are a truly hierarchical structure or not. If it is, it is often possible to use a self-referential table for hierarchical structures rather than a set of N tables, as the diagram you linked seems to use. If you are able to refactor your design in such a way, it might be possible to reduce the number of foreign key columns required.
Whether this is possible or not is not for me to say given the data provided.
Relationship tables mostly contain two columns: IDTABLE1, and IDTABLE2.
Only thing that seems to change between relationship tables is the names of those two columns, and table name.
Would it be better if we create one table Relationships and in this table we place 3 columns:
TABLE_NAME, IDTABLE1, IDTABLE2, and then use this table for all relationships?
Is this a good/acceptable solution in web/desktop application development? What would be the downside of this?
Note:
Thank you all for feedback. I appreciate it.
But, I think you are taking it a bit too far... Every solution works until one point.
As data storage, a simple text file is good up to a certain point, then Excel is better, then MS Access, then SQL Server, then...
To be honest, I haven't seen any argument that states why this solution is bad for small projects (with a DB size of a few GB).
It would be a monster of a table; it would also be cumbersome. Performance-wise, such a table would not be a great idea. Also, foreign keys are impossible to add to such a table. I really can't see a lot of advantages to such a solution.
Bad idea.
How would you enforce the foreign keys if IDTABLE1 could contain ids from any table at all?
To achieve acceptable performance on joins without a load of unnecessary IO to bring in completely unrelated rows you would need a composite index with leading column TABLE_NAME that basically ends up partitioning the table into sections anyway.
Obviously even with this pseudo partitioning going on you would still be wasting a lot of space in the table/indexes just repeating the table name for each row.
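The foreign key point is easy to demonstrate. A small sketch with SQLite from Python follows; the Student/Course tables are just an example pair, and the generic table uses the column names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (id INTEGER PRIMARY KEY);
CREATE TABLE Course  (id INTEGER PRIMARY KEY);

-- Dedicated relationship table: both columns are real foreign keys
CREATE TABLE StudentCourse (
    student_id INTEGER NOT NULL REFERENCES Student (id),
    course_id  INTEGER NOT NULL REFERENCES Course (id),
    PRIMARY KEY (student_id, course_id)
);

-- Generic table: the ids may belong to any table, so no FK can be declared
CREATE TABLE Relationships (
    TABLE_NAME TEXT    NOT NULL,
    IDTABLE1   INTEGER NOT NULL,
    IDTABLE2   INTEGER NOT NULL,
    PRIMARY KEY (TABLE_NAME, IDTABLE1, IDTABLE2)  -- TABLE_NAME leads the index
);

INSERT INTO Student VALUES (1);
INSERT INTO Course  VALUES (100);
""")


def accepts(sql, params):
    """Return True if the insert goes through, False on a constraint error."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return False
    return True


# Course 999 does not exist in either case
dedicated_accepts_orphan = accepts(
    "INSERT INTO StudentCourse VALUES (?, ?)", (1, 999))
generic_accepts_orphan = accepts(
    "INSERT INTO Relationships VALUES (?, ?, ?)", ("StudentCourse", 1, 999))
```

The dedicated table refuses the dangling row; the generic one stores it without complaint, so integrity checks would have to move into application code or triggers.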
Isn't it a big IF that you're only going to store the two ID fields? If I have a StudentCourse (or better yet, Enrollment) table with StudentID and CourseID, wouldn't EnrollmentDate go in this table as well, since not all students enroll on the first day of class? It seems like a bad idea to add this column to an already bloated table where it will be null for most records.
The benefit of a single table could arise when the application must let a user or admin create these relationships as data (similar to having a single lookup or reference-list table), which avoids creating a new table for each user-created relationship. A need for dynamic querying may benefit as well. An application with such dynamic data-structure requirements might be better suited to a schemaless or NoSQL database.
I have a table that must reference another record, but of the same table. Here's an example:
Customer
********
ID
ManagerID (the ID of another customer)
...
I have a bad feeling about doing this. My other idea was to just have a separate table that just stored the relationship.
CustomerRelationship
***************
ID
CustomerID
ManagerID
I feel I may be overcomplicating such a trivial thing; however, I would like to get some ideas on the best approach for this particular scenario.
Thanks.
There's nothing wrong with the first design. The second one, where you have an 'intermediate' table, is used for many-to-many relationships, which I don't think is your case.
BTW, that intermediate table wouldn't need an ID of its own.
Why do you have a "bad feeling" about this? It's perfectly acceptable for a table to reference its own primary key. Introducing a secondary table only increases the complexity of your queries and negatively impacts performance.
Can a Customer have multiple managers? If so, then you need a separate table.
Otherwise, a single table is fine.
You can use the first approach. See also Using Self-Joins
There's absolutely nothing wrong with the first approach, in fact Oracle has included the 'CONNECT BY' extension to SQL since at least version 6 which is intended to directly support this type of hierarchical structure (and possibly makes Oracle worth considering as your database if you are going to be doing a lot of this).
You'll need self-joins in databases which don't have something analogous, but that's also a perfectly fine and standard solution.
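To make that concrete, here is a sketch of the single Customer table queried both ways in SQLite from Python: a plain self-join for the direct manager, and a recursive common table expression (standard SQL's rough counterpart to CONNECT BY) for the whole chain. The Name column and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (
    ID        INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    ManagerID INTEGER REFERENCES Customer (ID)   -- the ID of another customer
);
INSERT INTO Customer VALUES (1, 'Root', NULL), (2, 'Mid', 1), (3, 'Leaf', 2);
""")

# Self-join: each customer next to the name of their direct manager
pairs = conn.execute("""
    SELECT c.Name, m.Name
    FROM Customer c
    LEFT JOIN Customer m ON m.ID = c.ManagerID
    ORDER BY c.ID
""").fetchall()

# Recursive CTE: walk upwards from one customer to the top of the hierarchy
manager_chain = [row[0] for row in conn.execute("""
    WITH RECURSIVE chain (ID, Name, ManagerID, depth) AS (
        SELECT ID, Name, ManagerID, 0 FROM Customer WHERE ID = ?
        UNION ALL
        SELECT m.ID, m.Name, m.ManagerID, chain.depth + 1
        FROM Customer m
        JOIN chain ON m.ID = chain.ManagerID
    )
    SELECT Name FROM chain ORDER BY depth
""", (3,))]
```

The self-join covers one level; the recursive form is only needed when you want to follow the chain to arbitrary depth.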
As a programmer I like the first approach. I like to have fewer tables. We are not even talking about normalization here, so why would we need more tables? That is just me.
Follow the KISS principle here: Keep it simple, (silly | stupid | stud | [whatever epithet starting with S you prefer]). Go with one table, unless you have a reason to need more.
Note that if the one-to-many/many-to-many relationship ends up being the case, you can extract the existing column into a table of its own, and fill in the new entries at that time.
The only reason I would ever recommend avoiding such self-referencing tables is that SQL Server has a few spots where self-referencing tables run into limitations.
For one, if you ever happen to come across the need for an indexed view, then you'd find out that if one of the tables used in a view definition is indeed self-referencing, you won't be able to create a clustered index on your view :-(
But apart from that - the design per se is sound and absolutely valid - go for it! I always like to keep things as simple as possible (but no simpler than that).
Marc
I have a site like SO, WordPress, etc., where you make a post and you can have (optional) tags against it.
What is a common database schema to handle this? I'm assuming it's a many<->many structure, with three tables.
Anyone have any ideas?
A three table many to many structure should be fine.
Eg. Posts, PostsToTags(post_id,tag_id), Tags
The key is indexing. Make sure your PostsToTags table is indexed both ways (post_id,tag_id and tag_id,post_id). Also, if read performance is ultra critical, you could introduce an indexed view (which could give you post_name, tag_name).
You will of course need indexes on Posts and Tags as well.
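A minimal version of that schema, sketched in SQLite from Python. The index name, the helper functions and the sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Posts (post_id INTEGER PRIMARY KEY, post_name TEXT NOT NULL);
CREATE TABLE Tags  (tag_id  INTEGER PRIMARY KEY, tag_name  TEXT NOT NULL UNIQUE);
CREATE TABLE PostsToTags (
    post_id INTEGER NOT NULL REFERENCES Posts (post_id),
    tag_id  INTEGER NOT NULL REFERENCES Tags (tag_id),
    PRIMARY KEY (post_id, tag_id)          -- gives the (post_id, tag_id) index
);
-- the reverse index, for "all posts with this tag" lookups
CREATE INDEX idx_tag_post ON PostsToTags (tag_id, post_id);

INSERT INTO Posts VALUES (1, 'Intro to SQL'), (2, 'Indexing tips'), (3, 'Untagged post');
INSERT INTO Tags  VALUES (1, 'sql'), (2, 'performance');
INSERT INTO PostsToTags VALUES (1, 1), (2, 1), (2, 2);
""")


def posts_for_tag(tag_name):
    """Names of all posts carrying the given tag, ordered by post_id."""
    return [row[0] for row in conn.execute("""
        SELECT p.post_name
        FROM Tags t
        JOIN PostsToTags pt ON pt.tag_id = t.tag_id
        JOIN Posts p ON p.post_id = pt.post_id
        WHERE t.tag_name = ?
        ORDER BY p.post_id
    """, (tag_name,))]


def tags_for_post(post_id):
    """Names of all tags on the given post, alphabetically."""
    return [row[0] for row in conn.execute("""
        SELECT t.tag_name
        FROM PostsToTags pt
        JOIN Tags t ON t.tag_id = pt.tag_id
        WHERE pt.post_id = ?
        ORDER BY t.tag_name
    """, (post_id,))]
```

Tags stay optional because a post with no tags simply has no rows in PostsToTags, as tags_for_post(3) shows by returning an empty list.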
"I'm assuming it's a many<->many structure, with three tables. Anyone have any ideas?"
More to the point, there aren't any serious alternatives, are there? Two relational tables in a many-to-many relationship require at least an association table to carry all the combination of foreign keys.
Does SO do this? Who knows. Their data model includes reference counts, and -- for all anyone knows -- date time stamps and original creator and a lot of other junk about the tag.
Minimally, there have to be three tables.
What they do on SO is hard to know.
I'm not entirely sure if this is what SO uses. But there is a good discussion here.
It would be a good idea to look at how WordPress handles tags for posts; it will give you some ideas.
The other possibility of course is that there are only two tables.
Given there are at most 5 tags, a Question table with five nullable foreign-key references to a Tag table is a possibility.
Not very normalized, but it could be more performant.
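For comparison, the two-table variant might look like this in SQLite (via Python). The tag1_id..tag5_id column names and the sample data are assumptions for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Tag (tag_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE Question (
    question_id INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    -- five optional tag slots instead of an association table
    tag1_id INTEGER REFERENCES Tag (tag_id),
    tag2_id INTEGER REFERENCES Tag (tag_id),
    tag3_id INTEGER REFERENCES Tag (tag_id),
    tag4_id INTEGER REFERENCES Tag (tag_id),
    tag5_id INTEGER REFERENCES Tag (tag_id)
);
INSERT INTO Tag VALUES (1, 'sql'), (2, 'schema');
INSERT INTO Question (question_id, title, tag1_id, tag2_id) VALUES
    (1, 'Q one',   1,    2),
    (2, 'Q two',   2,    NULL),
    (3, 'Q three', NULL, NULL);
""")


def questions_with_tag(tag_id):
    """Every slot has to be checked, since the tag may sit in any of them."""
    return [row[0] for row in conn.execute("""
        SELECT question_id
        FROM Question
        WHERE ? IN (tag1_id, tag2_id, tag3_id, tag4_id, tag5_id)
        ORDER BY question_id
    """, (tag_id,))]
```

Reading a question with its tags needs no join to an association table, but searching by tag has to look at all five columns, so a single index cannot serve that lookup the way (tag_id, post_id) does in the three-table design.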