I'm really enjoying the Peewee ORM. It's lightweight, easy to use, and pretty well documented. One thing I'm having trouble grasping is the related_name property used when implementing foreign keys. I'm never sure whether the property should relate to the table, or to the column. Could someone explain to me exactly what I should be using the name for, and how? For example, with the Student/Courses example found in the Peeewee docs themselves.
https://peewee.readthedocs.org/en/2.0.2/peewee/fields.html#implementing-many-to-many
class Student(Model):
name = CharField()
class Course(Model):
name = CharField()
class StudentCourse(Model):
student = ForeignKeyField(Student)
course = ForeignKeyField(Course)
Assuming I have Student, Course, StudentCourse models. What would the related names be for the StudentCourse columns?
I'm never sure whether the property should relate to the table, or to the column. Could someone explain to me exactly what I should be using the name for, and how?
A foreign key is like a pointer, a 1-to-1. But there is also an implied back-reference -- this is the related name. Examples:
Tweet has a foreign key to the User who tweeted it. The back-reference is the tweets created by a user, so related_name='tweets'.
Category has a foreign key to itself to indicate the parent category. The backreference is the child categories for a given parent, so related_name='children'.
Code snippet has a foreign key to the language it's written in. The back-reference is the snippets for a language, so related_name='snippets'.
For example, with the Student/Courses example found in the Peeewee docs themselves.
That is a many-to-many, and so the back-references aren't quite as "clear" because the foreign keys exist on a junction table. Your frame of reference is the junction table, so the back-references would be studentcourses in both cases, though this isn't helpful because the backreferences just take you to the junction table. So, with many-to-many, generally the backreferences can be left at the default since your queries will usually look like:
# get students in english 101
Student.select().join(StudentCourse).join(Course).where(Course.name == 'ENGL 101')
# get huey's courses
Course.select().join(StudentCourse).join(Student).where(Student.name == 'Baby Huey')
Related
I'm having a hard time converting my database tables and foreign keys to a class diagram with classes and associations.
My question is:
"In in a composition relationship, should a child class always should have an ID field?".
In my CD, there are 2 compositor classes: PurchaseItem and PurchaseFinisher, which composite Purchase class. PurchaseItem already comes with an ID field from its table but, PurchaseFinisher doesn't because it is filtered by the id_purchase and id_payment_method foreign keys.
thanks in advance.
This is my DB diagram:
I can't see redundancy in between Purchase or Product, as you said. Could you, please, show me that based on my DB diagram? My tables are well modeled (hope so). My fault is in the classes definition.
In a class diagram, no class requires an id property: each class instance (aka object) has its own identity with or without explicit id property.
In a database, you need of course an explicit id property to uniquely identify the object among others in the database and find it back. By the way, you may annotate such properties with a trailing {id} . UML does not define any semantic for it, but it is in general sufficiently expressive to help database designers.
In the case of composition, the main question is whether a composed object can easily be identified by alternate means. There are several related ORM database techniques, for example:
you can use the owning object’s id together with another property if this is sufficient to identify the element. The two together would make a composite primary key in database.
you can use a unique id to identify the object (surrogate primary key) and use the id of the owning object as foreign key.
For PurchaseItem you have everything that is needed, although the diagram does not tell which of the two approaches you’ll use (e.g is the id unique globally, or unique within the purchase?).
But for PurchaseFinisher it is unclear if you could uniquely identify an occurence. If a payment method can only be used once per purchase, it’s fine as it may be used to identify the object.
If it would be allowed to pay two times the same amount (half of the overall price) in the same currency with the same payment methods, you’d have undistinguishable duplicates. So, some kind of identifier will be needed from the database point of view.
I have UserAccount table and other tables like Employee, Student etc. I want to have an audit like who created a student record or who created a certain employee record. Is it a good practice to have UserAccountId as foreign key in all other tables like Employee, Student etc? I am using hibernate if I mapped like this I have to maintain one to many relationship between UserAccount and All other Classes so code increases and for me that is a burden.
Well it breaks all normalisation rules. Have a link/href table instead. UserAccountID, EmployeeID(NULL), StudentID(NULL). Have one massive linked table like this. The foreign Keys needs to be nullable besides UserAccountID(Primary Key and Foreign Key).
"Good habit/practice" is subjective.
If the business domain includes the fact that the person who created an entity is a meaningful piece of information, and that this is likely to be a regular request by end users, then adding a "createdBy" attribute to your tables/classes is, indeed, good practice.
The best way to know whether this is true is to ask the product owner whether they would need a screen showing "all employees created by user x". If they say "no, only if something goes wrong", you have an audit requirement; if they say "yes, we'll use that regularly", it's an integral part of your business domain.
You may find, that your users want to know not just who created a row, but also who modified it. In that case, there are similar questions on SO.
I need to represent the following information in a data base (simplified):
A set of "Bills". (in a table Bills)
A set of "Representatives". (in a table MPs)
The authors of each bill.
If only "Representatives" could be authors this would be a simple link-table with FOREIGN KEYs to both Bills and MPs.
There is one (only one) other "Entity" that can also be an author of a bill so:
I can repurpose NULL to mean the "Entity" but this is ugly.
I can simply forget about FOREIGN KEY constraints but this is even uglier.
What is the correct way to represent this information according to database theory?
Would it be appropriate to have one table linktable_authors that links Bills and MPs for "Bills" authored by "Representatives", and one other table authored_by_The_Entity only for "Bills" authored by "Entity" (if I am dead sure that no other entities will emerge later in time)?
Bonus question:
What if there was also a table "Other_Entities" and both "Other_Entities" and "Representatives" can be authors of "Bills"?
Define an entity Authors with PK AuthorID. Have the PK of the sub-entities Representatives and OtherEntities be AuthorID, to create a type-of relationship. Add an attribute AuthorType to Authors to distinguish the sub-types from conflicting. Place all attributes common to authors in the Authors entity.
BTW: NULL never means "some other known value". It can sensibly mean "doesn't apply" or "not known yet", amongst a few other choices, but giving it the semantics of "some other known value" is a sure sign of poor design.
If this is a duplicate I am sorry, I tried looking but this is an odd question to word.
I have seen this convention in many databases, but is seems redundant to me. I have found a few answers that say it is to reduce confusion during complex joins, but this doesn't seem like a sufficient reason. If you are making complex joins, make aliases. Do joins really represent such a common task that we should make standard tasks like selects, inserts, and updates redundant?
I don't think there is actually a convention of prefixing column names with the table name.
As Philippe Grondier details, the 'proper' approach to data modelling is to first create a dictionary of data element names. Following the international standard ISO 11179 guidelines:
[Object] [Qualifier] Property RepresentationTerm
you end up with data elements that are fully qualified. Here the qualifier elements Object, Qualifier and sometimes Property are in combination what you consider to be the 'prefix'.
On implementation of the data model in SQL, the table name can provide the context and leads the designer to drop the qualifying terms from the column name. I think this is convention you prefer.**
In other words, in the convention you are questioning it is not that the table name has been prefixed to the column name, rather it is that the qualifying terms have been retained.
** whether or not yours or any other is a good convention is subjective and Stackoverflow is not the place for such discussion. However, I will mention in passing that retaining qualification terms does have a practical benefits (as well as being theoretically sound) e.g. consider that SQL's NATURAL JOIN lends itself to columns that are named consistently throughout the schema.
It is true that such "developped column names" methods are widely used for column naming where, for example, Tbl_Person will have an id_Person primary key column, and a personName text column.
Though it might seem at first quite painfull to write 'developped' column names like "id_Person", "personName", "personAdress", etc, everything gets clearer when you have to write SELECT's on multiple tables, which is something that happens each time you open a form or a report.
There is also a theoretical/historical dimension to this "developped column names" method. First relational databases theories and methods (like MERISE) were proposing, as a first step, to build the so-called "data dictionary", ie the list of all data to be manipulated by the app\database.
This dictionary has to be established even before any "Entity-Relation" model is proposed. data names/descriptions have then to be fully developped, this to avoid confusion between 'similar' data entries, like, for example "companyName" and "personName".
Thus, the "developped column names" convention reflects the fact that, at the data level, similar columns (such as a Company.name and a Person.name columns) are not as equivalent as they seem to be. Though they both look like being here to hold a name, one of them is made to hold a company name, while the other is made to hold a person's name!
This convention can then be considered as a way to reflect the exact meaning of each of the database's column, or to reflect the exact meaning of each entry in the data dictionary.
I've never seen the full table name prefixed, but usually at least an abbreviation. And you're exactly right, it's for simplicity in joins and the like. It's easier to write ur_id all the time than it is to write id sometimes and userrights.id other times, for example. It's not that uncommon to need to access more than one table at a time.
Join is part of a select, so that comparison doesn't hold.
That aside, I don't think you should prefix the field with the table name, except for primary keys. I like to give every table a surrogate key, which I rather name after the table. So the table 'Orders' will get an 'OrderId' PK. An order line will have a foreign key OrderId to point to the order. That way, the field names are the same across tables, and you can tell by the name, which data it presents. You could name the field just 'Id' in all tables, but you do have to read the alias to see which ID you mean. Some queries I wrote are over 400 lines. You don't want to rely on table aliases alone. A little context in the fieldname itself does help.
It's not a convention; some people do it, some people don't. More often I see an ID column prefixed with the table name, but no other columns. Some (all?) DBs also allow prefixing with the table name in queries, but it's neither required, nor part of the actual column name.
In addition to what others said, it is also makes things simpler in the presence of identifying relationships (a.k.a. identifying FOREIGN KEYs).
An identifying relationship "migrates" the parent's primary key into a part of child's primary key. Prefix ensures there will be no collision and you won't need to rename the migrated fields, even when there are multiple levels of identifying relationships. For example:
PARENT:
PARENT_NAME PK
CHILD:
PARENT_NAME PK, FK referencing PARENT
CHILD_NAME PK
GRANDCHILD:
PARENT_NAME PK, FK referencing CHILD
CHILD_NAME PK, FK referencing CHILD
GRANDCHILD_NAME PK
Keeping the same name throughout the whole data model avoids any confusion as to what the field means and where it came from.
On the other hand, prefixing can take a toll on readability, so I usually take a compromise: prefix primary key fields but leave other fields unprefixed.
I dislike such naming conventions. It encourages sloth, specifically the use of unqualified references in queries. Use an alias for each table in your query and qualify each column reference with the appropriate alias.
The only such naming convention I like has to do with primary/foreign keys:
I like to name primary keys something clever, like id.
I like to name prefix the names of foreign key columns with the name of the table containing the primary key.
It makes for much more legible SQL, IMHO. An example:
create table foo
(
id int not null primary key ,
...
)
create table bar
(
id int not null primary key ,
foo_id int not null foreign key references foo (id) ,
...
)
select *
from foo foo
join bar bar on bar.foo_id = foo.id
This scheme falls down, of course, when you get to compound keys. But I like it. YMMV.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
In our dev group we have a raging debate regarding the naming convention for Primary and Foreign Keys. There's basically two schools of thought in our group:
1:
Primary Table (Employee)
Primary Key is called ID
Foreign table (Event)
Foreign key is called EmployeeID
or
2:
Primary Table (Employee)
Primary Key is called EmployeeID
Foreign table (Event)
Foreign key is called EmployeeID
I prefer not to duplicate the name of the table in any of the columns (So I prefer option 1 above). Conceptually, it is consistent with a lot of the recommended practices in other languages, where you don't use the name of the object in its property names. I think that naming the foreign key EmployeeID (or Employee_ID might be better) tells the reader that it is the ID column of the Employee Table.
Some others prefer option 2 where you name the primary key prefixed with the table name so that the column name is the same throughout the database. I see that point, but you now can not visually distinguish a primary key from a foreign key.
Also, I think it's redundant to have the table name in the column name, because if you think of the table as an entity and a column as a property or attribute of that entity, you think of it as the ID attribute of the Employee, not the EmployeeID attribute of an employee. I don't go an ask my coworker what his PersonAge or PersonGender is. I ask him what his Age is.
So like I said, it's a raging debate and we go on and on and on about it. I'm interested to get some new perspectives.
If the two columns have the same name in both tables (convention #2), you can use the USING syntax in SQL to save some typing and some boilerplate noise:
SELECT name, address, amount
FROM employees JOIN payroll USING (employee_id)
Another argument in favor of convention #2 is that it's the way the relational model was designed.
The significance of each column is
partially conveyed by labeling it with
the name of the corresponding domain.
It doesn't really matter. I've never run into a system where there is a real difference between choice 1 and choice 2.
Jeff Atwood had a great article a while back on this topic. Basically people debate and argue the most furiously those topics which they cannot be proven wrong on. Or from a different angle, those topics which can only be won through filibuster style endurance based last-man-standing arguments.
Pick one and tell them to focus on issues that actually impact your code.
EDIT: If you want to have fun, have them specify at length why their method is superior for recursive table references.
I think it depends on your how you application is put together. If you use ORM or design your tables to represent objects then option 1 may be for you.
I like to code the database as its own layer. I control everything and the app just calls stored procedures. It is nice to have result sets with complete column names, especially when there are many tables joined and many columns returned. With this stype of application, I like option 2. I really like to see column names match on joins. I've worked on old systems where they didn't match and it was a nightmare,
Have you considered the following?
Primary Table (Employee)
Primary Key is PK_Employee
Foreign table (Event)
Foreign key is called FK_Employee
Neither convention works in all cases, so why have one at all? Use Common sense...
e.g., for self-referencing table, when there are more than one FK column that self-references the same table's PK, you HAVE to violate both "standards", since the two FK columns can't be named the same... e.g., EmployeeTable with EmployeeId PK, SupervisorId FK, MentorId Fk, PartnerId FK, ...
I agree that there is little to choose between them. To me a much more significant thing about either standard is the "standard" part.
If people start 'doing their own thing' they should be strung up by their nethers. IMHO :)
If you are looking at application code, not just database queries, some things seem clear to me:
Table definitions usually directly map to a class that describes one object, so they should be singular. To describe a collection of an object, I usually append "Array" or "List" or "Collection" to the singular name, as it more clearly than use of plurals indicates not only that it is a collection, but what kind of a collection it is. In that view, I see a table name as not the name of the collection, but the name of the type of object of which it is a collection. A DBA who doesn't write application code might miss this point.
The data I deal with often uses "ID" for non-key identification purposes. To eliminate confusion between key "ID"s and non-key "ID"s, for the primary key name, we use "Key" (that's what it is, isn't it?) prefixed with the table name or an abbreviation of the table name. This prefixing (and I reserve this only for the primary key) makes the key name unique, which is especially important because we use variable names that are the same as the database column names, and most classes have a parent, identified by the name of the parent key. This also is needed to make sure that it is not a reserved keyword, which "Key" alone is. To facilitate keeping key variable names consistent, and to provide for programs that do natural joins, foreign keys have the same name as is used in the table in which they are the primary key. I have more than once encountered programs which work much better this way using natural joins. On this last point, I admit a problem with self-referencing tables, which I have used. In this case, I would make an exception to the foreign key naming rule. For example, I would use ManagerKey as a foreign key in the Employee table to point to another record in that table.
The convention we use where I work is pretty close to A, with the exception that we name tables in the plural form (ie, "employees") and use underscores between the table and column name. The benefit of it is that to refer to a column, it's either "employees _ id" or "employees.id", depending on how you want to access it. If you need to specify what table the column is coming from, "employees.employees _ id" is definitely redundant.
I like convention #2 - in researching this topic, and finding this question before posting my own, I ran into the issue where:
I am selecting * from a table with a large number of columns and joining it to a second table that similarly has a large number of columns. Both tables have an "id" column as the primary key, and that means I have to specifically pick out every column (as far as I know) in order to make those two values unique in the result, i.e.:
SELECT table1.id AS parent_id, table2.id AS child_id
Though using convention #2 means I will still have some columns in the result with the same name, I can now specify which id I need (parent or child) and, as Steven Huwig suggested, the USING statement simplifies things further.
I've always used userId as a PK on one table and userId on another table as a FK. 'm seriously thinking about using userIdPK and userIdFK as names to identify one from the other. It will help me to identify PK and FK quickly when looking at the tables and it seems like it will clear up code when using PHP/SQL to access data making it easier to understand. Especially when someone else looks at my code.
I use convention #2. I'm working with a legacy data model now where I don't know what stands for in a given table. Where's the harm in being verbose?
How about naming the foreign key
role_id
where role is the role the referenced entity has relativ to the table at hand. This solves the issue of recursive reference and multiple fks to the same table.
In many cases will be identical to the referenced table name. In this cases it becomes identically to one of your proposals.
In any case havin long arguments is a bad idea
"Where in "employee INNER JOIN order ON order.employee_id = employee.id" is there a need for additional qualification?".
There is no need for additional qualification because the qualification I talked of is already there.
"the reason that a business user refers to Order ID or Employee ID is to provide context, but at a dabase level you already have context because you are refereing to the table".
Pray, tell me, if the column is named 'ID', then how is that "refereing [sic] to the table" done exactly, unless by qualifying this reference to the ID column exactly in the way I talked of ?