Related
I am using a SQLAlchemy database to hold data for a flask application. I would like one column in my database to represent a category (e.g. the possible categories may be A, B or C).
I have seen in documentation that this can be achieved by a simple relationship which relates two tables. One table to hold some live data (inclusive of a category ID and a category) and another table to relate a category id to the associated category. http://flask-sqlalchemy.pocoo.org/2.3/quickstart/#simple-relationships
Would this method be considered good practice for including some kind of "category" column in my database? Or is there a simpler/better way. In this case my aim is to prioritise simplicity while maintaining good practice (don't really need best practice if it entails too much complexity).
Additionally, if my category names will never change, is it bad practice to use a constant list of category names to compare input data with in order to validate it? If so, why?
This is more of an SQL question and it isn't related to Python at all.
Anyways, it is actually better to use a reference table as you first suggested.
In this case, a Category table with one-to-many relationship. This allows you to change category name, and enrich Category with more details (like description) that might become useful in the future.
The other way, using constant list, is considered a bad practice - especially using Enums. You can read more about it in this article: 8 Reasons Why MySQL's ENUM Data Type Is Evil
You can read more about this dilemma here.
Hope it helps.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
Let's say I have two tables:
Table: Color
Columns: Id, ColorName, ColorCode
Table: Shape
Columns: Id, ShapeName, VertexList
What should I call the table that maps color to shape?
Table: ???
Columns: ColorId, ShapeId
There are only two hard things in
Computer Science: cache invalidation
and naming things-- Phil Karlton
Coming up with a good name for a table that represents a many-to-many relationship makes the relationship easier to read and understand. Sometimes finding a great name is not trivial but usually it is worth to spend some time thinking about.
An example: Reader and Newspaper.
A Newspaper has many Readers and a Reader has many Newspapers
You could call the relationship NewspaperReader but a name like Subscription might convey better what the table is about.
The name Subscription also is more idiomatic in case you want to map the table to objects later on.
The convention for naming many-to-many tables is a concatenation of the names of both tables that are involved in the relation. ColourShape would be a sensible default in your case. That said, I think Nick D came up with two great suggestions: Style and Texture.
How about ColorShapeMap or Style or Texture.
Interesting about half of the answers give a general term for any table that implements a many-to-many relationship, and the other half of the answers suggest a name for this specific table.
I called these tables intersections tables generally.
In terms of naming conventions, most people give a name that is an amalgam of the two tables in the many-to-many relationship. So in this case, "ColorShape" or "ShapeColor." But I find this looks artificial and awkward.
Joe Celko recommends in his book "SQL Programming Style" to name these tables in some natural language manner. For instance, if a Shape is colored by a Color, then name the table ColoredBy. Then you could have a diagram that more or less reads naturally like this:
Shape <-- ColoredBy --> Color
Conversely, you could say a Color colors a Shape:
Color <-- Colors --> Shape
But this looks like the middle table is the same thing as Color with a plural naming convention. Too confusing.
Probably most clear to use the ColoredBy naming convention. Interesting that using the passive voice makes the naming convention more clear.
Name the table whatever you like, as long as it is informative:
COLOR_SHAPE_XREF
From a model perspective, the table is called a join/corrollary/cross reference table. I've kept the habit of using _XREF at the end to make the relationship obvious.
A mapping table is what this is usually called.
ColorToShape
ColorToShapeMap
This is an Associative Entity and is quite often significant in its own right.
For example, a many to many relationship between TRAINS and TIMES gives rise to a TIMETABLE.
If there's no obvious new entity (such as timetable) then the convention is to run the two words together, giving COLOUR_SHAPE or similar.
I've worked with DBAs that call it a join table.
Colour_Shape is fairly typical - unless the relationship has an explicit domain-specific name.
Junction table
OR Bridge Table
OR Join Table
OR Map Table
OR Link Table
OR Cross-Reference Table
This comes into use when we go for many-to-many relationships where the keys from both the tables forms the composite primary key of the junction table.
I recommend using a combination of the names of entities and put them in the plural. Thus the name of the table will express connection "many-to-many".
In your case:
Color + Shape = ColorsShapes
I usually hear that called a Junction Table. I name the table by what it joins, so in your case either ColorShape, or ShapeColor. I think it makes more sense for a Shape to have a color than for a Color to have a shape, so I would go with ShapeColor.
Intermediate Table or a Join Table
I would name it "ColorShapes" or "ColorShape", depending on your preference
I've also heard the term Associative table used.
a name for your table might be ColorShapeAssociations meaning that each row represents an association between that color and that shape. The existence of a row implies that the color comes in that shape, and that the shape comes in that color. All rows with a specific color would be the set of all shapes the color is associated with, and the rows for a specific shape would be the set of all colors that shape came in...
In general most databases have some sort of naming convention for indexes, primary key and so forth. In PostgreSQL the following naming has been suggested:
primary key: tablename_columnname_pkey
unique constraint: tablename_columnname_key
exclusive constraint: tablename_columnname_excl
index for other purposes: tablename_columnname_idx
foreign key: tablename_columnname_fkey
sequence: tablename_columnname_seq
triggers: tablename_actionname_after|before_trig
Your table is a linked table to me. To stay in line with the naming above I would choose the following:
linked table: tablename1_tablename2_lnk
In a list of table objects the linked table will be after tablename1. This might be visually more appealing. But you could also choose a name that describes the purpose of the link like others have suggested. This might help to keep the name of the id column short (if your link must have its own named id and is referenced in other tables).
or liked table: purposename_lnk
A convention I see a lot for joining tables that I personally like is 'Colour_v_Shape', which I've heard folk refer to colloquially as 'versus tables'.
It makes it very clear at a glance that the table represents a many-to-many relationship, and helps avoid that (albeit rare) confusing situation when you try to concatenate two words that might otherwise form a compound word, for example 'Butter' and 'Milk' may become 'ButterMilk', but what if you also needed to represent an entity called 'Buttermilk'?
Doing it this way, you'd have 'Butter_v_Milk' and 'Buttermilk' - no confusion.
Also, I like to think there's a Foo Fighters reference in the original question.
"Many-Many" table. I'd call it "ColourShape" or vice versa.
I've always been partial to the term "Hamburger Table". Don't know why - it just sounds good.
Oh, and I would call the table ShapeColor or ColorShape depending on which is the more commonly used table.
It's hard to answer something as arbitrary as this, but I tend to prefer tosh's idea of naming it after something in the actual domain instead of some generic description of the underlying relationships.
Quite often this sort of table will evolve into something richer for the domain model and will take on additional attributes above and beyond the linked foreign keys.
For example, what if you need to store a texture in addition to color? It might seem a bit funky to expand the SHAPE_COLOR table to hold its texture.
On the other hand, there's also something to be said for making a well-informed decision based on what requirements you have today and being prepared to refactor when additional requirements are introduced later.
All that said, I would call it SURFACE if I had insight that there would be additional surface-like properties introduced later. If not, I'd have no problems calling it SHAPE_COLOR or something of the sort and moving on to more pressing design problems.
Maybe just ColoredShape?
I'm not sure I get the question. Is this about this specific case or are you looking for general guidelines?
I would name it with the exact names of the tables being joined = ColorShape.
In adiction to what Developer Art has related,
ColorShape
would be a usual naming convention. In ER diagram, it would be a relation.
Call it a cross reference table.
XREF_COLOR_SHAPE
(
XCS_ID INTEGER
C_ID INTEGER
S_ID INTEGER
)
I'd use r_shape_colors or r_shape_color depending on its meaning.
r_ would a replacement for xref_ in this case.
My vote is for a name that describes the table best. In this case it might be ShapeColor but in many cases a name different from a concatenation is better. I like readability and for me, that means no suffixes, no underscores and no prefixes.
I would personally go for Colour_Shape, with the underscore: just because I have seen this convention turn up quite a bit. [but agree with the other posts here that there are probably more 'poetic' ways of doing this].
Bear in mind that the foreign keys should also be built on this join table which would reference both the Colour & Shape tables which would also help with identifying the relationship.
I want to design a table for items.
There are many types of items, all share several fields.
Each type of item has it's own fields.
I want to store the uncommon fields in a separate table.
I thought of something like :
----Items
+Item_id
+Item_Type_Id
+Item_Serial
...
----Item_types
+Item_Type_Id
+Item_Name
...
----Item_Fields
+Item_Field_Id
+Item_Type_Id
+Field_Name
...
----Field_Values
+Field_Value_Id
+Item_Field_Id
+Item_Id
+Value
...
The pro is having the ability to add fields and values without changing the tables.
The con is that i have to transpose the field names and values in order to see all info for an item.
Any better suggestions? Or perhaps a simple (not stored procedure) way to join the tables to get a flat info?
I tried to use PIVOT (I'm using SQL 2005) but with no luck.
Thanks.
I wrote a stored proc to make PIVOT more useful. Here is the source:
http://dot-dash-dot.com/files/pivot_query.sql
and some examples how to use it:
http://dot-dash-dot.com/files/pivot_query_examples.sql
For your data, the query would just be the raw data joining those tables above to produce a raw listing of:
set #myQuery = '
Select Item_Id, Item_Name, Field_Name, Value From ...
';
Then your call to pivot_query would be:
exec pivot_query #myQuery, 'Item_Id, Item_Name', 'Field_Name', 'max(Value)'
like that.
One other option is to store items in XML format in one single field. Depending on your usage scenario, it may work well. Or it may not.
I believe there has to be some grouping of values.
For example lets say your items are objects in a room. Then different types of objects have different attributes. For example books have publication date and number of pages, chairs have color pattern and height, etc.
In this example, you make an item table, a book table and a chair table.
You could make an "additional values" table that holds generic information as above, but what you really want to do is figure out the "types" of the different groups of attributes and then make every one of those types it's own table.
Is there a set of values that all items have? There has to be at least one which is a type field (this describes where the other information is stored. I expect every item will also have a name and a description. This is the information to go in the item table.
Then you make additional tables for the different types itembook, itemchair etc. There may even be some overlap. For example itembook, itemhardback, itempaperback would be 3 tables used to describe books.
I believe this is the best solution to your problem. It will still allow you to extend, but it does put a framework around your data.
Of course there are systems that do it the way you describe, but unless you are building a tool that others are going to reuse for many different projects, it makes sense to design the system for the task at hand. You end up falling into the over designing trap otherwise. (IMHO)
On the other hand, if you are going to go the totally generic direction I suggest you use one of the systems that already exist that work in this way (entity framework, app framework, etc) Use someone else's don't start from scratch.
I'm not too sure how you want to retrieve the info, but something like the below may work. (It's probably close to what Hogan mentioned.)
If you want to retrieve data for a type, you can just JOIN two tables.
If you want to retrieve data for all types (with all fields), you can LEFT JOIN all tables.
----Items
+Item_id
+Item_Type_Id
+Item_Common_Field1
+Item_Common_Field1
...
----Item_Type_A
+Item_id
+Item_Type_A_Specific_Field1
+Item_Type_A_Specific_Field2
...
----Item_Type_B
+Item_id
+Item_Type_B_Specific_Field1
...
If you add these columns to the table, you can make them sparse columns to avoid the space taken by unspecified uncommon fields.
But I would not call this a best practice. (see comments under your question)
I don't want to be accused of being the always-uses-the-latest-useless-technology guy, but depending on your use case, this might be a good case for a nosql database - Tokyo, Mongo, SimpleDB, etc. Or as Developer Art suggested, you could just serialize the different fields into a single column. It's not the worst thing in the world.
This question already has answers here:
Closed 13 years ago.
Exact Duplicate
Table Naming Dilemma: Singular vs. Plural Names
Is it better to use singular or plural database table names ? Is there an accepted standard ?
I've heard arguments for and against it, what do you guys think ?
Singular, so you can have:
Customer
CustomerAddress
CustomerAddressAuditTrail
etc.
IMHO, Table names should be plural like Customers.
Class names should be singular like Customer if it maps to a row in Customers table.
I like singular names but appear to be in the minority.
My personal philosophy is that using a plural database table name is redundant, unless you're only planning for the table to contain one row.
I like to use singular names like Agent that have PK names like AgentID.
But that's just me :o)
I like to use plural forms, simply because one table contains several entities, so it seems more natural to me.
Linq to SQL converts plural form table names to singular when creating data entities. I assume that Microsoft would not have implemented this functionality if they considered plural forms for table names bad practice.
At my current company we use Plural for table names. The reasoning is this: If we have a Customers table we consider each row a Customer, so the table itself is a collection of customers.
Well, obviously your database table names have absolutely got to be named in a "standard" fashion which I will hitherto arbitrarily define.
First, all tables names shall be prefixed with "t_". Following this, the singular entity name in StudlyCaps, e.g. "Customer". Immediately afterwards, this shall contain the number of columns created in the first version of the schema, for historical purposes, followed by an underscore, and the precise normal form of the data; either "1", "2", "3" or "B" for BCNF. Any higher normal forms shall be denoted by a "P".
Some examples of acceptable names are:
t_Customer_6_3
t_Order_5_B
t_OrderLine_4_2
I think my point is, it really doesn't matter, as long as the name is reasonably descriptive and naming is consistent.
The most important thing is to be consistent in your usage. It is annoying to have to remember which tables are plurals and which are not. Same thing with your field names, pick one stadard and use it. Don't make the poor developers have to determine if this table uses person_id or personid or peopleid or person$id, etc. It is amazing the amount of time you can waste when you don't have standards trying just to remember which table uses what.
There is no should or must be this way or that way correct answer to this question. It's up to the designer of the database and software.
As for me, I usually use singular names becouse when I do the E-R diagram I have an entity Customer , not Customers, so I keep it same as to not get confused.
Ofcourse some frameworks do favor one style or another, so you should be best of to follow those practices when you notice them.
There are many arguments for each, but it all boils down to what you feel comfortable with. Neither is wrong.
What's really important is that you are consistent. Choose one standard and stick to it, which one you choose is of less importance.
IMHO it doesn't really matter, just do whatever is comfortable with you and the people that are using the database.
I think I subconsciously list main data tables with an s and "pick list" or foreign key tables and singular.
As with lots of these types of questions the best answer is often "consistent". You can argue the table represents a single entity and as such deserves a singular name, or that it contains multiple instances of an entity so it should be plural. My advice is flip a coin and go with it for the entire database (or stick with the convention that already has a majority).
I have a table in a DB (Postgres based), which acts like a superclass in object-oriented programming. It has a column 'type' which determines, which additional columns should be present in the table (sub-class properties). But I don't want the table to include all possible columns (all properties of all possible types).
So I decided to make a table, containg the 'key' and 'value' columns (i.e. 'filename' = '/file', or 'some_value' = '5'), which contain any possible property of the object, not included in the superclass table. And also made one related table to contain the available 'key' values.
But there is a problem with such architecture - the 'value' column should be of a string data type by default, to be able to contain anything. But I don't think converting to and from strings is a good decision. What is the best way to bypass this limitation?
The design you're experimenting with is a variation of Entity-Attribute-Value, and it comes with a whole lot of problems and inefficiencies. It's not a good solution for what you're doing, except as a last resort.
What could be a better solution is what fallen888 describes: create a "subtype" table for each of your subtypes. This is okay if you have a finite number of subtypes, which sounds like what you have. Then your subtype-specific attributes can have data types, and also a NOT NULL constraint if appropriate, which is impossible if you use the EAV design.
One remaining weakness of the subtype-table design is that you can't enforce that a row exists in the subtype table just because the main row in the superclass table says it should. But that's a milder weakness than those introduced by the EAV design.
edit: Regarding your additional information about comments-to-any-entity, yes this is a pretty common pattern. Beware of a broken solution called "polymorphic association" which is a technique many people use in this situation.
How about this instead... each sub-type gets its own DB table. And the base/super table just has a varchar column that holds the name of the sub-type DB table. Then you can have something like this...
Entity
------
ID
Name
Type
SubTypeName (value of this column will be 'Dog')
Dog
---
VetName
VetNumber
etc
If you don't want your (sub-)table names to be varchar values in the base table, you can also just have a SubType table whose primary key will be in the base table.
The only workaround (while retaining your strucure) is to have separate tables:
create table IntProps(...);
create table StringProps(...);
create table CurrencyProps(...);
But I do not think that this is a good idea...
One common approach is having the key-value table contain multiple columns, one for each data type, i.e. StringValue, DecimalValue, etc.
Just know you're trading queryability and performance for a database schema you don't need to change. You could also consider ORM mapping or an object database.
You could have a per type key/value table. The available table would need to encode the availability of a specific key/type pair to point to the correctly typed key/value table.
This seems like a highly inefficient architecture in for a row based relational databases however.
Perhaps you should take a look at a column oriented relational database?
Thanks for the answers. I'll explain a little bit more specifically what i need.
There's a need to program a blog+forum website, and I've been looking at the WordPress DB structure.
There's a strong need for the ability to place comments to any kind of 'object', like a blog entry, or a video file attachment to it. The above DB structure being very easy to scale and to fulfill all our needs was the reason of its choice.
But that's not late to change it, cause this is in stage of early engineering. Also our model smells now like a completely tree-hierarchy based DB. For now I'll accept Bill Karwin's and fallen888 answers, but maybe I'm going in a totally wrong direction?
about the user being able to add a new field to the table:
I admire all these people making comments.
I used to be interested in this kind of thing a few years ago, but have written little code recently (apart from a little bit of PHP and MYSQL).
I think it's fine if you want to keep going - you may end up with something new.
Sorry to pour any cold water on the scheme - I admire your efforts. My personal belief is that if you go far enough in this direction, you will end up with a system that interprets more of natural language than SQL does. (Around 1970, SQL was actually spelt Sequel, and it actually stood for "structured english query language", but after they standardized it in the 1970's - I think someone said that Oracle was the first commercial implementation, 19079, the "English" got dropped off, because I guess they decided that it was only a tiny subset of English.
I have run out of steam in this area, because I haven't got a job. Without an easy job that pays the bills, where I can experiment with these ideas, it's a bit hard to concentrate on this area.
Best wishes to all.
sorry, I wrote 19079 above, I meant the year 1979. Oracle got their first contract writing a database for the CIA.