Does this kind of design come with overhead or data redundancy?
The table structure should still support CRUD on tags, for something like manga/anime tags, so that specific resources can be found by selecting tags. An asterisk (*) marks a primary key.
tag (tagID*, tagName)
tagMap (tagSetID*, tagID*)
tagSet (tagSetID*)
announce (announceID*, tagSetID, title, content)
There is nothing at all wrong with your design. Most of the time, we would expect the tagSet table to also have a name column, e.g.
tagSet (tagSetID*, tagSetName)
That you don't have one isn't really an issue. This is a standard many-to-many relationship between tags and sets, with the tagMap table serving as the junction table.
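As a rough sketch of how that junction table gets used, here is the schema from the question in SQLite (via Python's sqlite3), plus a lookup of announcements by tag name. The sample rows and the helper name announcements_with_tag are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag      (tagID INTEGER PRIMARY KEY, tagName TEXT NOT NULL UNIQUE);
CREATE TABLE tagSet   (tagSetID INTEGER PRIMARY KEY);
-- Junction table: one row per (set, tag) pair.
CREATE TABLE tagMap   (tagSetID INTEGER NOT NULL REFERENCES tagSet(tagSetID),
                       tagID    INTEGER NOT NULL REFERENCES tag(tagID),
                       PRIMARY KEY (tagSetID, tagID));
CREATE TABLE announce (announceID INTEGER PRIMARY KEY,
                       tagSetID   INTEGER REFERENCES tagSet(tagSetID),
                       title TEXT, content TEXT);
""")

# Illustrative sample data (not from the question).
conn.executemany("INSERT INTO tag VALUES (?, ?)",
                 [(1, "manga"), (2, "anime"), (3, "news")])
conn.executemany("INSERT INTO tagSet VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO tagMap VALUES (?, ?)", [(1, 1), (1, 2), (2, 3)])
conn.executemany("INSERT INTO announce VALUES (?, ?, ?, ?)",
                 [(1, 1, "New series", "..."), (2, 2, "Site update", "...")])


def announcements_with_tag(tag_name):
    """Return titles of announcements whose tag set contains tag_name."""
    rows = conn.execute("""
        SELECT a.title
        FROM announce a
        JOIN tagMap m ON m.tagSetID = a.tagSetID
        JOIN tag t    ON t.tagID = m.tagID
        WHERE t.tagName = ?
        ORDER BY a.announceID""", (tag_name,))
    return [r[0] for r in rows]


print(announcements_with_tag("anime"))
```

Each tag name is stored once in tag, and tagMap only holds integer pairs, so the redundancy is limited to the keys themselves.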
Background
I work for a real estate technology company. An upcoming project involves building out functionality to allow users to affix tags/labels (plural) to an MLS listing (real estate property). The second requirement is to allow a user to search by one or more tags. We won't be dealing with keeping track of counts or building word clouds or anything like that.
Solutions Researched
I found this SO Q&A and think the solution is pretty straightforward and have attempted to adapt some ideas from it below. Also, I understand that JSONB support is much better in 9.5 and it may be a possibility. If you have any insight here I'd love to hear your thoughts as well in an answer.
Attempted Solution
Table: Tags
Columns: ID, OwnerID, TagName, CreatedDate
Table: TaggedItems
Columns: ID, TagID (references above), PropertyID, CreatedDate, (Possibly some denormalized data to assist with presenting search results; property name, original listor, etc.)
Inserting new tags should be straightforward. Searching tags should also be straightforward since the user will select one or multiple tags from a searchable dropdown, thus affording me access to the actual TagID which I can use to query the TaggedItems table. When showing the full profile view for a listing, I can use its PropertyID and the UserID to query my tables for the existence of one or more Tags to display in the view.
Edit: It's probably worth noting that we don't keep an entire database of properties; we access them via an API partner, hence the two-table solution and not three.
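For what it's worth, here is a minimal sketch of the two-table layout in SQLite with a multi-tag search. It assumes "search by one or more tags" means a listing must carry all the selected tags, and that PropertyID is an opaque string from the API partner. The sample rows and the helper properties_with_all_tags are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (
    ID          INTEGER PRIMARY KEY,
    OwnerID     INTEGER NOT NULL,
    TagName     TEXT NOT NULL,
    CreatedDate TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE TaggedItems (
    ID          INTEGER PRIMARY KEY,
    TagID       INTEGER NOT NULL REFERENCES Tags(ID),
    PropertyID  TEXT NOT NULL,          -- assumed: external ID from the API partner
    CreatedDate TEXT DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (TagID, PropertyID)          -- a tag is applied to a listing at most once
);
""")

# Illustrative sample data.
conn.executemany("INSERT INTO Tags (ID, OwnerID, TagName) VALUES (?, ?, ?)",
                 [(1, 7, "waterfront"), (2, 7, "fixer-upper")])
conn.executemany("INSERT INTO TaggedItems (TagID, PropertyID) VALUES (?, ?)",
                 [(1, "MLS-100"), (2, "MLS-100"), (1, "MLS-200")])


def properties_with_all_tags(tag_ids):
    """Return PropertyIDs that carry every tag in tag_ids."""
    tag_ids = list(set(tag_ids))
    if not tag_ids:
        return []
    placeholders = ",".join("?" * len(tag_ids))
    sql = f"""
        SELECT PropertyID
        FROM TaggedItems
        WHERE TagID IN ({placeholders})
        GROUP BY PropertyID
        HAVING COUNT(DISTINCT TagID) = ?
        ORDER BY PropertyID"""
    return [r[0] for r in conn.execute(sql, (*tag_ids, len(tag_ids)))]


print(properties_with_all_tags([1, 2]))
```

If "any of the selected tags" is what you want instead, drop the GROUP BY/HAVING and use SELECT DISTINCT.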
If you want to fully normalize, you would actually use three tables:
1 Property/Listing
2 Tags
3 Cross-reference between the two
The third table creates a many-to-many relationship between the other two tables.
In this case only the third table would carry both the tag ID and the property ID.
Going with two tables is fine too, depending on how heavily it will be used, since a small string won't bloat your database much.
I would say it is strongly preferable to keep the tags in a separate table when you need to do lookups on them. Otherwise you end up with a delimited list, and then what happens if a user injects a delimiter into a tag value? And how do you plan to search the delimited list? You will constantly have to expand it into a table or use a regex, and the regex might give you false positives, since "some" will match both "some" and "something" depending on how you write your code.
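To make the delimited-list problems concrete, here is a tiny Python illustration. The comma delimiter and the sample tag values are assumptions for the example only.

```python
import re

# Hypothetical delimited tag column for one listing.
tags_column = "something,garden"

# Naive substring or regex searches report a match for the tag "some",
# even though the listing was never tagged "some".
substring_hit = "some" in tags_column
regex_hit = re.search(r"some", tags_column) is not None

# Expanding the list first gives the correct answer, but you pay for the
# split on every search.
exact_hit = "some" in tags_column.split(",")

# Delimiter injection: a single user-supplied tag containing a comma
# silently turns into two tags when the list is read back.
stored = ",".join(["pool, heated", "garden"])
read_back = stored.split(",")

print(substring_hit, regex_hit, exact_hit, read_back)
```

With a separate tags table, each tag is its own row, so neither problem comes up.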
I am interested in designing the database (well, I'm only concerned about one table really) for a site with the following requirements:
There is an items page, which lists items. items.xyz?id=t displays the item with ID t. I need the IDs of the items to be consecutive. The first item has ID 1, the second ID 2 and so on. Each item page has comments on that item.
There are other pages, such as objects, where objects.xyz?id=t displays the object with ID t. The IDs here need not necessarily be consecutive (and they can overlap with item IDs, but it's ok if you suggest something that forces them not to overlap). These also have comments.
My question is how to design the Comments table? If I have an EntityID in it that represents the page the comment should be displayed on (be it an item page or an object page), then should I make it so that the ItemID never overlaps the ObjectID by making all ObjectID start from, say, 109 and using a GUID table? (The ItemIDs increase very slowly). Is this acceptable practice?
Right now I'm doing it by having a bunch of nullable boolean fields in each comment: IsItem, IsObjectType1, IsObjectType2, ..., which allows me to know where each comment should be displayed. This isn't so bad since I only have a few objects, but it seems like an ugly hack.
What is the best way to go about this?
I see three solutions (assuming it is impossible or undesired to put Pages and Objects in one table). Either:
Tell the comment which it belongs to by giving it two columns: PageId and ObjectId.
That way you can also give these columns foreign keys to the respective tables and add proper indexes.
Introduce a table 'Entity' that has a unique id, a PageId and an ObjectId. Either column is optional, of course, but exactly one of them must be filled: not neither, and not both.
This way, you move all the potential garbage of having separate entities to this table, not polluting the Comments table, which should contain just comments. You isolate the mess.
Create a link table between Comments and Items and another table between Comments and Objects. Items and Objects are completely unrelated, and you don't have to pollute the Comments table with a lot of NULL values in multiple columns. When you create a comment, you decide if it links to an Item or an Object by inserting a link in either ItemComments or ObjectComments. Reading comments for an item or object is a matter of two simple joins.
With the Entity table (the second option), the Comments table can then contain only a single EntityId that refers to the Id in the Entity table.
The advantage of that approach is twofold:
a) You can link other things to the same table too, without much hassle.
b) You can add other kinds of Entities and they will automatically support Comments and other things you might add, as mentioned in a).
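As a rough illustration of the 'Entity' table option, here is a SQLite sketch that enforces "exactly one of PageId/ObjectId is filled" with a CHECK constraint. The Pages, Objects and Comments column lists are assumptions, and so is the helper comments_for_page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Pages   (Id INTEGER PRIMARY KEY);
CREATE TABLE Objects (Id INTEGER PRIMARY KEY);
CREATE TABLE Entity (
    Id       INTEGER PRIMARY KEY,
    PageId   INTEGER REFERENCES Pages(Id),
    ObjectId INTEGER REFERENCES Objects(Id),
    -- Exactly one of the two columns must be NULL.
    CHECK ((PageId IS NULL) + (ObjectId IS NULL) = 1)
);
CREATE TABLE Comments (
    Id       INTEGER PRIMARY KEY,
    EntityId INTEGER NOT NULL REFERENCES Entity(Id),
    Body     TEXT NOT NULL
);
""")

conn.execute("INSERT INTO Pages (Id) VALUES (1)")
conn.execute("INSERT INTO Objects (Id) VALUES (1)")
conn.execute("INSERT INTO Entity (Id, PageId, ObjectId) VALUES (1, 1, NULL)")
conn.execute("INSERT INTO Entity (Id, PageId, ObjectId) VALUES (2, NULL, 1)")
conn.execute("INSERT INTO Comments (EntityId, Body) VALUES (1, 'First!')")
conn.execute("INSERT INTO Comments (EntityId, Body) VALUES (2, 'Nice object')")

# An Entity row pointing at both a page and an object is rejected.
try:
    conn.execute("INSERT INTO Entity (PageId, ObjectId) VALUES (1, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True


def comments_for_page(page_id):
    """Return comment bodies attached to the given page via Entity."""
    rows = conn.execute("""
        SELECT c.Body
        FROM Comments c
        JOIN Entity e ON e.Id = c.EntityId
        WHERE e.PageId = ?
        ORDER BY c.Id""", (page_id,))
    return [r[0] for r in rows]


print(comments_for_page(1), rejected)
```

Page 1 and object 1 share the same numeric ID here, and the comments still don't get mixed up, because Comments only ever references Entity.Id.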
I'm trying to finalize my design of the data model for my project, and am having difficulty figuring out which way to go with it.
I have a table of users, and an undetermined number of attributes that apply to that user. The attributes are in almost every case optional, so null values are allowed. Each of these attributes is one-to-one with the user. Should I put them in the same table and keep adding columns when attributes are added (making the user table quite wide), or should I put each attribute in a separate table with a foreign key to the user table?
I have decided against using the EAV model.
Thanks!
Edit
Properties include things like marital status, gender, age, first and last name, occupation, etc. All are optional.
Tables:
USERS
USER_PREFERENCE_TYPE_CODES
USER_PREFERENCES
USER_PREFERENCES is a many-to-many table, connecting the USERS and USER_PREFERENCE_TYPE_CODES tables. This will allow you to normalize the preference type attribute, while still being flexible to add preferences without needing an ALTER TABLE statement.
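A minimal sketch of those three tables in SQLite follows. Only the table names come from the answer. Every column name, the sample data and the helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE USERS (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE USER_PREFERENCE_TYPE_CODES (
    type_code   TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE USER_PREFERENCES (
    user_id   INTEGER NOT NULL REFERENCES USERS(user_id),
    type_code TEXT NOT NULL REFERENCES USER_PREFERENCE_TYPE_CODES(type_code),
    value     TEXT NOT NULL,
    PRIMARY KEY (user_id, type_code)   -- at most one value per user per type
);
""")


def add_preference_type(code, description):
    """Adding a new kind of attribute is an INSERT, not an ALTER TABLE."""
    conn.execute("INSERT INTO USER_PREFERENCE_TYPE_CODES VALUES (?, ?)",
                 (code, description))


def set_preference(user_id, code, value):
    conn.execute("INSERT OR REPLACE INTO USER_PREFERENCES VALUES (?, ?, ?)",
                 (user_id, code, value))


def preferences_for(user_id):
    rows = conn.execute(
        "SELECT type_code, value FROM USER_PREFERENCES WHERE user_id = ?",
        (user_id,)).fetchall()
    return dict(rows)


conn.executemany("INSERT INTO USERS VALUES (?, ?)", [(1, "alice"), (2, "bob")])
add_preference_type("OCCUPATION", "Occupation")
add_preference_type("MARITAL_STATUS", "Marital status")
set_preference(1, "OCCUPATION", "nurse")
set_preference(1, "MARITAL_STATUS", "single")

print(preferences_for(1), preferences_for(2))
```

A user with no preferences simply has no rows, so optional attributes cost nothing until they are set.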
Could you give some examples of what kind of properties you'd want to add to the user table? As long as you stay below roughly 50 columns, it shouldn't be a big deal.
However, one way would be to split the data:
One table (users) for username, hashed_password, last_login, last_ip, current_ip etc., and another table (profiles) for display_name, birth_day etc.
You'd link them either via the same id property or by adding a user_id column to the other tables.
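Something along these lines, as a sketch in SQLite. The column names come from the lists above, the types and sample rows are guesses. Making profiles.user_id the primary key is one way to keep the relationship one-to-one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id              INTEGER PRIMARY KEY,
    username        TEXT NOT NULL UNIQUE,
    hashed_password TEXT NOT NULL,
    last_login      TEXT,
    last_ip         TEXT,
    current_ip      TEXT
);
-- user_id is both primary key and foreign key: at most one profile per user.
CREATE TABLE profiles (
    user_id      INTEGER PRIMARY KEY REFERENCES users(id),
    display_name TEXT,
    birth_day    TEXT
);
""")

conn.executemany(
    "INSERT INTO users (id, username, hashed_password) VALUES (?, ?, ?)",
    [(1, "alice", "hash-a"), (2, "bob", "hash-b")])
conn.execute(
    "INSERT INTO profiles (user_id, display_name, birth_day) VALUES (?, ?, ?)",
    (1, "Alice A.", "1990-01-01"))

# LEFT JOIN keeps users that have no profile row yet.
rows = conn.execute("""
    SELECT u.username, p.display_name
    FROM users u
    LEFT JOIN profiles p ON p.user_id = u.id
    ORDER BY u.id""").fetchall()

print(rows)
```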
It depends.
You need to look at what percentage of users will have that attribute. If the attribute is 'WalkedOnTheMoon' then split it out; if it is 'Sex', include it in the user table. Also consider the number of columns on the base table: a few, say 10-20, won't hurt that much.
If you have several related attributes you could group them into a common table: 'MedicalSchoolId', 'MedicalSpeciality', 'ResidencyHospitalId', etc. could be combined in UserMedical table.
Personally I would decide on whether there are natural groupings of attributes. You might put the most commonly queried in the user table and the others in a separate table with a one-to-one relationship to keep the table from being too wide (we usually call that something like User_Extended). If some of the attributes fall into natural groupings, they may call for a separate table because those attributes will usually be queried together.
In looking at the attributes, examine whether some can be combined into one column. For instance, if a user cannot simultaneously be three different things (say intern, resident, attending) but only one of them at a time, it is better to have one field and put the data into it rather than three bit fields that have to be translated. This is especially true if you will need to use a case statement with all three fields to get the information (say title) that you want in reporting. In other words, look over your attributes and see if they are truly separate or if they can be abstracted into a more general one.
I've been considering three similar database designs, but haven't come up with a very strong argument for or against any of them.
One huge table for content
content: int id, enum type, int parent_id, varchar title, text body, text data
This would assign each row a type (news, blog, etc), and have fields for common/standard/searchable data, then any non-standard or trivial data is stored as serialized xml in the data field.
One table for ids, many tables for content
ids: int id, enum tableName, int parent_id
This has one large table for ids, then every other table references this id, making it easy to have hierarchical content.
A combination of the two above, where a main table stores all common info, but unimportant data is stored in a respective table.
Naturally it's easier to keep data consistent when everything has its own table, but the above ideas make it much easier to force standardization of common fields, and make it a lot simpler to relate content to each other (especially with tagging).
Any thoughts or links would be appreciated.
I think the idea of a master lookup table with secondary tables for actual content is the best solution. Drupal has a similar structure to what you describe and it has proven quite flexible.
http://projects.contentment.org/blog/84
Drupal has a main "node" database in a single table and references specialized tables when getting the actual content.
I'm not a fan of the idea of trying to tuck everything into a table as XML. That could prove to be a performance and flexibility dog over time.
Why not just have different tables for each type of content? News, blog, etc. It seems to me that would be the best and easiest to use option.
Different tables for different types of content. If you ever need to add a new content type or additional features to an existing content type you don't have to worry about breaking the world while you do it.
So, I'm building a website and I'm going to have standard CMS tables like Article, Blog, Poll, etc. I want to let users post comments to any of these items. So my question is, do I need to create separate comments tables for each (e.g. ArticleComment, BlogComment, PollComment), or can I just make a generic Comment table that could be used with any table? What has worked for people?
Method 1: Many Comment Tables
Article {ArticleID [PK], Title, FriendlyUrl}
ArticleComment {ArticleCommentID [PK], ArticleID [FK], Comment}
Blog {BlogID, Title, PubDate, Category}
BlogComment {BlogCommentID [PK], BlogID [FK], Comment}
Poll {PollID, Title, IsClosed}
PollComment {PollCommentID [PK], PollID [FK], Comment}
Method 2: Single Comment Table
Article {ArticleID [PK], Title, FriendlyUrl}
Blog {BlogID, Title, PubDate, Category}
Poll {PollID, Title, IsClosed}
Comment {CommentID [PK], ReferenceID [FK], Comment}
I'd go with the generic comment table. It will make a lot of things much simpler. I'd also tag comments with the ID of the user who created them, or other source-identifying information (IP address, etc.). Even if you don't display this it can be very handy when you have to clean up spam, etc.
There seem to be two major ways of mapping OO-inheritance to relational databases:
Take all the attributes from the parent class and all the child classes and put them in the table, together with a 'which class is this?' field. Each object is serialized as one row in one table.
Create one table for the parent class and one table for each child class. The table for the parent class table contains the 'which class is this?' field. The child class table contains a foreign key pointing to the parent class table. Each object is serialized as one row in the parent class table and one row in the child class table.
Method one doesn't really scale well: it quickly winds up with lots of nullable fields, almost always null, and scary CHECK constraints. But it is fairly simple for small class hierarchies.
Method two scales much better, but is more work. It also results in many more tables in your schema.
I suggest taking a look at method two for your Articles/Polls/Blogs tables — to me, they sound like child tables of a Content or something. You will then have a very clear and easy place to attach comments: to the Content.
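Here is a rough sketch of method two applied to this case, in SQLite. The Content table name follows the suggestion above. The ContentType values, the choice of which columns live in the parent versus the children, and the helper functions are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Parent table: shared columns plus the 'which class is this?' field.
CREATE TABLE Content (
    ContentID   INTEGER PRIMARY KEY,
    ContentType TEXT NOT NULL CHECK (ContentType IN ('article', 'blog', 'poll')),
    Title       TEXT NOT NULL
);
-- Child tables: one row per object, keyed by the parent's ID.
CREATE TABLE Article (
    ContentID   INTEGER PRIMARY KEY REFERENCES Content(ContentID),
    FriendlyUrl TEXT
);
CREATE TABLE Poll (
    ContentID INTEGER PRIMARY KEY REFERENCES Content(ContentID),
    IsClosed  INTEGER NOT NULL DEFAULT 0
);
-- Comments attach to the parent, so every child type gets them for free.
CREATE TABLE Comment (
    CommentID INTEGER PRIMARY KEY,
    ContentID INTEGER NOT NULL REFERENCES Content(ContentID),
    Comment   TEXT NOT NULL
);
""")


def add_article(title, friendly_url):
    cur = conn.execute(
        "INSERT INTO Content (ContentType, Title) VALUES ('article', ?)", (title,))
    conn.execute("INSERT INTO Article VALUES (?, ?)", (cur.lastrowid, friendly_url))
    return cur.lastrowid


def add_poll(title):
    cur = conn.execute(
        "INSERT INTO Content (ContentType, Title) VALUES ('poll', ?)", (title,))
    conn.execute("INSERT INTO Poll (ContentID) VALUES (?)", (cur.lastrowid,))
    return cur.lastrowid


def add_comment(content_id, text):
    conn.execute("INSERT INTO Comment (ContentID, Comment) VALUES (?, ?)",
                 (content_id, text))


def comments_for(content_id):
    rows = conn.execute(
        "SELECT Comment FROM Comment WHERE ContentID = ? ORDER BY CommentID",
        (content_id,))
    return [r[0] for r in rows]


article_id = add_article("Hello world", "hello-world")
poll_id = add_poll("Tabs or spaces?")
add_comment(article_id, "Great read")
add_comment(poll_id, "Spaces, obviously")

# The foreign key rejects comments on content that does not exist.
try:
    add_comment(999, "orphan")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(comments_for(article_id), comments_for(poll_id), orphan_rejected)
```

Unlike a bare ReferenceID, Comment.ContentID here is a real foreign key, so the database itself keeps comments from pointing at nothing.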
Why do you want to keep all of your comments in the same table? Will you be treating all comments as a group? If you don't anticipate working with all of the comments on all items as a single group then there isn't really a reason to bunch them all together. Just because two entities in a database share the same attributes doesn't mean that they should be put in the same physical table.
I'd suggest just one comment table, adding an ItemID field that tells which type of item the comment is for:
Article {ArticleID [PK], Title, FriendlyUrl}
Blog {BlogID, Title, PubDate, Category}
Poll {PollID, Title, IsClosed}
Comment {CommentID [PK], ReferenceID [FK], ItemID, Comment}
Item {ItemID, Type}
The last table would contain records such as (1, 'article'), (2, 'blog'), etc.
That way you'll be able to identify which content type each comment was made for.
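A quick sketch of that layout in SQLite, to show how the lookup works. The index, the sample rows and the helper comments_for are illustrative assumptions. Note that ReferenceID cannot be declared as a real foreign key here, since the table it points to depends on ItemID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (
    ItemID INTEGER PRIMARY KEY,
    Type   TEXT NOT NULL UNIQUE
);
CREATE TABLE Comment (
    CommentID   INTEGER PRIMARY KEY,
    ReferenceID INTEGER NOT NULL,   -- ArticleID, BlogID or PollID, depending on ItemID
    ItemID      INTEGER NOT NULL REFERENCES Item(ItemID),
    Comment     TEXT NOT NULL
);
CREATE INDEX idx_comment_target ON Comment (ItemID, ReferenceID);
""")

conn.executemany("INSERT INTO Item VALUES (?, ?)",
                 [(1, "article"), (2, "blog"), (3, "poll")])
# Article 1 and blog 1 share the same ReferenceID but differ in ItemID.
conn.executemany(
    "INSERT INTO Comment (ReferenceID, ItemID, Comment) VALUES (?, ?, ?)",
    [(1, 1, "Nice article"), (1, 2, "Nice blog post")])


def comments_for(item_type, reference_id):
    """Return comments for one piece of content of the given type."""
    rows = conn.execute("""
        SELECT c.Comment
        FROM Comment c
        JOIN Item i ON i.ItemID = c.ItemID
        WHERE i.Type = ? AND c.ReferenceID = ?
        ORDER BY c.CommentID""", (item_type, reference_id))
    return [r[0] for r in rows]


print(comments_for("article", 1), comments_for("blog", 1))
```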
I am working on a system where we used the following model for comments:
Data Table(s)      Many-to-many Assoc           Comment Table
CommentableId  ->  CommentableId/CommentId  ->  Comment_Id
Not my design, but I like the flexibility. It allows us to use one comment in many different places. Since this is not trivial to implement in the UI, users don't get to see this feature (just a text box to type in a comment), but it is used when we do batch imports and legacy data processing in the database.
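Roughly, in SQLite terms, that model looks like the sketch below. The Invoice data table, the CommentAssoc name, the Body column and the sample rows are all made up for the example. Only the CommentableId/CommentId/Comment_Id key names come from the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical data table; any table with a CommentableId could play this role.
CREATE TABLE Invoice (
    CommentableId INTEGER PRIMARY KEY,
    Number        TEXT
);
CREATE TABLE Comment (
    Comment_Id INTEGER PRIMARY KEY,
    Body       TEXT NOT NULL
);
-- Many-to-many association between data rows and comments.
CREATE TABLE CommentAssoc (
    CommentableId INTEGER NOT NULL REFERENCES Invoice(CommentableId),
    CommentId     INTEGER NOT NULL REFERENCES Comment(Comment_Id),
    PRIMARY KEY (CommentableId, CommentId)
);
""")

conn.executemany("INSERT INTO Invoice VALUES (?, ?)",
                 [(1, "INV-1"), (2, "INV-2"), (3, "INV-3")])

# Batch-import style: one comment row attached to several data rows.
cur = conn.execute("INSERT INTO Comment (Body) VALUES (?)",
                   ("Imported from legacy system",))
shared_comment_id = cur.lastrowid
conn.executemany("INSERT INTO CommentAssoc VALUES (?, ?)",
                 [(1, shared_comment_id), (2, shared_comment_id)])


def comments_for(commentable_id):
    rows = conn.execute("""
        SELECT c.Body
        FROM CommentAssoc a
        JOIN Comment c ON c.Comment_Id = a.CommentId
        WHERE a.CommentableId = ?
        ORDER BY c.Comment_Id""", (commentable_id,))
    return [r[0] for r in rows]


comment_count = conn.execute("SELECT COUNT(*) FROM Comment").fetchone()[0]
print(comments_for(1), comments_for(2), comments_for(3), comment_count)
```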