Database Design for "common concepts" - sql

In the project I'm developing I've got several "common objects" that span and associate several other tables.
Think, for example, of the object "Comment". It should be applicable to many different kinds of objects: a photo, an action, an event... and it always has the same structure (author, text, insertion_time, ...).
The first solution I adopted was to have separate tables for each kind of comment: PhotoComments, EventComments and relate those to pertinent objects with a one-to-many relationship with (for example) a photo_id column.
The second (and current) solution consists of a single Comments table (each comment with its own id) plus as many "many-to-one" support tables as needed to relate those comments to, e.g., their photo.
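A minimal sketch of that second design, using Python's built-in sqlite3 module purely for illustration; the table and column names (photos, comments, photo_comments) are hypothetical. Making comment_id the primary key of the link table is one way to enforce the many-to-one direction: each comment can be attached to at most one photo, while a photo can have many comments.

```python
import sqlite3

# Hypothetical schema: one shared comments table plus one link table per
# commentable entity (only photos shown; events and actions would follow suit).
SCHEMA = """
CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE comments (
    id             INTEGER PRIMARY KEY,
    author         TEXT NOT NULL,
    body           TEXT NOT NULL,
    insertion_time TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- comment_id is the primary key, so a comment links to at most one photo.
CREATE TABLE photo_comments (
    comment_id INTEGER PRIMARY KEY REFERENCES comments(id),
    photo_id   INTEGER NOT NULL REFERENCES photos(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

def add_photo_comment(photo_id, author, body):
    """Insert the comment, then link it to its photo; return the comment id."""
    cur = conn.execute("INSERT INTO comments (author, body) VALUES (?, ?)", (author, body))
    conn.execute("INSERT INTO photo_comments (comment_id, photo_id) VALUES (?, ?)",
                 (cur.lastrowid, photo_id))
    return cur.lastrowid

def comments_for_photo(photo_id):
    """All (author, body) pairs for one photo, oldest first."""
    return conn.execute("""
        SELECT c.author, c.body
        FROM comments AS c
        JOIN photo_comments AS pc ON pc.comment_id = c.id
        WHERE pc.photo_id = ?
        ORDER BY c.id""", (photo_id,)).fetchall()

conn.execute("INSERT INTO photos (id, title) VALUES (1, 'Sunset')")
add_photo_comment(1, "alice", "Nice colours")
add_photo_comment(1, "bob", "Where was this?")
print(comments_for_photo(1))  # [('alice', 'Nice colours'), ('bob', 'Where was this?')]
```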
Are there any downsides in having such a design?

If you load the entity a comment belongs to and then look up the comments assigned to it, a single comments table with one-to-many link tables works well.
If, however, you want to find the entity a comment belongs to, the link tables become painful, because you have to look through all of them to find what a comment belongs to. Of course, you could add another column to your comments table indicating which entity type it belongs to, and then you know which link table to go to. From the sound of things your comments don't belong to multiple entities, which removes that complication.
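A sketch of the extra-column idea, again in SQLite via Python's sqlite3 with hypothetical names (entity_type, photo_comments, event_comments): the discriminator tells the application which single link table to query, instead of searching all of them.

```python
import sqlite3

# Hypothetical schema: comments carry an entity_type discriminator.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    id          INTEGER PRIMARY KEY,
    body        TEXT NOT NULL,
    entity_type TEXT NOT NULL CHECK (entity_type IN ('photo', 'event'))
);
CREATE TABLE photo_comments (comment_id INTEGER PRIMARY KEY, photo_id INTEGER NOT NULL);
CREATE TABLE event_comments (comment_id INTEGER PRIMARY KEY, event_id INTEGER NOT NULL);
INSERT INTO comments VALUES (1, 'Great shot', 'photo'), (2, 'See you there', 'event');
INSERT INTO photo_comments VALUES (1, 10);
INSERT INTO event_comments VALUES (2, 20);
""")

# Table and column names cannot be bound as SQL parameters, so they come
# from this fixed whitelist rather than from user input.
LINK_TABLES = {
    "photo": ("photo_comments", "photo_id"),
    "event": ("event_comments", "event_id"),
}

def owner_of(comment_id):
    """Return (entity_type, entity_id) for a comment, or None if not found."""
    row = conn.execute("SELECT entity_type FROM comments WHERE id = ?",
                       (comment_id,)).fetchone()
    if row is None:
        return None
    table, column = LINK_TABLES[row[0]]
    # Only one link table is consulted, chosen by the discriminator.
    found = conn.execute(f"SELECT {column} FROM {table} WHERE comment_id = ?",
                         (comment_id,)).fetchone()
    return (row[0], found[0]) if found else None

print(owner_of(1))  # ('photo', 10)
print(owner_of(2))  # ('event', 20)
```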
I'd go with the single comments table (and probably move author to a table of its own, so you can easily see which comments belong to one author without duplicating the author information in each record).

One downside I can think of comes from table locking.
Say there is a query for a photo comment. Depending on your setup, the table could lock in order to retrieve this photo comment. Then say another query comes for an action comment. If the table is locked down for the photo comment, this new query has to wait for it to complete.
Depending on the size of the table and how often queries are done for data in it, this table could become a performance bottleneck within your schema. If you don't think this could become an issue, then doing a single table can be easier to maintain. However, if there will be a lot of contention for comments, then splitting up the tables will help you out.


How can I create multiple relationships in MS Access at one time?

I feel that this should be a simple question, but I can't seem to find an answer anywhere.
I have an MS Access database where all the key fields have their proper key icon when I view the tables, but no relationships are defined. I need to create relationships between the "UnitID" key field for all the data tables. Some relationships are one-to-one and others are one-to-many (or one-to-none), but that doesn't matter; I don't need to enforce referential integrity. I just need to query the database and work with the query result tables, not add anything or change the data. All the UnitID fields have the same name.
Right now, I am just pulling up the relationship tab and dragging-and-dropping the names for each table, which takes forever. I can use the edit relationships icon that brings up a form, but it still needs to be re-opened for each table.
I am working with a government, publicly downloadable Access database. I realize Access isn't ideal, but that is the format it comes in and the program I am supposed to use for my job.
If there is a way to do it in the interface, that would be the best, since I can share it directly with others in my office who are unfamiliar with macros. But I have used VBA before for Excel and know some basic SQL. I've never used macros in Access, so I don't know what their capacities are; can this be done if there is no in-built functionality?
So are you talking about the Relationship Designer Window (Database Tools | Relationships menu option) in MS Access as pictured? With all the tables added, it takes about 5 seconds to click UnitID on one table, drag/drop to UnitID on another table and click Create. I guess it might take an hour or two to do them all?
Why must you have Relationships created at all? They don't define what Queries you can run. And if you don't need Referential Integrity, then I don't see much practical use for them anyhow.
If you can't get your Queries to run, then I would look elsewhere for the root of the problem.
By the way, once you get this problem solved, consider this: you may not need to actually create any Query result Tables if they are used as intermediate results. Since the result of a Query is a Table, anywhere the syntax mentions "Table", you can insert a Query. That is, Queries can be nested inside other Queries. I mention this because you seem to be saying that you need a whole lot of result Tables, which in itself is going to get messy, not to mention that they will take up a lot of space and, worse, will be redundant and will have to be recreated whenever your source Tables change (liable to be a maintenance nightmare).
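To illustrate nesting, here is a small sketch that uses Python's sqlite3 as a stand-in for Access (Access SQL also accepts a subquery in the FROM clause and saved queries as sources, but details differ); the tables and values are made up. It joins on UnitID without any Relationship being defined, and uses a view in place of a stored result table.

```python
import sqlite3

# Illustrative only: SQLite stands in for Access; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE units    (UnitID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE readings (UnitID INTEGER, value REAL);
INSERT INTO units VALUES (1, 'North'), (2, 'South');
INSERT INTO readings VALUES (1, 4.0), (1, 6.0), (2, 3.0);
""")

# Instead of materialising an intermediate result table, nest the query
# inside another one as a derived table. No Relationship is needed to join.
nested = conn.execute("""
    SELECT u.name, t.avg_value
    FROM units AS u
    JOIN (SELECT UnitID, AVG(value) AS avg_value
          FROM readings GROUP BY UnitID) AS t
      ON t.UnitID = u.UnitID
    ORDER BY u.UnitID
""").fetchall()
print(nested)  # [('North', 5.0), ('South', 3.0)]

# A saved query (a VIEW here) is reused like a table and always reflects
# the current source data, so nothing has to be recreated after changes.
conn.execute("""CREATE VIEW unit_averages AS
                SELECT UnitID, AVG(value) AS avg_value FROM readings GROUP BY UnitID""")
conn.execute("INSERT INTO readings VALUES (2, 5.0)")
print(conn.execute("SELECT avg_value FROM unit_averages WHERE UnitID = 2").fetchone())  # (4.0,)
```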

should this be two database tables or one?

I have a database table called interviews and the interviewer and the interviewee will both have to review how the interview went. The review will have similar fields (rating on a scale) but different questions.
Option 1 is to have them both in the same table with a 1..N relationship back to the interview table (also storing the IDs of the writer and of the person being reviewed), and to limit which fields can be filled in only at the application level.
Option 2 is to have two tables (one specifically for interviewer reviews and one specifically for interviewee reviews).
What is your opinion of the best way to model this?
Although this is dangerously close to being opinion-based, I have a comment that is too long for comments.
Handling surveys is rather complicated. Surveys change over time because questions are added, removed, and modified and answers are added, removed, and modified. And yet, people often want to use survey questions and track the results over time.
So, the data model for a survey is much more complicated than "one table" or "two tables". There are tables for surveys, questions, answers, and the relationships and values can change over time.
One big table is often a poor choice. If you index properly and write fine-tuned queries, separate tables are going to perform fine. Having multiple tables can help you in several ways, such as accessing particular data and writing easier queries.
Two review tables. Those are TWO bona fide separate entities.
Here's the deal:
Designing a single table that "works" for two different purposes can be done but it's challenging: on the database level, and on your application.
But then... a few months later new requirements come in that make it more challenging. You'll need to implement weird logic to keep using one table. Code becomes convoluted, and testing becomes a nightmare.
And then, more changes come in. It becomes unmanageable. At some point you'll realise they were different things from the start that will EVOLVE differently.
Bottom line, it's better to keep them separate from the start to avoid huge cost in the future. Even if they have near-identical columns in the beginning.
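A minimal sketch of the two-table approach in SQLite (via Python's sqlite3), with invented question columns: each review table references the interview, so who wrote the review and who is being reviewed follow from the interview row, and each table is free to evolve on its own. Using interview_id as the primary key is an assumption that allows one review of each kind per interview.

```python
import sqlite3

# Hypothetical schema: interview reviews split into two tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE interviews (
    id             INTEGER PRIMARY KEY,
    interviewer_id INTEGER NOT NULL,
    interviewee_id INTEGER NOT NULL
);
-- Written by the interviewer about the interviewee (example questions).
CREATE TABLE interviewer_reviews (
    interview_id    INTEGER PRIMARY KEY REFERENCES interviews(id),
    technical_skill INTEGER CHECK (technical_skill BETWEEN 1 AND 5),
    communication   INTEGER CHECK (communication BETWEEN 1 AND 5)
);
-- Written by the interviewee about the interview (different example questions).
CREATE TABLE interviewee_reviews (
    interview_id    INTEGER PRIMARY KEY REFERENCES interviews(id),
    courtesy        INTEGER CHECK (courtesy BETWEEN 1 AND 5),
    process_clarity INTEGER CHECK (process_clarity BETWEEN 1 AND 5)
);
""")

def try_insert(sql, params):
    """Run an INSERT; return False if the database rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

try_insert("INSERT INTO interviews VALUES (?, ?, ?)", (1, 100, 200))
try_insert("INSERT INTO interviewer_reviews VALUES (?, ?, ?)", (1, 4, 5))
try_insert("INSERT INTO interviewee_reviews VALUES (?, ?, ?)", (1, 5, 3))
print(try_insert("INSERT INTO interviewer_reviews VALUES (?, ?, ?)", (1, 3, 3)))  # False: one per interview
try_insert("INSERT INTO interviews VALUES (?, ?, ?)", (2, 101, 201))
print(try_insert("INSERT INTO interviewee_reviews VALUES (?, ?, ?)", (2, 9, 3)))  # False: rating out of range
```

Adding a question to only one side later is then a change to a single table, with no application-level rules about which fields apply to whom.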

Is there a database design pattern name for reducing duplicate join table data?

I have two tables with a join table to allow a many-to-many relationship.
It's a very familiar design pattern. It indicates which Branches each Member has access to.
As the number of members and branches increases I end up with a lot of data in the join table that is duplicated across members. Members tend to have access to the same groups of Branches as other Members.
So I'm looking at normalizing my data by creating a MemberProfile table that is effectively immutable. And rather than creating MemberBranch records for every Member, I check for a matching MemberProfile, use it if it already exists, or create one if it doesn't.
The idea being if I have a million Members with only a hundred access profiles this will save me a lot of space in my database.
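A rough sketch of the find-or-create step, assuming hypothetical table names (member_profiles, profile_branches, members) and SQLite via Python's sqlite3. The lookup is an exact-set match: a profile matches when it has as many branches as requested and all of them are in the requested set. It ignores concurrency; two sessions could create the same profile at once unless the lookup and insert run in one serialized transaction.

```python
import sqlite3

# Hypothetical schema: members point at a shared, immutable profile.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member_profiles (id INTEGER PRIMARY KEY);
CREATE TABLE profile_branches (
    profile_id INTEGER NOT NULL REFERENCES member_profiles(id),
    branch_id  INTEGER NOT NULL,
    PRIMARY KEY (profile_id, branch_id)
);
CREATE TABLE members (
    id         INTEGER PRIMARY KEY,
    profile_id INTEGER NOT NULL REFERENCES member_profiles(id)
);
""")

def find_or_create_profile(branch_ids):
    """Return the id of the profile with exactly this branch set, creating it if needed."""
    wanted = sorted(set(branch_ids))
    if not wanted:
        raise ValueError("a profile needs at least one branch")
    marks = ",".join("?" * len(wanted))
    # Same number of branches AND every branch in the wanted set => same set.
    row = conn.execute(
        f"""SELECT profile_id FROM profile_branches
            GROUP BY profile_id
            HAVING COUNT(*) = ? AND SUM(branch_id IN ({marks})) = ?""",
        [len(wanted), *wanted, len(wanted)],
    ).fetchone()
    if row:
        return row[0]
    profile_id = conn.execute("INSERT INTO member_profiles DEFAULT VALUES").lastrowid
    conn.executemany("INSERT INTO profile_branches VALUES (?, ?)",
                     [(profile_id, b) for b in wanted])
    return profile_id

def add_member(member_id, branch_ids):
    conn.execute("INSERT INTO members VALUES (?, ?)",
                 (member_id, find_or_create_profile(branch_ids)))

add_member(1, [10, 20])
add_member(2, [20, 10])      # same set, so the existing profile is reused
add_member(3, [10, 20, 30])  # different set, so a new profile is created
print(conn.execute("SELECT COUNT(*) FROM member_profiles").fetchone())  # (2,)
```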
I'm happy that it all works and that the development effort is worth it.
My question is "Is this a standard database design pattern, and if so, what is it called?"
EDIT: It's been pointed out that this is compressing the data not normalizing it. Which is the intent behind the design.
Unless your many:many table is always the join of particular other base tables, one is not normalizing. You aren't normalizing here. Normalization does not introduce new column names. It just rearranges the current ones among different base tables.
You are just compressing/encoding your data. There is not necessarily any benefit in this, since now some queries and updates will be slower although your database is smaller. (You have reported that it is worth it in your case.)
I understand you'd like to put a label on that precise transformation, but unfortunately, there aren't many books that discuss database design or refactoring patterns. One of the few is Refactoring Databases by Scott Ambler and Pramod Sadalage, published in Martin Fowler's signature series (Fowler, whom you may know for his work on analysis patterns, also has a great blog, worth following!). In that book, the authors present a set of refactoring patterns that can be applied to databases and put names on common database transformations, including the one you have presented, which they call Split Table.
Split Table. Vertically split (e.g. by columns) an existing table into one or more tables.
A catalog of the database refactorings presented in that book is available online.
Hi, I don't know about a pattern name, but I've used the same principle before.
To keep this performing well, add a checksum to MemberProfile based on the profile's branches; that way, looking up an existing profile is simple and fast.
But do remember that the checksum is not necessarily unique, in case of collisions you will still have to check the branches, but only for the profiles sharing the same checksum.
Cleanup can be a scheduled task that does nothing more than delete the profiles without members.
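The checksum and cleanup suggestions might look like this in SQLite via Python's sqlite3. The checksum function (the first 4 bytes of a SHA-256 over the sorted branch ids) and the table names are arbitrary choices for illustration; because collisions are possible, every candidate profile's branch list is still compared.

```python
import hashlib
import sqlite3

# Hypothetical schema: an indexed checksum column narrows the profile lookup.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member_profiles (id INTEGER PRIMARY KEY, checksum INTEGER NOT NULL);
CREATE INDEX ix_profiles_checksum ON member_profiles(checksum);
CREATE TABLE profile_branches (profile_id INTEGER NOT NULL, branch_id INTEGER NOT NULL,
                               PRIMARY KEY (profile_id, branch_id));
CREATE TABLE members (id INTEGER PRIMARY KEY, profile_id INTEGER NOT NULL);
""")

def branch_checksum(branch_ids):
    """Order-independent 32-bit checksum of a branch set."""
    key = ",".join(str(b) for b in sorted(set(branch_ids)))
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")

def find_or_create_profile(branch_ids):
    wanted = sorted(set(branch_ids))
    checksum = branch_checksum(wanted)
    candidates = conn.execute("SELECT id FROM member_profiles WHERE checksum = ?",
                              (checksum,)).fetchall()
    # Only profiles sharing the checksum are compared branch by branch.
    for (profile_id,) in candidates:
        stored = [b for (b,) in conn.execute(
            "SELECT branch_id FROM profile_branches WHERE profile_id = ? ORDER BY branch_id",
            (profile_id,))]
        if stored == wanted:
            return profile_id
    profile_id = conn.execute("INSERT INTO member_profiles (checksum) VALUES (?)",
                              (checksum,)).lastrowid
    conn.executemany("INSERT INTO profile_branches VALUES (?, ?)",
                     [(profile_id, b) for b in wanted])
    return profile_id

def cleanup_unused_profiles():
    """Body of the scheduled task: drop profiles that no member uses."""
    conn.execute("DELETE FROM profile_branches "
                 "WHERE profile_id NOT IN (SELECT profile_id FROM members)")
    conn.execute("DELETE FROM member_profiles "
                 "WHERE id NOT IN (SELECT profile_id FROM members)")

p1 = find_or_create_profile([3, 1, 2])
p2 = find_or_create_profile([1, 2, 3])  # same set, found via the checksum
conn.execute("INSERT INTO members VALUES (1, ?)", (p1,))
p3 = find_or_create_profile([4])        # never assigned to a member
cleanup_unused_profiles()               # removes p3's profile
```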

Store files and comments for a web application. Table design

I have a web application with several entities (tables). Each one has its own CRUD pages.
For some of them, I'd like to add the ability to add comments and attach files.
I was thinking of two scenarios.
One table for all comments/files - table would have some id for the entity and the particular record.
For each entity a separate comments/files table.
The files would be stored on disk in a directory. The table would hold the file name and some additional info.
In terms of application design, having one unique table for all comments seems to make sense. In terms of application code, it means the same SQL will be reused for all entities. It's the 'classical way' used by most applications, extending to the same active records and controllers handling comments and attachments for all objects.
In terms of SQL, the second solution could be useful in some databases, like MySQL, to get more benefit from the query cache (note that MySQL 8.0 removed the query cache entirely). Every comment/attachment added with the first solution would evict from the cache all queries touching the comments table. With individual tables, a comment on one entity would not invalidate queries on other entities. But you would also need more file descriptors and a bigger table cache... so choosing this solution would require a decision based on a real-life, precise case, where you could compare the benefits in database access speed. And when you add new entities, you'll certainly find your each-entity-has-its-own-comment-table solution tedious, whereas with the first solution things could have been automated.
It's a tradeoff. With a single comments table, you get a simple, DRY (don't repeat yourself) schema, but you don't get foreign key constraints and thus no cascade deletion. Thus, if you delete an entity with comments, you must also remember to delete the comments!
If you go with multiple comment tables, you get FK constraints and cascade deletion, but you have a "wet" schema (you are repeating yourself). For example, each comment table might have a commentbody column. If you change that column definition, you have to change it in every comment table!
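A small SQLite demonstration of that tradeoff (via Python's sqlite3, hypothetical tables): deleting an invoice cascades to its row in the per-entity table, but leaves an orphan in the shared table, because entity_id points at a different table depending on entity_name and so cannot be declared as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY);

-- Option 1: one shared table; entity_id cannot be a real foreign key.
CREATE TABLE comments (
    id          INTEGER PRIMARY KEY,
    entity_name TEXT    NOT NULL,
    entity_id   INTEGER NOT NULL,
    body        TEXT    NOT NULL
);

-- Option 2: one table per entity, with a real FK and cascade deletion.
CREATE TABLE invoice_comments (
    id         INTEGER PRIMARY KEY,
    invoice_id INTEGER NOT NULL REFERENCES invoices(id) ON DELETE CASCADE,
    body       TEXT    NOT NULL
);

INSERT INTO invoices VALUES (1);
INSERT INTO comments (entity_name, entity_id, body) VALUES ('invoices', 1, 'Paid late');
INSERT INTO invoice_comments (invoice_id, body) VALUES (1, 'Paid late');
""")

conn.execute("DELETE FROM invoices WHERE id = 1")

def count(table):
    # table comes from the literals below, never from user input
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

print(count("invoice_comments"))  # 0: removed by ON DELETE CASCADE
print(count("comments"))          # 1: orphan, the application must delete it
```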
One interesting solution for a DRY-er schema could involve table inheritance (see http://www.postgresql.org/docs/9.0/interactive/ddl-inherit.html) but please read section 5.8.1. Caveats, as there are some "gotchas" regarding indexing, at least in postgres.
Either way, kudos to you for thinking carefully about your database design!

How do I structure my database so that two tables that constitute the same "element" link to another?

I read up on database structuring and normalization and decided to remodel the database behind my learning thingie to reduce redundancy.
I have different types of entries that can be learned. Gap texts/cloze tests (one text, many gaps) and simple known-unknown (one question, one answer) types.
Now I'm in a bit of a pickle:
gaps need exactly the same columns in the user table as question-answer types
but they need fewer columns than question-answer types (all that info is in the clozetests table)
I'm wishing for a "magic" foreign key that can point both to the gap and the terms table. Of course, their ids would overlap, though. I don't like having both a term_id and a gap_id in user_terms; that seems inelegant (but it's the most elegant thing I can come up with after googling for a while, not knowing what name this pickle goes by).
I don't want a user_gaps analogue to user_terms, because then I'd be in the same pickle when it comes to the table user_terms_answers.
I put up this cardboard cutout collage of my schema. I didn't remove the stuff that isn't relevant for this question, but I can do that if anyone's confusion can be remedied like that. I think it looks super tidy already. Tidier than my mental concept of this at least.
Did I say any help would be greatly appreciated? Answerers might find themselves adulated for their wisdom.
Background story if you care, it's not really relevant to the question.
Before remodeling I had them all in one table (because I added the gap texts in a hurry), so that the gap texts were "normal" items without answers, while the gaps were items without questions. The application linked them together.
Edit
I added an answer after SO coughed up some helpful posts. I'm not yet 100% satisfied. I'm now trying to write views for common queries against this setup, and again I feel like I'll have to pull into the application logic something that is database turf.
As mentioned in the comment, it is hard to answer without knowing the whole story. So, here is a story and a model to match. See if you can adapt this to your example.
School of (foreign) languages offers exams for several levels of language proficiency. The school maintains many pre-made tests for each level of each language (LangLevelTestNo).
Each test contains several (many) questions. Each question can be simple or of the cloze-text type. Correct answers are stored for each simple question. Correct terms are stored for each gap of each cloze-text question.
A student can take an exam for a language level and is presented with one of the pre-made tests. For each student exam, an exam form is maintained, which stores the student's answers to each question of the exam. Like a question, an answer may be of the simple or the cloze-text type.
After I edited my question, Stack Overflow started relating the right questions to me.
I knew this was a common problem, but I really couldn't find it, just couldn't come up with the right search terms, I guess.
The following threads address similar problems, and I'll try to apply their logic to my own design. They all propose adding a higher-level description for like items (in my case, terms and gaps). That makes sense and reflects the logic behind my application.
Relation Database Design
Foreign Key on multiple columns in one of several tables
Foreign Key refering to primary key across multiple tables
And this good person illustrates how to retrieve the data once it's broken up across tables. He also clued me in to the keyword class table inheritance, so now I know what to google.
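For readers following along, a minimal class table inheritance sketch in SQLite (via Python's sqlite3). This is not the asker's edited schema; the names items, terms, gaps, cloze_tests and user_items are hypothetical. Terms and gaps share the primary key of a common items table, so user_items needs only a single foreign key, and the same trick would apply to a table like user_terms_answers. The query shows one way to reassemble items with LEFT JOINs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: every learnable thing is an item.
CREATE TABLE items (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('term', 'gap'))
);
CREATE TABLE cloze_tests (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
-- Subtypes reuse the item's id as their own primary key.
CREATE TABLE terms (
    item_id  INTEGER PRIMARY KEY REFERENCES items(id),
    question TEXT NOT NULL,
    answer   TEXT NOT NULL
);
CREATE TABLE gaps (
    item_id       INTEGER PRIMARY KEY REFERENCES items(id),
    cloze_test_id INTEGER NOT NULL REFERENCES cloze_tests(id),
    position      INTEGER NOT NULL,
    solution      TEXT NOT NULL
);
-- The "magic" foreign key: one column that works for terms and gaps alike.
CREATE TABLE user_items (
    user_id  INTEGER NOT NULL,
    item_id  INTEGER NOT NULL REFERENCES items(id),
    due_date TEXT,
    PRIMARY KEY (user_id, item_id)
);
INSERT INTO items VALUES (1, 'term'), (2, 'gap');
INSERT INTO cloze_tests VALUES (1, 'Der Hund ___ laut.');
INSERT INTO terms VALUES (1, 'dog', 'Hund');
INSERT INTO gaps  VALUES (2, 1, 1, 'bellt');
INSERT INTO user_items VALUES (7, 1, '2024-01-01'), (7, 2, '2024-01-02');
""")

# Reassemble each item from whichever subtype table holds it.
rows = conn.execute("""
    SELECT i.id, i.kind,
           COALESCE(t.question, c.text)   AS prompt,
           COALESCE(t.answer, g.solution) AS expected
    FROM user_items AS ui
    JOIN items AS i            ON i.id = ui.item_id
    LEFT JOIN terms AS t       ON t.item_id = i.id
    LEFT JOIN gaps AS g        ON g.item_id = i.id
    LEFT JOIN cloze_tests AS c ON c.id = g.cloze_test_id
    WHERE ui.user_id = ?
    ORDER BY i.id
""", (7,)).fetchall()
print(rows)  # [(1, 'term', 'dog', 'Hund'), (2, 'gap', 'Der Hund ___ laut.', 'bellt')]
```

A view over that query is one way to keep the reassembly in the database rather than in application logic.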
I'll post back with my edited schema once I've applied this. It does seem more elegant like this.
Edited schema