When to use one-to-many vs many-to-many? - sql

I am quite confused about when to use one-to-many versus many-to-many. Take user roles as an example. In that situation many-to-many seems to have the advantage of reducing data size, because each row just points to an integer, saving maybe 1-10 bytes per row. For example, the role 'senior developer' with id 7 takes 2 bytes as a smallint instead of 16 bytes as a char. On the other hand, it adds another table. If many-to-many is used in such a situation, why should one-to-many exist at all when many-to-many has this advantage? Isn't it always better to use many-to-many?
Users table
id
username
password
Users_Roles table
user_id
role
Versus
Users table
Users_Roles table
user_id
role_id
Roles table
id
role

You're prematurely optimizing. A few integers here and there are unlikely to impact your data size or your performance. If they do, the schema can be changed later, but usually there is much bigger bloat to be concerned with.
One-to-many vs many-to-many is not an optimizing issue. It's about the relationship between the tables.
If one and only one user can have a role, use one-to-many.
If many users can have the same role, use many-to-many.
For example, if you have an admin role and there can ever only be one admin user, use one-to-many. If there can be many admins, use many-to-many. You have to decide what the relationship is between users and roles.
Note: Use bigints for ids. 4 billion might seem like a lot, but it comes up fast and one of the worst things that can happen is to run out of IDs.

This is a data modeling question, and its answer comes out of, and is dictated by, the analysis of the relationships between the entities involved. You have identified two entities you want to store data about, users and roles. Now describe their relationship in plain spoken language, looking at the relationship from both directions.
Can a user have more than one role? Can a role be held by more than one user? If the answer to both is yes, then it's a many-to-many relationship. Take the primary keys of both entities and bring them together as the composite primary key of an associative table. That table may not have any other attributes unless there is data about the user/role relationship itself that needs to be captured.
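The associative table described above can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a definitive schema: the table and column names follow the question's sketch, while the sample rows and the helper functions `roles_of` and `try_duplicate` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
CREATE TABLE roles (id INTEGER PRIMARY KEY, role TEXT NOT NULL UNIQUE);
-- Associative table: its primary key is the pair of foreign keys.
CREATE TABLE users_roles (
    user_id INTEGER NOT NULL REFERENCES users(id),
    role_id INTEGER NOT NULL REFERENCES roles(id),
    PRIMARY KEY (user_id, role_id)
);
-- Sample data (hypothetical).
INSERT INTO users (id, username) VALUES (1, 'alice'), (2, 'bob');
INSERT INTO roles (id, role) VALUES (7, 'senior developer'), (8, 'admin');
INSERT INTO users_roles VALUES (1, 7), (1, 8), (2, 7);
""")


def roles_of(username):
    """Return the role names held by one user, sorted alphabetically."""
    rows = conn.execute(
        "SELECT r.role FROM roles r "
        "JOIN users_roles ur ON ur.role_id = r.id "
        "JOIN users u ON u.id = ur.user_id "
        "WHERE u.username = ? ORDER BY r.role",
        (username,),
    ).fetchall()
    return [row[0] for row in rows]


def try_duplicate():
    """Try to give alice the same role twice; the composite key rejects it."""
    try:
        conn.execute("INSERT INTO users_roles VALUES (1, 7)")
        return True
    except sqlite3.IntegrityError:
        return False
```

A user can hold several roles and a role can be held by several users, while the composite primary key keeps any single user/role pairing from being stored twice.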
However, what if you are modeling entities of invoices and line items? Can an invoice have more than one line item? Yes. Can an instance of a line item on an invoice belong to more than one invoice? No (note I'm modeling a line item, not a product or part number as a line item could include special pricing for this invoice, color, logo, etc). So this is clearly a one to many relationship in the direction of one invoice can have many line items.
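For contrast, a one-to-many relationship needs no associative table: the "many" side simply carries a foreign key to the "one" side. The sketch below uses Python's sqlite3 module; the column names, the sample rows, and the helpers `line_count` and `orphan_rejected` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
-- Each line item belongs to exactly one invoice via invoice_id.
CREATE TABLE line_items (
    invoice_id INTEGER NOT NULL REFERENCES invoices(id),
    line_no INTEGER NOT NULL,
    description TEXT NOT NULL,
    unit_price_cents INTEGER NOT NULL,
    PRIMARY KEY (invoice_id, line_no)
);
-- Sample data (hypothetical).
INSERT INTO invoices VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO line_items VALUES
    (1, 1, 'widget, blue, with logo', 1250),
    (1, 2, 'widget, red', 1000),
    (2, 1, 'widget, red', 900);
""")


def line_count(invoice_id):
    """Count the line items on one invoice."""
    return conn.execute(
        "SELECT COUNT(*) FROM line_items WHERE invoice_id = ?", (invoice_id,)
    ).fetchone()[0]


def orphan_rejected():
    """A line item pointing at a missing invoice violates the foreign key."""
    try:
        conn.execute("INSERT INTO line_items VALUES (99, 1, 'orphan', 100)")
        return False
    except sqlite3.IntegrityError:
        return True
```

Note that the same description ('widget, red') appears on two invoices at different prices, which is why the line item rather than the product is the entity being modeled here.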
For more information, do some searching on data modeling, it will be a huge help in your database design efforts and you will end up with a better design for more efficient queries by designing the database correctly.
Looks like Schwern and I were typing at the same time :-)

Related

What is the optimal relational database design for storing an unknown number of similar but unique entities

The database we are designing allows users to authenticate with multiple 3rd party services, mostly social media (twitter, facebook, etc). There will be an unknown and growing number of these services. Each service requires a unique set of data for authentication that is not standard with the other services.
One user may authenticate many services, but they may only authenticate with one of each type of service.
Possible Solutions:
A) The most direct solution to this issue is to simply add a column for each service to the user table which contains the JSON authentication for that service. However, this violates normalization by leaving a large number of nulls in the database. What happens when there are 50 of these integrations for instance?
B) Each service gets its own table in the database. JSON is no longer needed as each field can be properly described. Then a lookup table is needed "user_has_service" for each service. This is a table which contains only two foreign keys, one for the user and one for the service, linking them together. This option seems the most correct but is very inefficient and will take many operations to determine what services a user has, increasing with the number of services. I believe also in this case, the ID field for the lookup table would need to be some kind of hash of the user and service together so that duplicate inserts are not possible.
Not at all a database expert and I have been grappling with this one for quite a while. Any thoughts?
A) The most direct solution ... JSON
You are right, option A is grossly incorrect. It breaks Codd's First Normal Form, thus it is not Relational. NULL in the database is an indication of incomplete Normalisation, which leads to complex SQL code. To be avoided at all costs.
similar but unique
To be clear, that they are unique to the Service is true. That {LoginName; UserName; Email; UserId; etc} are all similar is true in the implementation sense only, not in the data.
I may need to sketch this out.
That is a great idea. A visual data model is far more effective, because (a) the mind can comprehend it much better than text, and (b) therefore work out details; contradictions; missing bits; etc. Much easier to progress each iteration visually, than with text.
Second, we have had visual modelling tools since 1987 (1984 for a closed group), which have been made a Standard in 1993. Hopefully you appreciate that a standard-compliant model is better than a home-grown or corporate-supplied one. It displays all technical details rather than a small subset.
Is there a name for this strategy
It is plain old Relational Data Modelling, which includes Normalisation (ensuring compliance with Codd's Normal Forms, as opposed to the insanity of implementing the NFs in fragmented progressive steps).
Obstacle
One problem that needs to be understood and eliminated is this. The "theoreticians" market and propagate 1960's Record Filing Systems under the banner of "relational". That is characterised by a Record ID in every file. That method ensures the database remains physical, not logical, the very thing that Codd overcame with his Relational Model: a database that is logical and therefore extremely easy to navigate, by any querying party, current; planned; or unplanned.
The essential difference between 1960's RFS and post-1970 Relational Databases is:
whereas the RFS maintains references between Files by physical pointer (Record ID), the Relational Database maintains references between Tables by logical Key.
A logical Key is "made up from the data" as per Codd
(A datum that is fabricated by the system is not "made up from the data")
(Use of the SQL command PRIMARY KEY does not magically anoint the datum with the properties and qualities of a Relational Key: if you use PRIMARY KEY RecordID you are in 1960's physical paradigm, not the post-1970 Relational paradigm)
Logical Keys provide Relational Integrity (as distinct from Referential Integrity, which is an ordinary function of SQL), which is far superior to that obtained by 1960's RFS
As well as far superior Speed and Power (far fewer JOINs, and smaller sets)
Relational Database
Therefore I will give you the answer as a Relational Data Model, as per Codd.
Just one example of Relational Integrity:
the ServiceProperty FK elements in UserServiceProperty are constrained to the PK (a particular combination) in ServiceProperty
a UserServiceProperty row with Facebook.Email is prevented
A Record ID based 1960's RFS that the "theoreticians" promote as "relational" cannot do that, various errors such as that one are allowed.
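Since the data model itself is not reproduced here, the following is only a rough sketch of the kind of constraint being described, using Python's sqlite3 module. The table names ServiceProperty and UserServiceProperty come from the answer; the Service table, all column names, the sample rows (in which Facebook has no Email property), and the helper `try_insert` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Service (ServiceName TEXT NOT NULL PRIMARY KEY);
-- Which properties each service requires; the key is made up from the data.
CREATE TABLE ServiceProperty (
    ServiceName TEXT NOT NULL REFERENCES Service(ServiceName),
    PropertyName TEXT NOT NULL,
    PRIMARY KEY (ServiceName, PropertyName)
);
-- A user's value for a property; the composite FK only admits
-- (service, property) combinations that exist in ServiceProperty.
CREATE TABLE UserServiceProperty (
    UserName TEXT NOT NULL,
    ServiceName TEXT NOT NULL,
    PropertyName TEXT NOT NULL,
    Value TEXT NOT NULL,
    PRIMARY KEY (UserName, ServiceName, PropertyName),
    FOREIGN KEY (ServiceName, PropertyName)
        REFERENCES ServiceProperty (ServiceName, PropertyName)
);
-- Sample data (hypothetical): Facebook has UserId only, Twitter has Email.
INSERT INTO Service VALUES ('Facebook'), ('Twitter');
INSERT INTO ServiceProperty VALUES ('Facebook', 'UserId'), ('Twitter', 'Email');
""")


def try_insert(user, service, prop, value):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO UserServiceProperty VALUES (?, ?, ?, ?)",
            (user, service, prop, value),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With a single surrogate id on each table, nothing in the schema would stop a Facebook row from carrying an Email value; here the composite foreign key rejects it.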
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993
My IDEF1X Introduction is essential reading for beginners.
The IDEF1X Anatomy is a refresher for those who have lapsed.
If you have trouble reading the Predicates directly from the Data Model, let me know and I will produce them in text form.
Please feel free to ask questions, the more specific the better.
You could set up:
a referential table called services to list all the available services, with columns like service_id (primary key), service_name and descriptions and so on. Each service is represented as one record in this table.
a table called services_properties to store the properties of the services; this table has 3 columns: service_id (foreign key to the primary key of services), property_name and property_value. A unique constraint can be set up on service_id/property_value tuples to avoid duplicates. Each service has several records in the services_properties table. This flexible structure lets you store as many different properties as needed for each service without creating a new table for each service.
a mapping table called user_services, that relates users to services. Columns would be service_id and user_id, as foreign keys to the primary keys of the services table and users table. You can query this table to easily list the services subscribed by each user.
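The three tables listed above might look roughly like this, sketched with Python's sqlite3 module. The table and column names follow the answer; the users table, the sample rows, the placeholder property values, and the helpers `services_of` and `link` are assumptions. One further assumption: the unique constraint is placed on service_id/property_name here, so that a service cannot define the same property twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
CREATE TABLE services (
    service_id INTEGER PRIMARY KEY,
    service_name TEXT NOT NULL UNIQUE,
    description TEXT
);
CREATE TABLE services_properties (
    service_id INTEGER NOT NULL REFERENCES services(service_id),
    property_name TEXT NOT NULL,
    property_value TEXT,
    UNIQUE (service_id, property_name)  -- assumption: one row per property
);
-- The composite key lets a user link each service at most once,
-- so no hash-based id is needed to block duplicate inserts.
CREATE TABLE user_services (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    service_id INTEGER NOT NULL REFERENCES services(service_id),
    PRIMARY KEY (user_id, service_id)
);
-- Sample data (hypothetical placeholder values).
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO services VALUES (1, 'twitter', NULL), (2, 'facebook', NULL);
INSERT INTO services_properties VALUES
    (1, 'property_a', 'placeholder-1'),
    (1, 'property_b', 'placeholder-2'),
    (2, 'property_a', 'placeholder-3');
INSERT INTO user_services VALUES (1, 1), (1, 2), (2, 1);
""")


def services_of(username):
    """List the service names a user has linked, in one query."""
    rows = conn.execute(
        "SELECT s.service_name FROM services s "
        "JOIN user_services us ON us.service_id = s.service_id "
        "JOIN users u ON u.user_id = us.user_id "
        "WHERE u.username = ? ORDER BY s.service_name",
        (username,),
    ).fetchall()
    return [row[0] for row in rows]


def link(user_id, service_id):
    """Link a user to a service; return False if the link already exists."""
    try:
        conn.execute("INSERT INTO user_services VALUES (?, ?)", (user_id, service_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

Listing a user's services stays a single query no matter how many services are added later, which addresses the efficiency worry raised about option B.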

SQL - Contacts, Companies DB Design

I'm working on a db to manage customer data for a small company. The customers are companies and institutions (schools, etc.) and of course people/contacts. There will be a lot more scope added in time, but right now I'm looking for any input on the core design itself and whether there's anything I'm missing here that could cause issues down the road. The image doesn't include the additional lookup tables for items like country, teltype, etc. I'm kinda worried that I've over-normalised it and that it is going to make the queries much more complicated in the long term. Any input appreciated.
Update - 13/12/2016
I have since created a superclass in my structure called entity, which helps me merge all 3 into one as such. I'm still working on the rest as it has grown quite a lot today, so again any input is appreciated.
The first impression I get looking at the diagram is that you have over-normalised the data (unless that was your aim).
Consider the Company <-> Telephone relationship you have created:-
Creating a relationship like this reads:
A Company can have one to many Telephone Numbers
A Telephone Number can belong to one to many Companies
Evaluating this for a minute; is it likely that a telephone number is shared by more than one Company in your structure? (real-world suggests it wouldn't)
Expanding upon this, I believe the main reason you may have headed down this course would be to allow the same telephone number to apply to one or many contacts as well as a business?
Personally, in my experience, I would suggest that duplicating the data (telephone number) may be easier to maintain and manage from a development perspective. This will make your data structure and application logic less complex, and should make searching less taxing on the system.
However, it will also mean you could end up with stale data, for example, if all of your contacts used the company phone number and the company number was updated, all of the contacts data would now need updating too.
One way round that from an application perspective would be to display the company number with a company contact, then you would not need to duplicate data.
Here is an example of a de-normalised view of this relationship:
You could also apply this to email addresses, where the same concept applies.
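The workaround mentioned above, showing the company's number for a contact who has none of their own, could be sketched like this with Python's sqlite3 module. All table, column, and view names, the sample rows, and the helpers `phone_of` and `change_company_phone` are assumptions made for illustration, not the poster's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    telephone TEXT
);
CREATE TABLE contact (
    id INTEGER PRIMARY KEY,
    company_id INTEGER REFERENCES company(id),
    name TEXT NOT NULL,
    telephone TEXT  -- NULL means "fall back to the company number"
);
-- The view resolves the number to display, so nothing is duplicated.
CREATE VIEW contact_phone AS
SELECT ct.name AS name,
       COALESCE(ct.telephone, co.telephone) AS telephone
FROM contact ct
LEFT JOIN company co ON co.id = ct.company_id;
-- Sample data (hypothetical).
INSERT INTO company VALUES (1, 'Acme', '555-0100');
INSERT INTO contact VALUES (1, 1, 'Ann', NULL), (2, 1, 'Bob', '555-0199');
""")


def phone_of(name):
    """Return the number to display for a contact."""
    row = conn.execute(
        "SELECT telephone FROM contact_phone WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def change_company_phone(company_id, new_number):
    """Update the company number in one place."""
    conn.execute(
        "UPDATE company SET telephone = ? WHERE id = ?", (new_number, company_id)
    )
```

Changing the company number then shows up for every contact that relies on it, while contacts with their own number are unaffected, so the stale-data problem does not arise.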
Do you need to have bridge tables for telephone, email, and location? If there is no need to have multiple sites, e-mails, or telephone numbers; you can add the attributes to the primary entity.

Database Design - Linking two users

I need some help with some database design. I am a FE developer by trade and have only dealt with very basic DBs. I am just starting to branch out into more "advanced" web apps and would like some pointers in the right direction for the schema.
What I am looking for is an account system that can basically link two accounts. I will give you the scenario I had imagined off the top of my head.
A user signs up in the regular way, providing just name, email and password to keep this question simple. After they have signed up, the user can then link their account to another user by entering the other's email and having the link accepted by the other user.
Once this link has been created, the two users can CRUD tasks together.
The bit I am struggling with is how to create the link between the two users. I obviously have my users table.
USERS:
id
name
email
password
Now, I believe I need to create another table that holds the two linked accounts, that has its own unique ID that we can use to CRUD tasks. Something like:
LINKED_USERS:
id
user1id
user2id
verified
TASKS
id
lu_id (FK, Linked_Users id)
// Any other fields for the two combined here.
Is this correct? If so, how would I set up the relationships between the users table and the linked_users table? This is the bit that is confusing me because I need the relationship to reference two user IDs. Say I wanted to display the names for user1id and user2id, how would the relationship work? Just really need a bit of help wrapping my head around this.
I hope this makes sense, if you need any more information I will just edit the question.
Thanks for any help in advance!
Your question is not entirely clear as to the requirements. My design assumes the following about requirements:
People are linked together in pairs
Each pair owns zero, one, or more task records.
Each person can be assigned to zero, one, or more pairs. If not currently, then perhaps over time (past pairs, current pairs, future pairs).
I think your confusion revolves around the pairing. Instead think of it as teams. The fact that a team can have at most two people is beside the point; 2, 10, 100 does not matter because any number is handled the same way. That way is a Team table that has members assigned. Each person can belong to one or more teams, and each team can have one or more members. That means we have a Many-To-Many relationship between Person and Team. A many-to-many is a problem in relational design that is always solved by adding a third intermediate or "bridge" table. In this case, that bridge table is membership_.
Each team owns zero, one, or more tasks. Each task is owned by one and only one team. This is a simple One-To-Many relationship between Team and Task.
If these assumptions and constraints are correct, then you would have the following table design in a relational database such as Postgres.
I added a start_ and stop_ pair of fields on membership_ to show the idea that people may have past, present, or future assignments to teams.
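As the original diagram is not reproduced here, the following is only an approximate sketch of such a design, written with Python's sqlite3 module rather than Postgres. The names membership_, start_ and stop_ come from the answer; the remaining table and column names, the sample rows, and the helpers `members_of` and `tasks_for` are assumptions. The two-person limit per team is not enforced in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE person_ (
    id_ INTEGER PRIMARY KEY,
    name_ TEXT NOT NULL,
    email_ TEXT NOT NULL UNIQUE
);
CREATE TABLE team_ (id_ INTEGER PRIMARY KEY);
-- Bridge table resolving the many-to-many between person_ and team_.
CREATE TABLE membership_ (
    person_id_ INTEGER NOT NULL REFERENCES person_(id_),
    team_id_ INTEGER NOT NULL REFERENCES team_(id_),
    start_ TEXT NOT NULL,
    stop_ TEXT,  -- NULL while the membership is still current
    PRIMARY KEY (person_id_, team_id_)
);
-- One-to-many: each task is owned by exactly one team.
CREATE TABLE task_ (
    id_ INTEGER PRIMARY KEY,
    team_id_ INTEGER NOT NULL REFERENCES team_(id_),
    title_ TEXT NOT NULL
);
-- Sample data (hypothetical).
INSERT INTO person_ VALUES
    (1, 'Ann', 'ann@example.com'),
    (2, 'Ben', 'ben@example.com'),
    (3, 'Cy', 'cy@example.com');
INSERT INTO team_ VALUES (10), (11);
INSERT INTO membership_ VALUES
    (1, 10, '2024-01-01', NULL), (2, 10, '2024-01-01', NULL),
    (1, 11, '2024-02-01', NULL), (3, 11, '2024-02-01', NULL);
INSERT INTO task_ VALUES (100, 10, 'buy milk'), (101, 11, 'book flights');
""")


def members_of(team_id):
    """Names of the people linked together in one team."""
    rows = conn.execute(
        "SELECT p.name_ FROM person_ p "
        "JOIN membership_ m ON m.person_id_ = p.id_ "
        "WHERE m.team_id_ = ? ORDER BY p.name_",
        (team_id,),
    ).fetchall()
    return [row[0] for row in rows]


def tasks_for(person_name):
    """Titles of every task owned by any team the person belongs to."""
    rows = conn.execute(
        "SELECT t.title_ FROM task_ t "
        "JOIN membership_ m ON m.team_id_ = t.team_id_ "
        "JOIN person_ p ON p.id_ = m.person_id_ "
        "WHERE p.name_ = ? ORDER BY t.title_",
        (person_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Displaying the names of both linked users is then a join through membership_ instead of two separate user1id/user2id columns.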

archiving strategies and limitations of data in a table

Environment: Jboss, Mysql, JPA, Hibernate
Our web application will be catering to a large number of users (~1,000,000) and there are a lot of child tables where user-specific data is stored (e.g. personal, health, forum contributions ...).
1. What would be the best practice to archive user & user-specific information?
[a] Would it be wise to move the archived user & user-specific information to their respective tables within the same database (e.g. user_archive, user_forum_comments_archive ...) OR
[b] Would you just mark the database entries with a flag in the original table(s) and query only non-archived entries?
2. We have a unique constraint on User.loginid. How do you handle this requirement if the users are archived via 1-[a] (i.e. if a user with loginid 'samuel' gets moved into the archive table and a new user gets added with the same name in the original table, how would you prevent this)? What would be the best strategy to address the unique key constraints?
3. We have a requirement to selectively archive records and bring them back if necessary. Would you rely on database tools or would you handle this via the persistence APIs exposed by the JPA entity model?
Personally, I'd go for solution "[b]".
Having things split on two table sets (current and archived) would make things a bit hard to manage in terms of common RDBMS concepts (example: the forum comment author would be a foreign key pointing to the users table... but you can't have a field behave as a foreign key to two different tables).
You could go for a compromise (the users table uses solution "b", all the other tables like profile get archived to a twin table as per solution "a") but this would make things unnecessarily complicated for your code (in some cases you have to look at the non-archived rows, in some at the archived only, and in others at the union of both).
Solution B would easily solve requirements #2 and #3, too. Uniqueness of user name is easy to enforce if everything is in the same table, and resurrecting archived users is just a matter of flipping a bit (Archived=Y/N) on the main user table.
10% is not much, I doubt that the difference in terms of performance would really justify the extra complexity (and risk of bugs).
I would put an archived flag on the table and then create a view to use when you don't want to see archived records. That way people will be more consistent in applying the archive flag I suspect.
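The flag-plus-view approach can be sketched as below. It uses Python's sqlite3 module instead of MySQL/JPA purely to keep the example self-contained; the table name app_user, the view name active_user, the sample rows, and the helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (
    id INTEGER PRIMARY KEY,
    loginid TEXT NOT NULL UNIQUE,  -- stays unique across archived rows too
    archived INTEGER NOT NULL DEFAULT 0 CHECK (archived IN (0, 1))
);
-- Everyday queries read from the view and never see archived users.
CREATE VIEW active_user AS
SELECT id, loginid FROM app_user WHERE archived = 0;
-- Sample data (hypothetical).
INSERT INTO app_user (loginid) VALUES ('samuel'), ('maria');
""")


def set_archived(loginid, archived):
    """Archive or restore a user by flipping the flag."""
    conn.execute(
        "UPDATE app_user SET archived = ? WHERE loginid = ?",
        (1 if archived else 0, loginid),
    )


def active_logins():
    """Login ids visible through the non-archived view."""
    rows = conn.execute("SELECT loginid FROM active_user ORDER BY loginid")
    return [row[0] for row in rows]


def try_create(loginid):
    """Return False if the loginid is taken, even by an archived user."""
    try:
        conn.execute("INSERT INTO app_user (loginid) VALUES (?)", (loginid,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Because archived rows stay in the same table, the existing unique constraint keeps blocking a second 'samuel', and bringing a user back is a single update.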

Database for microblogging startup

I am going to build a microblogging web service (for school, so don't blast me for the lack of a new idea) and I worry that the DB could often be overloaded (a user can follow other users or even tags, so I suppose the SELECT will be heavy: fetch the 20 latest messages that match any followed tag or user).
My idea is to create another table and store in it only statusID and userID (the user who should receive the message). The danger is that if some tag or user has many followers, there will be a lot of records with that status ID. So, is it a good idea? Or would it be better to use an M2M relation (one status -> many receivers)?
I think most databases can easily handle large record sets. The responsibility for making it perform lies in your design and in properly setting up the indexes. If you create the right indexes, the select queries should perform really well.
I'd go with a users table, a messages table, and a table holding the m2m relationship between users (who follows whom).
You can then do one select to find all of the users a user is following and then a second select to get all of the messages of interest (sorting and limiting the results as appropriate). Extending this to tagging should be pretty simple.
This design should be fine for large numbers of users and messages as long as you index the right columns. If you got massive, you could also move the users tables and messages tables to different servers or add read-only replicas. I wouldn't even worry about that for the moment - you'd need to be huge.
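A minimal sketch of that design, using Python's sqlite3 module, might look like the following. The table and column names, the index, the integer timestamps, the sample rows, and the helper `timeline` are all assumptions made for illustration; tags are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
-- Many-to-many between users: who follows whom.
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (follower_id, followee_id)
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    created_at INTEGER NOT NULL,  -- e.g. a Unix timestamp
    body TEXT NOT NULL
);
-- Index supporting "latest messages by these authors".
CREATE INDEX messages_user_created ON messages (user_id, created_at);
-- Sample data (hypothetical).
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO follows VALUES (1, 2), (1, 3);
INSERT INTO messages (user_id, created_at, body) VALUES
    (2, 1, 'b1'), (3, 2, 'c1'), (1, 3, 'a1'), (2, 4, 'b2');
""")


def timeline(user_id, limit=20):
    """Latest messages from the users that user_id follows, newest first."""
    # First select: who does this user follow?
    followed = [
        row[0]
        for row in conn.execute(
            "SELECT followee_id FROM follows WHERE follower_id = ?", (user_id,)
        )
    ]
    if not followed:
        return []
    # Second select: their messages, sorted and limited.
    marks = ",".join("?" * len(followed))
    rows = conn.execute(
        f"SELECT body FROM messages WHERE user_id IN ({marks}) "
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        (*followed, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

Each message is stored once regardless of how many followers its author has, which avoids the per-receiver row explosion the question worries about.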
When implementing Collabinate (http://www.collabinate.com), a service-based engine for microblogging and shared activity streams, I used a graph database. The fact that people create posts and follow other people lends itself to a graph structure. With the right relationships and algorithms, this can be a very efficient and performant solution.